The Smith–Waterman (SW) algorithm explores all the possible alignments between two or more sequences and as a result it returns the optimal local alignment. However, the computational cost of this algorithm is very high, and the exponential growth of computation makes SW unrealistic for searching similarities in large sets of sequences. Fortunately, the dynamic programming kernel of the SW algorithm involves mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply polyhedral compilation techniques to optimize the studied SW dense array code. In this article, we present an approach to generate efficient SW implementations for two and three sequences by using the transitive closure of a dependence graph and loop skewing. Generated programs are represented with parallel tiled loop nests, which expose significantly higher performance than that of programs obtained with closely related compilers. The approach is able to tile all loops of original loop nests as opposed to well-known affine transformation techniques. Furthermore, it allows for code optimization of three-sequence alignment. Such a code cannot be generated by means of state-of-the-art automatic optimizing compilers. We demonstrate that an under-approximation of transitive closure (instead of exact transitive closure) can be used to generate valid parallel tiled code. This considerably reduces the computational complexity of the approach. Generated codes were run on cores of a modern Intel multiprocessor and they expose high speedup and good scalability on this platform.
1. Introduction
The Smith–Waterman (SW) algorithm is one of the widely used algorithms for sequence alignment in bioinformatics. It is based on a dynamic programming (DP; also known as dynamic optimization) that explores all the possible alignments between sequences and returns the optimal local alignment guarantying maximal sensitivity (Smith and Waterman, 1981). However, the SW algorithm is not commonly used to search sequence databases because its computational cost is very high with the exponential growth. Faster algorithms such as heuristic FASTA (Pearson and Lipman, 1985) and BLAST (Altschul et al., 1990) are available, but they achieve high speed at the cost of reduced accuracy. Thus, it is highly desirable to develop high-performance solutions for the SW sequence alignment.
Various manual approaches have been successfully adopted to accelerate the SW algorithm in hardware and graphic cards (Li et al., 2007; Manavski and Valle, 2008; Liu et al., 2013; Marmolejo-Tejada et al., 2014; Zhao and Sahni, 2015; Rucci et al., 2017). The main motivation of our work is to propose an approach allowing us to automatically generate efficient parallel tiled code implementing the SW algorithm. For this purpose, we use both the transitive closure of a dependence graph and the loop skewing technique. We exploit the computational power and cache efficiency of modern multicore processors to run generated code and compare results with those obtained with a closely related compiler.
Listing 1: Calculating scoring matrix H using the SW algorithm for two sequences
H [i] [j] = max (0, H [i – 1, j – 1] + s(a [i], b[i]), m1 [i] [j], m2 [i] [j]); s2
}
The SW algorithm constructs a scoring matrix H, which is used to keep track of the degree of similarity between the cells ai and bj of two sequences to be aligned, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$1 \le i \le N , \,1 \le j \le M$$
\end{document}. The size of the scoring matrix is (N + 1)*(M + 1). Matrix H is first initialized with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{0 , 0}} = {H_{0 , j}} = {H_{i , 0}} = 0$$
\end{document} for all i and j.
The next step is filling matrix H. Each element \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{i , j}}$$
\end{document} of H is calculated as follows.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{H_{i , j}} = \max \left\{ { \begin{matrix} {{H_{i - 1 , j - 1}} + s ( {a_i} , {b_j} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < i} ( {H_{i - k , j}} - {W_k} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < j} ( {H_{i , j - k}} - {W_k} ) } \hfill \\ 0 \hfill \\ \end{matrix} } \right. ,
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s ( {a_i} , {b_j} )$$
\end{document} is a similarity score of elements ai, bj that constitute the two sequences, and Wk is a penalty of a gap that has length k.
The final step of the SW algorithm is to trace back to generate the best local alignment. The step starts at the cell with the highest score in matrix H and continues up to the cell where the score falls down to 0.
To reduce SW algorithm computation time, we propose to tile and parallelize the loop nest implementing the SW algorithm presented in Listing 1. It consists of triply nested affine loops with three statements accessing two-dimensional array H and auxiliary arrays \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m1$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m2$$
\end{document}. This algorithm is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \cal O}$$
\end{document}(MN(M + N)) in time and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \cal O}$$
\end{document}(MN) in memory. The extra time factor of (M + N) comes from finding optimal k by looking back over entire rows and columns.
Loop tiling or blocking is a crucial cache optimization technique that offers a number of benefits. It is used to improve code locality, expose parallelism, and allow for adjusting parallel code granularity or balance. All those factors impact parallel code performance (Iooss et al., 2015).
To our best knowledge, well-known tiling techniques and optimizing compilers are based on linear or affine transformations (Xue, 2000; Griebl, 2004; Bondhugula et al., 2008). However, classical tiling algorithms implemented in polyhedral tools fail to generate efficient code for DP problems of bioinformatics (Wonnacott et al., 2015; Palkowski and Bielecki, 2017).
Mullapudi and Bondhugula (2014) presented iterative tiling for dynamic scheduling to overcome limitations of affine transformations by means of reduction chains. Operations along each chain can be reordered to eliminate cycles. Speedup is presented for DP kernel of the Zuker algorithm for RNA folding. Their approach involves dynamic scheduling of tiles, rather than the generation of a static schedule.
Wonnacott et al. (2015) introduced serial tiling of “mostly tileable” loop nests of DP in an article. This approach tiles nonproblematic iterations for Nussinov's recurrence applying classic tiling strategies, while problematic iterations of the loop nest are peeled off and executed later. However, the article does not consider any parallelism, and the generated code is serial.
Sbirlea et al. (2016) proposed the use of a DFGL (Data-Flow Graph Language) as an embedded domain-specific language for expressing the dataflow components in an application. It enables individual computations to be implemented as sequential code that operates on a set of explicit inputs and outputs, and defers the packaging and coordination of interstep parallelism to the compiler and the runtime system. The compiler extracts parallelism through code transformations such as loop tiling and code generation of parallel loops with coarse-grain (fork-join) and fine-grain (doacross) synchronizations. The authors optimize the SW algorithm for two sequences, applying the polyhedral framework. However, the experimental study is limited only to a simplified SW algorithm with a linear gap penalty.
In our article (Palkowski and Bielecki, 2017), we presented loop tiling based on the transitive closure of a dependence graph for Nussinov's algorithm. The approach transforms (corrects) original rectangular fixed tiles so that all target tiles are valid under lexicographic order. It allows for generating target tiles such that there is no cycle in a corresponding inter-tile dependence graph. Parallelism is extracted by applying the loop skewing technique. We demonstrated higher speedup of generated tiled code (for a properly chosen size of original tiles) than that of code produced with state-of-the-art source-to-source optimizing compilers.
Motivated and experienced by the previous work, we adapted and applied the approach, presented in our article (Palkowski and Bielecki, 2017), to code, implementing the SW algorithm for two and three local sequence alignments with an arbitrary gap penalty. We suggested to calculate and apply an underapproximation of transitive closure instead of exact transitive closure. This allows us to significantly reduce the computational complexity of the approach to generate parallel tiled code. We proved the correctness of such a modification. This solution is useful particularly for multiple sequence alignment (MSA), known as a combinatorial optimization problem, when exact transitive closure calculation is a hard problem and commonly known implementations, for example, the ISL library (Verdoolaege, 2010), fail to calculate transitive closure. Generated parallel tiled code demonstrates high speedup on modern multicore platforms.
2. Methods
2.1. Brief introduction
The polyhedral model is a semantical, algebraic representation for analyzing, parallelizing, and transforming an important class of compute- and data-intensive programs, or program fragments consisting of (sequences of) arbitrarily nested loops. Loop bounds, statement conditions, and array accesses are affine functions in the program. This makes the extraction independent of the input programming language.
Within the polyhedral model for analysis and transformation of affine programs, we deal with sets and relations whose constraints need to be affine, that is, presented with linear expressions and constant terms. Affine constraints may be combined through the conjunction (\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\wedge$$
\end{document}), disjunction (\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\vee$$
\end{document}), projection (\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\exists$$
\end{document}), and negation (\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\neg$$
\end{document}) operators.
An access relation connects iterations of a statement to the array elements accessed by those iterations. Relations are defined in similar way as sets, except that the single space is replaced by a pair of spaces separated by the arrow sign \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\to$$
\end{document}. We use the exact dependence analysis proposed by Pugh and Wonnacott (1994), where loop dependences are represented with relations.
Standard operations on relations and sets are used, such as intersection (\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\cap$$
\end{document}), union (\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\cup$$
\end{document}), difference (−), domain (dom R), range (ran R), relation application \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$[ S^ \prime = R ( S ) :e ^\prime \in S ^\prime$$
\end{document} iff exists e s.t. \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$e \to e ^\prime \in R , e \in S ] .$$
\end{document} The detailed description of these operations is presented in Pugh and Wonnacott (1994).
In sequential loop nests, the iteration i executes before j if i is lexicographically less than j, denoted as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i \prec j$$
\end{document}, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${i_1} < {j_1} \vee \exists k \ge 1 :{i_k} < {j_k} \wedge {i_t} = {j_t} , { \rm{for}} \;t < k$$
\end{document}.
The positive transitive closure of a given lexicographically forward dependence relation R, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} is defined as follows (Pugh and Rosser, 1999):
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{R^ + } = \{ e \to e ^\prime : \;e \to e ^\prime \in R \; \vee \exists e \prime\prime s.t. \;e \to e \prime\prime \in R \wedge \;e \prime\prime \to e ^\prime \in {R^ + } \} .
\end{align*}
\end{document}
It describes which vertices e′ in a dependence graph (represented by relation R) are connected directly or transitively with vertex e.
A schedule is a function \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\sigma :LD \to \mathbb{Z}$$
\end{document} that assigns a discrete time of execution to each loop nest statement instance. A schedule is valid if for each pair of dependent statement instances, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${s_1} ( I )$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${s_2} ( J ) ,$$
\end{document} satisfying the condition \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${s_1} ( I ) \prec {s_2} ( J )$$
\end{document}, the condition \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\sigma ( {s_1} ( I ) ) < \sigma ( {s_2} ( J ) )$$
\end{document} holds true, that is, all dependences are preserved when statement instances are executed in an increasing order of schedule times.
In this article, we deal with two kinds of dependence graphs. A loop nest dependence graph is represented with a set of loop nest statement instances (nodes) and a set of dependences among those instances (edges). While the reduced dependence graph (RDG) of a program is the graph whose nodes are the assignment statements of the program, edges capture data dependences and the directions of the data dependence between statements. The first one can be unbounded, while the second always contains the limited number of nodes.
2.2. Generation of tiles for the SW filling stage loop nest for two sequences
Each \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{i , j}}$$
\end{document} entry depends on the values of three neighboring entries \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{i , j - 1}}$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{i - 1 , j}}$$
\end{document}, and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{i - 1 , j - 1}}$$
\end{document} and two nonlocal entries (Fig. 1a). Hence, the data dependence pattern of the SW algorithm is complicated but typical for nonserial polyadic dynamic programming (NPDP). The term nonserial polyadic stands for another family of DP with nonuniform data dependences, which is more difficult to be optimized (Liu et al., 2011).
Multiple entries of matrix \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$H.$$
\end{document}a(b) depicts the dependences of cell \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{ij}}$$
\end{document} on three (seven) adjacent cells and two (six) nonlocal cells for two (three)-sequence alignment, respectively.
The loop nest from Listing 1 involves seven dependence relations. The union of all dependence relations is represented with relation R below.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
R = \left\{ \begin{matrix} s0 \to s0: \{ [ i , j , k ] \to [ i , j , k \prime ] :i \le N \wedge 0 \le j \le M \wedge \;k > 0 \wedge k < k \prime \le i \} \hfill \\ s1 \to s1: \{ [ i , j , k ] \to [ i , j , k \prime ] :0 \le i \le N \wedge j \le M \wedge \;k > 0 \wedge k < k \prime \le j \} \hfill \\ s2 \to s2: \{ [ i , j , 0 ] \to [ i + 1 , j + 1 , 0 ] :0 \le i < N \; \wedge 0 \le j < M \} \hfill \\ s0 \to s2: \{ [ i , j , k ] \to [ i , j , 0 ] :0 \le i \le N \; \wedge j \le M \wedge \;0 < k \le i \} \hfill \\ s1 \to s2: \{ [ i , j , k ] \to [ i , j , 0 ] :0 \le i \le N \; \wedge j \le M \wedge \;0 < k \le j \} \hfill \\ s2 \to s0: \{ [ i , j , 0 ] \to [ i , j \prime , i \prime - i ] :0 \le j \le M \; \wedge i \ge 0 \wedge \;i < i \prime \le N \} \hfill \\ s2 \to s1: \{ [ i , j , 0 ] \to [ i , j \prime , j \prime - j ] :0 \le i \le N \; \wedge j \ge 0 \wedge \;j < j \prime \le M \} . \hfill \\\end{matrix} \right.
\end{align*}
\end{document}
To generate tiled code for the SW algorithm, we apply the approach presented in our article (Bielecki and Palkowski, 2016), which is based on the transitive closure of dependence graphs. Below, we present the results of applying that approach to the loop nest in Listing 1.
The iteration domain of the SW loop nest (Listing 1) is represented with the following set.
Let vector I = \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ ( i , j , k ) ^T}$$
\end{document} represent indices of the SW loop nest, diagonal matrix B = [\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${b_1} , \,{b_2} , \,{b_3}$$
\end{document}] define tile sizes, and vectors II = \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ ( ii , jj , kk ) ^T}$$
\end{document} and II′ = \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ ( iip , jjp , kkp ) ^T}$$
\end{document} specify tile identifiers. Each tile identifier is represented with a non-negative integer, that is, the following constraint II\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\ge$$
\end{document} 0 has to be satisfied.
First, we form a parametric set, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE ( \textbf{\textit{II}} , \,\textbf{\textit{B}} )$$
\end{document}, including statement instances belonging to a parametric rectangular tile (parameters are tile identifiers) as follows.
TILE_LT (TILE_GT) is the union of all the tiles whose identifiers are lexicographically less (greater) than that of TILE(II, B):
For calculating exact relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document}, where R is the union of all dependence relations, we apply the ISL library (Verdoolaege, 2010). Next, we calculate the following set:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
TILE \_ITR = TILE - {R^ + } ( TILE \_GT ) ,
\end{align*}
\end{document}
which does not include any invalid dependence target, that is, it does not include any dependence target whose source is within set TILE_GT.
includes all the iterations that (i) belong to the tiles whose identifiers are lexicographically less than that of set TILE_ITR, (ii) are the targets of the dependences whose sources are contained in set TILE_ITR, and (iii) are not any target of a dependence whose source belongs to set TILE_GT.
Target tiles are defined by the following set:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
TILE \_VLD = TILE \_ITR \cup TVLD \_LT.
\end{align*}
\end{document}
It is worth noting that, for the considered loop nest, target tiles (i) for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} include only instances of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document}, (ii) for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} comprise only instances of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document}, and (iii) for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document} include instances of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0 , s1 , s2$$
\end{document}.
Next, we form set TILE_VLD_EXT by means of inserting into the first positions of the tuple of set TILE_VLD elements of vector II: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ii , jj , kk$$
\end{document}.
The following step is to use the original schedule of the original loop nest statement instances, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$SCHED \_ORIG$$
\end{document}, to form a target set allowing for regeneration of serial valid code. The original schedule can be extracted by means of the Clan tool (Bastoul, 2004) and is shown below.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
SCHED \_ORIG = \left\{ { \begin{matrix} {s0:0 , i , 0 , j , 0 , k} \\ \ {s1:0 , i , 0 , j , 1 , k , } \\ {s2:0 , i , 0 , j , 2 , k.} \\ \end{matrix} } \right.
\end{align*}
\end{document}
For each statement, we transform this schedule by means of repeating on its right hand side the same sequences as the original ones and replacing in their first halves of the resulting sequence indices \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i , j , k$$
\end{document} for indices \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ii , jj , kk$$
\end{document}. The number of sequences for each statement is defined with the number of statements whose instances constitute a valid target tile represented with set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_VLD$$
\end{document}. This results in the following schedule.
Let us note that target tiles, formed for statement s2, comprise instances of statements s0, s1, and s2.
In the next step, we form the following relation, Rmap.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
Rmap = \left. { \left\{ { \begin{matrix} { \; \; [ ii , jj , kk , i , j , k ] \to \;SCHED:} \\ {constraints \;of \;TILE \_VLD \_EXT} \\ \end{matrix} } \right.} \right\} .
\end{align*}
\end{document}
This relation maps each tile identifier and each statement instance within a tile from the original iteration space to those in the transformed iteration space.
Finally, we form target set, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_VLD \_EXT ^\prime$$
\end{document}, as the range of relation Rmap\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
TILE \_VLD \_EXT ^\prime = ran \;Rmap.
\end{align*}
\end{document}
This set represents tile identifiers and statement instances within a tile in the transformed iteration space.
2.3. Tiled code parallelization
In our study (Palkowski and Bielecki, 2017), we discuss parallelization of tiled code by means of loop skewing, which honors all dependences among generated valid tiles. Nussinov's RNA folding and filling the SW score matrix algorithms have similar dependence patterns. Cell \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( i , j )$$
\end{document} is dependent on local neighbors \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( i - 1 , j ) , ( i , j - 1 ) , ( i - 1 , j - 1 )$$
\end{document} and nonlocal ones \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( i - k , j )$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( i , j - k )$$
\end{document}.
To parallelize serial tiled code, we apply the well-known loop skewing transformation (Wolfe, 1986). Loop skewing is a transformation that has been used to remap an iteration space by creating a new outermost loop whose index is a linear combination of two or more loop indices. Provided that this transformation is valid, its application to original code results in code whose outermost loop is serial while the reminding loops can be parallelized.
To form the index of the outermost loop, we use the following linear combination \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ii + jj$$
\end{document}, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ii , jj$$
\end{document} are the indices of the first two loops in tiled code representing tile identifiers.
To apply the loop skewing transformation, we create the following relation:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
R \_SCHED = \{ [ 0 , ii , 0 , jj , \ldots , 0 , i , 0 , j , \ldots ] \to [ 0 , ii + jj , 0 , jj , \ldots , 0 , i , 0 , j , \ldots ] : \\\,{ \rm{constraints}} \;{ \rm{of}} \;{ \rm{set}} \;TILE \_VLD \_EXT ^\prime \; \} .
\end{align*}
\end{document}
This relation maps tiles in the original iteration space to tiles in the target iteration space. We apply it to set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_VLD ^\prime$$
\end{document}.
To prove the validity of relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_SCHED$$
\end{document}, we form the following relation.
where
(*) means that J is the destination of the dependence whose source is I,
(**) means that I, J belong to the tiles with identifiers II and JJ, respectively,
(***) means that the schedule time of the tile with identifier II is greater or the same as that of the tile with identifier JJ, that is, if this condition is true, the schedule is invalid because the dependence I\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\to$$
\end{document}J is not respected (its target is scheduled earlier than its source).
Listing 2: Parallel tiled code implementing the SW algorithm (two sequences) with the tile size of 16 × 16 × 16.
for (c1 = 0; c1 < = floord (M + N − 2, 16); c1 + = 1) // ii + jj
#pragma omp parallel for
for (c3 = max (0, c1 + floord (−N, 16) + 1); c3 < = min (c1, floord (M − 1, 16)); c3 + = 1) //jj
/* SCHED for s0 */
for (c5 = 0; c5 < = c1 − c3; c5 + = 1) //kk
for (c9 = 16*c3 + 1; c9 < = min (M, 16*c3 + 16); c9 + = 1) //j
for (c9 = 16*c3 + 1; c9 < = min (M, 16*c3 + 16); c9 + = 1) { //j
for (c10 = max (0, 16*c1 − 16*c3 − c7 + 2); c10 < = min (1, −16*c3 + c9 − 1); c10 + = 1) {
if (c10 = = 1) {
for (c11 = 1; c11 < = c9; c11 + = 1) //k
/*s0*/ m2 [c7] [c9] = max (m2 [c7] [c9], H [c7] [c9 − c11] + W [c11]) ;
} else
for (c11 = 1; c11 < = c7; c11 + = 1) //k
/*s1*/ m1 [c7] [c9] = max (m1 [c7] [c9] , H [c7 − c11] [c9] + W [c11]);
}
/*s2*/ H [c7] [c9] = max (0, max ( H [c7 − 1] [ c9 − 1] + s (a[c7], b[c9]),
max (m1 [c7] [c9], m2 [c7] ) ) ) ;
}
}
This relation returns the empty set when all original inter-tile dependences are respected, otherwise it represents all the pairs of the tile identifiers for which original ones are not respected.
Relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_VALID$$
\end{document} is empty for generated SW code; this proves the validity of relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_SCHED$$
\end{document} to be used for code generation.
Target pseudocode is generated by means of applying the ISL AST code generator (Verdoolaege, 2010) allowing for scanning elements of the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_SCHED ( TILE \_VLD \_EXT ^\prime )$$
\end{document} in lexicographic order.
Then we postprocess this code to make it compilable by replacing pseudostatements for the original loop nest statements and insert the work-sharing OpenMP parallel for pragmas (OpenMP Architecture Review Board, 2015). Listing 2 presents the target parallel tiled code for the SW loop nest presented in Listing 1 for the tile size of 16 × 16 × 16.
2.4. Code generation of the SW local sequence alignment for three sequences
Multiple sequence alignments are computationally difficult to produce (much harder than that of pairwise alignment) and most formulations of the problem lead to NP-complete combinatorial optimization problems.
Scoring matrix H is similarly constructed to align cells ai, bj, and cl of three sequences, where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$1 \le i \le N , 1 \le j \le M , 1 \le l \le P$$
\end{document}. The size of the scoring matrix is (N + 1)*(M + 1)*(P + 1). Matrix H is first initialized with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{0 , 0 , 0}} = {H_{i , 0 , 0}} = {H_{0 , j , 0}} = {H_{0 , 0 , l}} = 0$$
\end{document} for all i, j, and l. Each element \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${H_{i , j , l}}$$
\end{document} is calculated as follows.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
{H_{i , j , l}} = \max \left\{ { \begin{matrix} {{H_{i - 1 , j - 1 , l - 1}} + s ( {a_i} , {b_j} ) + s ( {b_j} , {c_l} ) + s ( {a_i} , {c_l} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < min ( j , l ) } ( {H_{i , j - k , l - k}} + s ( {b_j} , {c_l} ) - {W_k} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < min ( i , j ) } ( {H_{i - k , j - k , l}} + s ( {a_i} , {b_j} ) - {W_k} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < min ( i , l ) } ( {H_{i - k , j , l - k}} + s ( {a_i} , {c_l} ) - {W_k} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < i} ( {H_{i - k , j , l}} - 2*{W_k} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < j} ( {H_{i , j - k , l}} - 2*{W_k} ) } \hfill \\ { \mathop { \max } \limits_{1 \le k < l} ( {H_{i , j , l - k}} - 2*{W_k} ) } \hfill \\ {0.} \hfill \\ \end{matrix} } \right.
\end{align*}
\end{document}
Multiple entries of matrix H are much more complicated by data dependences, through which each cell entry depends on the values of seven entries (Fig. 1b). The filling stage requires one more loop for l. The number of loop nest statement is populated by q sequences and it is equal to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${2^q}$$
\end{document} − 1. For two and three sequences with the same length n, computation of one element growths is from 3*n2 to 7*n3 iterations.
Listing 3: Calculating scoring matrix H using the SW algorithm for three sequences.
Listing 3 presents the loop nest implementing the SW local alignment for three sequences. The computational complexity of the SW algorithm for three sequences is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${ \cal O} ( NMP ( N + M + P ) )$$
\end{document}.
Let us consider a dependence pattern for q sequences. In the RDG, we have one self-dependence (the source and target of a dependence are instances of the same loop nest statement) for each statement and two dependences between the last loop nest statement, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s6$$
\end{document}, and each of the reminding loop nest statements. There are 3*(\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${2^q}$$
\end{document} − 1) − 2 dependence relations in this pattern. For three sequences, we have 19 dependences in the RDG.
With a growing number of loops in a nest and the number of dependence relations, the calculation of transitive closure for the union of all dependence relations stays quickly more complex. For three-sequence alignment, the ISL library is not able to compute exact \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} or an overapproximation of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document}. Fortunately, the data dependence pattern of the SW algorithm and particular properties of the used loop tiling technique allow us to calculate an underapproximation of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} in short time and it can be used to generate valid tiled code.
To clarify this fact, let us consider first the RDG for the SW algorithm for two local sequences depicted in Figure 2a. Loop nest statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} form two cliques* with statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document} (depicted with solid ellipses) in RDG.
Program-reduced dependence graphs for two- and three-sequence alignment of the SW algorithm. (a) Presents the reduced dependence graph for two sequences. Cliques are depicted with the solid ellipses. (b) Presents the reduced dependence graph for three sequences. SW, Smith–Waterman.
Exact transitive closure \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} represents transitive paths among instances of statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} (the red dashed arrow). Below, we show that those paths can be removed from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} and we still will obtain valid tiled code.
According to the tiling technique presented in our article (Bielecki and Palkowski, 2016), we form tiles for each loop nest statement separately, in other words, we tile separately the iteration space of each statement. The tile whose identifier is II comprises an invalid dependence target(s) if the source of the dependence with this target belongs to a tile whose identifier is greater than II. Analyzing the dependence pattern of the considered loop nest, we can conclude that only tiles formed for statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} can include invalid targets because identifiers of tiles formed for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} are less than those formed for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document} for the same values of indices \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ii , jj , kk$$
\end{document} defining tile identifiers.
In general, to find invalid targets, we form set, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_INV$$
\end{document} as follows.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
TILE \_INV: = TILE \cap {R^ + } ( TILE \_GT ) ,
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} is the exact transitive closure of the loop nest dependence graph. We can obtain the same set if instead of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} we apply the following underapproximation of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document}, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER$$
\end{document}:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
R \_UNDER = \bigcup \limits_{0 \le i \le 1} {{{ ( {R_i} ) }^ + }} ,
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathop {{R_i}} \limits_{0 \le i \le 1} = {R_{i2}} \cup {R_{2i}} \cup {R_{22}} \cup {R_{ii}}$$
\end{document}, that is, Ri is the union of all the dependence relations associated with the ith clique.
There is no problem to calculate the exact transitive closure for each clique applying the ISL library.
We call that relation the underapproximation of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} because it does not represent transitive paths among instances of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document}, that is, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER \subset {R^ + }$$
\end{document}.
From the dependence pattern of the considered loop nest, we may conclude that (i) there is no direct path among instances of statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document}, (ii) each transitive path among instances of statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} goes via an instance of statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document}. Taking into account that set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_GT$$
\end{document} for statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} includes tiles formed for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document}, we can exclude all transitive paths among instances of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} because direct paths among instances of statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} as well as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} will define the same invalid targets as those obtained with transitive paths among instances of statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document}.
So, to calculate set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_ITR$$
\end{document}, we can use relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER$$
\end{document} instead of relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document}.
Invalid targets are to be included in the tiles that (i) comprise the sources of the dependences whose targets are invalid, (ii) have the lexicographically maximal identifiers among all those identifiers of the tiles, including sources of the dependence whose target is invalid, for details refer to our article (Bielecki and Palkowski, 2016).
Mathematically, invalid targets, to be included in the tile with identifier II, are comprised in the following set.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
TVLD \_LT ( II ) = ( {R^ + } ( TILE \_ITR ) \cap TILE \_LT ) - {R^ + } ( TILE \_GT ) .
\end{align*}
\end{document}
As we concluded above, the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + } ( TILE \_GT )$$
\end{document} can be replaced with the set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER ( TILE \_GT )$$
\end{document}.
Let us recall that set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TILE \_ITR$$
\end{document} represents tiles not including invalid dependence targets. So the set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
TVLD \_LT ( II ) = ( {R^ + } ( TILE \_ITR ) \cap TILE \_LT )
\end{align*}
\end{document}
comprises invalid targets to be included in the tile with identifier II.
From the dependence pattern, presented in Figure 2a, we can conclude that invalid targets available in tiles formed for statements \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s0$$
\end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s1$$
\end{document} can be moved only to the tiles formed for statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document} because there does not exist a direct dependence among them, there are only transitive paths between them each going via an instance of statement \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s2$$
\end{document}.
So, in the formula above, defining set \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$TVLD \_LT$$
\end{document}, we can replace relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document} for relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER$$
\end{document}.
Figure 2b illustrates the RDG for three-sequence alignment. The structure of this graph is similar to that for two-sequence alignment. It includes seven nodes representing loop nest statements and six cliques. So, we can form the following relations, Ri, each associated with the ith clique.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
\mathop {{R_i}} \limits_{0 \le i \le 5} = {R_{i6}} \cup {R_{6i}} \cup {R_{66}} \cup {R_{ii}}.
\end{align*}
\end{document}
Then we calculate an underapproximation, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER$$
\end{document}, of the exact transitive closure of the dependence graph associated with the loop nest, implementing three-sequence alignment, as follows.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
R \_UNDER = \bigcup \limits_{0 \le i \le 5} {{{ ( {R_i} ) }^ + }} .
\end{align*}
\end{document}
In the same way as for two-sequence alignment, we can validate that instead of relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${R^ + }$$
\end{document}, relation \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER$$
\end{document} can be used to generate a valid tiled loop nest implementing three-sequence alignment.
To generate and validate parallel tiled code, implementing three-sequence alignment, we use the same approach as that for two sequences presented in the previous subsection. Because generated code is too long to be inserted in the article, we inserted it at website http://traco.sourceforge.net/sw3d.html
In general, for q sequences, we form an underapproximation of transitive closure as follows.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}
R \_UNDER = \bigcup \limits_{0 \le i \le {2^q} - 3} {{{ ( {R_i} ) }^ + }} ,
\end{align*}
\end{document}
where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\mathop {{R_i}} \limits_{0 \le i \le {2^q} - 3} = {R_{i{{ ( 2}^q} - 2 ) }} \cup {R_{{{ ( 2}^q} - 2 ) i}} \cup {R_{ ( i{2^q} - {{2 ) ( 2}^q} - 2 ) }} \cup {R_{ii}}$$
\end{document}. Ri is the union of all the dependence relations associated with the i-th clique in the RDG for the loop nest implementing alignment for q sequences.
Then, applying \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$R \_UNDER$$
\end{document}, we can generate parallel tiled code in the same way as that used for code generation for two and three sequences.
3. Results and Discussion
The presented approach has been implemented as a part of the polyhedral source-to-source TRACO compiler (http://traco.sourceforge.net). It takes on input an original loop nest in the C language, a tile size, and affine transformations for each loop nest statement to parallelize serial tiled code. All parallel tiled codes, implementing the SW algorithms, were generated by means of the Intel C++ Compiler (icc 17.0.1) with the -O3 flag of optimization.
This section presents speedup† of generated parallel tiled code. We consider alignment of two and three sequences of the same length N. To carry out experiments, we used an Intel Xeon E5-2699 v3 multiple-processor (2.3 Ghz in base and 3.6 Ghz in turbo, 36 threads, 45 MB Cache).
For generated tiled code aligning two sequences, we empirically recognized that the best tile size is 1 × 128 × 16 and the most efficient work-sharing is achieved by applying the OpenMP for directive (OpenMP Architecture Review Board, 2015).
Table 1 presents the execution times of the original serial code (Listing 1) and parallel tiled code implementing the SW two-sequence alignment algorithm (Listing 2) from 500 to 5000 length of randomly generated sequences. As we can see, for all cases, the execution time of the tiled code is shorter than that of the original one and it reduces with increasing the number of threads. We notice also superlinear speedup (>32), which is illustrated in Figure 3 in a graphical way, when the length of sequences varies from 3000 to 4000.
Speedup of parallel tiled codes implementing the SW algorithm aligning 2 sequences with 32 threads run on Intel Xeon E5-2699 v3. The horizontal coordinate represents the length of sequences, the vertical one shows the speedup of the examined codes.
Execution Time (in Seconds) of the Smith–Waterman Parallel Tiled Code for Alignment of Two Sequences of Various Lengths, N, and 32 Threads
N
Original
TRACO [1 × 128 × 16]
Pluto [8 × 8 × 1]
500
0.23
0.06
0.02
1000
1.28
0.27
0.12
1500
8.85
0.60
0.77
2000
23.57
1.13
1.90
2500
53.52
1.79
4.11
3000
104.49
2.53
7.40
3500
130.60
3.64
13.77
4000
186.93
5.25
22.58
4500
239.31
9.81
33.49
5000
331.77
14.08
46.66
We present also the time and speedup of parallel tiled code produced with the state-of-the-art Pluto optimizing compiler (version 0.11.4) (Bondhugula et al., 2008), which does not enable to tile the third innermost loops in the SW loop nest of the filling stage. The tile size was chosen to be [8 × 8 × 1] empirically, as suggested by experiments with various tile sizes for Pluto tiled SW code. The results show that the tiled code generated with the proposed approach outperforms that generated with standard affine transformations found and applied with Pluto for longer sequences. Lengths greater than 1000 allow for effective cache reuse at execution of parallel tiled code. Tiling the innermost loops becomes a dominant factor in reaching high tiled code performance.
The results in Table 2, graphically presented in Figure 4, demonstrate that our tiled code is scalable, that is, increasing the number of threads increases code speedup.
Speedup of parallel tiled codes implementing the SW algorithm aligning 2 sequences of the length of 5000 run on Intel Xeon E5-2699 v3. The horizontal coordinate represents the number of threads, the vertical one shows the speedup of the examined codes.
Execution Time (in Seconds) of the Smith–Waterman Parallel Tiled Code for Two-Sequence Alignment with the Same Length N = 5000
Threads
Original
TRACO [1 × 128 × 16]
Pluto [8 × 8 × 1]
1
215.60
254.45
2
91.97
211.54
4
52.45
169.50
8
331.77
29.59
127.56
16
18.96
83.54
24
14.70
53.30
32
14.08
46.66
For three-sequence alignment code, we empirically discover that the optimal original tile size maximizing code speedup is [16 × 16 × 16 × 16]. Tables 3 and 4 show that this code is scalable with increasing the length of sequences and the number of threads. Pluto fails to generate any tiled or parallel code for the three-sequence alignment algorithm implemented with the loop nest presented in Listing 3.
Execution Time (in Seconds) and Speedup of the Smith–Waterman Parallel Tiled Code for Alignment of Three Sequences of Various Lengths, N, and 32 Threads
N
Original
TRACO [16 × 16 × 16 × 16]
Speedup
100
0.57
0.22
2.60
200
7.79
1.98
3.93
300
43.10
7.06
6.10
400
166.65
21.35
7.81
500
621.83
65.88
9.44
Execution Time (in Seconds) and Speedup of the Smith–Waterman Parallel Tiled Code for Three-Sequence Alignment with the Same Length N = 500 and a Different Number of Threads
Threads
Original
TRACO [16 × 16 × 16 × 16]
Speedup
1
636.38
0.98
2
219.08
2.84
4
142.48
4.36
8
621.83
82.99
7.49
16
74.74
8.32
24
68.12
9.13
32
65.88
9.44
It is worth noting that the global sequence alignment, realized by the Needleman–Wusch (NW) algorithm (Needleman and Wunsch, 1970)‡, can be implemented in parallel tiled code in the same manner as that used for the SW algorithm. This algorithm has the same dependence pattern and array accesses as those of the SW algorithm. There are only differences concerning initialization and trace-back steps§. Hence, the presented approach can be applied to generate parallel tiled code implementing the NW algorithm and we can expect the same time benefits as those obtained for code implementing the SW algorithm.
Summing up, we conclude that parallel tiled code, generated with the presented approach for two-sequence alignment, outperforms that generated with Pluto, which fails to tile the innermost loops. Tiling that loop is crucial to obtain high code locality for the examined algorithm. The presented approach allows us to generate parallel tiled code implementing three-sequence alignment and exposing satisfactory speedup, while Pluto, implementing affine transformation techniques, fails to generate any code for this case.
4. Conclusion
This article presents automatic tiling and parallelization of code implementing the SW algorithm for local sequence alignment with an arbitrary gap penalty. We demonstrate how to generate efficient parallel tiled code for aligning two and three sequences. The transitive closure of dependence graphs is used to tile original code, whereas for finding parallel tiles the affine loop skewing transformation is applied.
For three-sequence alignment, we proposed and validated the application of an underapproximation of transitive closure instead of exact transitive closure. This allows us to significantly reduce the computational complexity of the tiling approach and generate parallel tiled code.
An experimental study demonstrates significant parallel tiled code speedup achieved on a modern multicore computer system. Generated codes outperform those produced with the state-of-the-art affine transformation techniques.
The presented approach is the continuation of our research aimed at effective tiling and parallelization of computational biology NPDP codes. In future, we are going to accelerate codes implementing other algorithms for sequence alignment, in particular scoring algorithms by strips and progressive alignment for MSA. Furthermore, we plan to carry out research on other important DP kernels of bioinformatics, which code cannot be automatically optimized by means of well-known loop nest transformations.
Footnotes
Author Disclosure Statement
The authors declare that no competing financial interests exist.
*
A maximal subset in the directed graph where each pair of nodes are connected themselves in any direction.
†
Ratio of original code execution time to transformed code execution time.
‡
Smith and Waterman () modified the Needleman–Wunsch algorithm so as to determine the best local alignment.
§
The NW scores can be negative while the negative SW scores are set to 0. Hence, the NW filling is the same except that it does not consider 0 for maximum values in the scoring array.
References
1.
AltschulS.F., GishW., MillerW., et al.1990. A basic local alignment search tool. J. Mol. Biol. 215, 403410.
2.
BastoulC.2004. Code generation in the polyhedral model is easier than you think. PACT’13 IEEE International Conference on Parallel Architecture and Compilation Techniques, Juan-les-Pins, pp. 7–16.
3.
BieleckiW., and PalkowskiM.2016. Tiling arbitrarily nested loops by means of the transitive closure of dependence graphs. Int. J. Appl. Math. Comput. Sci., 26, 919–939.
4.
BondhugulaU., HartonoA., RamanujamJ., et al.2008. A practical automatic polyhedral parallelizer and locality optimizer. SIGPLAN Not. 43, 101–113.
5.
GrieblM.2004. Automatic parallelization of loop programs for distributed memory architectures. Habilitation thesis, University of Passau.
6.
IoossG., RajopadhyeS., AliasC., et al.2015. Mono-parametric tiling is a polyhedral transformation. Research Report RR-8802, INRIA Grenoble Rhône-Alpes; CNRS.
7.
LiI.T.S., ShumW., and TruongK.2007. 160-Fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA). BMC Bioinformatics, 8, 185.
8.
LiuL., WangM., JiangJ., et al.2011. Efficient nonserial polyadic dynamic programming on the cell processor. In IPDPS Workshops. IEEE, Anchorage, Alaska, pp. 460–471.
9.
LiuY., WirawanA., and SchmidtB.2013. CUDASW++ 3.0: Accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics, 14, 117.
10.
ManavskiS., and ValleG.2008. Cuda compatible gpu cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics, 9 (Suppl 2), S10.
11.
Marmolejo-TejadaJ.M., Trujillo-OlayaV., Renteria-MejiaC.P., and Velasco-MedinaJ.2014. Hardware implementation of the Smith-Waterman algorithm using a systolic architecture. In LASCAS. IEEE, Santiago, Chile, pp. 1–4.
12.
MullapudiR.T., and BondhugulaU.2014. Tiling for dynamic scheduling. In RajopadhyeS., and VerdoolaegeS. eds. Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques. Vienna, Austria.
13.
NeedlemanS.B., and WunschC.D.1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443–453.
14.
OpenMP Architecture Review Board.. 2015. OpenMP application program interface version 4.5.
15.
PalkowskiM., and BieleckiW.2017. Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing. BMC Bioinformatics, 18, 290.
16.
PearsonW.R., and LipmanD.J.1985. Rapid and sensitive protein similarity searches. Science, 227, 1435–1441.
17.
PughW., and RosserE.1999. Iteration space slicing for locality, 164–184. In GaoG.R., PollockL.L., CavazosJ., and XiaomingL. eds. LCPC, volume 1863 of Lecture Notes in Computer Science. Springer, La Jolla, CA.
18.
PughW., and WonnacottD.1994. An Exact Method for Analysis of Value-Based Array Data Dependences. Springer, Berlin, Heidelberg, pp. 546–566.
19.
RucciE., GarciaC., BotellaG., et al.2017. Accelerating Smith-Waterman Alignment of Long DNA Sequences with OpenCL on FPGA. Springer International Publishing, Cham, pp. 500–511.
20.
SbirleaA., ShirakoJ., PouchetL.-N., et al.2016. Polyhedral optimizations for a data-flow graph language, 57–72. In Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing—Volume 9519, LCPC 2015. Springer-Verlag New York, Inc., New York, NY.
21.
SmithT., and WatermanM.1981. Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.
22.
VerdoolaegeS.2010. isl: An integer set library for the polyhedral model, 299–302. In Mathematical Software—ICMS 2010, volume 6327 of Lecture Notes in Computer Science. Springer, Berlin Heidelberg.
23.
WolfeM.1986. Loops skewing: The wavefront method revisited. Int. J. Parallel Programm., 15, 279–293.
24.
WonnacottD., JinT., and LakeA.2015. Automatic tiling of “mostly-tileable” loop nests. IMPACT 2015: 5th International Workshop on Polyhedral Compilation Techniques, Amsterdam, The Netherlands.
25.
XueJ.2000. Loop Tiling for Parallelism, volume 575 of The Springer International Series in Engineering and Computer Science. Springer, United States.
26.
ZhaoC., and SahniS.2015. Cache and energy efficient alignment of very long sequences. 5th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS). IEEE, Miami, FL, pp. 1–6.