IMPERIAL COLLEGE LONDON DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING **EXAMINATIONS 2007** MSc and EEE/ISE PART IV: MEng and ACGI Corrected Copy ## SYNTHESIS OF DIGITAL ARCHITECTURES Thursday, 3 May 10:00 am Time allowed: 3:00 hours There are FIVE questions on this paper. Answer THREE questions. All questions carry equal marks Any special instructions for invigilators and information for candidates are on page 1. Examiners responsible First Marker(s): G.A. Constantinides Second Marker(s): C. Bouganis ## SYNTHESIS OF DIGITAL ARCHITECTURES ## Notation The following notation is used in this exam paper. - ullet $\mathbb{Z}$ : the set of integers. - N: the set of natural numbers. - $\mathcal{P}(S)$ : the power set of a set S, i.e. the set containing all subsets of S. ## The Questions - 1. In this question, the set of statements defined in some nested loop code is denoted V. The set of iterations of the loop nest, the *iteration space*, is denoted $\mathscr{I}$ . Throughout the question, execution time of loop index calculations may be neglected, and all statements take a single cycle to execute. - Discuss the relative merits of the following alternative scheduling procedures for nested-loop code: - Unrolling the loops and using a list-scheduling algorithm on the resulting straight-line code, - ii) Imposing an affine-by-statement form on the scheduling function $S: V \times \mathscr{I} \to \mathbb{N}$ . [4] - b) Consider the code shown in Figure 1.1. - i) Express $\mathscr{I}$ in the form $\mathscr{I} = \{i | Ai \leq b \text{ and } i \in \mathbb{Z}^n \}$ , for some matrix A and vector b. [4] ii) Identify any data dependences present in this code, and write the corresponding scheduling constraints in terms of iteration vector i and affine-by-statement parameter t, where the scheduled start time of iteration i is given by $t^T i$ . [4] iii) Use the *vertex method* to identify appropriate additional constraints (linear in t), bounding the execution time below a further variable. Hence write a linear program to schedule this code. [5] Suggest a solution to this linear program and hence state a scheduling function explicitly. ``` for i_1 = 2 to 10 for i_2 = i_1 to 12 - i_1 s[i_1][i_2] = s[i_1-1][i_2] + s[i_1][i_2-1] end ``` Figure 1.1 Some nested loop code - 2. In this question, you are asked to derive a methodology for resource binding a set of identical operations V. You are given a pre-defined conflict graph G(V, E) which is not necessarily an interval graph. No register sharing is performed in this question. - Explain the relationship between graph colouring and resource binding a resource dominated circuit. [5] b) Using an appropriate set of binary decision variables, $\{x_{vc}\}$ , with the interpretation $x_{vc} = 1$ iff operation $v \in V$ has colour $c \in \mathbb{N}$ , formulate an ILP for colouring G(V, E) using the minimum number of colours. [6] c) Suggest appropriate functions to model the growth of multiplexer area and combinational delay with the number of data inputs, assuming all data inputs are of an identical word-length. [3] d) Hence modify your ILP so that it can be used to minimize the silicon area (resources + multiplexers) of a resource binding, given both a conflict graph G(V,E) and an upper-bound S on the worst-case combinational delay introduced by resource-sharing multiplexers. [6] - 3. A significant problem in architectural synthesis is to quickly estimate the capacitance of wires given a netlist, without any prior information on physical placement on silicon. Capacitance is approximately proportional to wirelength in VLSI design. Each wire has one source and one or more destinations on the chip. The bounding box of a wire is the smallest rectangle on the chip that encloses both the source and all destinations of a wire, and that has vertical and horizontal sides (i.e. its sides are parallel to the axes). The fanout of a wire is the number of destinations. Typically the source and destinations of a wire are approximated as the respective centres of their (rectangular) source and destination resources. - a) Half the perimeter of the bounding box is often used as a measure of wirelength. Informally justify this measure by example, and suggest any possible shortcomings. (*Hint: Consider separately examples of fanout 1, 2, and 3*). [4] b) Let R denote the set of resources and $E \subseteq \mathcal{P}(R)$ denote a set of nets (wires) connecting resources such that $\{r_1, r_2, \ldots, r_n\} \in E$ iff resources $r_1, r_2, \ldots, r_n$ are connected via a single net. Let a subset $R' \subseteq R$ of resources have fixed x and y centre coordinates determined a priori, given by $x: R' \to \mathbb{R}$ and $y: R' \to \mathbb{R}$ , respectively. Let the width and height of the resources be given by $w: R \to \mathbb{R}$ and $h: R \to \mathbb{R}$ , respectively. Formulate a floorplanning mixed integer linear program involving the variables listed in Table 3.1, with the objective to minimize the total length of wiring in the design. [6] c) Comment on how well the mixed-ILP methodology matches the stated requirement 'to quickly estimate the capacitance of wires given a netlist'. [3] d) To simplify the estimation problem, it has been proposed to remove the 'no overlap' constraints in the floorplan. Write the correspondingly simplified mathematical program, and comment on the worst-case solution time for this modified program. [4] e) Comment on the accuracy of the resulting estimate. Table 3.1. Floorplanning variables. | $x_i$ | The <i>x</i> -coordinate of the centre of resource $i \in R$ . | |--------------------------|-----------------------------------------------------------------------------------| | $y_i$ | The y-coordinate of the centre of resource $i \in R$ . | | $\delta_{ij}, \eta_{ij}$ | Binary variables encoding relative placement of resources $i \in R$ and $j \in R$ | | 8e | The coordinate of the rightmost edge of the bounding box enclosing $e \in E$ . | | $q_e$ | The coordinate of the leftmost edge of the bounding box enclosing $e \in E$ . | | $a_e$ | The coordinate of the topmost edge of the bounding box enclosing $e \in E$ . | | Se | The coordinate of the bottommost edge of the bounding box enclosing $e \in E$ . | - 4. You are designing a special-purpose co-processor. The co-processor reads a data value $x \in [0, \pi/2]$ from an input, calculates the expressions $y = x * \cos(x) x$ and $z = (x + \sin(x)) 3$ , and writes the outputs y and z. Multiplication takes two cycles, whereas addition, subtraction, external reads and writes all take a single cycle. The entire behaviour must complete within 5 cycles. Additional reference material for this question is given in Table 4.1. - a) Draw a CDFG for this behaviour, leaving the function calls as unexpanded F nodes. [3] b) Perform an ASAP and ALAP schedule for all nodes in this CDFG, expressing your answers in terms of parameters $d_1$ and $d_2$ , corresponding to the unknown delays of the cos and sin functions, respectively. [4] c) If the functions are to be approximated by polynomial evaluation using Horner's scheme, determine the greatest feasible order of the two polynomials. [4] d) Hence determine appropriate coefficients of maximal-accuracy (in the uniformly weighted least-squares sense) polynomials $f(x) = c_0 + c_1x + ... + c_nx^n$ for the two functions, and the corresponding CDFGs in the lower level of CDFG hierarchy. Show all your working. [7] e) Is the greatest feasible polynomial order under Horner's scheme equal to the greatest feasible order regardless of evaluation scheme? Justify your answer. [2] Table 4.1. Some Orthogonal Polynomials over [-1, 1]. | Chebyshev-I | Legendre | |------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------| | $\phi_i(x) = 2^{i-1} \prod_{k=1}^{i} \left\{ x - \cos \left[ \frac{(2k-1)\pi}{2i} \right] \right\}.$ | $\phi_i(x) = \frac{1}{2^i i!} \frac{d^i}{dx^i} \{ (x^2 - 1)^i \}.$ | On a modern field-programmable gate array (FPGA), there is a mixture of fine-grain logic elements, typically small ROMs known as lookup tables (LUTs), and dedicated circuitry for performing the common operations of multiplication and data storage (RAMs). These resource types can be expressed as $R = \{L, M, S\}$ , for LUT, Multiplier, and Storage, respectively. This question concerns the mapping of CDFGs consisting of addition, multiplication, and memory nodes onto such a technology. Associated with a CDFG G(V, E) is a type function $T: V \to \mathcal{P}(R)$ that indicates which possible FPGA components could implement each given node. Addition can be implemented only by LUTs, while multiplication can be implemented by either LUTs or multipliers and data storage can be implemented by LUTs or RAMs. The delay of a node $v \in V$ when implemented using resource type $r \in R$ is denoted $d_{vr}$ . The area of a node $v \in V$ when implemented using resource type $r \in R$ is denoted $d_{vr}$ . Only ASAP scheduling, and no resource sharing should be performed in this question. a) Using binary decision variables $\{x_{\nu r}\}$ , with the interpretation $x_{\nu r}=1$ iff node $\nu \in V$ is implemented using resource type $r \in R$ , formulate an ILP to model the combined ASAP scheduling and module selection for this problem. The objective should be to minimize the overall latency of the CDFG, and a feasible solution must use no more than total area A (a given constant). [9] b) FPGAs are *configurable* devices, meaning that they should be able to implement any one of a *set* of CDFGs. The same resource can be used to implement different nodes in different CDFGs, but no resource sharing should be performed *within* a CDFG. A set of CDFGs $\{G_1(V_1, E_1), G_2(V_2, E_2), \ldots, G_k(V_k, E_k)\}$ is given. Formulate (and explain) an ILP to find the appropriate allocation of silicon area to each of the three component types in order to minimize the worst case overall latency across all CDFGs while using no more than a total area A (a given constant). [8] Given that $a_{vr} \le a_{vr'}$ for all v whenever r = M and r' = L and also whenever r = S and r' = L, is it ever possible that a minimum-area FPGA resulting from the preceding formulation has $x_{vr} = 1$ for some $v \in V$ and r = L? Justify your answer. Synthesis of Digital Architectures SILITIONS 2007 E4-43 15E H. 43 a) (i) => (orge (exponential) wivere in code since => AOI (orge run time, poor houristic quility (of 1. (depeling on heuristice) (ii) => restricted penetrond from >> Con't exprove all //im. (RODKWORK) [1.7 (BOOKWORK) [4] b) (i) I = {x | Aa 5b 1 a & Z^3} (NEW COMPUTED EXAMPLE) $A = \begin{pmatrix} -1 & 0 \\ +1 & 0 \\ +1 & -1 \\ +1 & +1 \end{pmatrix} \qquad b = \begin{pmatrix} -2 \\ 10 \\ 0 \\ 12 \end{pmatrix}$ of Two dependences $\binom{i_1}{i_2} = \binom{i_1}{i_2-1}$ (1 0)t 7, 4 (1) Winds (01) t 7/1 (2) SOLUTIONS 2007 If Vertices at $$\binom{2}{2}$$ , $\binom{6}{6}$ , $\binom{10}{2}$ $$(2 \ 2)t \neq + | \langle \lambda | (3)$$ $$(6 6) t + 1 5 \lambda$$ (4) $$(10 2) t + 1 \leq \lambda \qquad (5)$$ $$\xi : \xi : (1) - (\mathbf{S}).$$ $$t = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$ $$(v_i)$$ $$S(i) = (1 | 1)i$$ = $i_1 + i_2$ . Q 2 (BOOKWORK) a) Nodes & Operations Edges & Resource conflicts [2]Graph colouring >> Min # resources b) Need at most $|V|^2$ variables - one colour per (NEW APPLICATION Note max. OF THEORY) min Zyc s.t. Y {v, , vz } e E Y c e {1, ..., |v|}, (1)XVICT XVZC El ∀ v ∈ V ∀ c ∈ {1, ..., 1 v 1} Me > xvc (2) c) Area (n) = K, n (NEW COMPUTED EXAMPLE) Delay(n) = K2 log2n Q2 (NEW APPLICATION OF THEORY) d) We require $u_2 \log_2 n \leq S$ =) n \ 2 , a constant. So: min: [V] CET cingulo). CET (MC + 2K, Zavc) VEV 5.4. (1) - (2) $\forall c$ $\in \{1, ..., |V|\}$ $\forall v \in V$ $\forall z \in \mathbb{Z}$ $\forall x \in \mathbb{Z}$ (3) [6] Q3. a) 2 permeter is min Manhttan werelength (NEW THEORY) pr janot = 1 or 2: 1 for prot 3 or more This is not the cone b) Min: $\sum (ge - ge + ge - Se)$ (NEW APPLICATION OF THEORY) S.L. HeEE HIEE Jerry; ge Ex; ae 7, y; Se Ey; A: EKAjEK | Wgij + Mwij + x: -x; > = [(4:)+4:)) Mdij + M(1-8mij) + 2j-3; 7, 2 (wi)+4j) (2) $M(1-\delta ij) + Mmij + Mi - mj = 2 = (h(i) + h(j))$ $M(1-\delta ij) + M(1-mij) + mj - mj = 2 = (h(i) + h(j))$ $\delta ij \in \{0,13, mij \in 0,1\}$ $\forall i \in R'$ $\begin{cases} \alpha_i = \alpha(i) \\ y_i = y(i) \end{cases}$ Doc) Bodly, as # citize vons is II(IRI2) and MILP run time is IL(f#vors). (NEW COMPUTED EXAMPLE) [3] d) Renne (2) I all intèger vors. Nou a pure LP => ply run time. (NEW APPLICATION OF THEORY) [4] e) If willted heights are stull compared to tied a dy coordiles, likely to be occurre (your THORY) C3J 6 | order | min cegales | |-------|-------------| | 0 | 10 | | 1 | 3 | | 2 | 6 | | | 1 | (NEW APPLICATION OF THEORY) Require $$\max(d_1+3, d_2+2) \in 5$$ => $d_1 \in 2$ & $d_2 \in 3$ Max order of $\sin = 1$ $\cos = 0$ [4] (NEW COMPUTED EXAMPLE) $$\langle \phi_0, \phi_0 \rangle = \int_{-1}^{1} du = 2$$ $\langle \phi_1, \phi_1 \rangle = \int_{-1}^{1} u^2 du = \frac{1}{3}u^3 = \frac{2}{3}$ $\Rightarrow \alpha = \frac{4\alpha}{\pi} - 1 \in [-1, 1]$ (i) sin(x) $$\langle \sin \frac{\pi(u+1)}{4}, \phi_0 \rangle = \int_{-1}^{1} \sin \frac{\pi(u+1)}{4} du$$ $$= -\frac{4}{\pi} \cos \left( \frac{\pi(u+1)}{4} \right) \Big|_{-1}^{1}$$ $$= -\frac{4}{\pi} \left( 0 - 1 \right)$$ $$= \frac{4}{\pi}$$ $$\begin{array}{lll} &=& \int_{-1}^{1} u \sin \frac{\pi(\alpha+1)}{4} du \\ &=& -\frac{4u}{\pi} \cos \frac{\pi(\alpha+1)}{4} + \int_{-1}^{1} \frac{4}{\pi} \cos \frac{\pi(\alpha+1)}{4} du \\ &=& -\frac{4u}{\pi} \left( \cos \frac{\pi(\alpha+1)}{4} + \int_{-1}^{1} \frac{4}{\pi} \cos \frac{\pi(\alpha+1)}{4} du \right) \\ &=& -\frac{4u}{\pi} \left( \cos \frac{\pi(\alpha+1)}{4} + \left( \frac{4u}{\pi} \right)^{2} \sin \frac{\pi(\alpha+1)}{4} \right) \\ &=& -\frac{4u}{\pi} \left( \cos \frac{\pi(\alpha+1)}{4} + \left( \frac{4u}{\pi} \right)^{2} + \left( \frac{4u}{\pi} \right)^{2} \right) \\ &=& -\frac{4u}{\pi} \left( \frac{4u}{\pi} - 1 \frac{4u$$ (ii) $$\frac{(a)}{(a+1)}$$ $\frac{(a+1)}{(a+1)}$ $\frac{(a+1)$ e) while in general they may not be equal, e.g. Estin's wellood, in this example they are. No re-anagened of a degree of or degree ! polynomial can speed-up its execution. 95 a) let us dente the sink node as V\*. (NEW APPLIENTION) OF THEORY) min: SV\* Sit. $\forall (v_1, v_2) \in E$ Sv\_ > Sv\_ + \subseteq dvr \times\_vr rest(V) ANEN Zorn = 1 reT(v) > Javr Xvr & A VEV rET(V) [9] b) We will use binony voribles avr for VE VIUVZU --- UVx on rER. min: } s.t. Yxi & {1,..., n} Sv\* x & x ∀i ∈ (1, ..., N) ∀(V,, Vz) ∈ €; Svz >, Sv, + Z xvr VEV; ret(v) 1) [8] C) Yes, because although LUT-based rooks are larger, they could potentially be shared across more bartok COFGs. For eagle, when one COFG contains multipliate but another Joen't. (NEW THEORY) (3)