DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING **EXAMINATIONS 2004** MSc and EEE/ISE PART IV: MEng and ACGI ## SYNTHESIS OF DIGITAL ARCHITECTURES Monday, 17 May 10:00 am Time allowed: 3:00 hours There are FIVE questions on this paper. **Answer THREE questions.** **Corrected Copy** All questions carry equal marks Q3 c) Any special instructions for invigilators and information for candidates are on page 1. Examiners responsible First Marker(s): G.A. Constantinides Second Marker(s): P.Y.K. Cheung 1 In numerical analysis, a common task is to estimate the value of the differential of a function. Some pseudo-code for this task is shown below. ``` diff( x, h, f0, f1, f2 ) { p = x/h; t = (p-0.5)*f0 - 2*p*f1 + (p+0.5)*f2; return (t/h); } ``` This behaviour is to be implemented by an architecture in which there are three types of resource: adder/subtractor (latency 1 cycle); multiplier (latency 2 cycles); divider (latency 2 cycles). a) Construct a Control Data Flow Graph for the algorithm diff. [2] b) Perform an ASAP and an ALAP scheduling on your CDFG for the lowest possible latency constraint, stating the ASAP time, ALAP time, and mobility of each node. [4] c) State the minimum overall latency of this design. [1] You wish to schedule this behaviour given that at most 1 multiplier, 1 divider, and 4 adder/subtractors can be used. d) Name an appropriate algorithm to solve this problem. [1] e) Solve this scheduling problem, stating the scheduled start time of each node and the overall latency required. [6] To find the optimal solution to a general scheduling problem for a DFG G(V,E), it is proposed to use Integer Linear Programming. You are given a set of binary variables $x_{vt}$ for each $v \in V$ and time-step t, equal to 1 iff operation v is scheduled at time step t; a latency (delay) $d_v$ for each node $v \in V$ ; a function $T: V \to R$ , and an upper-bound $a_r$ on each resource type $r \in R$ . You also know the ASAP and ALAP times ASAP<sub>v</sub> and ALAP<sub>v</sub> for each node $v \in V$ , and the overall latency constraint $\lambda$ . f) Formulate the constraint "each operation must have a unique start time" as a set of linear constraints in $x_{vt}$ . [2] g) Formulate the data dependency constraints as a set of linear constraints in $x_{vt}$ . [2] h) Formulate the constraint "no more than $a_r$ resources of type r can be used" as a set of linear constraints in $x_{vt}$ . [2] Some pseudo-code for a main program main(), and two functions sin() and cos(), is shown below. ``` main() { read y; x = 2*y; for i = 1 to 2 { read y; c = y > 0; if( c ) { x = x + sin( y ); } else { x = x + cos( y ); } } } sin( x ) { return ( x - (0.167*x)*(x*x) ); } cos( x ) { return ( 1 - 0.5*x*x ); } ``` a) Create a CDFG for the above code, representing read calls as read nodes, and give each node a unique label. [6] You have a library of two resource types: ALU (capable of performing addition, subtraction, and comparison); and multiplier (capable of performing multiplication). b) Derive an ASAP schedule for this code, under the assumption that all operations, reads and writes take one cycle to execute, while "start task" and "end task" nodes take zero cycles. For each labelled node, state the ASAP schedule time(s), noting that some nodes will be scheduled in more than one cycle. [6] ## [ Question 2 continues on page 3 ] ## [Continuation of Question 2] It is suggested that the performance of the design may be improved by repeating the body of the loop, instead of using a loop construct. The equivalent pseudo-code is shown below. ``` main() { read y; x = 2*y; read y1; c1 = y1 > 0; if(c1) { x1 = x + \sin(y1); } else { x1 = x + cos(y1); read y2; c2 = y2 > 0; if(c2) { x2 = x1 + \sin(y2); } else { x2 = x1 + \cos(y2); } sin(x) { return (x - (0.167*x)*(x*x)); cos(x) return ( 1 - 0.5*x*x ); ``` c) Derive the minimum overall latency for the modified code, and account for any difference in performance. [8] 3. a) Describe the initialisation problem in retiming. Some pseudo-code is shown below. ``` bq1 = 0; bq2 = 0; cq = 0; while( true ) { read x; a = 2*x; c = a + bq2; b = a*c; write cq; bq2 = bq1; bq1 = b; cq = c; } ``` b) Construct a delay-weighted DFG to represent the inner loop of this code, using nodes of type "\*", "+", "r" (read) and "w" (write). c) Given the symbols shown in Figure 1, formulate the retiming problem for a general delay-weighted DFG G(V,E) as an integer linear program. L - length of the longest combinational path in the DFG $s_v$ - length of the longest combinational path up to the input of node v d(v) - the combinational delay of node v $x_{u,v}$ - binary decision variable. $x_{u,v} = 1$ iff $w_r(u,v) = 0$ (equal to v) $v_v = 0$ (for $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) $v_v = 0$ (equal to $v_v = 0$ ) After retiming the code shown above, the delay-weighted DFG shown below is obtained. d) Derive the pseudo-code corresponding to the retimed DFG. [6] [2] [6] A polynomial g(x) of order n can be expressed as a linear combination of basis functions $\{\phi_i(x)\}$ , where $\phi_i$ is a polynomial of order i (Equation 1). $$g(x) = \sum_{i=0}^{n} a_i \phi_i(x)$$ (Equation 1) a) Given the inner-product definition of (Equation 2), prove that (Equation 3) defines the least-squares approximation to f(x) under the condition that $\phi_i$ and $\phi_j$ are orthogonal for $i \neq j$ . $$\langle f, g \rangle = \int_{-1}^{1} f(x)g(x)dx$$ (Equation 2) $$a_i = \frac{\langle f, \phi_i \rangle}{\langle \phi_i, \phi_i \rangle}$$ (Equation 3) The Legendre polynomials are defined by (Equation 4). $$\phi_i(x) = \frac{1}{2^i i!} \frac{d^i}{dx^i} \{ (x^2 - 1)^i \}$$ (Equation 4) b) Derive a first order least-squares polynomial approximation to the $sin(\pi x)$ function for x over [-1,1] as a weighted sum of Legendre polynomials. [4] c) Construct the CDFG corresponding to the Horner's scheme evaluation of a 3rd order polynomial. [2] The polynomial is to be evaluated using multiplier and adder resources, each of which has unit latency. d) Schedule this Horner's scheme evaluation (i) with ASAP scheduling, and (ii) under the resource constraints of one multiplier and one adder. Provide the start time for each operation. [4] An alternative scheme for evaluating a 3rd order polynomial is to use the code $v = (c_0 + c_1 * x) + (x * x) * (c_2 + c_3 * x)$ . e) Schedule the evaluation under this alternative scheme (i) with ASAP scheduling and (ii) under the resource constraints of one multiplier and one adder. Provide the start time for each operation. [4] 5 of 6 5. Some pseudo-code for a memory-mapped co-processor responsible for calculating a quadratic function is shown below. read(x) corresponds to a read of the address bus, and write(y) corresponds to a write to the data bus. signal\_done() corresponds to asserting an external "done" signal. ``` read(x); y = x*x + 3*x + 5; signal_done(); write(y); ``` This behaviour is to be implemented using multiplier resources of latency 2 cycles and adder resources of latency 1 cycle. read, write, and signal\_done have zero latency. a) Construct a control/data flow graph (CDFG) for this code, including nodes for "read" and "write", and "signal\_done". [5] b) Perform an ASAP scheduling on this CDFG. [4] The bus specification requires data to be written to the data bus at least two cycles and at most five cycles after the address is read. In addition it is required that the "done" signal must be asserted exactly one clock cycle before the output is written to the data bus. c) Construct an edge-weighted graph corresponding to this computation, such that the longest-path from the source node to each other node forms the ASAP time of the latter node under the above timing constraints. [4] d) Perform an ASAP scheduling under these timing constraints. [4] Your project manager asks you to additionally ensure that the "done" signal is produced no later than the first addition is calculated. e) Re-draw the edge-weighted graph from part (c) to incorporate this constraint. [1] f) By reference to a cycle in the graph, demonstrate that your manager is asking the impossible. [2] 1. a) [2 MARKS] $\bigcirc$ | b) | Node | ASAP | ALAP | HOSILITY | |----|-----------------|--------------|---------------|--------------| | | とっているようななななること | 002223346781 | 003424546780 | 001201200000 | | | | | | [4 MARKS] | | c) | 10 CYCLES | | | [ MARK] | | 4) | RESOURCE - CONS | araneo les | T SCHEPULTING | [ 1 MARK] | e) CYCLE READY LIST SCHEDULED p. 2 of 1) 0 (4,3 E43 2345678 (Vs) (Vs) (V7) [or {va3] EUG. U83 = 13 cycles OVERALL LATENCY f) Y VEV, B Z XVt = 1 g) $\forall (v_1, v_2) \in E$ , $\frac{Aur^2v_1}{\sum_{i} t \cdot x_{i}t} + dv_i \leq \frac{Aur^2v_2}{\sum_{i} t \cdot x_{i}t}$ t=ASAPV. t = ASAPV2 h) Yt & {o, ..., \}, \r & R, $\circ$ 2. a) | NODE | | A: | SAP | TIME(s) | |------------|----|----|-------------|---------| | Vo | | | | | | V, | | | | | | $V_2$ | | • | 1 2 | | | Vz | | | | | | √لب | | | jι | + | | ٧s | | | 4 | ,10 | | <b>U</b> 6 | | | ب | , 10 | | V٦ | | | > | , " | | Vg | | | 6, | 12 | | Va | | | 7, | 13 | | Vio | | | 4 | 10 | | Ý,, | | | 4, | 10 | | V12 | | | 7, | 13 | | V13 | | | β, | 14 | | V/4 | | | 2, | 8 | | VIS | | | 2 | 8<br>9 | | V16<br>V17 | 17 | | رد<br>ما | 10 | | Vie | | | ₹. | 14 | | Via | | | ر ما<br>رما | 10 | | Vro | | | ` | , 10 | | V21 | | | | (11 | | Vri | | | 8. | 14 | | V23 | | | 4 | 10 | | V24 | | | 4 | ,10 | | U25 | | | ۴, | 10 | | V26 | | | . 5, | f ( | | V27 | | | 6, | 12 | | Yze | | | 7, | 13 | | | | | | | [\$6 MARKS] MOTTOM Ą Scofe (o CYCLES INCHERASED CATENCY 11 OVERAL CA OUFFERENCES OUT OF LOOP. (io) # COPE PARALLELISM ONE TO [8 MARKS] PART (a) B PER BAANCHES (2) 4 ALTERNATIVE (e) E << TIMES ASAP 2 H (c) MENIFICIE L SUSSECT TO SU 7, Su + d(u) + $$\pi_{uv}N$$ for all $(u,v) \in E$ $W_r(u,v) \leq \pi_{u,v}P$ for all $(u,v) \in E$ $S_v + d(v) \leq L$ for all $v \in V$ $W_r(u,v) = W(u,v) + r(v) - r(u)$ for all $(u,v) \in E$ $W_r(u,v) \geqslant 0$ for all $(u,v) \in E$ $W_r(u,v) \geqslant 0$ for all $(u,v) \in E$ [6 MARKS] 4. (a) Let $$E = Error$$ . Then $E = \int_{-1}^{1} (f(x) - g(x))^2 dx$ $$= \int_{1}^{2} f(x) - \sum_{i=0}^{n} a_i \phi_i(x) \int_{1}^{2} dx$$ $$= \int_{1}^{2} f^{2}(x) - 2 \int_{1=0}^{n} a_i \int_{1}^{1} f(x) \phi_i(x) dx + \sum_{i=0}^{n} \int_{1=0}^{n} a_i a_i \int_{1}^{2} f^{2}(x) dx$$ $$= \int_{1}^{2} f^{2}(x) - 2 \int_{1=0}^{n} a_i \int_{1}^{2} f(x) \phi_i(x) dx + \sum_{i=0}^{n} \int_{1}^{n} a_i a_i \int_{1}^{2} f^{2}(x) dx$$ $$= \int_{1}^{2} f^{2}(x) - 2 \int_{1=0}^{n} a_i \int_{1}^{2} f(x) dx + \int_{1=0}^{n} \int_{1}^{n} a_i a_i \int_{1}^{2} f^{2}(x) dx$$ $$= \int_{1}^{2} f^{2}(x) - 2 \int_{1=0}^{n} a_i \int_{1}^{2} f(x) dx + \int_{1=0}^{n} \int_{1}^{n} a_i a_i \int_{1}^{2} f^{2}(x) dx$$ $$= \int_{1}^{2} f^{2}(x) - 2 \int_{1=0}^{n} a_i \int_{1}^{2} f(x) dx + \int_{1=0}^{n} \int_{1}^{n} a_i a_i \int_{1}^{2} f^{2}(x) dx$$ $$= \int_{1}^{2} f^{2}(x) - 2 \int_{1}^{2} a_i \int_{1}^{2} f(x) \int_{1}^{2} dx \int_{1}^{2} f(x) \int_{1}^{2} dx \int_{1}^{2} f(x) \int_{1}^{2} dx \int_{1}^{2} f(x) \int_{1}^{2} dx \int_{1}^{2} f(x) \int_{1}^{2} dx \int_{1}^{2} f(x) \int_{1}^{2}$$ Set $$\frac{\partial \mathcal{E}}{\partial a_i} = 0$$ . Then $$Q_i = \frac{\langle f, \phi_i \rangle}{\langle \phi_i, \phi_i \rangle}$$ [6 MARKS] (b) $$\phi_{0}(x) = 1$$ $$\phi_{1}(x) = \frac{1}{2} \frac{1}{6x} \left\{ x^{2} - 1 \right\}$$ $$= \frac{1}{2} (2x) = x$$ $$\alpha_{0} = \int_{-1}^{1} \sin \pi x \, dx = -\frac{1}{\pi} \left[ \cos \pi x \right]_{-1}^{1} = -\frac{1}{\pi} \left\{ (-1) - (-1) \right\} = 0$$ $$\alpha_{1} = \int_{-1}^{1} x \sin \pi x \, dx = \left[ -\frac{\pi}{\pi} \cos \pi x \right]_{-1}^{1} + \frac{1}{\pi} \int_{-1}^{1} \cos \pi x \, dx$$ $$= -\frac{1}{\pi} \left[ -1 - 1 \right]_{-1}^{1} - \frac{1}{\pi^{2}} \left[ \sin \pi x \right]_{-1}^{1}$$ $$= \frac{2}{\pi}$$ $$\delta_{0} \quad g(x) \approx \int_{-1}^{1} \sin \pi x \, dx \, dx$$ $$g(x) = \frac{2}{\pi} x.$$ [4 MARKS] 4. (c) $$y = c_0 + \alpha \left[ c_1 + \alpha c_2 \right]$$ [2 MARKS] RESOURCE - CONSTRAINED S(v) IDENTICAL TO ASAID (e) | NO | ASAP 00 0 0 0 1 1 2 3 4 | S(v) | RESOURCE -CONSTRAINE<br>0<br>1<br>2<br>0<br>2<br>1<br>3<br>4<br>5 | ED S(v) | |-----------------|----------------------------------------------------|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------| | CYCLE 0 1 2 3 4 | (ANOIDATE (Vo, Vo, Vo, Vo, Vo, Vo, Vo, Vo, Vo, Vo, | 3 | SCHEDULED SET<br>{V <sub>2</sub> }<br>{V <sub>0</sub> , V <sub>4</sub> }<br>{V <sub>1</sub> , V <sub>3</sub> }<br>{V <sub>5</sub> }<br>{V <sub>6</sub> }<br>{V <sub>4</sub> } | [cour be N] | [5 MARKS] b) NOOE Vo V12 V3 V4 V5 V6 V8 ASO 00023404 [4 MARKS] c) [4 MARUS] 5. (d) NOTE ASAP Vo V1 V2 V3 V3 V4 V5 V6 V7 V8 [4 MARKS] (e) .' [ MARK] (f) CYCLE MARKED W/ (4) ABOVE HAD POSITIVE TOTAL WEZGHT => NO LONGEST PATH OF FINITE LENGTH. [2 MARKS.]