## Signal Processing for Wireless Communications and Multimedia: Design, Tools, Architectures Advanced Digital System Design Course 2006, EPF-L

Prof. Heinrich Meyr, RWTH Aachen University, Germany, and Chief Scientific Officer, CoWare Inc.

| ISS  | Agenda | |
|------|--------|---|
|      | <ul> <li>Future Wireless Communication System</li> <li>Future Wireless Communication Systems and its Impact on ESL</li> <li>The End of Moore's Law</li> <li>Receiver Structure, Models and Performance Metrics</li> <li>Massive Parallel Processing on heterogeneous MPSoC</li> <li>Application Specific Processors</li> <li>Summary and Conclusions</li> </ul> | |
| RWTH | AACHEN | 2 |



















































































## Complexity DVB-S

| Block                  | Area (cell area without RAM) | Lines of VHDL    |
|------------------------|------------------------------|------------------|
| Timing & Carrier Sync. | 32 %                         | 7000 (+1000 .dc) |
| Viterbi Dec.           | 40 % (+RAM 15 mm^2)          | 4000 (+340 .dc)  |
| Frame Sync.            | 1.5 %                        | 700              |
| Deinterleaver          | 2 % (+RAM 1.5)               | 640              |
| RS Decoder             | 23 % (+RAM 1.4)              | 5400 (+630 .dc)  |
| Descrambler            | 1 %                          | 360              |
| System                 | 100 %                        | 18100            |

Source: Digital Communication Receivers, H. Meyr, M. Moeneclaey, S.A. Fechtel
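As a quick consistency check on the DVB-S complexity figures, the per-block values can be summed and compared with the "System" row. This is only a sketch: the dictionary layout and the variable names are illustrative, and the numbers are copied from the table.

```python
# Per-block figures from the DVB-S table: (area share in %, lines of VHDL).
# The ".dc" line counts and the RAM areas are deliberately left out.
blocks = {
    "Timing & Carrier Sync.": (32.0, 7000),
    "Viterbi Dec.": (40.0, 4000),
    "Frame Sync.": (1.5, 700),
    "Deinterleaver": (2.0, 640),
    "RS Decoder": (23.0, 5400),
    "Descrambler": (1.0, 360),
}

area_total = sum(area for area, _ in blocks.values())
vhdl_total = sum(lines for _, lines in blocks.values())

print(f"area: {area_total} %, VHDL lines: {vhdl_total}")
```

The line counts add up exactly to the 18100 of the "System" row, while the area shares add up to 99.5 %, which the table presumably rounds to 100 %.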



## DVB-T Specifications

Digital terrestrial video broadcasting:

- high symbol rates: up to 7.4 Msym/s
- sensitive modulation: 4 - 64 QAM
- net bit rate up to 31.67 Mb/s
- wide range of channels: (AWGN) 0 < Tau < 224 µs (SFN)
- error correction:
  - outer code: Reed Solomon (204,188)
  - inner code: punctured convolutional
- BER < 10e-9 (after RS)
- 3 dB < Es/N0 < 40 dB

Challenges: > 200 transmission modes

- algorithms
- design methodology
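The 31.67 Mb/s net bit rate can be reproduced from the Reed Solomon (204,188) outer code together with a few parameters that are not on the slide. The sketch below assumes the highest-rate mode of the DVB-T standard (ETSI EN 300 744) for an 8 MHz channel: 8K mode with 6048 data carriers, a useful symbol duration of 896 µs, guard interval 1/32, 64-QAM and inner code rate 7/8. The function name and its arguments are illustrative.

```python
from fractions import Fraction


def dvbt_net_bit_rate(data_carriers, bits_per_carrier, useful_symbol_s,
                      guard_fraction, inner_rate, rs_n=204, rs_k=188):
    """Net bit rate in bit/s after inner (convolutional) and outer (RS) coding."""
    # OFDM symbol duration including the guard interval
    symbol_s = useful_symbol_s * (1 + guard_fraction)
    raw_rate = data_carriers * bits_per_carrier / symbol_s
    return raw_rate * inner_rate * Fraction(rs_k, rs_n)


# Assumed parameters (not on the slide): 8 MHz channel, 8K mode
USEFUL_SYMBOL_S = Fraction(896, 10**6)

rate = dvbt_net_bit_rate(
    data_carriers=6048,
    bits_per_carrier=6,               # 64-QAM
    useful_symbol_s=USEFUL_SYMBOL_S,
    guard_fraction=Fraction(1, 32),
    inner_rate=Fraction(7, 8),
)
print(f"{float(rate) / 1e6:.2f} Mb/s")

# Longest guard interval (1/4 of the useful symbol duration), in seconds
longest_guard_s = USEFUL_SYMBOL_S * Fraction(1, 4)
```

Under the same assumptions, the longest guard interval comes out at 224 µs, which matches the upper bound on Tau given for single frequency networks.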











## Parallel Computing in Mobiles

Massive parallelism required in the foreseeable future

|                      | 2003 | 2009 | 2013 |
|----------------------|------|------|------|
| Frequency (MHz)      | 300  | 600  | 1500 |
| Giga Operations      | 0.3  | 14   | 2458 |
| Operations per Cycle | 1    | 23   | 1638 |

Source: International Technology Roadmap for Semiconductors (ITRS, TX 2003)
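The "Operations per Cycle" row follows from the other two rows: it is the operation count divided by the clock frequency. The sketch below assumes that "Giga Operations" means giga operations per second; the dictionary and function names are illustrative.

```python
# Roadmap figures from the table above.
roadmap = {
    2003: {"freq_mhz": 300, "giga_ops": 0.3},
    2009: {"freq_mhz": 600, "giga_ops": 14},
    2013: {"freq_mhz": 1500, "giga_ops": 2458},
}


def ops_per_cycle(giga_ops, freq_mhz):
    """Operations that must complete in every clock cycle."""
    return giga_ops * 1e9 / (freq_mhz * 1e6)


for year, row in roadmap.items():
    print(year, f"{ops_per_cycle(**row):.1f}")
```

The results agree with the table if its entries are truncated to whole numbers. Between 2003 and 2013 the clock frequency grows by a factor of 5, while the required operations per cycle grow by more than three orders of magnitude, which is the argument for massive parallelism.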

















































## From Function to Algorithm Classes

- Butterfly unit
- Viterbi & MAP decoder
- MLSE equalizer
- Eigenvalue decomposition (EVD)
- Delay acquisition (CDMA)
- MIMO Tx processing
- Matrix-Matrix & Matrix-Vector Multiplication
- MIMO processing (Rx & Tx)
- LMMSE channel estimation (OFDM & MIMO)
- Iterative (Turbo) Decoding
- Message Passing Algorithm, LDPC Decoding
- CORDIC
- Frequency offset estimation (e.g. AFC)
- OFDM post-FFT synchronization (sampling clock, fine frequency)
- FFT & IFFT (spectral processing)
- OFDM
- Speech post processing (noise suppression)
- Image processing (not FFT but DCT)

## Decoder for Convolutional Codes

- Forward recursion: transition matrix calculation and matrix-vector multiplication
- Backward recursion: transition matrix calculation, matrix-vector multiplication and soft output calculation
- Latency: ~ 2N symbols

$$
\underline{x}_{k+1} = \begin{bmatrix} x_{1,k+1} \\ x_{2,k+1} \end{bmatrix} = \begin{bmatrix} a_{11,k} & a_{12,k} \\ a_{21,k} & a_{22,k} \end{bmatrix} \otimes \begin{bmatrix} x_{1,k} \\ x_{2,k} \end{bmatrix} = \begin{bmatrix} a_{11,k} \otimes x_{1,k} \oplus a_{12,k} \otimes x_{2,k} \\ a_{21,k} \otimes x_{1,k} \oplus a_{22,k} \otimes x_{2,k} \end{bmatrix}
$$

| OPERATIONS    | MAP         | LOGMAP              | VITERBI      |
|---------------|-------------|---------------------|--------------|
| $x \oplus y$  | $x + y$     | $\log_e[e^x + e^y]$ | $\max(x, y)$ |
| $x \otimes y$ | $x \cdot y$ | $x + y$             | $x + y$      |
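The point of the operations table is that MAP, LOGMAP and Viterbi decoding share one recursion and differ only in how the two generalized operations are defined. A minimal sketch of a single recursion step follows; the function name, the `SEMIRINGS` dictionary and the transition values are made up for illustration and do not come from the slides.

```python
import math

# (oplus, otimes) pairs taken from the operations table
SEMIRINGS = {
    "MAP": (lambda x, y: x + y, lambda x, y: x * y),
    "LOGMAP": (lambda x, y: math.log(math.exp(x) + math.exp(y)),
               lambda x, y: x + y),
    "VITERBI": (max, lambda x, y: x + y),
}


def recursion_step(a, x, mode):
    """One generalized matrix-vector product: x_{k+1} = A_k (otimes) x_k."""
    oplus, otimes = SEMIRINGS[mode]
    out = []
    for row in a:
        # Accumulate the row with oplus in place of the ordinary sum
        acc = otimes(row[0], x[0])
        for a_ij, x_j in zip(row[1:], x[1:]):
            acc = oplus(acc, otimes(a_ij, x_j))
        out.append(acc)
    return out


# Hypothetical 2-state example: probabilities for MAP,
# log-probabilities for LOGMAP and VITERBI.
a = [[0.6, 0.1], [0.4, 0.9]]
x = [0.5, 0.5]
log_a = [[math.log(v) for v in row] for row in a]
log_x = [math.log(v) for v in x]

map_out = recursion_step(a, x, "MAP")
logmap_out = recursion_step(log_a, log_x, "LOGMAP")
viterbi_out = recursion_step(log_a, log_x, "VITERBI")
```

Here `logmap_out` equals the element-wise logarithm of `map_out`, while `viterbi_out` keeps only the best path into each state. A practical LOGMAP implementation would typically compute the $\oplus$ operation as a maximum plus a correction term for numerical robustness; the direct formula is used here only because it mirrors the table.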

































































| System                             | Athlon XP 3000+                                                | Retinex ASIP mapped on FPGA                                          |
|------------------------------------|----------------------------------------------------------------|----------------------------------------------------------------------|
| Design Flow                        | plain C application, compiled with gcc, executed on AMD Athlon | optimized ASIP and handwritten assembly program (~100 lines of code) |
| Frequency                          | 2100 MHz                                                       | 16 MHz                                                               |
| Computation time (picture 513x385) | ~ 3000 ms                                                      | 593 ms (~ 20 % of Athlon run-time)                                   |
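The comparison is more striking when expressed in clock cycles rather than run-time, because the two platforms run at very different frequencies. The sketch below uses only the frequency and computation-time rows of the table; the function and variable names are illustrative, and the Athlon time is itself given only approximately.

```python
def clock_cycles(freq_mhz, time_ms):
    """Clock cycles spent: MHz * ms = 1e6 * 1e-3 = 1e3 cycles per unit."""
    return freq_mhz * time_ms * 1000


athlon_cycles = clock_cycles(2100, 3000)   # ~3000 ms at 2100 MHz
asip_cycles = clock_cycles(16, 593)        # 593 ms at 16 MHz

runtime_ratio = 593 / 3000
cycle_ratio = athlon_cycles / asip_cycles

print(f"ASIP run-time: {runtime_ratio:.1%} of the Athlon run-time")
print(f"Athlon cycles / ASIP cycles: {cycle_ratio:.0f}")
```

With these figures the ASIP needs roughly 660 times fewer clock cycles for the same picture. Cycle counts are not instruction counts, so this ratio says nothing about instructions executed on either machine.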











| Task                         | Effort          |
|------------------------------|-----------------|
| Initial Model                | 4 weeks         |
| Design Space Analysis        | 3 weeks         |
| Design Space Exploration     | 4 weeks         |
| - Address Calculation        | 1 week          |
| - Non-delayed Branches       | 1 week          |
| - Timing Improvement         | ½ week          |
| - Others                     | 1½ weeks        |
| Translation Script           | 5 weeks         |
| Move Elimination             | 2 weeks         |
| Verification Script          | 5 weeks         |
| Synthesis & FPGA Mapping     | 1 day           |
| FPGA System                  | one-time effort |





# PHILIPS

Processor Designer in a video deblocking unit














## **Results**

- Architecture far from the initial RISC
- Target of 166 MHz easily reached
- Size comparable to an all-RTL design (processor = 50 kgates)
- Performance targets reached
- IP taped out in a Set Top Box chip

## **Next steps**

- No problems encountered so far on the prototype
- Make the block more generic to handle other standards

Semiconductors

## **Planning**

| Step   | Task                     | Duration |
|--------|--------------------------|----------|
| Step 1 | Application development  | 8 weeks  |
| Step 1 | Lt_risc_32p5 integration | 2 weeks  |
| Step 2 | Use of pixel memories    | 4 weeks  |
| Step 2 | Pin interfaces           | 2 weeks  |
| Step 3 | Optimisations +          | 5 weeks  |


## Conclusion

#### - Con

- Long learning curve
- First use -> only a rough estimate of the time needed

#### + Pro

- RTL and SystemC always consistent (=> most of the validation can be run on SystemC)
- Faster than writing independent SystemC and RTL models
- Fast exploration of architecture choices
- Use of firmware:
  - can be generic
  - C debug
  - if the program is in RAM: fixes and feature changes can be downloaded
- No royalties


