# FIR Filters With Reduced Word-Length

Artur Wróblewski\*

Marek Wróblewski\*

Josef A. Nossek\*

Abstract: In this paper, architectural modifications to standard FIR filters are proposed. They are based on the fact that most filter coefficients, as well as most processed data, do not take advantage of the full word-length given in the filter specification. Small coefficients and numbers in particular can be shifted before processing and back-shifted afterwards, while the multiplication itself is performed with a smaller word-length. This makes it possible to decrease the size of the multipliers used and thus to save area and reduce power consumption.

## 1 INTRODUCTION

Multiplication is one of the most frequently used operations in digital signal processing. In many applications, e.g. in digital filtering, multipliers are responsible for a large percentage of the power consumed by a circuit. Moreover, in applications where speed is important, a parallel implementation is often preferred, which results in large area and thus increased manufacturing cost of the device. Optimizing multipliers [1] is therefore a widely investigated problem, and the number of multipliers in a specific design has become a benchmark for its usability and price. However, it can often be more advantageous not to concentrate on the multipliers alone, but to take a broader look at the application they are used for. In this paper we focus on FIR digital filters and propose a method of optimizing their architecture.

Digital filtering is widely used in many applications [3], for example in mobile communication systems, and, indispensable as it is, it is also one of the system components with the highest power consumption. The properties of an FIR filter are defined by its coefficients. The accuracy of their representation in particular is crucial to preserve both stop-band and pass-band attenuation. The required word-length, however, is mainly determined by the numerical range of the coefficients. While some of them are large, others are a few orders of magnitude smaller. The word-length is then chosen in such a way that the error introduced by truncation of the small coefficients does not influence the overall filter performance. Thus, only the large coefficients profit from the high numerical range. Their number depends on the order of the filter and is usually much lower than the length of the filter. Nevertheless, full-size multipliers (multipliers utilizing the full word-length) are used for all coefficients in the filtering operation. In this paper we propose to reduce the size of the implemented multipliers by shifting the filter coefficients as well as the data during operation.

We target reconfigurable filters, where coefficients can change between different modes of operation, as for example in Software Defined Radio (SDR) [2]. In Sections 2 and 3 we present the general idea of reducing the multiplier size. Section 4 gives an overview of the hardware realization. It is followed by experimental results in Section 5 and conclusions in Section 6.

## 2 THOUGHTS ON DATA PRECISION

Looking at the impulse response of an FIR filter in the time domain, one can notice that most of the samples (which are at the same time the filter coefficients) are very small. In many cases only a few are larger than $2^{-4}$, as shown below.

```
a[0]  = 11111111110001001
a[1]  = 11111111001000100
a[2]  = 11111110100001010
a[3]  = 11111110100001011
a[4]  = 0000010110010010
a[5]  = 00010001101101101
a[6]  = 0001110111001101
a[7]  = 0010001011110101
a[8]  = 0001110111001101
a[9]  = 0001000110110101
a[10] = 000001011001001
a[11] = 1111111010000101
a[12] = 11111111100001001
a[13] = 11111111110001001
a[14] = 11111111110001001
```

Nevertheless, full-size multipliers are used for their implementation. The smaller the coefficients, the larger the word-length needed to represent them, and the higher the precision with which the large coefficients are then represented. In practice, however, the word-length needed for the small coefficients gives the large coefficients more precision than they require. It is in fact acceptable to truncate a few of the least significant bits of the large coefficients without impact on the filter properties. Figure 1 shows the frequency response of a filter whose small coefficients have been quantized to 13 bits and whose large ones ($> 2^{-4}$) to 9 bits only. It is compared to the usual implementation with a word-length of 13 bits. As can be clearly seen, even though the number of bits in the usual implementation has been chosen to be as small as possible (some distortions can be observed in the stop-band: the ripples are not all of the same magnitude), there is virtually no impact on stop-band and passband attenuation in the modified structure with truncated coefficients. The phase response is of course linear in both cases due to the filter symmetry.

Figure 1: Frequency response, stop-band ripple and passband ripple of a 17-tap low-pass FIR filter with small coefficients quantized to 13 bits and large coefficients quantized to 9 bits (Truncated), compared to a standard implementation with 13 bits (Full).

<sup>\*</sup>Munich University of Technology, Arcisstr. 21, 80290 Munich, Germany, e-mail: Artur.Wroblewski@ei.tum.de
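The mixed quantization behind Figure 1 can be tried out numerically. The following is a minimal sketch only, under stated assumptions: the test filter is a hypothetical Hamming-windowed low-pass (not the filter of Figure 1), quantization is modelled as truncation of a signed fraction, and all function names are introduced here for illustration.

```python
import cmath
import math

def quantize(value, bits):
    """Truncate `value` (|value| < 1) to a signed fraction with `bits` bits."""
    scale = 1 << (bits - 1)  # one sign bit, bits - 1 fractional bits
    return math.floor(value * scale) / scale

def mixed_quantize(coeffs, full_bits=13, large_bits=9, threshold=2.0 ** -4):
    """Large coefficients get `large_bits`, small ones keep `full_bits`."""
    return [quantize(c, large_bits if abs(c) > threshold else full_bits)
            for c in coeffs]

def magnitude(coeffs, omega):
    """Magnitude of the FIR frequency response at `omega` (rad/sample)."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(coeffs)))

def windowed_sinc(taps, cutoff):
    """Hypothetical test filter: Hamming-windowed low-pass design."""
    mid = (taps - 1) / 2
    coeffs = []
    for n in range(taps):
        x = n - mid
        sinc = cutoff / math.pi if x == 0 else math.sin(cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        coeffs.append(sinc * window)
    return coeffs

ideal = windowed_sinc(17, 0.25 * math.pi)      # assumed 17-tap example
full = [quantize(c, 13) for c in ideal]        # "Full": 13 bits everywhere
truncated = mixed_quantize(ideal)              # "Truncated": 13 / 9 bits
grid = [math.pi * k / 256 for k in range(257)]
worst = max(abs(magnitude(full, w) - magnitude(truncated, w)) for w in grid)
print(f"largest magnitude difference, Full vs. Truncated: {worst:.5f}")
```

Only the coefficients above the threshold differ between the two variants, so the deviation of the frequency response is bounded by the number of large coefficients times the 9-bit quantization step.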

## 3 REDUCED MULTIPLIER

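Even though small coefficients are represented with the full number of bits in the usual approach, their effective word-length is much smaller, because their most significant bits consist of a long run of zeros (or ones for negative values). In the left column below, these redundant bits form the second group of each entry, directly after the sign bit.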

| Original coefficient (sign, redundant bits, remaining bits) | Shifted and truncated coefficient | Remaining low-order bits |
|-------------------------------------------------------------|-----------------------------------|--------------------------|
| 1 11111111 0001001                                          | 1 1111 0001001                    | 0000                     |
| 1 111111 001000100                                          | 1 11 001000100                    | 0000                     |
| 1 11111 0100001010                                          | 1 1 0100001010                    | 0000                     |
| 1 111111 010001011                                          | 1 11 010001011                    | 0000                     |
| 0 0000 10110010010                                          | 010110010010                      | 0000                     |
| 0 00 1000110110101                                          | 010001101101                      | 01 00                    |
| 0 00 1110111001101                                          | 011101110011                      | 01 00                    |
| 0 0 10001011101001                                          | 010001011101                      | 001 0                    |
| 0 00 1110111001101                                          | 011101110011                      | 01 00                    |
| 0 00 1000110110101                                          | 010001101101                      | 01 00                    |
| 0 0000 10110010010                                          | 010110010010                      | 0000                     |
| 1 111111 010001011                                          | 1 11 010001011                    | 0000                     |
| 1 11111 0100001010                                          | 1 1 0100001010                    | 0000                     |
| 1 111111 001000100                                          | 1 11 001000100                    | 0000                     |
| 1 11111111 0001001                                          | 1 1111 0001001                    | 0000                     |

It is sufficient to perform the multiplications with the remaining, non-redundant part of the mantissa, and thus with reduced word-length. As stated in Section 2, the reduced word-length has to be sufficient to represent large coefficients with the required accuracy. The size of the multipliers can therefore be reduced only by as many bits as can be removed without degrading the performance of the filter through the smaller numerical range. The same considerations apply to incoming data. Thus, both the data and the coefficients can be shifted by the number of bits to be disregarded, as shown in the right-hand part of the table above, and then truncated. No additional quantization error is introduced for small coefficients, since they are always represented with the same accuracy as in the conventional implementation.

Once the numbers have been shifted and multiplied with reduced word-length, the result needs to be back-shifted to obtain the correct result. Figure 2 depicts this idea.
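The shift and back-shift can be written compactly. The symbols below are introduced here for illustration only: $s_a$ and $s_b$ denote the left shifts applied to a data word $a$ and a coefficient $b$, and $Q(\cdot)$ denotes truncation to the reduced word-length.

$$
a \cdot b = 2^{-(s_a + s_b)} \left(2^{s_a} a\right) \left(2^{s_b} b\right) \approx 2^{-(s_a + s_b)} \, Q\!\left(2^{s_a} a\right) Q\!\left(2^{s_b} b\right)
$$

The factor $2^{-(s_a + s_b)}$ is the back-shift. The approximation becomes an equality whenever $Q$ discards only zero bits, which is the case for small operands, where the truncated bits are the zeros that were shifted in.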



Figure 2: Multiplier with reduced data precision. Data and coefficients are shifted before and back-shifted after multiplication to reduce size of the multiplier. Back-shift information is stored in the shift-register.

Incoming data and coefficients are first fed into adaptive shift operators (ASO), where they are shifted and truncated. The number of shifts performed is then stored in the shift register, since it is needed to perform the back-shift operation correctly. The modified numbers are then multiplied and the product is back-shifted. The word-length of the result is the same as for a standard full-size multiplier (the sum of the data and coefficient word-lengths). Some additional control logic is needed, which for simplicity is not shown in the figure.
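The data path of Figure 2 can be modelled on signed integers. This is a behavioural sketch, not a hardware description: the 16-bit full and 12-bit reduced word-lengths are taken from the coefficient table above, the function names are hypothetical, and the back-shift is expressed as a left shift by the unused part of the maximum shift, which is equivalent to left-aligning the operands and shifting the product right.

```python
def redundant_sign_bits(x, width):
    """Number of bits after the sign bit that merely repeat the sign."""
    magnitude = x if x >= 0 else ~x
    return width - 1 - magnitude.bit_length()

def adaptive_shift(x, width=16, reduced=12):
    """Shift out redundant sign bits, then truncate to `reduced` bits.

    Returns the reduced-width value and the number of shifts performed.
    """
    max_shift = width - reduced
    shift = min(redundant_sign_bits(x, width), max_shift)
    return (x << shift) >> max_shift, shift

def reduced_multiply(a, b, width=16, reduced=12):
    """Multiply with a reduced-width multiplier and back-shift the product."""
    max_shift = width - reduced
    ya, sa = adaptive_shift(a, width, reduced)
    yb, sb = adaptive_shift(b, width, reduced)
    product = ya * yb  # the only multiplication, reduced x reduced bits
    return product << ((max_shift - sa) + (max_shift - sb))

small, large = -119, 8937           # two entries of the coefficient table
print(adaptive_shift(small))        # small value: nothing is lost
print(adaptive_shift(large))        # large value: low-order bits truncated
print(reduced_multiply(large, 4533), large * 4533)
```

For small operands the result equals the full-precision product; for large operands the deviation stays within the precision of the reduced word-length.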

## 4 FILTER IMPLEMENTATION

### 4.1 "Shifted" structure

In transversal structures like FIR filters there is of course no need to shift incoming data before every single multiplication. It is sufficient to perform this operation only once, at the very input of the filter. In that case the word-length of the registers (and therefore their size) is reduced as well. The back-shift operation, however, has to be performed after every multiplication, before the results are summed, to guarantee correct results. This leads to the structure of Figure 3.



Figure 3: FIR filter with reduced word-length multipliers and registers as well as adaptive shift operators.

There are two shifters at the input of the circuit, one for data and another one for coefficients. Shifted data is first stored in an additional register, which acts as a glitch barrier. All registers in the circuit have reduced word-length. Information on the shifts performed is stored in the bank of shift registers (SR). There is one shift register for every state. Even if the coefficients do not change for a given mode of operation, the values in the shift registers have to be updated with every incoming data word. These values control the back-shifters placed after every single multiplier. The resulting data is then summed as in a usual FIR filter.
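A behavioural sketch of the structure in Figure 3 is given below. It assumes 16-bit inputs and 12-bit multipliers, models neither the control logic nor the glitch-barrier register, and uses the hypothetical names `ShiftedFIR` and `adaptive_shift`. Each state is stored as a reduced-width value together with its shift count, which plays the role of the shift-register bank.

```python
from collections import deque

def adaptive_shift(x, width, reduced):
    """Shift out redundant sign bits, truncate, and report the shift count."""
    magnitude = x if x >= 0 else ~x
    max_shift = width - reduced
    shift = min(width - 1 - magnitude.bit_length(), max_shift)
    return (x << shift) >> max_shift, shift

class ShiftedFIR:
    """FIR filter whose data is shifted once, at the filter input."""

    def __init__(self, coeffs, width=16, reduced=12):
        self.width, self.reduced = width, reduced
        self.max_shift = width - reduced
        # Coefficients pass through their own shifter when they are loaded.
        self.coeffs = [adaptive_shift(c, width, reduced) for c in coeffs]
        # Reduced-width state registers, each paired with its shift count.
        zero_state = (0, self.max_shift)
        self.states = deque([zero_state] * len(coeffs), maxlen=len(coeffs))

    def step(self, sample):
        # Input shifter: performed once per incoming data word.
        self.states.appendleft(adaptive_shift(sample, self.width, self.reduced))
        acc = 0
        for (yc, sc), (yx, sx) in zip(self.coeffs, self.states):
            # Back-shift after every multiplier, before the summation.
            back = (self.max_shift - sc) + (self.max_shift - sx)
            acc += (yc * yx) << back
        return acc

fir = ShiftedFIR([-119, 4533, -119])         # illustrative 3-tap filter
response = [fir.step(x) for x in (1, 0, 0)]  # unit impulse at the input
print(response)
```

Because the shift count travels with each state, a new data word updates the bank even when the coefficients stay fixed, as described above.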

### 4.2 "Blocks" structure

In the filter implementation described above (referred to below as "shifted"), all multipliers became smaller, but their size depends on the acceptable error for large coefficients. Most of the coefficients, however, are still represented with much higher precision than necessary. To avoid this we introduce two sets of multipliers of different size. The first is dedicated to large coefficients and comprises multipliers of reduced size as described above. In the second one a further reduction in coefficient word-length can be applied, and thus the size of the multipliers can be reduced further. As will be shown in Section 5, their word-length could be decreased by another 2-5 bits, which results in even greater area and power savings. Unfortunately, the data word-length cannot be reduced any further if accurate results are to be achieved. In dedicated hardware the number of small and large multipliers is fixed and cannot be changed. To enable the filter to adapt to different modes of operation, some logic overhead is required. The task of this logic block is to correctly assign the large and small coefficients, and the data (states) stored in registers, to the appropriate multipliers. This structure is referred to here as "blocks" and is shown in Figure 4.



Figure 4: FIR filter with two blocks of multiplier-arrays.

Of course there must be a sufficient number of large multipliers available to guarantee results that are accurate enough to preserve the filter properties. In our experiments no more than 11 of them were required for most filters. In high-order filters this number can be traded off against the size of the small multipliers: the higher the number of large multipliers, the greater the possible reduction in word-length of the small multipliers and the greater the savings in area and power consumption.
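One possible way for the assignment logic to split the taps between the two blocks is sketched below. It is an assumption, not the implemented logic: taps are ranked by the number of significant bits of their coefficients, the fixed number of large multipliers takes the top of the ranking, and the remaining coefficients are checked against the word-length of the small multipliers. The coefficient values are the integer reading of the table in Section 3, arranged as a symmetric 15-tap response.

```python
def significant_bits(x):
    """Bits needed for x in two's complement, including the sign bit."""
    magnitude = x if x >= 0 else ~x
    return magnitude.bit_length() + 1

def assign_blocks(coeffs, n_large, small_bits):
    """Split tap indices between the large and the small multiplier block.

    Returns (large taps, small taps, whether every small-block coefficient
    fits into `small_bits` without truncation).
    """
    by_size = sorted(range(len(coeffs)),
                     key=lambda i: significant_bits(coeffs[i]), reverse=True)
    large = sorted(by_size[:n_large])
    small = sorted(by_size[n_large:])
    fits = all(significant_bits(coeffs[i]) <= small_bits for i in small)
    return large, small, fits

half = [-119, -444, -758, -373, 1426, 4533, 7629]
coeffs = half + [8937] + half[::-1]  # symmetric impulse response
large, small, fits = assign_blocks(coeffs, n_large=5, small_bits=12)
print("large block taps:", large)
print("small block taps:", small, "fit without truncation:", fits)
```

With fewer large multipliers, or narrower small ones, the check fails, which mirrors the trade-off between the number of large multipliers and the size of the small ones.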

## 5 EXPERIMENTAL RESULTS

To test the proposed structures, their HDL descriptions were written and compared to a similar description of the conventional filter. Filter coefficients were obtained from commercially available tools. In the first step the minimum word-length for the standard filter was determined in such a way that there was no degradation in the frequency response (as explained in Section 2). Afterwards, in the same manner, the minimum size of the reduced multipliers for the modified structure was obtained. Then a set of test data, consisting of the impulse response of a 150-tap FIR filter, was fed into both filters, their frequency responses were compared, and the size of the reduced multipliers was increased if needed.

Both filters were then synthesized and afterwards simulated with PowerMill with two sets of data. The first comprised 1000 random data vectors. The second was the response of a 6-stage decimator [4]. Several thousand oversampled Root-Raised-Cosine (RRC) symbols (commonly encountered in communication systems, here however without noise) were fed to the input of the decimator and its response was forwarded to the filter under test. The response of the decimator was calculated with double precision. Both the "shifted" and the "blocks" structures were simulated. The results are summarized in Table 1.

| Filter taps (type) | FIR power (mW) | Shifted power savings (1000 random vectors) | Shifted power savings (RRC) | Blocks power savings (RRC) | FIR area ($mm^2$) | Shifted area savings | Blocks area savings | FIR bits | Shifted bits | Small mult. bits |
|--------------------|----------------|---------------------------------------------|-----------------------------|----------------------------|-------------------|----------------------|---------------------|----------|--------------|------------------|
| 15(AP)             | 8.3            | 34 %                                        | 28 %                        | n/a                        | 0.15              | 27 %                 | n/a                 | 10       | 7            | n/a              |
| 15(BP)             | 14.7           | 22 %                                        | 18 %                        | 20 %                       | 0.28              | 20 %                 | 16 %                | 14       | 11           | 8                |
| 17(LP)             | 18.0           | 34 %                                        | 32 %                        | 37 %                       | 0.27              | 21 %                 | 21 %                | 13       | 10           | 6                |
| 21(LP)             | 91.3           | 10 %                                        | 8 %                         | 46 %                       | 1.01              | 15 %                 | 36 %                | 23       | 20           | 15               |
| 25(LP)             | 48.7           | 34 %                                        | 31 %                        | 40 %                       | 0.61              | 25 %                 | 26 %                | 16       | 11           | 9                |
| 31(BS)             | 60.5           | 22 %                                        | 17 %                        | 34 %                       | 0.76              | 15 %                 | 21 %                | 16       | 13           | 8                |
| 75(LP)             | 273            | 25 %                                        | 19 %                        | 23 %                       | 2.84              | 24 %                 | 26 %                | 19       | 15           | 13               |

Table 1: Comparison of the two architectures. FIR: standard filter realization; Shifted and Blocks: proposed structures with reduced multipliers. FIR power consumption is given for decimator output stimuli. Simulations performed with PowerMill with $0.18\,\mu m$ technology at 1.6 V. Area estimation from Synopsys Design Analyzer.

As expected, higher savings are obtained for random data. The reason is that these numbers are uniformly distributed between 0 and 1, with a mean value of approximately 0.5, which is not the case for the RRC symbols (mean value approximately 0.33). Since noise has not been taken into account for the RRC data, it can be expected that for real signals the power savings will be bounded by these two results. In these simulations, too, no degradation in the frequency response of the output signal was observed when compared to filters with full-size multipliers. For the 15-tap all-pass filter, no improvement of the "blocks" structure over the "shifted" structure was observed. This is a nonsymmetrical filter whose coefficients are all $> 2^{-5}$. Therefore, no further reduction in word-length is possible here.

## 6 CONCLUSION

In this paper a method of realizing FIR filters with reduced word-length has been presented. Backed up by numerous experimental results, it can be stated that for FIR digital filters a reduction in word-length by 3 to 5 bits can be achieved with no negative influence on filter performance ("shifted" structure). For small numbers an additional reduction in word-length can be achieved by utilizing the "blocks" structure. The shift operations are calculated and performed online, depending on the mantissa of the incoming data and of the filter coefficients. Although additional control logic and adaptive shift operators have to be implemented, it has been shown that considerable savings in power consumption (up to 40 %) and area (up to 36 %) can be achieved. Due to the extreme importance of digital filters in a variety of applications, the proposed method represents an advantageous alternative for their low-power implementation. Moreover, the method can be applied without additional optimization tools and can thus be easily employed in the standard design flow.

## References

- [1] K. Hwang. Computer Arithmetic. John Wiley & Sons, New York, 1979.
- [2] J. Mitola. Software Radio Architectures. Wiley-Interscience, New York, 2000.
- [3] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1975.
- [4] A. Wróblewski and J. A. Nossek. Filter Structures For Decimation: A Comparison. Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS'2003, Bangkok, May 2003.