A typical multi-beam sonar system consists of a large sensor array followed by a digital beamformer that spatially analyzes the received data. Figure 1.1 shows the block diagram of such a system.
In order to process the sensors' data values digitally, the signals are first converted into the digital domain. Before the received signals can be digitized, they have to be amplified and lowpass-filtered in order to avoid aliasing.
In the past, the conversion to the digital domain has been accomplished by off-the-shelf analog-to-digital (A/D) converters with multi-bit outputs, which subsequently have been fed into the digital beamformer. For systems with a higher number of channels (say more than 50), issues like hardware cost, dimensionality, and electrical problems gain importance.
A more cost- and space-effective way of implementing the analog-to-digital conversion is to use Delta-Sigma (DS) modulators that are specially designed to acquire the sensors' output signals. The modulators' pulse-density-modulated (PDM) output comprises the oversampled input signal and wideband out-of-band noise, which stems from the quantization process. In order to obtain wideband output signals, which can be further processed by either a beamformer or additional filters for narrowband beamforming, the DS modulator output signals have to be digitally low-pass filtered and decimated. Figure 1.2 shows one channel of such a system in the form of a block diagram.
The purpose of this thesis is to design and implement a digital decimation filter for a monolithic sonar receiver.
In Chapter 3, we will discuss the design of a linear-phase FIR filter based on specifications derived from the requirements of the sonar receiver. After designing the filter using the Signal Processing Toolbox [1], we will verify the performance with MATLAB simulations. We will then develop a basic hardware concept for the filter. We will show that, because of the 1-bit format of the input signal and the fact that the output signal is decimated, the hardware effort can be significantly reduced. With a script written in MATLAB, we will perform a behavioral simulation of the filter impulse response based on the developed hardware concept.
In Chapter 4, an FPGA implementation of the digital decimation filter is proposed. The basic building blocks are explored in detail, and simulations are performed to observe the behavior of the implementation. By simulating the impulse response of the filter and comparing the results to the simulations obtained in Chapter 3, we will be able to verify the correct function of the implementation.
Chapter 5 will summarize the resulting work and explain the steps toward a commercialized version of the digital decimation filter.
| (2.1) |
| (2.2) |
where the cycle time of the sampling frequency is equated to unity. Transforming these equations into the z-domain yields:
| (2.3) |
| (2.4) |
By eliminating W(z) and solving for Y(z), we obtain
| $Y(z) = z^{-1}X(z) + (1 - z^{-1})\,N(z)$ | (2.5) |
According to this result, the modulator digitally differentiates the quantization error while leaving the signal unchanged, except for a delay. Thus, the modulator can be characterized by two transfer functions: a signal transfer function (STF) defined as the ratio Y/X (in the absence of noise) and a noise transfer function (NTF) defined as Y/N (setting X = 0).
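| $\mathrm{STF}(z) = \dfrac{Y(z)}{X(z)} = z^{-1}, \qquad \mathrm{NTF}(z) = \dfrac{Y(z)}{N(z)} = 1 - z^{-1}$ | (2.6) |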
If the quantization error represents the dominant system noise source, the total mean square in-band noise at the output of the modulator loop can be computed as
| (2.7) |
where BW represents the effective bandwidth, and $e_q^2$ corresponds to the input noise power. When we replace the signal bandwidth BW by the ratio $f_s/(2\,\mathrm{OSR})$, the expression becomes more practical and depends on three parameters only. Furthermore, since $f/f_s \ll 1$ in the passband region of the modulator, we can approximate the sine function by its argument ($\pi f/f_s$). The in-band noise power present at the output can now be written as
| (2.8) |
| (2.9) |
As we see from (2.9), the output noise power depends strongly on the OSR, in addition to the quantization error. Each doubling of the oversampling ratio reduces the output noise power level by 9 dB and provides 1.5 bits of extra resolution. If we increase the order of the system by cascading m identical cells, the output noise power can be reduced even further [2].
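The 9 dB figure can be checked numerically. The following is a minimal sketch, assuming a first-order noise transfer function NTF(z) = 1 - z^-1 and white quantization noise of unit density; the function names `inband_noise` and `drop_db` are illustrative and not taken from the thesis.

```python
import math

def inband_noise(osr):
    """Relative in-band noise power of a first-order loop, NTF = 1 - z^-1.

    Integrates |NTF|^2 = 4*sin^2(pi*f) from 0 to 1/(2*osr) with fs = 1 and a
    unit white quantization-noise density (closed-form antiderivative).
    """
    b = 1.0 / (2.0 * osr)
    return 2.0 * (b - math.sin(2.0 * math.pi * b) / (2.0 * math.pi))

def drop_db(osr):
    """Noise reduction in dB when the oversampling ratio goes from osr to 2*osr."""
    return 10.0 * math.log10(inband_noise(osr) / inband_noise(2 * osr))

for osr in (16, 32, 64):
    drop = drop_db(osr)
    # 6.02 dB correspond to one bit of resolution
    print(f"OSR {osr:3d} -> {2 * osr:3d}: {drop:.2f} dB, {drop / 6.02:.2f} bits")
```

For large OSR the integral is proportional to 1/OSR^3, so each doubling divides the noise power by about 8, which is the 9 dB (1.5 bit) step quoted above.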
Figure 2.4 shows a block diagram of a digital decimation filter, sometimes referred to as a down-sampling filter. First, the signal is fed into a digital low-pass filter which approximates the ideal characteristic
| (2.10) |
| (2.11) |
| (2.12) |
| (2.13) |
Due to the out-of-band components of the modulated signal, abrupt low-pass filters are needed. Such filters are expensive to build for elevated sampling rates and thus the implementation has to be analyzed carefully. Kusch [4] investigates several filter structures and shows their trade-offs.
Since it is crucial in beamforming applications to preserve the phase information of received signals, the decimation filter has to yield a constant group delay for all frequencies, i.e., linear phase. Due to their nonrecursive structure FIR filters are always stable and, if the coefficient sets are symmetric, provide linear phase.
| (2.14) |
| (2.15) |
| (2.16) |
| (2.17) |
To produce a linear phase response the constraint is simply that the finite-duration impulse response has conjugate-even or conjugate-odd symmetry about its midpoint. To see that this constraint ensures linear phase, consider the FIR system function (2.17) with
|
|
By applying a constant input x[n] = 1 to these filters, we observe that odd symmetry implies zero output for all DC inputs
|
|
Because of the given zero locations, not every filter type can realize an arbitrary magnitude response. While only type I filters can be used to realize all-pass filters, type II filters would be employed to synthesize low-pass filters. Because both type III and IV filters have zero output for DC inputs, they cannot be used to implement low-pass functions. Since type III filters also have a zero at the Nyquist frequency, they are suitable for building band-pass filters, while type IV filters can be used to realize high-pass filters.
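The forced zeros at DC and at the Nyquist frequency can be observed directly on small examples. The sketch below uses four hypothetical coefficient sets, one per filter type, and evaluates the frequency response at $\omega = 0$ and $\omega = \pi$; the helper `freq_response` is an illustrative name.

```python
import cmath
import math

def freq_response(h, w):
    """H(e^{jw}) = sum_n h[n] * e^{-jwn} for a finite impulse response h."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

# Hypothetical short examples of the four linear-phase types (N = order).
examples = {
    "I   (even symmetry, N even)": [1.0, 2.0, 3.0, 2.0, 1.0],
    "II  (even symmetry, N odd) ": [1.0, 2.0, 2.0, 1.0],
    "III (odd symmetry, N even) ": [1.0, 2.0, 0.0, -2.0, -1.0],
    "IV  (odd symmetry, N odd)  ": [1.0, 2.0, -2.0, -1.0],
}

for name, h in examples.items():
    dc = abs(freq_response(h, 0.0))
    nyquist = abs(freq_response(h, math.pi))
    print(f"type {name}: |H(0)| = {dc:.3f}, |H(pi)| = {nyquist:.3f}")
```

In these examples the odd-symmetric sets vanish at DC, and the type II and type III sets vanish at the Nyquist frequency, in line with the restrictions listed above.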
A further effect of the linear-phase constraint on the zeros of H(z) is seen by noting from (2.17) that
| (2.18) |
To verify the linear phase-response let us consider the type I case with real coefficients of even symmetry and N even. We then may rewrite (2.17) as
| (2.19) |
| (2.20) |
|
|
| (2.21) |
The more general proof that type I, II, III, and IV filters yield phase linearity for complex h[n] can be found in [5].
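For the type I case, the linear-phase property can also be checked numerically. The sketch below assumes a hypothetical real, even-symmetric impulse response of order N = 4, removes the delay factor $e^{-j\omega N/2}$ from the frequency response, and confirms that what remains is real and equals a cosine sum; the function name `zero_phase_response` is illustrative.

```python
import cmath
import math

def zero_phase_response(h, w):
    """Return H(e^{jw}) * e^{+jwN/2}, i.e. H with the N/2-sample delay removed."""
    N = len(h) - 1
    H = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))
    return H * cmath.exp(1j * w * N / 2)

h = [-1.0, 2.0, 5.0, 2.0, -1.0]   # hypothetical type I filter, N = 4

for w in (0.3, 1.0, 2.5):
    R = zero_phase_response(h, w)
    # Cosine form for even symmetry: middle tap plus paired outer taps.
    cosine_sum = h[2] + 2 * h[1] * math.cos(w) + 2 * h[0] * math.cos(2 * w)
    print(f"w = {w}: R = {R.real:+.6f}, imag = {R.imag:.1e}, cosine sum = {cosine_sum:+.6f}")
```

Because the remaining factor is real, the phase is $-\omega N/2$ up to jumps of $\pi$ where that factor changes sign, which corresponds to a constant group delay of N/2 samples.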
As for other FIR design methods, the specification on the passband and stopband is assumed to be
| (2.22) |
For convenience let us assume that a type I filter impulse response is made symmetric about n = 0. Hence, h[n] = h[-n], and we obtain the following frequency response
| (2.23) |
| (2.24) |
An important parameter in the Remez exchange algorithm is the total number of extrema in the Nyquist interval $[0, \pi]$. To determine this number we take the derivative of $H(\omega)$ and set it to zero. We then have
| (2.25) |
Since CAD software for designing equiripple filters is readily available in MATLAB [1], we will only outline the general design formulation and the Remez exchange algorithm. Even though they can be closely estimated, we do not know a priori what minimum ripple amplitudes $\delta_1$ and $\delta_2$ can be achieved for a given order N. By applying a weight K [6] to the stopband specification, a single unknown $K\delta_2 = \delta_1 = \delta$ is produced, which is determined iteratively along with the filter coefficients. If the specifications cannot be met, the order has to be increased and the algorithm repeated.
To describe the Remez exchange algorithm, let $\omega_1^i, \omega_2^i, \ldots$ be the estimates at the $i$th iteration of the frequencies at which alternations of zero slope will occur, and let $\delta^i$ be the estimate of $\delta$. In addition, we know that there are alternations at $\omega_c$ and $\omega_r$ and at either 0 or $\pi$ or both (let $\omega_0^i$ be our current estimate of which one). These estimated alternation points are indicated by dots (·) in Figure 2.9. A trigonometric polynomial of form (2.24) is passed through the points $1 \pm \delta^i$ in the passband and $\pm\delta^i/K$ in the stopband at these frequencies with alternating sign such that $1 - \delta^i$ occurs at $\omega_c$ and $+\delta^i/K$ at $\omega_r$, as depicted in Figure 2.9. This new estimate of $H(\omega)$ is then evaluated with high resolution to locate the frequencies $\omega_0^{i+1}, \omega_1^{i+1}, \ldots$ at which extrema actually occur (choosing $\omega_0^{i+1}$ by whether the extremum at 0 or $\pi$ is larger, or using both if needed), as indicated by circles (°) in Figure 2.9. These frequencies plus a new estimate $\delta^{i+1}$ form the input to the $(i+1)$st iteration.
| (2.26) |
Direct form structures, in general, realize the given system function with the smallest possible number of delays, adders, and multipliers. The number of each of these components required is as follows:
| (2.27) |
Since real-valued linear-phase FIR filters of order N do not consist of N+1 distinct coefficients, but of $(N+1)/2$ pairs of equal or complementary coefficients (plus one middle coefficient for N even), the direct form network structure can be implemented more efficiently. To explain the benefit of having coefficient symmetry, let us consider a type II filter with even symmetry and N odd. By splitting the sum in (2.26) at the midpoint we obtain
| (2.28) |
| (2.29) |
| (2.30) |
The total number of two-port adders needed amounts to $(N+1)/2$ adders to create the multiplier input signals, plus $(N-1)/2$ adders for the summation node. The numbers of components required to implement type I and type II filters are listed in Table 2.1.
| Component | N odd | N even |
| delays | N | N |
| adders | N | N |
| multipliers | (N+1)/2 | N/2+1 |
| coefficients | (N+1)/2 | N/2+1 |
Note that for N odd, every coefficient requires one adder, one multiplier, and one input of the output summation node. For N even, the extra middle coefficient only requires one multiplier and one summation input.
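The saving in multipliers can be illustrated with a small sketch that compares the plain direct form with the folded structure for even-symmetric coefficients. The function names `fir_direct` and `fir_folded`, the example coefficients, and the zero initial state of the delay line are assumptions made for illustration.

```python
def fir_direct(h, x):
    """Direct form: y[n] = sum_k h[k] * x[n-k], with x[n] = 0 for n < 0."""
    N = len(h) - 1
    return [sum(h[k] * x[n - k] for k in range(N + 1) if n - k >= 0)
            for n in range(len(x))]

def fir_folded(h, x):
    """Same filter for even-symmetric h: add the paired samples, then multiply."""
    N = len(h) - 1

    def sample(i):
        return x[i] if i >= 0 else 0.0

    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range((N + 1) // 2):          # one multiplier per coefficient pair
            acc += h[k] * (sample(n - k) + sample(n - N + k))
        if N % 2 == 0:                         # extra middle coefficient for N even
            acc += h[N // 2] * sample(n - N // 2)
        y.append(acc)
    return y

x = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
for h in ([1.0, 2.0, 2.0, 1.0], [1.0, 2.0, 3.0, 2.0, 1.0]):
    same = all(abs(a - b) < 1e-12 for a, b in zip(fir_direct(h, x), fir_folded(h, x)))
    print(f"N = {len(h) - 1}: folded structure matches direct form: {same}")
```

The inner loop performs (N+1)/2 multiplications for N odd and N/2+1 for N even, which matches the multiplier counts in Table 2.1.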
The numbers in Table 2.1 also apply to type III and IV filters. But because of the odd coefficient symmetry, each signal x[n-k] has to be added to -x[n-N+k]. This subtraction can be achieved by inserting a sign-inverting component into the delay line immediately after the midpoint. This results in -x[n] being shifted through the second half of the delay line.
| (2.31) |
By taking the z-transform of (2.31) we obtain
| (2.32) |
As explained in Chapter 2.2.4, only half of the coefficients have to be stored for the implementation of a linear-phase FIR filter. Because of the way such a system is realized, $\hat{h}[n]$ and $\hat{h}[N-n]$ correspond to the same stored coefficient. Hence, the impulse response of a linear-phase filter with quantized coefficients is symmetric around its midpoint. Since the ideal impulse response h[n] is symmetric, e[n] is also symmetric. Consequently, the linear-phase property is not affected by the quantization of the coefficients. Because of their nonrecursive structure, FIR filters do not suffer from limit-cycle effects or instability due to coefficient quantization. In fact, FIR filters with quantized coefficients are always stable.
Figure 2.13 shows the effect of quantization on the magnitude response of a linear-phase FIR lowpass filter of order N = 64 with coefficients that have been rounded to a word length of B = 9 bits. The dashed line indicates the magnitude response of the ideal filter, while the solid line shows the actual response of the filter with quantized coefficients. It is obvious that 9 bits do not provide a precise enough representation of the coefficients for this filter. The distortion introduced by the process of quantization is strongest in the stopband. It can be seen in Figure 2.13 that at some frequencies the stopband attenuation dropped by more than 20dB. To minimize the error caused by coefficient quantization the number of resolution bits has to be increased. Within this Chapter we will introduce the means to estimate the minimum number of bits required to ensure a given minimum attenuation.
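The effect of rounding on symmetry and on the individual coefficient errors can be sketched as follows. The example assumes a quantization step of q = 2^-B and a hypothetical symmetric coefficient set; it is not the order-64 filter of Figure 2.13. Note that Python's `round` resolves exact ties to the nearest even integer, which differs from MATLAB's `round` only for values that lie exactly halfway between two levels.

```python
def quantize(h, B):
    """Round each coefficient to the nearest multiple of q = 2**-B."""
    scale = 2 ** B
    return [round(c * scale) / scale for c in h]

# Hypothetical symmetric coefficient set used only for illustration.
h = [0.0123, -0.0456, 0.0789, 0.3012, 0.4071, 0.3012, 0.0789, -0.0456, 0.0123]
B = 9
hq = quantize(h, B)

errors = [cq - c for c, cq in zip(h, hq)]      # e[n] = hq[n] - h[n]
max_err = max(abs(e) for e in errors)

print("quantized set is symmetric:", hq == hq[::-1])
print("error sequence is symmetric:", errors == errors[::-1])
print("largest error within q/2:", max_err <= 2 ** -(B + 1))
```

Equal coefficients are rounded to equal values, so the quantized impulse response and the error sequence e[n] both keep the symmetry of h[n].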
The frequency response $\hat{H}(\omega)$ can be obtained by substituting $z$ by $e^{j\omega}$ in (2.32)
| (2.33) |
| (2.34) |
| (2.35) |
The dot-dashed line (·-) indicates the previously calculated error bound. The comparison shows that the deviation introduced by the quantization errors is less than the corresponding bound in all cases. $\breve{E}_{\mathrm{dB}}$ is a pessimistic bound, since all of the quantization errors would have to be of the same sign and equal to q/2 in magnitude to be as large as the bound. The probability of this occurring is very small.
Because of the inherently hard-to-predict nature of quantization errors, a statistical analysis [7] of the effect of coefficient quantization is appropriate, even though for a given filter the quantization process is performed only once, after which the filter response is exactly determined. The statistical model to be used is a very reasonable one that assumes the errors due to the quantization of different coefficients to be independent and uniformly distributed between -q/2 and q/2, as depicted in Figure 2.15.
It can be seen in the figure that the probability distribution function $f_e(e)$ has a mean $m_e = 0$. The variance $\sigma_e^2$ for each e[n] can be calculated as follows
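| $\sigma_e^2 = \int_{-q/2}^{q/2} e^2 f_e(e)\,de = \frac{1}{q}\int_{-q/2}^{q/2} e^2\,de = \frac{q^2}{12}$ | (2.36) |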
| (2.37) |
| (2.38) |
| (2.39) |
| (2.40) |
| (2.41) |
Figure 2.16 shows $W(\omega)$ for filters of order N = 8 and N = 64. The behavior of $W(\omega)$ in the limit of large N is readily seen from (2.41). In the range $0 < \omega < \pi$,
| (2.42) |
| (2.43) |
Equation (2.41) shows $W(0) = W(\pi) = 1$, and $0 < W(\omega) \le 1$ for all N. Thus
| (2.44) |
| (2.45) |
| (2.46) |
To investigate the distortion of the frequency response caused by coefficient quantization, let $L(\omega)$ be some real, ideal band-select function to be approximated by the frequency response $H(\omega)$ of a linear-phase FIR filter. The usual design specifications consist of a set of disjoint frequency bands $W_k \subset [0, \pi]$, $k = 1, \ldots, M$, and a set of corresponding in-band bounds $\delta_k > 0$, such that for each k, $L(\omega)$ is approximated by $H(\omega)$ within an error of $\delta_k$ for all $\omega \in W_k$. Hence
| (2.47) |
| (2.48) |
| (2.49) |
| (2.50) |
Equations (2.49) and (2.50) are further investigated by rephrasing them in terms of in-band attenuation in decibels. We define the in-band attenuation on $W_k$ of the ideal filter as
| (2.51) |
| (2.52) |
In the process of filter design it is sometimes more useful to know what minimum number of bits is required for a given maximum deviation from the initial in-band attenuation, rather than what maximum attenuation can be achieved for a given number of bits. Let us put a constraint on $\hat{A}_k$ by defining $D_k$ to be the maximum change of in-band attenuation allowed for a given ideal filter, hence $\hat{A}_k \ge A_k - D_k$. Then, by substituting $\hat{A}_k$ in (2.52) with this constraint and solving for B we obtain
| (2.53) |
Sometimes the maximum number of bits that can be used to quantize the filter coefficients is limited by a given hardware setup, i.e., a digital signal processor with fixed point arithmetic and a fixed data bus width. In this case it would be helpful to have a formula that offers the possibility of investigating the maximum decrease of in-band attenuation caused by coefficient quantization to a given number of bits. By negating (2.52) and adding Ak on both sides we obtain
| (2.54) |
Figure 2.19 depicts the effect on the in-band attenuation of linear-phase FIR filters with an initial in-band attenuation of 60 dB versus filter order N for a given number of bits used to represent the filter coefficients. Let us consider the previous example, a filter of order N = 64 and an initial stopband attenuation of 60 dB. By using (2.54) we find that the quantization of the coefficients to a length of 9 bits results in a drop of stopband attenuation of up to 22.82 dB, which can be seen in Figure 2.13.
Equations (2.52), (2.53), and (2.54) provide a very helpful toolbox which enables us to study the impact of coefficient quantization on a filter designed with conventional design techniques. Let us apply this toolbox to the example first introduced at the beginning of this section. Recall that the coefficients of a linear-phase FIR lowpass filter of order N = 64 and an initial stopband attenuation of 60 dB were quantized to a length of 9 bits. It can be seen in Figure 2.13 that 9 bits are clearly not enough to guarantee a sufficient rejection in the stopband. By allowing a maximum decrease of 6 dB in the stopband, we find that we need a minimum of 13 bits to represent the coefficients. The solid line in Figure 2.20 indicates the magnitude response of the filter with 13-bit coefficients. It can be seen that the filter now meets the required specifications.
| (3.1) |
To ensure the stability of the DS Modulator, its input is limited to the range $-\tfrac{1}{2}V_{ref} \ldots +\tfrac{1}{2}V_{ref}$, i.e., half of the maximum possible output range. Figure 3.1 shows the output spectrum of the fifth-order IFLF DS Modulator for an input signal x(t) with frequency $f_x = 102$ kHz and amplitude $V_x = \tfrac{1}{2}V_{ref}$.
Since the input signal amplitude will always be within the interval [-1/2, +1/2] with respect to $V_{ref}$, the DS Modulator output signal can be amplified by a factor of 1.5 ($\hat{=}$ 3.52 dB) without causing any overflow. The range of the output signal component will then be [-3/4, +3/4] with respect to $V_{ref}$. The remaining 25% of the output range serves as a reserve to accommodate the noise.
The edges of the passband and the stopband are defined by the maximum input signal frequency fmax and its decimated alias fa = fd-fmax. Thus, the filter has to approximate the following characteristic
| (3.2) |
To ensure that all spectral components for $f \ge f_a$ are less than -100 dB in amplitude, the stopband attenuation has to be at least 100 dB. The required gain of 1.5 can easily be realized by scaling all the coefficients by 1.5. Note that by doing so, we increase not only the passband gain but also the stopband gain by 3.52 dB. Consequently, to maintain the minimum stopband attenuation of 100 dB for the filter with the scaled coefficients, the stopband ripple of the unscaled filter has to be $\delta_s = -103.52$ dB or less. The last constraint on the frequency response of the filter is the maximum allowable error in the passband, i.e., the passband ripple $\delta_p$. It is important in beamforming applications that the gain does not fluctuate more than ±0.1 dB over the width of the passband.
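The two decibel values used above follow from a short calculation, written out here as a worked check; the symbol $\delta_s$ denotes the stopband ripple of the unscaled filter.

```latex
\[
20\log_{10}(1.5) \approx 3.52\ \mathrm{dB},
\qquad
\delta_s \le -100\ \mathrm{dB} - 3.52\ \mathrm{dB} = -103.52\ \mathrm{dB}.
\]
```

Rounding this requirement to the next full decibel on the safe side gives the -104 dB entry in the specification tables.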
By considering all the requirements mentioned above we define a first set of filter specifications in Table 3.1. Figure 3.2 graphically displays these specifications.
| Cut-off frequency | 106.7kHz |
| Stopband edge | 293.3kHz |
| Passband ripple | 0.1dB |
| Stopband ripple | -104dB |
| (3.3) |
Choosing a filter order that is higher than necessary leaves the possibility to improve the filter characteristic. The minimum order required to realize an arbitrary filter characteristic is generally determined by the relative width of the transition bands with respect to the sampling frequency and the maximum allowable error in the individual frequency bands. For our application, the ripples previously defined are sufficient, but by reducing the transition bandwidth, we can significantly improve the filter's performance. It can be seen in Figure 3.2 that immediately above the cut-off frequency fc, the quantization noise power increases drastically. By narrowing the transition band, the noise rejection between fc and fr can be enhanced.
We can use the function remezord to estimate the minimum required filter order for a given transition bandwidth. Figure 3.3 depicts the required filter order for our given set of specifications with a variable transition bandwidth.
According to these results, the transition bandwidth can be chosen as low as 100kHz. But knowing that remezord only estimates the filter order, we make a more conservative choice for the new transition bandwidth. By setting the new value for ft to 115kHz, we obtain the improved set of filter specifications listed in Table 3.2.
| Cut-off frequency | 106.7kHz |
| Stopband edge | 221.7kHz |
| Passband ripple | 0.1dB |
| Stopband ripple | -104dB |
| Filter order | 255 |
The estimated minimum filter order for these specifications is N = 212.
| (3.4) |
Note that the impulse response h is symmetric around its midpoint, which is typical for linear-phase FIR filters.
The frequency response H(f) can be calculated by taking the Fourier transform of h. The MATLAB function H = fft(h,M) computes the M-point FFT of h, padded with zeros if h has fewer than M points and truncated if it has more. The complex resultant vector H represents the frequency response of the FIR filter with impulse response h. Note that choosing M higher than the length of h results in a larger set of points to represent H(f), and hence a smoother plot. Figure 3.5 shows the magnitude response |H(f)| of our filter for $M = 2^{14}$. It can be seen that the stopband attenuation is about 8 dB higher than specified. This is due to running the Remez exchange algorithm for more filter taps than necessary. By observing the magnitude response within $[0, f_c]$ depicted in Figure 3.6, we see that choosing N = 255 also results in a better approximation in the passband. The ripple is 0.6dB less than previously specified.
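The role of zero-padding can be illustrated with a plain DFT. This is a minimal sketch in which `dft_zero_padded` is an illustrative stand-in for the padding and truncation behavior described for fft(h,M), applied to a hypothetical five-tap filter rather than the designed one.

```python
import cmath
import math

def dft_zero_padded(h, M):
    """M-point DFT of h, zero-padded if len(h) < M and truncated otherwise."""
    padded = list(h[:M]) + [0.0] * max(0, M - len(h))
    return [sum(padded[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
            for k in range(M)]

h = [0.1, 0.25, 0.3, 0.25, 0.1]        # hypothetical short linear-phase filter
coarse = dft_zero_padded(h, 8)          # 8 samples of H(f) over [0, fs)
fine = dft_zero_padded(h, 32)           # 32 samples of the same H(f)

# Every 4th point of the fine grid coincides with a point of the coarse grid.
worst = max(abs(fine[4 * k] - coarse[k]) for k in range(8))
print(f"largest difference at shared frequencies: {worst:.1e}")
```

Padding does not change the frequency response itself; it only samples the same curve on a denser grid, which is why a larger M gives a smoother plot.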
We expect our linear-phase filter to have a constant group delay for all frequencies. This can easily be verified by computing the slope of the phase response. From (2.21) we get
| (3.5) |
It can be seen that the phase response is indeed linear, and that the filter yields a constant group delay D. This is true for all frequencies except at those places where |H(f)| = 0. At each of these zero locations the phase jumps by $\pi$ because R(f), the real-valued part of H(f), changes its sign. But since there is no output contribution at these frequencies, the filter is still referred to as having linear phase. See equation (2.20) for further reference.
    y = filter(1.5*h,1,x);
The DS Modulator output signal x can be generated by using the appropriate functions of the DelSi Toolbox [9].
The performance of the FIR filter is best verified by observing the spectrum of its output signal y, which is obtained by simply computing the FFT of a windowed version of y. Figure 3.8 depicts the comparison of the unfiltered input signal x and the filter output signal y, where the dashed line indicates the magnitude response of the scaled FIR filter. It can be seen that the scaling of the coefficients results in a 3.52dB amplification of all frequency components.
| (3.6) |
The quantization of the filter coefficients stored in the vector h with a quantization range of Δ = 1 can easily be simulated in MATLAB by the following statement:
    hq = round(2^B*h)/2^B;
While the coefficients in h are stored with full precision, the values stored in hq are rounded to a precision of B bits. Figure 3.9 shows the magnitude response $|H_q(f)|$ of our filter with coefficients quantized to a length of B = 22 bits.
The dashed curve shows the response before coefficient quantization. A lower bound on the stopband attenuation can be computed by inserting the corresponding values into (2.52). We then get
| (3.7) |
| (3.8) |
| (3.9) |
In MATLAB this decimation by D can be performed by the following statement:
    d = 1:D:length(y);
    yd = y(d);
The first command creates a vector d containing the indices of every Dth sample of y. The second command extracts these samples by copying them into a new vector yd. This new vector contains the decimated version of the original signal y. The stem plot in Figure 3.11 shows the decimated impulse response of our filter with an initial sampling frequency of $f_s = 6.4$ MHz and a downsampling ratio of D = 16. The dashed line indicates the impulse response before decimation. The original signal was represented by 256 samples, whereas the decimated version consists of only 16 samples.
By applying the decimation operation to the filtered DS Modulator output signal, we obtain a signal with the spectrum depicted in Figure 3.12. It can be seen that the sampling rate of the decimation filter output signal is indeed fd = fs/D = 400kHz, because its spectrum is symmetric around the new Nyquist-frequency fd/2.
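The decimation step itself amounts to keeping every Dth sample. The following sketch mirrors the MATLAB indexing described above with a Python slice; the function name `decimate` and the placeholder signal are illustrative.

```python
def decimate(y, D):
    """Keep every D-th sample, starting with the first one (MATLAB: y(1:D:end))."""
    return y[::D]

fs = 6.4e6                      # initial sampling frequency in Hz
D = 16                          # downsampling ratio
y = list(range(256))            # placeholder for the 256-sample filter output

yd = decimate(y, D)
fd = fs / D                     # sampling frequency after decimation

print(len(yd), "samples remain; new sampling rate:", fd / 1e3, "kHz")
```

With 256 input samples and D = 16, 16 samples remain, and the output rate is fs/D = 400 kHz.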
The input signal x[n] is the 1-bit quantized output signal of a DS Modulator. As discussed in Chapter 2.1, the signal x[n] can only assume the binary values 0 and 1, which represent the voltages $-V_{ref}$ and $+V_{ref}$, respectively. Hence, the values $w_k[n]$ at the output of the multipliers are limited to the set {+2h[k], 0, -2h[k]}, corresponding to the voltages {$+2V_{ref}h[k]$, 0, $-2V_{ref}h[k]$}. Table 3.3 lists these values and their conditions on the input signal.
| x[n-k] | x[n-N+k] | wk[n] |
| 0 | 0 | -2h[k] |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | +2h[k] |
It can be seen that $w_k[n] \ne 0$ only if x[n-k] = x[n-N+k]; otherwise $w_k[n]$ is always 0. Since the contribution to the output signal is limited to +2h[k], -2h[k], or 0, this operation can be implemented without the use of multipliers. For $x[n-k] \ne x[n-N+k]$, there is no contribution, and hence, no action is required. For x[n-k] = x[n-N+k], the value 2h[k] has to be either added to or subtracted from the final output value, depending on the common value of x[n-k] and x[n-N+k]. Figure 3.14 shows the multiplier-free implementation of the FIR filter.
Note that the thin lines represent single-bit signals while the thicker lines indicate multi-bit signals. Since the input signal x[n] shifted through the N delays is a single-bit signal, the delay line for our filter can simply be implemented by a 255-bit shift register with serial input and parallel output. The decision whether x[n-k] and x[n-N+k] are equal or different is realized by an exclusive-nor logic gate. The output value of this gate then determines whether $m_k[n] = 2h[k]$ or $m_k[n] = 0$ is fed into a sign-inverter. The sign-inverter negates its input value $m_k[n]$ if x[n-k] = 0 or passes it through as is if x[n-k] = 1. Table 3.4 lists the states of the signals involved in the multiplier-free computation of $w_k[n]$.
| x[n-k] | x[n-N+k] | $\overline{x[n-k] \oplus x[n-N+k]}$ | $m_k[n]$ | $w_k[n]$ |
| 0 | 0 | 1 | 2h[k] | -2h[k] |
| 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 2h[k] | +2h[k] |
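The signal states in the table above can be reproduced with a few lines of code. The sketch below models the exclusive-nor gate and the sign-inverter and compares the result with the multiplication it replaces; the function names and the example coefficient value are illustrative assumptions.

```python
def w_gate_level(x1, x2, two_h):
    """Multiplier-free contribution of one coefficient pair.

    x1 = x[n-k] and x2 = x[n-N+k] are single bits, two_h is the stored 2*h[k].
    """
    xnor = 1 - (x1 ^ x2)                 # 1 if both bits are equal
    m = two_h if xnor else 0.0           # load 2h[k] or force the value to 0
    return m if x1 == 1 else -m          # sign-inverter controlled by x[n-k]

def w_reference(x1, x2, two_h):
    """Same contribution with explicit levels: h[k] * (s(x1) + s(x2))."""
    level = lambda bit: 1 if bit else -1     # bit 1 -> +Vref, bit 0 -> -Vref
    return (two_h / 2) * (level(x1) + level(x2))

two_h = 0.25                             # hypothetical stored coefficient 2h[k]
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, w_gate_level(x1, x2, two_h), w_reference(x1, x2, two_h))
```

Both functions agree for all four input combinations, so the gate-level version can replace the two multiplications of a coefficient pair.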
The signal mk[n] is obtained by either loading 2h[k] from the coefficient memory or setting it to 0. An easy way of realizing this with logic gates is presented in Chapter 4. To avoid multiplying the coefficient h[k] by 2, we store 2h[k] in the coefficient memory. This requires one additional memory bit for each coefficient. The topology of the sign-inverter depends on the number format employed to store the coefficients, which we will discuss later in this section. We can then express wk[n] as
| (3.10) |
| (3.11) |
| (3.12) |
| (3.13) |
The result $v_l[Dm]$ of the $l$th accumulator at the time $Dm \cdot T$ can be written as
| (3.14) |
| (3.15) |
Since each of the L accumulators only accumulates one summand per master clock cycle, it has to be taken into account that the samples x[Dm-k] and x[Dm-N+k], needed to compute each of the D summands, are shifted to the next delay after one clock cycle. For the analysis of the accumulation procedure, let us consider the first accumulator of our filter with N = 255 and D = 16. It performs the following calculation:
| (3.16) |
|
| (3.17) |
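| Cycle | Operation |
| 16m-15 | Clear accumulator; add $\overline{d_0 \oplus d_{225}} \cdot s(d_0) \cdot 2h[15]$ |
| 16m-14 | add $\overline{d_0 \oplus d_{227}} \cdot s(d_0) \cdot 2h[14]$ |
| 16m-13 | add $\overline{d_0 \oplus d_{229}} \cdot s(d_0) \cdot 2h[13]$ |
| 16m-12 | add $\overline{d_0 \oplus d_{231}} \cdot s(d_0) \cdot 2h[12]$ |
| 16m-11 | add $\overline{d_0 \oplus d_{233}} \cdot s(d_0) \cdot 2h[11]$ |
| 16m-10 | add $\overline{d_0 \oplus d_{235}} \cdot s(d_0) \cdot 2h[10]$ |
| 16m-9 | add $\overline{d_0 \oplus d_{237}} \cdot s(d_0) \cdot 2h[9]$ |
| 16m-8 | add $\overline{d_0 \oplus d_{239}} \cdot s(d_0) \cdot 2h[8]$ |
| 16m-7 | add $\overline{d_0 \oplus d_{241}} \cdot s(d_0) \cdot 2h[7]$ |
| 16m-6 | add $\overline{d_0 \oplus d_{243}} \cdot s(d_0) \cdot 2h[6]$ |
| 16m-5 | add $\overline{d_0 \oplus d_{245}} \cdot s(d_0) \cdot 2h[5]$ |
| 16m-4 | add $\overline{d_0 \oplus d_{247}} \cdot s(d_0) \cdot 2h[4]$ |
| 16m-3 | add $\overline{d_0 \oplus d_{249}} \cdot s(d_0) \cdot 2h[3]$ |
| 16m-2 | add $\overline{d_0 \oplus d_{251}} \cdot s(d_0) \cdot 2h[2]$ |
| 16m-1 | add $\overline{d_0 \oplus d_{253}} \cdot s(d_0) \cdot 2h[1]$ |
| 16m | add $\overline{d_0 \oplus d_{255}} \cdot s(d_0) \cdot 2h[0]$; store result in output register |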
Consequently, the operation of the first accumulator can be written as
| (3.18) |
The fact that the tap containing the first of the two samples used to generate a summand remains the same during all the clock cycles allows the use of a single exclusive-nor gate. The second sample can then be picked out of the delay-line by a simple D-bit multiplexer. Figure 3.15 depicts the hardware concept for the first accumulator section of our filter.
Both the 16 coefficient memory cells and the 16 inputs of the multiplexer are addressed by the output signals of a 4-bit counter, which counts the clock cycles of the master clock. Because of the order in which the individual summands are computed, the coefficients have to be stored in descending order, i.e., 2h[15] is stored in the first and 2h[0] is stored in the last memory cell. At the beginning of the accumulation, during the first cycle, the delay register of the accumulator has to be cleared. During the last cycle, once the final result is accumulated, it is latched into the output register.
From (3.18) we can derive the general case. The lth accumulator performs the following computation
| (3.19) |
Each of these sections has one input ($in_0$) connected to a fixed tap of the delay line, and D multiplexer inputs ($in_1 \ldots in_D$), all connected to specific filter taps separated by two delays. The outputs of all L sections are summed by a summation node to generate the final filter output signal y[Dm]. Since every section latches its output result, it will be available at the output for D clock cycles. Hence, this addition can be realized by a simple accumulator that loads the individual results from the accumulation sections and sums them sequentially. This accumulator should also be followed by a data register which is updated every D cycles with a new final result.
In the process of implementing the hardware concept, it is crucial to know how the delay-line taps and the inputs of the individual accumulation sections have to be connected. The number of the fixed tap, denoted $d_{n_0}$, to which the input $in_0$ of the $l$th accumulation section has to be connected, can be determined as follows:
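The accumulation scheme can be checked with a small behavioral model. The sketch below assumes N = 255, D = 16, and L = 8, a delay line in which tap $d_i$ holds x[t-i] at time t, the bit-to-level mapping 1 to +1 and 0 to -1, and hypothetical integer coefficients so that the comparison is exact. The tap indices are generalized from the cycle table of the first accumulator and are not copied from the thesis equations.

```python
import random

N, D, L = 255, 16, 8          # filter order, decimation ratio, number of sections

def sign(bit):
    """Map the modulator bit to its level: 1 -> +1, 0 -> -1."""
    return 1 if bit else -1

def direct_output(h, x, n0):
    """Reference: y[n0] = sum_{k=0}^{N} h[k] * s(x[n0 - k])."""
    return sum(h[k] * sign(x[n0 - k]) for k in range(N + 1))

def section_output(two_h, x, n0, l):
    """Accumulate the D summands of section l (1-based) for the output at n0."""
    base = D * (l - 1)                      # fixed tap d_base feeds input in0
    acc = 0                                 # accumulator cleared in the first cycle
    for j in range(1, D + 1):               # one summand per master clock cycle
        t = n0 - D + j                      # at time t, tap d_i holds x[t - i]
        k = base + D - j                    # coefficients are read in descending order
        in0 = x[t - base]
        sel = x[t - (N - base - 2 * (D - j))]   # multiplexer input in_j
        if in0 == sel:                      # exclusive-nor of the two taps
            acc += two_h[k] if in0 else -two_h[k]
    return acc

random.seed(1)
half = [random.randint(-100, 100) for _ in range((N + 1) // 2)]
h = half + half[::-1]                       # hypothetical even-symmetric integer taps
two_h = [2 * c for c in half]               # stored coefficients 2h[k], k = 0..127
x = [random.randint(0, 1) for _ in range(512)]

for n0 in range(256, 512, D):               # output instants n0 = D*m with n0 >= N
    y_sections = sum(section_output(two_h, x, n0, l) for l in range(1, L + 1))
    assert y_sections == direct_output(h, x, n0)
print("accumulator sections reproduce the decimated direct-form output")
```

Each section performs only D conditional additions per output sample, so the L sections together replace the N+1 multiplications of the direct form while producing the same decimated output.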
| (3.20) |
| (3.21) |
| Section | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| in0 | d0 | d16 | d32 | d48 | d64 | d80 | d96 | d112 |
| in1 | d225 | d209 | d193 | d177 | d161 | d145 | d129 | d113 |
| in2 | d227 | d211 | d195 | d179 | d163 | d147 | d131 | d115 |
| in3 | d229 | d213 | d197 | d181 | d165 | d149 | d133 | d117 |
| in4 | d231 | d215 | d199 | d183 | d167 | d151 | d135 | d119 |
| in5 | d233 | d217 | d201 | d185 | d169 | d153 | d137 | d121 |
| in6 | d235 | d219 | d203 | d187 | d171 | d155 | d139 | d123 |
| in7 | d237 | d221 | d205 | d189 | d173 | d157 | d141 | d125 |
| in8 | d239 | d223 | d207 | d191 | d175 | d159 | d143 | d127 |
| in9 | d241 | d225 | d209 | d193 | d177 | d161 | d145 | d129 |
| in10 | d243 | d227 | d211 | d195 | d179 | d163 | d147 | d131 |
| in11 | d245 | d229 | d213 | d197 | d181 | d165 | d149 | d133 |
| in12 | d247 | d231 | d215 | d199 | d183 | d167 | d151 | d135 |
| in13 | d249 | d233 | d217 | d201 | d185 | d169 | d153 | d137 |
| in14 | d251 | d235 | d219 | d203 | d187 | d171 | d155 | d139 |
| in15 | d253 | d237 | d221 | d205 | d189 | d173 | d157 | d141 |
| in16 | d255 | d239 | d223 | d207 | d191 | d175 | d159 | d143 |
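The entries of the connection table follow a regular pattern that can be generated programmatically. The closed-form expressions in this sketch are inferred from the table entries for N = 255, D = 16, and L = 8; the function names `fixed_tap` and `mux_tap` are illustrative.

```python
N, D, L = 255, 16, 8   # filter order, decimation ratio, number of sections

def fixed_tap(l):
    """Delay-line tap feeding in0 of section l (1-based)."""
    return D * (l - 1)

def mux_tap(l, j):
    """Delay-line tap feeding multiplexer input in_j (j = 1..D) of section l."""
    return N - D * (l - 1) - 2 * (D - j)

print("in0 ", [f"d{fixed_tap(l)}" for l in range(1, L + 1)])
for j in range(1, D + 1):
    print(f"in{j:<2}", [f"d{mux_tap(l, j)}" for l in range(1, L + 1)])
```

Generating the connections from two expressions avoids transcription errors when the table is translated into a netlist or a hardware description.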
Note that these coefficients are already scaled by 1.5 (see section 3.1.3 for reference), quantized, and multiplied by 2. The dashed lines indicate the separation into 8 sections. Figures 3.4 and 3.17 show that the values of the coefficients closer to the middle of the impulse response are larger than the ones that are further away. The values in section 1 are much smaller than the values in section 8. Consequently, less memory-bits are needed to store the coefficients of section 1 compared to section 8.
Assuming a quantization range of Δ = 1 and a quantization bit-depth of B, the number bk of bits required to store the magnitude of the coefficient 2h[k] is
| (3.22) |
With (3.22) the required number of bits bk can be calculated for every value |2h[k]|. If the coefficient set consists of both positive and negative numbers, an additional bit is needed to store the sign of the corresponding coefficient. In order to maintain simplicity of the implementation, all the coefficients within one section are stored with the same memory word-length. If they all have the same sign, no additional sign-bit is required. Instead, the sign-information can be 'hard-wired' inside the accumulation section.
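As a quick plausibility check of the word lengths, the following MATLAB sketch counts the bits of the largest integer coefficient magnitude |2h[k]|·2^B in sections 1, 3, and 8 (values read off Appendix A) and adds a sign bit where the section contains a sign change. The expression floor(log2(·))+1 is the usual bit count of a positive integer and is only assumed here to agree with (3.22); the variable names are illustrative.

```matlab
% Largest integer magnitudes |2h[k]|*2^B (B = 22) in sections 1, 3 and 8,
% read off the listing in Appendix A.
Hmax      = [1562 12472 577404];
needsSign = [0 1 0];                 % section 3 contains a sign change

% Bits for the magnitude of a positive integer, plus an optional sign bit.
% (Assumed to agree with (3.22); not copied from it.)
bits = floor(log2(Hmax)) + 1 + needsSign;

disp(bits)                           % word lengths for sections 1, 3 and 8
```

The resulting word lengths can be compared with the memory widths chosen for these sections.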
Figure 3.18 depicts a log2-plot of the coefficient magnitudes |2h[k]| of our filter.
The numbers on top of the graph represent the number of memory bits chosen for the coefficient memories of the individual sections. The notches between the lobes of the log2-plot indicate that the sign of the coefficients changes at these positions. If such a sign inversion occurs within a section, an additional bit is needed to represent the sign of the coefficients. This is the case for sections 3, 4, 6, and 7, while the coefficients of the other sections are strictly positive.
One of the most frequently used number formats in the field of digital signal processing is the two's complement format, because it allows the addition of both positive and negative binary numbers with simple binary adders. Also, the complement -n for a given number n can easily be computed by inverting all the bits of the binary representation of n and adding 1. To illustrate this, let us consider the number range -8, …, 7, which can be represented by the 4-bit two's complement format, as shown in Table 3.7.
| n | binary |
| 7 | 0111 |
| 1 | 0001 |
| 0 | 0000 |
| -1 | 1111 |
| -8 | 1000 |
The non-negative numbers 0, …, 7 can be represented by just 3 bits. By extending the range to negative numbers, an additional bit is needed. The sign bit in the two's complement format is the MSB (most significant bit), which is 0 for non-negative and 1 for negative numbers. To show the computation of the complement -n for a given number n, let us consider n = 7. The binary representation of 7 is 0111. This number can be negated by simply inverting all the bits of 0111 and adding 1:
$$ 0111 \;\xrightarrow{\text{invert}}\; 1000 \;\xrightarrow{+1}\; 1001 \;\hat{=}\; -7 $$
In MATLAB, the binary representation of a number can be computed by the command Cbin = dec2bin(C,b), which converts the decimal integer C to a binary string with at least b bits, where C must be a non-negative integer smaller than 2^52. Therefore, our quantized coefficients first have to be converted into integer values. This is accomplished by the command H = 2*h*2^B. Since all coefficients h have been quantized to B = 22 bits, the result H will be an integer number. Because dec2bin can only convert non-negative integers, the negative coefficients have to be handled differently. The following MATLAB routine converts the coefficient 2h[k] to its corresponding two's complement representation Hbin with a word-length of bl bits:
if h(k) >= 0
    Hbin = dec2bin(2*h(k)*2^B,b(l));
else
    Hbin = dec2bin(1+bitcmp(abs(2*h(k)*2^B),b(l)),b(l));
end
The function Cc = bitcmp(C,b) returns the bit-wise
complement of C as a b-bit non-negative integer. By adding 1
to Cc and then converting it to a binary number, we obtain
the binary b-bit representation of -C.
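The same conversion can also be sketched without bitcmp, using modular arithmetic: for a b-bit word, the two's complement pattern of a negative integer C is the binary representation of C + 2^b. The helper names to_twos and from_twos below are hypothetical and are not part of the scripts described in this thesis.

```matlab
% b-bit two's complement string of a (possibly negative) integer C
to_twos   = @(C, b) dec2bin(mod(C, 2^b), b);
% inverse: signed integer value of a two's complement string s
from_twos = @(s) bin2dec(s) - 2^length(s) * (s(1) == '1');

p7 = to_twos( 7, 4);    % '0111'
m7 = to_twos(-7, 4);    % '1001'
m8 = to_twos(-8, 4);    % '1000'
back = from_twos(m7);   % -7
```

The three strings agree with the corresponding rows of the 4-bit example above.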
The two's complement coefficients for our filter are listed in appendix A. Note that the bit-depth bl chosen for the conversion is the same for all coefficients within the lth accumulation section.
Once the coefficients are converted to the binary format, they have to be assigned to the corresponding sections in the correct order. The number nl,i of the ith coefficient in the lth section is
$$ n_{l,i} = D\,l - 1 - i, \qquad i = 0, \dots, D-1 \tag{3.23} $$
| l | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| bl | 11 | 13 | 15 | 15 | 16 | 18 | 19 | 20 |
| i = 0 | h[15] | h[31] | h[47] | h[63] | h[79] | h[95] | h[111] | h[127] |
| 1 | h[14] | h[30] | h[46] | h[62] | h[78] | h[94] | h[110] | h[126] |
| 2 | h[13] | h[29] | h[45] | h[61] | h[77] | h[93] | h[109] | h[125] |
| 3 | h[12] | h[28] | h[44] | h[60] | h[76] | h[92] | h[108] | h[124] |
| 4 | h[11] | h[27] | h[43] | h[59] | h[75] | h[91] | h[107] | h[123] |
| 5 | h[10] | h[26] | h[42] | h[58] | h[74] | h[90] | h[106] | h[122] |
| 6 | h[9] | h[25] | h[41] | h[57] | h[73] | h[89] | h[105] | h[121] |
| 7 | h[8] | h[24] | h[40] | h[56] | h[72] | h[88] | h[104] | h[120] |
| 8 | h[7] | h[23] | h[39] | h[55] | h[71] | h[87] | h[103] | h[119] |
| 9 | h[6] | h[22] | h[38] | h[54] | h[70] | h[86] | h[102] | h[118] |
| 10 | h[5] | h[21] | h[37] | h[53] | h[69] | h[85] | h[101] | h[117] |
| 11 | h[4] | h[20] | h[36] | h[52] | h[68] | h[84] | h[100] | h[116] |
| 12 | h[3] | h[19] | h[35] | h[51] | h[67] | h[83] | h[99] | h[115] |
| 13 | h[2] | h[18] | h[34] | h[50] | h[66] | h[82] | h[98] | h[114] |
| 14 | h[1] | h[17] | h[33] | h[49] | h[65] | h[81] | h[97] | h[113] |
| 15 | h[0] | h[16] | h[32] | h[48] | h[64] | h[80] | h[96] | h[112] |
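The ordering in this table can be reproduced with a few lines of MATLAB. The sketch assumes the index rule n = D·l − 1 − i, which is read off the table entries (h[15] for l = 1, i = 0 down to h[112] for l = 8, i = 15); the variable names are illustrative.

```matlab
D = 16; L = 8;
[ii, ll] = meshgrid(0:D-1, 1:L);  % ii: position within section, ll: section number
n = D*ll - 1 - ii;                % n(l, i+1) is the index of the coefficient h[n]

disp(n(:, 1)')                    % first row of the table (i = 0)
disp(n(:, end)')                  % last row of the table (i = 15)
```

Every index from 0 to 127 appears exactly once, so each stored coefficient is assigned to exactly one memory location.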
| (3.25) |
Figure 3.19 depicts a log2-plot of the accumulations performed in each section. The segments represent the growth of each accumulator sum during the accumulation process for the case that all the summands are positive. The last and highest value in each segment is the highest number that can occur, i.e., the term in parentheses in (3.25). The numbers on top of the graph represent the adder-width chosen for the accumulator of the corresponding section.
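For section 1, the adder width can be checked numerically from the coefficients listed in Appendix A. The sketch sums the 16 coefficient magnitudes, which is the largest value the accumulator of that section can reach, and counts the bits of that sum plus one sign bit. This is assumed to be consistent with the bound discussed above rather than a transcription of it.

```matlab
% Hexadecimal two's complement words of section 1 (all positive), Appendix A
w = {'0002','0020','0028','0042','0062','008E','00C6','010C', ...
     '0162','01CC','024A','02DC','0388','044A','0526','061A'};
H = hex2dec(w);                       % integer values of 2h[k]*2^B

worst = sum(abs(H));                  % largest possible accumulator magnitude
a1 = floor(log2(worst)) + 1 + 1;      % magnitude bits + sign bit
fprintf('worst-case sum = %d, adder width = %d bits\n', worst, a1);
```

The computed width can be compared with the accumulator width listed for section 1 in Appendix A.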
Finally, the adder-width for the accumulator that adds the results vl[Dm] of the L sections has to be determined. Just as for the other accumulators, the upper bound for y[Dm] could be computed and used to calculate the number of bits required to avoid overflow. But due to the nature of the input signal x[n] and the fact that it is limited to ±1/2Vref, we are able to compute a less pessimistic bound of the output signal. Assuming an input signal x[n] to be equal to 1 for all n, which corresponds to a DC signal of +Vref, we obtain
| (3.26) |
| (3.27) |
y_t = floor(y*2^(a_out_t-1))/2^(a_out_t-1);
where a_out_t denotes ã_out. This command reduces the precision of the magnitude of y to ã_out-1 bits. Since y can assume both positive and negative values, the effective precision of y is ã_out bits. The choice of ã_out is determined by the desired dynamic range of the signal of interest. According to the specifications [8], the DS modulator used in our application has a maximum narrow-band signal-to-noise ratio (SNR) of 97 dB, which is equivalent to a precision of 16 bits, hence ã_out = 16.
d = zeros(1,N+1);
This statement initializes the vector d by setting all N+1 elements to 0, which corresponds to their initial value. Note that MATLAB indices start at 1, so the tap dk is addressed as d(k+1), i.e., d0 = d(1), …, dk = d(k+1). In order to be able to address the filter taps dn0 and dnk needed for the computations, the tap numbers n0 and nk have to be computed and stored. By employing (3.20), we can compute the tap numbers n0 for all L sections. The following statement creates a vector with L elements holding the corresponding tap numbers:
l = 1:L;
n_0 = D*(l-1)+1;
Accordingly, a matrix storing the D numbers of the filter taps serving as multiplexer inputs in each of the L sections can be generated by employing (3.21) as follows:
[k,l] = meshgrid(1:D,1:L);
n_k = N-D*(l+1)+2*k+1;
When performing the calculations, the samples needed at the kth cycle in the lth section are d(n_0(l)) and d(n_k(l,k)). Since these samples can be obtained directly from the shift register, a script implementing the D-bit multiplexer is not necessary.
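As a sanity check, the tap numbers can be computed for the filter parameters used here (D = 16, L = 8, and N = 255, the latter inferred from the 256-bit shift register) and compared with the corner entries of the tap table. The sketch uses zero-based tap numbers, i.e., without the +1 index offset, and the names n0 and nk are illustrative.

```matlab
D = 16; L = 8; N = 255;
l  = 1:L;
n0 = D*(l-1);                        % fixed taps d_{n0}, zero-based
[k, ll] = meshgrid(1:D, 1:L);
nk = N - D*(ll+1) + 2*k;             % multiplexer taps, L-by-D, zero-based

disp(n0)                             % 0 16 32 ... 112
disp(nk([1 L], [1 D]))               % corner entries of the tap table
```

All multiplexer taps come out odd and all fixed taps even, which reflects the spacing of two delays between neighboring multiplexer inputs.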
dec = bin2dec(Hbin);
for k = 1:M
    if sig(k) == -1
        dec(k) = -bitcmp(dec(k)-1,b(k));
    end
end
coeff_rom = zeros(L,D);
for l = 1:L
    coeff_rom(l,:) = dec(l*D:-1:(l-1)*D+1)';
end
The function bin2dec converts a binary number
into an equivalent non-negative integer. Because of that, the sign
(stored in sig) and word-length of each coefficient
(stored in b) have to be known, so that in case a
coefficient is negative, the integer generated by
bin2dec can be converted to the correct negative integer.
accu = zeros(1,L);
Storing the values of all accumulators in a single vector allows the simple calculation of the final output result by the command sum(accu), which adds the elements of accu and hence makes a separate accumulation script unnecessary.
for k = 1:length(u)
    c = mod(k-1,D)+1;
    d = [u(k) d(1:end-1)];   % Shifting the shift register
    if c == 1
        accu = accu*0;       % clear accumulator
    end
    for l = 1:L              % Do the computations section by section
        if d(n_0(l)) == d(n_k(l,c))
            s = 2*d(n_0(l))-1;   % sign s(x[n-k])
            accu(l) = accu(l)+s*coeff_rom(l,c);
        end
    end
    if c == D
        result = sum(accu);
        resultout = floor(result/2^tr);   % truncate by 'tr' bits
        Y = [Y resultout];
    end
end
The total number of computations that are performed by
this script is determined by the length of the input vector
u. The samples of u have to be either 0 or 1,
corresponding to the voltages -Vref and +Vref,
respectively. Note that the values in Y are signed
integers with a precision that has been reduced by
tr = a_out - ã_out bits. In order to obtain the
real values, Y has to be scaled properly.
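To illustrate why comparing the two taps and adding a signed, doubled coefficient yields the correct decimated output, the following self-contained sketch runs the same loop on a toy filter and compares it with a direct FIR computation. The sizes (D = 4, L = 2, 16 taps), the coefficient values, and the input bits are made up for this illustration; truncation is omitted, and the register is assumed to start at 0, i.e., at an input of -1.

```matlab
D = 4; L = 2; N = 2*L*D - 1;              % toy sizes (assumed): 16 taps
hh = [1 -2 3 5 -4 6 7 9];                 % first half of a symmetric response (made up)
h  = [hh fliplr(hh)];                     % linear phase: h[k] = h[N-k]
u  = [1 0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 0 1 0 0 0 1 1];  % 1-bit input

% Reference: direct FIR on x = +/-1, with the history preset to -1
x     = [-ones(1,N) 2*u-1];
yfull = filter(h, 1, x);
yref  = yfull(N+1:end);
yref  = yref(D:D:end);                    % keep every D-th output sample

% Folded structure: one fixed tap and D multiplexer taps per section
n_0 = D*((1:L)-1) + 1;
[kk, ll] = meshgrid(1:D, 1:L);
n_k = N - D*(ll+1) + 2*kk + 1;
coeff_rom = zeros(L, D);
for s = 1:L
    coeff_rom(s,:) = 2*hh(s*D:-1:(s-1)*D+1);   % doubled, reverse order
end

d = zeros(1, N+1); accu = zeros(1, L); Y = [];
for k = 1:length(u)
    c = mod(k-1, D) + 1;
    d = [u(k) d(1:end-1)];                % shift the register
    if c == 1, accu(:) = 0; end           % new output sample starts
    for s = 1:L
        if d(n_0(s)) == d(n_k(s,c))       % equal bits: add +/- 2h
            accu(s) = accu(s) + (2*d(n_0(s))-1)*coeff_rom(s,c);
        end                               % unequal bits cancel: add nothing
    end
    if c == D, Y = [Y sum(accu)]; end
end
disp(isequal(Y, yref))                    % folded result equals direct FIR
```

Because the two samples that share a coefficient are either equal (contributing ±2h) or opposite (contributing 0), no multiplier is needed in this toy case either.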
The simulation of the filter's impulse response requires an input signal of the form u[n] = δ[n]Vref. Since u[n] is the output signal of a DS modulator, its samples can only assume the values ±Vref and hence, the impulse response cannot be simulated directly. However, it is possible to generate the signal
$$ u[n] = \begin{cases} +V_{ref}, & n = 0 \\ -V_{ref}, & n \neq 0 \end{cases} \tag{3.28} $$
u = [1 zeros(1,I)];
where I denotes the number of zeros following the first sample. The number of iterations performed by the simulator is determined by the number of samples stored in the input vector u. Thus, in order to simulate the full impulse response of our filter, the input signal has to be at least 256 samples long. The decimated output vector y will then consist of 16 samples.
Figure 3.20 shows the response of our filter to the signal defined in (3.28) under the assumption that the initial value of all the delays was 0, corresponding to a DC input voltage of -Vref. The dashed line indicates the impulse response of our original filter as designed in Chapter 3.2. The hexadecimal values for the output signal as well as the accumulator output values of this simulation are listed in Appendix C.
Even though there will probably be a large market for our filter in the future, it is more cost- and time-efficient to implement a first prototype using a programmable device. In CMOS, one may divide this spectrum of programmable devices into three areas:
A very flexible, but also complex device is the programmable gate array. This approach may be further categorized into ad-hoc and structured arrays [10].
An array of Configurable Logic Blocks (CLBs) is embedded within a set of horizontal and vertical channels that contain routing, which can be personalized to interconnect CLBs. The configuration of the interconnects is achieved by turning on n-channel CMOS pass transistors. The state that determines a given interconnect pattern is held in static RAM cells distributed across the chip close to the controlled elements. The CLBs and routing channels are surrounded by a set of programmable input/output (I/O) buffers.
The detailed structure of a CLB is shown in Figure 4.2.
It consists of two registers, a number of multiplexers, and a combinatorial function unit. The latter can generate two functions of four variables, any function of five variables, or a selection between two functions of four variables. The function bit and each multiplexer are controlled by a number of RAM state bits. More recent CLBs feature enhanced table lookup function generators, which can be used to build logic functions or serve as register storage. The XC4000 series has built-in support for carry chains, to conveniently build data paths. Each input and output on a CLB has a particular local interconnect pattern, which allows most local interconnection between adjacent CLBs to take place. At the junction of the horizontal and vertical routing channels, where the general-purpose interconnect runs, programmable switching matrices are employed to redirect routes. Figure 4.3 shows a typical CLB surrounded by switching matrices. They perform crossbar switching of the global interconnects, which run both vertically and horizontally. Programmable Interconnect Points (PIPs) interconnect the global routing to CLBs. Both PIPs and the switching matrices are implemented as n-channel pass gates controlled by 1-bit RAM cells. Special long-distance interconnects are used to route important timing signals with low skew.
After capturing a circuit using CAD software, the design proceeds by mapping the logic to the CLBs. The software places and routes the CLBs by loading the internal state RAM with the codes needed to program the I/Os, the CLBs, and the routing. The design is then ready to be tested or used.
The XC4000X series achieves high speed through advanced semiconductor technology and improved architecture. This series supports system clock rates of up to 80 MHz and internal performance in excess of 150 MHz. Currently, the largest devices of this series contain 3,136 CLBs, which corresponds to 85,000 logic gates if none of the CLBs are used for RAM.
In Chapter 3.5, we developed the digital hardware concept for our digital decimation filter in order to be able to simulate the behavior of the system. Because the filter decimates the output signal by a factor of D = 16, the summation node was divided into 8 accumulation sections, each performing 16 additions. The input signal is fed into a 256-bit shift register with serial input and parallel outputs. The latter are connected to the accumulation sections. The output signal is generated at the reduced sampling rate fs/16 by an accumulator that sums the results of the 8 accumulation sections. This relatively simple basic concept defines the hierarchy of the hardware implementation of our filter. Figure 4.4 shows the major functional blocks of our FIR filter. Note that the outputs of all 8 accumulation sections are connected to the same 24-bit data bus v[23:0], which is connected to the input of the final accumulator. Therefore, all the output registers of the accumulation sections have to be 24-bit-wide tristate registers. These registers are controlled by the timing control circuit, which also generates the necessary control signals for the final accumulator.
Each accumulation section sums 16 numbers within 16 cycles. The accumulators within these sections are clocked directly by the positive edge of the master clock clk. At the beginning of the accumulation, at cycle 0, the delay registers are cleared by the signal clr, and the 16 numbers are added during the following 16 cycles. The results of the sections are available during cycle 15, when the last number is added to the sum. We define the signal load, which latches these results into the corresponding tristate data registers at the falling clock edge during the 15th cycle, where they will be stored for the next 16 cycles. The delay registers in the accumulation sections are cleared again one half cycle later. The control signals clr and load are generated by the timing control circuit.
The output samples are also generated every 16 cycles. Hence, the final accumulator sums the 8 results from the accumulation sections within the duration of 16 cycles of the master clock. When the tristate output register of an accumulation section is addressed by the timing control circuit, it will write its data to the data bus v[23:0], which is connected to the input of the final accumulator. Thus, the timing control circuit is responsible for selecting the tristate registers and clocking the accumulator with the signal ld when the number to be added is available on the bus v[23:0]. Since only 8 numbers have to be summed, the accumulation can be performed at half the master clock rate. It can be seen in Figure 4.5 that the output registers of the accumulation sections are only selected during even cycles, while during odd cycles the registers are in tristate mode. Once the final result is calculated, it is latched into the output register by the signal ld_out. One half clock cycle later, at the beginning of cycle 15, the delay register of the final accumulator is cleared.
Since the shift register comprises 256 components, it is advantageous to define a macro in order to maintain a transparent design hierarchy. Figure 4.7 shows the macro symbol for our shift register with two inputs and a 256-bit output bus.
Because of the order in which the accumulation is performed, the coefficients have to be stored in reverse order (see Table 3.8). Within the framework of this thesis, a MATLAB script was written which not only calculates the two's complement representation of filter coefficients, but also generates the necessary .mem files for the implementation of ROM cells in XILINX. The binary coefficients for each section are listed in Appendix A.
The delay-register used in this implementation consists of Flip-Flops that acquire the adder output-signal at the falling edge of the master clock, and feed it back into the adder at the rising edge, thus realizing a full clock cycle delay. The register can be cleared synchronously at the rising edge of the master clock by setting CLR to logic '1' before the preceding falling clock edge. As specified in Chapter 4.2.2, this register has to be cleared at the beginning of clock cycle 0. This can be achieved by generating a signal CLR that is equal to logic '1' during cycle 15, and equal to logic '0' during all the other cycles.
$$ 1001 \;\hat{=}\; -7 \quad\longrightarrow\quad 1111001 \;\hat{=}\; -7 $$
All output registers of the accumulation sections are connected to a 24-bit data bus via 24-bit tristate buffers. Since the word length of the output register is generally less than 24 bits, it has to be expanded by wiring the additional input signals of the buffer to the MSB of the output signal of the register (see Figure 4.8). The tristate buffers are negative-level triggered, i.e., the outputs can be made active by setting the signal SELECT_INV to logic '0'.
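The sign extension performed by this wiring can be illustrated in MATLAB on bit strings. The helper sign_extend is a hypothetical name used only for this sketch; it prepends copies of the MSB until the word has the requested width, which leaves the two's complement value unchanged.

```matlab
% Prepend copies of the MSB s(1) until the word is w bits wide
sign_extend = @(s, w) [repmat(s(1), 1, w - length(s)) s];
% Signed value of a two's complement bit string
value = @(s) bin2dec(s) - 2^length(s) * (s(1) == '1');

s4 = '1001';                 % -7 as a 4-bit word
s7 = sign_extend(s4, 7);     % '1111001'
v  = [value(s4) value(s7)];  % both entries are -7
```

Extending a word to the 24-bit bus width works the same way with w = 24.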
As specified in Chapter 3.5, the output signal has to be 16 bits wide. Hence, the 8 least significant bits (LSBs) are not connected to the output buffers.
The 4-bit synchronous counter generates the counter signals used in the accumulations to address the coefficients in the coefficient memories and to select the input signal samples via the 16-bit multiplexer. Furthermore, the counter signals are fed into a 4-to-16 bit decoder which generates the signals DEC[15:0] and their complement SEL[15:0]. The latter are used to control the tristate buffers of the accumulation sections. The two circuits on the bottom of Figure 4.10 generate the signals LD, CLR, LD_OUT, and CLR_OUT, which are needed to control the accumulators and output registers in the accumulator sections and the final accumulator. The timing diagram for these control circuits is depicted in Figure 4.11.
Figure 4.12 shows the top-level schematic of the complete FIR filter.
The Probe Tool within the schematic capture toolbox allows the simple addition of signals to the waveform viewer tool of the simulator. By just clicking on a signal name in the schematic, the corresponding signal is added to the list of signals to be analyzed. With the Stimulus Tool, it is possible to define stimuli by means of custom formulae, internal binary counter outputs, stimulator state selectors, script files, waveform files, and simple keystrokes.
Only 2 stimuli have to be generated for the simulation of the impulse response, namely the master clock and the input signal. We can define both stimuli with custom formulae:
| C1: L78nsH78ns |
| F0: H200nsL45000ns |
Signals of interest for the simulation on the top level hierarchy are the master clock, the counter output signals, the input signal, important timing control signals, the data signals in the final accumulator, and the output signal. Figure 4.13 shows an example simulation in the waveform viewer of the logic simulator. It depicts the states of important signals of our design between 19.9 µs and 21.5 µs.
To simulate the entire impulse response, it is necessary to run the simulation for at least 256 clock cycles of the master clock CLK_IN, i.e., 39.94 µs. The resulting output signal will then consist of 16 samples, generated at the decimated clock frequency CLK_OUT of 400 kHz. Figure 4.14 shows the plot of the functional simulation of the impulse response of our filter.
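These figures follow directly from the clock stimulus defined above (78 ns low, 78 ns high). A short MATLAB check, with illustrative variable names:

```matlab
T_clk  = 78e-9 + 78e-9;      % master clock period from the stimulus C1, in s
D      = 16;                 % decimation factor
n_taps = 256;                % length of the impulse response in input samples

f_clk = 1/T_clk;             % master clock frequency, about 6.41 MHz
t_sim = n_taps*T_clk;        % minimum simulation time, about 39.94 us
f_out = f_clk/D;             % output sample rate, about 400 kHz
n_out = n_taps/D;            % number of output samples: 16
fprintf('%.2f MHz, %.2f us, %.1f kHz, %d samples\n', ...
        f_clk/1e6, t_sim*1e6, f_out/1e3, n_out);
```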
By comparing these simulation results to the results obtained with the behavioral simulation in Chapter 3.5.3, we see that they are identical. Hence, the design works properly.
When implementing the design, the software first translates the design of the system into a description that can later be mapped onto the array of CLBs and IOBs. The software then places and routes these blocks in order to optimize the performance of the device. Table 4.1 shows an excerpt of the device utilization report generated by the design implementation tool in Foundation. It can be seen that all CLBs on the device are used for logic functions.
| Number of External IOBs | 19 | out of | 192 | 9% |
| Flops: | 0 | | | |
| Latches: | 0 | | | |
| Number of CLBs | 576 | out of | 576 | 100% |
| Total Latches: | 0 | out of | 1152 | 0% |
| Total CLB Flops: | 793 | out of | 1152 | 68% |
| 4 input LUTs: | 700 | out of | 1152 | 60% |
| 3 input LUTs: | 453 | out of | 576 | 78% |
| Number of BUFGLSs | 3 | out of | 8 | 37% |
| Number of TBUFs | 192 | out of | 1248 | 15% |
Once a design is successfully implemented, the software creates a .bin file containing the interconnect-pattern for the CLBs. This file can then be loaded into the static configuration RAM of the XILINX FPGA, and the device is ready to be used.
Figure 4.15 depicts the structural simulation of our filter between 19.9 µs and 20.7 µs.
It shows the timing behavior of the accumulation section 1 (DO0, B0, and SUM0), and the output signal of the filter (OUT15). Unlike the functional simulation, this simulation shows the delays caused by gate delays and routing. It can be seen that, depending on the value of the bits at the input, the sum of the ripple-carry adders is delayed by up to 30 ns in some cases. But since the period of the master clock is 156 ns, this does not cause a malfunction of the system.
After performing a thorough structural simulation of the impulse response, it can be said that our design is robust and works properly for clock frequencies up to 10 MHz. The results obtained from the structural simulation are identical to the ones from the behavioral simulation with MATLAB and the functional simulation in Foundation.
The design began with the requirements on the sonar receiver that were used to derive a set of specifications for the digital decimation filter. The linear-phase FIR filter was designed using the Remez-Exchange Algorithm from the Signal Processing Toolbox [1] of MATLAB. The obtained filter coefficients were quantized to a word-length of 22 bits, and the frequency response of the filter was simulated and verified by MATLAB routines.
A digital hardware concept for the decimation filter was developed, which takes advantage of the 1-bit input signal from the DS modulator. Due to the decimation, the hardware effort could be reduced significantly. A concept was developed that generates the output samples directly at the decimated sampling rate. A simulator in MATLAB, based on the hardware concept, was programmed and used to verify the impulse response of the digital decimation filter.
As to the hardware implementation, the digital decimation filter was realized on a XILINX XC4013XL Field Programmable Gate Array (FPGA). The design was captured entirely with a schematic-based design entry tool, and simulated with the logic simulator provided by the XILINX Foundation tool suite. The placed and routed design was simulated with the same simulator, with the only difference that worst-case routing delays were included in the simulation. Comparisons of the simulation of the actual design and the results obtained from the behavioral MATLAB simulator showed that the implemented digital decimation filter was working properly for input sampling frequencies of up to 10 MHz.
As a next step toward a commercialized product, the digital decimation filter could be realized as a custom IC. This would significantly reduce the cost of the device, given that there is a large market for it. Since the design was captured entirely with the schematic capture tool, and a strict design hierarchy was maintained, the design is very well suited for a VLSI implementation. Because only basic logic building blocks were used in the capture of the design, the transition to VLSI layout using a basic logic cell-library is significantly simplified.
The following is a list of the 128 filter coefficients for the 256-tap linear-phase FIR filter described in this thesis. These coefficients, which have to be stored in digital memory, have been divided into 8 sections of 16 coefficients each. The list contains the 'analog' value and the two's complement binary and hexadecimal representation for each of the coefficients. Furthermore, the list shows the number of memory-bits selected for each section and the data-path width chosen for the accumulator of the corresponding section.
This list was generated by a MATLAB script written for this purpose during the design process of the filter discussed in this thesis. Besides computing the two's complement representation for any given set of coefficients, this script also generates the necessary .mem files needed for the implementation of ROM cells with XILINX FPGA design tools, as discussed in Chapter 4. Furthermore, it generates the layout cells to be used in Magic, a part of the UC Berkeley VLSI Tool Suite.
SECTION 1
Coefficient resolution: 11 Bits
Maximum no. of bits needed for accumulation: 14
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
1 0.00000048 00000000010 0002
2 0.00000763 00000100000 0020
3 0.00000954 00000101000 0028
4 0.00001574 00001000010 0042
5 0.00002337 00001100010 0062
6 0.00003386 00010001110 008E
7 0.00004721 00011000110 00C6
8 0.00006390 00100001100 010C
9 0.00008440 00101100010 0162
10 0.00010967 00111001100 01CC
11 0.00013971 01001001010 024A
12 0.00017452 01011011100 02DC
13 0.00021553 01110001000 0388
14 0.00026178 10001001010 044A
15 0.00031424 10100100110 0526
16 0.00037241 11000011010 061A
SECTION 2
Coefficient resolution: 13 Bits
Maximum no. of bits needed for accumulation: 17
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
17 0.00043583 0011100100100 00724
18 0.00050497 0100001000110 00846
19 0.00057793 0100101111000 00978
20 0.00065470 0101010111010 00ABA
21 0.00073338 0110000000100 00C04
22 0.00081301 0110101010010 00D52
23 0.00089169 0111010011100 00E9C
24 0.00096750 0111111011010 00FDA
25 0.00103760 1000100000000 01100
26 0.00110054 1001000001000 01208
27 0.00115252 1001011100010 012E2
28 0.00119162 1001110000110 01386
29 0.00121450 1001111100110 013E6
30 0.00121880 1001111111000 013F8
31 0.00120068 1001110101100 013AC
32 0.00115824 1001011111010 012FA
SECTION 3
Coefficient resolution: 15 Bits
Maximum no. of bits needed for accumulation: 18
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
33 0.00108910 001000111011000 011D8
34 0.00099039 001000000111010 0103A
35 0.00086164 000111000011110 00E1E
36 0.00070095 000101101111100 00B7C
37 0.00050831 000100001010100 00854
38 0.00028419 000010010101000 004A8
39 0.00002909 000000001111010 0007A
40 -0.00025415 111101111010110 07BD6
41 -0.00056267 111011011001000 076C8
42 -0.00089264 111000101100000 07160
43 -0.00123882 110101110110100 06BB4
44 -0.00159550 110010111011100 065DC
45 -0.00195551 101111111110110 05FF6
46 -0.00231123 101101000100010 05A22
47 -0.00265360 101010010000110 05486
48 -0.00297356 100111101001000 04F48
SECTION 4
Coefficient resolution: 15 Bits
Maximum no. of bits needed for accumulation: 19
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
49 -0.00326157 100101010010000 04A90
50 -0.00350761 100011010001000 04688
51 -0.00370169 100001101011010 0435A
52 -0.00383425 100000100101110 0412E
53 -0.00389624 100000000101010 0402A
54 -0.00387812 100000001110110 04076
55 -0.00377321 100001000101110 0422E
56 -0.00357533 100010101101100 0456C
57 -0.00327969 100101001000100 04A44
58 -0.00288343 101000011000010 050C2
59 -0.00238705 101100011100100 058E4
60 -0.00179148 110001010100110 062A6
61 -0.00110149 110110111110100 06DF4
62 -0.00032425 111101010110000 07AB0
63 0.00053120 000100010110100 008B4
64 0.00145149 001011111001000 017C8
SECTION 5
Coefficient resolution: 16 Bits
Maximum no. of bits needed for accumulation: 20
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
65 0.00242138 0010011110101100 027AC
66 0.00342369 0011100000011000 03818
67 0.00443840 0100100010111000 048B8
68 0.00544262 0101100100101100 0592C
69 0.00641346 0110100100010100 06914
70 0.00732470 0111100000000010 07802
71 0.00815058 1000010110001010 0858A
72 0.00886536 1001000101000000 09140
73 0.00944233 1001101010110100 09AB4
74 0.00985718 1010000110000000 0A180
75 0.01008606 1010010101000000 0A540
76 0.01010752 1010010110011010 0A59A
77 0.00990391 1010001001000100 0A244
78 0.00946045 1001101100000000 09B00
79 0.00876760 1000111110100110 08FA6
80 0.00781870 1000000000011010 0801A
SECTION 6
Coefficient resolution: 18 Bits
Maximum no. of bits needed for accumulation: 21
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
81 0.00661469 000110110001100000 006C60
82 0.00516081 000101010010001110 00548E
83 0.00346899 000011100011010110 0038D6
84 0.00155687 000001100110000010 001982
85 -0.00055027 111111011011111100 03F6FC
86 -0.00282240 111101000111000010 03D1C2
87 -0.00522232 111010101001110000 03AA70
88 -0.00770664 111000000110111100 0381BC
89 -0.01022720 110101100001110000 035870
90 -0.01273155 110010111101101000 032F68
91 -0.01516199 110000011110010110 030796
92 -0.01745892 101110000111110100 02E1F4
93 -0.01955938 101011111110001010 02BF8A
94 -0.02139997 101010000101100010 02A162
95 -0.02291727 101000100010000110 028886
96 -0.02404928 100111010111111010 0275FA
SECTION 7
Coefficient resolution: 19 Bits
Maximum no. of bits needed for accumulation: 22
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
97 -0.02473640 1100110101010111000 066AB8
98 -0.02492237 1100110011110101100 0667AC
99 -0.02455759 1100110110110100110 066DA6
100 -0.02359629 1100111110101100110 067D66
101 -0.02200317 1101001011110000000 069780
102 -0.01974821 1101011110001110010 06BC72
103 -0.01681376 1101110110010000110 06EC86
104 -0.01319027 1110010011111100100 0727E4
105 -0.00887966 1110110111010000100 076E84
106 -0.00389576 1111100000000101100 07C02C
107 0.00173712 0000001110001110110 001C76
108 0.00798225 0001000001011001000 0082C8
109 0.01479244 0001111001001011100 00F25C
110 0.02210903 0010110101000111100 016A3C
111 0.02986383 0011110100101001010 01E94A
112 0.03797770 0100110111000111010 026E3A
SECTION 8
Coefficient resolution: 20 Bits
Maximum no. of bits needed for accumulation: 24
No. orig.coeff.(2x) two`s complement
--------------------------------------------------
113 0.04636526 00101111011110100110 02F7A6
114 0.05493164 00111000010000000000 038400
115 0.06357813 01000001000110101010 0411AA
116 0.07220078 01001001111011110000 049EF0
117 0.08069324 01010010101000010100 052A14
118 0.08894873 01011011000101010110 05B156
119 0.09685993 01100011001011110100 0632F4
120 0.10432434 01101010110101000000 06AD40
121 0.11124134 01110001111010010100 071E94
122 0.11751842 01111000010101101100 07856C
123 0.12307024 01111110000001100010 07E062
124 0.12781954 10000010111000110010 082E32
125 0.13170147 10000110110111001100 086DCC
126 0.13466167 10001001111001001100 089E4C
127 0.13665819 10001011111100000010 08BF02
128 0.13766384 10001100111101111100 08CF7C
Maximum no. of bits used for final addition: 24
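The table rows can be cross-checked with a few lines of MATLAB. The sketch below assumes that each stored word is the listed coefficient scaled by 2^B with B = 22 (the value of B passed to simulfilt later in this appendix), rounded to the nearest integer, and wrapped into the coefficient word-length of its section. This relation is inferred from the tabulated values rather than stated explicitly in the text, and all variable names are illustrative only.

```matlab
% Cross-check of three table rows (coefficients no. 85, 97 and 113).
% Assumption: word = round(coeff * 2^B) wrapped to b bits (two's complement).
B = 22;                                       % quantization word-length
c = [-0.00055027; -0.02473640; 0.04636526];   % listed coefficients (2x)
b = [18; 19; 20];                             % coefficient resolution per section

q = round(c * 2^B);                 % signed integer value of each coefficient
w = mod(q, 2.^b);                   % two's complement wrap into b bits
words_hex = dec2hex(w, 6);          % one row per coefficient
word_bin  = dec2bin(w(3), b(3));    % binary word of coefficient no. 113

disp(words_hex)
disp(word_bin)
```

The three hexadecimal words can be compared with the last column of the corresponding table rows, and the binary word with the two's complement column of row 113.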
The following is the MATLAB function simulfilt.m, which performs a behavioral simulation of a given linear-phase FIR filter with single-bit input. It is called as follows:
[y,yhex] = simulfilt(u,Hbin,sig,b,a,a_out,a_out_t,D,B,textfile);
The vector u contains the samples of the input signal, which must be either 0 or 1, corresponding to the reference voltages -Vref and +Vref, respectively. Hbin is a character array containing the two's complement equivalent of the filter coefficients, one coefficient per row. The vector sig contains the sign of each coefficient in Hbin. The elements of sig must be either -1 or 1, and its length must be equal to the number of coefficients stored in Hbin. The vectors b and a contain, respectively, the number of memory bits for each coefficient and the accumulator word-length of the section to which the coefficient belongs. Both b and a must have the same length as sig, i.e., they must have one entry per coefficient. The variables a_out and a_out_t represent the word-length of the final accumulator and the word-length to which the output signal y is truncated, respectively. D is the downsampling ratio, and B is the number of bits to which the coefficients have originally been quantized. textfile is a string containing the name of the output text file to which the simulation results will be written. After the execution of simulfilt, the output signal is stored in the vectors y (real values) and yhex (hexadecimal values).
function [y,yhex] = simulfilt(u,Hbin,sig,b,a,a_out,a_out_t,D,B,textfile);
% [y,yhex] = simulfilt(u,Hbin,sig,b,a,a_out,a_out_t,D,B,textfile);
%
% Simulates the behavior of the implemented decimating FIR filter
%
% u: input signal (samples must be 0 or 1)
% Hbin: character array containing the binary coefficients (2's complement)
% sig: vector containing the signs of the coefficients
% b: vector containing the wordlength (in bits) of the coefficients
% a: vector containing the necessary accumulator width of a
% particular section
% a_out: wordlength of the final accumulator
% a_out_t: wordlength of the truncated output signal
% D: downsampling ratio (= no. of coefficients per section)
% B: quantization wordlength of the coefficients
% textfile: Name of the textfile containing the hexadecimal output values
%
% y: Output signal, real values
% yhex: Output signal, hexadecimal values
%
% Roger Meier, 7/99
% Dept. of Electrical Engineering
% University of Rhode Island
M = length(sig); % M: no. of coefficients
L = M/D; % L: no. of sections
N = 2*M-1; % N: Filter order

% Create (N+1)-bit shift register
% -------------------------------
d = zeros(1,N+1);

% Assigning in0 pins for each section
l = 1:L;
n_0 = D*(l-1)+1;

% Assigning multiplexed input pins for each section
[k,l] = meshgrid(1:D,1:L);
n_k = N-D*(l+1)+2*k+1;

% Create Coefficient ROM
% ----------------------
dec = bin2dec(Hbin);
for k = 1:M
    if sig(k) == -1
        dec(k) = -bitcmp(dec(k)-1,b(k));
    end
end
coeff_rom = zeros(L,D);
for l = 1:L
    coeff_rom(l,:) = dec(l*D:-1:(l-1)*D+1)';
end

% Create Accumulator Registers
accu = zeros(1,L);

% Truncation of the output signal
trunc = a_out-a_out_t;

% Main Loop
% ---------
delete(textfile)
echo off
diary(textfile);
if length(u) < 32
    u = [u zeros(1,32-length(u))];
end
y = [];
for k = 1:length(u)
    cycle = mod(k-1,D)+1;
    % Shifting the shift register
    d = [u(k) d(1:end-1)];
    if cycle == 1
        accu = accu*0; % clear accumulator
    end
    % Do the accumulation section by section
    for l = 1:L
        input1 = n_0(l);
        input2 = n_k(l,cycle);
        if d(input1) == d(input2)
            opsign = 2*d(input1)-1;
            accu(l) = accu(l)+opsign*coeff_rom(l,cycle);
        end
    end
    if cycle == D
        disp(['Output ' num2str(k/D)]);
        result = sum(accu);
        resultout = floor(result/2^trunc); % truncate output signal
        y = [y resultout];
        % ------------------------------------------------------
        % everything between the dashed lines is used for displaying
        % the hexadecimal results
        for p = 1:L
            if sign(accu(p)) == -1
                hexstring = ...
                    dec2hex(1+bitcmp(abs(accu(p)),a_out),ceil(a_out/4));
            else
                hexstring = ...
                    dec2hex(abs(accu(p)),ceil(a_out/4));
            end
            disp(['Accumulator ' num2str(p) ': ' hexstring])
        end
        if sign(result) == -1
            yfull = dec2hex(1+bitcmp(abs(result),a_out),ceil(a_out/4));
            yhex(k/D,:) = dec2hex(1+bitcmp(abs(resultout),a_out_t),ceil(a_out_t/4));
        else
            yfull = dec2hex(abs(result),ceil(a_out/4));
            yhex(k/D,:) = dec2hex(abs(resultout),ceil(a_out_t/4));
        end
        disp(['Sum: ' yfull]);
        disp(['Output: ' yhex(k/D,:)]);
        disp('---------------------')
        % ------------------------------------------------------
    end
end
diary off

% calculating the real value of the output signal
y = y/2^(B-trunc);

% plotting the output signal
figure
clf reset
subplot(2,1,1)
plot(y)
v = axis;
axis([1 length(y) v(3) v(4)])
grid on

% plotting the output signal, scaling the y axis with the maximum output range
subplot(2,1,2)
plot(y)
axis([1 length(y) -2^(a_out_t-(B-trunc)-1) 2^(a_out_t-(B-trunc)-1)])
grid on
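The accumulation step in the main loop relies on the coefficient symmetry of a linear-phase filter: two input bits that share a coefficient either agree, contributing plus or minus twice the coefficient, or disagree and cancel. The following minimal sketch illustrates this principle only. It uses a small made-up symmetric filter and a straightforward mirror pairing of the register taps; it does not reproduce the section and multiplexer indexing (n_0, n_k) of simulfilt, and all names and values are hypothetical.

```matlab
% Folded accumulation for a symmetric FIR filter with 1-bit input.
% Hypothetical example data, not taken from the designed filter.
h_half = [0.1 -0.2 0.3 0.4];         % first half of the impulse response
h      = [h_half fliplr(h_half)];    % full symmetric (linear-phase) filter
M      = numel(h_half);              % no. of distinct coefficients
N      = 2*M - 1;                    % filter order
bits   = [1 0 0 1 1 0 1 0];          % shift register contents (0 or 1)

% Reference: map bits to -1/+1 and form the full sum of products
x        = 2*bits - 1;
y_direct = sum(h .* x);

% Folded form: one conditional add/subtract of 2*h per coefficient
y_folded = 0;
for k = 1:M
    b1 = bits(k);
    b2 = bits(N + 2 - k);            % mirror tap sharing coefficient k
    if b1 == b2
        y_folded = y_folded + (2*b1 - 1) * 2 * h_half(k);
    end
end

fprintf('direct: %g, folded: %g\n', y_direct, y_folded);
```

Both results agree, while the folded form needs at most M additions or subtractions and no multiplications, since the doubled coefficients can be stored directly.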
The simulation is run with the call
[y,yhex] = simulfilt(u,Hbin,sig,b,a,24,16,16,22,'output.txt');
where the input signal is generated by the following statement:
u = [1 zeros(1,255)];
The result is a quasi-impulse response, which is depicted in Figure 3.20.
Output 1
Accumulator 1: FFE766
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9F9674
Output: 9F96
---------------------
Output 2
Accumulator 1: FFE14C
Accumulator 2: FF1F9E
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9FA354
Output: 9FA3
---------------------
Output 3
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 0072B2
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9F5FA2
Output: 9F5F
---------------------
Output 4
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 029A3E
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9FA822
Output: 9FA8
---------------------
Output 5
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F8B280
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: A01074
Output: A010
---------------------
Output 6
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 079CFC
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9E0654
Output: 9E06
---------------------
Output 7
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 06C664
Accumulator 8: 97CAF8
Sum: A1FE94
Output: A1FE
---------------------
Output 8
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: A09A74
Sum: A85FD6
Output: A85F
---------------------
Output 9
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 9AC29E
Sum: A28800
Output: A288
---------------------
Output 10
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 02C2E2
Accumulator 8: 97CAF8
Sum: 9DFB12
Output: 9DFB
---------------------
Output 11
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 099362
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9FFCBA
Output: 9FFC
---------------------
Output 12
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F85A12
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9FB806
Output: 9FB8
---------------------
Output 13
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 024D06
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9F5AEA
Output: 9F5A
---------------------
Output 14
Accumulator 1: FFE14C
Accumulator 2: FF0CA4
Accumulator 3: 00B542
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9FA232
Output: 9FA2
---------------------
Output 15
Accumulator 1: FFE14C
Accumulator 2: FF13C8
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9F977E
Output: 9F97
---------------------
Output 16
Accumulator 1: FFE14E
Accumulator 2: FF0CA4
Accumulator 3: 00A36A
Accumulator 4: 028276
Accumulator 5: F83266
Accumulator 6: 092702
Accumulator 7: 04582A
Accumulator 8: 97CAF8
Sum: 9F905C
Output: 9F90
---------------------
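The relation between the accumulator values, the sum, and the truncated output in this log can be reproduced by hand. The sketch below takes the eight accumulator words of "Output 1", interprets them as 24-bit two's complement numbers, adds them, and truncates the result to 16 bits with the parameters of the call above (a_out = 24, a_out_t = 16, B = 22). It uses mod for the two's complement wrap instead of the bitcmp expressions in simulfilt; this is a substituted but equivalent technique, and the variable names are illustrative only.

```matlab
% Reproduce "Sum" and "Output" of the first output block from its accumulators.
a_out = 24; a_out_t = 16; B = 22;
trunc = a_out - a_out_t;             % no. of discarded LSBs

acc_hex = {'FFE766','FF0CA4','00A36A','028276', ...
           'F83266','092702','04582A','97CAF8'};
acc = hex2dec(acc_hex);              % unsigned values of the hex words

% Interpret the words as signed 24-bit two's complement numbers
neg = acc >= 2^(a_out-1);
acc(neg) = acc(neg) - 2^a_out;

result    = sum(acc);                           % final addition
sum_hex   = dec2hex(mod(result, 2^a_out), a_out/4);
resultout = floor(result / 2^trunc);            % truncation as in simulfilt
out_hex   = dec2hex(mod(resultout, 2^a_out_t), a_out_t/4);
y_real    = resultout / 2^(B - trunc);          % scaling to the real value

disp(['Sum: ' sum_hex]);
disp(['Output: ' out_hex]);
```

The two displayed words can be compared with the "Sum" and "Output" lines of the first block in the log.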