# A Power-Scalable Variable-Length Analogue DFT Processor for Multi-Standard Wireless Transceivers 

By<br>Ghazal Tanhaei

A Thesis submitted to<br>The University of Birmingham<br>for the degree of<br>DOCTOR OF PHILOSOPHY

School of Electronic, Electrical and System Engineering
College of Engineering and Physical Sciences
The University of Birmingham
September 2016


## University of Birmingham Research Archive

e-theses repository

This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation.

Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.

Dedicated to my parents, Nasrin and Khosrow
for their endless love, encouragements and sacrifices

## Abstract

Since the invention of the mobile phone, a new generation of mobile communication standards has emerged every 10 years. Upgrading the technology of mobile networks in all areas takes a few years. Hence, mobile phones should support the previous communication standards as well as the latest standards. Realizing a multi-standard mobile phone with multiple transceivers in parallel is neither a size-efficient nor a cost-efficient solution. Hence, modern mobile phones demand reconfigurable transceivers. It is also essential for mobile phones to consume power efficiently. Hence, the multi-standard transceiver should scale its power consumption to the standard specifications. Many recent communication standards are based on Orthogonal Frequency-Division Multiplexing (OFDM). In OFDM-based transceivers, digital computation of the Discrete Fourier Transform (DFT) is a power-hungry process. The hardware cost and power consumption can be reduced by implementing the DFT processor with analogue circuits. Accordingly, the goal of this work is to design a power-scalable variable-length analogue DFT processor for multi-standard OFDM transceivers.

Since the Fast Fourier Transform (FFT) algorithm reduces the computational burden of the DFT, it has been used to reduce the hardware cost and power consumption of the digital DFT processors for years. However, the FFT algorithm was originally designed for discrete-time signal processing. This thesis presents the real-time recursive DFT architecture, which was designed based on the characteristics of the analogue signal processing domain. The optimal architecture for the analogue DFT is achieved by keeping the signal continuous as long as possible.

To analyse the performance of the proposed architecture, system-level simulations of the real-time recursive DFT processor and the radix-2 FFT processor of length 8 were performed. The results of the system performance analysis indicate that the average dynamic range of the proposed processor is 4.7 dB higher than that of the FFT processor. In the Monte Carlo analysis, the DFT processors that meet the minimum dynamic range requirement (34 dB) contribute to the yield. Accordingly, the proposed architecture has a yield of 99.3% while the yield of the FFT processor is 82.8%.

The real-time recursive DFT architecture was realized with four-quadrant transconductance multipliers and parasitic-insensitive switched-capacitor integrators. The real-time recursive DFT processor was designed in 180 nm CMOS technology. The sensitivity of the real-time recursive DFT processor to device mismatch was analysed using Pelgrom's model. The results of the device mismatch analysis indicate that the 8-point recursive DFT processor has a yield of 97.5% for the BPSK modulated signal. For the QPSK modulated signal, however, the yield of the 8-point recursive DFT processor is 8.9%. Moreover, doubling the transform length reduces the average dynamic range by 3 dB. Accordingly, the 16-point recursive DFT processor has a yield of 43.4% for the BPSK modulated signal. The power consumption of the recursive DFT processor is about 1/6 of the power consumption of a previous analogue FFT processor.

This thesis provided a proof-of-concept for the power-scalable variable-length analogue DFT processor. Previously, changing the transform length and scaling the power could only be performed by digital FFT processors. By using the real-time recursive DFT processor, the analogue decimation filter is eliminated. Thus, further reduction in the hardware cost and power consumption of the multi-standard transceiver is achieved.

## ACKNOWLEDGEMENTS

First and foremost, I would like to thank my parents and my sister for their encouragement and financial support. My parents supported me financially during the recession only because of the high value they place on education. None of this would have been possible without their love and support.

I would like to express my sincere gratitude to my advisors, Dr Steven Quigley and Professor Peter Gardner, for their guidance and patience throughout my PhD study. Although they provided me with a peaceful atmosphere at the University, they also gave me the freedom to work from home.

Besides my advisors, I would like to thank Dr Kamyar Keikhosravy, postdoctoral fellow at the University of British Columbia, for sharing his expertise despite the distance. His modest personality makes him approachable. I am glad that this research has led to a great friendship. I would also like to thank Professor Costas Constantinou and Professor Khaled Hayatleh for their insightful comments and constructive criticisms.

I am also indebted to all my teachers from elementary school to graduate school, especially my electronics professor at the Amirkabir University of Technology, Dr Saeed Khatami (Rest In Peace). In the early years of higher education, his encouragements inspired me with confidence.

I wish also to express my gratitude to the University of Birmingham for awarding me the TI Group scholarship, which partially supported my research.

## Contents

CHAPTER 1 INTRODUCTION
1.1 Evolution of Communication Systems
1.2 Statement of the Problem
1.3 Dissertation Objectives
1.4 Significance of the Research
1.5 Thesis Outline
CHAPTER 2 BACKGROUND STUDY AND LITERATURE REVIEW
2.1 Fundamentals of OFDM
2.2 WiFi and WiMAX Physical Layer Overview
2.3 State-of-the-Art FFT Processors
2.4 Comparison of Analogue and Digital Signal Processing
2.5 Analogue Fourier Transform Architectures
2.5.1 The Direct Form Finite Impulse Response
2.5.2 The Fast Fourier Transform
2.6 Summary
CHAPTER 3 REAL-TIME RECURSIVE DFT ARCHITECTURE
3.1 Real-Time Recursive DFT for Digital Signal
3.2 Real-Time Recursive DFT for Analogue Signal
3.3 Advantages of the Proposed Architecture
3.4 Summary
CHAPTER 4 SYSTEM PERFORMANCE ANALYSIS
4.1 Performance Metrics for DFT Processor
4.2 Performance Requirements
4.3 Behavioural Modelling
4.3.1 Behavioural Model of the Multiplier
4.3.2 Behavioural Model of the Integrator
4.3.3 Behavioural Modelling of the FFT Processor
4.3.4 Behavioural Modelling of the Recursive DFT Processor
4.4 Determining the Design Specifications
4.4.1 Power Budget
4.4.2 Design Specifications of the Multiplier
4.4.3 Design Specifications of the Integrator
4.5 Yield Prediction
4.6 Performance Analysis Results
4.7 Summary
CHAPTER 5 CIRCUIT DESIGN
5.1 Previous Work on the Analogue FFT Processor
5.2 Analogue Multiplier
5.2.1 Principle of Operation
5.2.2 Analysis of the CMOS Gilbert Cell
5.2.3 Circuit Realization
5.3 Discrete-Time Integrator
5.3.1 Analysis of the Parasitic-Insensitive Integrator
5.3.2 Speed and Precision Considerations
5.3.3 Circuit Realization
5.4 Real-Time Recursive DFT Processor
5.5 Accuracy of the Results
5.6 Summary
CHAPTER 6 DEVICE MISMATCH ANALYSIS AND RESULTS
6.1 MOS Transistor Matching Models
6.2 MOS Transistor Optimum Matching
6.3 Impact of Mismatch on the Performance Tradeoffs
6.4 Impact of Technology Scaling on the Mismatch
6.5 Mismatch Analysis Results
6.6 Root Cause Analysis
6.7 Mitigation of the Effect of Device Mismatch
6.8 Summary
CHAPTER 7 CONCLUSION AND FUTURE WORK
7.1 Contributions to Knowledge
7.1.1 Methodology
7.1.2 Limitations and Considerations
7.2 Future Work
7.2.1 Design Enhancements
7.2.2 Further Analysis
List of References
APPENDIX A
APPENDIX B

## Table of Tables

Table 1-1: Evolution of communication systems
Table 2-1: IEEE 802.11a/g PHY specifications
Table 2-2: IEEE 802.16e PHY specifications
Table 3-1: Computational efficiency and resource costs of different DFT architectures
Table 4-1: Receiver performance requirements for BER $=10^{-6}$
Table 4-2: Summary of the optimal value for the behavioural model parameters
Table 4-3: Summary of the Monte Carlo analysis for the recursive DFT and the radix-2 FFT processors
Table 5-1: Initial aspect ratios of the complex multiplier
Table 5-2: Final aspect ratios of the complex multiplier
Table 5-3: Initial aspect ratios of the op-amp
Table 5-4: Final aspect ratios of the parasitic-insensitive integrator
Table 6-1: Summary of the Monte Carlo analysis for the recursive DFT processors of length 8
Table 6-2: Summary of the yield prediction for the recursive DFT processors of length 8 and 16
Table 6-3: Performance comparison of the analogue Fourier Transform processors

## Table of Figures

Figure 1-1: Martin Cooper holds the DynaTAC 8000x phone and his current mobile phone during the Prince of Asturias Awards ceremony in 2009 [32]
Figure 1-2: Analogue and digital signal processing sections in (a) the classical OFDM receiver (b) the Software Defined Radio receiver (c) the OFDM receiver with an analogue FFT
Figure 2-1: The spectrum of the FDM signal consisting of nonoverlapping subchannels [43]
Figure 2-2: The spectrum of an OFDM signal consisting of three overlapping subcarriers [42]
Figure 2-3: Summation of the OFDM subcarriers in the time domain [1]
Figure 2-4: Effect of the ISI on the OFDM symbol in (a) the absence of the guard interval (b) the presence of the guard interval [1]
Figure 2-5: The OFDM symbol in the frequency domain [45]
Figure 2-6: Allocation of subcarriers to users in the OFDM and OFDMA technologies [1]
Figure 2-7: Windowed OFDM symbol in the time domain [1]
Figure 2-8: Spectrum of the OFDM signal before and after windowing [45]
Figure 2-9: Symbol mapping based on the QPSK modulation [1]
Figure 2-10: Block diagrams of the classical OFDM transmitter and receiver [1]
Figure 2-11: Direct form realization of an FIR system [44]
Figure 2-12: Signal flow graph of a radix-2 DIT FFT of length 8 [44]
Figure 2-13: Signal flow graph of the 2-point DFT [44]
Figure 3-1: Signal flow graph of the Goertzel DFT [44]
Figure 3-2: Signal flow graph of the Goertzel DFT with real multipliers
Figure 3-3: Block diagram of a recursive difference equation representing the discrete-time integrator
Figure 3-4: Architecture of the proposed real-time recursive DFT
Figure 3-5: Baseband signal processing section in (a) the classical OFDM receiver (b) the OFDM receiver with an analogue FFT or FIR DFT or Goertzel DFT (c) the OFDM receiver with the proposed DFT
Figure 4-1: Typical SNDR versus input magnitude curve [41]
Figure 4-2: PAPR CCDFs of two OFDM signals with WiFi and WiMAX standards [1]
Figure 4-3: The block diagram of the baseband signal processing part of (a) the classical OFDM receiver (b) the proposed OFDM receiver
Figure 4-4: Analogue DFT dynamic range derivation
Figure 4-5: Block diagram of the analogue multiplier
Figure 4-6: Curves of the multiplier behavioural model
Figure 4-7: Switched-capacitor integrator [78]
Figure 4-8: Behavioural model of the switched-capacitor integrator in Simulink
Figure 4-9: Signal flow graph of a radix-2 DIT FFT of length 8 [41]
Figure 4-10: 2-point DFT with $W_{8}^{1}$ or $W_{8}^{3}$ twiddle factor
Figure 4-11: 2-point DFT with $W_{8}^{0}$ twiddle factor
Figure 4-12: 2-point DFT with $W_{8}^{2}$ twiddle factor
Figure 4-13: Behavioural model of the analogue FFT processor in Simulink
Figure 4-14: 1-point DFT with piecewise continuous coefficients
Figure 4-15: Behavioural model of the real-time recursive DFT processor in Simulink
Figure 4-16: The input-output characteristics of ideal multipliers
Figure 4-17: SNDR curves for different values of Gmo
Figure 4-18: SNDR curves for different linear ranges
Figure 4-19: SNDR curves for various transconductance errors
Figure 4-20: SNDR curves for various DC offset mismatches
Figure 4-21: SNDR curves for various op-amp gains
Figure 4-22: SNDR curves for various DC offset mismatches
Figure 4-23: Yield prediction based on the Monte Carlo analysis [82]
Figure 4-24: Monte Carlo analysis results of the real-time recursive DFT processor
Figure 4-25: Monte Carlo analysis results of the radix-2 FFT processor
Figure 4-26: The dynamic range histogram of the real-time recursive DFT processor
Figure 4-27: The dynamic range histogram of the radix-2 FFT processor
Figure 5-1: (a) Switched-capacitor amplifier (b) timing diagram of circuit (a)
Figure 5-2: The basic current mirror [80]
Figure 5-3: The passive switched-capacitor multiplier
Figure 5-4: The switched-transconductor multiplier
Figure 5-5: The floating-gate multiplier
Figure 5-6: Two-quadrant analogue multiplier [96]
Figure 5-7: Block diagram of the Gilbert cell
Figure 5-8: $G_{m}$ transconductor [80]
Figure 5-9: Gilbert cell
Figure 5-10: Input-output characteristic of a differential pair [80]
Figure 5-11: Degenerated Gilbert cell with diode-connected load
Figure 5-12: Degenerated Gilbert cell with CMFB network
Figure 5-13: Topology of the complex multiplier
Figure 5-14: Transfer characteristic of the Gilbert cell multiplier simulated in SPICE
Figure 5-15: (a) Continuous-time integrator (b) discrete-time integrator (c) timing diagram of circuit (b) [80]
Figure 5-16: (a) Parasitic-insensitive integrator (b) circuit of (a) in sampling mode (c) circuit of (a) in integration mode [80]
Figure 5-17: Timing diagram of the parasitic-insensitive integrator
Figure 5-18: Equivalent circuit of the parasitic-insensitive integrator in integration mode
Figure 5-19: Differential amplifier with single-ended output [80]
Figure 5-20: Slewing in the op-amp [80]
Figure 5-21: Parasitic-insensitive integrator with reset switches
Figure 5-22: Output of a differential parasitic-insensitive integrator simulated in SPICE
Figure 5-23: The SNDR curves of real-time recursive DFT processors with ideal devices
Figure 5-24: SNDR curves of real-time recursive DFT processors in the presence of device mismatch
Figure 5-25: SNDR curves of real-time recursive DFT processors with different transform lengths
Figure 5-26: Steps in the integrated circuit design flow [115]
Figure 6-1: Equal drawn area devices (a) short channel (b) narrow channel
Figure 6-2: Modelling $V_{TH}$ variations using a DC voltage source in series with the MOS gate terminal
Figure 6-3: Mismatch analysis results of the real-time recursive DFT processor of length 8
Figure 6-4: Dynamic range histogram of the 8-point DFT processor for BPSK modulated signal
Figure 6-5: Dynamic range histogram of the 16-point DFT processor for BPSK modulated signal
Figure 6-6: Dynamic range histogram of the 8-point DFT processor for QPSK modulated signal
Figure 6-7: The SNDR curves of 8-point recursive DFT processors with ideal devices
Figure 6-8: The SNDR curves of 8-point recursive DFT processors
Figure 6-9: Offset cancellation by an auxiliary transconductance in a negative feedback loop [80]
Figure 6-10: Performance comparison of a 4-point analogue DFT implemented on an FPAA [93]

## List of Abbreviations

| Abbreviation | Meaning |
| :---: | :---: |
| 1G | First Generation |
| 2G | Second Generation |
| 3G | Third Generation |
| 4G | Fourth Generation |
| ACI | Adjacent Channel Interference |
| ADC | Analogue to Digital Converter |
| AGC | Automatic Gain Control |
| AWGN | Additive White Gaussian Noise |
| BER | Bit Error Ratio |
| BJT | Bipolar Junction Transistor |
| BPSK | Binary Phase-Shift Keying |
| BSIM | Berkeley Short-Channel IGFET Model |
| CCDF | Complementary Cumulative Distribution Function |
| CLT | Central Limit Theorem |
| CM | Common Mode |
| CMFB | Common Mode Feedback |
| CMOS | Complementary Metal-Oxide Semiconductor |
| CP | Cyclic Prefix |
| DAC | Digital to Analogue Converter |
| dB | decibel |
| DC | Direct Current |
| DFT | Discrete Fourier Transform |
| DIF | Decimation In Frequency |
| DIT | Decimation In Time |
| DSP | Digital Signal Processor |
| EVM | Error Vector Magnitude |
| FDM | Frequency Division Multiplexing |
| FEC | Forward Error Correction |
| FFT | Fast Fourier Transform |
| FIR | Finite Impulse Response |
| FPAA | Field Programmable Analogue Array |
| ICI | Intercarrier Interference |
| IDFT | Inverse Discrete Fourier Transform |
| IEEE | Institute of Electrical and Electronics Engineers |
| IF | Intermediate Frequency |
| IFFT | Inverse Fast Fourier Transform |
| iid | independent and identically distributed |
| IP2 | Second Intercept Point |
| ISI | Intersymbol Interference |
| KCL | Kirchhoff's Current Law |
| KVL | Kirchhoff's Voltage Law |
| LDPC | Low-Density Parity-Check |
| LLN | Law of Large Numbers |
| LMS | Least Mean Square |
| LPF | Low Pass Filter |
| LSB | Least Significant Bit |
| LTE | Long Term Evolution |
| MMSE | Minimum Mean Square Error |
| NMOS | N-channel Metal-Oxide Semiconductor |
| OFDM | Orthogonal Frequency-Division Multiplexing |
| OFDMA | Orthogonal Frequency Division Multiple Access |
| Op-amp | Operational amplifier |
| PAPR | Peak to Average Power Ratio |
| PCM | Pulse Code Modulation |
| PDF | Probability Density Function |
| PDK | Process Design Kit |
| PHY | Physical layer |
| PMOS | P-channel Metal-Oxide Semiconductor |
| QAM | Quadrature Amplitude Modulation |
| QPSK | Quadrature Phase-Shift Keying |
| RF | Radio Frequency |
| RMS | Root Mean Square |
| SC | Switched Capacitor |
| SDR | Software Defined Radio |
| SNDR | Signal to Noise and Distortion Ratio |
| SNR | Signal to Noise Ratio |
| SOFDMA | Scalable Orthogonal Frequency Division Multiple Access |
| SPICE | Simulation Program with Integrated Circuit Emphasis |
| SR | Slew Rate |
| TPC | Turbo Product Code |
| UWB | Ultra-Wideband |
| VLSI | Very Large Scale Integration |
| VoIP | Voice over Internet Protocol |
| WiFi | Wireless Fidelity |
| WiMAX | Worldwide Interoperability for Microwave Access |
| WLAN | Wireless Local Area Network |
| WMAN | Wireless Metropolitan Area Network |

## Chapter 1

## INTRODUCTION

In this chapter, a historical perspective on the development of communication systems is provided. The gaps in the previous research are discussed in the statement of the problem. Objectives and significance of the study are explained to clarify how this work will contribute to knowledge. Finally, an outline of the thesis structure is provided.

### 1.1 Evolution of Communication Systems

The proposal to use electricity in communication dates back to the late 18th century. In 1795, Francisco Salvá Campillo proposed an electrical telegraph as an alternative to optical ones [2]. In 1809, an electrochemical telegraph was designed by Samuel Thomas von Sömmerring [3]. The first electrical telegraph was built by Francis Ronalds in 1816 [4]. The cost of using one wire for each letter of the alphabet in early telegraph designs was prohibitive. Therefore, in 1835, Pavel L'vovitch Shilling reduced the number of wires by developing the first binary code for the telegraph [5]. In 1838, Samuel Morse and Alfred Vail invented a single-wire telegraph and the Morse code. The Morse/Vail telegraph became the forerunner of digital communication [6].

The idea of a speaking telegraph was initially proposed by Innocenzo Manzetti in 1844 [7]. Later, in 1854, Charles Bourseul wrote a memorandum on the electrical transmission of speech [8]. In 1871, Antonio Meucci filed a patent caveat for his telephone invention (telettrofono). Meucci filed a patent caveat because he could not afford the \$250 fee necessary to file a patent application [9]. In 1876, Elisha Gray filed a patent caveat for a telephone on the very same day that Alexander Graham Bell filed a patent application for a telephone. A month later, Bell's telephone patent was issued and the telephone became the forerunner of analogue communication [10-12].

Digital communication remained attractive owing to the contributions of Guglielmo Marconi and Karl Ferdinand Braun to the invention of wireless telegraphy in 1896 [13, 14]. In 1906, Reginald Fessenden invented the heterodyne transceiver, which used Amplitude Modulation (AM) to transmit an audio signal via a radio carrier wave [15]. In 1912, the significant role of Marconi's wireless telegraphy in rescuing the survivors of the Titanic proved its vital importance for marine communication [16]. In 1918, Edwin H. Armstrong invented the superheterodyne receiver, which converted the frequency of the received signal to a fixed Intermediate Frequency (IF). Compared with the heterodyne receiver, the superheterodyne receiver provided better selectivity and sensitivity. Later, in 1933, Armstrong demonstrated Frequency Modulation (FM), which provided better sound quality and fidelity than AM [17].

In 1937, Alec Harley Reeves invented Pulse Code Modulation (PCM) to enhance the noise immunity of audio transmission over long distances [18]. In fact, Reeves invented the first all-electronic Analogue to Digital Converter (ADC) and Digital to Analogue Converter (DAC) [19]. Another landmark of 1937 was Claude Shannon's master's thesis. Shannon proved that Boolean algebra could optimise the design of electromechanical relays in telephone routing switches. Shannon's work on the electrical implementation of Boolean functions became the foundation of digital circuit design [20]. In 1948, Shannon laid the theoretical foundations of digital communications in his paper "A Mathematical Theory of Communication" [21].

In 1947, Walter H. Brattain, John Bardeen, and William Shockley invented the transistor at Bell Laboratories [22]. In 1958, Jack Kilby realized the first germanium Integrated Circuit (IC) [23]. A few months later, Robert Noyce produced the first silicon IC [24]. These landmark innovations changed the nature of communication systems in the second half of the 20th century [6].

In 1965, James Cooley and John Tukey developed the Fast Fourier Transform (FFT) algorithm for efficient computation of the Discrete Fourier Transform (DFT) [25]. In 1966, Robert W. Chang invented Orthogonal Frequency Division Multiplexing (OFDM) for simultaneous transmission of data on multiple channels [26, 27]. In 1971, Weinstein and Ebert suggested using the FFT to realize the OFDM modulator and demodulator [28].

In 1973, the first handheld mobile cell phone was invented by Martin Cooper (Figure 1-1) and his teammates at Motorola [29]. Ten years after Cooper's invention, the first generation of mobile communication (1G) systems was launched. 1G was based on analogue communication [30]. The first commercially available mobile phone (DynaTAC 8000x) resembled a brick in terms of size and weight. Besides, its battery lasted only 30 minutes after 10 hours of recharging [31].

The second generation of mobile communication (2G) systems emerged in 1991. 2G was based on digital communication. While 1G systems had no security, 2G systems provided security by encrypting the digital signals. Moreover, 2G digital systems made error detection and error correction possible through encoding and decoding. Since error correction minimises the effect of interference, 2G systems achieved better communication quality than 1G systems. Furthermore, compared with 1G systems, 2G systems provided higher spectrum efficiency by compressing the digital data. Additionally, 2G systems applied multiple access techniques, which allow multiple users to share the frequency band. Thereby, 2G systems achieved higher capacity than 1G systems. Compared with 1G analogue systems, 2G digital systems had longer battery life and cheaper equipment. These advantages led to the prevalence of the digital communication standards [30].

Figure 1-1: Martin Cooper holds the DynaTAC 8000x phone and his current mobile phone during the Prince of Asturias Awards ceremony in 2009 [32].

The proliferation of mobile phone users led to a growing demand for mobile internet access. In response to this demand, the third generation of mobile communication (3G) systems emerged in 2001. 3G systems use packet switching for data transmission and circuit switching for voice calls [17, 30].

3G systems cannot satisfy the growing demand for streaming media. Hence, the fourth generation of mobile communication (4G) systems emerged in 2011. 4G systems provide a higher data rate than the existing 3G systems. Moreover, 4G networks use packet switching with the Internet Protocol (IP) for data and voice transmission. The circuit switching in 3G networks is replaced by the Voice over Internet Protocol (VoIP) in 4G networks. Worldwide Interoperability for Microwave Access (WiMAX) and Long Term Evolution (LTE) are the two competing technologies for 4G systems [17, 30].

Table 1-1 shows the landmark innovations in the history of analogue and digital communication systems. The earliest form of electrical communication system (the telegraph) was digital. However, digital signals could not convey the continuous waves of speech. Conversion from digital to analogue made speech communication possible. For more than a century (1876-1991), analogue communication systems were used to transmit audio signals. Laying the theoretical and practical foundations of modern digital communications took more than 50 years (1937-1991). Compared with analogue systems, modern digital communication systems provide higher security, better communication quality, higher spectrum efficiency, and higher capacity.

Table 1-1: Evolution of Communication Systems

| Year | Innovation |
| :--- | :--- |
| 1838 | Telegraph |
| 1876 | Telephone |
| 1896 | Wireless telegraphy |
| 1906 | Heterodyne transceiver, AM broadcasting |
| 1918 | Superheterodyne receiver |
| 1933 | FM broadcasting |
| 1937 | PCM |
| 1937 | Electrical implementation of Boolean functions |
| 1947 | Transistor |
| 1948 | Mathematical Theory of Communication |
| 1958 | Integrated Circuit |
| 1965 | FFT algorithm |
| 1966 | OFDM |
| 1983 | 1G |
| 1991 | 2G |
| 2001 | 3G |
| 2011 | 4G |

Era labels marked alongside the table rows: Digital; Analogue; Foundation of modern digital systems; Foundation of modern analogue and digital systems.

### 1.2 Statement of the Problem

The previous section showed that a new generation of mobile communication standards has emerged approximately every 10 years. Upgrading the technology of mobile networks in all areas takes a few years. Hence, mobile phones should support the previous communication standards as well as the latest standards. Moreover, since WiFi [33] provides a higher data rate than WiMAX [34], it is preferable to use WiFi in areas where both are available (e.g. university campus, office, home, hotel) [17].

The initial approach to realising a multi-standard mobile phone was to use multiple transceivers (Figure 1-2(a)) in parallel. However, as the number of communication standards increases, the size and cost of the mobile handset increase [35, 36]. To resolve this issue, Joseph Mitola proposed the concept of Software Defined Radio (SDR), according to which a single transceiver can support multiple communication standards if it is reconfigurable by software [37]. Mitola suggested that the SDR can be achieved by replacing the analogue signal processing stages of the transceiver (i.e. the analogue front-end) with a Digital Signal Processor (DSP) (Figure 1-2(b)) [37]. Moving the ADC and the DSP closer to the antenna means that the signal must be sampled and processed at the Radio Frequency (RF). The frequency bands allocated to the mobile communication standards and WiFi lie between 800 MHz and 5.5 GHz. To digitize any signal from 800 MHz to 5.5 GHz, a 12-bit, 11 GS/s ADC is required. Such a demanding ADC is unrealizable with the current technology [38]. Also, since ADC dynamic range and conversion speed progress more slowly than Moore's law, the required ADC will remain infeasible in the foreseeable future [39]. Even if a 12-bit, 11 GS/s ADC were feasible, its power dissipation would be hundreds of watts [38]. Moreover, in the SDR receiver, the digital front-end performs the downconversion. The digital mixer requires four real multiplications per complex signal sample. At a sample rate of 11 GS/s, the DSP must perform 44 billion multiplications per second. Given the power dissipation of the digital mixer, implementing the downconversion on the DSP is not sensible [40]. Hence, the SDR envisaged by Mitola has remained elusive [38].
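The sample-rate and multiplication figures quoted above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes that the 11 GS/s figure corresponds to Nyquist-rate sampling of the 5.5 GHz upper band edge (an assumption, since the text does not state how the figure is derived), and the function names are hypothetical.

```python
def nyquist_rate(f_max_hz: float) -> float:
    """Minimum sample rate (samples/s) for a signal band-limited to f_max_hz.

    Assumption: direct Nyquist-rate sampling of the highest band edge.
    """
    return 2.0 * f_max_hz


def mixer_multiplications_per_second(sample_rate: float,
                                     mults_per_sample: int = 4) -> float:
    """Real multiplications per second performed by a digital mixer.

    A complex product (a + jb)(c + jd) needs four real multiplications
    (ac, bd, ad, bc), which is the per-sample cost stated in the text.
    """
    return sample_rate * mults_per_sample


f_max = 5.5e9                                  # upper band edge from the text (Hz)
fs = nyquist_rate(f_max)                       # required ADC sample rate
mults = mixer_multiplications_per_second(fs)   # digital mixer workload

print(f"ADC sample rate: {fs / 1e9:.0f} GS/s")
print(f"Mixer load: {mults / 1e9:.0f} billion multiplications/s")
```

Under these assumptions the script reports 11 GS/s and 44 billion multiplications per second, matching the values in the paragraph above.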

Figure 1-2: Analogue and Digital signal processing sections in (a) the classical OFDM receiver (b) the Software Defined Radio receiver (c) the OFDM receiver with an analogue FFT

The demanding ADC is also a serious impediment to Ultra-Wideband (UWB) OFDM wireless transceivers [41]. In an effort to relax the ADC requirements in UWB OFDM transceivers, an analogue FFT processor was proposed (Figure 1-2(c)) [41]. Transferring the FFT processor from the digital back-end to the analogue front-end reduces the bit depth requirement of the ADC, which in turn reduces the power consumption of the ADC. Moreover, the analogue FFT processor consumes significantly less power than the digital FFT [41]. However, the analogue FFT processor is not reconfigurable because it is hardwired. Hence, the analogue FFT processor is not suitable for multi-standard transceivers.

### 1.3 Dissertation Objectives

After reviewing the requirements of modern mobile handheld devices and the impediments to realization of the SDR, it is clear that an alternative architecture for multi-standard transceivers must be explored. For OFDM-based transceivers, the analogue FFT processor is an attractive alternative to the power-hungry digital FFT processor. Multiple OFDM-based transceivers can be integrated by means of a variable-length DFT processor. To use power efficiently, the power consumption of the variable-length DFT processor should scale with the length of the transform. In this thesis, a power-scalable variable-length analogue DFT processor that meets the specifications of the WiFi and WiMAX standards is proposed.

The previous works on the analogue DFT processor merely focused on the circuit design methods and used the conventional architectures that were originally designed for the digital DFT processor or the discrete-time filters. Hence, a novel architecture that is designed based on the characteristics of the analogue signal processing domain is required. The main concern is the arithmetic precision of the analogue DFT processor. Therefore, performance of the proposed system should be analysed at various stages of the design process by statistical modelling of the mismatch.

### 1.4 Significance of the Research

Digital signal processing or analogue signal processing; that has been the question throughout the history of communication systems. Finding the answer to this question led to the invention of the telephone, the advent of 2G systems, and changing Mitola's paradigm of SDR [38]. A power-scalable variable-length analogue DFT processor can be another breakthrough in transceivers. Sharing the DFT processor between multiple transceivers and implementing it with analogue circuits can significantly reduce the hardware cost.

Moreover, a power-scalable analogue DFT processor can be the most power-efficient DFT processor. Hence, this research may lead to a new generation of mobile phones that are smaller, cheaper, and have longer battery life.

### 1.5 Thesis Outline

In this chapter, the evolution of communication systems was overviewed. Also, limitations of previous research on the SDR and the analogue FFT processor were discussed. Additionally, objectives and significance of the research were explained.

Chapter 2 provides the background knowledge on the OFDM technology and the OFDM-based standards. State-of-the-art FFT processors are reviewed. Also, a comparison between the analogue and digital signal processing is provided to elaborate the trade-offs in each approach.

In Chapter 3, the proposed architecture for the power-scalable variable-length analogue DFT processor is explained. Advantages and novelty of the proposed architecture are revealed by making comparisons between the proposed architecture and previous Fourier transform architectures.

In Chapter 4, performance requirements of the analogue DFT processor are derived. The behavioural models of the processor building blocks are explained. System simulations based on the behavioural models are performed to determine the design specifications of circuits. Yield prediction based on the Monte Carlo method is discussed. Moreover, performance of the proposed architecture and the FFT architecture are compared together.

Chapter 5 reviews various design approaches for the building blocks of the analogue DFT processor. Circuits that can provide the required flexibility for the power-scalable variable-length DFT processor are selected. Selected circuits are designed in 180 nm CMOS technology. Speed-power-accuracy trade-offs in circuits with ideal devices are discussed.

Chapter 6 reviews the mismatch models available in the open literature. This chapter also explains the design trade-offs that impose limitations on the performance of analogue signal processors. The effect of technology scaling on mismatch is also discussed. The impact of device mismatch on the performance of the circuit is analysed. Results of this analysis are compared with previous work. Finally, techniques that can mitigate the effect of device mismatch are mentioned.

Chapter 7 presents the concluding remarks and the original contributions of this study. This chapter also provides recommendations for future research.

## Chapter 2

## BACKGROUND STUDY AND LITERATURE REVIEW

In this chapter, the OFDM technology and the OFDM-based standards are overviewed. Also, achievements of the latest studies on the FFT processors are mentioned. A comparison between the analogue and digital circuits is provided. Furthermore, the existing architectures for the analogue Fourier transform processor are explained.

### 2.1 Fundamentals of OFDM

Orthogonal Frequency-Division Multiplexing (OFDM) and its variants are the predominant technology in the fourth-generation of mobile communication systems (4G). OFDM is an advanced form of the Frequency Division Multiplexing (FDM). FDM is a technique that facilitates the simultaneous transmission of multiple signals on a single medium by dividing the channel bandwidth into multiple subchannels (Figure 2.1). FDM is an effective technique to combat Intersymbol Interference (ISI) and multipath fading in wireless communications. However, since FDM prevents interference between subchannels by means of guard bands, it does not use the channel bandwidth efficiently [42].


Figure 2-1: the spectrum of the FDM signal consisting of nonoverlapping subchannels [43]

In OFDM a broad frequency spectrum is divided into multiple orthogonal narrowband subchannels by the Discrete Fourier Transform (DFT). OFDM modulation and demodulation are performed by the Inverse Discrete Fourier Transform (IDFT) and DFT, respectively. Both DFT and IDFT multiply discrete samples of signal by complex exponentials [1, 44].

$$
\begin{align*}
\text{IDFT:} \quad & x(n)=\frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2 \pi \frac{k n}{N}}, \quad n=0,1, \ldots, N-1 \tag{2.1}\\
\text{DFT:} \quad & X(k)=\sum_{n=0}^{N-1} x(n) e^{-j 2 \pi \frac{k n}{N}}, \quad k=0,1, \ldots, N-1 \tag{2.2}
\end{align*}
$$

In the above equations, $x(n)$ and $X(k)$ represent discrete samples of the modulated and demodulated signals, respectively. Hence, the elements of the sequence $\left\{e^{j 2 \pi(k n / N)}\right\}_{k=0}^{N-1}$ are the subcarriers of $x(n)$. The orthogonality of the subcarriers is proven by multiplying both sides of equation (2.1) by $e^{-j 2 \pi(m n / N)}$ and summing from $n=0$ to $n=N-1$ [44].

$$
\begin{equation*}
\sum_{n=0}^{N-1} x(n) e^{-j 2 \pi \frac{m n}{N}}=\sum_{n=0}^{N-1} \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2 \pi \frac{(k-m) n}{N}} \tag{2.3}
\end{equation*}
$$

Interchanging the order of summation on the right hand side of the equation (2.3) gives

$$
\begin{equation*}
\sum_{n=0}^{N-1} x(n) e^{-j 2 \pi \frac{m n}{N}}=\sum_{k=0}^{N-1} X(k)\left[\frac{1}{N} \sum_{n=0}^{N-1} e^{j 2 \pi \frac{(k-m) n}{N}}\right] \tag{2.4}
\end{equation*}
$$

The term inside the bracket is [44]

$$
\frac{1}{N} \sum_{n=0}^{N-1} e^{j 2 \pi \frac{(k-m) n}{N}}=\left\{\begin{array}{lc}
1 & k=m  \tag{2.5}\\
0 & \text { Otherwise }
\end{array}\right.
$$

Hence, subcarriers are orthogonal to each other. Combining equations (2.4) and (2.5) gives

$$
\begin{equation*}
\sum_{n=0}^{N-1} x(n) e^{-j 2 \pi \frac{m n}{N}}=X(m) \tag{2.6}
\end{equation*}
$$

which is the formula for a DFT. Accordingly, applying DFT on the modulated samples demodulates them.
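The derivation above can be checked numerically. The following is a minimal sketch in plain Python that implements equations (2.1) and (2.2) directly, evaluates the bracketed term of equation (2.4), and confirms that applying the DFT to IDFT-modulated symbols returns the original symbols. The function names and the four test symbols are illustrative assumptions.

```python
import cmath


def dft(x):
    """Equation (2.2): X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Equation (2.1): x(n) = (1/N) sum_k X(k) exp(+j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def subcarrier_inner_product(k, m, N):
    """Bracketed term of equation (2.4): 1 when k == m, 0 otherwise."""
    return sum(cmath.exp(2j * cmath.pi * (k - m) * n / N) for n in range(N)) / N


# Four QPSK-like symbols, chosen arbitrarily for illustration.
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
recovered = dft(idft(symbols))  # modulate, then demodulate
```

Up to floating-point rounding, `recovered` equals `symbols`, and `subcarrier_inner_product(k, m, N)` is 1 for `k == m` and 0 otherwise, as stated in equation (2.5).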

Elements of the sequence $\left\{e^{j 2 \pi(k n / N)}\right\}_{n=0}^{N-1}$ are samples of the time-limited $e^{j 2 \pi f_{k} t}$, which is the $k^{\text{th}}$ subcarrier ($f_{k}=k / T$ and $-T / 2 \leq t \leq T / 2$). Hence, the Fourier transform of the $k^{\text{th}}$ subcarrier is [1]

$$
\begin{equation*}
Y(f)=\int_{-T / 2}^{T / 2} e^{j 2 \pi f_{k} t} \cdot e^{-j 2 \pi f t} d t=\int_{-T / 2}^{T / 2} e^{-j 2 \pi\left(f-f_{k}\right) t} d t=\frac{\sin \left(\pi\left(f-f_{k}\right) T\right)}{\pi\left(f-f_{k}\right)} \tag{2.7}
\end{equation*}
$$

Thus, $Y(f)=T \operatorname{sinc}\left(\left(f-f_{k}\right) T\right)$, which reduces to $\operatorname{sinc}\left(f-f_{k}\right)$ for a normalised symbol duration $T=1$. Figure 2.2 shows three subcarriers of the OFDM signal. Since the subcarriers are orthogonal, the zero crossings of each subcarrier fall on the peaks of the other subcarriers. Therefore, not only is a guard band between adjacent subcarriers unnecessary, but the subcarriers can also overlap. Thereby, OFDM uses the channel bandwidth efficiently [45].
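The zero-crossing property can be illustrated by evaluating the integral of equation (2.7) numerically. The sketch below uses a midpoint rule; the symbol duration `T`, the number of integration steps, and the choice of subcarrier `k = 3` are assumptions made only for this example.

```python
import cmath


def subcarrier_spectrum(f, k, T=1.0, steps=2000):
    """Midpoint-rule estimate of the integral in equation (2.7)."""
    f_k = k / T
    dt = T / steps
    total = 0j
    for i in range(steps):
        t = -T / 2 + (i + 0.5) * dt
        total += cmath.exp(-2j * cmath.pi * (f - f_k) * t) * dt
    return total


T = 1.0
# Spectrum of subcarrier 3 at its own centre frequency (its peak).
peak = subcarrier_spectrum(3 / T, k=3, T=T)
# Spectrum of subcarrier 3 at the centre frequencies of its neighbours.
at_neighbours = [abs(subcarrier_spectrum(m / T, k=3, T=T)) for m in (1, 2, 4, 5)]
```

Here `peak` is approximately `T`, while every entry of `at_neighbours` is zero to within rounding error, which is why overlapping subcarriers do not interfere at the sampling frequencies $f_{m}=m / T$.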


Figure 2-2: the spectrum of an OFDM signal consisting of three overlapping subcarriers [42]

Figure 2.3 depicts the imaginary part of four subcarriers in the time domain. For a large number of modulated subcarriers $(N \gg 1)$ the OFDM symbol appears as Gaussian noise in the time domain [1].


Figure 2-3: summation of the OFDM subcarriers in the time domain [1]

The performance of the wireless communication systems depends on the channel characteristics. Multipath propagation results in phase shifting and fading. Thus, channel estimation is necessary to extract the original signal from the received signal. In order to estimate the channel, deterministic subcarriers called Pilot are added to the OFDM symbol [1].

The channel delay spread in multipath propagation creates Intersymbol Interference (ISI) between successive OFDM symbols (Figure 2.4 (a)). Also, the time-dispersive channel creates Intercarrier Interference (ICI) which destroys the orthogonality between subcarriers. In order to eliminate the effect of ISI, guard intervals are added to the OFDM symbol (Figure 2.4 (b)). Subcarriers that are transmitted during the guard interval are null. The guard interval should exceed the maximum excess delay of the multipath propagation channel [46]. Since a guard interval is ineffective in cancelling ICI, the Cyclic Prefix (CP) is used instead. CP is the copy of the last part of the OFDM symbol which is prefixed to the OFDM symbol. Thus, the CP preserves the orthogonality between subcarriers by making the OFDM symbol periodic [47]. Figure 2.5 illustrates the OFDM symbol in the frequency domain.
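The role of the cyclic prefix can be sketched with a small simulation. The example below is an illustration under stated assumptions, not a model of any particular standard: the symbol length `N = 8`, the prefix length `CP = 2`, the three channel taps, and the subcarrier values are all arbitrary, and the channel is noiseless. Because the prefix is at least as long as the channel memory, the received block equals a circular convolution, so each subcarrier is scaled only by its own channel coefficient.

```python
import cmath


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def add_cyclic_prefix(symbol, cp_len):
    """Prefix the symbol with a copy of its last cp_len samples (cp_len > 0)."""
    return symbol[-cp_len:] + symbol


def multipath_channel(signal, taps):
    """Linear convolution of the transmitted samples with the channel taps."""
    out = [0j] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


N, CP = 8, 2
X = [1, -1, 1j, -1j, 1, 1, -1, -1j]   # illustrative subcarrier values
taps = [1.0, 0.5, 0.25]               # channel memory of 2 samples <= CP

tx = add_cyclic_prefix(idft(X), CP)
rx = multipath_channel(tx, taps)
Y = dft(rx[CP:CP + N])                     # discard the prefix, then demodulate
H = dft(taps + [0.0] * (N - len(taps)))    # channel frequency response
```

In this sketch `Y[k]` equals `H[k] * X[k]` for every `k`, so no energy leaks between subcarriers and a one-tap equaliser per subcarrier suffices.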


[Figure legend: DC subcarrier, pilot subcarriers, data subcarriers, guard band subcarriers]


Figure 2-5: the OFDM symbol in the frequency domain [45]

In OFDM technology, all the subcarriers of the OFDM symbol are allocated to one user. On the other hand, the Orthogonal Frequency Division Multiple Access (OFDMA) technology assigns the subcarriers of the OFDM symbol to different users (Figure 2.6). Thereby, the channel bandwidth is divided into subchannels and shared between several users. The data-rate of each user can be controlled by varying the number of subcarriers in the allocated subchannel [42].


Figure 2-6: allocation of subcarriers to users in the OFDM and OFDMA technologies [1]

As mentioned earlier, the spectrum of the time-limited OFDM symbol is the sum of frequency shifted sinc functions. Thus, OFDM symbols produce large out-of-band power which leads to the Adjacent Channel Interference (ACI). Hence, a guard band is used to reduce the effect of ACI. Moreover, the out-of-band power is reduced by windowing the OFDM symbol [43]. Figure 2.7 and Figure 2.8 show the effect of windowing in the time domain and the frequency domain, respectively.


Figure 2-7: Windowed OFDM symbol in the time domain [1]


Figure 2-8: Spectrum of the OFDM signal before and after windowing [45]

The frequency-selective channel may severely attenuate some of the subcarriers. Attenuation of the data subcarriers leads to bit errors. Hence, Forward Error Correction (FEC) coding and interleaving are essential in order to spread the coded bits over the bandwidth [45]. FEC codes that are used by most of the OFDM-based standards include Concatenated code, Convolutional code, Block code, Turbo code, Low-Density Parity-Check (LDPC) code, and Reed-Solomon code [43].

After the channel coding, the OFDM transmitter maps the bit stream on the constellation points. Thereby, each symbol is represented by a magnitude and a phase. Symbol mapping is performed based on the Quadrature Amplitude Modulation (QAM), the Binary Phase-Shift Keying (BPSK), or the Quadrature Phase-Shift Keying (QPSK) (Figure 2.9) [1].


Figure 2-9: Symbol mapping based on the QPSK modulation [1]

Figure 2.10 shows the block diagrams of the classical OFDM transmitter and receiver. The Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT) processors are used to compute DFT and IDFT efficiently [1, 43]. In Figure 2.10, DAC and ADC denote the Digital to Analogue Converter and the Analogue to Digital Converter, respectively.

Figure 2-10: block diagrams of the classical OFDM transmitter and receiver [1]

### 2.2 WiFi and WiMAX Physical Layer Overview

WiFi (IEEE 802.11a/g) and WiMAX (IEEE 802.16e) are the OFDM-based standards that are supported by most 4G mobile handheld devices. Hence, these standards are considered for the purpose of this study. WiFi (Wireless Fidelity) standards are set for Wireless Local Area Networks (WLANs). The difference between the 802.11a and 802.11g standards is that the former operates in the 5 GHz band while the latter operates in the 2.4 GHz band [48]. Table 2-1 summarizes the Physical layer (PHY) specifications of the 802.11a and 802.11g standards [33, 49]. WiFi optimizes the data rate and maintains the required Bit Error Rate (BER) by adapting the modulation and coding rate to the radio link quality [1]. Accordingly, the maximum data rate of 802.11a/g is 54 Mbits/s, which is obtained by using 64-QAM (i.e. 6 bits on each of the 48 data subcarriers), an OFDM symbol duration of $4 \mu s$, and a coding rate of $3 / 4$: $((6 \times 48) / 4 \mu s) \times 3 / 4=54 \mathrm{Mbits} / \mathrm{s}$.

Table 2-1: IEEE 802.11a/g PHY specifications

| Parameter | Value |
| :--- | :--- |
| Channel bandwidth (MHz) | 20 |
| IFFT/FFT size | 64 |
| IFFT/FFT clock (MHz) | 20 |
| Subcarrier spacing (kHz) | 312.5 (20 MHz / 64) |
| Number of data subcarriers | 48 |
| Number of pilot subcarriers | 4 |
| Number of guard band subcarriers | 11 (6 on the left and 5 on the right) |
| Number of DC subcarriers | 1 |
| Total number of subcarriers | 64 |
| Modulation | BPSK, QPSK, 16-QAM, 64-QAM |
| $T_{\mathrm{FFT}}$: Useful symbol duration ($\mu s$) | 3.2 |
| $T_{\mathrm{CP}}$: Cyclic prefix duration ($\mu s$) | 0.8 ($T_{\mathrm{FFT}} / 4$) |
| OFDM symbol duration ($\mu s$) | 4 ($T_{\mathrm{FFT}}+T_{\mathrm{CP}}$) |
| Channel coding | Convolutional coding rates: 1/2, 2/3, 3/4 |
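The data-rate calculation quoted for 802.11a/g generalises to the other modulation and coding combinations in Table 2-1. The sketch below recomputes it from the table entries; the function name is illustrative, and the BPSK case with rate 1/2 is added here only as a second example.

```python
def ofdm_data_rate(bits_per_subcarrier, data_subcarriers,
                   symbol_duration_s, coding_rate):
    """Information bits per second carried by the data subcarriers."""
    coded_bits_per_symbol = bits_per_subcarrier * data_subcarriers
    return coded_bits_per_symbol * coding_rate / symbol_duration_s


SYMBOL_DURATION_S = 4e-6   # T_FFT + T_CP from Table 2-1
DATA_SUBCARRIERS = 48

# 64-QAM (6 bits per subcarrier) with rate-3/4 coding: about 54 Mbit/s.
rate_64qam = ofdm_data_rate(6, DATA_SUBCARRIERS, SYMBOL_DURATION_S, 3 / 4)
# BPSK (1 bit per subcarrier) with rate-1/2 coding: about 6 Mbit/s.
rate_bpsk = ofdm_data_rate(1, DATA_SUBCARRIERS, SYMBOL_DURATION_S, 1 / 2)
```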

The WiMAX (Worldwide Interoperability for Microwave Access) standard is set for Wireless Metropolitan Area Networks (WMANs). WiMAX can operate in licensed and unlicensed bands between 2 and 11 GHz. The 802.16e standard uses Scalable OFDMA (SOFDMA) to support different channel bandwidths. SOFDMA keeps the subcarrier spacing constant by scaling the FFT size with the channel bandwidth [50]. The mobile devices that are supported by this standard can travel at tens of kilometres per hour while communicating. Table 2-2 summarizes the PHY specifications of the 802.16e standard [34, 48]. According to these specifications, the maximum data rate of 802.16e is 75 Mbits/s.

Table 2-2: IEEE 802.16e PHY specifications

| Channel bandwidth (MHz) | 1.25 | 5 | 10 | 20 |
| :--- | :--- | :--- | :--- | :--- |
| IFFT/FFT size | 128 | 512 | 1024 | 2048 |
| IFFT/FFT clock (MHz) | 1.4 | 5.6 | 11.2 | 22.4 |
| Number of subchannels | 2 | 8 | 16 | 32 |
| Subcarrier spacing (kHz) | 10.94 | 10.94 | 10.94 | 10.94 |
| Number of data subcarriers | 72 | 360 | 720 | 1440 |
| Number of pilot subcarriers | 12 | 60 | 120 | 240 |
| Number of guard band and DC subcarriers | 44 | 92 | 184 | 368 |
| Total number of subcarriers | 128 | 512 | 1024 | 2048 |
| Modulation | BPSK, QPSK, 16-QAM, 64-QAM |  |  |  |
| $T_{\mathrm{FFT}}$: Useful symbol duration ($\mu s$) | 91.4 | 91.4 | 91.4 | 91.4 |
| $T_{\mathrm{CP}}$: Cyclic prefix duration ($\mu s$) | $T_{\mathrm{FFT}} / 8$ | $T_{\mathrm{FFT}} / 8$ | $T_{\mathrm{FFT}} / 8$ | $T_{\mathrm{FFT}} / 8$ |
| OFDM symbol duration ($\mu s$) | 102.8 | 102.8 | 102.8 | 102.8 |
| Channel coding | Convolutional, Optional Convolutional, Turbo, Block Turbo, LDPC |  |  |  |
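The scaling rule of SOFDMA can be verified from the first rows of Table 2-2. The sketch below derives the subcarrier spacing and the symbol timing from the FFT size and the FFT clock; the function name and the dictionary `PROFILES` are illustrative, and the cyclic prefix fraction of 1/8 is taken from the table.

```python
def sofdma_numerology(fft_size, clock_hz, cp_fraction=1 / 8):
    """Return (subcarrier spacing in Hz, T_FFT in s, OFDM symbol duration in s)."""
    spacing_hz = clock_hz / fft_size
    t_fft = 1.0 / spacing_hz
    t_symbol = t_fft * (1.0 + cp_fraction)
    return spacing_hz, t_fft, t_symbol


# FFT size -> FFT clock in Hz, as listed in Table 2-2.
PROFILES = {128: 1.4e6, 512: 5.6e6, 1024: 11.2e6, 2048: 22.4e6}

numerology = {n: sofdma_numerology(n, clk) for n, clk in PROFILES.items()}
for n, (spacing, t_fft, t_symbol) in numerology.items():
    print(f"N={n}: {spacing / 1e3:.2f} kHz, "
          f"T_FFT={t_fft * 1e6:.1f} us, symbol={t_symbol * 1e6:.1f} us")
```

Every profile yields the same spacing of 10.9375 kHz, which is the property that lets a variable-length DFT processor serve all four channel bandwidths with a common symbol duration.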

### 2.3 State-of-the-Art FFT Processors

The rapid proliferation of wireless communication standards has led to the emergence of multi-standard radios. Since classical transceiver architectures are not suitable for a one-product solution, new architectures should be proposed to fulfil this demand. In view of that, digital designers developed reconfigurable FFT processors to integrate multiple OFDM-based transceivers [51-53]. Transform length and throughput of the reconfigurable FFT processor must vary for each standard. Hence, energy-efficient reconfigurable FFT processors that scale the power consumption with the transform length and throughput were proposed [54, 55].

While at least 6-bit resolution is required to represent the Gaussian OFDM signal, 2 bits are sufficient to represent the QPSK symbols after the FFT demodulation. In an effort to ease the conversion burden on the ADC, FFT was applied on the discrete-time samples, prior to the ADC [41]. This approach reduces the bit depth requirement of the ADC, and consequently lowers the ADC power consumption [56]. More importantly, the analogue FFT processor consumes significantly less power than the digital FFT [57-59].

### 2.4 Comparison of Analogue and Digital signal processing

As mentioned in the previous section, latest studies show that the analogue FFT processor has significantly less power consumption than the digital FFT processor. This section provides an overview on the analogue and digital circuits to elaborate the reasons of computational efficiency in analogue circuits. In each case, the numbers of transistors that are required to implement basic operations of the Fourier transform (i.e. addition and multiplication) are given. Moreover, the compromise that is made by migrating from the digital signal processing domain to the analogue signal processing domain is mentioned.

In digital computation, variables have discrete values (i.e. 0 or 1); thus, each variable represents only one bit of information. Mathematical operations are performed by the Boolean algebraic functions (i.e. AND, OR, NOT, NAND, NOR, XOR, XNOR) [60]. Although digital computation is insensitive to device mismatch, quantization noise and round-off error degrade the accuracy of computation. Since the quantization noise and the round-off error only affect the Least Significant Bits (LSB), the degradation of accuracy is insignificant [61]. Addition of two 8-bit variables in the digital domain requires 240 transistors (i.e. 8 full adders). Also, multiplication of two 8-bit variables requires nearly 3000 transistors [62, 63].

In analogue computation, variables (i.e. current or voltage) have continuous values. Thus, each variable represents many bits of information. Mathematical operations are performed based on the physical characteristics of circuit elements (i.e. transistors, capacitors, resistors, floating gate devices) and Kirchhoff's current and voltage laws (KCL and KVL). Therefore, analogue computation is sensitive to device mismatch. In a cascade of analogue circuits, the computational errors due to mismatches accumulate. According to the KCL, a current-mode analogue adder that computes the sum of several variables can be implemented simply by connecting wires to the same node. Besides, multiplication of two variables by two-quadrant and four-quadrant analogue multipliers requires 3 and 7 transistors, respectively [62,63].

This comparison leads to the conclusion that computation of the DFT in the analogue domain saves hardware cost and power consumption. However, these advantages are achieved at the expense of precision degradation. The following section explains the existing architectures for the analogue Fourier transform processor.

### 2.5 Analogue Fourier Transform Architectures

### 2.5.1 The Direct Form Finite Impulse Response

The DFT of a sequence of length N is [64]

$$
\begin{equation*}
X(k)=\sum_{n=0}^{N-1} x(n) W_{N}^{n k}, \quad k=0,1, \ldots, N-1 \tag{2.8}
\end{equation*}
$$

where $W_{N}^{n k}=e^{-j(2 \pi k n / N)}=\cos (2 \pi k n / N)-j \sin (2 \pi k n / N)$. Hence, $X(k)$ can be considered as the discrete convolution of $x(n)$ with the impulse response

$$
h(n)=\left\{\begin{array}{lr}
W_{N}^{n k} & n=0,1, \ldots, N-1  \tag{2.9}\\
0 & \text { otherwise }
\end{array}\right.
$$

Therefore, the direct form Finite Impulse Response (FIR) architecture (Figure 2.11) can be used to implement the DFT. In this structure, the tapped delay line is made by $z^{-1}$ blocks. At each tap, signal is weighted by the impulse response value. DFT processors that were implemented by using this architecture are available in [65, 66].


Figure 2-11: direct form realization of an FIR system [44]

Since $x(n)$ is a complex number, expanding the complex multiplication $x(n) W_{N}^{n k}$ in equation (2.8) gives

$$
\begin{align*}
& X_{R e}(k)=\sum_{n=0}^{N-1} x_{R e}(n) \cos \left(\frac{2 \pi k n}{N}\right)+x_{I m}(n) \sin \left(\frac{2 \pi k n}{N}\right), \quad k=0,1, \ldots, N-1  \tag{2.10a}\\
& X_{I m}(k)=\sum_{n=0}^{N-1} x_{I m}(n) \cos \left(\frac{2 \pi k n}{N}\right)-x_{R e}(n) \sin \left(\frac{2 \pi k n}{N}\right), \quad k=0,1, \ldots, N-1 \tag{2.10b}
\end{align*}
$$

Therefore, each complex multiplication $x(n) W_{N}^{n k}$ requires four real multiplications. Thus, the direct computation of $X(k)$ requires $4 N$ multiplications. Since $X(k)$ must be computed for different values of $k$, the FIR architecture requires $4 N^{2}$ multipliers [44]. Accordingly, for large values of $N$, the area and power consumption of the FIR architecture are prohibitively large. Moreover, since mismatches in the multiplier circuits lead to erroneous calculations, the computational error in the FIR architecture has a quadratic growth.
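The multiplication count can be made concrete with a direct implementation of equations (2.10a) and (2.10b). The sketch below uses real arithmetic only and counts every real multiplication; the function name and the test sequence are illustrative assumptions.

```python
import math


def dft_real_arithmetic(x_re, x_im):
    """Direct DFT via equations (2.10a)/(2.10b).

    Returns (X_re, X_im, number of real multiplications performed).
    """
    N = len(x_re)
    X_re, X_im, mults = [], [], 0
    for k in range(N):
        acc_re = acc_im = 0.0
        for n in range(N):
            c = math.cos(2 * math.pi * k * n / N)
            s = math.sin(2 * math.pi * k * n / N)
            acc_re += x_re[n] * c + x_im[n] * s   # equation (2.10a)
            acc_im += x_im[n] * c - x_re[n] * s   # equation (2.10b)
            mults += 4                            # four real products per term
        X_re.append(acc_re)
        X_im.append(acc_im)
    return X_re, X_im, mults


# Real-valued test sequence 1, 2, 3, 4 (imaginary parts are zero).
X_re, X_im, mults = dft_real_arithmetic([1.0, 2.0, 3.0, 4.0], [0.0] * 4)
```

For this 4-point input the counter reaches $4 N^{2}=64$. In the FIR architecture each of these products is a separate physical multiplier, which is why the hardware cost grows quadratically with $N$.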

By using the current-mode multipliers, additions can be implemented simply by connecting the outputs of two multipliers to the same node (KCL). Thus, additions do not consume area or power. More importantly, additions do not contribute to the computational error. However, since the outputs of $2 N-1$ multipliers are connected together, the connection capacitance increases by increasing $N$. Hence, as $N$ increases, the speed of processing decreases.

### 2.5.2 The Fast Fourier Transform

The FFT algorithms improve the computational efficiency of the DFT by exploiting the properties of $W_{N}^{n k}$ [67]

$$
\begin{align*}
& W_{N}^{r+N / 2}=-W_{N}^{r}  \tag{2.11a}\\
& W_{N}^{k(N-n)}=W_{N}^{n(N-k)}=W_{N}^{-k n} \quad \text{(symmetry)}  \tag{2.11b}\\
& W_{N}^{k(n+N)}=W_{N}^{n(k+N)}=W_{N}^{k n} \quad \text { (periodicity) } \tag{2.11c}
\end{align*}
$$

Moreover, for certain values of the product $n k$, $W_{N}^{n k}$ is simplified (i.e. $W_{N}^{0}=1$ and $W_{N}^{N / 4}=-j$). The most commonly used FFT algorithm is the Cooley-Tukey algorithm, which recursively breaks down the DFT into smaller DFTs [25]. Decimation-In-Time (DIT), Decimation-In-Frequency (DIF), Mixed-Radix, and Split-Radix are some of the variants of the Cooley-Tukey algorithm. The signal flow graph of an 8-point DIT FFT is shown in Figure 2.12. The Radix-2 FFT of length 8 is obtained by decomposing the 8-point DFT into 2-point DFTs. Figure 2.13 depicts the signal flow graph of the 2-point DFT [44, 67, 68].


Figure 2-12: signal flow graph of a Radix-2 DIT FFT of length 8 [44]


Figure 2-13: signal flow graph of the 2-point DFT [44]

A Radix-2 (DIT or DIF) FFT computes the DFT with $(N / 2) \log _{2} N-(N-1)$ complex multiplications. Thus, the number of analogue multipliers that are required to implement a Radix-2 FFT is [41]

$$
\begin{equation*}
M=4 N+16 \sum_{k=2}^{\log _{2} N} \frac{N}{2^{k}}+12 \sum_{k=3}^{\log _{2} N} \frac{N}{4} \tag{2.12}
\end{equation*}
$$
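As an illustration of these counts, the sketch below implements a recursive radix-2 DIT FFT in floating point and tallies the twiddle multiplications other than $W_{N}^{0}$, which should equal $(N / 2) \log _{2} N-(N-1)$. It also evaluates equation (2.12) exactly as written. This is a software sketch under the assumption that $N$ is a power of two; it does not model the analogue circuit of [41], and the function names are illustrative.

```python
import cmath


def fft_dit(x):
    """Recursive radix-2 DIT FFT.

    Returns (spectrum, number of twiddle multiplications other than W^0).
    """
    N = len(x)
    if N == 1:
        return list(x), 0
    even, count_even = fft_dit(x[0::2])
    odd, count_odd = fft_dit(x[1::2])
    out = [0j] * N
    count = count_even + count_odd
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        if k != 0:
            count += 1                          # W_N^0 = 1 needs no multiplier
        t = w * odd[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t           # W_N^(k+N/2) = -W_N^k, eq. (2.11a)
    return out, count


def multipliers_radix2(N):
    """Equation (2.12) evaluated literally for N a power of two."""
    log2n = N.bit_length() - 1
    first_sum = sum(N // 2 ** k for k in range(2, log2n + 1))
    second_sum = sum(N // 4 for _ in range(3, log2n + 1))
    return 4 * N + 16 * first_sum + 12 * second_sum


x8 = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum, nontrivial = fft_dit(x8)   # nontrivial == (8/2)*3 - (8-1) == 5
```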

Bandwidth of the FFT architecture with $S$ stages is approximated by [69]

$$
\begin{equation*}
B W_{F F T}=B W_{D F T} \sqrt[2 L]{2^{1 / S}-1} \tag{2.13}
\end{equation*}
$$

where $B W_{D F T}$ is the bandwidth of the DFT circuit that is used as the building block of the FFT architecture, and $L$ is the order of the equivalent Low Pass Filter (LPF). The number of stages should be reduced to increase the bandwidth. The number of stages is obtained from [67]

$$
\begin{equation*}
S=\log _{R} N \tag{2.14}
\end{equation*}
$$

where $R$ denotes the radix size. Accordingly, $S$ is reduced by using a higher radix. A higher radix also reduces the number of multipliers. Thereby, the computational error, together with the area and the power consumption, is reduced. On the other hand, since the $X(k)$ values are not computed independently, computational errors propagate in the FFT lattice and affect all the results. The state-of-the-art analogue Fourier transform processors are based on the FFT architecture [70-73].
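The trade-off between radix and bandwidth can be sketched by combining equations (2.13) and (2.14). The example below assumes that $N$ is an exact power of the radix and, purely for illustration, a first-order equivalent low-pass filter ($L=1$); neither value is taken from the cited works.

```python
def fft_stages(N, radix):
    """Equation (2.14): S = log_R(N), for N an exact power of the radix."""
    stages = 0
    while N > 1:
        if N % radix != 0:
            raise ValueError("N must be an exact power of the radix")
        N //= radix
        stages += 1
    return stages


def bandwidth_ratio(stages, order=1):
    """BW_FFT / BW_DFT from equation (2.13); order is L, the LPF order."""
    return (2 ** (1 / stages) - 1) ** (1 / (2 * order))


# Bandwidth retained by a 64-point FFT for three radix choices (L = 1 assumed).
ratios = {radix: bandwidth_ratio(fft_stages(64, radix)) for radix in (2, 4, 8)}
```

With these assumptions the ratio rises from roughly 0.35 for radix 2 (six stages) to roughly 0.64 for radix 8 (two stages), which matches the argument that fewer stages preserve more bandwidth.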

### 2.6 Summary

This chapter has presented background knowledge on the OFDM technology and the OFDM-based standards. A literature survey was also provided to identify the gaps in previous research. The computational efficiency, the resource costs, and the computational accuracy of the existing analogue Fourier transform architectures were compared. This comparison leads to the conclusion that the FFT algorithms (i.e. DIT, DIF, etc.) are optimal for sampled signals.

Migrating from the digital signal processing domain to the analogue signal processing domain should not be performed by simply implementing the same architecture with analogue circuits. Accordingly, a novel architecture that is designed based on the characteristics of the analogue signal processing domain is presented in the next chapter.

## Chapter 3

## REAL-TIME RECURSIVE DFT ARCHITECTURE

The existing architectures for the analogue Fourier transform processor were explained in the previous chapter. In this chapter, the proposed architecture for the power-scalable variable-length analogue DFT processor is explained. The proposed architecture is compared with a similar DFT architecture that was designed for digital signal processing. Moreover, the computational efficiency, the resource costs, and the computational accuracy of the proposed architecture and the previous architectures are compared together.

### 3.1 Real-Time Recursive DFT for Digital Signal

The Goertzel algorithm [74] is a recursive DFT algorithm which was proposed for digital signal processing. Consider the DFT of a sequence of length N [64]

$$
\begin{equation*}
X(k)=\sum_{n=0}^{N-1} x(n) W_{N}^{n k}, \quad k=0,1, \ldots, N-1 \tag{3.1}
\end{equation*}
$$

where $W_{N}^{n k}=e^{-j(2 \pi k n / N)}$.

The recursive algorithm proposed by Goertzel is achieved by using the periodicity of the $W_{N}^{n k}$, namely [44]

$$
\begin{equation*}
W_{N}^{-k N}=e^{j(2 \pi / N) N k}=e^{j 2 \pi k}=1 \tag{3.2}
\end{equation*}
$$

Hence, multiplying the right side of equation (3.1) by $W_{N}^{-k N}$ does not affect the result. Accordingly [44],

$$
\begin{equation*}
X(k)=W_{N}^{-k N} \sum_{r=0}^{N-1} x(r) W_{N}^{k r}=\sum_{r=0}^{N-1} x(r) W_{N}^{-k(N-r)} \tag{3.3}
\end{equation*}
$$

Considering $X(k)$ as the response of a discrete-time system when $n=N$, equation (3.3) can be written in the time domain. Accordingly [44],

$$
\begin{equation*}
y(n)=\sum_{r=-\infty}^{\infty} x(r) W_{N}^{-k(n-r)} u(n-r) \tag{3.4}
\end{equation*}
$$

where $x(r)=0$ for $r<0$ and $r \geq N$. Equation (3.4) can be interpreted as a discrete convolution of $x(n)$ and $W_{N}^{-k n} u(n)$. Therefore, $y(n)$ is the response of a system with impulse response $W_{N}^{-k n} u(n)$ to $x(n)$, and $X(k)=y(N)$. Hence, the transfer function of the Goertzel DFT is [44]

$$
\begin{equation*}
H(z)=\frac{1}{1-W_{N}^{-k} z^{-1}} \tag{3.5}
\end{equation*}
$$

The signal flow graph of the Goertzel DFT is shown in Figure 3.1.


Figure 3-1: signal flow graph of the Goertzel DFT [44]

Since $x(n)$ and $W_{N}^{-k}$ are both complex, the multiplier and the adder in Figure 3.1 represent 4 real multiplications and 4 real additions. Thus, $4 N$ multiplications and $4 N$ additions are required to compute $X(k)$ for a particular value of $k$ (Figure 3.2).


Figure 3-2: signal flow graph of the Goertzel DFT with real multipliers
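The first-order recursion described by equation (3.5) can be sketched in a few lines. The example below implements $y(n)=x(n)+W_{N}^{-k} y(n-1)$ in complex floating point and reads out $X(k)=y(N)$ with $x(N)=0$; the function name and the test sequence are illustrative assumptions.

```python
import cmath


def goertzel_bin(x, k):
    """First-order Goertzel recursion for a single DFT bin X(k)."""
    N = len(x)
    w_inv = cmath.exp(2j * cmath.pi * k / N)   # feedback coefficient W_N^{-k}
    y = 0j
    for n in range(N):
        y = x[n] + w_inv * y    # y(n) = x(n) + W_N^{-k} y(n-1)
    # X(k) = y(N); since x(N) = 0, one more pass through the feedback path.
    return w_inv * y


x = [1.0, 2.0, 3.0, 4.0]
goertzel_spectrum = [goertzel_bin(x, k) for k in range(len(x))]
```

Each bin uses one complex multiplier $N$ times in sequence, so the count of $4 N$ real multiplications per bin is spread over time rather than over hardware.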

### 3.2 Real-Time Recursive DFT for Analogue Signal

The previous section explained the real-time recursive DFT architecture which was designed for digital signal processing. In this section, the proposed real-time recursive DFT architecture which is designed for analogue signal processing is explained. In equation (3.1), consider $a(n)=x(n) W_{N}^{n k}$. Expanding $a(n)$ gives

$$
\begin{align*}
& a_{R e}(n)=x_{R e}(n) \cos \left(\frac{2 \pi k n}{N}\right)+x_{I m}(n) \sin \left(\frac{2 \pi k n}{N}\right)  \tag{3.6a}\\
& a_{I m}(n)=x_{I m}(n) \cos \left(\frac{2 \pi k n}{N}\right)-x_{R e}(n) \sin \left(\frac{2 \pi k n}{N}\right) \tag{3.6b}
\end{align*}
$$

Accordingly, $X(k)$ is computed by multiplying samples of $x(t)$ by samples of $e^{-j(2 \pi f t)}=\cos (2 \pi f t)-j \sin (2 \pi f t)$, where $f=k / N$. Replacing the discrete samples with piecewise continuous signals gives

$$
\begin{align*}
& a_{R e}(t)=x_{R e}(t) \cos \left(\frac{2 \pi k t}{N}\right)+x_{I m}(t) \sin \left(\frac{2 \pi k t}{N}\right)  \tag{3.7a}\\
& a_{I m}(t)=x_{I m}(t) \cos \left(\frac{2 \pi k t}{N}\right)-x_{R e}(t) \sin \left(\frac{2 \pi k t}{N}\right)  \tag{3.7b}\\
& \text { for } \quad \frac{n T}{N} \leq t<\frac{(n+1) T}{N} \quad n=0,1, \ldots, N-1
\end{align*}
$$

where $T$ is the duration of $N$ samples. Thereby, $x(t)$ is piecewise weighted by the DFT coefficients. Hence, multiplications are performed without sampling.

In equation (3.1), $x(n)$ is in the time-domain and $X(k)$ is in the frequency-domain. Since DFT architecture is a discrete-time system, $X(k)$ is the response of the system when $n=N-1$. Considering $X(k)=y(N-1)$, equation (3.1) can be written in the time domain.

$$
\begin{equation*}
y(N-1)=\sum_{n=0}^{N-1} a(n) \tag{3.8}
\end{equation*}
$$

where $a(n)=x(n) W_{N}^{n k}$. The above equation describes a discrete-time integrator. To obtain the difference equation of the integrator, equation (3.8) can be rewritten as

$$
\begin{equation*}
y(N-1)=a(N-1)+\sum_{n=0}^{N-2} a(n) \tag{3.9}
\end{equation*}
$$

Also,

$$
\begin{equation*}
y(N-2)=\sum_{n=0}^{N-2} a(n) \tag{3.10}
\end{equation*}
$$

Combining equations (3.9) and (3.10) gives

$$
\begin{equation*}
y(N-1)=a(N-1)+y(N-2) \tag{3.11}
\end{equation*}
$$

The z-transform of the above difference equation is

$$
\begin{equation*}
z^{-1} Y(z)=z^{-1} A(z)+z^{-2} Y(z) \tag{3.12}
\end{equation*}
$$

Accordingly, the transfer function of the discrete-time integrator is given by

$$
\begin{equation*}
H(z)=\frac{Y(z)}{A(z)}=\frac{1}{1-z^{-1}} \tag{3.13}
\end{equation*}
$$

The block diagram representation of the integrator based on equation (3.13) is shown in Figure 3.3. The proposed real-time recursive DFT architecture is depicted in Figure 3.4. The piecewise Sine and Cosine waves can be generated by the Digital to Analogue Converter (DAC).


Figure 3-3: block diagram of a recursive difference equation representing the discrete-time integrator



Figure 3-4: architecture of the proposed real-time recursive DFT
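A behavioural sketch of the multiply-then-integrate structure is given below. It is an idealised floating-point model under stated assumptions: the coefficient that the text attributes to a DAC-generated piecewise sine and cosine is computed with `cmath.exp`, the integrator of equation (3.13) is a running sum, and device mismatch is ignored. The function names are illustrative.

```python
import cmath


def recursive_dft_bin(x, k):
    """One channel: weight each sample by W_N^{nk}, then integrate."""
    N = len(x)
    y = 0j
    for n in range(N):
        a = x[n] * cmath.exp(-2j * cmath.pi * k * n / N)   # a(n) = x(n) W_N^{nk}
        y = a + y                                          # y(n) = a(n) + y(n-1)
    return y                                               # y(N-1) = X(k)


def variable_length_dft(x):
    """Compute every bin independently; the structure is the same for any N."""
    return [recursive_dft_bin(x, k) for k in range(len(x))]


short_block = variable_length_dft([1.0, 2.0, 3.0, 4.0])          # N = 4
long_block = variable_length_dft([float(n) for n in range(8)])   # N = 8
```

Changing the transform length only changes the number of loop iterations and the coefficient sequence, not the structure of the channel. Unlike the Goertzel recursion, the multiplier here sits outside the feedback loop.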

### 3.3 Advantages of the Proposed Architecture

The analogue Fourier transform architectures that are available in the literature (FIR DFT and FFT) were explained in the previous chapter. This section provides a comparison between the previous analogue Fourier transform architectures and the proposed analogue DFT architecture. Also, the advantage of the proposed DFT architecture over the previous real-time recursive DFT architecture (Goertzel DFT) is discussed. Table 3-1 shows the computational efficiency and the resource costs of the aforementioned DFT architectures.

Table 3-1: computational efficiency and resource costs of different DFT architectures

| Architecture | Number of Multipliers | Number of Multiplications |
| :---: | :---: | :---: |
| FIR DFT | $4 N^{2}$ | $4 N^{2}$ |
| Radix-2 FFT | $4 N+16 \sum_{k=2}^{\log _{2} N} \frac{N}{2^{k}}+12 \sum_{k=3}^{\log _{2} N} \frac{N}{4}$ | $4 N+16 \sum_{k=2}^{\log _{2} N} \frac{N}{2^{k}}+12 \sum_{k=3}^{\log _{2} N} \frac{N}{4}$ |
| Goertzel DFT | $4 N$ | $4 N^{2}$ |
| Proposed DFT | $4 N$ | $4 N^{2}$ |

In the FIR DFT and FFT architectures, the number of multiplications and the number of multipliers are equal while in the real-time recursive DFT architectures each multiplier performs $N$ multiplications. The Goertzel DFT and the proposed DFT require $4 N^{2}$ multiplications to compute $X(k)$ for different values of $k$. These multiplications are performed by $4 N$ multipliers.

Serial-to-parallel conversion in the FIR DFT and FFT architectures relaxes the bandwidth requirement of the multipliers. Hence, in these architectures the operating frequency of each multiplier is $f_{M}=f_{\text {in }} / N$, where $f_{\text {in }}$ is the frequency of the input signal. In the proposed architecture, on the other hand, each multiplier operates at $f_{\text {in }}$.

The total power consumption of multipliers is $P_{T}=M P_{M}$, where $M$ is the number of multipliers, and $P_{M}$ is the power consumption of each multiplier. Also, $P_{M} \propto f_{M}$. Hence, the FIR DFT and the real-time recursive DFT both have $P_{T} \propto 4 N f_{\text {in }}$. Thus, reduction of the number of multipliers does not reduce the power dissipation.
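The scaling argument above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original work; the proportionality constant `k` and the function names are illustrative assumptions, and only the multiplier counts from Table 3-1 and the multiplier frequencies stated above are used.

```python
def total_multiplier_power(num_multipliers, multiplier_rate_hz, k=1.0):
    """P_T = M * P_M with P_M = k * f_M (k is an arbitrary, assumed constant)."""
    return num_multipliers * k * multiplier_rate_hz


def fir_dft_power(n, f_in_hz):
    # FIR DFT: 4N^2 multipliers, each operating at f_in / N.
    return total_multiplier_power(4 * n * n, f_in_hz / n)


def recursive_dft_power(n, f_in_hz):
    # Real-time recursive DFT: 4N multipliers, each operating at f_in.
    return total_multiplier_power(4 * n, f_in_hz)


if __name__ == "__main__":
    f_in = 20e6  # example input frequency in Hz
    for n in (8, 64, 2048):
        print(n, fir_dft_power(n, f_in), recursive_dft_power(n, f_in))
```

Under this simple model both architectures give the same product $4 N f_{\text {in }}$ for every $N$, which is the point made in the paragraph above.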

Since analogue multipliers are hardwired, they are biased whether they are in use or not. Accordingly, in the FIR DFT and FFT architectures, the power consumption is not scalable with the transform length. However, since the proposed architecture performs multiplications serially, its power consumption is scalable with the transform length.

Unlike the previous architectures, the proposed architecture does not require additional multipliers to compute the DFT of a longer sequence. Hence, the proposed architecture is especially suitable for variable-length DFT processors.

While the computational errors propagate in the FFT lattice (Figure 2.12) and affect all results, the proposed architecture (Figure 3.4) avoids the propagation of computational errors by computing DFTs independently.

In the classical OFDM receiver, the signal is sampled before digitization. According to the Nyquist theorem, the sampling frequency must be at least twice the signal frequency. Thus, the signal must be decimated before it is processed by the digital FFT (Figure 3.5(a)) [1]. The FIR DFT, the analogue FFT, and the Goertzel DFT require a sampled signal. Thus, all these architectures need an analogue decimation filter ahead of them (Figure 3.5(b)). The simplest realization of an analogue decimation filter is a $D$-tap FIR filter, which loads $D$ successive samples into $D$ capacitors and then sums their charges [75]. In the proposed DFT architecture, multiplications are performed before sampling. Hence, by using the proposed real-time recursive DFT processor, the analogue decimation filter is eliminated (Figure 3.5(c)).

Figure 3-5: baseband signal processing section in (a) the classical OFDM receiver (b) the OFDM receiver with an analogue FFT or FIR DFT or Goertzel DFT (c) the OFDM receiver with the proposed DFT

### 3.4 Summary

In this chapter, the design techniques that are applied to make the proposed architecture reconfigurable and suitable for the multi-standard OFDM transceivers were discussed. The optimal architecture for the analogue DFT is achieved by keeping the signal continuous as long as possible. To this end, the DFT coefficients are formed into piecewise continuous signals. Thereby, the transform length can be changed by changing the coefficient signals. Instead of dedicating multipliers to individual samples of the signal, multipliers perform $N$ multiplications serially. Also, the power consumption of the proposed architecture is scalable with the transform length. Moreover, the proposed DFT architecture does not require an analogue decimation filter. Performance of the proposed DFT architecture is analysed in the next chapter.

## Chapter 4

## System Performance Analysis

In this chapter, the performance metrics and the behavioural models for the Fourier Transform processor are defined. The performance requirements of the Analogue DFT processor are derived. The behavioural model is used to make the system simulations for the real-time recursive DFT processor and the analogue FFT processor. Finally, the performance of the simulated systems is analysed by applying the Monte Carlo method.

### 4.1 Performance Metrics for DFT Processor

In digital communication systems, the Error Vector Magnitude (EVM) is a measure that is used to quantify the performance. By definition, EVM is the Root Mean Square (RMS) of the difference between the ideal symbols and the demodulated symbols [1].

$$
\begin{equation*}
E V M=\sqrt{\frac{\frac{1}{N} \sum_{k=0}^{N-1}\left[\left(I_{\text {out }}(k)-I_{\text {ideal }}(k)\right)^{2}+\left(Q_{\text {out }}(k)-Q_{\text {ideal }}(k)\right)^{2}\right]}{\frac{1}{N} \sum_{k=0}^{N-1}\left[I_{\text {ideal }}(k)^{2}+Q_{\text {ideal }}(k)^{2}\right]}} \tag{4.1}
\end{equation*}
$$

where $I(k)$ and $Q(k)$ are the In-phase and Quadrature components of the $k^{\text {th }}$ symbol. Hence, EVM is the square root of the ratio of the noise and distortion power to the signal power, that is, the square root of the inverse of the Signal to Noise and Distortion Ratio (SNDR).

$$
\begin{equation*}
E V M=\sqrt{\frac{\text { Noise }+ \text { Distortion Power }}{\text { Signal Power }}}=\frac{1}{\sqrt{S N D R}} \tag{4.2}
\end{equation*}
$$

Thereby

$$
\begin{equation*}
S N D R=\frac{1}{E V M^{2}} \tag{4.3}
\end{equation*}
$$

which in decibels is

$$
\begin{equation*}
S N D R_{d B}=10 \log _{10}\left(\frac{1}{E V M^{2}}\right)=20 \log _{10}\left(\frac{1}{E V M}\right) \tag{4.4}
\end{equation*}
$$

Thus

$$
\begin{equation*}
S N D R_{d B}=20 \log _{10} \sqrt{\frac{\frac{1}{N} \sum_{k=0}^{N-1}\left[I_{\text {ideal }}(k)^{2}+Q_{\text {ideal }}(k)^{2}\right]}{\frac{1}{N} \sum_{k=0}^{N-1}\left[\left(I_{\text {out }}(k)-I_{\text {ideal }}(k)\right)^{2}+\left(Q_{\text {out }}(k)-Q_{\text {ideal }}(k)\right)^{2}\right]}} \tag{4.5}
\end{equation*}
$$
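The EVM and SNDR definitions above translate directly into code. The following is a minimal Python sketch, assuming that each symbol is represented as a complex number $I+jQ$; the function names `evm` and `sndr_db` are illustrative and do not come from the thesis.

```python
import math


def evm(ideal, measured):
    """RMS error vector magnitude, equation (4.1); symbols are complex I + jQ."""
    if len(ideal) != len(measured) or not ideal:
        raise ValueError("ideal and measured must be non-empty and equal in length")
    error_power = sum(abs(m - i) ** 2 for i, m in zip(ideal, measured))
    signal_power = sum(abs(i) ** 2 for i in ideal)
    return math.sqrt(error_power / signal_power)


def sndr_db(ideal, measured):
    """SNDR in decibels, equation (4.4): 20 log10(1 / EVM)."""
    e = evm(ideal, measured)
    if e == 0.0:
        return math.inf  # error-free demodulation
    return 20.0 * math.log10(1.0 / e)


if __name__ == "__main__":
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    scaled = [1.1 * s for s in qpsk]  # 10 % amplitude error on every symbol
    print(evm(qpsk, scaled), sndr_db(qpsk, scaled))
```

In the usage example, a uniform 10 % amplitude error corresponds to an EVM of about 0.1 and therefore an SNDR of about 20 dB.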

The performance of the DFT processor must be evaluated at weak and strong signal levels [41]. Therefore, the aim of the simulations is to measure the SNDR as a function of the input signal magnitude. A typical SNDR versus input magnitude curve is shown in Figure 4.1. At weak signal levels, noise and distortion corrupt the SNDR. As the magnitude increases, impact of the noise and distortion on the SNDR decreases. At full scale signal, clipping reduces the SNDR rapidly. Hence, the input magnitude that gives the peak SNDR is the optimal operating point of the circuit. However, the signal is not equalized before entering the DFT processor; thus, it is a mixture of strong and weak sub-channels. Hence, the dynamic range of the circuit is the main performance metric. By definition, dynamic range is the ratio of the maximum input level that the circuit can tolerate to the minimum input level that it can detect. In logarithmic scale, dynamic range is the difference between the maximum and minimum acceptable input levels, which is the width of the SNDR curve at the minimum required SNDR [41].


Figure 4-1: Typical SNDR versus input magnitude curve [41]

### 4.2 Performance Requirements

Minimum receiver SNDR requirements that guarantee a Bit Error Ratio (BER) of $10^{-6}$ in an Additive White Gaussian Noise (AWGN) channel are given in Table 4-1 [33, 34]. Since 64-QAM provides the highest data rate for both WiFi and WiMAX, it is the modulation scheme that is most sensitive to distortion and noise. Accordingly, 64-QAM has the highest SNDR requirement. The dynamic range of the analogue DFT is determined by considering the minimum required SNDR and the maximum signal level that the receiver should tolerate. The OFDM symbol is composed of a large number of modulated subcarriers ( $N \gg 1$ ). Hence, according to the Central Limit Theorem (CLT), the OFDM symbol appears as Gaussian noise in the time domain.

Table 4-1: Receiver performance requirements for $\mathrm{BER}=10^{-6}$

| Modulation | Coding rate | Receiver SNDR (dB) |
| :---: | :---: | :---: |
| BPSK | $1 / 2$ | 3 |
| QPSK | $1 / 2$ | 5 |
| QPSK | $3 / 4$ | 8 |
| 16-QAM | $1 / 2$ | 11 |
| 16-QAM | $3 / 4$ | 14 |
| 64-QAM | $1 / 2$ | 16 |
| 64-QAM | $2 / 3$ | 18 |
| 64-QAM | $3 / 4$ | 20 |

Therefore, the Peak to Average Power Ratio (PAPR) of the signal, which is the ratio between the maximum instantaneous power and the mean power, can be very high. If clipping limits the PAPR, the SNDR will be degraded. Due to the statistical nature of the PAPR for OFDM signals, the probability of exceeding a given PAPR is estimated by a Complementary Cumulative Distribution Function (CCDF). Figure 4.2 shows the PAPR CCDFs of two OFDM signals for the WiFi and WiMAX standards. Both signals are modulated with 64-QAM. Although WiFi and WiMAX have different numbers of subcarriers (i.e. 64 and 2048 respectively), their CCDFs are nearly identical. Accordingly, OFDM symbols have a consistent PAPR distribution [1].
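The PAPR and its empirical CCDF, as defined above, can be sketched in a few lines. The following Python sketch is illustrative only: the function names are assumptions, and the generation of the OFDM symbols themselves (the source of the time-domain `samples`) is left out.

```python
import math


def papr_db(samples):
    """Peak to Average Power Ratio of one symbol's time-domain samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


def ccdf(papr_values_db, threshold_db):
    """Empirical probability that the PAPR exceeds threshold_db."""
    exceed = sum(1 for p in papr_values_db if p > threshold_db)
    return exceed / len(papr_values_db)


if __name__ == "__main__":
    print(papr_db([1, 1, 1, 1]))      # constant envelope
    print(papr_db([2, 0, 0, 0]))      # single peak among four samples
    print(ccdf([1.0, 2.0, 3.0, 4.0], 2.5))
```

A constant-envelope signal has a PAPR of 0 dB, whereas a signal that concentrates all its energy in one of four samples has a PAPR of $10 \log _{10} 4 \approx 6$ dB.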


Figure 4-2: PAPR CCDFs of two OFDM signals with WiFi and WiMAX standards [1]

The block diagrams of the baseband signal processing part of the classical OFDM receiver and the proposed receiver architecture are shown in Figure 4.3. The channel selection filter cannot eliminate the Adjacent Channel Interference (ACI) completely. Thus, when the ACI is stronger than the desired signal, the ACI makes the largest contribution to the received signal amplitude. The Automatic Gain Control (AGC) sets the peak signal level to the full scale level of the next stage. Hence, in the classical architecture, the desired signal might fall below the quantization level of the ADC if no safety margin is included in the dynamic range of the ADC [76]. When the DFT processor is placed ahead of the ADC, the signal is processed without quantization. However, the noise and distortion of the analogue DFT corrupt the desired signal. Hence, a safety margin for the dynamic range of the analogue DFT is required.

Since the analogue front-end stages before the ADC (in the classical OFDM receiver) and before the analogue DFT (in the proposed receiver) are the same, the dynamic range requirements of the ADC and the analogue DFT are the same. In other words, reducing the dynamic range requirement of the ADC by moving the DFT processor from the digital back-end to the analogue front-end comes at the cost of increasing the dynamic range requirement of the DFT processor.

Figure 4-3: The block diagram of the baseband signal processing part of (a) the classical OFDM receiver (b) the proposed OFDM receiver

In the estimation of the dynamic range, the AGC inaccuracy, the residual DC offset, and the thermal noise of the analogue front-end must also be taken into account [1, 76]. A graphical decomposition of the analogue DFT dynamic range is depicted in Figure 4.4. As the graph indicates, the analogue DFT processor requires a dynamic range between 34 dB and 51 dB for the different modulation schemes of WiFi and WiMAX.


Figure 4-4: Analogue DFT dynamic range derivation

### 4.3 Behavioural Modelling

Behavioural system simulation is a top-down approach that is used to evaluate and optimize the performance of the proposed architecture. The behavioural model is based on the functions of the building blocks of the system. According to equation (3.1), multipliers and integrators are the main building blocks of the Fourier transform. This section describes the behavioural models of the multiplier and the integrator. It also explains how these models are used to simulate the real-time recursive DFT processor and the FFT processor.

### 4.3.1 Behavioural Model of the Multiplier

One approach to implement an analogue multiplier is to scale the current of the signal using a variable gain transconductor. Figure 4.5 depicts the block diagram of an analogue multiplier that scales the input signal (voltage $V_{1}$ ) by the variable gain (voltage $V_{2}$ ), and converts the output current to voltage by a transresistor.


Figure 4-5: Block diagram of the analogue multiplier

The behavioural model of the multiplier is defined by parameters that are derived from two functions, $I_{\text {out }}=f\left(V_{\text {in }}\right)$ and its derivative $G_{m}=f^{\prime}\left(V_{\text {in }}\right)$ (Figure 4.6). The model parameters extracted from the $I_{\text {out }}$ versus $V_{\text {in }}$ curve are $I_{\max }$ (the DC bias current), and $V_{i n, o s}$ (the input offset voltage). From the $G_{m}$ versus $V_{i n}$ curve, the model parameters are $G_{m o}$ (the small signal transconductance), $G_{m, o s}$ (the deviation in the $G_{m o}$ at $V_{i n}=0$ ), $a$ (the extent of the quasi-linear region), $b$ (swing of the input voltage), $A_{r}$ (the magnitude of the ripple in quasi-linear region), $\gamma$ (the slope of the quasi-linear region), and N (the number of ripples in quasi-linear region).


Figure 4-6: Curves of the multiplier behavioural model.
(a) Input-Output characteristic of transconductance (b) the derivative of (a) [77]

Using the aforementioned parameters, the $I_{\text {out }}=f\left(V_{\text {in }}\right)$ is defined as [77]

$$
\begin{equation*}
I_{\text {out }}=\left\{\begin{array}{lr}
-\frac{A_{1} b}{2}-\frac{A_{2} a}{2} & V_{i n} \leq-b \\
\frac{A_{1}(a-b)}{2 \pi} \sin \left(\pi \frac{V_{i n}+a}{a-b}\right)+\frac{A_{1} V_{i n}-a A_{2}}{2} & -b<V_{i n} \leq-a \\
G_{m o}\left[(-1)^{N} \frac{a A_{r}}{2 N \pi} \sin \left(\frac{\pi N V_{i n}}{a}\right)+\left(1+G_{m, o s}+\frac{A_{r}}{2}\right) V_{i n}+\frac{\gamma}{2 a} V_{i n}^{2}\right] & -a<V_{i n} \leq a \\
\frac{A_{3}(b-a)}{2 \pi} \sin \left(\pi \frac{V_{i n}-a}{b-a}\right)+\frac{A_{3} V_{i n}+a A_{2}}{2} & a<V_{i n} \leq b \\
\frac{A_{3} b}{2}+\frac{A_{2} a}{2} & V_{i n}>b
\end{array}\right. \tag{4.6a}
\end{equation*}
$$

where

$$
\begin{align*}
A_{1} & =G_{m o}\left(1+A_{r}+G_{m, o s}-\gamma\right) \\
A_{2} & =G_{m o}\left(1+G_{m, o s}\right) \\
A_{3} & =G_{m o}\left(1+A_{r}+G_{m, o s}+\gamma\right) \\
b & =\frac{2 I_{\max }-A_{2} a}{A_{3}} \tag{4.6b}
\end{align*}
$$

As equation (4.6a) indicates, multiplication occurs in the quasi-linear region $\left(-a \leq V_{i n} \leq a\right)$, where the signal is scaled by $G_{m o}$. Thus, ideally the input-output characteristic must be a straight line in the $[-a, a]$ interval. In reality, the nonlinear behaviour of the multiplier causes the input-output characteristic to deviate from a straight line. Hence, the interval $[-a, a]$ is quasi-linear [77].

In Simulink, the transconductance multiplier is modelled by a MATLAB Function block which provides the function of $I_{\text {out }}$ (equation 4.6). The MATLAB code of this function is provided in Appendix A.
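For readers without access to the appendix, the piecewise function of equation (4.6) can be sketched as follows. This is a hypothetical Python transcription of the equation, not the MATLAB code of Appendix A; the function and argument names are assumptions, the ripple term is read as $(-1)^{N}$ (the reading that keeps the slope continuous at $V_{i n}= \pm a$), and the input offset $V_{i n, o s}$ is not included because it does not appear in equation (4.6).

```python
import math


def multiplier_current(v_in, g_mo, a, i_max, a_r=0.0, g_mos=0.0, gamma=0.0, n_ripples=1):
    """Output current I_out = f(V_in) following equations (4.6a) and (4.6b)."""
    a1 = g_mo * (1 + a_r + g_mos - gamma)
    a2 = g_mo * (1 + g_mos)
    a3 = g_mo * (1 + a_r + g_mos + gamma)
    b = (2 * i_max - a2 * a) / a3  # input swing, equation (4.6b)

    if v_in <= -b:  # negative saturation
        return -a1 * b / 2 - a2 * a / 2
    if v_in <= -a:  # negative transition region
        return (a1 * (a - b) / (2 * math.pi) * math.sin(math.pi * (v_in + a) / (a - b))
                + (a1 * v_in - a * a2) / 2)
    if v_in <= a:  # quasi-linear region
        ripple = ((-1) ** n_ripples * a * a_r / (2 * n_ripples * math.pi)
                  * math.sin(math.pi * n_ripples * v_in / a))
        return g_mo * (ripple + (1 + g_mos + a_r / 2) * v_in + gamma / (2 * a) * v_in ** 2)
    if v_in <= b:  # positive transition region
        return (a3 * (b - a) / (2 * math.pi) * math.sin(math.pi * (v_in - a) / (b - a))
                + (a3 * v_in + a * a2) / 2)
    return a3 * b / 2 + a2 * a / 2  # positive saturation, equal to i_max


if __name__ == "__main__":
    for v in (-1.0, -0.3, 0.0, 0.1, 0.3, 1.0):
        print(v, multiplier_current(v, g_mo=200e-6, a=0.2, i_max=80e-6))
```

With all non-ideal parameters set to zero, the sketch reduces to $I_{\text {out }}=G_{m o} V_{i n}$ in the quasi-linear region and saturates at $I_{\max }$ for large positive inputs.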

### 4.3.2 Behavioural Model of the Integrator

The signal at the output of the multiplier is piecewise continuous. For an N-point DFT, the amplitude of N pieces must be summed together. The discrete-time integrator takes samples of each piece and provides their sum. The z-domain transfer function of the discrete-time integrator is [78]

$$
\begin{equation*}
H(z)=g \frac{z^{-1}}{1-\alpha z^{-1}} \tag{4.7}
\end{equation*}
$$

where $g$ and $\alpha$ are the gain and the leakage of the integrator, respectively. This transfer function can be realized by the Switched-Capacitor (SC) integrator (Figure 4.7) [78]. $C_{S}$ is the sampling capacitor and $C_{I}$ is the integrating capacitor. The timing diagram of the switches is provided in chapter 5.


Figure 4-7: Switched-Capacitor integrator [78]

The transfer function of the SC integrator is modelled in Simulink (Figure 4.8). The integrator provides the result of the $N$-point DFT after $N f_{S} / f_{\text {in }}$ iterations ( $f_{\text {in }}$ is the frequency of the input signal, and $f_{S}$ is the sampling frequency of the delay block). Thus, the integrator should be reset to zero after $N f_{S} / f_{\text {in }}$ iterations. To adjust the reset time, the delay block is placed in the feedback loop. The integrator leakage $(\alpha)$ is modelled by a Gain block.


Figure 4-8: behavioural model of the switched-capacitor integrator in Simulink

In the presence of mismatch, the Operational amplifier (Op-amp) suffers from a DC offset at its output. The output DC offset can be represented by the input-referred offset voltage, i.e. the input voltage that makes the output voltage zero. The input-referred offset voltage is modelled by $V_{o s}$. For an ideal integrator, $\alpha=1$ and $V_{o s}=0$. The sensitivity of the recursive DFT processor to $\alpha$ and $V_{o s}$ is analysed in section 4.4.3.
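The behaviour of the integrator model can be illustrated with the difference equation that corresponds to equation (4.7), $y[n]=\alpha y[n-1]+g x[n-1]$. The following Python sketch is a simplified stand-in for the Simulink model, not a reproduction of it; the function name is hypothetical, and adding $V_{o s}$ directly to the input sample is an assumption about how the offset enters the model.

```python
def sc_integrator(samples, g=1.0, alpha=1.0, v_os=0.0):
    """Leaky discrete-time integrator, H(z) = g z^-1 / (1 - alpha z^-1).

    Returns the output after each input sample, starting from a reset state.
    Element i of the result is y[i + 1], reflecting the one-sample delay.
    """
    y = 0.0  # integrator is reset before a new DFT computation
    outputs = []
    for x in samples:
        # v_os is added at the input: an assumption made for illustration.
        y = alpha * y + g * (x + v_os)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    print(sc_integrator([1.0, 2.0, 3.0]))             # ideal: running sum
    print(sc_integrator([1.0, 1.0], alpha=0.5))       # leaky: earlier samples decay
```

With $\alpha=1$ and $V_{o s}=0$ the output is the running sum of the inputs; with $\alpha<1$ earlier samples are progressively attenuated.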

### 4.3.3 Behavioural Modelling of the FFT Processor

The analogue FFT architecture was explained in chapter 2. Here, the behavioural model of the multiplier is used to model a radix-2 FFT processor of length 8. Considering the 2-point DFT (Figure 2.13) as the unit cell of the FFT, the signal flow graph in Figure 2.12 can be rearranged as illustrated in Figure 4.9. Since $x(n)$ and $W_{N}^{n k}$ are complex, each of the signal flow lines in this diagram represents two signal flow lines in the Simulink model.


Figure 4-9: signal flow graph of a Radix-2 DIT FFT of length 8 [41]

Considering $a=a_{r e}+j a_{i m}$ and $b=b_{r e}+j b_{i m}$ as the inputs of the 2-point DFT, results of the 2-point DFT are

$$
\begin{align*}
& A=A_{r e}+j A_{i m}=a+W_{N}^{n k} b  \tag{4.8a}\\
& B=B_{r e}+j B_{i m}=a-W_{N}^{n k} b \tag{4.8b}
\end{align*}
$$

where $W_{N}^{n k}=\cos (2 \pi k n / N)-j \sin (2 \pi k n / N)$.

Accordingly,

$$
\begin{align*}
& A_{r e}=a_{r e}+b_{r e} \cos \left(\frac{2 \pi k n}{N}\right)+b_{i m} \sin \left(\frac{2 \pi k n}{N}\right)  \tag{4.9a}\\
& A_{i m}=a_{i m}-b_{r e} \sin \left(\frac{2 \pi k n}{N}\right)+b_{i m} \cos \left(\frac{2 \pi k n}{N}\right)  \tag{4.9b}\\
& B_{r e}=a_{r e}-b_{r e} \cos \left(\frac{2 \pi k n}{N}\right)-b_{i m} \sin \left(\frac{2 \pi k n}{N}\right)  \tag{4.9c}\\
& B_{i m}=a_{i m}+b_{r e} \sin \left(\frac{2 \pi k n}{N}\right)-b_{i m} \cos \left(\frac{2 \pi k n}{N}\right) \tag{4.9d}
\end{align*}
$$

Thus, coefficient values are $1, \cos (2 \pi k n / N)$, and $\sin (2 \pi k n / N)$. The transconductance values that represent these coefficient values are

$$
\begin{gather*}
G_{m 1}=G_{m o}  \tag{4.10a}\\
G_{m c}=\cos \left(\frac{2 \pi k n}{N}\right) G_{m o}  \tag{4.10b}\\
G_{m s}=\sin \left(\frac{2 \pi k n}{N}\right) G_{m o} \tag{4.10c}
\end{gather*}
$$
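Equations (4.9a) to (4.9d) can be checked numerically against the complex form of equations (4.8a) and (4.8b). The following is a minimal Python sketch with hypothetical function names; it is a verification aid and does not model the transconductors.

```python
import cmath
import math


def butterfly_real(a_re, a_im, b_re, b_im, k, n, N):
    """2-point DFT written with real arithmetic, equations (4.9a) to (4.9d)."""
    theta = 2 * math.pi * k * n / N
    c, s = math.cos(theta), math.sin(theta)
    A_re = a_re + b_re * c + b_im * s
    A_im = a_im - b_re * s + b_im * c
    B_re = a_re - b_re * c - b_im * s
    B_im = a_im + b_re * s - b_im * c
    return complex(A_re, A_im), complex(B_re, B_im)


def butterfly_complex(a, b, k, n, N):
    """2-point DFT in complex form, equations (4.8a) and (4.8b)."""
    w = cmath.exp(-2j * math.pi * k * n / N)  # W_N^{nk} = cos(.) - j sin(.)
    return a + w * b, a - w * b


if __name__ == "__main__":
    a, b = 0.3 - 0.7j, -0.2 + 0.5j
    print(butterfly_real(a.real, a.imag, b.real, b.imag, 1, 1, 8))
    print(butterfly_complex(a, b, 1, 1, 8))
```

Both functions return the same pair $(A, B)$ for any twiddle factor, which confirms that the four real equations are a term-by-term expansion of the complex butterfly.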

Figure 4.10 shows the Simulink model of the 2-point DFT with the $W_{8}^{1}$ or $W_{8}^{3}$ twiddle factor. Transresistors are modelled by Gain blocks.


Figure 4-10: 2-point DFT with $W_{8}^{1}$ or $W_{8}^{3}$ twiddle factor

For $W_{8}^{0}=1$, outputs of the 2-point DFT are

$$
\begin{align*}
& A=\left(a_{r e}+b_{r e}\right)+j\left(a_{i m}+b_{i m}\right)  \tag{4.11a}\\
& B=\left(a_{r e}-b_{r e}\right)+j\left(a_{i m}-b_{i m}\right) \tag{4.11b}
\end{align*}
$$

Additions are performed by connecting the outputs of the transconductors to the same node (KCL). Hence, even though all coefficient values are one, voltage samples must be converted to currents. Simulink model of the 2-point DFT with $W_{8}^{0}$ twiddle factor is depicted in Figure 4.11.


Figure 4-11: 2-point DFT with $W_{8}^{0}$ twiddle factor

For $W_{8}^{2}=-j$, outputs of the 2-point DFT are

$$
\begin{align*}
& A=\left(a_{r e}+b_{i m}\right)+j\left(a_{i m}-b_{r e}\right)  \tag{4.12a}\\
& B=\left(a_{r e}-b_{i m}\right)+j\left(a_{i m}+b_{r e}\right) \tag{4.12b}
\end{align*}
$$

Simulink model of the 2-point DFT with $W_{8}^{2}$ twiddle factor is shown in Figure 4.12.


Figure 4-12: 2-point DFT with $W_{8}^{2}$ twiddle factor

The above 2-point DFTs are used to build the FFT lattice in Figure 4.9. The block diagram of the analogue FFT processor is shown in Figure 4.13. The FFT processor converts the input signal to parallel samples by a serial-to-parallel converter. The FFT lattice provides the Fourier transform of the signal. Finally, the parallel outputs of the FFT lattice are converted to a serial data stream by the parallel-to-serial converter.


Figure 4-13: behavioural model of the analogue FFT processor in Simulink

### 4.3.4 Behavioural Modelling of the Recursive DFT Processor

Chapter 3 explained the proposed real-time recursive DFT architecture. In this section, a real-time recursive DFT processor of length 8 is modelled using the behavioural models of the multiplier and the integrator. Figure 4.14 shows a 1-point recursive DFT. The Cos and Sin blocks generate the piecewise continuous coefficients. The coefficient signals are applied to the transconductance multipliers.


Figure 4-14: 1-point DFT with piecewise continuous coefficients

Thus,

$$
\begin{align*}
& a_{R e}(t)=x_{R e}(t) \cos \left(\frac{2 \pi k t}{N}\right)+x_{I m}(t) \sin \left(\frac{2 \pi k t}{N}\right)  \tag{4.13a}\\
& a_{I m}(t)=x_{I m}(t) \cos \left(\frac{2 \pi k t}{N}\right)-x_{R e}(t) \sin \left(\frac{2 \pi k t}{N}\right)  \tag{4.13b}\\
& \text { for } \quad \frac{n T}{N} \leq t<\frac{(n+1) T}{N} \quad n=0,1, \ldots, N-1
\end{align*}
$$

are provided at the outputs of the Gain blocks (i.e. the transresistors). As mentioned in section 4.3.2, the integrator provides the result of the DFT after $N f_{S} / f_{\text {in }}$ iterations. Hence,

$$
\begin{align*}
& X_{R e}(k)=\frac{f_{S}}{f_{\text {in }}} \sum_{n=0}^{N-1} a_{R e}(n)  \tag{4.14a}\\
& X_{I m}(k)=\frac{f_{S}}{f_{i n}} \sum_{n=0}^{N-1} a_{I m}(n) \tag{4.14b}
\end{align*}
$$

The block diagram of the real-time recursive DFT processor is illustrated in Figure 4.15. Eight 1-point recursive DFTs are used in parallel to create an 8-point recursive DFT processor.
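The operation of a single 1-point recursive DFT, equations (4.13) and (4.14), can be sketched in software. The following Python sketch rests on several assumptions that are not taken from the Simulink model: the input is held constant during each interval $nT/N \leq t<(n+1)T/N$, the coefficients are constant within each interval at $\cos (2 \pi k n / N)$ and $\sin (2 \pi k n / N)$, and the multipliers and integrators are ideal ($g=1$, $\alpha=1$). The function names are hypothetical.

```python
import cmath
import math


def recursive_dft_bin(x, k, oversampling=4):
    """Accumulate one DFT bin serially; oversampling plays the role of f_S / f_in."""
    N = len(x)
    acc_re = 0.0
    acc_im = 0.0
    for n, xn in enumerate(x):
        c = math.cos(2 * math.pi * k * n / N)
        s = math.sin(2 * math.pi * k * n / N)
        a_re = xn.real * c + xn.imag * s   # equation (4.13a)
        a_im = xn.imag * c - xn.real * s   # equation (4.13b)
        for _ in range(oversampling):      # integrator samples each piece f_S/f_in times
            acc_re += a_re
            acc_im += a_im
    return complex(acc_re, acc_im)


def reference_dft_bin(x, k):
    """Textbook DFT bin X(k) used as a reference."""
    N = len(x)
    return sum(xn * cmath.exp(-2j * math.pi * k * n / N) for n, xn in enumerate(x))


if __name__ == "__main__":
    x = [1 + 0j, 0.5 - 0.5j, -1j, -0.5 + 0.25j, 0.75 + 0j, 0.1 + 0.9j, -0.3 - 0.2j, 0.6 + 0.4j]
    for k in range(len(x)):
        print(k, recursive_dft_bin(x, k), 4 * reference_dft_bin(x, k))
```

Under these assumptions the accumulated value equals $f_{S} / f_{\text {in }}$ times the textbook DFT bin, in agreement with equation (4.14). Running the function for $k=0, \ldots, 7$ mirrors the eight parallel 1-point DFTs of the processor.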


Figure 4-15: behavioural model of the real-time recursive DFT processor in Simulink

### 4.4 Determining the Design Specifications

In this section, design specifications of an 8-point recursive DFT processor are determined. For this purpose, an OFDM signal with QPSK modulation is applied to the input of the DFT processor and sensitivity of the DFT processor to each of the behavioural model parameters is analysed. The $\sigma\left(V_{i n, o s}\right), \sigma\left(G_{m, o s}\right)$ and $\sigma\left(A_{r}\right)$ model the mismatch between multipliers; thus, $V_{i n, o s}, G_{m, o s}$ and $A_{r}$ values are unique to each multiplier. The $\sigma\left(V_{o S}\right)$ models the mismatch between integrators; hence, $V_{o s}$ is unique to each integrator. Other parameters are global.

### 4.4.1 Power Budget

The objective is to design an analogue DFT processor that consumes less power than the digital FFT processor. A power-scalable variable-length digital FFT processor that was fabricated in 250 nm CMOS consumes 310 mW to perform an 8-point FFT at 200 MHz [79]. Normalizing the power consumption to the 180 nm technology and a frequency of 20 MHz gives

$$
\begin{equation*}
\text { Power }=310 \mathrm{~mW}\left(\frac{20 \mathrm{MHz}}{200 \mathrm{MHz}}\right)\left(\frac{180 \mathrm{~nm}}{250 \mathrm{~nm}}\right)\left(\frac{1.8 \mathrm{~V}}{2.5 \mathrm{~V}}\right)^{2}=11.6 \mathrm{~mW} \tag{4.15}
\end{equation*}
$$

The real-time recursive DFT requires $4 N$ multipliers and $2 N$ differential integrators to compute an $N$-point DFT. Treating each differential integrator as two single-ended integrators, the power consumption of the real-time recursive DFT processor is

$$
\begin{equation*}
\text { Power }_{\text {Recursive DFT }} \cong 4 N\left(\text { Power }_{\text {Multiplier }}+\text { Power }_{\text {Single-ended integrator }}\right) \tag{4.16}
\end{equation*}
$$

Accordingly,

$$
\begin{equation*}
\text { Power }_{\text {Recursive DFT }} \cong 4 N V_{D D}\left(I_{\text {Multiplier }}+I_{\text {Single-ended integrator }}\right) \tag{4.17}
\end{equation*}
$$

where $I_{\text {Multiplier }}$ and $I_{\text {Single-ended integrator }}$ are the current supplies of the multiplier and the single-ended integrator, respectively. $V_{D D}$ is the voltage supply of the multiplier and the integrator. In order to achieve a power consumption less than 11.6 mW for the DFT processor, $I_{\text {Multiplier }}=80 \mu A$ and $I_{\text {Single-ended integrator }}=50 \mu \mathrm{~A}$ are selected.
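The power budget arithmetic of equations (4.15) and (4.17) can be reproduced with the short Python sketch below. The function and parameter names are illustrative assumptions; the default values are the figures quoted in the text.

```python
def normalised_digital_power_w(p_ref_w=310e-3, f_ref_hz=200e6, f_hz=20e6,
                               l_ref_nm=250.0, l_nm=180.0, v_ref=2.5, v=1.8):
    """Scale the reference digital FFT power as in equation (4.15)."""
    return p_ref_w * (f_hz / f_ref_hz) * (l_nm / l_ref_nm) * (v / v_ref) ** 2


def recursive_dft_power_w(n, v_dd=1.8, i_mult_a=80e-6, i_int_a=50e-6):
    """Estimated power of the N-point recursive DFT, equation (4.17)."""
    return 4 * n * v_dd * (i_mult_a + i_int_a)


if __name__ == "__main__":
    print(normalised_digital_power_w() * 1e3, "mW (digital reference, normalised)")
    print(recursive_dft_power_w(8) * 1e3, "mW (8-point recursive DFT estimate)")
```

With the selected currents, the estimate for the 8-point recursive DFT is about 7.5 mW, which is below the normalised digital reference of about 11.6 mW.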

### 4.4.2 Design Specifications of the Multiplier

Considering the input-output characteristic of the transconductance (Figure 4.6(a)), the linear range of an ideal multiplier is $[-b, b]$. Hence, for an ideal multiplier,

$$
\begin{equation*}
b=\frac{I_{\max }}{G_{\operatorname{mo}}} \tag{4.18}
\end{equation*}
$$

In the previous section, $I_{\max }=80 \mu A$ was selected. The input-output characteristics of ideal multipliers with different values of $G_{m o}$ are shown in Figure 4.16. As the figure illustrates, increasing the $G_{m o}$ reduces the linear range of the multiplier. Gain of the multiplier is $A_{v}=G_{m o} R_{D}$, where $R_{D}$ is the resistance of the transresistor. $A_{v}=1 \mathrm{~V} / \mathrm{V}$ is selected; hence, $R_{D}=1 / G_{m o}$. SNDR curves for different values of $G_{m o}$ were obtained by running the behavioural system simulation (Figure 4.17). Results of this simulation indicate that smaller $G_{m o}$ results in better tolerance of high signal levels. Hence, $G_{m o}=200 \mu A / V$ is selected.


Figure 4-16: The input-output characteristics of ideal multipliers


Figure 4-17: SNDR curves for different values of $G_{m o}$

In practice, linear range of the multiplier is less than its input swing ( $a<b$ ). Considering $G_{m o}=200 \mu A / V$, SNDR curves for different linear ranges were obtained by running the behavioural system simulation (Figure 4.18). Results of this analysis indicate that a DFT processor with smaller linear region is less tolerant to high signal levels. Hence, the DFT processor with smaller linear region has smaller dynamic range. A non-ideal linear range of $a=b / 2=0.2 \mathrm{~V}$ is selected for the system performance analysis.


Figure 4-18: SNDR curves for different linear ranges

Device mismatches result in transconductance error [77]. Multipliers with various transconductance errors are modelled by assuming that $G_{m, o s}$ has a normal distribution with zero mean and standard deviation $\sigma\left(G_{m, o s}\right)$. Considering $G_{m o}=200 \mu A / V$ and $a=0.2 V$, the effect of transconductance error on the performance of the DFT processor is analysed. Typical values of $\sigma\left(G_{m, o s}\right)$ are obtained from a previous study on the analogue FFT processor [77]. Figure 4.19 illustrates the results of this analysis. These results indicate that the DFT processor with larger transconductance errors has smaller peak SNDR. On the plus side, the dynamic range of the DFT processor is not affected by the transconductance error.


Figure 4-19: SNDR curves for various transconductance errors

In deep submicron CMOS technologies transistor mismatches lead to significant DC offset [80]. Multipliers with various DC offsets are modelled by assuming that $V_{i n, o s}$ has a normal distribution with zero mean and standard deviation $\sigma\left(V_{i n, o s}\right)$. Considering $G_{m o}=200 \mu A / V$ and $a=0.2 V$, the impact of the DC offset mismatch on the performance of the DFT processor is analysed. Typical values of $\sigma\left(V_{i n, o s}\right)$ are obtained from a previous study on the analogue FFT processor [77]. Results of this analysis are illustrated in Figure 4.20. These results indicate that the DFT processor with larger DC offset mismatch is more susceptible to noise and distortion at low signal levels. Accordingly, the DFT processor with larger DC offset mismatch has smaller dynamic range and peak SNDR.


Figure 4-20: SNDR curves for various DC offset mismatches

### 4.4.3 Design Specifications of the Integrator

Based on the Nyquist theorem, the sampling frequency $\left(f_{S}\right)$ of the SC integrator must be at least twice the signal frequency $\left(f_{\text {in }}\right)$. Considering $f_{S} / f_{\text {in }}=4$, the result of the $N$-point DFT is

$$
\begin{equation*}
V_{\text {out }}=4 g \sum_{n=1}^{N} V_{\text {in }}(n) \tag{4.19}
\end{equation*}
$$

where $V_{\text {in }}(n)$ is the input voltage of the integrator at the $n^{\text {th }}$ time interval, and $g$ is the gain of the integrator. The outputs of two multipliers are added together and the result is applied to the input of the integrator. Hence, $V_{i n}(n)=V_{O 1}(n)+V_{O 2}(n)$, where $V_{O i}(n)$ is the output of the $i^{t h}$ multiplier at the $n^{\text {th }}$ time interval. To avoid the reduction of the SNDR due to Op-amp saturation,

$$
\begin{equation*}
V_{\text {out }} \leq V_{D D}-V_{i n, C M} \tag{4.20}
\end{equation*}
$$

where $V_{D D}$ is the supply voltage of the Op-amp, and $V_{i n, C M}$ is the input common-mode (CM) level of the Op-amp. In Figure 4.7, $V_{i n, C M}$ is shown by the ground symbol. The input of the integrator $\left(V_{i n}\right)$ is connected to the output of the multiplier. Hence, $V_{i n, C M}$ must be equal to the output CM level of the multiplier. Using a 1.8 V voltage supply, the output CM level of the multiplier is 1.2 V (section 5.2.3). Ideally, $g=C_{S} / C_{I}$ [78]. Substituting the aforementioned values in the equations (4.19) and (4.20) gives

$$
\begin{equation*}
4 \frac{C_{S}}{C_{I}} \sum_{n=1}^{N}\left[V_{O 1}(n)+V_{O 2}(n)\right] \leq 0.6 \tag{4.21}
\end{equation*}
$$

The linear range of the multiplier is $[-a, a]$. Since the maximum gain of the multiplier is one, the maximum output of the multiplier is $\left|V_{O, \max }\right|=a$. Assuming that

$$
\begin{equation*}
V_{O 1}(n)=V_{O 2}(n)=\left|V_{O, \max }\right| \quad \text { for } n=1, \ldots, N \tag{4.22}
\end{equation*}
$$

then, for an 8-point DFT,

$$
\begin{equation*}
64 \frac{C_{S}}{C_{I}} a \leq 0.6 \tag{4.23}
\end{equation*}
$$

The smallest capacitor that can hold the sampled voltage is $C_{S}=50 \mathrm{fF}$. In the previous section, $a$ is estimated to be 0.2 V. Hence, $C_{I}=1 \mathrm{pF}$ is selected.
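The capacitor sizing above can be reproduced numerically from equations (4.19) to (4.23). The following Python sketch uses hypothetical function names; it only restates the worst-case assumption of equation (4.22), in which both multiplier outputs sit at $a$ during all $N$ intervals.

```python
def worst_case_output_v(n_points, a_v, c_s_f, c_i_f, oversampling=4, inputs_per_integrator=2):
    """Worst-case integrator output: (f_S/f_in) * (C_S/C_I) * N * 2a."""
    return oversampling * (c_s_f / c_i_f) * n_points * inputs_per_integrator * a_v


def min_integrating_cap_f(n_points, a_v, c_s_f, v_swing, oversampling=4, inputs_per_integrator=2):
    """Smallest C_I that keeps the worst-case output within v_swing."""
    return oversampling * n_points * inputs_per_integrator * a_v * c_s_f / v_swing


if __name__ == "__main__":
    print(min_integrating_cap_f(8, 0.2, 50e-15, 0.6))       # bound on C_I in farads
    print(worst_case_output_v(8, 0.2, 50e-15, 1e-12))       # output with C_I = 1 pF
```

With the values quoted in the text, this strict worst-case bound evaluates to roughly 1.07 pF, and the worst-case output with $C_{I}=1 \mathrm{pF}$ is about 0.64 V, so the selected 1 pF can be read as a rounded value close to that limit.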

Ideally, the Op-amp has infinite open-loop gain $\left(A_{v}\right)$, so the integrator leakage is $\alpha=1$. In practice, however, $A_{v}<\infty$. Thus, only a fraction of the previous output of the integrator is added to the new input sample, which means that $\alpha<1$. The precise value of $\alpha$ is given by [78]

$$
\begin{equation*}
\alpha=1-\frac{C_{S} / C_{I}}{A_{v}} \tag{4.24}
\end{equation*}
$$

Considering $C_{S}=50 \mathrm{fF}$ and $C_{I}=1 \mathrm{pF}$, the impact of the integrator leakage on the performance of the DFT processor is analysed by varying $A_{v}$ in the behavioural system simulation (Figure 4.21). The results indicate that an Op-amp with larger $A_{v}$ provides higher SNDR. In section 4.4.1, $90 \mu W$ of power was assigned to the integrator. Also, as mentioned earlier, the output swing of the Op-amp should be at least 0.6 V. Hence, it is unlikely that $A_{v}>100 \mathrm{~V} / \mathrm{V}$ can be achieved.


Figure 4-21: SNDR curves for various op-amp gains

Transistor mismatches lead to DC offset [80]. Integrators with various DC offsets are modelled by assuming that $V_{o s}$ has a normal distribution with zero mean and standard deviation $\sigma\left(V_{o s}\right)$. Considering $G_{m o}=200 \mu A / V$ and $a=0.2 V$, the impact of the DC offset mismatch on the performance of the DFT processor is analysed. Typical values of $\sigma\left(V_{o s}\right)$ are obtained from a previous study [77]. Results of this analysis are shown in Figure 4.22. Based on these results, the DFT processor with larger DC offset mismatch has smaller dynamic range and peak SNDR. Accordingly, the DC offsets of the integrators have the same effect as the DC offsets of the multipliers.


Figure 4-22: SNDR curves for various DC offset mismatches

### 4.5 Yield Prediction

Process variability is pivotal in submicron CMOS technologies. Variations in the physical properties of the transistors impact the performance of the designed system. Therefore, in advance of the expensive fabrication process, it is essential to predict the yield at various design stages using reliable statistical analysis [81]. The parametric yield is associated with the system performance metric $X(m)$, which is a function of the mismatch parameter $m$. Systems that succeed in meeting the requirement(s) for $X(m)$ contribute to the yield. The Monte Carlo method can be used to estimate the average result and yield. Figure 4.23 illustrates the distribution of $X$ that is estimated by the Monte Carlo method [82, 83].


Figure 4-23: Yield prediction based on the Monte Carlo analysis [82]

The Monte Carlo analysis should stop when adding new samples no longer changes the sample mean by more than a certain threshold. In other words [84, 85],

$$
\begin{equation*}
\left|\bar{X}_{n}-\bar{X}_{w}\right| \leq \varepsilon \tag{4.25}
\end{equation*}
$$

where $\bar{X}_{n}$ is the mean of all generated samples (sample mean), $\bar{X}_{w}$ is the mean of the last $w$ samples (window mean), and $\varepsilon$ is the tolerance for convergence.

$$
\begin{gather*}
\bar{X}_{n}=\frac{X_{1}+\cdots+X_{n}}{n}  \tag{4.26}\\
\bar{X}_{w}=\frac{X_{n-w+1}+\cdots+X_{n}}{w} \tag{4.27}
\end{gather*}
$$

Sample variance is given by

$$
\begin{equation*}
S_{n}^{2}=\frac{1}{n-1} \sum_{i=1}^{n}\left(X_{i}-\bar{X}_{n}\right)^{2} \tag{4.28}
\end{equation*}
$$

Variance of the window is

$$
\begin{equation*}
S_{w}^{2}=\frac{1}{w-1} \sum_{i=n-w+1}^{n}\left(X_{i}-\bar{X}_{w}\right)^{2} \tag{4.29}
\end{equation*}
$$

Considering $\mu$ as the expected value of $X$, the probability of the random variable $\sqrt{n} / S_{n}\left(\bar{X}_{n}-\mu\right)$ falling within the range $\left[-A_{1}, A_{1}\right]$ is given by [84, 85]

$$
\begin{equation*}
P\left\{-A_{1} \leq \frac{\sqrt{n}}{S_{n}}\left(\bar{X}_{n}-\mu\right) \leq A_{1}\right\} \tag{4.30}
\end{equation*}
$$

that is equivalent to

$$
\begin{equation*}
P\left\{\bar{X}_{n}-\frac{S_{n} A_{1}}{\sqrt{n}} \leq \mu \leq \bar{X}_{n}+\frac{S_{n} A_{1}}{\sqrt{n}}\right\} \tag{4.31}
\end{equation*}
$$

The value of $A_{1}$ must be extracted from the statistical tables of the $t$-distribution such that [84, 85]

$$
\begin{equation*}
P\left\{\mu \in\left[\bar{X}_{n}-\frac{S_{n} A_{1}}{\sqrt{n}}, \bar{X}_{n}+\frac{S_{n} A_{1}}{\sqrt{n}}\right]\right\} \rightarrow 1-\delta \tag{4.32}
\end{equation*}
$$

where $\delta$ is the error probability. Thus, the random interval

$$
\begin{equation*}
\left[\bar{X}_{n}-\frac{S_{n} A_{1}}{\sqrt{n}}, \bar{X}_{n}+\frac{S_{n} A_{1}}{\sqrt{n}}\right] \tag{4.33}
\end{equation*}
$$

is a $100(1-\delta) \%$ confidence interval for $\mu$ [84, 85].

The same confidence interval can be obtained from

$$
\begin{equation*}
\left[\bar{X}_{w}-\frac{S_{w} A_{2}}{\sqrt{w}}, \bar{X}_{w}+\frac{S_{w} A_{2}}{\sqrt{w}}\right] \tag{4.34}
\end{equation*}
$$

Thus,

$$
\begin{equation*}
\bar{X}_{n}+\frac{S_{n} A_{1}}{\sqrt{n}}=\bar{X}_{w}+\frac{S_{w} A_{2}}{\sqrt{w}} \tag{4.35}
\end{equation*}
$$

Accordingly,

$$
\begin{equation*}
\left|\bar{X}_{n}-\bar{X}_{w}\right|=\left|\frac{S_{n} A_{1}}{\sqrt{n}}-\frac{S_{w} A_{2}}{\sqrt{w}}\right| \tag{4.36}
\end{equation*}
$$

For $w=10$ and $\delta=0.05$, $A_{2}=2.228$. For yield prediction, a large number of samples is required. Hence, $A_{1} \approx 1.96$. Since $A_{2} / \sqrt{w}=2.228 / \sqrt{10} \approx 0.7$, the tolerance is

$$
\begin{equation*}
\varepsilon=\left|\frac{1.96 S_{n}}{\sqrt{n}}-0.7 S_{w}\right| \tag{4.37}
\end{equation*}
$$

Since $A_{1}=1.96$ is used in the above equation, at least 500 samples are required $\left(n_{\min }=500\right)$ before checking the convergence (equation 4.25).
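The stopping rule of equations (4.25) and (4.37) can be sketched as follows. This is a minimal illustration, not the MATLAB code of Appendix A: the function names, the cap of 5000 samples, and the normally distributed test data are assumptions made for the example, while $A_{1}$, $A_{2}$, $w$, and $n_{\min}$ are the values quoted in the text.

```python
import math
import random
import statistics

A1 = 1.96     # large-sample value quoted in the text
A2 = 2.228    # value quoted in the text for w = 10 and delta = 0.05
W = 10        # window length w
N_MIN = 500   # minimum number of samples before convergence is checked


def has_converged(samples):
    """Check |mean_n - mean_w| <= epsilon, with epsilon from eq. (4.37)."""
    n = len(samples)
    if n < N_MIN:
        return False
    window = samples[-W:]
    mean_n = statistics.fmean(samples)
    mean_w = statistics.fmean(window)
    s_n = statistics.stdev(samples)  # 1/(n-1) normalisation, as in eq. (4.28)
    s_w = statistics.stdev(window)   # 1/(w-1) normalisation, as in eq. (4.29)
    eps = abs(A1 * s_n / math.sqrt(n) - A2 * s_w / math.sqrt(W))
    return abs(mean_n - mean_w) <= eps


def run_monte_carlo(draw, max_samples=5000):
    """Draw samples until the rule is met or the (assumed) cap is reached."""
    samples = []
    while len(samples) < max_samples:
        samples.append(draw())
        if has_converged(samples):
            break
    return samples


rng = random.Random(1)
samples = run_monte_carlo(lambda: rng.gauss(41.0, 3.0))  # hypothetical data in dB
print(len(samples), round(statistics.fmean(samples), 2))
```

The statistics are recomputed from scratch at every step for clarity; a production script would more likely update the running mean and variance incrementally.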

### 4.6 Performance Analysis Results

Behavioural models of a real-time recursive DFT processor and a radix-2 FFT processor of length 8 were described in section 4.3. The behavioural model of the radix-2 FFT processor is based on a previous study on the analogue FFT processor [77]. Initially, the model parameters of the FFT processor were set to the values provided in [77] to verify the accuracy of the Simulink model.

In this section, Monte Carlo analysis is used to evaluate the performance of the aforementioned processors. For this purpose, an OFDM signal with BPSK modulation is generated in Simulink. The MATLAB code of this analysis is available in Appendix A. Model parameters are set to the values in Table 4-2. Typical values of $A_{r}, \sigma\left(V_{i n, o s}\right)$, $\sigma\left(G_{m, o s}\right), \sigma\left(A_{r}\right), \sigma\left(V_{o s}\right), \gamma$, and $N$ are obtained from [77].

Table 4-2: Summary of the optimal values of the behavioural model parameters

| Parameter | Value |
| :---: | :---: |
| $a$ | 0.2 V |
| $b$ | 0.4 V |
| $I_{\max }$ | $80 \mu \mathrm{~A}$ |
| $G_{m o}$ | $200 \mu \mathrm{~A} / \mathrm{V}$ |
| $R_{D}$ | $5 \mathrm{~k} \Omega$ |
| $A_{r}$ | $10 \mu \mathrm{~A} / \mathrm{V}$ |
| $\sigma\left(V_{i n, o s}\right)$ | 0.5 mV |
| $\sigma\left(G_{m, o s}\right)$ | $2 \mu \mathrm{~A} / \mathrm{V}$ |
| $\sigma\left(A_{r}\right)$ | $10 \mu \mathrm{~A} / \mathrm{V}$ |
| $A_{v}$ | $100 \mathrm{~V} / \mathrm{V}$ |
| $\sigma\left(V_{o s}\right)$ | 0.1 mV |
| $\gamma$ | 0 |
| $N$ | 1 |

Table 4-3 gives the results of the Monte Carlo analysis. These results indicate that the average dynamic range of the proposed architecture is 4.7 dB higher than that of the FFT processor.

Table 4-3: Summary of the Monte Carlo analysis for the recursive DFT and the radix-2 FFT processors

| Statistic | Dynamic range, recursive DFT (dB) | Dynamic range, radix-2 FFT (dB) | Peak SNDR, recursive DFT (dB) | Peak SNDR, radix-2 FFT (dB) |
| :---: | :---: | :---: | :---: | :---: |
| Mean | 41.3 | 36.6 | 40.8 | 41.8 |
| Standard deviation | 3.4 | 3.1 | 1.6 | 1.7 |

The results of the Monte Carlo analysis are shown in Figure 4.24 and Figure 4.25. The histograms of the dynamic range for DFT and FFT processors are shown in Figure 4.26 and Figure 4.27. Based on these histograms, the real-time recursive DFT processor has a yield of $99.3 \%$ while the yield of the FFT processor is $82.8 \%$.


Figure 4-24: Monte Carlo analysis results of the real-time recursive DFT processor


Figure 4-25: Monte Carlo analysis results of the radix-2 FFT processor


Figure 4-26: The dynamic range histogram of the real-time recursive DFT processor


Figure 4-27: The dynamic range histogram of the radix-2 FFT processor

### 4.7 Summary

In this chapter, dynamic range requirements for an analogue DFT processor that supports WiFi and WiMAX standards were derived. Moreover, the behavioural models of the real-time recursive DFT processor and FFT processor were explained. Also, design specifications of an 8-point recursive DFT processor were determined.

The results of the Monte Carlo analysis on system simulations indicate that the average dynamic range of the real-time recursive DFT processor is 4.7 dB higher than that of the radix-2 FFT processor. Moreover, the proposed architecture has a yield of $99.3 \%$ while the yield of the FFT processor is $82.8 \%$. The improved performance of the real-time recursive DFT processor over the FFT processor justified proceeding to the transistor-level circuit design, which is presented in the next chapter.

## Chapter 5

## Circuit Design

In this chapter, various design approaches for the building blocks of the real-time recursive DFT processor are reviewed to find suitable circuits. A rigorous analysis of each of the selected circuits is performed to optimize the design. Design considerations that are applied to provide the optimum matching will be discussed in the next chapter. Circuits are designed using 180 nm TSMC technology. The Berkeley Short-Channel IGFET Model (BSIM3v3) from the University of California, Berkeley is used for device modelling. Circuit simulations are performed with the Eldo SPICE simulator from Mentor Graphics. The SPICE process parameters are provided by MOSIS [86].

### 5.1 Previous Work on the Analogue FFT Processor

In early attempts to implement the analogue Fourier transform, discrete circuits were used [65, 66]. In these designs, the Switched-Capacitor amplifier was used as the coefficient multiplier (Figure 5.1(a)). The clock signals that control the circuit are shown in Figure 5.1(b). In sampling mode, $S_{1}$ and $S_{2}$ are on and $S_{3}$ is off (Figure 5.1(c)). Hence, the voltage across $\mathrm{C}_{1}$ tracks the input voltage. In the transition from the sampling mode to the amplification mode (Figure 5.1(d)), the channel charge injection leads to voltage error. This error is alleviated if $S_{2}$ turns off slightly before $S_{1}$ turns off and $S_{3}$ turns on [80].


Figure 5-1: (a) Switched-Capacitor amplifier (b) timing diagram of circuit (a) (c) circuit (a) in sampling mode (d) circuit (a) in amplification mode [80]

Accordingly, the output voltage is given by

$$
\begin{equation*}
V_{\text {out }}=\frac{C_{1}}{C_{2}} V_{\text {in }} \tag{5.1}
\end{equation*}
$$

Thus, the voltage sample is multiplied by the capacitance ratio.

In recent years, different design approaches have been taken to implement the FFT algorithm as an analogue integrated circuit. In one study, multiplication is performed by the current mirror (Figure 5.2) that generates a scaled copy of the reference current $\left(I_{r e f}\right)$ at its output [87]

$$
\begin{equation*}
I_{o u t}=\frac{(W / L)_{2}}{(W / L)_{1}} \cdot I_{r e f} \tag{5.2}
\end{equation*}
$$

where $(W / L)_{x}$ is the width to length ratio of device $M_{x}$.


Figure 5-2: The basic current mirror [80]

In another study, the passive Switched-Capacitor (Figure 5.3) is used as the multiplier in order to minimize the power consumption [88]. In this approach, the signal is multiplied by $m=C_{1} /\left(C_{1}+C_{2}\right)$ when charges are transferred from capacitor $C_{1}$ to capacitor $C_{2}$ [89]. Since all of the aforementioned approaches rely solely on physical properties of the circuit to perform the multiplication, their scaling factors are fixed. Thus, the aforementioned multipliers are not suitable for the variable-length DFT processor.


Figure 5-3: The passive Switched-Capacitor multiplier

In another attempt, a 64-point FFT processor was realized with the Switched-Transconductor multiplier (Figure 5.4) [90, 91]. In this approach, differential pairs with various $W / L$ ratios are connected together.


Figure 5-4: The Switched-Transconductor multiplier

As will be shown in section 5.2.2, the differential current of the $n^{\text {th }}$ pair is

$$
\begin{equation*}
\Delta I_{D n} \propto\left(\frac{W}{L}\right)_{n} \Delta V_{i n} \tag{5.3}
\end{equation*}
$$

For any pair that is connected to the common voltage, $\Delta V_{i n}=0$; hence, $\Delta I_{D n}=0$. The current that leaves each of the output nodes is equal to the sum of the currents entering that node. Thereby, the differential output current is

$$
\begin{equation*}
\Delta I_{\text {out }} \propto\left[\left(\frac{W}{L}\right)_{1}+\cdots+\left(\frac{W}{L}\right)_{n}\right] \Delta V_{i n} \tag{5.4}
\end{equation*}
$$

Accordingly, the scalar factor is adjusted by controlling the differential pairs that are connected to the input. The number of the scalar factors increases with the Fourier transform length. Thus, each multiplier requires more differential pairs as the Fourier transform length increases. Accordingly, this approach is not area efficient.
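To illustrate how the number of scalar factors grows with the transform length, the sketch below counts the distinct non-zero magnitudes of $\cos (2 \pi k / N)$. It assumes that the scalar factors are the magnitudes of the real and imaginary parts of the DFT twiddle factors, which is an interpretation made for this example rather than a statement from [90, 91]; the function name is likewise hypothetical.

```python
import math


def distinct_coefficient_magnitudes(n_points):
    """Distinct non-zero values of |cos(2*pi*k/N)| for k = 0 .. N-1."""
    magnitudes = set()
    for k in range(n_points):
        value = abs(math.cos(2.0 * math.pi * k / n_points))
        rounded = round(value, 12)  # merge values that differ only by rounding error
        if rounded > 0.0:
            magnitudes.add(rounded)
    return sorted(magnitudes)


for n in (8, 16, 64):
    print(n, len(distinct_coefficient_magnitudes(n)))
```

Under this assumption, the number of distinct magnitudes scales linearly with $N$, so a multiplier built from switchable differential pairs needs a correspondingly larger set of $W / L$ ratios.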

A reconfigurable DFT processor that is implemented on a Field Programmable Analogue Array (FPAA) was proposed in [92]. The reconfigurable multipliers of this processor are realized by the floating-gate transistors (Figure 5.5).


Figure 5-5: The floating-gate multiplier

In this approach, both of the PMOS transistors operate in the subthreshold region. Hence, the input and output currents are given by [92]

$$
\begin{align*}
& I_{\text {in }}=I_{o} \frac{W}{L} \exp \left(\frac{V_{S}-\kappa V_{G 1}}{V_{T}}\right)\left[1-\exp \left(\frac{V_{D S}}{V_{T}}\right)\right]  \tag{5.5}\\
& I_{\text {out }}=I_{o} \frac{W}{L} \exp \left(\frac{V_{S}-\kappa V_{G 2}}{V_{T}}\right)\left[1-\exp \left(\frac{V_{D S}}{V_{T}}\right)\right] \tag{5.6}
\end{align*}
$$

where $I_{o}$ is a process-dependent constant, $V_{T}$ is the thermal voltage, and $\kappa$ denotes the gate coupling coefficient which is

$$
\begin{equation*}
\kappa=\frac{C_{g}}{C_{T}}\left(\frac{C_{o x}}{C_{o x}+C_{d e p}}\right) \tag{5.7}
\end{equation*}
$$

In the above equation, $C_{o x}$ is the gate oxide capacitance, $C_{d e p}$ is the depletion region capacitance, and $C_{T}$ is the total capacitance at the gate (i.e. $C_{g}$ and the internal capacitances of the transistor).

The voltages across the gate capacitors are

$$
\begin{align*}
& V_{G 1}-V_{F}=\frac{Q_{1}}{C_{g}}  \tag{5.8}\\
& V_{G 2}-V_{F}=\frac{Q_{2}}{C_{g}} \tag{5.9}
\end{align*}
$$

Substituting equations (5.8) and (5.9) in (5.5) and (5.6) gives

$$
\begin{align*}
& I_{\text {in }}=I_{o} \frac{W}{L} \exp \left(\frac{V_{S}-\kappa V_{F}}{V_{T}}\right) \exp \left(\frac{-\kappa Q_{1}}{C_{g} V_{T}}\right)\left[1-\exp \left(\frac{V_{D S}}{V_{T}}\right)\right]  \tag{5.10}\\
& I_{\text {out }}=I_{o} \frac{W}{L} \exp \left(\frac{V_{S}-\kappa V_{F}}{V_{T}}\right) \exp \left(\frac{-\kappa Q_{2}}{C_{g} V_{T}}\right)\left[1-\exp \left(\frac{V_{D S}}{V_{T}}\right)\right] \tag{5.11}
\end{align*}
$$

Accordingly,

$$
\begin{equation*}
I_{o u t}=\exp \left(\frac{\kappa\left(Q_{1}-Q_{2}\right)}{C_{g} V_{T}}\right) \cdot I_{i n} \tag{5.12}
\end{equation*}
$$

which means that the scalar factor can be adjusted by controlling the amount of charge stored in the $C_{g}$ capacitors. These equations are valid if the transistors are biased in the subthreshold region. To maintain this condition, a comparator is used to adjust $V_{S}$ to the changes of $Q_{1}$ and $Q_{2}$. The structure of the FPAA requires $16 N^{2}$ multipliers for an $N$-length DFT [93]. Thus, the FPAA approach is not area efficient, and becomes infeasible as $N$ increases.
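A short numerical sketch of equation (5.12) and of the $16 N^{2}$ multiplier count is given below. The values of $\kappa$, $C_{g}$, and $V_{T}$ are hypothetical placeholders chosen for illustration, not parameters reported in [92] or [93], and the function names are assumptions of this example.

```python
import math


def floating_gate_gain(q1, q2, c_g, kappa, v_t):
    """Scale factor I_out / I_in from eq. (5.12)."""
    return math.exp(kappa * (q1 - q2) / (c_g * v_t))


def charge_difference_for_gain(gain, c_g, kappa, v_t):
    """Charge difference Q1 - Q2 that yields the requested scale factor."""
    return math.log(gain) * c_g * v_t / kappa


def fpaa_multiplier_count(n_points):
    """Multiplier count 16 * N^2 quoted in the text for an N-length DFT."""
    return 16 * n_points ** 2


KAPPA = 0.7      # hypothetical gate coupling coefficient
C_G = 100e-15    # hypothetical gate capacitor, in farads
V_T = 0.0258     # thermal voltage near room temperature, in volts

dq = charge_difference_for_gain(2.0, C_G, KAPPA, V_T)
print(f"Q1 - Q2 for a gain of 2: {dq:.3e} C")
for n in (8, 64):
    print(n, fpaa_multiplier_count(n))
```

Because the gain depends exponentially on the stored charge difference, small charge changes give large changes in the scale factor, while the quadratic growth of the multiplier count is what limits the approach for long transforms.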

In chapter 3, it was explained that in the proposed architecture the OFDM signal is multiplied by a piecewise continuous signal to eliminate sampling. All of the circuits that have been explained so far (Figures 5.1 to 5.5) actually scale the signal rather than multiplying two signals together. Moreover, adjusting the physical properties of each multiplier to provide various scaling factors makes the design process cumbersome. This problem becomes more severe as the transform length increases. Hence, it is essential to provide coefficients by signals rather than physical properties of the circuit. In view of that, an FFT processor was designed using a four-quadrant multiplier [41]. Although in this FFT processor discrete-time signals are multiplied together, four-quadrant multipliers can also be used for continuous signals. Analysis of the four-quadrant multiplier is provided in the next section.

### 5.2 Analogue Multiplier

Analogue multipliers provide the linear product of two input signals $x$ and $y$, yielding output signal $z=K x y$ where $K$ is the multiplication constant. Multipliers are classified into three main categories based on the signals' polarity. These categories are single-quadrant (where $x$ and $y$ are unipolar), two-quadrant (where $x$ or $y$ is bipolar), and four-quadrant (where $x$ and $y$ are bipolar). Modulators and mixers are particular cases of multipliers that are used in communication systems. Despite the large number of multipliers that are reported in the literature, they can be classified into a few categories based on their architectures [94]. Design specifications, such as bandwidth and power budget, determine the suitable circuit topology. The design of a suitable analogue multiplier for the real-time recursive DFT processor is discussed in this section.

### 5.2.1 Principle of Operation

The basic operation of an analogue multiplier is to generate a high-order polynomial of the two signals using nonlinear devices, and then cancel all terms other than $K x y$. Since MOS transistors have square-law characteristics, they can be used for this purpose. For a MOS transistor in the saturation region, the drain current is a second-order polynomial of the gate-source voltage, because it is proportional to the square of the overdrive voltage [94]:

$$
\begin{equation*}
\left(V_{G S}-V_{T H}\right)^{2}=\frac{I_{D}}{\frac{1}{2} \mu C_{o x} \frac{W}{L}} \tag{5.13}
\end{equation*}
$$

Here, $V_{G S}$ is the gate-source voltage, $V_{T H}$ is the threshold voltage, $I_{D}$ is the drain current, $\mu$ is the mobility of charge carriers, $C_{o x}$ is the gate oxide capacitance per unit area, $W$ is the width and $L$ is the length of the channel. Since the second-order polynomial is obtained in the drain current, the analogue multiplier function can be realized by transconductance amplifiers. A transresistor can then be used to convert the output current to a voltage. The simplest topology of an analogue multiplier is a differential pair with a variable current source that is controlled by one of the input signals (Figure 5.6) [95, 96].


Figure 5-6: Two-quadrant analogue multiplier [96]

Multipliers of the DFT processor downconvert sub-channels of the OFDM signal to zero frequency; hence, they act as zero Intermediate Frequency (IF) mixers. Accordingly, the multiplier that is shown in Figure 5.6 is a single-balanced active mixer. Since each multiplier considers one of the sub-channels as its desired signal, other sub-channels act as interferers that accompany the desired signal. An ideal differential pair cancels input feedthroughs. However, mismatch between the transistors of the differential pair allows a fraction of the input to appear at the output without frequency translation. Hence, zero IF mixers are sensitive to even-order distortion. This problem can be resolved by raising the second-order intercept point ( $\mathrm{IP}_{2}$ ) of the multiplier. For this purpose, the input of the transconductor stage must be realized in differential form, leading to a double-balanced topology [96]. The Gilbert cell is a precision four-quadrant multiplier that is widely used as a double-balanced mixer in communication systems [97]. Hence, the Gilbert cell is considered as a suitable multiplier for the real-time recursive DFT processor.

### 5.2.2 Analysis of the CMOS Gilbert Cell

Initially, the Gilbert cell was realized based on the exponential characteristics of Bipolar Junction Transistors (BJT) [97]. Nevertheless, the same topology can be used for MOS transistors with square-law characteristics [98]. A block diagram of the Gilbert cell is shown in Figure 5.7.


Figure 5-7: Block diagram of the Gilbert cell

Each $\mathrm{G}_{\mathrm{m}}$ transconductor is realized by a differential pair (Figure 5.8).


Figure 5-8: $\mathrm{G}_{\mathrm{m}}$ transconductor [80]

The outputs of the differential pair in Figure 5.8 are given by [80]

$$
\begin{align*}
& V_{\text {out } 1}=V_{D D}-R_{D 1} I_{D 1}  \tag{5.14}\\
& V_{\text {out } 2}=V_{D D}-R_{D 2} I_{D 2} \tag{5.15}
\end{align*}
$$

If $R_{D 1}=R_{D 2}=R_{D}$, then

$$
\begin{equation*}
V_{\text {out } 1}-V_{\text {out } 2}=R_{D 2} I_{D 2}-R_{D 1} I_{D 1}=R_{D}\left(I_{D 2}-I_{D 1}\right) \tag{5.16}
\end{equation*}
$$

Voltage at node P is

$$
\begin{equation*}
V_{p}=V_{i n 1}-V_{G S 1}=V_{i n 2}-V_{G S 2} \tag{5.17}
\end{equation*}
$$

Thus,

$$
\begin{equation*}
V_{i n 1}-V_{i n 2}=V_{G S 1}-V_{G S 2} \tag{5.18}
\end{equation*}
$$

For an ideal saturated NMOS device, we have

$$
\begin{equation*}
\left(V_{G S}-V_{T H}\right)^{2}=\frac{I_{D}}{\frac{1}{2} \mu_{n} C_{o x} \frac{W}{L}} \tag{5.19}
\end{equation*}
$$

Hence,

$$
\begin{equation*}
V_{G S}=\sqrt{\frac{2 I_{D}}{\mu_{n} C_{o x} \frac{W}{L}}}+V_{T H} \tag{5.20}
\end{equation*}
$$

Combining (5.18) and (5.20) yields

$$
\begin{equation*}
V_{i n 1}-V_{i n 2}=\sqrt{\frac{2 I_{D 1}}{\mu_{n} C_{o x} \frac{W}{L}}}-\sqrt{\frac{2 I_{D 2}}{\mu_{n} C_{o x} \frac{W}{L}}} \tag{5.21}
\end{equation*}
$$

The objective is to attain the differential output current $I_{D 1}-I_{D 2}$. Therefore, by squaring both sides of (5.21) and considering that $I_{D 1}+I_{D 2}=I_{S S}$ we obtain

$$
\begin{equation*}
\left(V_{i n 1}-V_{i n 2}\right)^{2}=\frac{2}{\mu_{n} C_{o x} \frac{W}{L}}\left(I_{S S}-2 \sqrt{I_{D 1} I_{D 2}}\right) \tag{5.22}
\end{equation*}
$$

Rearranging (5.22) gives

$$
\begin{equation*}
\frac{1}{2} \mu_{n} C_{o x} \frac{W}{L}\left(V_{i n 1}-V_{i n 2}\right)^{2}-I_{S S}=-2 \sqrt{I_{D 1} I_{D 2}} \tag{5.23}
\end{equation*}
$$

Squaring both sides again and considering that $4 I_{D 1} I_{D 2}=\left(I_{D 1}+I_{D 2}\right)^{2}-\left(I_{D 1}-I_{D 2}\right)^{2}=$ $I_{S S}{ }^{2}-\left(I_{D 1}-I_{D 2}\right)^{2}$, we achieve

$$
\begin{equation*}
\left(I_{D 1}-I_{D 2}\right)^{2}=-\frac{1}{4}\left(\mu_{n} C_{o x} \frac{W}{L}\right)^{2}\left(V_{i n 1}-V_{i n 2}\right)^{4}+I_{S S} \mu_{n} C_{o x} \frac{W}{L}\left(V_{i n 1}-V_{i n 2}\right)^{2} \tag{5.24}
\end{equation*}
$$

Thereby [80],

$$
\begin{equation*}
I_{D 1}-I_{D 2}=\frac{1}{2} \mu_{n} C_{o x} \frac{W}{L}\left(V_{i n 1}-V_{i n 2}\right) \sqrt{\frac{4 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\left(V_{i n 1}-V_{i n 2}\right)^{2}} \tag{5.25}
\end{equation*}
$$

In order to find $I_{D 1}$ and $I_{D 2}, I_{D 2}=I_{S S}-I_{D 1}$ and $I_{D 1}=I_{S S}-I_{D 2}$ are substituted in (5.25) respectively.

$$
\begin{align*}
& I_{D 1}=\frac{I_{S S}}{2}+\frac{1}{4} \mu_{n} C_{o x} \frac{W}{L}\left(V_{i n 1}-V_{i n 2}\right) \sqrt{\frac{4 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\left(V_{i n 1}-V_{i n 2}\right)^{2}}  \tag{5.26}\\
& I_{D 2}=\frac{I_{S S}}{2}-\frac{1}{4} \mu_{n} C_{o x} \frac{W}{L}\left(V_{i n 1}-V_{i n 2}\right) \sqrt{\frac{4 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\left(V_{i n 1}-V_{i n 2}\right)^{2}} \tag{5.27}
\end{align*}
$$
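The closed-form currents of equations (5.26) and (5.27) can be checked numerically against the starting relations $I_{D 1}+I_{D 2}=I_{S S}$ and equation (5.21). In the sketch below, `BETA` stands for $\mu_{n} C_{o x} W / L$; its value and the value of `I_SS` are hypothetical and only serve the check.

```python
import math


def diff_pair_currents(dv_in, i_ss, beta):
    """Drain currents (I_D1, I_D2) from eqs. (5.26) and (5.27).

    beta stands for mu_n * C_ox * W / L.
    """
    root = math.sqrt(4.0 * i_ss / beta - dv_in ** 2)
    delta_i = 0.5 * beta * dv_in * root  # differential current, eq. (5.25)
    return 0.5 * i_ss + 0.5 * delta_i, 0.5 * i_ss - 0.5 * delta_i


I_SS = 80e-6   # hypothetical tail current, in amperes
BETA = 5e-4    # hypothetical mu_n * C_ox * W / L, in A/V^2

for dv in (-0.3, -0.1, 0.0, 0.1, 0.3):
    i_d1, i_d2 = diff_pair_currents(dv, I_SS, BETA)
    # Right-hand side of eq. (5.21), which should reproduce the input voltage.
    dv_back = math.sqrt(2.0 * i_d1 / BETA) - math.sqrt(2.0 * i_d2 / BETA)
    print(f"dv = {dv:+.2f} V, I_D1 + I_D2 = {i_d1 + i_d2:.3e} A, eq. (5.21) gives {dv_back:+.4f} V")
```

The check is only meaningful while both transistors conduct, that is, for input voltages small enough that neither current becomes negative.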

The circuit topology of the Gilbert cell is obtained by replacing the $\mathrm{G}_{\mathrm{m}}$ blocks in Figure 5.7 with differential pairs.


Figure 5-9: Gilbert cell

The differential output current of the Gilbert cell (Figure 5.9) is

$$
\begin{equation*}
I_{\text {out- }}-I_{\text {out }+}=\left(I_{D 1}+I_{D 3}\right)-\left(I_{D 2}+I_{D 4}\right)=\left(I_{D 1}-I_{D 2}\right)-\left(I_{D 4}-I_{D 3}\right) \tag{5.28}
\end{equation*}
$$

Here, $I_{D 1}-I_{D 2}$ and $I_{D 4}-I_{D 3}$ are the differential currents of the two pairs with $V_{2}$ input. These differential currents can be calculated from equation (5.25). The tail current sources of the two pairs with $\mathrm{V}_{2}$ input are $I_{D 5}$ and $I_{D 6}$. Hence, denoting $V_{2+}-V_{2-}$ and $I_{\text {out }-}-I_{\text {out }+}$ by $\Delta V_{2}$ and $\Delta I_{\text {out }}$, respectively, we have

$$
\begin{equation*}
\Delta I_{o u t}=\frac{1}{2} \mu_{n} C_{o x} \frac{W}{L} \Delta V_{2}\left(\sqrt{\frac{4 I_{D 5}}{\mu_{n} C_{o x} \frac{W}{L}}-\Delta V_{2}^{2}}-\sqrt{\frac{4 I_{D 6}}{\mu_{n} C_{o x} \frac{W}{L}}-\Delta V_{2}^{2}}\right) \tag{5.29}
\end{equation*}
$$

Equations (5.26) and (5.27) can be used for $I_{D 5}$ and $I_{D 6}$. Since square roots of $I_{D 5}$ and $I_{D 6}$ are taken in (5.29), equations (5.26) and (5.27) must be written in square form for simplification. For this purpose, the auxiliary term $\left(\mu_{n} C_{o x} \frac{W}{L}\left(V_{i n 1}-V_{i n 2}\right)^{2} / 8\right)$ should be added to and subtracted from (5.26) and (5.27). Thereby [99],

$$
\begin{align*}
& I_{D 1}=\frac{1}{4} \mu_{n} C_{o x} \frac{W}{L}\left(\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\frac{\left(V_{i n 1}-V_{i n 2}\right)^{2}}{2}}+\frac{\left(V_{i n 1}-V_{i n 2}\right)}{\sqrt{2}}\right)^{2}  \tag{5.30}\\
& I_{D 2}=\frac{1}{4} \mu_{n} C_{o x} \frac{W}{L}\left(\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\frac{\left(V_{i n 1}-V_{i n 2}\right)^{2}}{2}}-\frac{\left(V_{i n 1}-V_{i n 2}\right)}{\sqrt{2}}\right)^{2} \tag{5.31}
\end{align*}
$$

Now, by substituting (5.30) and (5.31) for $I_{D 5}$ and $I_{D 6}$ in (5.29) we achieve

$$
\begin{align*}
\Delta I_{o u t}=\frac{1}{2} \mu_{n} C_{o x} \frac{W}{L} \Delta V_{2} & \left(\sqrt{\left(\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\frac{\Delta V_{1}^{2}}{2}}+\frac{\Delta V_{1}}{\sqrt{2}}\right)^{2}-\Delta V_{2}^{2}}\right. \\
& \left.-\sqrt{\left(\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\frac{\Delta V_{1}^{2}}{2}}-\frac{\Delta V_{1}}{\sqrt{2}}\right)^{2}-\Delta V_{2}^{2}}\right) \tag{5.32}
\end{align*}
$$

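When $\Delta V_{2}^{2}$ is small compared with the squared terms under the outer square roots, it can be neglected and equation (5.32) can be approximated by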

$$
\begin{align*}
\Delta I_{o u t} \cong \frac{1}{2} \mu_{n} C_{o x} \frac{W}{L} \Delta V_{2}\left(\sqrt{\left(\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\frac{\Delta V_{1}^{2}}{2}}+\frac{\Delta V_{1}}{\sqrt{2}}\right)^{2}}\right. \\
\left.-\sqrt{\left(\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\frac{\Delta V_{1}^{2}}{2}}-\frac{\Delta V_{1}}{\sqrt{2}}\right)^{2}}\right) \tag{5.33}
\end{align*}
$$

Thereby [99],

$$
\begin{equation*}
\Delta I_{o u t} \cong \frac{1}{\sqrt{2}} \mu_{n} C_{o x} \frac{W}{L} \Delta V_{1} \Delta V_{2} \tag{5.34}
\end{equation*}
$$
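The quality of the approximation in equation (5.34) can be examined numerically by comparing it with the full expression of equation (5.32). The sketch below is illustrative only; `BETA` (standing for $\mu_{n} C_{o x} W / L$), `I_SS`, and the input amplitudes are hypothetical values.

```python
import math


def gilbert_exact(dv1, dv2, i_ss, beta):
    """Differential output current from eq. (5.32); beta = mu_n * C_ox * W / L."""
    a = math.sqrt(2.0 * i_ss / beta - dv1 ** 2 / 2.0)
    b = dv1 / math.sqrt(2.0)
    plus = math.sqrt((a + b) ** 2 - dv2 ** 2)
    minus = math.sqrt((a - b) ** 2 - dv2 ** 2)
    return 0.5 * beta * dv2 * (plus - minus)


def gilbert_approx(dv1, dv2, beta):
    """Ideal multiplier characteristic from eq. (5.34)."""
    return beta * dv1 * dv2 / math.sqrt(2.0)


def relative_error(dv1, dv2, i_ss, beta):
    exact = gilbert_exact(dv1, dv2, i_ss, beta)
    approx = gilbert_approx(dv1, dv2, beta)
    return abs(exact - approx) / abs(approx)


I_SS = 80e-6   # hypothetical tail current, in amperes
BETA = 5e-4    # hypothetical mu_n * C_ox * W / L, in A/V^2

for dv2 in (0.01, 0.05):
    err = relative_error(0.05, dv2, I_SS, BETA)
    print(f"dV1 = 0.05 V, dV2 = {dv2:.2f} V, relative error = {100.0 * err:.3f} %")
```

For these assumed values the deviation from the ideal product grows with $\Delta V_{2}$, which is consistent with the neglected $\Delta V_{2}^{2}$ terms being the source of the error.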

Equation (5.25) was derived with the assumption that both $M_{1}$ and $M_{2}$ are on. In reality, however, once $\Delta V_{\text {in }}$ exceeds a limit, only one transistor is on and carries the entire $I_{S S}$. Denoting this limit by $\Delta V_{i n 1}$ and assuming that $M_{1}$ is on, $I_{D 1}=I_{S S}$ and $\Delta V_{i n 1}=V_{G S 1}-V_{T H}$ should be substituted in equation (5.19). Thereby [80],

$$
\begin{equation*}
\Delta V_{i n 1}=\sqrt{\frac{2 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}} \tag{5.35}
\end{equation*}
$$

Hence, equations (5.25) to (5.34) are valid for the input range $-\Delta V_{i n 1}<\Delta V_{i n}<\Delta V_{i n 1}$. Considering the characteristic of a differential pair that is shown in Figure 5.10, $\left[-\Delta V_{\text {in } 1}, \Delta V_{\text {in } 1}\right]$ is the linear range of operation. Based on equation (5.35), the linear range can be increased by increasing $I_{S S}$ or decreasing $W / L$. Increasing $I_{S S}$ increases the power consumption. On the other hand, as will be explained in the next chapter, devices with smaller $W / L$ provide better matching.


Figure 5-10: Input-output characteristic of a differential pair [80]

The transconductance of the differential pair is the slope of the characteristic (Figure 5.10). Thus, $G_{m}$ of the differential pair is obtained by taking the derivative of equation (5.25) [80].

$$
\begin{equation*}
G_{m}=\frac{\partial \Delta I_{D}}{\partial \Delta V_{i n}}=\frac{1}{2} \mu_{n} C_{o x} \frac{W}{L} \frac{\frac{4 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-2 \Delta V_{i n}{ }^{2}}{\sqrt{\frac{4 I_{S S}}{\mu_{n} C_{o x} \frac{W}{L}}-\Delta V_{i n}^{2}}} \tag{5.36}
\end{equation*}
$$

In the equilibrium condition $\Delta V_{i n}=0$; thus, $G_{m}=\sqrt{\mu_{n} C_{o x}(W / L) I_{S S}}$. Substituting $\Delta I_{D}=G_{m} \Delta V_{i n}$ in the equation (5.16) gives [80]

$$
\begin{equation*}
\Delta V_{\text {out }}=R_{D} \Delta I_{D}=R_{D} G_{m} \Delta V_{\text {in }} \tag{5.37}
\end{equation*}
$$

Thus, the small-signal voltage gain of the differential pair in the equilibrium condition is [80]

$$
\begin{equation*}
\left|A_{v}\right|=\frac{\Delta V_{\text {out }}}{\Delta V_{\text {in }}}=\sqrt{\mu_{n} C_{o x} \frac{W}{L} I_{S S}} R_{D} \tag{5.38}
\end{equation*}
$$
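As a consistency check, the closed-form transconductance of equation (5.36) can be compared with a numerical derivative of equation (5.25), and its equilibrium value with $\sqrt{\mu_{n} C_{o x}(W / L) I_{S S}}$. The values of `BETA` (standing for $\mu_{n} C_{o x} W / L$) and `I_SS` below are hypothetical.

```python
import math


def delta_id(dv, i_ss, beta):
    """Differential current of the pair, eq. (5.25); beta = mu_n * C_ox * W / L."""
    return 0.5 * beta * dv * math.sqrt(4.0 * i_ss / beta - dv ** 2)


def gm_closed_form(dv, i_ss, beta):
    """Transconductance from eq. (5.36)."""
    a = 4.0 * i_ss / beta
    return 0.5 * beta * (a - 2.0 * dv ** 2) / math.sqrt(a - dv ** 2)


def gm_numeric(dv, i_ss, beta, h=1e-6):
    """Central-difference derivative of eq. (5.25)."""
    return (delta_id(dv + h, i_ss, beta) - delta_id(dv - h, i_ss, beta)) / (2.0 * h)


I_SS = 80e-6   # hypothetical tail current, in amperes
BETA = 5e-4    # hypothetical mu_n * C_ox * W / L, in A/V^2

for dv in (0.0, 0.1, 0.2):
    print(f"dv = {dv:.1f} V, eq. (5.36) = {gm_closed_form(dv, I_SS, BETA):.4e} S, "
          f"numeric = {gm_numeric(dv, I_SS, BETA):.4e} S")
```

The transconductance is largest at equilibrium and falls as the input moves away from zero, which is the gain compression that the linearization techniques discussed next aim to reduce.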

Accordingly, reducing $W / L$ to make the circuit more linear inevitably decreases the transconductance and voltage gain. Linearization techniques can be applied to increase the linear range further. The simplest linearization technique is resistive source degeneration. Inductive and capacitive degeneration also increase the linearity. Inductive degeneration has low noise and provides higher linearity compared with resistive and capacitive degeneration [100, 101]. However, inductors require a large layout area. Several other methods were proposed to improve the linearity of the Gilbert cell [99, 102-105]. However, these methods increase the complexity and power consumption of the multiplier. Since the DFT processor with OFDM application requires a large number of multipliers, the simplest linearization technique is preferable.

### 5.2.3 Circuit Realization

It is difficult to fabricate resistors with accurate values or a reasonable physical size in CMOS technologies. Thus, degeneration resistors can be replaced by transistors that operate in the deep triode region. Moreover, $R_{D}$ resistors can be replaced by diode-connected transistors [80]. Therefore, the circuit is modified as depicted in Figure 5.11. In this topology, $M_{1}-M_{6}$ and $M_{9}-M_{11}$ operate in the saturation region. Besides, $M_{7}$ and $M_{8}$ operate in the deep triode region to perform the resistive degeneration. The gain of a circuit with diode-connected load is $A_{v} \propto \sqrt{\mu_{\text {input device }} / \mu_{\text {load device }}}$, where $\mu$ is the mobility of charge carriers. Accordingly, higher gain can be achieved by using PMOS devices with lower mobility of carriers (i.e. in modern processes $\mu_{p} C_{o x} \approx 0.25 \mu_{n} C_{o x}$ ) as load.


Figure 5-11: Degenerated Gilbert cell with diode-connected load

The output common-mode (CM) level of the Gilbert cell with diode-connected loads is $V_{D D}-V_{G S 10}$, where $V_{G S 10}$ is the gate-source voltage of $M_{10}$. $M_{10}$ and $M_{11}$ are always in saturation because the drain and the gate have the same potential. Hence, the CM level of the Gilbert cell with diode-connected loads is well-defined. The voltage gain of the Gilbert cell with diode-connected loads is

$$
\begin{equation*}
A_{v}=-G_{m}\left(R_{O}\left\|r_{O 10}\right\| \frac{1}{g_{m 10}}\right) \approx \frac{-G_{m}}{g_{m 10}} \tag{5.39}
\end{equation*}
$$

where $G_{m}$ is the transconductance of the Gilbert cell, $R_{O}$ is the output resistance of the transconductor, $r_{O 10}$ is the output resistance of $M_{10}$, and $g_{m 10}$ is the transconductance of $M_{10}$. Hence, the Gilbert cell with diode-connected loads has a low voltage gain. To increase the voltage gain, $M_{10}$ and $M_{11}$ must operate as current sources for the differential signals. Since $I_{D 1}+I_{D 3}=I_{D 2}+I_{D 4}=I_{S S} / 2$, the CM level depends on how close $I_{D 10}$ and $I_{D 11}$ are to this value.

In practice, mismatches in the NMOS current source $\left(M_{9}\right)$ and PMOS current sources ( $M_{10}$ and $M_{11}$ ) create an error between $I_{D 10,11}$ and $I_{S S} / 2$. Thus, in the Gilbert cell with current-source loads, the difference between the currents that are generated by the p-type and n-type current sources flows through the output impedance. Hence, the mismatch between the p-type and n-type current sources creates the voltage error of $\left(\left(I_{D 10}+I_{D 11}\right)-I_{S S}\right)\left(R_{O} \| r_{O 10}\right)$ at the output. Therefore, the output CM level of the Gilbert cell with current-source loads is sensitive to device properties and mismatches. Hence, a common-mode feedback (CMFB) network is required to sense the CM level of $V_{\text {out }+}$ and $V_{\text {out- }}$ and adjust one of the bias currents accordingly [80, 101]. Therefore, the circuit is modified as depicted in Figure 5.12, where $M_{12}$ and $M_{13}$ operate in the deep triode region. For differential changes at $V_{\text {out }+}$ and $V_{\text {out- }}$, node P is a virtual ground. Hence, the voltage gain is

$$
\begin{equation*}
A_{v}=-G_{m}\left(R_{O}\left\|r_{O 10}\right\| R_{o n 12}\right) \tag{5.40}
\end{equation*}
$$

where $R_{o n 12}$ is the on-resistance of $M_{12}$. For CM levels, $M_{10}$ and $M_{11}$ operate as diode-connected loads.


Figure 5-12: Degenerated Gilbert cell with CMFB network

As explained in chapter 2, complex multiplication is performed by adding the results of two real multiplications. Based on KCL, currents that are entering the same node are added together. Thus, addition is provided by connecting the outputs of two multipliers to each other and sharing the load (transresistor) between the multipliers. The topology of the complex multiplier is shown in Figure 5.13.


Figure 5-13: Topology of the complex multiplier

Based on the design specifications in section 4.4.2, the linear range of the multiplier should be at least $[-0.2 \mathrm{~V}, 0.2 \mathrm{~V}]$. Additionally, the gain of the complex multiplier should be $A_{v}=1 \mathrm{~V} / \mathrm{V}$. Therefore, each of the output nodes in Figure 5.13 must be able to swing by 0.2 V without driving $M_{10}$ and $M_{11}$ into the triode region. Thus, the overdrive voltage of $M_{10}$ and $M_{11}$ should be $\left|V_{O D 10}\right|=0.2 \mathrm{~V}$. As mentioned earlier, for CM levels, $M_{10}$ and $M_{11}$ operate as diode-connected loads. Hence, with $\left|V_{T H}\right|=0.4 \mathrm{~V}$ for PMOS transistors and $\left|V_{O D 10}\right|=0.2 \mathrm{~V}$, the drain-source voltage of $M_{10}$ and $M_{11}$ is $\left|V_{D S 10}\right|=0.6 \mathrm{~V}$. Thus, considering $V_{D D}=1.8 \mathrm{~V}$, the output CM level is $V_{\text {out }}=1.2 \mathrm{~V}$. Accordingly, the total voltage available for NMOS transistors is 1.2 V. Based on the design specifications in section 4.4.2, the input swing should be $[-0.4 \mathrm{~V}, 0.4 \mathrm{~V}]$. Therefore, $V_{D S 1}=0.4 \mathrm{~V}$ is allocated to $M_{1 a}-M_{6 a}$ and $M_{1 b}-M_{6 b}$. From the remaining voltage, $V_{D S 9}=0.3 \mathrm{~V}$ is allocated to the current sources ( $M_{9 a}$ and $M_{9 b}$ ) and $V_{D S 7}=0.1 \mathrm{~V}$ is allocated to the degeneration transistors $\left(M_{7 a}, M_{8 a}, M_{7 b}, M_{8 b}\right)$. Considering the linear range, $V_{D S 1}-V_{O D 1}>0.2 \mathrm{~V}$ is required for $M_{1 a}-M_{6 a}$ and $M_{1 b}-M_{6 b}$. Hence, $V_{O D 1}=0.1 \mathrm{~V}$ is allocated to $M_{1 a}-M_{6 a}$ and $M_{1 b}-M_{6 b}$.
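The headroom allocation above can be tallied with a few lines of arithmetic. The sketch assumes that the 0.4 V drain-source allocation applies separately to each of the two stacked levels of differential pairs (the top pairs and the bottom pairs); this reading is an assumption of the example, made because it accounts for the full 1.2 V. The variable names are illustrative.

```python
import math

V_DD = 1.8             # supply voltage, in volts
V_TH_P = 0.4           # |V_TH| of the PMOS loads, in volts
V_OD_LOAD = 0.2        # |V_OD| of M10 and M11, in volts

# With diode-connected behaviour for CM levels, |V_DS| = |V_TH| + |V_OD|.
v_ds_load = V_TH_P + V_OD_LOAD
v_out_cm = V_DD - v_ds_load

# Assumed split of the NMOS headroom (volts); the two 0.4 V entries reflect
# the assumption that each stacked differential-pair level receives V_DS1.
nmos_budget = {
    "top differential pairs": 0.4,
    "bottom differential pairs": 0.4,
    "current source M9": 0.3,
    "degeneration M7, M8": 0.1,
}
total_nmos = sum(nmos_budget.values())

V_DS1 = 0.4
V_OD1 = 0.1
linear_margin = V_DS1 - V_OD1  # must exceed 0.2 V according to the text

print(f"output CM level = {v_out_cm:.2f} V, NMOS allocation = {total_nmos:.2f} V")
print(f"V_DS1 - V_OD1 = {linear_margin:.2f} V")
```

Under this assumption the allocations add up to the output CM level, so no headroom is left unassigned.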

The length of the current source transistors ($M_{9 a}$ and $M_{9 b}$) must be larger than the minimum length to reduce the channel-length modulation effect. Based on equation (5.19), increasing the length reduces the supply current. Thus, either the width or the overdrive voltage must increase to provide the required current. Since increasing the width and length together is not an area-efficient solution, the overdrive voltage is selected to be $V_{O D 9}=0.2 \mathrm{~V}$.

Considering the power budget (section 4.4.1), $I_{\text {Multiplier }}=80 \mu A$ is allocated to each of the current sources ( $M_{9 a}$ and $M_{9 b}$ ). Thus, each of the transistors in the bottom differential pairs ( $M_{5 a}-M_{8 a}$ and $M_{5 b}-M_{8 b}$ ) carries a current of $40 \mu A$. Therefore, each of the transistors in the top differential pairs $\left(M_{1 a}-M_{4 a}\right.$ and $\left.M_{1 b}-M_{4 b}\right)$ carries a current of $20 \mu \mathrm{~A}$.

With the bias current and overdrive voltage of each transistor known, the aspect ratios of the transistors in the saturation region can be determined from

$$
\begin{equation*}
I_{D}=\frac{1}{2} \mu C_{o x} \frac{W}{L} V_{O D}^{2} \tag{5.41}
\end{equation*}
$$

Also, the aspect ratios of the transistors in the triode region can be determined from

$$
\begin{equation*}
R_{o n}=\frac{1}{\mu C_{o x} \frac{W}{L} V_{O D}} \tag{5.42}
\end{equation*}
$$

To minimize the device capacitances, the minimum length ( $0.2 \mu \mathrm{~m}$ ) was chosen for all transistors except $M_{9 a}$ and $M_{9 b}$. Table 5-1 shows the aspect ratios of the initial design which satisfies the swing and power budget specifications.

Table 5-1: initial aspect ratios of the complex multiplier

| Transistor | $M_{1 a}-M_{4 a}$ <br> $M_{1 b}-M_{4 b}$ | $M_{5 a}, M_{6 a}$ <br> $M_{5 b}, M_{6 b}$ | $M_{7 a}, M_{8 a}$ <br> $M_{7 b}, M_{8 b}$ | $M_{9 a}$ <br> $M_{9 b}$ | $M_{10 a}, M_{11 a}$ <br> $M_{10 b}, M_{11 b}$ | $M_{12 a}, M_{13 a}$ <br> $M_{12 b}, M_{13 b}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\frac{W}{L}\left(\frac{\mu m}{\mu m}\right)$ | $\frac{3}{0.2}$ | $\frac{6}{0.2}$ | $\frac{3}{1.6}$ | $\frac{10}{0.8}$ | $\frac{12}{0.2}$ | $\frac{4}{0.8}$ |
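The NMOS differential-pair entries of Table 5-1 can be cross-checked against equation (5.41). The sketch below is illustrative only: the value of $\mu_{n} C_{o x}$ is not quoted in the text, so it is back-calculated here from the $M_{1}-M_{4}$ entry and should be treated as an assumption. The $M_{5}, M_{6}$ result is then an independent consistency check.

```python
def aspect_ratio(i_d, v_od, mu_cox):
    """W/L of a saturated MOSFET from the square law (5.41)."""
    return 2.0 * i_d / (mu_cox * v_od ** 2)

# Assumption: mu_n*C_ox is not quoted in the text; this value is back-calculated
# from the M1-M4 entry of Table 5-1 and used for illustration only.
MU_N_COX = 267e-6   # A/V^2

wl_top = aspect_ratio(20e-6, 0.1, MU_N_COX)     # M1-M4: 20 uA, V_OD = 0.1 V
wl_bottom = aspect_ratio(40e-6, 0.1, MU_N_COX)  # M5, M6: 40 uA, V_OD = 0.1 V

L_MIN = 0.2  # minimum channel length, um
print(f"M1-M4 : W/L = {wl_top:.1f}, W = {wl_top * L_MIN:.1f} um")
print(f"M5, M6: W/L = {wl_bottom:.1f}, W = {wl_bottom * L_MIN:.1f} um")
```

With this assumed process constant, doubling the bias current at the same overdrive voltage doubles the required width, in line with the $3 / 0.2$ and $6 / 0.2$ entries.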

As will be explained in section 6.2, devices with a larger channel area ($W L$) provide better matching. In order to maintain a constant overdrive voltage, the width and length of each transistor must scale together. Since $M_{1 a}-M_{8 a}$ and $M_{1 b}-M_{8 b}$ appear in the signal paths, their maximum lengths are determined by the bandwidth requirement ($\mathrm{BW}>20 \mathrm{MHz}$). Table 5-2 shows the final aspect ratios.

Table 5-2: final aspect ratios of the complex multiplier

| Transistor | $M_{1 a}-M_{4 a}$ <br> $M_{1 b}-M_{4 b}$ | $M_{5 a}, M_{6 a}$ <br> $M_{5 b}, M_{6 b}$ | $M_{7 a}, M_{8 a}$ <br> $M_{7 b}, M_{8 b}$ | $M_{9 a}$ <br> $M_{9 b}$ | $M_{10 a}, M_{11 a}$ <br> $M_{10 b}, M_{11 b}$ | $M_{12 a}, M_{13 a}$ <br> $M_{12 b}, M_{13 b}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\frac{W}{L}\left(\frac{\mu m}{\mu m}\right)$ | $\frac{15}{1}$ | $\frac{30}{1}$ | $\frac{3}{1.6}$ | $\frac{50}{4}$ | $\frac{60}{1}$ | $\frac{4}{0.8}$ |
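The scaling from Table 5-1 to Table 5-2 can be verified numerically. The short sketch below uses only the table entries for the devices that were resized; the dictionary names are illustrative.

```python
# (W, L) in micrometres for the devices that are resized between the tables.
initial = {"M1-M4": (3, 0.2), "M5,M6": (6, 0.2), "M9": (10, 0.8), "M10,M11": (12, 0.2)}
final = {"M1-M4": (15, 1), "M5,M6": (30, 1), "M9": (50, 4), "M10,M11": (60, 1)}

for name, (w0, l0) in initial.items():
    w1, l1 = final[name]
    ratio_kept = abs(w1 / l1 - w0 / l0) < 1e-9   # same W/L -> same V_OD at the same I_D
    area_gain = (w1 * l1) / (w0 * l0)            # growth of the channel area WL
    print(f"{name:8s} W/L {w0 / l0:5.1f} -> {w1 / l1:5.1f}  kept={ratio_kept}  area x{area_gain:.0f}")
```

Each resized device keeps its aspect ratio while its width and length both grow by a factor of five, so the channel area grows by a factor of 25.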

The maximum output swing is achieved by choosing the CM levels $V_{1}=V_{3}=1 \mathrm{~V}$, $V_{2}=V_{4}=1.5 \mathrm{~V}$, and the bias voltage $V_{\text {bias }}=0.6 \mathrm{~V}$. Thereby, $M_{9 a}$ and $M_{9 b}$ each provide $66 \mu \mathrm{~A}$. The power consumption of the complex multiplier is $239 \mu \mathrm{~W}$.

Figure 5.14 shows the transfer characteristics of the designed circuit for various multiplication coefficients. Variations of the differential input ( $\Delta V_{1}$ ) and differential output $\left(\Delta V_{\text {out }}\right)$ signals are given on the horizontal and vertical axes respectively. Changing the multiplication coefficient $\left(\Delta V_{2}\right)$ changes the slope of the transfer characteristic. According to these characteristics, the designed circuit has a linear multiplication range of $[-0.3 \mathrm{~V}, 0.3 \mathrm{~V}]$.


Figure 5-14: Transfer characteristic of the Gilbert cell multiplier simulated in SPICE

### 5.3 Discrete-Time Integrator

The signal at the output of the multiplier is piecewise continuous. For an N-point DFT, the amplitude of N pieces must be summed together. Accordingly, a discrete-time integrator that takes samples of each piece and provides their sum is required. The discrete-time integrator was first realized by replacing the resistor in the Operational amplifier (Op-amp) integrator (i.e. continuous-time integrator) with a capacitor and two MOS switches (Figure 5.15) [106]. The nonoverlapping complementary clock signals ( $\varphi_{1}$ and $\varphi_{2}$ ) that control the circuit are shown in Figure 5.15(c). When the clock signal is high, the transistor is in the triode region. Thus, the transistor operates as a resistor and conducts current. Therefore, the switch turns on as the clock signal goes high. In the sampling mode $\mathrm{S}_{1}$ is on and $C_{S}$ absorbs a charge equal to $C_{S} V_{i n}$. In the integration mode $\mathrm{S}_{2}$ is on and $C_{S}$ deposits its charge on $C_{I}$.


Figure 5-15: (a) continuous-time integrator (b) discrete-time integrator (c) timing diagram of circuit (b) [80]

The equivalent capacitance of the parasitic capacitors that affect the output is represented by $C_{p}$. The sensitivity of the output to $C_{p}$ can be reduced by enlarging the sampling and integrating capacitors. Therefore, this integrator demands a large layout area, which makes it unsuitable for Very Large Scale Integration (VLSI) circuits. The design of a switched-capacitor (SC) integrator that is insensitive to the parasitic capacitors is discussed in the following sections.

### 5.3.1 Analysis of the Parasitic-Insensitive Integrator

A parasitic-insensitive switched-capacitor (SC) integrator [106, 107] is shown in Figure 5.16(a). In the sampling mode (Figure 5.16(b)), $S_{1}$ and $S_{3}$ are on and $S_{2}$ and $S_{4}$ are off. Thereby, the sampling capacitor $\left(C_{S}\right)$ absorbs a charge equal to $C_{S} V_{i n}$ while the integrating capacitor $\left(C_{I}\right)$ holds the previous value. The channel charge injection in the transition from the sampling mode to the integration mode (Figure 5.16(c)) can be alleviated by proper switch timing. To this end, $S_{3}$ turns off first, then $S_{1}$ turns off, and finally $S_{2}$ and $S_{4}$ turn on (Figure 5.17). Since the output voltage is measured after node $P$ is connected to ground, the final value of $V_{p}$ is fixed (zero). Thus, the charge injection or absorption of $S_{1}$ and $S_{2}$ does not affect the output voltage. Moreover, the voltage across the junction capacitance of $S_{3}$ and $S_{4}$ changes from near zero in the sampling mode to virtual ground in the integration mode. Since this voltage variation is very small, the charge stored on the junction capacitance is negligible. Consequently, only a constant charge from $S_{3}$ is injected onto $C_{S}$, which introduces a constant offset at the output. This offset is suppressed by the differential operation [80]. The parasitic capacitor $C_{p 1}$ is periodically switched between the input and ground. Also, the parasitic capacitor $C_{p 2}$ is periodically switched between the virtual ground and ground. Hence, $C_{p 1}$ and $C_{p 2}$ do not deliver any charge to $C_{I}$. Therefore, the output voltage is insensitive to the parasitic capacitors. Accordingly, there is no need to alleviate the effect of parasitic capacitors by enlarging $C_{S}$ and $C_{I}$. Thus, the parasitic-insensitive SC integrator is area efficient $[108,109]$.

Figure 5-16: (a) Parasitic-insensitive integrator (b) circuit of (a) in sampling mode, (c) circuit of (a) in integration mode [80]


Figure 5-17: timing diagram of the parasitic-insensitive integrator

The transfer function of the SC integrator is obtained from a charge-conservation analysis. In this analysis, $Q[t]$ is the total charge stored at instant $t$. Also, the input and output voltages at instant $t$ are denoted by $V_{\text {in }}[t]$ and $V_{\text {out }}[t]$. Accordingly [110],

$$
\begin{gather*}
Q\left[(n-1) T_{c k}\right]=V_{\text {in }}\left[(n-1) T_{c k}\right] C_{S}+V_{\text {out }}\left[(n-1) T_{c k}\right] C_{I}  \tag{5.43}\\
Q\left[\left(n-\frac{1}{2}\right) T_{c k}\right]=(0) C_{S}+V_{\text {out }}\left[\left(n-\frac{1}{2}\right) T_{c k}\right] C_{I} \tag{5.44}
\end{gather*}
$$

From charge conservation, $Q\left[(n-1 / 2) T_{c k}\right]=Q\left[(n-1) T_{c k}\right]$; thus,

$$
\begin{equation*}
V_{\text {out }}\left[\left(n-\frac{1}{2}\right) T_{c k}\right] C_{I}=V_{\text {in }}\left[(n-1) T_{c k}\right] C_{S}+V_{\text {out }}\left[(n-1) T_{c k}\right] C_{I} \tag{5.45}
\end{equation*}
$$

The charge stored on $C_{I}$ is constant during $\varphi_{1}$; hence

$$
\begin{equation*}
V_{\text {out }}\left[n T_{c k}\right]=V_{\text {out }}\left[\left(n-\frac{1}{2}\right) T_{c k}\right] \tag{5.46}
\end{equation*}
$$

Combining equations (5.45) and (5.46) gives

$$
\begin{equation*}
V_{\text {out }}\left[n T_{c k}\right]=V_{\text {in }}\left[(n-1) T_{c k}\right] \frac{C_{S}}{C_{I}}+V_{\text {out }}\left[(n-1) T_{c k}\right] \tag{5.47}
\end{equation*}
$$

which is the difference equation representation of the discrete-time integrator that was given in chapter 3. The z-transform of this difference equation is

$$
\begin{equation*}
V_{\text {out }}(z)=\frac{C_{S}}{C_{I}} z^{-1} V_{\text {in }}(z)+z^{-1} V_{\text {out }}(z) \tag{5.48}
\end{equation*}
$$

Thereby, the transfer function of the parasitic-insensitive SC integrator is

$$
\begin{equation*}
H(z)=\frac{V_{\text {out }}(z)}{V_{\text {in }}(z)}=\frac{C_{S}}{C_{I}} \frac{z^{-1}}{1-z^{-1}} \tag{5.49}
\end{equation*}
$$
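The behaviour described by equation (5.47) can be illustrated by iterating the difference equation directly. The following is a minimal sketch under ideal conditions (infinite op-amp gain, complete settling); the capacitor values are those chosen later in section 5.3.3 and the input samples are illustrative.

```python
def sc_integrator(v_in, c_s, c_i, v_out0=0.0):
    """Iterate (5.47): V_out[n] = (C_S / C_I) * V_in[n-1] + V_out[n-1]."""
    gain = c_s / c_i
    v_out = [v_out0]
    for sample in v_in:
        v_out.append(v_out[-1] + gain * sample)
    return v_out

# Capacitor values taken from section 5.3.3; the input samples are illustrative.
C_S, C_I = 50e-15, 1e-12
samples = [0.2] * 8                    # eight equal 0.2 V samples
trace = sc_integrator(samples, C_S, C_I)
print([round(v, 3) for v in trace])
```

Each sample adds $\left(C_{S} / C_{I}\right) V_{i n}$ to the output one clock period later, so eight samples of 0.2 V accumulate to $8 \times 0.05 \times 0.2 \mathrm{~V}=0.08 \mathrm{~V}$.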

### 5.3.2 Speed and Precision Considerations

Since capacitors take infinite time to charge fully, the output is measured once it has settled within a certain error band [111]. Thus, it is essential to consider the speed-precision trade-off in designing the integrator. For this purpose, the time constant of the circuit in each mode of operation must be calculated. In the sampling mode (Figure 5.16(b)) [80],

$$
\begin{equation*}
\tau_{\text {sam }}=\left(R_{o n 1}+R_{o n 3}\right) C_{S} \tag{5.50}
\end{equation*}
$$

where $R_{o n}$ is the on-resistance of the switch (i.e. transistor in the triode region).

Hence,

$$
\begin{equation*}
\tau_{s a m}=\frac{C_{S}}{\mu_{n} C_{o x}(W / L)}\left(\frac{1}{\left(V_{D D}-V_{i n}-V_{T H}\right)}+\frac{1}{\left(V_{D D}-V_{T H}\right)}\right) \tag{5.51}
\end{equation*}
$$

which indicates that a smaller sampling capacitor and a larger $W / L$ permit a higher sampling frequency. However, as the switch turns off, the thermal noise generated by $R_{\text {on }}$ is stored on $C_{S}$. The RMS voltage of the sampled noise is [112]

$$
\begin{equation*}
v_{n}=\sqrt{k T / C_{S}} \tag{5.52}
\end{equation*}
$$

where $k$ is the Boltzmann constant, and $T$ is the absolute temperature. On the other hand, the channel charge injection introduces an error to the sampled voltage which is [80]

$$
\begin{equation*}
\Delta V \propto \frac{W L C_{o x}}{C_{S}} \tag{5.53}
\end{equation*}
$$

Therefore, $C_{S}$ must be sufficiently large to achieve a low noise and a low error. Since the effect of the channel charge injection is alleviated by the switch timing, it is possible to increase the speed by enlarging $W$.
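To give a feel for the magnitude of the sampled noise in equation (5.52), the sketch below evaluates it for a few capacitor values. The temperature of 300 K is an assumption; $C_{S}=50 \mathrm{~fF}$ is the value selected in section 5.3.3, and the other two values are included only for comparison.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def ktc_noise_rms(c_s, temperature=300.0):
    """RMS sampled thermal noise voltage from (5.52)."""
    return math.sqrt(K_BOLTZMANN * temperature / c_s)

# Assumption: T = 300 K. C_S = 50 fF is the value chosen in section 5.3.3.
for c_s in (50e-15, 200e-15, 1e-12):
    print(f"C_S = {c_s * 1e15:6.0f} fF -> v_n = {ktc_noise_rms(c_s) * 1e6:6.1f} uV rms")
```

Because the noise voltage scales with $1 / \sqrt{C_{S}}$, quadrupling the capacitor only halves the noise, which is why the choice of $C_{S}$ is a compromise between noise, speed and area.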

Figure 5.18 depicts the equivalent circuit of Figure 5.16(c). The output resistance of the op-amp is denoted by $R_{\text {out }}$.


Figure 5-18: equivalent circuit of the parasitic-insensitive integrator in integration mode

$V_{i}=-V_{\text {in }}$; thus, KCL at the output node gives

$$
\begin{equation*}
\left(-V_{\text {in }} A_{v}-V_{\text {out }}\right) \frac{1}{R_{\text {out }}}=V_{\text {out }} \frac{C_{I} C_{S}}{C_{I}+C_{S}} s \tag{5.54}
\end{equation*}
$$

Thereby,

$$
\begin{equation*}
\frac{V_{\text {out }}}{V_{\text {in }}}(s)=\frac{-A_{v}}{1+\frac{C_{I} C_{S}}{C_{I}+C_{S}} R_{\text {out }} s} \tag{5.55}
\end{equation*}
$$

Hence, the time constant in the integration mode is

$$
\begin{equation*}
\tau_{\text {int }}=\frac{C_{I} C_{S}}{C_{I}+C_{S}} R_{\text {out }} \tag{5.56}
\end{equation*}
$$

which indicates that an op-amp with smaller $R_{\text {out }}$ yields higher integrating frequency. However, $A_{v}$ is directly proportional to $R_{\text {out }}$. As explained in chapter 4, smaller $A_{v}$ yields lower SNDR. Accordingly, there is a trade-off between the speed and precision requirements in the integration mode.

The differential amplifier is the simplest op-amp topology, but its gain is relatively low. Adding cascode devices to the differential amplifier increases its output impedance. Thereby, differential cascode topologies attain a higher gain than the differential amplifier. It is also possible to further increase the output impedance by gain boosting. However, as mentioned earlier, increasing the output impedance reduces the speed of the integrator. Moreover, the higher gain in these configurations comes at the cost of higher power dissipation, lower output swing, and additional poles. Another method of increasing the gain of a differential amplifier is to add a second stage to it. The gain of the two-stage op-amp is comparable with that of a cascode op-amp. However, the speed of the two-stage op-amp is lower than that of a cascode op-amp [80].

The objective is to design an analogue DFT processor with lower power consumption than the digital FFT processor. Considering the power budget and the above comparison between the principal op-amp topologies, the differential amplifier has been selected. For the differential amplifier in Figure 5.19, $\left|A_{v}\right|=G_{m} R_{o u t}=g_{m 2}\left(r_{o 2} \| r_{o 4}\right)$, where $g_{m x}$ and $r_{o x}$ are the transconductance and the output resistance of $M_{x}$, respectively. Therefore, the speed and precision requirements in the integration mode can be met by increasing $G_{m}$ instead of $R_{\text {out }}$. Since $G_{m}=\sqrt{\mu_{n} C_{o x}(W / L) I_{S S}}$, $G_{m}$ can be increased by choosing a larger aspect ratio for the input transistors or by increasing the supply current. Increasing $W / L$ increases the input capacitance and reduces the speed; thus, $I_{S S}$ has to be increased.


Figure 5-19: differential amplifier with single-ended output [80]

$\tau_{\text {int }}$ only determines the small-signal time-domain response. For large signals, the speed is limited by the slew rate. Slewing is a nonlinear phenomenon that distorts the output. In order to calculate the slew rate, the op-amp of Figure 5.16(c) is replaced by the differential amplifier (Figure 5.20). Slewing occurs when the sampled voltage $\left|V_{s}\right|$ is so large that one transistor ($M_{1}$ or $M_{2}$) carries the entire $I_{S S}$ and the other transistor turns off [80].


Figure 5-20: Slewing in the op-amp [80]

Since the feedback loop is broken, the output voltage is [80]

$$
\begin{equation*}
\left|V_{\text {out }}(t)\right|=I_{S S}\left(\frac{C_{S}+C_{I}}{C_{S} C_{I}}\right) t \tag{5.57}
\end{equation*}
$$

Slew Rate (SR) is the slope of the output voltage.

$$
\begin{equation*}
S R=I_{S S}\left(\frac{C_{S}+C_{I}}{C_{S} C_{I}}\right) \tag{5.58}
\end{equation*}
$$

Output voltage of the integrator during the $n$th integration period is [113]

$$
\begin{equation*}
V_{\text {out }}(t)=V_{\text {out }}\left(n T_{c k}-T_{c k}\right)+\alpha V_{s}\left(1-e^{-\frac{t}{\tau_{\text {int }}}}\right) \quad 0<t<\frac{T_{c k}}{2} \tag{5.59}
\end{equation*}
$$

where $V_{s}=V_{i n}\left(n T_{c k}-T_{c k} / 2\right)$ and $\alpha$ is the integrator leakage. The maximum slope of the output voltage is

$$
\begin{equation*}
\left.\frac{d V_{\text {out }}}{d t}\right|_{t=0}=\frac{\alpha V_{s}}{\tau_{\text {int }}} \tag{5.60}
\end{equation*}
$$

In order to prevent slewing, the maximum slope of $V_{\text {out }}$ must be lower than the SR [113]. Substituting equations (5.56) and (5.58) into this condition gives

$$
\begin{equation*}
\frac{\alpha V_{s}}{R_{\text {out }}}<I_{S S} \tag{5.61}
\end{equation*}
$$

Thus, the lower limit of $I_{S S}$ is $\alpha V_{s} / R_{\text {out }}$, while its upper limit is determined by the power budget.
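Equations (5.58) and (5.61) can be evaluated for representative values. The sketch below takes $C_{S}$, $C_{I}$ and $I_{S S}$ from section 5.3.3, whereas $\alpha$, $V_{s}$ and $R_{\text {out }}$ are not specified at this point in the text and are assumed purely for illustration.

```python
def slew_rate(i_ss, c_s, c_i):
    """Slew rate from (5.58): I_SS divided by the series combination of C_S and C_I."""
    return i_ss * (c_s + c_i) / (c_s * c_i)

def min_tail_current(alpha, v_s, r_out):
    """Lower bound on I_SS from (5.61)."""
    return alpha * v_s / r_out

# C_S, C_I and I_SS follow section 5.3.3.
C_S, C_I, I_SS = 50e-15, 1e-12, 50e-6
# Assumptions (not given in the text): alpha = 1, V_s = 0.2 V, R_out = 100 kOhm.
ALPHA, V_S, R_OUT = 1.0, 0.2, 100e3

sr = slew_rate(I_SS, C_S, C_I)
i_min = min_tail_current(ALPHA, V_S, R_OUT)
print(f"SR = {sr / 1e6:.0f} V/us, minimum I_SS = {i_min * 1e6:.1f} uA, slewing avoided: {I_SS > i_min}")
```

For these assumed values the lower bound on $I_{S S}$ is far below the allocated supply current, so the power budget rather than slewing would set the operating point.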

### 5.3.3 Circuit Realization

Based on the Nyquist theorem, the sampling frequency must be at least twice the signal frequency. Since the DFT processor should support WiFi and WiMAX standards, the maximum signal frequency is 20 MHz. Thus, the maximum sampling frequency is considered to be 80 MHz. The settling time of the op-amp is

$$
\begin{equation*}
T_{\text {set }} \approx 5 \tau_{\text {int }} \tag{5.62}
\end{equation*}
$$

The durations of the sampling mode and integration mode must be equal; thus, $\tau_{\text {int }}$ can be replaced by $\tau_{\text {sam }}$. Thereby, the unity gain bandwidth of the op-amp must be at least five times greater than the sampling frequency [114].

$$
\begin{equation*}
f_{U} \geq 5 f_{S} \tag{5.63}
\end{equation*}
$$
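The timing figures implied by equations (5.62) and (5.63) follow from simple arithmetic. The sketch below assumes that each clock phase lasts half a clock period, which neglects the non-overlap interval between $\varphi_{1}$ and $\varphi_{2}$.

```python
F_S = 80e6                       # maximum sampling frequency, Hz
# Assumption: each phase occupies half a period (non-overlap gap neglected).
half_period = 1.0 / (2.0 * F_S)  # time available for each of the two clock phases
tau_max = half_period / 5.0      # (5.62): settling takes about five time constants
f_u_min = 5.0 * F_S              # (5.63)

print(f"phase duration  = {half_period * 1e9:.2f} ns")
print(f"tau_int, max    = {tau_max * 1e9:.2f} ns")
print(f"f_U, min        = {f_u_min / 1e6:.0f} MHz")
```

The resulting minimum unity-gain bandwidth of 400 MHz is the figure used as the bandwidth requirement when the op-amp transistors are sized.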

The output of the integrator should be sampled at $(N+1 / 2) T_{c k}$, when the DFT computation is complete. As the DFT length ($N$) increases, more samples must be stored on $C_{I}$. Thus, the required $C_{I}$ for the WiMAX standard becomes prohibitively large. Large capacitors demand a large layout area. More importantly, because the gain of the integrator is inversely proportional to $C_{I}$, increasing the value of $C_{I}$ attenuates the signal. The attenuation might be so severe that the ADC cannot detect the signal. Thus, the upper limit of $C_{I}$ is determined by the quantization level of the ADC. To overcome this problem, the DFT sum is broken into partial sums (equation (5.64)) and $C_{I}$ is discharged after each partial sum is calculated. The results of the partial sums can be added together in the Digital Signal Processor (DSP).

$$
\begin{equation*}
X(k)=\sum_{n=0}^{N-1} x(n) W_{N}^{n k}=\sum_{n=0}^{M-1} x(n) W_{N}^{n k}+\sum_{n=M}^{2 M-1} x(n) W_{N}^{n k}+\cdots+\sum_{n=N-M}^{N-1} x(n) W_{N}^{n k} \tag{5.64}
\end{equation*}
$$
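The decomposition in equation (5.64) can be verified numerically. The following sketch assumes the common convention $W_{N}=e^{-j 2 \pi / N}$ and uses an arbitrary 16-point test sequence; the function names are illustrative.

```python
import cmath

def dft_bin(x, k, start=0, stop=None):
    """Sum x(n) * W_N^(n k) for n in [start, stop), with W_N = exp(-2j*pi/N)."""
    n_total = len(x)
    stop = n_total if stop is None else stop
    return sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_total) for n in range(start, stop))

def dft_bin_partial(x, k, m):
    """Evaluate the same bin as N/M partial sums of length M, as in (5.64)."""
    partials = [dft_bin(x, k, s, s + m) for s in range(0, len(x), m)]
    return sum(partials), partials

# Illustrative 16-point input split into two partial sums of length M = 8.
x = [complex((n * 7) % 5 - 2, (n * 3) % 4 - 1.5) for n in range(16)]
total, parts = dft_bin_partial(x, k=3, m=8)
print(abs(total - dft_bin(x, 3)) < 1e-9, len(parts))
```

Each partial sum corresponds to one integration window after which $C_{I}$ is reset, and the final addition corresponds to the step that the text assigns to the DSP.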

To discharge $C_{I}$ at $\varphi_{M}$, $S_{5}$ and $S_{6}$ connect both sides of $C_{I}$ to the input CM level of the op-amp (Figure 5.21). In Figure 5.21, the input CM level of the op-amp is shown by the ground symbol.


Figure 5-21: Parasitic-insensitive integrator with reset switches

Since $C_{I} \gg C_{S}$, $C_{I}$ cannot be fully discharged during the sampling time of $C_{S}$. Therefore, the number of integrators is doubled so that the multipliers can switch between two integrators. Thereby, one integrator calculates a partial sum while the output of the other integrator is being read.

Input of the integrator is connected to the output of the complex multiplier. Thus, the input CM level of the op-amp (Figure 5.19) is equal to the output CM level of the complex multiplier ( $V_{i n, C M}=1.2 \mathrm{~V}$ ). In order to keep $M_{2}$ in the saturation region, the output voltage should be $V_{\text {out }} \geq V_{\text {in, CM }}-V_{T H 2}$. With $V_{T H 2}=0.6 \mathrm{~V}$, the output voltage should be $V_{\text {out }} \geq 0.6 \mathrm{~V}$. Hence, $V_{\text {out }}=1.2 \mathrm{~V}$ is selected. Since $V_{\text {out }}=V_{D D}-\left|V_{G S 3}\right|$ and $\left|V_{T H 3}\right|=0.4 \mathrm{~V}$, the overdrive voltage of $M_{3}$ is $\left|V_{O D 3}\right|=0.2 \mathrm{~V}$.

Gain of the op-amp is $A_{v}=-g_{m 1}\left(r_{O 1} \| r_{O 3}\right)$. Also, $g_{m 1}=2 I_{D 1} / V_{O D 1}$. Therefore, $V_{O D 1}=0.1 \mathrm{~V}$ is selected to achieve a high gain. $V_{O D 5}=0.2 \mathrm{~V}$ is allocated to $M_{5}$. Considering the power budget (section 4.4.1), $I_{\text {Single-ended integrator }}=50 \mu \mathrm{~A}$ is allocated to $M_{5}$. With the bias current and overdrive voltage of each transistor known, the aspect ratios of the transistors can be determined. To minimize the device capacitances, the minimum length $(0.2 \mu \mathrm{~m})$ was chosen for all transistors except $M_{5}$. Table 5-3 shows the aspect ratios of the initial design which satisfies the swing and power budget specifications.

Table 5-3: initial aspect ratios of the op-amp

| Transistor | $M_{1}-M_{2}$ | $M_{3}-M_{4}$ | $M_{5}$ |
| :---: | :---: | :---: | :---: |
| $\frac{W}{L}\left(\frac{\mu m}{\mu m}\right)$ | $\frac{2}{0.2}$ | $\frac{4}{0.2}$ | $\frac{8}{0.8}$ |
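The bias point of the op-amp described above reduces to a few lines of arithmetic. The sketch below uses the voltages quoted in the text and assumes that the supply current of $M_{5}$ splits equally between $M_{1}$ and $M_{2}$.

```python
V_DD, V_IN_CM = 1.8, 1.2        # supply and input common-mode level, V
V_TH_N, V_TH_P = 0.6, 0.4       # V_TH2 and |V_TH3| as quoted in the text, V

v_out_min = V_IN_CM - V_TH_N    # keeps M2 in saturation
v_out = 1.2                     # selected output level, V
v_od3 = (V_DD - v_out) - V_TH_P  # |V_OD3| = |V_GS3| - |V_TH3|

# Assumption: the 50 uA supply current of M5 splits equally between M1 and M2.
i_d1 = 50e-6 / 2
g_m1 = 2 * i_d1 / 0.1           # g_m1 = 2 I_D1 / V_OD1 with V_OD1 = 0.1 V

print(f"V_out,min = {v_out_min:.1f} V, |V_OD3| = {v_od3:.1f} V, g_m1 = {g_m1 * 1e6:.0f} uS")
```

Halving $V_{O D 1}$ at the same bias current would double $g_{m 1}$, which is the reason a small overdrive voltage is chosen for the input pair.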

Since $g_{m} r_{O} \propto \sqrt{W L / I_{D}}$, gain can be increased by increasing the width and length of the transistors [80]. Since $M_{1}-M_{4}$ appear in the signal path, their maximum lengths are determined by the bandwidth requirement $\left(f_{U}=400 \mathrm{MHz}\right)$.

Based on equations (5.51) and (5.53), the switches of the integrator (Figure 5.21) must have a large $W / L$ and a small $W L$. Hence, the minimum length is chosen for the switches. Thereby, a width of $1 \mu \mathrm{~m}$ is required to yield $f_{s} \geq 80 \mathrm{MHz}$. Since the source and drain terminals may interchange, the bulk terminal of the NMOS switches must be connected to ground. Table 5-4 shows the final aspect ratios. By selecting $V_{\text {bias }}=0.6 \mathrm{~V}$, $M_{5}$ provides $54 \mu \mathrm{~A}$.

Table 5-4: final aspect ratios of the parasitic-insensitive integrator

| Transistor | $M_{1}-M_{2}$ | $M_{3}-M_{4}$ | $M_{5}$ | $S_{1}-S_{6}$ |
| :---: | :---: | :---: | :---: | :---: |
| $\frac{W}{L}\left(\frac{\mu m}{\mu m}\right)$ | $\frac{10}{1}$ | $\frac{20}{1}$ | $\frac{40}{4}$ | $\frac{1}{0.2}$ |

$C_{S}=50 \mathrm{~fF}$ is selected because it is the smallest capacitor that holds the sampled voltage without dropping due to charge leakage. $C_{I}$ must be at least ten times bigger than $C_{S}$, otherwise the circuit in Figure 5.21 acts as a Low Pass Filter (LPF) instead of an integrator. Figure 5.22 shows the output of a differential parasitic-insensitive integrator that sums 8 pieces of the piecewise continuous signal $(M=8)$. This integrator is realized with $C_{I}=1 \mathrm{~pF}$. The power consumption of the differential integrator is $395 \mu \mathrm{~W}$. While $\varphi_{M}=0 \mathrm{~V}$, the differential integrator is in the integration mode ($C_{I}$ is charging). While $\varphi_{M}=2 \mathrm{~V}$, the differential integrator is in the reset mode ($C_{I}$ is discharging).


Figure 5-22: output of a differential parasitic-insensitive integrator simulated in SPICE

### 5.4 Real-Time Recursive DFT Processor

As mentioned in the previous section, the DFT sum is broken into partial sums to overcome the limitation on $C_{I}$. To determine the maximum value of $M$ in equation (5.64), the impact of $C_{I}$ on the DFT processor performance must be analysed. For the purpose of this analysis, the analogue multiplier and the parasitic-insensitive integrator that were designed in the previous sections are used to realize the real-time recursive DFT processor. An OFDM signal with QPSK modulation was applied to the input of the processor. SNDR curves of an 8-point DFT with ideal devices were obtained for different values of $C_{I}$ (Figure 5.23). The SNDR curve of an 8-point DFT with ideal integrators (integrations are performed by MATLAB) is also shown in Figure 5.23.


Figure 5-23: The SNDR curves of real-time recursive DFT processors with ideal devices

These results indicate that, in the absence of mismatch, the DFT processor with larger $C_{I}$ provides higher SNDR at low signal levels (input magnitude $\leq-25 \mathrm{dBV}$). At high signal levels (input magnitude $>-25 \mathrm{dBV}$), however, the DFT processor with larger $C_{I}$ is more susceptible to op-amp saturation.

Using the device mismatch model that is provided in chapter 6, the impact of $C_{I}$ on the performance of the DFT processor is analysed (Figure 5.24). These results indicate that, in the presence of device mismatch, the DFT processor with larger $C_{I}$ is more susceptible to noise and distortion at low signal levels. This inference is in contrast to the inference from the DFT with ideal devices (Figure 5.23). The reduction of the SNDR as $C_{I}$ increases is due to the fact that a DFT with larger $C_{I}$ has a lower gain. The dynamic range and the peak SNDR of DFT processors with $C_{I}=500 \mathrm{~fF}$ and $C_{I}=1 \mathrm{~pF}$ are almost the same. By selecting $C_{I}=1 \mathrm{~pF}$, the maximum length of partial sums in equation (5.64) becomes $M=8$.


Figure 5-24: SNDR curves of real-time recursive DFT processors in the presence of device mismatch

For multi-standard radio applications, the DFT processor should compute Fourier transforms of various lengths. Hence, the impact of the transform length on the DFT processor performance must be analysed. For the purpose of this analysis, an OFDM signal with BPSK modulation was applied to the input of the processor. Figure 5.25 shows the SNDR curves of an 8-point DFT and a 16-point DFT with ideal devices. As mentioned earlier, $M=8$. Hence, the 16-point DFT is calculated by breaking the DFT sum into two partial sums and adding the results of the partial sums in MATLAB (equation (5.64)). The results of this analysis indicate that, in the absence of mismatch, increasing the transform length does not affect the performance of the recursive DFT processor.


Figure 5-25: SNDR curves of real-time recursive DFT processors with different transform lengths

### 5.5 Accuracy of the Results

Figure 5.26 shows the design and verification steps that must be completed to create an Integrated Circuit (IC). The architectural design was explained in chapter 3. The behavioural models and the system-level performance analysis were covered in chapter 4. This chapter and chapter 6 provide the circuit design and the circuit-level performance analysis.


Figure 5-26: steps in the integrated circuit design flow [115]

Interconnect properties (i.e. series resistance and parallel capacitance) impact the performance of the circuit. For long interconnects, the parasitic resistance and capacitance cause signal delay. Also, the series resistances in supply and ground lines create dc and transient voltage drops. In addition, charging the extra capacitances increases the power consumption. To increase the accuracy of the circuit model, parasitic devices should be extracted from the layout design and annotated on the pre-layout schematic netlist [80].

The physical verification step requires access to the design rules for layout. Since the Process Design Kit (PDK) was not available, results of the pre-layout simulations are provided in this chapter and in the next chapter. For frequencies below 100 MHz, results of pre-layout simulations are in good agreement with experimental results [116]. The sampling frequency of the switched-capacitor integrator is $f_{s}=80 \mathrm{MHz}$. Hence, the results of the circuit-level performance analysis are considered reliable.

### 5.6 Summary

The real-time recursive DFT processor is realized by analogue multipliers in conjunction with switched-capacitor integrators. Differential circuits have an odd-symmetric input/output characteristic; hence, they do not produce even harmonics. Accordingly, to enhance the nonlinearity cancellation, a fully differential configuration is used. The advantage of the proposed design approach over the previous designs is that it is both reconfigurable and area efficient. In this chapter, the speed-power-accuracy trade-offs in circuits with ideal devices have been discussed. In order to analyse the impact of the transform length on the DFT processor performance, an 8-point DFT and a 16-point DFT were simulated with ideal devices. The results of this analysis indicate that, in the absence of mismatch, increasing the transform length does not affect the performance of the recursive DFT processor. The performance of the real-time recursive DFT processor in the presence of device mismatch will be analysed in the next chapter.

## Chapter 6

## Device Mismatch Analysis and Results

In the previous chapter, the real-time recursive DFT processor was simulated with perfectly symmetric circuits. In reality, however, uncertainties in the manufacturing process lead to mismatch between nominally identical devices. In this chapter, the impact of device mismatch on the performance of the circuit is analysed. To this aim, first the mismatch models available in the open literature are reviewed. Then, the design tradeoffs that impose limitations on the performance of analogue signal processors are explained. Next, the effect of technology scaling on mismatch is discussed. Results of the mismatch analysis are presented and compared with previous work. Finally, some techniques that can mitigate the effect of device mismatch are briefly described.

### 6.1 MOS Transistor Matching Models

Generally, process mismatch analysis is based on global and local variations. Global mismatch is the total variation over a wafer or a batch. Local mismatch occurs between adjacent devices on the same chip. For a matched pair of MOS transistors, threshold voltage differences $\Delta V_{T H}$ and current factor differences $\Delta \beta$ (where $\beta=\mu C_{o x} W / L$) are the dominant sources of mismatch [117].

Pelgrom's mismatch model [117] describes the behaviour of $\Delta V_{T H}$ and $\Delta \beta$ as the spatial variations of device parameters

$$
\begin{gather*}
\sigma^{2}\left(\Delta V_{T H}\right)=\frac{A_{V T H}^{2}}{W L}+S_{V T H}^{2} D^{2}  \tag{6.1}\\
\frac{\sigma^{2}(\Delta \beta)}{\beta^{2}}=\frac{A_{W}^{2}}{W^{2} L}+\frac{A_{L}^{2}}{W L^{2}}+\frac{A_{\mu}^{2}}{W L}+\frac{A_{C o x}^{2}}{W L}+S_{\beta}^{2} D^{2} \approx \frac{A_{\beta}^{2}}{W L}+S_{\beta}^{2} D^{2} \tag{6.2}
\end{gather*}
$$

where $A_{P}$ is the area proportionality constant for parameter $P$, $S_{P}$ is the variation of parameter $P$ with spacing $D$, $W$ is the effective width and $L$ is the effective length of the channel, $\mu$ is the mobility of charge carriers, and $C_{o x}$ is the gate oxide capacitance per unit area. Equations (6.1) and (6.2) show that local variations decrease as the effective channel area ($W L$) increases, whereas global variations ($S_{V T H}$ and $S_{\beta}$) are independent of the device dimensions.

Since the advent of submicron technologies, more accurate mismatch models have been proposed $[118,119]$. However, these models require access to the standard cell libraries and the Process Design Kit (PDK). Similarly, the global variation terms of Pelgrom's mismatch model require access to the design rules for layout. Since the design of analogue circuits is based on the device size, local variations are the main focus of attention for circuit designers. Hence, in the absence of the standard cell libraries and PDK, the experimental data available in the open literature were used to model the local variations described by Pelgrom.

Combining (6.1) and (6.2) yields the drain current mismatch in the saturation region [120, 121]

$$
\begin{equation*}
\frac{\sigma^{2}\left(\Delta I_{D}\right)}{I_{D}{ }^{2}}=4 \frac{\sigma^{2}\left(\Delta V_{T H}\right)}{\left(V_{G S}-V_{T H}\right)^{2}}+\frac{\sigma^{2}(\Delta \beta)}{\beta^{2}} \tag{6.3}
\end{equation*}
$$

As explained in chapter 5, signal swing requirements limit the overdrive voltage of each transistor to less than 0.65 V. For $\left(V_{G S}-V_{T H}\right)<0.65 \mathrm{~V}$, $\Delta V_{T H}$ is the main source of the drain current mismatch $[120,121]$. Thus, the contribution of the $\Delta \beta$ mismatch can be neglected. In conclusion, a simplified version of Pelgrom's mismatch model can be used:

$$
\begin{equation*}
\sigma^{2}\left(\Delta V_{T H}\right)=\frac{A_{V T H}^{2}}{W L} \tag{6.4}
\end{equation*}
$$
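Equation (6.4) can be applied to the transistor dimensions of chapter 5 to illustrate the benefit of the larger channel area. In the sketch below, the coefficient $A_{V T H}$ is a hypothetical value chosen only for illustration, and the drawn dimensions from Tables 5-1 and 5-2 are used in place of the effective ones.

```python
import math

def sigma_delta_vth(a_vth, w, l):
    """Standard deviation of the threshold mismatch from (6.4); a_vth in mV*um, w and l in um."""
    return a_vth / math.sqrt(w * l)

# Assumption: A_VTH = 5 mV*um is a hypothetical coefficient chosen for illustration;
# the source takes its value from published experimental data not reproduced here.
A_VTH = 5.0

initial = sigma_delta_vth(A_VTH, 3.0, 0.2)   # M1-M4 in Table 5-1
final = sigma_delta_vth(A_VTH, 15.0, 1.0)    # M1-M4 in Table 5-2
print(f"initial: {initial:.2f} mV, final: {final:.2f} mV, ratio: {initial / final:.1f}")
```

The ratio between the two results does not depend on the assumed coefficient: scaling both dimensions by five increases the area by 25 and reduces $\sigma\left(\Delta V_{T H}\right)$ by a factor of five.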

### 6.2 MOS Transistor Optimum Matching

Pelgrom's model describes the $\Delta V_{T H}$ mismatch with a variance inversely proportional to the effective transistor channel area ( $W L$ ). Accordingly, mismatch can be reduced by increasing the effective channel area. The effective channel dimensions are defined as

$$
\begin{gather*}
W_{e f f}=W_{\text {drawn }}-2 W_{D}  \tag{6.5}\\
L_{e f f}=L_{\text {drawn }}-2 L_{D} \tag{6.6}
\end{gather*}
$$

where $W_{\text {drawn }}$ and $L_{\text {drawn }}$ are the layout dimensions, $L_{D}$ is the side diffusion of source and drain, and $W_{D}$ is the field oxide encroachment upon the channel. A short channel (large $W_{\text {drawn }} / L_{\text {drawn }}$ ) and a narrow channel (small $W_{\text {drawn }} / L_{\text {drawn }}$ ) for devices with equal drawn areas are shown in Figure 6.1. Since a narrow channel has larger effective area than a short channel, devices with smaller $W / L$ provide better matching. Optimum matching is achieved when [122]

$$
\begin{equation*}
\frac{W_{\text {drawn }}}{L_{\text {drawn }}}=\frac{W_{D}}{L_{D}} \tag{6.7}
\end{equation*}
$$


Figure 6-1: Equal drawn area devices (a) short channel (b) narrow channel
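The optimum in (6.7) can be checked numerically. The sketch below is illustrative only: the drawn area and the values of $W_{D}$ and $L_{D}$ are hypothetical, and the helper names are not taken from this work. It sweeps the drawn aspect ratio at a fixed drawn area and reports the ratio that maximizes the effective area obtained from (6.5) and (6.6).

```python
import math


def effective_area(w_drawn, l_drawn, w_d, l_d):
    """Effective channel area from (6.5) and (6.6); all lengths in um."""
    return (w_drawn - 2.0 * w_d) * (l_drawn - 2.0 * l_d)


def best_aspect_ratio(drawn_area, w_d, l_d, ratios):
    """Return the drawn W/L in `ratios` that maximizes the effective area."""
    def area_for(ratio):
        w_drawn = math.sqrt(drawn_area * ratio)  # W = sqrt(area * W/L)
        l_drawn = math.sqrt(drawn_area / ratio)  # L = sqrt(area / (W/L))
        return effective_area(w_drawn, l_drawn, w_d, l_d)
    return max(ratios, key=area_for)


# Hypothetical values, chosen only for illustration.
DRAWN_AREA = 4.0  # um^2
W_D = 0.05        # um, field oxide encroachment
L_D = 0.10        # um, source/drain side diffusion

ratios = [0.05 * k for k in range(1, 201)]  # drawn W/L from 0.05 to 10
best = best_aspect_ratio(DRAWN_AREA, W_D, L_D, ratios)
print(f"best W/L on the grid: {best:.2f}, W_D/L_D: {W_D / L_D:.2f}")
```

With these assumed values the grid maximum falls at the ratio $W_{D} / L_{D}$, as (6.7) predicts; other values of $W_{D}$ and $L_{D}$ shift the optimum accordingly.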

Moreover, considering the $\Delta V_{T H}$ variation, drain current in the saturation region can be expressed as

$$
\begin{equation*}
I_{D}=\frac{\mu C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T H}-\Delta V_{T H}\right)^{2} \tag{6.8}
\end{equation*}
$$

Expanding the square term gives

$$
\begin{equation*}
I_{D}=\frac{\mu C_{o x}}{2} \frac{W}{L}\left(V_{G S}-V_{T H}\right)^{2}-\mu C_{o x} \frac{W}{L}\left(V_{G S}-V_{T H}\right) \Delta V_{T H}+\frac{\mu C_{o x}}{2} \frac{W}{L} \Delta V_{T H}{ }^{2} \tag{6.9}
\end{equation*}
$$

The first term is the ideal drain current. The last term is negligible because $\Delta V_{T H}{ }^{2}$ is small. Hence, the second term is the dominant mismatch term, and it can be reduced by minimizing the $W / L$ aspect ratio. In summary, sensitivity to $\Delta V_{T H}$ mismatch is minimized by minimizing $W / L$ and maximizing $W L$ [59]. These conditions can be met if the channel length is maximized. Tradeoffs that impose an upper limit on the channel length are discussed in the next section.
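As a consistency check, dividing the second term of (6.9) by the first term gives the first-order relative drain current error of a single device. This is only a sketch that neglects the $\Delta \beta$ contribution, in line with the simplification leading to (6.4):

$$
\begin{equation*}
\frac{\Delta I_{D}}{I_{D}} \approx-\frac{2 \Delta V_{T H}}{V_{G S}-V_{T H}}
\end{equation*}
$$

Taking the variance of this ratio reproduces the factor of 4 in the first term of (6.3).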

### 6.3 Impact of Mismatch on the Performance Tradeoffs

In the previous section, it has been discussed that mismatch can be reduced by increasing the channel area. However, tradeoffs in the design of the analogue circuits impose an upper limit on the device area. In view of the power budget and system specifications, circuit designers must investigate the optimal design.

In the presence of mismatch, circuits that were designed in chapter 5 (the multiplier and the op-amp of the SC integrator) suffer from dc offset at their output. The output dc offset can be defined as the input-referred offset voltage that makes the output voltage zero. Hence, accuracy (ACC) can be measured by [121]

$$
\begin{equation*}
A C C=\frac{V_{\text {in } R M S}}{3 \sigma\left(V_{O S}\right)} \tag{6.10}
\end{equation*}
$$

where $V_{\text {in } R M S}$ is the RMS of the input signal, and $V_{O S}$ is the input-referred offset voltage of the circuit (multiplier or op-amp). Since $V_{O S}$ is strongly dependent on the contribution of the input differential pair [121],

$$
\begin{equation*}
A C C \approx \frac{V_{\text {in } R M S} \sqrt{W L}}{3 A_{V T H}} \tag{6.11}
\end{equation*}
$$

where $W$ and $L$ are the width and length of the input devices, respectively. Hence, accuracy can be improved by increasing the channel area. However, increasing the device area increases the input capacitance [121]

$$
\begin{equation*}
C_{i n}=\frac{C_{g s}}{2}=\frac{1}{2} \cdot \frac{2 C_{o x} W L}{3} \tag{6.12}
\end{equation*}
$$

where $C_{i n}$ is the input capacitance of the circuit (multiplier or op-amp), and $C_{g s}$ is the gate-source capacitance of the input device.

Combining (6.11) and (6.12) yields

$$
\begin{equation*}
A C C^{2} \approx \frac{C_{i n} V_{\text {in } R M S}{ }^{2}}{3 C_{o x} A_{V T H}{ }^{2}} \tag{6.13}
\end{equation*}
$$

The energy stored on the input capacitor is calculated by [55]

$$
\begin{equation*}
E=\frac{1}{2} C_{i n} V_{\text {in } R M S}{ }^{2} \tag{6.14}
\end{equation*}
$$

Hence, the power consumption of the circuit (multiplier or op-amp) is

$$
\begin{equation*}
P=\frac{E}{\tau}=\frac{f C_{i n} V_{\text {in RMS }}^{2}}{2} \tag{6.15}
\end{equation*}
$$

where $\tau$ and $f$ are the time constant and the operating frequency of the circuit, respectively. Combining (6.13) and (6.15) yields

$$
\begin{equation*}
P \approx \frac{3}{2} f C_{o x} A_{V T H}^{2} A C C^{2} \tag{6.16}
\end{equation*}
$$

Replacing the operating frequency with the circuit bandwidth gives

$$
\begin{equation*}
\frac{B W A C C^{2}}{P} \approx \frac{2}{3 C_{o x} A_{V T H}^{2}} \tag{6.17}
\end{equation*}
$$

which is the bandwidth-accuracy-power trade-off of the circuit (multiplier or op-amp). This trade-off is determined only by the technology parameters $C_{o x} A_{V T H}{ }^{2}$; the circuit designer has no influence on the overall trade-off. Increasing the device area increases both the accuracy and the input capacitance (equations (6.11) and (6.12)). For constant power, as the input capacitance increases, the operating frequency decreases (equation (6.15)). Thus, increasing the accuracy reduces the bandwidth.

Multiplications are performed at 20 MHz. Based on the simulation results in the previous chapter, the power consumption of the real multiplier is $120 \mu \mathrm{~W}$. Substituting these values into (6.17), the accuracy of the multiplier is

$$
\begin{equation*}
A C C_{M}^{2} \approx \frac{4 \times 10^{-12}}{C_{o x} A_{V T H}^{2}} \tag{6.18}
\end{equation*}
$$

On the other hand, the single-ended integrator operates at 80 MHz and consumes $198 \mu \mathrm{~W}$ power. Thus, accuracy of the op-amp is

$$
\begin{equation*}
A C C_{O p}^{2} \approx \frac{1.65 \times 10^{-12}}{C_{o x} A_{V T H}^{2}} \tag{6.19}
\end{equation*}
$$

Since the numerator of (6.19) is smaller than that of (6.18), the technology parameters $C_{o x} A_{V T H}{ }^{2}$ have more impact on $A C C_{O p}{ }^{2}$ than on $A C C_{M}{ }^{2}$.
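The numerical constants in (6.18) and (6.19) follow directly from (6.17) with $B W$ set to the operating frequency. A minimal sketch of that arithmetic is given below; the function name is illustrative, and $C_{o x} A_{V T H}{ }^{2}$ is left symbolic because its value depends on the technology.

```python
def acc_squared_numerator(power_w, bandwidth_hz):
    """Return 2P / (3 BW), so that ACC^2 ~= result / (C_ox * A_VTH^2), per (6.17)."""
    return 2.0 * power_w / (3.0 * bandwidth_hz)


# Values quoted in the text:
# real multiplier: 120 uW at 20 MHz; single-ended integrator: 198 uW at 80 MHz.
multiplier = acc_squared_numerator(120e-6, 20e6)
op_amp = acc_squared_numerator(198e-6, 80e6)

print(f"multiplier numerator: {multiplier:.3e}")
print(f"op-amp numerator:     {op_amp:.3e}")
```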

### 6.4 Impact of Technology Scaling on the Mismatch

Technology scaling reduces the gate oxide thickness $\left(t_{o x}\right)$ and increases the substrate doping level. Reducing $t_{o x}$ lowers $A_{V T H}$, whereas increasing the substrate doping level raises it $[117,123]$.

The reduction in the power supply voltage by technology scaling leads to the reduction in the power consumption and signal swing. Reduction in the signal swing leads to a quadratic reduction in the dc accuracy, while power consumption is reduced linearly.

Linear reduction of $A_{V T H}$ with feature size implies that deeper submicron technologies have better matching for devices occupying a constant area. However, quadratic reduction in the dc accuracy is more significant than the linear reduction of $A_{V T H}$ [121].

### 6.5 Mismatch Analysis Results

The sensitivity of the real-time recursive DFT processor to device mismatch is analysed using Pelgrom's model described in section 6.1. Accordingly, the random variation $\Delta V_{T H}$ has a normal distribution with zero mean and the variance given by equation (6.4). A pair of matched devices $M_{1}$ and $M_{2}$, each with an independent random variation $\delta V_{T H i}$, has the random difference $\Delta V_{T H}=\delta V_{T H 1}-\delta V_{T H 2}$. Since the variances of independent variations add, the variance of each device is

$$
\begin{equation*}
\sigma^{2}\left(\delta V_{T H i}\right)=\frac{A_{V T H}^{2}}{2 W L} \tag{6.20}
\end{equation*}
$$

$V_{T H}$ mismatch is modelled by an error voltage source in series with the gate of the ideal device (Figure 6.2). The $A_{V T H}$ proportionality constant is extracted from the experimental results of a study on the TSMC $0.18 \mu \mathrm{~m}$ mixed signal CMOS technology with 1.8 V supply voltage [124]. Based on this study, a device with cross-coupled layout configuration has the minimum $A_{V T H}$. Thus, assuming that the layout configuration is cross-coupled, $A_{V T H}=1.7 \mathrm{mV} \mu \mathrm{m}$ for NMOS and $A_{V T H}=1.74 \mathrm{mV} \mu \mathrm{m}$ for PMOS are used in the device mismatch analysis.


Figure 6-2: Modeling $V_{T H}$ variations using a DC voltage source in series with the MOS gate terminal
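A minimal sketch of how the per-device threshold variation of (6.20) might be drawn in a Monte Carlo run is given below. It is not the MATLAB code of Appendix B: the device dimensions, the sample count, and the function names are assumptions made for illustration, and only the NMOS $A_{V T H}$ value is taken from the text. The sketch checks that the difference of two independent per-device samples has the pair standard deviation implied by (6.4).

```python
import math
import random
import statistics

A_VTH_NMOS = 1.7e-3  # V*um, NMOS value quoted in the text


def device_sigma(a_vth, w_um, l_um):
    """Per-device standard deviation of delta V_TH from (6.20), in volts."""
    return a_vth / math.sqrt(2.0 * w_um * l_um)


def sample_pair_mismatch(a_vth, w_um, l_um, n_samples, seed=0):
    """Draw n_samples of Delta V_TH = delta V_TH1 - delta V_TH2 for a matched pair."""
    rng = random.Random(seed)
    sigma = device_sigma(a_vth, w_um, l_um)
    return [rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma) for _ in range(n_samples)]


# Hypothetical device size, chosen only for illustration.
W_UM, L_UM = 10.0, 1.0

samples = sample_pair_mismatch(A_VTH_NMOS, W_UM, L_UM, n_samples=20000)
measured = statistics.pstdev(samples)
expected = A_VTH_NMOS / math.sqrt(W_UM * L_UM)  # square root of (6.4)
print(f"measured sigma: {measured * 1e3:.3f} mV, expected: {expected * 1e3:.3f} mV")
```

In a circuit simulation, each sampled $\delta V_{T H i}$ would set the value of the series error voltage source of the corresponding device.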

The Monte Carlo analysis is performed for the real-time recursive DFT processors of length 8 and 16. For the purpose of this analysis, OFDM signals with BPSK and QPSK modulations are generated in MATLAB/Simulink. The MATLAB code and the SPICE netlist are available in Appendix B. The results of the Monte Carlo analysis for the BPSK modulated signal are shown in Figure 6.3.


Figure 6-3: Mismatch analysis results of the real-time recursive DFT processor of length 8

Table 6-1 gives the statistics of the Monte Carlo analysis. The dynamic ranges are obtained by measuring the width of the SNDR curves at the minimum required SNDR. According to Table 4-1, the minimum receiver SNDR requirements for the OFDM signals with BPSK and QPSK modulations are 3 dB and 5 dB, respectively. As explained in section 4.2, the minimum required dynamic ranges for the BPSK and QPSK modulated signals are 34 dB and 36 dB, respectively.

Table 6-1: Summary of the Monte Carlo analysis for the recursive DFT processors of length 8

| Statistic | Dynamic range, BPSK (dB) | Dynamic range, QPSK (dB) | Peak SNDR, BPSK (dB) | Peak SNDR, QPSK (dB) |
| :---: | :---: | :---: | :---: | :---: |
| Mean | 36.3 | 33.2 | 22.5 | 22.3 |
| Standard deviation | 1.6 | 1.6 | 1.3 | 1.3 |
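The measurement of dynamic range as the width of an SNDR curve at a given threshold can be sketched as follows. The curve below is synthetic and purely illustrative (it is not simulation data from this work), and the function name is hypothetical; crossings are located by linear interpolation between neighbouring points.

```python
def dynamic_range(levels_db, sndr_db, min_sndr_db):
    """Width (in dB) of the SNDR curve at min_sndr_db, using linear interpolation."""
    crossings = []
    for i in range(len(levels_db) - 1):
        x0, x1 = levels_db[i], levels_db[i + 1]
        y0 = sndr_db[i] - min_sndr_db
        y1 = sndr_db[i + 1] - min_sndr_db
        if y0 == 0.0:
            crossings.append(x0)  # sample lies exactly on the threshold
        elif y0 * y1 < 0.0:
            # Zero of the straight line through (x0, y0) and (x1, y1).
            crossings.append(x0 - y0 * (x1 - x0) / (y1 - y0))
    if sndr_db[-1] == min_sndr_db:
        crossings.append(levels_db[-1])
    if len(crossings) < 2:
        raise ValueError("SNDR curve does not cross the threshold twice")
    return crossings[-1] - crossings[0]


# Synthetic SNDR curve (input level in dBV, SNDR in dB), for illustration only.
levels = [-60, -50, -40, -30, -20, -10, 0]
sndr = [-10, 0, 10, 20, 10, -10, -30]

dr_at_3db = dynamic_range(levels, sndr, 3.0)
dr_at_5db = dynamic_range(levels, sndr, 5.0)
print(f"width at 3 dB: {dr_at_3db:.1f} dB, width at 5 dB: {dr_at_5db:.1f} dB")
```

On any single-peaked curve of this kind, raising the threshold narrows the measured width, which is why a higher minimum SNDR requirement yields a smaller dynamic range.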

For the BPSK modulated signal, the dynamic range histograms of the 8-point DFT processor and the 16-point DFT processor are shown in Figure 6.4 and Figure 6.5, respectively. The average dynamic range of the 16-point DFT processor is 33.4 dB, compared with 36.3 dB for the 8-point DFT processor. Hence, doubling the length of the DFT processor reduces the average dynamic range by about 3 dB. For the QPSK modulated signal, the dynamic range histogram of the 8-point DFT processor is depicted in Figure 6.6. Table 6-2 provides the results of the yield prediction.


Figure 6-4: Dynamic range histogram of the 8-point DFT processor for BPSK modulated signal


Figure 6-5: Dynamic range histogram of the 16-point DFT processor for BPSK modulated signal


Figure 6-6: Dynamic range histogram of the 8-point DFT processor for QPSK modulated signal

Table 6-2: Summary of the yield prediction for the recursive DFT processors of length 8 and 16

| DFT Length | BPSK | QPSK |
| :---: | :---: | :---: |
| 8 | $97.5 \%$ | $8.9 \%$ |
| 16 | $43.4 \%$ | - |

Table 6-3 compares the performance of the proposed architecture with an analogue FFT processor. For the purpose of this comparison, an OFDM signal with BPSK modulation is used. Also, the dynamic range is measured at 7 dB, which is the minimum required SNDR for the DT FFT. The dynamic range and peak SNDR of the proposed architecture are 20.7 dB and 13.5 dB lower than those of the DT FFT processor, respectively.

Table 6-3: Performance comparison of the analogue Fourier Transform processors

| Performance Metric | Proposed DFT | DT FFT [57] |
| :---: | :---: | :---: |
| CMOS Technology | 180 nm | 130 nm |
| Supply Voltage | 1.8 V | 1.2 V |
| Input Frequency | 20 MHz | 1 GHz |
| Operating Frequency | 80 MHz | 100 MHz |
| Length | 8 | 8 |
| Peak SNDR | 22.5 dB | 36 dB |
| Dynamic Range | 28.3 dB | 49 dB |
| Power Consumption | 10 mW | 25 mW |

As explained in chapter 5, sampling frequency of the SC integrator must be at least twice the signal frequency. Hence, operating frequency of the recursive DFT processor is greater than its input frequency. On the other hand, owing to the serial to parallel conversion in the DT FFT processor, operating frequency of the DT FFT processor is less than its input frequency. Hence, parallel processing relaxes the bandwidth requirement of multipliers in the DT FFT processor.

Normalizing the power consumption of the DT FFT processor to the 180 nm technology, 1.8 V supply voltage, and 80 MHz operating frequency gives

$$
\begin{equation*}
\text { Normalized Power }=\frac{25 \mathrm{~mW}}{\left(\frac{130 \mathrm{~nm}}{180 \mathrm{~nm}}\right)\left(\frac{100 \mathrm{MHz}}{80 \mathrm{MHz}}\right)\left(\frac{1.2 \mathrm{~V}}{1.8 \mathrm{~V}}\right)^{2}}=62.3 \mathrm{~mW} \tag{6.21}
\end{equation*}
$$

Hence, the power consumption of the recursive DFT processor is about $1 / 6$ of the power consumption of the DT FFT processor. As explained in section 6.3, linear reduction in the power consumption leads to quadratic reduction in the dc accuracy. Thus, the lower peak SNDR and lower dynamic range of the recursive DFT processor are due to its lower power consumption. Other factors that contribute to the performance degradation of the recursive DFT processor are investigated in the next section.
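The normalization in (6.21) can be reproduced with a few lines of arithmetic. The sketch below assumes the same scaling as (6.21), namely linear in feature size and operating frequency and quadratic in supply voltage; the function and parameter names are illustrative.

```python
def normalize_power(power_mw, node_nm, freq_mhz, vdd_v,
                    ref_node_nm, ref_freq_mhz, ref_vdd_v):
    """Scale a reported power to a reference technology, frequency and supply, as in (6.21)."""
    scale = ((node_nm / ref_node_nm)
             * (freq_mhz / ref_freq_mhz)
             * (vdd_v / ref_vdd_v) ** 2)
    return power_mw / scale


# DT FFT figures from Table 6-3, normalized to 180 nm, 80 MHz and 1.8 V.
dt_fft_normalized_mw = normalize_power(25.0, 130.0, 100.0, 1.2, 180.0, 80.0, 1.8)

# Power of the proposed DFT processor from Table 6-3.
proposed_mw = 10.0
ratio = proposed_mw / dt_fft_normalized_mw

print(f"normalized DT FFT power: {dt_fft_normalized_mw:.1f} mW")
print(f"proposed / normalized:   {ratio:.3f}")
```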

### 6.6 Root Cause Analysis

According to equation (2.12), the number of analogue multipliers required to implement a Radix-2 FFT of length 8 is 104. On the other hand, the real-time recursive DFT performs 256 multiplications to compute a DFT of length 8. Results of the system-level performance analysis in chapter 4 showed that the proposed DFT processor has better performance than the FFT processor. Hence, the performance degradation of the proposed DFT processor in the previous section is not due to the number of multiplications. This conclusion leads to the realization that the values of the mismatch parameters in the system-level performance analysis were underestimated.

To find which non-ideality contributes most to the performance degradation, the effect of each non-ideality is analysed individually. The effect of multiplier saturation is revealed by the SNDR curve of a recursive DFT processor with ideal integrators (integrations are performed by MATLAB). On the other hand, the effect of integrator (Op-amp) saturation is shown by the SNDR curve of a recursive DFT processor with Switched-Capacitor integrators. These curves are shown in Figure 6.7.


Figure 6-7: The SNDR curves of 8-point recursive DFT processors with ideal devices

The dynamic range is obtained by measuring the width of the SNDR curves at the minimum required SNDR. For the comparison between the recursive DFT and the DT FFT processors (Table 6-3), the dynamic range was measured at 7 dB. A comparison between the blue and the red curves at $\mathrm{SNDR}=7 \mathrm{~dB}$ reveals that saturation of the Op-amps in the SC integrators reduces the dynamic range by 7 dB. The linear range of the multipliers in the recursive DFT processor is greater than the linear range of the multipliers in the DT FFT processor [57]. Hence, at high signal levels (input magnitude $>-25 \mathrm{dBV}$), Op-amp saturation is the reason why the dynamic range is lower than that of the DT FFT.

The effect of multipliers' device mismatches is revealed by a comparison between the SNDR curves of a recursive DFT with ideal integrators in the absence and presence of device mismatches (the blue and the red curves in Figure 6.8). Measuring the widths of the aforementioned curves at $\mathrm{SNDR}=7 \mathrm{~dB}$ shows that multipliers' device mismatches reduce the dynamic range by 20 dB .


Figure 6-8: The SNDR curves of 8-point recursive DFT processors

The effect of SC integrators' device mismatches is revealed by a comparison between the SNDR curves of a recursive DFT with ideal integrators and a recursive DFT with SC integrators in the presence of device mismatches (the red and the yellow curves in Figure 6.8). A comparison between the widths of the aforementioned curves at $\mathrm{SNDR}=$ 7 dB shows that SC integrators' device mismatches reduce the dynamic range by 3 dB .

This analysis leads to the conclusion that the multipliers' device mismatches contribute most to the dynamic range reduction. Increasing the power consumption of the multiplier can improve its accuracy. Other methods of mitigating the effect of device mismatch are discussed in the next section.

### 6.7 Mitigation of the Effect of Device Mismatch

As explained in the previous section, multipliers' device mismatches reduce the dynamic range significantly. Two approaches that can be taken to solve this problem are electronic offset cancellation and error correction techniques. The topology of an offset cancellation technique, which can be used for transconductance multipliers, is shown in Figure 6.9 [80]. Each $G_{m}$ stage is a differential pair and the R stage is a transimpedance amplifier.


Figure 6-9: Offset cancellation by an auxiliary transconductance in a negative feedback loop [80]

Suppose that initially only $\mathrm{S}_{1}$ and $\mathrm{S}_{2}$ are on, so that $V_{\text {out }}=G_{m 1} V_{O S 1} R$. Then, with $\mathrm{S}_{3}$ and $\mathrm{S}_{4}$ on, a negative feedback loop is formed by R and $G_{m 2}$. Thus, $V_{\text {out }}=G_{m 1} V_{O S 1} R$ is stored across $C_{1}$ and $C_{2}$. Afterwards, $G_{m 2}$ converts the voltage across the capacitors to $I_{out2}=G_{m 2} G_{m 1} V_{O S 1} R$. When $V_{\text {in }}$ is connected, $G_{m 2}$ adds an offset correction current at nodes X and Y [80]. Taking the offset voltage of $G_{m 2}$ into account, the stored voltage on $C_{1}$ and $C_{2}$ is [80]

$$
\begin{equation*}
V_{\text {out }}=\left[G_{m 1} V_{O S 1}-G_{m 2}\left(V_{\text {out }}-V_{O S 2}\right)\right] R \tag{6.22}
\end{equation*}
$$

Thereby,

$$
\begin{equation*}
V_{\text {out }}=\frac{G_{m 1} R V_{O S 1}+G_{m 2} R V_{O S 2}}{1+G_{m 2} R} \tag{6.23}
\end{equation*}
$$

Hence, the offset voltage referred to the main input is

$$
\begin{equation*}
V_{O S, \text { tot }}=\frac{V_{\text {out }}}{G_{m 1} R}=\frac{V_{O S 1}}{1+G_{m 2} R}+\frac{G_{m 2}}{G_{m 1}} \frac{V_{O S 2}}{1+G_{m 2} R} \approx \frac{V_{O S 1}}{G_{m 2} R}+\frac{V_{O S 2}}{G_{m 1} R} \tag{6.24}
\end{equation*}
$$

If $G_{m 2} R \gg 1$ and $G_{m 1} R \gg 1$, then $V_{O S, \text { tot }}$ is very small. However, $G_{m 1} R$ is the gain of the multiplier. The linear range of the multiplier decreases as its gain increases. Therefore, the aforementioned offset cancellation technique imposes a trade-off between dynamic range and accuracy. Moreover, due to the large area overhead of the offset cancellation techniques, they cannot be widely applied.
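To illustrate the size of the residual offset, the sketch below evaluates the stored output voltage of (6.23) and the resulting input-referred offset in its exact and approximate forms for one set of component values. All numerical values are hypothetical and are not taken from the designed multiplier; the function names are illustrative.

```python
def stored_output_voltage(gm1, gm2, r, vos1, vos2):
    """V_out stored on C1 and C2, from (6.23)."""
    return (gm1 * r * vos1 + gm2 * r * vos2) / (1.0 + gm2 * r)


def residual_offset(gm1, gm2, r, vos1, vos2):
    """Offset referred to the main input: V_out / (Gm1 * R), exact form."""
    return stored_output_voltage(gm1, gm2, r, vos1, vos2) / (gm1 * r)


def residual_offset_approx(gm1, gm2, r, vos1, vos2):
    """Approximate form, valid when Gm1 * R >> 1 and Gm2 * R >> 1."""
    return vos1 / (gm2 * r) + vos2 / (gm1 * r)


# Hypothetical component values, chosen only for illustration.
GM1 = 1e-3    # S
GM2 = 1e-3    # S
R = 100e3     # ohm, so Gm1 * R = Gm2 * R = 100
VOS1 = 5e-3   # V
VOS2 = 5e-3   # V

exact = residual_offset(GM1, GM2, R, VOS1, VOS2)
approx = residual_offset_approx(GM1, GM2, R, VOS1, VOS2)
print(f"exact:  {exact * 1e6:.1f} uV")
print(f"approx: {approx * 1e6:.1f} uV")
```

With these assumed values, both forms give a residual offset roughly 50 times smaller than the uncorrected 5 mV, at the cost of a multiplier gain $G_{m 1} R$ of 100.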

Tradeoffs in the electronic offset cancellation justify the use of error correction techniques. These techniques can be divided into three main categories: error correction codes, equalizers, and signal processing algorithms. One study showed that Turbo Product Code (TPC) effectively mitigates the mismatch loss of a 256-point analogue FFT [87]. It has also been shown that Minimum Mean Square Error (MMSE) and Least Mean Square (LMS) equalizers can mitigate the performance degradation of a DFT processor implemented on an FPAA (Figure 6.10) [93]. Another study proposed an iterative signal processing algorithm to recover the output of a 64-point analogue FFT [125]. Neural Networks can also be applied to assist the detection of the received symbols.


Figure 6-10: Performance comparison of a 4-point analogue DFT implemented on an FPAA [93]

### 6.8 Summary

This chapter discussed that CMOS device matching depends on the bias point. For typical bias points, the threshold voltage is the dominant source of mismatch. Mismatch also depends on the device area and technology. Circuit designers can take these relations into account to optimize matching. Nevertheless, the bandwidth-accuracy-power trade-off of the system is determined only by the technology parameters; hence, the circuit designer has no influence on the overall trade-off. Mismatch analysis results of the recursive DFT processor indicate that increasing the transform length degrades the performance. Also, the average dynamic range of the recursive DFT processor cannot meet the minimum requirement for the QPSK signal. The root cause analysis revealed that the multipliers' device mismatches contribute most to the dynamic range reduction. Increasing the power consumption of the multiplier can improve its accuracy. Moreover, error correction techniques such as TPC can mitigate the mismatch loss.

## Chapter 7

## Conclusion and Future Work

Since analogue DFT processors consume significantly less power than digital DFT processors, they have been nominated as the next generation of the DFT processors. This work was motivated by the goal of evolving the next generation of the DFT processors. In view of that, a power-scalable variable-length analogue DFT processor has been proposed. The proposed DFT processor has application in multi-standard OFDM transceivers. This chapter presents the contributions to knowledge, the concluding remarks, and the remaining work for the future.

### 7.1 Contributions to Knowledge

### 7.1.1 Methodology

Since the classical DFT architectures (i.e. FIR DFT and FFT) were originally designed for discrete-time signal processing, they do not take advantage of analogue signals. Specifically, these architectures require an analogue decimation filter ahead of them. Moreover, the analogue implementations of the classical DFT architectures are not power-scalable. Hence, the real-time recursive DFT architecture has been proposed. In this architecture, the DFT coefficients are formed into piecewise continuous signals. Thereby, the continuous baseband signal is piecewise weighted by the DFT coefficients.

Since the proposed architecture performs multiplications serially, it does not require additional multipliers to compute DFT of a longer sequence. Moreover, the power consumption is scalable with the transform length. Hence, the real-time recursive DFT architecture is suitable for the power-scalable variable-length DFT processor.

In the classical DFT architectures, each multiplier provides the real multiplication of a signal sample and a DFT coefficient. On the other hand, in the real-time recursive DFT architecture, each multiplier provides the element-wise multiplication of a one-dimensional array of the DFT coefficients and the continuous baseband signal. Hence, compared with the classical architectures, the proposed architecture has the lowest number of multipliers. Moreover, since multiplications are performed without sampling, the analogue decimation filter is eliminated. Besides, the proposed architecture avoids propagation of the computational error to all DFTs by computing each DFT independently.

### 7.1.2 Limitations and Considerations

Reducing the dynamic range requirement of the ADC by moving the DFT processor from the digital back-end to the analogue front-end is at the cost of increasing the dynamic range requirement of the DFT processor.

As the data rate increases, the minimum SNDR and the minimum dynamic range requirements increase. On the other hand, as the required SNDR increases, the width of the SNDR curve decreases. Therefore, as the data rate increases, the yield of the analogue DFT processor decreases. Results of the circuit-level performance analysis indicate that the 8-point recursive DFT processor has a yield of $97.5 \%$ for the BPSK modulated signal. For the QPSK modulated signal, however, the yield of the 8-point recursive DFT processor is $8.9 \%$. Hence, the dynamic range of the recursive DFT processor must be increased.

In the absence of mismatch, increasing the transform length does not affect the performance of the recursive DFT processor. However, in the presence of mismatch, doubling the transform length reduces the average dynamic range by 3 dB. The 16-point recursive DFT processor has a yield of $43.4 \%$ for the BPSK modulated signal.

As the DFT length increases, more samples should be stored on the integrating capacitor $\left(C_{I}\right)$. Thus, the required $C_{I}$ for the WiMAX standard becomes prohibitively large. Hence, the DFT sum was broken into partial sums (equation (5.64)). The results of partial sums can be added together in the Digital Signal Processor (DSP). The maximum length of the partial sum (8) was determined by finding the optimum value for $C_{I}(1 \mathrm{pF})$.

Sampling frequency of the SC integrator must be at least twice the signal frequency. Also, unity gain bandwidth of the op-amp in the SC integrator must be at least five times greater than the sampling frequency. Hence, unity gain bandwidth of the op-amp must be at least ten times greater than the signal frequency. In contrast, serial-to-parallel conversion in analogue FFT processors relaxes the bandwidth requirement of multipliers. While the analogue FFT processor was proposed for Ultra-Wideband OFDM wireless transceivers [57], the real-time recursive DFT processor is proposed for WiFi and WiMAX standards. The maximum channel bandwidth of WiFi and WiMAX standards is 20 MHz .

Trade-offs in the design of the analogue circuits impose limitations on the performance of analogue DFT processors. The bandwidth-accuracy-power trade-off is determined only by the technology parameters, and the circuit designer has no influence on the overall trade-off. This thesis provides a proof-of-concept for the power-scalable variable-length analogue DFT processor. The real-time recursive DFT processor was designed in 180 nm CMOS technology. The design process and the results of the circuit-level performance analysis provide guidelines for future designers to select a technology that satisfies the performance requirements for another application.

### 7.2 Future Work

### 7.2.1 Design Enhancements

Previous analogue FFT processors were designed in 130 nm and 180 nm CMOS technologies [57-59]. In order to compare the performance of the proposed DFT processor with the analogue FFT processor, the real-time recursive DFT processor was designed in 180 nm CMOS technology. Equation (6.17) can be used to select a technology that provides higher accuracy while bandwidth and power meet the design specifications.

Even though $C_{I}$ was selected carefully to prevent the reduction of dynamic range (section 4.4.3), the root cause analysis showed that integrator saturation reduces the dynamic range by 7 dB. This problem can be resolved by reducing the input CM level of the Op-amp $\left(V_{\text {in,CM }}\right)$. Since the input of the integrator was connected to the output of the multiplier, $V_{\text {in,CM }}$ was set equal to the output CM level of the multiplier. By adding a source follower between the multiplier output and the integrator input, $V_{\text {in,CM }}$ can be shifted to a lower level.

The performance comparison between the recursive DFT processor and the DT FFT processor [57] showed that the dynamic range of the DT FFT is 20.7 dB higher than that of the recursive DFT. The root cause analysis showed that the multipliers' device mismatches contributed most to the dynamic range reduction. This suggests that the four-quadrant multiplier used in [57] is less sensitive to device mismatch than the designed Gilbert cell. Therefore, replacing the Gilbert cell multipliers by the multiplier in [57] can increase the dynamic range.

### 7.2.2 Further Analysis

Since the Process Design Kit (PDK) was not available, post-layout simulations were not performed. Nevertheless, since sampling frequency of the Switched-Capacitor integrator was below 100 MHz , results of the pre-layout simulations were reliable. For frequencies above 100 MHz , however, it is essential to extract the parasitic devices and perform the post-layout simulations.

In order to investigate the effectiveness of different mismatch mitigation techniques, the trade-off in the offset cancellation techniques and the effectiveness of different error correction techniques must be analysed. A hybrid of electronic offset cancellation and error correction might resolve the problem.

## LIST OF REFERENCES

1. Smaini, L., RF Analog Impairments Modeling for Communication Systems Simulation: Application to OFDM-based Transceivers. 2012: Wiley.
2. Romeu, J. and A. Elias. Early proposals of wireless telegraphy in Spain: Francisco Salva Campillo (1751-1828). in Antennas and Propagation Society International Symposium, 2001. IEEE. 2001.
3. Michaelis, A.R., From Semaphore to Satellite. 1965, Geneva International Telecommunication Union.
4. Ronalds, F., Descriptions of an Electrical Telegraph: And of Some Other Electrical Apparatus. 1823: R. Hunter.
5. Makhrovskiy, O.V. 180 Years of telecommunication in Russia. in HISTory of ELectro-technology CONference (HISTELCON), 2012 Third IEEE. 2012.
6. Haykin, S.S., Communication systems. 2001: Wiley.
7. Klooster, J.W., Icons of Invention: The Makers of the Modern World from Gutenberg to Gates. 2009: Greenwood Press.
8. Bourseul, C., Transmission électrique de la parole. L'Illustration, 1854.
9. Pizer, R.A., The Tangled Web of Patent \#174465. 2009: AuthorHouse.
10. Evenson, A.E., The Telephone Patent Conspiracy of 1876: The Elisha Gray-Alexander Bell Controversy and Its Many Players. 2000: McFarland.
11. Coe, L., The Telephone and Its Several Inventors: A History. 2006: McFarland \& Company.
12. Beauchamp, C., Invented by Law: Alexander Graham Bell and the Patent That Changed America. 2015: Harvard University Press.
13. Braun, K.F., Electrical oscillations and wireless telegraphy. 1909, [Nobel Lecture].
14. Hong, S., Wireless: From Marconi's Black-box to the Audion. 2001: MIT Press.
15. Sarkar, T.K., et al., History of Wireless. 2006: Wiley.
16. Mowbray, J.H., Sinking of the Titanic: Eyewitness Accounts. 2012: Dover Publications.
17. Couch, L.W., Digital \& Analog Communication Systems. 2012: Pearson Education.
18. Harley, R.A., Electric signaling system. 1942, Google Patents.
19. Kester, W.A. and Analog Devices, Inc., Data Conversion Handbook. 2005: Elsevier.
20. Shannon, C.E., A symbolic analysis of relay and switching circuits. Transactions of the American Institute of Electrical Engineers, 1938. 57(12): p. 713-723.
21. Shannon, C.E., A mathematical theory of communication. The Bell System Technical Journal, 1948. 27(3): p. 379-423.
22. "The Nobel Prize in Physics 1956" [Online]: Nobelprize.org. [Accessed 15 Sep 2016]
23. Kilby, J.S., Miniaturized electronic circuits. 1964, Google Patents.
24. Noyce, R.N., Semiconductor device-and-lead structure. 1961, Google Patents.
25. Cooley, J.W. and J.W. Tukey, An algorithm for the machine calculation of complex Fourier series. Mathematics of computation, 1965. 19(90): p. 297-301.
26. Chang, R.W., Synthesis of band-limited orthogonal signals for multichannel data transmission. The Bell System Technical Journal, 1966. 45(10): p. 1775-1796.
27. Chang, R.W., Orthogonal frequency multiplex data transmission system. 1970, Google Patents.
28. Weinstein, S. and P. Ebert, Data Transmission by Frequency-Division Multiplexing Using the Discrete Fourier Transform. IEEE Transactions on Communication Technology, 1971. 19(5): p. 628-634.
29. Cooper, M., et al., Radio telephone system. 1975, Google Patents.
30. Luo, F.L., Digital Front-End in Wireless Communications and Broadcasting: Circuits and Signal Processing. 2011: Cambridge University Press.
31. Gleason, A.W., Mobile Technologies for Every Library. 2015: Rowman \& Littlefield Publishers.
32. [Online]:http://www.fpa.es/multimedia-en/photo-galleries/press-conference-with-martin-cooper.html.[Accessed 15 Sep 2016]
33. IEEE Standard for Telecommunications and Information Exchange Between Systems - LAN/MAN Specific Requirements - Part 11: Wireless Medium Access Control (MAC) and physical layer (PHY) specifications: High Speed Physical Layer in the 5 GHz band. IEEE Std 802.11a-1999, 1999: p. 1-102.
34. IEEE Standard for Local and Metropolitan Area Networks Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and Corrigendum 1. IEEE Std 802.16e-2005 and IEEE Std 802.16-2004/Cor 1-2005 (Amendment and Corrigendum to IEEE Std 802.16-2004), 2006: p. 0_1-822.
35. Bagheri, R., et al., An 800-MHz-6-GHz Software-Defined Wireless Receiver in 90-nm CMOS. IEEE Journal of Solid-State Circuits, 2006. 41(12): p. 2860-2876.
36. Ru, Z., et al., Digitally Enhanced Software-Defined Radio Receiver Robust to Out-of-Band Interference. IEEE Journal of Solid-State Circuits, 2009. 44(12): p. 3359-3375.
37. Mitola, J., The software radio architecture. IEEE Communications Magazine, 1995. 33(5): p. 26-38.
38. Abidi, A.A., The Path to the Software-Defined Radio Receiver. IEEE Journal of Solid-State Circuits, 2007. 42(5): p. 954-966.
39. Walden, R.H., Analog-to-digital converter survey and analysis. IEEE Journal on Selected Areas in Communications, 1999. 17(4): p. 539-550.
40. Tuttlebee, W.H.W., Software Defined Radio: Enabling Technologies. 2003: Wiley.
41. Lehne, M., An Analog/Mixed Signal FFT Processor for Ultra-Wideband OFDM Wireless Transceivers. 2008, Virginia Polytechnic Institute and State University.
42. Yang, S.C., OFDMA System Analysis and Design. 2010: Artech House.
43. Cho, Y.S., et al., MIMO-OFDM Wireless Communications with MATLAB. 2010: Wiley.
44. Oppenheim, A.V. and R.W. Schafer, Discrete-Time Signal Processing. 2011: Pearson Education.
45. Prasad, R., OFDM for Wireless Communications Systems. 2004: Artech House.
46. Peled, A. and A. Ruiz. Frequency domain data transmission using reduced computational complexity algorithms. in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '80. 1980.
47. Prasad, R. and F.J. Velez, WiMAX Networks: Techno-Economic Vision and Challenges. 2010: Springer Netherlands.
48. Korowajczuk, L., LTE, WiMAX and WLAN Network Design, Optimization and Performance Analysis. 2011: Wiley.
49. IEEE Standard for Information Technology- Telecommunications and Information Exchange Between Systems- Local and Metropolitan Area Networks- Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE Std 802.11g-2003 (Amendment to IEEE Std 802.11, 1999 Edn. (Reaff 2003) as amended by IEEE Stds 802.11a-1999, 802.11b-1999, 802.11b-1999/Cor 1-2001, and 802.11d-2001), 2003: p. i-67.
50. Nuaymi, P.L., WiMAX: Technology for Broadband Wireless Access. 2007: John Wiley & Sons.
51. Kuo, J.-C., et al., VLSI design of a variable-length FFT/IFFT processor for OFDM-based communication systems. EURASIP J. Appl. Signal Process., 2003. 2003: p. 1306-1316.
52. Chun-Lung, H., L. Syu-Siang, and S. Muh-Tian. A low power and variable-length FFT processor design for flexible MIMO OFDM systems. in Circuits and Systems, 2009. ISCAS 2009. IEEE International Symposium on. 2009.
53. Lin, Y.T., P.Y. Tsai, and T.D. Chiueh, Low-power variable-length fast Fourier transform processor. Computers and Digital Techniques, IEE Proceedings -, 2005. 152(4): p. 499-506.
54. Song-Nien, T., L. Chi-Hsiang, and C. Tsin-Yuan, An Area- and Energy-Efficient Multimode FFT Processor for WPAN/WLAN/WMAN Systems. Solid-State Circuits, IEEE Journal of, 2012. 47(6): p. 1419-1435.
55. Guichang, Z., X. Fan, and A.N. Willson, Jr., A power-scalable reconfigurable FFT/IFFT IC based on a multi-processor ring. Solid-State Circuits, IEEE Journal of, 2006. 41(2): p. 483-495.
56. Uyttenhove, K. and M.S.J. Steyaert, Speed-power-accuracy tradeoff in high-speed CMOS ADCs. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 2002. 49(4): p. 280-287.
57. Lehne, M. and S. Raman, A 0.13-$\mu$m 1-GS/s CMOS Discrete-Time FFT Processor for Ultra-Wideband OFDM Wireless Receivers. Microwave Theory and Techniques, IEEE Transactions on, 2011. 59(6): p. 1639-1650.
58. Sadhu, B., et al. A 5GS/s 12.2pJ/conv. analog charge-domain FFT for a software defined radio receiver front-end in 65 nm CMOS. in Radio Frequency Integrated Circuits Symposium (RFIC), 2012 IEEE. 2012.
59. Sadeghi, N., V.C. Gaudet, and C. Schlegel, Analog DFT Processors for OFDM Receivers: Circuit Mismatch and System Performance Analysis. Circuits and Systems I: Regular Papers, IEEE Transactions on, 2009. 56(9): p. 2123-2131.
60. Mano, M.M. and P. Spasov, Digital Design. 2002: Prentice Hall.
61. Widrow, B. and I. Kollár, Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications. 2008: Cambridge University Press.
62. Sarpeshkar, R., Analog versus digital: extrapolating from electronics to neurobiology. Neural computation, 1998. 10(7): p. 1601-1638.
63. Hosticka, B.J., Performance comparison of analog and digital circuits. Proceedings of the IEEE, 1985. 73(1): p. 25-29.
64. Roberts, M.J., Signals and Systems: Analysis Using Transform Methods and MATLAB. 2004: McGraw-Hill.
65. Reddy, N. and M.N.S. Swamy, Switched-capacitor realization of a discrete Fourier transformer. Circuits and Systems, IEEE Transactions on, 1983. 30(4): p. 254-255.
66. Ogihara, A., S. Yamashita, and S. Yoneda. A pitch synchronous switched capacitor discrete Fourier transform circuit. in Circuits and Systems, 1991., IEEE International Symposium on. 1991.
67. Rao, K.R., D.N. Kim, and J.J. Hwang, Fast Fourier Transform - Algorithms and Applications. 2011: Springer Netherlands.
68. Oppenheim, A.V., A.S. Willsky, and S.H. Nawab, Signals and Systems. 1997: Prentice Hall.
69. Ismail, M. and T. Fiez, Analog VLSI: Signal and Information Processing. 1994: McGraw-Hill.
70. Boyle, K., et al. Design and implementation of an all-analog fast-fourier transform processor. in Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on. 2007.
71. Lehne, M. and S. Raman. A prototype analog/mixed-signal fast fourier transform processor IC for OFDM receivers. in Radio and Wireless Symposium, 2008 IEEE. 2008.
72. Rivet, F., et al., A Disruptive Receiver Architecture Dedicated to Software-Defined Radio. Circuits and Systems II: Express Briefs, IEEE Transactions on, 2008. 55(4): p. 344-348.
73. Sadhu, B., Circuit techniques for cognitive radio receiver front-ends. 2012, University of Minnesota.
74. Goertzel, G., An Algorithm for the Evaluation of Finite Trigonometric Series. American Mathematical Monthly, 1958. 65(1): p. 34-35.
75. Lindfors, S., A. Parssinen, and K.A.I. Halonen, A 3-V 230-MHz CMOS decimation subsampler. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2003. 50(3): p. 105-117.
76. Chiueh, T.-D. and P.-Y. Tsai, OFDM Baseband Receiver Design for Wireless Communications. 2007: Wiley Publishing. 352.
77. Lehne, M. and S. Raman, A Discrete-Time FFT Processor for Ultrawideband OFDM Wireless Transceivers: Architecture and Behavioral Modeling. Circuits and Systems I: Regular Papers, IEEE Transactions on, 2010. 57(11): p. 3011-3022.
78. Zare-Hoseini, H., I. Kale, and O. Shoaei, Modeling of switched-capacitor delta-sigma Modulators in SIMULINK. IEEE Transactions on Instrumentation and Measurement, 2005. 54(4): p. 1646-1654.
79. Guichang, Z., X. Fan, and A.N. Willson, A power-scalable reconfigurable FFT/IFFT IC based on a multi-processor ring. IEEE Journal of Solid-State Circuits, 2006. 41(2): p. 483-495.
80. Razavi, B., Design of Analog CMOS Integrated Circuits. 2001: McGraw-Hill.
81. Jaffari, J., Statistical yield analysis and design for nanometer VLSI. 2010, University of Waterloo.
82. Ben, Y., Statistical Verification and Optimization of Integrated Circuits. 2011, University of California, Berkeley.
83. Maly, W., A.J. Strojwas, and S.W. Director, VLSI Yield Prediction and Estimation: A Unified Framework. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 1986. 5(1): p. 114-130.
84. Glynn, P., The Central Limit Theorem, Law of Large Numbers and Monte Carlo Methods. Stanford University.
85. Davison, A.C., Statistical Models. 2003: Cambridge University Press.
86. https://www.mosis.com/.
87. Sadeghi, N., Analog FFT Interface for Ultra-Low Power Analog Receiver Architectures. 2007, University of Alberta.
88. Sadhu, B., et al., Analysis and Design of a 5 GS/s Analog Charge-Domain FFT for an SDR Front-End in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 2013. 48(5): p. 1199-1211.
89. Sturm, M., Passive switched-capacitor based filter design, optimization, and calibration for sensing applications. 2013, University of Minnesota.
90. Rivet, F., Contribution à l'étude et à la réalisation d'un frontal radiofréquence analogique en temps discrets pour la radio-logicielle intégrale. 2009, University of Bordeaux.
91. Rivet, F., et al., The Experimental Demonstration of a SASP-Based Full Software Radio Receiver. Solid-State Circuits, IEEE Journal of, 2010. 45(5): p. 979-988.
92. Suh, S., Low-power discrete Fourier transform and soft-decision Viterbi decoder for OFDM receivers. 2011, Georgia Institute of Technology.
93. Sangwook, S., et al., Low-Power Discrete Fourier Transform for OFDM: A Programmable Analog Approach. Circuits and Systems I: Regular Papers, IEEE Transactions on, 2011. 58(2): p. 290-298.
94. Gunhee, H. and E. Sanchez-Sinencio, CMOS transconductance multipliers: a tutorial. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 1998. 45(12): p. 1550-1563.
95. Toumazou, C., F.J. Lidgey, and D. Haigh, Analogue IC design: the current-mode approach. 1990: Peregrinus on behalf of the Institution of Electrical Engineers.
96. Razavi, B., RF Microelectronics. 2012: Prentice Hall.
97. Gilbert, B., A precise four-quadrant multiplier with subnanosecond response. Solid-State Circuits, IEEE Journal of, 1968. 3(4): p. 365-373.
98. Rogers, J.W.M. and C. Plett, Radio Frequency Integrated Circuit Design. 2014: Artech House, Incorporated.
99. Babanezhad, J.N. and G.C. Temes, A 20-V four-quadrant CMOS analog multiplier. Solid-State Circuits, IEEE Journal of, 1985. 20(6): p. 1158-1168.
100. Ko-Chi, K. and A. Leuciuc, A linear MOS transconductor using source degeneration and adaptive biasing. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 2001. 48(10): p. 937-943.
101. Toumazou, C., G.S. Moschytz, and B. Gilbert, Trade-Offs in Analog Circuit Design: The Designer's Companion. 2007: Springer US.
102. Gilbert, B., The multi-tanh principle: a tutorial overview. Solid-State Circuits, IEEE Journal of, 1998. 33(1): p. 2-17.
103. Ryan, A.P. and O. McCarthy, A novel pole-zero compensation scheme using unbalanced differential pairs. Circuits and Systems I: Regular Papers, IEEE Transactions on, 2004. 51(2): p. 309-318.
104. Soo, D.C. and R.G. Meyer, A four-quadrant NMOS analog multiplier. Solid-State Circuits, IEEE Journal of, 1982. 17(6): p. 1174-1178.
105. Shen-Iuan, L. and H. Yuh-Shyan, CMOS four-quadrant multiplier using bias feedback techniques. Solid-State Circuits, IEEE Journal of, 1994. 29(6): p. 750-752.
106. Hosticka, B.J., R.W. Brodersen, and P.R. Gray, MOS sampled data recursive filters using switched capacitor integrators. Solid-State Circuits, IEEE Journal of, 1977. 12(6): p. 600-608.
107. Martin, K., Improved circuits for the realization of switched-capacitor filters. Circuits and Systems, IEEE Transactions on, 1980. 27(4): p. 237-244.
108. Carusone, T.C., D. Johns, and K. Martin, Analog Integrated Circuit Design. 2011: Wiley.
109. Whitaker, J.C., The Electronics Handbook, Second Edition. 2005: CRC Press.
110. Gray, P.R., Analysis and Design of Analog Integrated Circuits. 2009: John Wiley & Sons.
111. Caves, J.T., et al., Sampled analog filtering using switched capacitors as resistor equivalents. Solid-State Circuits, IEEE Journal of, 1977. 12(6): p. 592-599.
112. Brodersen, R.W., P.R. Gray, and D. Hodges, MOS switched-capacitor filters. Proceedings of the IEEE, 1979. 67(1): p. 61-75.
113. Malcovati, P., et al., Behavioral modeling of switched-capacitor sigma-delta modulators. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 2003. 50(3): p. 352-364.
114. Castello, R. and P.R. Gray, Performance limitations in switched-capacitor filters. Circuits and Systems, IEEE Transactions on, 1985. 32(9): p. 865-876.
115. Allen, P.E., CMOS Analog IC Design Course notes Georgia Institute of Technology, 2005.
116. Enz, C. and Y. Cheng, MOS transistor modeling for RF IC design. IEEE Journal of Solid-State Circuits, 2000. 35(2): p. 186-201.
117. Pelgrom, M.J.M., A.C.J. Duinmaijer, and A.P.G. Welbers, Matching properties of MOS transistors. Solid-State Circuits, IEEE Journal of, 1989. 24(5): p. 1433-1439.
118. Drennan, P.G. and C.C. McAndrew, Understanding MOSFET mismatch for analog design. Solid-State Circuits, IEEE Journal of, 2003. 38(3): p. 450-456.
119. Drennan, P.G. and C.C. McAndrew. A comprehensive MOSFET mismatch model. in Electron Devices Meeting, 1999. IEDM '99. Technical Digest. International. 1999.
120. Lakshmikumar, K.R., R.A. Hadaway, and M.A. Copeland, Characterisation and modeling of mismatch in MOS transistors for precision analog design. Solid-State Circuits, IEEE Journal of, 1986. 21(6): p. 1057-1066.
121. Kinget, P.R., Device mismatch and tradeoffs in the design of analog circuits. Solid-State Circuits, IEEE Journal of, 2005. 40(6): p. 1212-1224.
122. Lovett, S.J., et al., Optimizing MOS transistor mismatch. Solid-State Circuits, IEEE Journal of, 1998. 33(1): p. 147-150.
123. Mizuno, T., J. Okumtura, and A. Toriumi, Experimental study of threshold voltage fluctuation due to statistical variation of channel dopant number in MOSFET's. Electron Devices, IEEE Transactions on, 1994. 41(11): p. 2216-2221.
124. Ta-Hsun, Y., et al. Mis-match characterization of 1.8 V and 3.3 V devices in 0.18 $\mu$m mixed signal CMOS technology. in Microelectronic Test Structures, 2001. ICMTS 2001. Proceedings of the 2001 International Conference on. 2001.
125. Fouque, A., et al. A low power digitally-enhanced SASP-based receiver architecture for mobile DVB-S applications in the Ku-band (10.7-12.75 GHz). in Radio and Wireless Symposium (RWS), 2011 IEEE. 2011.
