This paper introduces a collection of scaling methods for generating $2N$-point DCT-II approximations from $N$-point low-complexity transformations. These scaling methods are based on the Hou recursive matrix factorization of the exact $2N$-point DCT-II matrix. The proposed techniques encompass the widely employed Jridi-Alfalou-Meher (JAM) scaling method and are shown to produce DCT-II approximations that outperform those obtained with the JAM scaling method in terms of total error energy and mean squared error. Orthogonality conditions are derived, and an extensive error analysis based on statistical simulation demonstrates the good performance of the introduced scaling methods. A hardware implementation is also provided, demonstrating the competitiveness of the proposed methods when compared to the JAM scaling method.
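To make the scaling idea concrete, here is a minimal Python sketch of a JAM-style construction, not one of the Hou-based methods proposed above. It assumes the form commonly attributed to the JAM method, $T_{2N} = \frac{1}{\sqrt{2}} P_{2N}\,\operatorname{diag}(T_N, T_N)\,B_{2N}$, with butterfly $B_{2N} = \begin{bmatrix} I_N & J_N \\ J_N & -I_N \end{bmatrix}$ ($J_N$ the counter-identity) and a permutation $P_{2N}$ that interleaves the two blocks into even and odd rows. The helper names (`dct2_matrix`, `jam_style_scaling`, `total_error_energy`) are hypothetical. The total error energy is computed as $\pi\lVert C - \hat{C}\rVert_F^2$, which is what the usual definition (sum over rows of the squared frequency-response error integrated over $[0, \pi]$) reduces to for real-valued matrices.

```python
import math


def dct2_matrix(n):
    """Orthonormal N-point DCT-II matrix, returned as a list of rows."""
    rows = []
    for k in range(n):
        alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([alpha * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def jam_style_scaling(t):
    """Assumed JAM-style form: (1/sqrt 2) * P * diag(T, T) * [[I, J], [J, -I]].

    The first diagonal block feeds the even output rows and the second
    feeds the odd rows (this interleaving plays the role of P).
    """
    n = len(t)
    s = 1.0 / math.sqrt(2.0)
    out = [None] * (2 * n)
    for m in range(n):
        fwd = [s * v for v in t[m]]
        rev = [s * v for v in reversed(t[m])]
        # Row 2m acts on x[:N] + J x[N:] (symmetric butterfly branch).
        out[2 * m] = fwd + rev
        # Row 2m+1 acts on J x[:N] - x[N:] (antisymmetric butterfly branch).
        out[2 * m + 1] = rev + [-v for v in fwd]
    return out


def total_error_energy(c, t):
    """pi * ||C - T||_F^2, the total error energy for real-valued matrices."""
    return math.pi * sum((a - b) ** 2
                         for rc, rt in zip(c, t) for a, b in zip(rc, rt))


if __name__ == "__main__":
    # Low-complexity 4-point input: the signed DCT, scaled by 1/2 so rows are unit-norm.
    sdct4 = [[0.5 * math.copysign(1.0, v) for v in row] for row in dct2_matrix(4)]
    print(total_error_energy(dct2_matrix(8), jam_style_scaling(sdct4)))
```

Because $\frac{1}{\sqrt{2}}B_{2N}$ and $P_{2N}$ are orthogonal, this construction preserves the orthogonality of $T_N$. With the exact $C_N$ as input, the even rows coincide with those of $C_{2N}$, while the odd rows are only approximated.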
In this Letter, we propose a low-complexity estimator for the correlation coefficient based on the signed $\operatorname{AR}(1)$ process. The introduced approximation is suitable for implementation in low-power hardware architectures. Monte Carlo simulations reveal that the proposed estimator performs comparably to competing methods in the literature, with a maximum error on the order of $10^{-2}$. However, the hardware implementation of the introduced method offers considerable advantages in several relevant metrics, providing more than 95% reduction in dynamic power and doubling the maximum operating frequency when compared to the reference method.
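As background for sign-based estimation, the sketch below relies on a classical identity rather than on the Letter's estimator: for a zero-mean Gaussian $\operatorname{AR}(1)$ process with lag-one correlation $\rho$, the arcsine law gives $E[\operatorname{sgn}(x_t)\operatorname{sgn}(x_{t+1})] = \frac{2}{\pi}\arcsin\rho$, so $\rho$ can be recovered from signs alone. Only sign products and an accumulator are needed before a single nonlinear map, which is what makes sign-based estimators attractive for hardware. The function names and simulation parameters are illustrative assumptions.

```python
import math
import random


def simulate_ar1(rho, n, seed=0):
    """Zero-mean Gaussian AR(1): x_t = rho * x_{t-1} + e_t, with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    # Draw x_0 from the stationary law N(0, 1 / (1 - rho^2)) to avoid a burn-in.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))
    samples = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def sign_based_rho(x):
    """Estimate rho from signs only by inverting the arcsine law."""
    s = [1 if v >= 0 else -1 for v in x]
    r_sign = sum(a * b for a, b in zip(s, s[1:])) / (len(s) - 1)
    return math.sin(math.pi * r_sign / 2.0)


def sample_rho(x):
    """Conventional lag-one sample autocorrelation (zero mean assumed)."""
    return sum(a * b for a, b in zip(x, x[1:])) / sum(v * v for v in x)


if __name__ == "__main__":
    data = simulate_ar1(0.8, 100_000, seed=1)
    print(sign_based_rho(data), sample_rho(data))
```

The final `math.sin` is the only nonlinear step; a hardware design might replace it with a lookup table or a cheaper approximation, which is a design choice this sketch does not explore.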
Principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, PCA may be impractical in real-time applications or in situations where energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative for data decorrelation. This paper presents a method to derive computationally efficient approximations to the DCT. The proposed method aims to minimize the angle between the rows of the exact DCT matrix and the rows of the approximate transformation matrix. The resulting transformation matrices are orthogonal and have extremely low arithmetic complexity. Considering popular performance measures, one of the proposed transformation matrices outperforms the best competitors in both matrix error and coding capability. Practical applications in image and video coding demonstrate the relevance of the proposed transformation. In fact, we show that the proposed approximate DCT can outperform the exact DCT for image encoding at certain compression ratios. The proposed transform and its direct competitors are also physically realized as digital prototype circuits using FPGA technology.
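A brute-force sketch of the angle criterion follows. For illustration only, it assumes that approximate rows take integer entries from the hypothetical alphabet $\{0, \pm 1, \pm 2\}$ and that each row keeps the symmetry of DCT-II row $k$, $c_k[N-1-n] = (-1)^k c_k[n]$, which halves the search. Each row is chosen independently, so, unlike the transforms described above, the result is not guaranteed to be orthogonal; an orthogonality check or constraint would be needed on top.

```python
import itertools
import math


def dct2_matrix(n):
    """Orthonormal N-point DCT-II matrix, returned as a list of rows."""
    rows = []
    for k in range(n):
        alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([alpha * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def angle(u, v):
    """Angle in radians, with the cosine clamped against rounding."""
    return math.acos(max(-1.0, min(1.0, cosine(u, v))))


def best_row(c_row, k, alphabet):
    """Symmetric integer row maximizing the cosine (minimizing the angle) to c_row."""
    n = len(c_row)
    sign = 1 if k % 2 == 0 else -1
    best, best_cos = None, -math.inf
    for head in itertools.product(alphabet, repeat=n // 2):
        if not any(head):
            continue  # the all-zero row has no defined angle
        # Impose the DCT-II row symmetry c[n-1-j] = (-1)^k c[j].
        row = list(head) + [sign * v for v in reversed(head)]
        cos_t = cosine(c_row, row)
        if cos_t > best_cos:  # strict: ties keep the first candidate found
            best, best_cos = row, cos_t
    return best, math.acos(max(-1.0, min(1.0, best_cos)))


def angle_based_approximation(n, alphabet=(-2, -1, 0, 1, 2)):
    """Row-by-row angle minimization over a small integer alphabet (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    c = dct2_matrix(n)
    return [best_row(c[k], k, alphabet) for k in range(n)]


if __name__ == "__main__":
    for row, theta in angle_based_approximation(8):
        print(row, round(math.degrees(theta), 2))
```

Comparing cosines instead of angles during the search avoids the poor conditioning of `acos` near 1; the angle is computed only once per row at the end.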
This paper proposes a new large-scale mask-compliant spectral precoder (LS-MSP) for orthogonal frequency division multiplexing (OFDM) systems. We first consider a previously proposed mask-compliant spectral precoding scheme that relies on a generic convex optimization solver, which suffers from high computational complexity, notably in large-scale systems. To reduce the complexity of computing the LS-MSP, we propose a divide-and-conquer approach that breaks the original problem into smaller rank-1 quadratic-constraint problems, each of which admits a closed-form solution. Based on these solutions, we develop three specialized low-complexity algorithms: first-order algorithms based on 1) projection onto convex sets and 2) the alternating direction method of multipliers, and 3) an algorithm that capitalizes on the closed-form solutions for the rank-1 quadratic constraints, referred to as semi-analytical spectral precoding. Numerical results show that the proposed LS-MSP techniques outperform previously proposed techniques in terms of computational burden while complying with the spectrum mask. The results also indicate that 3) typically needs 3 iterations to achieve results similar to those of 1) and 2), at the expense of slightly increased computational complexity.
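The closed-form step behind the divide-and-conquer idea can be illustrated with a single rank-1 quadratic constraint $|a^H z|^2 \le c^2$: the Euclidean projection moves $z$ only along $a$, shrinking $a^H z$ radially onto the disk of radius $c$. The sketch below chains such projections in a cyclic projection-onto-convex-sets (POCS) loop. How the spectral mask maps to the vectors $a$ and bounds $c$, the random test data, and the number of sweeps are assumptions rather than the paper's formulation; note also that plain POCS returns a point in the intersection of the constraint sets, not necessarily the one closest to the starting precoded vector.

```python
import random


def inner(a, x):
    """Hermitian inner product a^H x."""
    return sum(ai.conjugate() * xi for ai, xi in zip(a, x))


def project_rank1(x, a, c):
    """Closed-form Euclidean projection of x onto {z : |a^H z| <= c}."""
    s = inner(a, x)
    if abs(s) <= c:
        return list(x)  # already feasible
    norm2 = sum(abs(ai) ** 2 for ai in a)
    # Move along a so that a^H z becomes c * s / |s| (nearest point of the disk).
    step = (s - c * s / abs(s)) / norm2
    return [xi - step * ai for xi, ai in zip(x, a)]


def pocs(x0, constraints, sweeps=1000):
    """Cyclic projections onto all rank-1 constraint sets."""
    x = list(x0)
    for _ in range(sweeps):
        for a, c in constraints:
            x = project_rank1(x, a, c)
    return x


def random_complex_vector(rng, n):
    """Hypothetical test data: i.i.d. complex Gaussian entries."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    cons = [(random_complex_vector(rng, 8), 0.1) for _ in range(4)]
    x = pocs(random_complex_vector(rng, 8), cons)
    print(max(abs(inner(a, x)) - c for a, c in cons))
```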
OFDM sensing is gaining popularity in wideband radar applications as well as in joint communication and radar/radio sensing (JCAS). As JCAS will potentially be integrated into future mobile networks, where OFDM is crucial, OFDM sensing is envisioned to be ubiquitously deployed. A fast Fourier transform (FFT)-based OFDM sensing (FOS) method was proposed a decade ago and has been regarded as a de facto standard given its simplicity. In this article, we introduce a simple trick, namely a pre-processing step applied to the target echo, that further reduces the computational complexity of FOS without degrading key sensing performance. Underlying the trick is a newly disclosed feature of the target echo in OFDM sensing that, to the best of our knowledge, has not yet been effectively exploited.
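For reference, the classical FOS pipeline that the proposed pre-processing builds on can be sketched in a few lines: divide the received subcarrier symbols by the transmitted ones, apply an inverse DFT across subcarriers to obtain range bins, then a DFT across OFDM symbols to obtain Doppler bins. The sketch uses a naive $O(N^2)$ DFT in place of an FFT to stay dependency-free, assumes a single noiseless point target with integer range and Doppler bins, and does not include the article's trick. The function names are hypothetical.

```python
import cmath
import random


def dft(v, inverse=False):
    """Naive DFT (or normalized inverse DFT); stands in for an FFT."""
    n = len(v)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def fos_range_doppler(rx, tx):
    """rx, tx: grids indexed [subcarrier][symbol]; returns a map indexed [range][doppler]."""
    n_sc, n_sym = len(rx), len(rx[0])
    # 1) Element-wise division removes the random data symbols.
    h = [[rx[i][m] / tx[i][m] for m in range(n_sym)] for i in range(n_sc)]
    # 2) Inverse DFT across subcarriers, one OFDM symbol at a time -> range profiles.
    profiles = [dft([h[i][m] for i in range(n_sc)], inverse=True) for m in range(n_sym)]
    # 3) DFT across OFDM symbols, one range bin at a time -> Doppler spectrum.
    return [dft([profiles[m][r] for m in range(n_sym)]) for r in range(n_sc)]


def synthetic_echo(n_sc, n_sym, r0, d0, seed=0):
    """Noiseless single-target echo with integer range bin r0 and Doppler bin d0."""
    rng = random.Random(seed)
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    tx = [[rng.choice(qpsk) for _ in range(n_sym)] for _ in range(n_sc)]
    rx = [[tx[i][m]
           * cmath.exp(-2j * cmath.pi * i * r0 / n_sc)   # delay: phase ramp over subcarriers
           * cmath.exp(2j * cmath.pi * m * d0 / n_sym)   # Doppler: phase ramp over symbols
           for m in range(n_sym)] for i in range(n_sc)]
    return rx, tx


if __name__ == "__main__":
    rx, tx = synthetic_echo(32, 16, r0=5, d0=3)
    rd = fos_range_doppler(rx, tx)
    peak = max(((r, d) for r in range(32) for d in range(16)),
               key=lambda p: abs(rd[p[0]][p[1]]))
    print(peak)  # (5, 3)
```

Steps 2 and 3 dominate the cost of FOS, which is the part a pre-processing of the echo would aim to lighten.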
The increasing complexity of Internet-of-Things (IoT) applications and near-sensor processing algorithms is pushing the limits of the computational power of low-power, battery-operated end-node systems. This trend also reveals growing demands for high-speed and energy-efficient inter-chip communication to manage the increasing amount of data coming from off-chip sensors and memories. While traditional microcontroller interfaces such as SPI cannot cope with tight energy budgets and large bandwidth requirements, low-voltage-swing transceivers can tackle this challenge thanks to their ability to reach data rates of several Gbps at milliwatt power levels. However, recent research on high-speed serial links has focused either on high-performance systems, whose power consumption is significantly larger than that of low-power IoT end-nodes, or on stand-alone designs not integrated at the system level. This paper presents a low-swing transceiver for energy-efficient, low-power chip-to-chip communication, fully integrated within an IoT end-node System-on-Chip fabricated in 65 nm CMOS technology. The transceiver can be easily controlled via a software interface, which allows us to consider realistic data-communication scenarios that cannot be assessed with stand-alone prototypes. Chip measurements show that the transceiver achieves 8.46× higher energy efficiency at 15.9× higher performance than a traditional microcontroller interface such as single-SPI.
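Read together, the two reported ratios also determine an implied power ratio. Assuming that energy efficiency $\eta$ means throughput per unit power (bits per joule) and that performance means throughput $R$, with subscripts TRX for the transceiver and SPI for single-SPI, the arithmetic is:

$$
\frac{\eta_{\mathrm{TRX}}}{\eta_{\mathrm{SPI}}}
= \frac{R_{\mathrm{TRX}}}{R_{\mathrm{SPI}}}\cdot\frac{P_{\mathrm{SPI}}}{P_{\mathrm{TRX}}} = 8.46,
\qquad
\frac{R_{\mathrm{TRX}}}{R_{\mathrm{SPI}}} = 15.9
\quad\Longrightarrow\quad
\frac{P_{\mathrm{TRX}}}{P_{\mathrm{SPI}}} = \frac{15.9}{8.46} \approx 1.88 .
$$

Under these assumptions, the transceiver would draw roughly 1.9 times the power of single-SPI while moving data 15.9 times faster.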