
We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. Different types of phonetic units are employed and compared in terms of accuracy and robustness of emotion recognition within and across datasets and languages. Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated in speech emotion recognition. The best performance is from using broad phonetic classes. Further research is needed to investigate the optimal set of broad phonetic classes for the task of emotion recognition. Finally, we found that Wav2vec 2.0 can be fine-tuned to recognize coarser-grained or larger phonetic units than phonemes, such as broad phonetic classes and syllables.
Much of the recent literature on automatic speech recognition (ASR) takes an end-to-end approach. Unlike English, where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far.
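The character error rate quoted above is conventionally computed as the edit distance between the reference and hypothesis character sequences, divided by the reference length. The following is a minimal sketch under that assumption; the function names `edit_distance` and `cer` and the example strings are illustrative and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting i reference characters to match an empty hypothesis
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(cer("今天天气好", "今天天汽好"))
```

Because each Hanzi is a single character, CER operates directly on the output string without any word segmentation step.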
Hong Yuan, Yu-Han Ma, 2021
We study the non-equilibrium thermodynamics of a heat engine operating between two finite-sized reservoirs with well-defined temperatures. Within the linear response regime, it is discovered that there exists a power-efficiency trade-off depending on the ratio of heat capacities ($\gamma$) of the reservoirs for the engine; the uniform temperature of the two reservoirs at final time $\tau$ is bounded from below by the entropy production $\sigma_{\mathrm{min}}\propto 1/\tau$. We further obtain a universal efficiency at maximum power of the engine for arbitrary $\gamma$. Our findings can be used to develop an optimization scenario for thermodynamic cycles with finite-sized reservoirs in practice.
A complexity-adaptive tree search algorithm is proposed for $\boldsymbol{G}_N$-coset codes that implements maximum-likelihood (ML) decoding by using a successive decoding schedule. The average complexity is close to that of the successive cancellation (SC) decoding for practical error rates when applied to polar codes and short Reed-Muller (RM) codes, e.g., block lengths up to $N=128$. By modifying the algorithm to limit the worst-case complexity, one obtains a near-ML decoder for longer RM codes and their subcodes. Unlike other bit-flip decoders, no outer code is needed to terminate decoding. The algorithm can thus be applied to modified $\boldsymbol{G}_N$-coset code constructions with dynamic frozen bits. One advantage over sequential decoders is that there is no need to optimize a separate parameter.
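For readers unfamiliar with $\boldsymbol{G}_N$-coset codes, the sketch below shows only the encoding side, assuming the common definition $\boldsymbol{G}_N = \boldsymbol{F}^{\otimes n}$ with kernel $\boldsymbol{F} = \begin{bmatrix}1&0\\1&1\end{bmatrix}$ and no bit-reversal permutation. It is not the tree search decoder from the abstract; the function name `polar_transform` is illustrative.

```python
def polar_transform(u):
    """Compute x = u * F^{(x)n} over GF(2) for a bit list whose length is a power of two.

    Uses the block structure F^{(x)n} = [[G, 0], [G, G]] with G = F^{(x)(n-1)}:
    for u = [a, b], x = [(a XOR b) * G, b * G].
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(u)
    half = n // 2
    a, b = u[:half], u[half:]
    mixed = [x ^ y for x, y in zip(a, b)]
    return polar_transform(mixed) + polar_transform(b)


if __name__ == "__main__":
    # A single 1 in the last position selects the all-ones row of G_4.
    print(polar_transform([0, 0, 0, 1]))
```

Polar and Reed-Muller codes differ only in which input positions are frozen before this transform is applied; over GF(2) the transform is its own inverse.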
We introduce generalized spatially coupled parallel concatenated codes (GSC-PCCs), a class of spatially coupled turbo-like codes obtained by coupling parallel concatenated codes (PCCs) with a fraction of information bits repeated before the PCC encoding. GSC-PCCs can be seen as a generalization of the original spatially coupled parallel concatenated convolutional codes (SC-PCCs) proposed by Moloudi et al. [1]. To characterize the asymptotic performance of GSC-PCCs, we derive the corresponding density evolution equations and compute their decoding thresholds. We show that the proposed codes have some nice properties such as threshold saturation and that their decoding thresholds improve with the repetition factor $q$. Most notably, our analysis suggests that the proposed codes asymptotically approach the capacity as $q$ tends to infinity with any given constituent convolutional code.
A polar-coded transmission (PCT) scheme with joint channel estimation and decoding is proposed for channels with unknown channel state information (CSI). The CSI is estimated via successive cancellation (SC) decoding and the constraints imposed by the frozen bits. SC list decoding with an outer code improves performance, including resolving a phase ambiguity when using quadrature phase-shift keying (QPSK) and Gray labeling. Simulations with 5G polar codes and QPSK show gains of up to 2 dB at a frame error rate (FER) of $10^{-4}$ over pilot-assisted transmission for various non-coherent models. Moreover, PCT performs within a few tenths of a dB of a coherent receiver with perfect CSI. For Rayleigh block-fading channels, PCT outperforms an FER upper bound based on random coding and is within one dB of a lower bound.
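The phase ambiguity mentioned above arises because a QPSK constellation is invariant under rotations by multiples of 90 degrees, so a receiver without CSI cannot distinguish a rotated transmission from a different bit pattern. The sketch below illustrates this with one possible Gray labeling; the specific bit-to-symbol mapping and the helper names are assumptions for illustration, not the labeling used in the paper.

```python
import math


def qpsk_gray(b0, b1):
    """Map a bit pair to a unit-energy QPSK symbol (assumed Gray labeling)."""
    return complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2)


CONSTELLATION = {(b0, b1): qpsk_gray(b0, b1) for b0 in (0, 1) for b1 in (0, 1)}


def nearest_label(symbol):
    """Hard-decision demapping: label of the closest constellation point."""
    return min(CONSTELLATION, key=lambda bits: abs(CONSTELLATION[bits] - symbol))


def rotate_labels(k):
    """Label each point is mistaken for after a channel rotation of k * 90 degrees."""
    return {bits: nearest_label(s * 1j ** k) for bits, s in CONSTELLATION.items()}


if __name__ == "__main__":
    for k in range(4):
        print(k * 90, rotate_labels(k))
```

Every nonzero rotation permutes the labels without leaving the constellation, which is why extra structure such as frozen bits or an outer code is needed to identify the correct phase.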
A magnetic field is generally considered to be incompatible with superconductivity, as it tends to spin-polarize electrons and break apart the opposite-spin singlet Cooper pairs. Here, we report an experimental observation: at the van der Waals heterointerface of layered NbSe2 and CrCl3 flakes, an intriguing reemergent superconductivity evolves from conventional superconductivity through a hump-like intermediate phase with finite electrical resistance. This phenomenon occurs only when the applied magnetic field is parallel to the sample plane and perpendicular to the electric current direction, in contrast to the reference sample of a NbSe2 thin flake. The strong anisotropy of the reemergent superconducting phase points to a Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state driven by the strong interfacial spin-orbit coupling between the NbSe2 and CrCl3 layers. A theoretical picture in which FFLO state nodes are induced by the collective pinning of Josephson vortices is presented to explain the experimental observation of the reemergent superconductivity. This finding opens an opportunity to search for the exotic FFLO state in van der Waals heterostructures with strong interfacial spin-orbit coupling.
Orthogonal time frequency space (OTFS) modulation is a recently developed multi-carrier multi-slot transmission scheme for wireless communications in high-mobility environments. In this paper, the error performance of coded OTFS modulation over high-mobility channels is investigated. We start from the study of the conditional pairwise-error probability (PEP) of the OTFS scheme, based on which a performance upper bound of the coded OTFS system is derived. Then, we show that the coding improvement for OTFS systems depends on the squared Euclidean distance among codeword pairs and the number of independent resolvable paths of the channel. More importantly, we show that there exists a fundamental trade-off between the coding gain and the diversity gain for OTFS systems, i.e., the diversity gain of OTFS systems improves with the number of resolvable paths, while the coding gain declines. Furthermore, based on our analysis, the impact of channel coding parameters on the performance of the coded OTFS systems is unveiled. The error performance of various coded OTFS systems over high-mobility channels is then evaluated. Simulation results demonstrate a significant performance improvement for OTFS modulation over the conventional orthogonal frequency division multiplexing (OFDM) modulation over high-mobility channels. Analytical results and the effectiveness of the proposed code design are also verified by simulations with the application of both classical and modern codes for OTFS systems.
Partially information coupled turbo codes (PIC-TCs) are a class of spatially coupled turbo codes that can approach the BEC capacity while keeping the encoding and decoding architectures of the underlying component codes unchanged. However, PIC-TCs have significant rate loss compared to their component rate-1/3 turbo code, and the rate loss increases with the coupling ratio. To absorb the rate loss, in this paper, we propose the partially information coupled duo-binary turbo codes (PIC-dTCs). Given a rate-1/3 turbo code as the benchmark, we construct a duo-binary turbo code by introducing one extra input to the benchmark code. Then, parts of the information sequence from the original input are coupled to the extra input of the succeeding code blocks. By looking into the graph model of PIC-dTC ensembles, we derive the exact density evolution equations of the PIC-dTC ensembles, and compute their belief propagation decoding thresholds on the binary erasure channel. Simulation results verify the correctness of our theoretical analysis, and also show significant error performance improvement over the uncoupled rate-1/3 turbo codes and existing designs of spatially coupled turbo codes.
The two-user Gaussian interference channel (G-IC) is revisited, with a particular focus on practically amenable discrete input signalling and treating interference as noise (TIN) receivers. The corresponding deterministic interference channel (D-IC) is first investigated and coding schemes that can achieve the entire capacity region of D-IC under TIN are proposed. These schemes are then systematically translated into multi-layer superposition coding schemes based on purely discrete inputs for the real-valued G-IC. Our analysis shows that the proposed scheme is able to achieve the entire capacity region to within a constant gap for all channel parameters. To the best of our knowledge, this is the first constant-gap result under purely discrete signalling and TIN for the entire capacity region and all the interference regimes. Furthermore, the approach is extended to obtain a coding scheme based on discrete inputs for the complex-valued G-IC. For such a scenario, the minimum distance and the achievable rate of the proposed scheme under TIN are analyzed, taking into account the effects of random phase rotations introduced by the channels. Simulation results show that our scheme is capable of approaching the capacity region of the complex-valued G-IC and significantly outperforms Gaussian signalling with TIN in various interference regimes.
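As a point of reference for the Gaussian-signalling baseline mentioned above, the sketch below evaluates the standard achievable rate of one user on a real-valued channel when both users send Gaussian inputs and the receiver treats interference as noise, assuming unit noise variance. The function name and the SNR/INR parameterization are illustrative assumptions; the paper's discrete-input scheme is not reproduced here.

```python
import math


def tin_rate_gaussian(snr, inr):
    """Rate in bits per real channel use with Gaussian inputs and a TIN receiver.

    snr: received signal-to-noise ratio of the desired signal (linear scale)
    inr: received interference-to-noise ratio (linear scale)
    """
    if snr < 0 or inr < 0:
        raise ValueError("snr and inr must be non-negative")
    # Interference is lumped with the unit-variance noise: SINR = snr / (1 + inr).
    return 0.5 * math.log2(1 + snr / (1 + inr))


if __name__ == "__main__":
    for inr in (0.0, 1.0, 10.0):
        print(inr, tin_rate_gaussian(100.0, inr))
```

The rate shrinks as the interference grows relative to the noise, which is the regime where the abstract reports the largest advantage for its scheme.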