No Arabic abstract
From ancient to modern times, acoustic structures have been used to control the propagation of acoustic waves. However, the design of the acoustic structures has remained widely a time-consuming and computational resource-consuming iterative process. In recent years, Deep Learning has attracted unprecedented attention for its ability to tackle hard problems with huge datasets, which has achieved state-of-the-art results in various tasks. In this work, an acoustic structure design method is proposed based on deep learning. Taking the design of multi-order Helmholtz resonator for instance, we experimentally demonstrate the effectiveness of the proposed method. Our method is not only able to give a very accurate prediction of the geometry of the acoustic structures with multiple strong-coupling parameters, but also capable of improving the performance of evolutionary approaches in optimization for a desired property. Compared with the conventional numerical methods, our method is more efficient, universal and automatic, which has a wide range of potential applications, such as speech enhancement, sound absorption and insulation.
For centuries researchers have used sound to monitor and study wildlife. Traditionally, conservationists have identified species by ear; however, it is now common to deploy audio recording technology to monitor animal and ecosystem sounds. Animals use sound for communication, mating, navigation and territorial defence. Animal sounds provide valuable information and help conservationists to quantify biodiversity. Acoustic monitoring has grown in popularity due to the availability of diverse sensor types which include camera traps, portable acoustic sensors, passive acoustic sensors, and even smartphones. Passive acoustic sensors are easy to deploy and can be left running for long durations to provide insights on habitat and the sounds made by animals and illegal activity. While this technology brings enormous benefits, the amount of data that is generated makes processing a time-consuming process for conservationists. Consequently, there is interest among conservationists to automatically process acoustic data to help speed up biodiversity assessments. Processing these large data sources and extracting relevant sounds from background noise introduces significant challenges. In this paper we outline an approach for achieving this using state of the art in machine learning to automatically extract features from time-series audio signals and modelling deep learning models to classify different bird species based on the sounds they make. The acquired bird songs are processed using mel-frequency cepstrum (MFC) to extract features which are later classified using a multilayer perceptron (MLP). Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
Music generation is always interesting in a sense that there is no formalized recipe. In this work, we propose a novel dual-track architecture for generating classical piano music, which is able to model the inter-dependency of left-hand and right-hand piano music. Particularly, we experimented with a lot of different models of neural network as well as different representations of music, and the results show that our proposed model outperforms all other tested methods. Besides, we deployed some special policies for model training and generation, which contributed to the model performance remarkably. Finally, under two evaluation methods, we compared our models with the MuseGAN project and true music.
Cardiovascular (CV) diseases are the leading cause of death in the world, and auscultation is typically an essential part of a cardiovascular examination. The ability to diagnose a patient based on their heart sounds is a rather difficult skill to master. Thus, many approaches for automated heart auscultation have been explored. However, most of the previously proposed methods involve a segmentation step, the performance of which drops significantly for high pulse rates or noisy signals. In this work, we propose a novel segmentation-free heart sound classification method. Specifically, we apply discrete wavelet transform to denoise the signal, followed by feature extraction and feature reduction. Then, Support Vector Machines and Deep Neural Networks are utilised for classification. On the PASCAL heart sound dataset our approach showed superior performance compared to others, achieving 81% and 96% precision on normal and murmur classes, respectively. In addition, for the first time, the data were further explored under a user-independent setting, where the proposed method achieved 92% and 86% precision on normal and murmur, demonstrating the potential of enabling automatic murmur detection for practical use.
In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers and the higher classification layers were jointly trained using the logits from the teacher model. In our experiments, compared to a baseline model trained on about 600 hours of transcribed data, a relative word-error rate (WER) reduction of about 27.3% was achieved when using an additional 1800 hours of untranscribed data. We also investigated the benefit of pre-training the multi-channel front end to output the beamformed log-mel filter bank energies (LFBE) using L2 loss. We find that pre-training improves the word error rate by 10.7% when compared to a multi-channel model directly initialized with a beamformer and mel-filter bank coefficients for the front end. Finally, combining pre-training and teacher-student training produces a WER reduction of 31% compared to our baseline.
The fundamental principle in Modern Portfolio Theory (MPT) is based on the quantification of the portfolios risk related to performance. Although MPT has made huge impacts on the investment world and prompted the success and prevalence of passive investing, it still has shortcomings in real-world applications. One of the main challenges is that the level of risk an investor can endure, known as emph{risk-preference}, is a subjective choice that is tightly related to psychology and behavioral science in decision making. This paper presents a novel approach of measuring risk preference from existing portfolios using inverse optimization on the mean-variance portfolio allocation framework. Our approach allows the learner to continuously estimate real-time risk preferences using concurrent observed portfolios and market price data. We demonstrate our methods on real market data that consists of 20 years of asset pricing and 10 years of mutual fund portfolio holdings. Moreover, the quantified risk preference parameters are validated with two well-known risk measurements currently applied in the field. The proposed methods could lead to practical and fruitful innovations in automated/personalized portfolio management, such as Robo-advising, to augment financial advisors decision intelligence in a long-term investment horizon.