No Arabic abstract
Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with its own restrictive syntax. We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services across hundreds of machines and takes full advantage of cluster, thread, and asynchronous parallelism. Using this framework, we provide large scale clients for intelligent services such as speech, vision, search, anomaly detection, and text analysis. This allows users to integrate ready-to-use intelligence into any datastore with an Apache Spark connector. To eliminate the majority of overhead from network communication, we also introduce a low-latency containerized version of our architecture. Finally, we demonstrate that the services we investigate are competitive on a variety of benchmarks, and present two applications of this framework to create intelligent search engines, and real-time auto race analytics systems.
In modern application areas for software systems --- like eHealth, the Internet-of-Things, and Edge Computing --- data is encoded in heterogeneous, tree-shaped data-formats, it must be processed in real-time, and it must be ephemeral, i.e., not persist in the system. While it is preferable to use a query language to express complex data-handling logic, their typical execution engine, a database external from the main application, is unfit in scenarios of ephemeral data-handling. A better option is represented by integrated query frameworks, which benefit from existing development support tools (e.g., syntax and type checkers) and execute within the application memory. In this paper, we propose one such framework that, for the first time, targets tree-shaped, document-oriented queries. We formalise an instantiation of MQuery, a sound variant of the widely-used MongoDB query language, which we implemented in the Jolie language. Jolie programs are microservices, the building blocks of modern software systems. Moreover, since Jolie supports native tree data-structures and automatic management of heterogeneous data-encodings, we can provide a uniform way to use MQuery on any data-format supported by the language. We present a non-trivial use case from eHealth, use it to concretely evaluate our model, and to illustrate our formalism.
The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. SiGMa is an iterative propagation algorithm which leverages both the structural information from the relationship graph as well as flexible similarity measures between entity properties in a greedy local search, thus making it scalable. Despite its greedy nature, our experiments indicate that SiGMa can efficiently match some of the worlds largest knowledge bases with high precision. We provide additional experiments on benchmark datasets which demonstrate that SiGMa can outperform state-of-the-art approaches both in accuracy and efficiency.
The Large Intelligent Surface (LIS) concept has emerged recently as a new paradigm for wireless communication, remote sensing and positioning. It consists of a continuous radiating surface placed relatively close to the users, which is able to communicate with users by independent transmission and reception (replacing base stations). Despite of its potential, there are a lot of challenges from an implementation point of view, with the interconnection data-rate and computational complexity being the most relevant. Distributed processing techniques and hierarchical architectures are expected to play a vital role addressing this while ensuring scalability. In this paper we perform algorithm-architecture codesign and analyze the hardware requirements and architecture trade-offs for a discrete LIS to perform uplink detection. By doing this, we expect to give concrete case studies and guidelines for efficient implementation of LIS systems.
Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions and across the corpus, we find 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques, and developing benchmark models and algorithms for automating visual analysis. To demonstrate VizNets utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.
Intelligent reflecting surface (IRS) is a promising technology for wireless communications, thanks to its potential capability to engineer the radio environment. However, in practice, such an envisaged benefit is attainable only when the passive IRS is of a sufficiently large size, for which the conventional uniform plane wave (UPW)-based channel model may become inaccurate. In this paper, we pursue a new channel modelling and performance analysis for wireless communications with extremely large-scale IRS (XL-IRS). By taking into account the variations in signals amplitude and projected aperture across different reflecting elements, we derive both lower- and upper-bounds of the received signal-to-noise ratio (SNR) for the general uniform planar array (UPA)-based XL-IRS. Our results reveal that, instead of scaling quadratically with the increased number of reflecting elements M as in the conventional UPW model, the SNR under the more practically applicable non-UPW model increases with M only with a diminishing return and gets saturated eventually. To gain more insights, we further study the special case of uniform linear array (ULA)-based XL-IRS, for which a closed-form SNR expression in terms of the IRS size and transmitter/receiver location is derived. This result shows that the SNR mainly depends on the two geometric angles formed by the transmitter/receiver locations with the IRS, as well as the boundary points of the IRS. Numerical results validate our analysis and demonstrate the importance of proper channel modelling for wireless communications aided by XL-IRS.