ﻻ يوجد ملخص باللغة العربية
IoT Big Data requires new machine learning methods able to scale to large size of data arriving at high speed. Decision trees are popular machine learning models since they are very effective, yet easy to interpret and visualize. In the literature, we can find distributed algorithms for learning decision trees, and also streaming algorithms, but not algorithms that combine both features. In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining distributed data streams, and thus able to run on real-world clusters. We run several experiments to study the accuracy and throughput performance of our new VHT algorithm, as well as its ability to scale while keeping its superior performance with respect to non-distributed decision trees.
Tables are widely used with various structures to organize and present data. Recent attempts on table understanding mainly focus on relational tables, yet overlook to other common table structures. In this paper, we propose TUTA, a unified pre-traini
Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it makes their
Recently, deep learning has been an area of intense researching. However, as a kind of computing intensive task, deep learning highly relies on the scale of GPU memory, which is usually prohibitive and scarce. Although there are some extensive works
Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies {it vertical} federated learning, which tackles the scenarios where (i
It is proved that each Hoeffding space associated with a random permutation (or, equivalently, with extractions without replacement from a finite population) carries an irreducible representation of the symmetric group, equivalent to a two-block Specht module.