ﻻ يوجد ملخص باللغة العربية
With the era of big data, an explosive amount of information is now available. This enormous increase of Big Data in both academia and industry requires large-scale data processing systems. A large body of research is behind optimizing Sparks performance to make it state of the art, a fast and general data processing system. Many science and engineering fields have advanced with Big Data analytics, such as Biology, finance, and transportation. Intelligent transportation systems (ITS) gain popularity and direct benefit from the richness of information. The objective is to improve the safety and management of transportation networks by reducing congestion and incidents. The first step toward the goal is better understanding, modeling, and detecting congestion across a network efficiently and effectively. In this study, we introduce an efficient congestion detection model. The underlying network consists of 3017 segments in I-35, I-80, I-29, and I-380 freeways with an overall length of 1570 miles and averaged (0.4-0.6) miles per segment. The result of congestion detection shows the proposed method is 90% accurate while has reduced computation time by 99.88%.
Experimental Particle Physics has been at the forefront of analyzing the worlds largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and syst
Identifying the causal relationships between subjects or variables remains an important problem across various scientific fields. This is particularly important but challenging in complex systems, such as those involving human behavior, sociotechnica
Distributed data processing ecosystems are widespread and their components are highly specialized, such that efficient interoperability is urgent. Recently, Apache Arrow was chosen by the community to serve as a format mediator, providing efficient i
With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analysis Big Data. Arguably, Spark is state of the art in large-scale data computing systems nowadays, due to its
With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasingly attention from both academia and industry. This paper presents a Parallel Random Forest (PRF) a