
Probabilistic Skyline Query Processing over Uncertain Data Streams in Edge Computing Environments

Published by Chuan-Chi Lai
Publication date: 2020
Research field: Informatics Engineering
Language: English





Data is being generated at an ever faster rate, and the volume that applications must process has become extremely large, so more effort is needed to analyze data and extract valuable information. Cloud computing used to be a good fit for large-scale data analysis. However, now that the Internet of Things (IoT) is widespread, transmitting sensing data back to the cloud for centralized analysis incurs high wireless communication and network transmission costs. Edge computing has become a promising solution to this problem. In this paper, we propose a new algorithm for processing probabilistic skyline queries over uncertain data streams in an edge computing environment. We use the concept of a second skyline set to filter out data that is unlikely to appear in the skyline result. In addition, the edge server sends only the information needed to update the global analysis results on the cloud server, which greatly reduces the amount of data transmitted over the network. The results show that the proposed method not only reduces response time by more than 50% compared with the brute-force method on two-dimensional data but also maintains the leading processing speed on high-dimensional data.
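To make the notion of a probabilistic skyline concrete, here is a minimal brute-force sketch in Python. It assumes a commonly used discrete model that the abstract does not spell out: each uncertain object is a set of instances with existence probabilities, objects are independent, and smaller values are better in every dimension. The names `Instance`, `skyline_probability`, and `probabilistic_skyline` are hypothetical, and the sketch does not implement the second-skyline filtering or the edge-to-cloud update scheme described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    values: tuple  # one possible reading of the object (smaller is better; an assumption)
    prob: float    # probability that the object takes this reading


def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(obj, others):
    """Probability that `obj` is not dominated by any other object.

    For each instance of `obj`, multiply the probabilities that each other
    object (assumed independent) does not dominate it, then weight by the
    instance's own probability.
    """
    total = 0.0
    for inst in obj:
        not_dominated = 1.0
        for other in others:
            p_dom = sum(o.prob for o in other if dominates(o.values, inst.values))
            not_dominated *= 1.0 - p_dom
        total += inst.prob * not_dominated
    return total


def probabilistic_skyline(objects, threshold):
    """Return {name: probability} for objects whose skyline probability >= threshold."""
    result = {}
    for name, obj in objects.items():
        others = [o for n, o in objects.items() if n != name]
        p = skyline_probability(obj, others)
        if p >= threshold:
            result[name] = p
    return result


# Illustrative data only; not taken from the paper.
demo = {
    "A": [Instance((1, 5), 0.5), Instance((4, 4), 0.5)],
    "B": [Instance((2, 2), 1.0)],
    "C": [Instance((3, 3), 1.0)],
}
print(probabilistic_skyline(demo, threshold=0.5))
```

This pairwise comparison is quadratic in the number of objects, which is the kind of cost a brute-force baseline pays on every window update. A filtering step such as the one the abstract describes would aim to skip objects that cannot reach the threshold.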




Read also

Extracting valuable features and information from Big Data has become one of the important research issues in Data Science. In most Internet of Things (IoT) applications, the collected data are uncertain and imprecise due to sensor device variations or transmission errors. In addition, the sensing data may change as time evolves. We refer to a dataset that simultaneously has velocity, veracity, and volume properties as an uncertain data stream. This paper exploits the parallelism in edge computing environments to facilitate top-k dominating query processing over multiple uncertain IoT data streams. The challenges of this problem include how to quickly update the result under uncertainty and how to reduce the computation cost while still providing highly accurate results. Building on existing work for certain data, we provide an effective probabilistic top-k dominating query process on uncertain data streams, which can be parallelized easily. After discussing the properties of the proposed approach, we validate our methods through complexity analysis and extensive simulated experiments. In comparison with existing works, the experimental results indicate that our method can reduce computation time by almost 60%, reduce communication cost between servers by nearly 20%, and provide highly accurate results in most scenarios.
A Range-Skyline Query (RSQ) is the combination of a range query and a skyline query. It is one of the practical query types in multi-criteria decision services; it may include both spatial and non-spatial information and makes the resulting information more useful than a plain skyline search when location is a concern. A Continuous Range-Skyline Query (CRSQ) is an extension of the RSQ in which the system continuously reports the skyline results for a query within a given search range. This work focuses on the RSQ and CRSQ within a specific range in Internet of Mobile Things (IoMT) applications. Many server-client approaches for CRSQ have been proposed, but they are sensitive to the number of moving objects. We propose an effective and non-centralized approach, the Distributed Continuous Range-Skyline Query process (DCRSQ process), for supporting RSQ and CRSQ in mobile environments. By considering mobility, the proposed approach can predict the time when an object falls within the query range and ignore more irrelevant information when deriving the results, thus reducing the computation overhead. The DCRSQ process is analyzed in terms of cost and validated with extensive simulated experiments. The results show that the DCRSQ process outperforms the existing approaches in different scenarios and aspects.
Graph feature extraction is a fundamental task in graph analytics. Using feature vectors (graph descriptors) in tandem with data mining algorithms that operate on Euclidean data, one can solve problems such as classification, clustering, and anomaly detection on graph-structured data. This idea has proved fruitful in the past, with spectral-based graph descriptors providing state-of-the-art classification accuracy on benchmark datasets. However, these algorithms do not scale to large graphs since: 1) they require storing the entire graph in memory, and 2) the end-user has no control over the algorithm's runtime. In this paper, we present single-pass streaming algorithms to approximate structural features of graphs (counts of subgraphs of order $k \geq 4$). Operating on edge streams allows us to avoid keeping the entire graph in memory, and controlling the sample size enables us to control the time taken by the algorithm. We demonstrate the efficacy of our descriptors by analyzing the approximation error, classification accuracy, and scalability to massive graphs. Our experiments showcase the effect of the sample size on approximation error and predictive accuracy. The proposed descriptors are applicable to graphs with millions of edges within minutes and outperform the state-of-the-art descriptors in classification accuracy.
A.V. Vaniachine (2013)
The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives exploration in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data processing task into jobs enabled fine-granularity checkpointing, analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic retries to achieve reliable six sigma production quality in petascale data processing on the Grid. The computing experience of the ATLAS and CMS experiments provides a foundation for reliability engineering that scales up Grid technologies for data processing beyond the petascale.
Data-intensive applications on clusters often require that requests be sent quickly to the node managing the desired data. In many applications, one must look through a sorted tree structure to determine the node responsible for accessing or storing the data. Examples include object tracking in sensor networks, packet routing over the internet, request processing in publish-subscribe middleware, and query processing in database systems. When the tree structure is larger than the CPU cache, the standard implementation potentially incurs many cache misses for each lookup: one cache miss at each successive level of the tree. As the CPU-RAM gap grows, this performance degradation will only become worse. We propose a solution that takes advantage of the growing speed of local area networks for clusters. We split the sorted tree structure among the nodes of the cluster. We assume that the structure will fit inside the aggregate of the CPU caches of the entire cluster. We then send a word over the network (as part of a larger packet containing other words) in order to examine the tree structure in another node's CPU cache. We show that this is often faster than the standard solution, which locally incurs multiple cache misses while accessing each successive level of the tree.