A distributed system maintains consistency by disallowing data anomalies. However, particularly for databases, the definitions of data anomalies in the current ANSI standard are controversial: the standard neither covers all anomalies nor describes their characteristics. First, the definitions lack a mathematical formalization, which leads to ambiguous interpretations. Second, the anomalies are defined case by case, which prevents a comprehensive understanding of data anomalies. In this paper, we propose a ring-based anomaly detection method (the bingo model) for distributed systems and apply it to databases. The bingo model introduces a construction of anomalies and gives a method for formalizing the base anomalies. Based on these anomalies, we propose consistency levels. We prove that the simplified anomaly rings in the model classify the anomalies, which yields independent consistency levels. We specialize the bingo model to databases and find 22 anomalies in addition to the existing ones.
Cloud computing refers to maximizing efficiency by sharing computational and storage resources, while data-parallel systems exploit the resources available in the cloud to perform parallel transformations over large amounts of data. In the same line,
As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all the way to
Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done manually with d
We interpret the neutrino anomalies in neutrino oscillation experiments and the high energy neutrino events at IceCube in terms of neutrino oscillations in an extension of the standard model where three sterile neutrinos are introduced so as to make
Cardinality estimation is a fundamental problem in database systems. To capture the rich joint data distributions of a relational table, most of the existing work either uses data as unsupervised information or uses query workload as supervised infor