
A Ring Model for Data Anomalies

Added by Yearn Li
Publication date: 2021
Language: English





A distributed system maintains consistency by disallowing data anomalies. However, especially in databases, the definitions of data anomalies in the current ANSI standard are controversial: the standard neither covers all anomalies nor characterizes them. First, the definitions lack a mathematical formalization, which leads to ambiguous interpretations. Second, the anomalies are defined case by case, which precludes a comprehensive understanding of data anomalies. In this paper, we propose a ring-based anomaly detection method (the bingo model) for distributed systems and apply it to databases. The bingo model introduces an anomaly construction and gives a base method for formalizing anomalies. Based on these anomalies, we propose consistency levels. We prove that the simplified anomaly rings in the model classify anomalies and yield independent consistency levels. We specialize the bingo model to databases and find 22 anomalies in addition to the existing ones.
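The abstract does not spell out the ring construction, but a closely related, standard idea from serializability theory is that an anomaly manifests as a cycle ("ring") in the conflict graph of transactions. The sketch below is illustrative only, with hypothetical transactions, and is not the paper's bingo model:

```python
# Toy sketch: detect a "ring" (cycle) of conflicting transactions.
# The conflict graph and transaction names are illustrative, not from the paper.

def has_cycle(graph):
    """Return True if the directed conflict graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph.get(node, ()):
            if color.get(succ, WHITE) == GRAY:
                return True          # back edge -> a cycle, i.e. an anomaly ring
            if color.get(succ, WHITE) == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in graph if color[n] == WHITE)

# T1 depends on T2's write and T2 depends on T1's write: a two-transaction ring.
conflicts = {"T1": ["T2"], "T2": ["T1"]}
print(has_cycle(conflicts))  # True: the schedule is not serializable
```

In this framing, different edge types (write-read, read-write, write-write) give rise to different anomalies, which is one way case-by-case definitions can be unified.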




Read More

Cloud computing refers to maximizing efficiency by sharing computational and storage resources, while data-parallel systems exploit the resources available in the cloud to perform parallel transformations over large amounts of data. Along the same lines, considerable emphasis has recently been given to two apparently disjoint research topics: data-parallel, and eventually consistent, distributed systems. Declarative networking has been recently proposed to ease the task of programming in the cloud, by allowing the programmer to express only the desired result and leave the implementation details to the run-time system. In this context, we propose a study of a logic-programming-based computational model for eventually consistent, data-parallel systems, the keystone of which is the recent finding that the class of programs that can be computed in an eventually consistent, coordination-free way is that of monotonic programs. This principle is called CALM and has been proven by Ameloot et al. for distributed, asynchronous settings. We advocate that CALM should be employed as a basic theoretical tool also for data-parallel systems, wherein computation usually proceeds synchronously in rounds and communication is assumed to be reliable. Coordination-freedom is generally regarded as a major discriminating factor. In this work we make the case that the current form of CALM does not hold in general for data-parallel systems, and show how, using novel techniques, the satisfiability of the CALM principle can still be obtained, although only for the subclass of programs called connected monotonic queries. We complete the study with considerations on the relationship between our model and the one employed by Ameloot et al., showing that our techniques subsume the latter when the synchronization constraints imposed on the system are loosened.
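The monotonic programs at the heart of CALM are those whose output only ever grows as more input arrives, so partial results are always a safe lower bound and no coordination is needed. A classic example is transitive closure, sketched below as a naive fixpoint (the edges are made up for illustration):

```python
# Monotone fixpoint sketch: transitive closure of a directed edge set.
# The derived set only grows, so this is the kind of program CALM classifies
# as computable coordination-free. Naive evaluation, for illustration only.

def transitive_closure(edges):
    """Return the smallest superset of `edges` closed under composition."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))   # new fact; never retracts old ones
                    changed = True
    return closure

edges = {(1, 2), (2, 3)}
print(sorted(transitive_closure(edges)))  # [(1, 2), (1, 3), (2, 3)]
```

A non-monotonic query (e.g., one with negation, such as "pairs NOT connected") cannot be computed this way, because a fact emitted early could later be invalidated; that is exactly where coordination becomes necessary.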
As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all the way back to the data. We focus on the particular problem of slicing data to identify subsets of the validation data on which the model performs poorly. This is an important problem in model validation because the overall model performance can fail to reflect that of smaller subsets, and slicing allows users to analyze model performance at a more granular level. Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to act on than arbitrary subsets) that are both problematic and large. We propose Slice Finder, an interactive framework for identifying such slices using statistical techniques. Applications include diagnosing model fairness and fraud detection, where identifying slices that are interpretable to humans is crucial. This research is part of a larger trend of Big Data and Artificial Intelligence (AI) integration and opens many opportunities for new research.
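The core idea of slicing can be illustrated with a few lines: group validation examples by single-feature predicates and flag slices whose error rate is markedly worse than the overall rate. This is only a sketch of the idea, not the Slice Finder implementation; the feature names, size threshold, and margin are hypothetical:

```python
# Illustrative slicing sketch (not the actual Slice Finder algorithm):
# find single-feature slices whose error rate exceeds the overall rate.
from collections import defaultdict

def problematic_slices(rows, errors, min_size=2, margin=0.1):
    """rows: list of feature dicts; errors: parallel list of 0/1 mistakes."""
    overall = sum(errors) / len(errors)
    groups = defaultdict(list)
    for row, err in zip(rows, errors):
        for feature, value in row.items():
            groups[(feature, value)].append(err)     # one slice per predicate
    return [
        (key, sum(errs) / len(errs))
        for key, errs in groups.items()
        if len(errs) >= min_size and sum(errs) / len(errs) > overall + margin
    ]

rows = [{"country": "A"}, {"country": "A"}, {"country": "B"}, {"country": "B"}]
errors = [1, 1, 0, 0]   # the model fails on every country-A example
print(problematic_slices(rows, errors))  # [(('country', 'A'), 1.0)]
```

A slice like `country = A` is immediately interpretable and actionable (e.g., collect more country-A data), which is the advantage over arbitrary clusters.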
Data cleaning is the initial stage of any machine learning project and one of the most critical processes in data analysis: it ensures that the dataset is free of incorrect or erroneous data. It can be done manually with data-wrangling tools, or automatically with a computer program. Data cleaning entails a slew of procedures that, once completed, make the data ready for analysis. Given its significance in numerous fields, there is growing interest in the development of efficient and effective data cleaning frameworks. In this survey, some of the most recent data cleaning approaches are examined for their effectiveness, and future research directions are suggested to close the gaps in each of the methods.
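Two of the most common procedures such frameworks automate are duplicate removal and handling of missing values. A minimal sketch of both, on hypothetical records (real pipelines would also normalize formats, repair constraint violations, etc.):

```python
# Minimal data-cleaning sketch: drop exact duplicates and records with
# missing fields. The records are hypothetical examples.

def clean(records):
    """Drop exact duplicate records and records with any missing (None) field."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue                      # exact duplicate of an earlier record
        if any(v is None for v in rec.values()):
            continue                      # record has a missing value
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [{"id": 1, "age": 30}, {"id": 1, "age": 30}, {"id": 2, "age": None}]
print(clean(raw))  # [{'id': 1, 'age': 30}]
```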
Y. H. Ahn, Sin Kyu Kang (2019)
We interpret the neutrino anomalies in neutrino oscillation experiments and the high-energy neutrino events at IceCube in terms of neutrino oscillations in an extension of the standard model in which three sterile neutrinos are introduced so as to make two light neutrinos pseudo-Dirac particles and one light neutrino a Majorana particle. Our model differs from the so-called $3+n$ model with $n$ sterile neutrinos suggested to interpret short-baseline anomalies in terms of neutrino oscillations. While the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix in the $3+n$ model is simply extended to an $n\times n$ unitary matrix, the neutrino mixing matrix in our model is parameterized so as to keep the $3\times 3$ PMNS mixing matrix for the three active neutrinos unitary. There are also no flavor-changing neutral-current interactions leading to the conversion of active neutrinos into sterile ones or vice versa. We derive new forms of the neutrino oscillation probabilities containing the new interference between the active and sterile neutrinos, which is characterized by additional new parameters $\Delta m^2$ and $\theta$. Based on the new formulae derived, we show how the short-baseline neutrino anomalies can be explained in terms of oscillations, and study the implication of the high-energy neutrino events detected at IceCube for the probe of pseudo-Dirac neutrinos. New phenomenological effects attributed to the existence of the sterile neutrinos are discussed.
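The paper's new probability formulas are not reproduced in this abstract; for orientation, the textbook two-flavor vacuum oscillation probability that such formulas generalize is

```latex
P(\nu_\alpha \to \nu_\beta)
  = \sin^2 2\theta \,
    \sin^2\!\left(\frac{\Delta m^2 \, L}{4E}\right),
```

where $L$ is the baseline, $E$ the neutrino energy, $\theta$ the mixing angle, and $\Delta m^2$ the squared-mass splitting. The additional parameters $\Delta m^2$ and $\theta$ introduced in the paper enter through new interference terms of this oscillatory form, with the tiny pseudo-Dirac splittings becoming visible only at very long baselines such as those probed by IceCube.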
Peizhi Wu, Gao Cong (2021)
Cardinality estimation is a fundamental problem in database systems. To capture the rich joint data distribution of a relational table, most existing work uses either the data as unsupervised information or the query workload as supervised information. Very little work uses both, and no existing approach fully exploits both types of information to learn the joint data distribution. In this work, we aim to close the gap between data-driven and query-driven methods by proposing a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and the query workload. First, to enable using supervised query information in the deep autoregressive model, we develop differentiable progressive sampling using the Gumbel-Softmax trick. Second, UAE is able to utilize both types of information to learn the joint data distribution in a single model. Comprehensive experimental results demonstrate that UAE achieves single-digit multiplicative error at the tail, better accuracy than state-of-the-art methods, and is both space- and time-efficient.
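The Gumbel-Softmax trick the abstract relies on replaces a hard, non-differentiable categorical draw with a soft, temperature-controlled relaxation, so gradients from the query loss can flow through the sampling step. A plain-Python illustration of the sampling itself (real systems implement this in a tensor library; the logits and temperature here are arbitrary):

```python
# Sketch of the Gumbel-Softmax trick: a differentiable relaxation of
# sampling from a categorical distribution. Plain Python for illustration.
import math
import random

def gumbel_softmax(logits, temperature=0.5):
    """Return a soft probability vector approximating a one-hot sample."""
    # Gumbel(0, 1) noise: -log(-log(U)), U uniform on (0, 1)
    gumbels = [-math.log(-math.log(random.random() or 1e-12)) for _ in logits]
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
sample = gumbel_softmax([2.0, 0.5, 0.1])
print(round(sum(sample), 6))  # 1.0 -- a valid probability vector
```

As the temperature approaches zero the output approaches a hard one-hot sample; at higher temperatures it is smoother and gradients are better behaved, which is the trade-off the trick exposes.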
