ترغب بنشر مسار تعليمي؟ اضغط هنا

An Unified Definition of Data Mining

37   0   0.0 ( 0 )
 نشر من قبل Christoph Schommer
 تاريخ النشر 2008
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Since many years, theoretical concepts of Data Mining have been developed and improved. Data Mining has become applied to many academic and industrial situations, and recently, soundings of public opinion about privacy have been carried out. However, a consistent and standardized definition is still missing, and the initial explanation given by Frawley et al. has pragmatically often changed over the years. Furthermore, alternative terms like Knowledge Discovery have been conjured and forged, and a necessity of a Data Warehouse has been endeavoured to persuade the users. In this work, we pick up current definitions and introduce an unified definition that covers existing attempted explanations. For this, we appeal to the natural original of chemical states of aggregation.

قيم البحث

اقرأ أيضاً

The discovery of spatio-temporal dependencies within urban road networks that cause Recurrent Congestion (RC) patterns is crucial for numerous real-world applications, including urban planning and scheduling of public transportation services. While m ost existing studies investigate temporal patterns of RC phenomena, the influence of the road network topology on RC is often overlooked. This article proposes the ST-Discovery algorithm, a novel unsupervised spatio-temporal data mining algorithm that facilitates the effective data-driven discovery of RC dependencies induced by the road network topology using real-world traffic data. We factor out regularly reoccurring traffic phenomena, such as rush hours, mainly induced by the daytime, by modelling and systematically exploiting temporal traffic load outliers. We present an algorithm that first constructs connected subgraphs of the road network based on the traffic speed outliers. Second, the algorithm identifies pairs of subgraphs that indicate spatio-temporal correlations in their traffic load behaviour to identify topological dependencies within the road network. Finally, we rank the identified subgraph pairs based on the dependency score determined by our algorithm. Our experimental results demonstrate that ST-Discovery can effectively reveal topological dependencies in urban road networks.
74 - Ziqi Zhang , Xingyi Song 2021
The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kind of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads, to create multi-million words of product-related corpora that are later used in three different ways for creating of language resources: training word embedding models, continued pre-training of BERT-like language models, and training Machine Translation models that are used as a proxy to generate product-related keywords. Our evaluation on an extensive set of benchmarks shows word embeddings to be the most reliable and consistent method to improve the accuracy on both tasks (with up to 6.9 percentage points in macro-average F1 on some datasets). The other two methods however, are not as useful. Our analysis shows that this could be due to a number of reasons, including the biased domain representation in the structured data and lack of vocabulary coverage. We share our datasets and discuss how our lessons learned could be taken forward to inform future research in this direction.
With the rapid evolution of cross-strait situation, Mainland China as a subject of social science study has evoked the voice of Rethinking China Study among intelligentsia recently. This essay tried to apply an automatic content analysis tool (CATAR) to the journal Mainland China Studies (1998-2015) in order to observe the research trends based on the clustering of text from the title and abstract of each paper in the journal. The results showed that the 473 articles published by the journal were clustered into seven salient topics. From the publication number of each topic over time (including volume of publications, percentage of publications), there are two major topics of this journal while other topics varied over time widely. The contribution of this study includes: 1. We could group each independent study into a meaningful topic, as a small scale experiment verified that this topic clustering is feasible. 2. This essay reveals the salient research topics and their trends for the Taiwan journal Mainland China Studies. 3. Various topical keywords were identified, providing easy access to the past study. 4. The yearly trends of the identified topics could be viewed as signature of future research directions.
The main purpose of this paper is to facilitate the communication between the Analytic, Probabilistic and Algorithmic communities. We present a proof of convergence of the Hamiltonian (Hybrid) Monte Carlo algorithm from the point of view of the D ynamical Systems, where the evolving objects are densities of probability distributions and the tool are derived from the Functional Analysis.
The work presented in this paper is part of the cooperative research project AUTO-OPT carried out by twelve partners from the automotive industries. One major work package concerns the application of data mining methods in the area of automotive desi gn. Suitable methods for data preparation and data analysis are developed. The objective of the work is the re-use of data stored in the crash-simulation department at BMW in order to gain deeper insight into the interrelations between the geometric variations of the car during its design and its performance in crash testing. In this paper a method for data analysis of finite element models and results from crash simulation is proposed and application to recent data from the industrial partner BMW is demonstrated. All necessary steps from data pre-processing to re-integration into the working environment of the engineer are covered.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا