An Unified Definition of Data Mining

Published by Christoph Schommer

Publication date: 2008

Research field: Informatics Engineering

Language: English

Author: Christoph Schommer

Symbolic Computation, Computers and Society

Abstract

Theoretical concepts of Data Mining have been developed and improved for many years. Data Mining has been applied in many academic and industrial settings, and recently public opinion about privacy has been surveyed. However, a consistent and standardized definition is still missing, and the initial explanation given by Frawley et al. has often been changed pragmatically over the years. Furthermore, alternative terms such as Knowledge Discovery have been coined, and attempts have been made to persuade users that a Data Warehouse is necessary. In this work, we review current definitions and introduce a unified definition that covers the existing attempted explanations. For this, we draw on the natural model of the chemical states of aggregation.

Nicolas Tempelmeier, Udo Feuerhake, Oskar Wage (2021)

The discovery of spatio-temporal dependencies within urban road networks that cause Recurrent Congestion (RC) patterns is crucial for numerous real-world applications, including urban planning and scheduling of public transportation services. While most existing studies investigate temporal patterns of RC phenomena, the influence of the road network topology on RC is often overlooked. This article proposes the ST-Discovery algorithm, a novel unsupervised spatio-temporal data mining algorithm that facilitates the effective data-driven discovery of RC dependencies induced by the road network topology using real-world traffic data. We factor out regularly reoccurring traffic phenomena, such as rush hours, mainly induced by the daytime, by modelling and systematically exploiting temporal traffic load outliers. We present an algorithm that first constructs connected subgraphs of the road network based on the traffic speed outliers. Second, the algorithm identifies pairs of subgraphs that indicate spatio-temporal correlations in their traffic load behaviour to identify topological dependencies within the road network. Finally, we rank the identified subgraph pairs based on the dependency score determined by our algorithm. Our experimental results demonstrate that ST-Discovery can effectively reveal topological dependencies in urban road networks.

Machine Learning, Computers and Society

An Exploratory Study on Utilising the Web of Linked Data for Product Data Mining

Ziqi Zhang, Xingyi Song (2021)

The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kind of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads to create multi-million-word product-related corpora that are later used in three different ways for creating language resources: training word embedding models, continued pre-training of BERT-like language models, and training Machine Translation models that are used as a proxy to generate product-related keywords. Our evaluation on an extensive set of benchmarks shows word embeddings to be the most reliable and consistent method to improve the accuracy on both tasks (with up to 6.9 percentage points in macro-average F1 on some datasets). The other two methods, however, are not as useful. Our analysis shows that this could be due to a number of reasons, including the biased domain representation in the structured data and lack of vocabulary coverage. We share our datasets and discuss how our lessons learned could be taken forward to inform future research in this direction.

Computation and Language, Artificial Intelligence

How the Taiwanese Do China Studies: Applications of Text Mining

Hsuan-Lei Shao, Sieh-Chuen Huang, Yun-Cheng Tsai (2018)

With the rapid evolution of the cross-strait situation, Mainland China as a subject of social science study has recently evoked calls for Rethinking China Study among the intelligentsia. This essay applies an automatic content analysis tool (CATAR) to the journal Mainland China Studies (1998-2015) in order to observe research trends based on clustering the text of the title and abstract of each paper in the journal. The results show that the 473 articles published by the journal were clustered into seven salient topics. Judging by the number of publications on each topic over time (both the volume and the percentage of publications), the journal has two major topics, while the other topics varied widely over time. The contributions of this study are: 1. Each independent study could be grouped into a meaningful topic, as a small-scale experiment verified that this topic clustering is feasible. 2. This essay reveals the salient research topics and their trends for the Taiwan journal Mainland China Studies. 3. Various topical keywords were identified, providing easy access to past studies. 4. The yearly trends of the identified topics could be viewed as a signature of future research directions.

Digital Libraries, Computers and Society

HMC, an Algorithms in Data Mining, the Functional Analysis approach

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki (2021)

The main purpose of this paper is to facilitate communication between the Analytic, Probabilistic and Algorithmic communities. We present a proof of convergence of the Hamiltonian (Hybrid) Monte Carlo algorithm from the point of view of Dynamical Systems, where the evolving objects are densities of probability distributions and the tools are derived from Functional Analysis.

Computation, Machine Learning, Dynamical Systems

Data Mining on Crash Simulation Data

A. Kuhlmann, R.-M. Vetter, Ch. Luebbing (2005)

The work presented in this paper is part of the cooperative research project AUTO-OPT carried out by twelve partners from the automotive industries. One major work package concerns the application of data mining methods in the area of automotive design. Suitable methods for data preparation and data analysis are developed. The objective of the work is the re-use of data stored in the crash-simulation department at BMW in order to gain deeper insight into the interrelations between the geometric variations of the car during its design and its performance in crash testing. In this paper a method for data analysis of finite element models and results from crash simulation is proposed and application to recent data from the industrial partner BMW is demonstrated. All necessary steps from data pre-processing to re-integration into the working environment of the engineer are covered.

Information Retrieval; Computational Engineering, Finance, and Science
