
Preprocessing: A Prerequisite for Discovering Patterns in WUM Process


Published by Ramya C

Publication date: 2011

Research field: Informatics Engineering

Paper language: English

Authors: C. Ramya, K S Shreedhara, G Kavitha

Databases


Abstract

Web log data is usually diverse and voluminous. Before it can be used for pattern discovery, this data must be assembled into a consistent, integrated and comprehensive view. Without properly cleaning, transforming and structuring the data prior to analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering redundant and irrelevant data, removing noise, and transforming the data and resolving any inconsistencies. This paper proposes a complete preprocessing methodology, comprising merging, data cleaning, user/session identification, and data formatting and summarization, that improves the quality of the data by reducing its quantity. To validate the efficiency of the proposed methodology, several experiments were conducted; the results show that it reduces the size of Web access log files down to 73-82% of the initial size and yields richer logs that are structured for the further stages of Web Usage Mining (WUM). Preprocessing of raw data in the WUM process is therefore the central theme of this paper.
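The cleaning and user/session identification steps named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the Combined Log Format input, the list of filtered file extensions and robot markers, the (IP address, user agent) user key, and the 30-minute session timeout are all assumptions chosen for illustration (the timeout is a common heuristic in the WUM literature). Passing lines from several server logs at once and sorting them by timestamp stands in for the merging step; data formatting and summarization are not shown.

```python
import re
from datetime import datetime, timedelta

# Assumption: input lines follow the Apache/NCSA Combined Log Format.
# The log format and filtering rules used in the paper may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning rules: embedded resources and crawler traffic
# say nothing about how human users navigate the site.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")

# Assumed session timeout (a common WUM heuristic, not taken from the paper).
SESSION_TIMEOUT = timedelta(minutes=30)


def parse(line):
    """Return a dict of fields for a well-formed log line, or None for noise."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def is_relevant(rec):
    """Data cleaning: keep successful GET page requests from non-robot agents."""
    if rec["method"] != "GET" or not rec["status"].startswith("2"):
        return False
    path = rec["url"].lower().split("?")[0]
    if path.endswith(IRRELEVANT_SUFFIXES):
        return False
    agent = rec["agent"].lower()
    return not any(marker in agent for marker in ROBOT_MARKERS)


def sessionize(lines):
    """Merge, clean, identify users by (IP, agent), and split sessions on idle gaps.

    Returns a dict mapping each user key to a list of sessions,
    where each session is the list of requested URLs in time order.
    """
    records = [r for r in map(parse, lines) if r is not None and is_relevant(r)]
    records.sort(key=lambda r: r["time"])  # merging lines from several logs
    sessions = {}
    last_seen = {}
    for rec in records:
        user = (rec["ip"], rec["agent"])
        if user not in last_seen or rec["time"] - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])  # start a new session
        sessions[user][-1].append(rec["url"])
        last_seen[user] = rec["time"]
    return sessions
```

For example, a browser whose two page requests are 44 minutes apart ends up with two sessions, while its image requests and any crawler hits are dropped during cleaning. Keying users on IP address plus user agent is a cheap approximation; proxies and shared machines can still merge distinct users.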

Read also

Discovering High Utility-Occupancy Patterns from Uncertain Data

Chien-Ming Chen, Lili Chen, Wensheng Gan (2020)

It is widely known that big data hides a great deal of useful information, which has led to the saying that data is money. It is therefore common to mine crucial information for use in many real-world applications. In the past, studies have considered only frequency. Unfortunately, doing so neglects other aspects, such as utility, interest, or risk. It is thus sensible to discover high-utility itemsets (HUIs) in transaction databases using not only the quantity but also the predefined utility. To find patterns that can represent their supporting transactions, a recent study mined high utility-occupancy patterns whose contribution to the utility of the entire transaction is greater than a certain value. Moreover, in realistic applications, the presence of a pattern in a transaction may be uncertain and associated with an existence probability. In this paper, a novel algorithm called High-Utility-Occupancy Pattern Mining in Uncertain databases (UHUOPM) is proposed. The patterns found by the algorithm are called Potential High Utility Occupancy Patterns (PHUOPs). The algorithm divides user preferences into three factors: support, probability, and utility occupancy. To reduce memory cost and running time and to prune the search space, the algorithm uses a probability-utility-occupancy list (PUO-list) and a probability-frequency-utility table (PFU-table), which help provide the downward closure property. Furthermore, an original tree structure, called the support count tree (SC-tree), is constructed as the search space of the algorithm. Finally, substantial experiments were conducted on both real-life and synthetic datasets to evaluate the effectiveness and efficiency of the proposed UHUOPM algorithm.

Databases

An Efficient Preprocessing Methodology for Discovering Patterns and Clustering of Web Users using a Dynamic ART1 Neural Network

C. Ramya, , G. Kavitha (2011)

In this paper, a complete preprocessing methodology for discovering patterns in the web usage mining process is proposed, which improves the quality of the data by reducing its quantity. A dynamic ART1 neural network clustering algorithm with a neat architecture is also proposed to group users according to their Web access patterns. Several experiments were conducted, and the results show that the proposed methodology reduces the size of Web log files down to 73-82% of the initial size and that the proposed ART1 algorithm is dynamic and learns relatively stable, good-quality clusters.

Neural and Evolutionary Computing

Discovering Sequential Patterns in a UK General Practice Database

Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin (2013)

The wealth of computerised medical information becoming readily available presents the opportunity to examine patterns of illnesses, therapies and responses. These patterns may be able to predict illnesses that a patient is likely to develop, allowing preventative actions to be taken. In this paper, sequential rule mining is applied to a General Practice database to find rules involving a patient's age, gender and medical history. By incorporating these rules into current health care, a patient can be flagged as susceptible to a future illness based on past or current illnesses, gender and year of birth. This knowledge has the potential to greatly improve health care and reduce health-care costs.

Machine Learning; Computational Engineering, Finance, and Science; Statistics Applications

Constant delay enumeration with FPT-preprocessing for conjunctive queries of bounded submodular width

Christoph Berkholz, Nicole Schweikardt (2020)

Marx (STOC 2010, J. ACM 2013) introduced the notion of submodular width of a conjunctive query (CQ) and showed that for any class $\Phi$ of Boolean CQs of bounded submodular width, the model-checking problem for $\Phi$ on the class of all finite structures is fixed-parameter tractable (FPT). Note that for non-Boolean queries, the size of the query result may be far too large to be computed entirely within FPT time. We investigate the free-connex variant of submodular width and generalise Marx's result to non-Boolean queries as follows: for every class $\Phi$ of CQs of bounded free-connex submodular width, within FPT-preprocessing time we can build a data structure that allows us to enumerate, without repetition and with constant delay, all tuples of the query result. Our proof builds upon Marx's splitting routine to decompose the query result into a union of results, but we have to tackle the additional technical difficulty of ensuring that these can be enumerated efficiently.

Databases; Logic in Computer Science
