
Automating Data Science: Prospects and Challenges

Published by: Chris Williams
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process.

Key insights:
* Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
* Important parts of data science are already being automated, especially in the modeling stages, where techniques such as automated machine learning (AutoML) are gaining traction.
* Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.




Read also

There is growing interest in the use of Knowledge Graphs (KGs) for the representation, exchange, and reuse of scientific data. While KGs offer the prospect of improving the infrastructure for working with scalable and reusable scholarly data consistent with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles, the state-of-the-art Data Management Systems (DMSs) for processing large KGs leave somewhat to be desired. In this paper, we studied the performance of some of the major DMSs in the context of querying KGs with the goal of providing a fine-grained, comparative analysis of DMSs representing each of the four major DMS types. We experimented with four well-known scientific KGs, namely, Allie, Cellcycle, DrugBank, and LinkedSPL against Virtuoso, Blazegraph, RDF-3X, and MongoDB as the representative DMSs. Our results suggest that the DMSs display limitations in processing complex queries on the KG datasets. Depending on the query type, the performance differentials can be several orders of magnitude. Also, no single DMS appears to offer consistently superior performance. We present an analysis of the underlying issues and outline two integrated approaches and proposals for resolving the problem.
While manufacturers have been generating highly distributed data from various systems, devices and applications, a number of challenges in both data management and data analysis require new approaches to support the big data era. The central challenge for industrial big data analytics is real-time analysis and decision-making from massive heterogeneous data sources in the manufacturing space. This survey presents new concepts, methodologies, and application scenarios of industrial big data analytics, which can provide dramatic improvements in solving velocity and veracity problems. We focus on five important methodologies of industrial big data analytics: 1) Highly distributed industrial data ingestion: access and integrate highly distributed data sources from various systems, devices and applications; 2) Industrial big data repository: cope with sampling biases and heterogeneity, and store different data formats and structures; 3) Large-scale industrial data management: organize massive heterogeneous data and share large-scale data; 4) Industrial data analytics: track data provenance, from data generation through data preparation; 5) Industrial data governance: ensure data trust, integrity and security. For each phase, we introduce current research in industry and academia, and discuss challenges and potential solutions. We also examine the typical applications of industrial big data, including smart factory visibility, machine fleet, energy management, proactive maintenance, and just-in-time supply chain. These discussions aim to clarify the value of industrial big data. Lastly, the survey concludes with a discussion of open problems and future directions.
In time-domain astronomy, we need to use a relational database to manage star catalog data. With the development of sky survey technology, star catalog data is growing in size and being generated faster. In this paper, we therefore give a systematic and comprehensive introduction to processing data in time-domain astronomy, and detail valuable research questions. Then, we list candidate systems commonly used in astronomy and point out the advantages and disadvantages of these systems. In addition, we present the key techniques needed to deal with astronomical data. Finally, we summarize the challenges faced by the design of our database prototype.
This paper proposes a composable Just in Time Architecture for Data Science (DS) Pipelines named JITA-4DS and associated resource management techniques for configuring disaggregated data centers (DCs). DCs under our approach are composable based on vertical integration of the application, middleware/operating system, and hardware layers customized dynamically to meet application Service Level Objectives (SLO - application-aware management). Thereby, pipelines utilize a set of flexible building blocks that can be dynamically and automatically assembled and re-assembled to meet the dynamic changes in the workloads' SLOs. To assess disaggregated DCs, we study how to model and validate their performance in large-scale settings.
A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are such examples. The pressing need to turn such data into useful information and knowledge drives the development of systems, algorithms and frameworks that address streaming challenges. The storage, querying and mining of such data sets are highly computationally challenging tasks. Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from continuous streams of information. Generally, the two main challenges are designing fast mining methods for data streams and promptly detecting changing concepts and data distributions, given the highly dynamic nature of data streams. The goal of this article is to analyze and classify the application of diverse data mining techniques to the different challenges of data stream mining. In this paper, we present the theoretical foundations of data stream analysis and propose an analytical framework for data stream mining techniques.
