Data Science through the looking glass and what we found there

Published by Subru Krishnan

Publication date: 2019

Research field: Informatics Engineering

Research language: English

Authors: Fotis Psallidas, Yiwen Zhu, Bojan Karlas

Machine Learning; Distributed, Parallel, and Cluster Computing

Abstract

The recent success of machine learning (ML) has led to explosive growth both in new systems and algorithms built in industry and academia and in new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens by performing the largest analysis of DS projects to date, focusing on questions that can help determine investments on either side. Specifically, we download and analyze: (a) over 6M Python notebooks publicly available on GitHub, (b) over 2M enterprise DS pipelines developed within COMPANYX, and (c) the source code and metadata of over 900 releases from 12 important DS libraries. The analysis we perform ranges from coarse-grained statistical characterizations to analysis of library imports, pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret, and dare to draw a few (actionable, yet subjective) conclusions on (a) what systems builders should focus on to better serve practitioners, and (b) what technologies practitioners should bet on given current trends. We plan to automate this analysis and release associated tools and results periodically.
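
To make the "analysis of library imports" mentioned in the abstract concrete, here is a minimal Python sketch that counts which top-level packages the code cells of one notebook import. It is an illustration only, not the authors' tooling: the function names and the `SAMPLE` notebook are hypothetical, and it assumes the standard notebook JSON layout (a `cells` list whose entries carry `cell_type` and `source`).

```python
import ast
import json
from collections import Counter


def imported_packages(source: str) -> set:
    """Return the top-level package names imported by one code cell."""
    # Drop IPython magics and shell escapes, which are not valid Python.
    lines = [ln for ln in source.splitlines()
             if not ln.lstrip().startswith(("%", "!"))]
    try:
        tree = ast.parse("\n".join(lines))
    except SyntaxError:
        return set()  # skip cells that still fail to parse
    packages = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 keeps absolute imports only (no "from . import x")
            packages.add(node.module.split(".")[0])
    return packages


def count_notebook_imports(notebook_json: str) -> Counter:
    """Count, per package, how many code cells of a notebook import it."""
    notebook = json.loads(notebook_json)
    counts = Counter()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):  # source may be stored as a list of lines
            src = "".join(src)
        counts.update(imported_packages(src))
    return counts


# Hypothetical notebook, made up for this example.
SAMPLE = json.dumps({
    "cells": [
        {"cell_type": "code",
         "source": ["%matplotlib inline\n", "import numpy as np\n",
                    "from sklearn.linear_model import LinearRegression\n"]},
        {"cell_type": "markdown", "source": ["# Notes"]},
        {"cell_type": "code", "source": "import numpy as np\nimport pandas as pd"},
    ]
})

if __name__ == "__main__":
    print(count_notebook_imports(SAMPLE).most_common())
```

Summing such counters over many notebooks gives a simple per-library usage ranking. A study at the scale described in the abstract would also need deduplication and handling of non-Python notebooks, which this sketch leaves out.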

Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas (2020)

There has recently been a lot of research in the areas of fairness, bias, and explainability of machine learning (ML) models, driven by the self-evident or regulatory requirements of various ML applications. We make the following observation: all of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce the ML provenance tracking problem: the fundamental idea is to automatically track which columns in a dataset have been used to derive the features/labels of an ML model. We discuss the challenges in capturing such information in the context of Python, the most common language used by data scientists. We then present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the user's code. Using 26K real data science scripts, we verify the effectiveness of Vamsa in terms of coverage and performance. We also evaluate Vamsa's accuracy on a smaller subset of manually labeled data. Our analysis shows that Vamsa's precision and recall range from 90.4% to 99.1% and that its latency is on the order of milliseconds for average-size scripts. Drawing from our experience in deploying ML models in production, we also present an example in which Vamsa helps automatically identify models that are affected by data corruption issues.
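
The idea of tracking which dataset columns a script touches can be illustrated with a deliberately naive Python sketch. This is not Vamsa and does not reproduce its method: it is a simple heuristic that finds variables assigned from a `read_csv` call and collects the string keys used to subscript them. The function name and the sample script are hypothetical, and the code assumes Python 3.9 or later, where `ast.Subscript.slice` is a plain expression node.

```python
import ast


def columns_referenced(script: str) -> dict:
    """Map each variable loaded via *.read_csv(...) to the column
    names (string constants) used to subscript it in the script."""
    tree = ast.parse(script)

    # Pass 1: find variables assigned directly from a read_csv call.
    frames = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Attribute)
                and node.value.func.attr == "read_csv"):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    frames[target.id] = set()

    # Pass 2: collect string keys in subscripts of those variables,
    # covering both df["a"] and df[["a", "b"]] (reads and writes alike).
    for node in ast.walk(tree):
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id in frames):
            for part in ast.walk(node.slice):
                if isinstance(part, ast.Constant) and isinstance(part.value, str):
                    frames[node.value.id].add(part.value)

    return {name: sorted(cols) for name, cols in frames.items()}


# Hypothetical script, made up for this example; it is parsed, never run.
SAMPLE_SCRIPT = '''
import pandas as pd
df = pd.read_csv("heart.csv")
X = df[["age", "chol"]]
y = df["target"]
model.fit(X, y)
'''

if __name__ == "__main__":
    print(columns_referenced(SAMPLE_SCRIPT))
```

The sketch reports which columns are referenced but not whether they end up as features or labels. Making that distinction requires following the data flow into calls such as `fit`, which is the harder part of the problem the abstract describes.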

Machine Learning; Distributed, Parallel, and Cluster Computing

With Wronskian through the Looking Glass

Vassily Gorbounov, Vadim Schechtman (2020)

In their 2002 work, Mukhin and Varchenko introduced a Wronskian map from the variety of full flags in a finite-dimensional vector space into a product of projective spaces. We establish a precise relationship between this map and the Plücker map. This allows us to recover the result of Varchenko and Wright saying that the polynomials appearing in the image of the Wronsky map are the initial values of the tau-functions for the Kadomtsev-Petviashvili hierarchy.

Representation Theory

Neutrino oscillation probabilities through the looking glass

Gabriela Barenboim, Peter B. Denton, Stephen J. Parke (2019)

In this paper we review different expansions for neutrino oscillation probabilities in matter in the context of long-baseline neutrino experiments. We examine the accuracy and computational efficiency of different exact and approximate expressions. We find that many of the expressions used in the literature are not precise enough for the next generation of long-baseline experiments, but several of them are, while maintaining comparable simplicity. The results of this paper can be used as guidance to both phenomenologists and experimentalists when implementing the various oscillation expressions into their analysis tools.

High Energy Physics - Phenomenology

Cubic Equations Through the Looking Glass of Sylvester

William Y. C. Chen (2021)

One can hardly believe that there is still something to be said about cubic equations. To dodge this doubt, we will instead try and say something about Sylvester. He doubtless found a way to solve cubic equations. As mentioned by Rota, it was the only method in this vein that he could remember. We realize that Sylvester's magnificent approach for reduced cubic equations boils down to an easy identity.

History of Mathematics; Combinatorics

The ALMA Early Science view of FUor/EXor objects. I. Through the looking-glass of V2775 Ori

Alice Zurlo, Lucas A. Cieza, Jonathan P. Williams (2016)

As part of an ALMA survey to study the origin of episodic accretion in young eruptive variables, we have observed the circumstellar environment of the star V2775 Ori. This object is a very young, pre-main sequence object which displays a large-amplitude outburst characteristic of the FUor class. We present Cycle-2 band 6 observations of V2775 Ori with a continuum and CO (2-1) isotopologue resolution of 0.25 arcsec (103 au). We report the detection of a marginally resolved circumstellar disc in the ALMA continuum with an integrated flux of $106 \pm 2$ mJy, a characteristic radius of $\sim 30$ au, and an inclination of $14.0^{+7.8}_{-14.5}$ deg; the disc is oriented nearly face-on with respect to the plane of the sky. The CO emission is separated into distinct blue- and red-shifted regions that appear to be rings or shells of expanding material from quasi-episodic outbursts. The system is oriented in such a way that the disc is seen through the outflow remnant of V2775 Ori, which has an axis along our line of sight. The $^{13}$CO emission displays similar structure to that of the CO, while the C$^{18}$O line emission is very weak. We calculated the expansion velocities of the low- and medium-density material with respect to the disc to be -2.85 km s$^{-1}$ (blue), 4.4 km s$^{-1}$ (red) and -1.35 and 1.15 km s$^{-1}$ (for blue and red), and we derived the mass, momentum and kinetic energy of the expanding gas. The outflow has an hourglass shape where the cavities are not seen. We interpret the shapes that the gas traces as cavities excavated by an ancient outflow. We report a detection of line emission from the circumstellar disc and derive a lower limit of the gas mass of 3 $M_{\rm Jup}$.

Solar and Stellar Astrophysics; Astrophysics of Galaxies; Instrumentation and Methods for Astrophysics