Data Science through the looking glass and what we found there

Published by Subru Krishnan
Publication date: 2019
Research field: Informatics Engineering
Paper language: English

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens, by performing the largest analysis of DS projects to date, focusing on questions that can help determine investments on either side. Specifically, we download and analyze: (a) over 6M Python notebooks publicly available on GitHub, (b) over 2M enterprise DS pipelines developed within COMPANYX, and (c) the source code and metadata of over 900 releases from 12 important DS libraries. The analysis we perform ranges from coarse-grained statistical characterizations to analysis of library imports, pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret, and dare to draw a few (actionable, yet subjective) conclusions on (a) what systems builders should focus on to better serve practitioners, and (b) what technologies practitioners should bet on given current trends. We plan to automate this analysis and release associated tools and results periodically.
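
One way to picture the "analysis of library imports" step is the minimal sketch below. It assumes notebooks in the standard nbformat JSON layout (a top-level `cells` list whose code cells carry a `source` string or list of lines) and counts, for each library, how many notebooks import it. The helper names `load_notebook`, `imports_in_notebook` and `count_library_usage` are hypothetical, and this is not the authors' pipeline, which the abstract does not describe.

```python
import ast
import json
from collections import Counter
from pathlib import Path


def load_notebook(path: str) -> dict:
    """Read an .ipynb file; notebooks are plain JSON documents."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def imports_in_notebook(nb: dict) -> set:
    """Return the top-level library names imported by the notebook's code cells."""
    libraries = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or one string
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python syntax.
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # e.g. Python 2 code; this sketch simply skips such cells
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                libraries.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Absolute "from x.y import z" counts as library x; relative imports are ignored.
                libraries.add(node.module.split(".")[0])
    return libraries


def count_library_usage(notebooks) -> Counter:
    """Count, for each library, how many notebooks import it at least once."""
    counts = Counter()
    for nb in notebooks:
        counts.update(imports_in_notebook(nb))
    return counts
```

For a local directory of notebooks, something like `count_library_usage(load_notebook(p) for p in Path("notebooks").glob("**/*.ipynb")).most_common(10)` would list the ten most imported libraries; the directory name is a placeholder. Counting each library once per notebook, rather than once per import statement, keeps a notebook that re-imports `pandas` in several cells from inflating the totals.
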

Read also

There has recently been a lot of ongoing research in the areas of fairness, bias and explainability of machine learning (ML) models due to the self-evident or regulatory requirements of various ML applications. We make the following observation: all of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce the ML provenance tracking problem: the fundamental idea is to automatically track which columns in a dataset have been used to derive the features/labels of an ML model. We discuss the challenges in capturing such information in the context of Python, the most common language used by data scientists. We then present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the user's code. Using 26K real data science scripts, we verify the effectiveness of Vamsa in terms of coverage and performance. We also evaluate Vamsa's accuracy on a smaller subset of manually labeled data. Our analysis shows that Vamsa's precision and recall range from 90.4% to 99.1% and its latency is in the order of milliseconds for average-size scripts. Drawing from our experience in deploying ML models in production, we also present an example in which Vamsa helps automatically identify models that are affected by data corruption issues.
In the 2002 work of Mukhin and Varchenko, a Wronskian map was introduced from the variety of full flags in a finite-dimensional vector space into a product of projective spaces. We establish a precise relationship between this map and the Plücker map. This allows us to recover the result of Varchenko and Wright that the polynomials appearing in the image of the Wronski map are the initial values of the tau-functions for the Kadomtsev-Petviashvili hierarchy.
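
For readers unfamiliar with the construction, the standard Wronskian of functions $f_1,\dots,f_k$ of one variable $x$ is recalled below, assuming (as is usual in this setting, though the abstract does not say so) that the vector space consists of polynomials. Because replacing a basis $(f_1,\dots,f_k)$ by $(f_1,\dots,f_k)A$ for an invertible matrix $A$ multiplies the Wronskian by $\det A$, its class $[\operatorname{Wr}]$ depends only on the subspace spanned, which is why a flag lands in a product of projective spaces. This is a schematic sketch of the idea (the exact indexing of the flag is a convention), not the precise definition used in the cited work.

```latex
\begin{gather*}
\operatorname{Wr}(f_1,\dots,f_k)
  = \det\begin{pmatrix}
      f_1 & f_2 & \cdots & f_k \\
      f_1' & f_2' & \cdots & f_k' \\
      \vdots & \vdots & \ddots & \vdots \\
      f_1^{(k-1)} & f_2^{(k-1)} & \cdots & f_k^{(k-1)}
    \end{pmatrix},
\qquad
\operatorname{Wr}\big((f_1,\dots,f_k)A\big) = \det(A)\,\operatorname{Wr}(f_1,\dots,f_k), \\[4pt]
F_1 \subset F_2 \subset \cdots \subset F_{n-1} \subset V
  \;\longmapsto\;
  \big([\operatorname{Wr}(F_1)],\,[\operatorname{Wr}(F_2)],\,\dots,\,[\operatorname{Wr}(F_{n-1})]\big), \\[4pt]
\text{e.g. } \operatorname{Wr}(x, x^2) = \det\begin{pmatrix} x & x^2 \\ 1 & 2x \end{pmatrix} = 2x^2 - x^2 = x^2 .
\end{gather*}
```
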
In this paper we review different expansions for neutrino oscillation probabilities in matter in the context of long-baseline neutrino experiments. We examine the accuracy and computational efficiency of different exact and approximate expressions. We find that many of the expressions used in the literature are not precise enough for the next generation of long-baseline experiments, but several of them are, while maintaining comparable simplicity. The results of this paper can be used as guidance by both phenomenologists and experimentalists when implementing the various oscillation expressions in their analysis tools.
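
As a reference point for what a closed-form oscillation expression looks like, the snippet below evaluates the textbook two-flavor oscillation probability in vacuum, $P = \sin^2 2\theta \, \sin^2(1.267\,\Delta m^2 L / E)$ with $\Delta m^2$ in eV², $L$ in km and $E$ in GeV. This is a swapped-in baseline, not one of the matter expressions reviewed in the paper: it neglects matter effects and the third flavor, which are exactly what the reviewed expansions have to capture. The function name is illustrative.

```python
import math


def two_flavor_vacuum_probability(sin2_2theta: float, dm2_ev2: float,
                                  L_km: float, E_GeV: float) -> float:
    """Appearance probability P(nu_a -> nu_b) for two flavors in vacuum.

    P = sin^2(2 theta) * sin^2(1.267 * dm2[eV^2] * L[km] / E[GeV]);
    the factor 1.267 absorbs hbar, c and the unit conversions.
    No matter effects and no third flavor: a textbook baseline only.
    """
    phase = 1.267 * dm2_ev2 * L_km / E_GeV
    return sin2_2theta * math.sin(phase) ** 2
```

The first oscillation maximum sits where the phase $1.267\,\Delta m^2 L/E$ equals $\pi/2$. In matter, both the effective mixing and the effective mass splitting change, which shifts and rescales that peak; this is where exact and approximate matter expressions can start to disagree.
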
William Y. C. Chen, 2021
One can hardly believe that there is still something to be said about cubic equations. To dodge this doubt, we will instead try and say something about Sylvester. He doubtless found a way to solve cubic equations. As mentioned by Rota, it was the only method in this vein that he could remember. We realize that Sylvester's magnificent approach for reduced cubic equations boils down to an easy identity.
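
Sylvester's own identity is not given in the abstract, so it is not reproduced here. As a point of comparison only, the sketch below solves the same reduced (depressed) cubic $t^3 + pt + q = 0$ with the classical Cardano substitution $t = u + v$, which rests on the identity $(u+v)^3 - 3uv(u+v) - (u^3+v^3) = 0$: choosing $uv = -p/3$ forces $u^3 + v^3 = -q$, so $u^3$ and $v^3$ are the two roots of $z^2 + qz - (p/3)^3 = 0$. This is Cardano's method, not Sylvester's, and the function name is illustrative.

```python
import cmath


def depressed_cubic_roots(p: float, q: float) -> list:
    """All three complex roots of t**3 + p*t + q = 0 via Cardano's t = u + v."""
    omega = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity
    if p == 0:
        # t**3 = -q: the roots are the three cube roots of -q.
        u = complex(-q) ** (1 / 3)
        return [u * omega**k for k in range(3)]
    # u**3 and v**3 solve z**2 + q*z - (p/3)**3 = 0.
    disc = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    # Take the larger-magnitude root to limit cancellation; with p != 0 both are nonzero.
    u3 = max(-q / 2 + disc, -q / 2 - disc, key=abs)
    u = u3 ** (1 / 3)  # principal complex cube root
    roots = []
    for k in range(3):
        uk = u * omega**k
        vk = -p / (3 * uk)  # enforces u*v = -p/3, hence v**3 is the other quadratic root
        roots.append(uk + vk)
    return roots
```

For example, $t^3 - 6t - 9 = 0$ gives $u^3 = 8$ and $v^3 = 1$, so $u = 2$, $v = 1$ and the real root is $t = 3$. Working in complex arithmetic throughout lets the same code handle the "casus irreducibilis", where all three roots are real but the square root is of a negative number.
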
As part of an ALMA survey to study the origin of episodic accretion in young eruptive variables, we have observed the circumstellar environment of the star V2775 Ori. This object is a very young pre-main-sequence star that displays a large-amplitude outburst characteristic of the FUor class. We present Cycle 2 Band 6 observations of V2775 Ori with a continuum and CO (2-1) isotopologue resolution of $0.25''$ (103 au). We report the detection of a marginally resolved circumstellar disc in the ALMA continuum with an integrated flux of $106 \pm 2$ mJy, a characteristic radius of $\sim$30 au, and an inclination of $14.0^{+7.8}_{-14.5}$ deg; the disc is thus oriented nearly face-on with respect to the plane of the sky. The CO emission is separated into distinct blue- and red-shifted regions that appear to be rings or shells of expanding material from quasi-episodic outbursts. The system is oriented such that the disc is seen through the outflow remnant of V2775 Ori, whose axis lies along our line of sight. The $^{13}$CO emission displays a structure similar to that of the CO, while the C$^{18}$O line emission is very weak. We calculated the expansion velocities of the low- and medium-density material with respect to the disc to be $-2.85$ km s$^{-1}$ (blue) and $4.4$ km s$^{-1}$ (red), and $-1.35$ and $1.15$ km s$^{-1}$ (blue and red), respectively, and we derived the mass, momentum and kinetic energy of the expanding gas. The outflow has an hourglass shape in which the cavities are not seen. We interpret the shapes traced by the gas as cavities excavated by an ancient outflow. We report a detection of line emission from the circumstellar disc and derive a lower limit on the gas mass of $3\,M_{\rm Jup}$.
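
Two pieces of arithmetic in this abstract can be made explicit. The quoted $0.25''$ to 103 au correspondence follows from the small-angle relation (by definition of the parsec, 1 au subtends $1''$ at 1 pc), which, taking both rounded numbers at face value, implies a distance of roughly 410 pc. The momentum and kinetic energy of the expanding gas follow from $p = M|v|$ and $E = \tfrac{1}{2}Mv^2$ once a gas mass is known. The sketch below illustrates these textbook relations under stated assumptions (projection effects ignored, solar mass taken as $1.989\times10^{33}$ g); it is not the authors' procedure, it assumes no gas mass beyond what a caller supplies, and the function names are hypothetical.

```python
M_SUN_G = 1.989e33     # solar mass in grams (rounded)
KM_S_TO_CM_S = 1.0e5   # 1 km/s expressed in cm/s


def implied_distance_pc(size_au: float, angle_arcsec: float) -> float:
    """Small-angle relation: size [au] = angle [arcsec] x distance [pc]."""
    return size_au / angle_arcsec


def outflow_momentum_energy(mass_msun: float, v_km_s: float) -> tuple:
    """Momentum (M_sun km/s) and kinetic energy (erg) of gas of mass M at speed v.

    p = M |v| and E = M v^2 / 2, with no correction for projection effects.
    """
    momentum = mass_msun * abs(v_km_s)
    energy_erg = 0.5 * (mass_msun * M_SUN_G) * (v_km_s * KM_S_TO_CM_S) ** 2
    return momentum, energy_erg
```

For instance, `implied_distance_pc(103, 0.25)` returns 412.0. The velocity sign (blue- or red-shifted) only encodes direction, so the sketch uses its magnitude for the momentum, while the energy depends on $v^2$ anyway.
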