
Untidy Data: The Unreasonable Effectiveness of Tables

Published by: Michael Correll
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Working with data in table form is usually considered a preparatory and tedious step in the sensemaking pipeline; a way of getting the data ready for more sophisticated visualization and analytical tools. But for many people, spreadsheets -- the quintessential table tool -- remain a critical part of their information ecosystem, allowing them to interact with their data in ways that are hidden or abstracted in more complex tools. This is particularly true for data workers: people who work with data as part of their job but do not identify as professional analysts or data scientists. We report on a qualitative study of how these workers interact with and reason about their data. Our findings show that data tables serve a broader purpose beyond data cleanup at the initial stage of a linear analytic flow: users want to see and get their hands on the underlying data throughout the analytics process, reshaping and augmenting it to support sensemaking. They reorganize, mark up, layer on levels of detail, and spawn alternatives within the context of the base data. These direct interactions and human-readable table representations form a rich and cognitively important part of building understanding of what the data mean and what they can do with it. We argue that interactive tables are an important visualization idiom in their own right; that the direct data interaction they afford offers a fertile design space for visual analytics; and that sensemaking can be enriched by more flexible human-data interaction than is currently supported in visual analytics tools.
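To make these direct table interactions concrete, here is a minimal illustrative sketch (ours, not from the paper) using pandas; the dataset and column names are invented. It walks through the four operations the study highlights: reorganizing, marking up, layering on detail, and spawning alternatives alongside the base data.

# Illustrative sketch (not from the paper): the kinds of direct table
# interactions the study describes, expressed with pandas.
import pandas as pd

# Base data the worker wants to keep in view throughout the analysis.
sales = pd.DataFrame({
    "region": ["N", "S", "N", "S"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [120, 95, 130, 110],
})

# Reorganize: reshape the table without leaving the table idiom.
by_region = sales.pivot(index="region", columns="quarter", values="revenue")

# Mark up: annotate rows in place rather than in a separate tool.
sales["flagged"] = sales["revenue"] < 100

# Layer on levels of detail: add a derived column next to the raw values.
sales["share"] = sales["revenue"] / sales["revenue"].sum()

# Spawn an alternative: a what-if copy that leaves the base data intact.
what_if = sales.copy()
what_if["revenue"] *= 1.1

print(by_region)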




Read also

Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes. Second, train a model utilizing this data. Toward the goal of solving fine-grained recognition, we introduce an alternative approach, leveraging free, noisy data from the web and simple, generic methods of recognition. This approach has benefits in both performance and scalability. We demonstrate its efficacy on four fine-grained datasets, greatly exceeding existing state of the art without the manual collection of even a single label, and furthermore show first results at scaling to more than 10,000 fine-grained categories. Quantitatively, we achieve top-1 accuracies of 92.3% on CUB-200-2011, 85.4% on Birdsnap, 93.4% on FGVC-Aircraft, and 80.8% on Stanford Dogs without using their annotated training sets. We compare our approach to an active learning approach for expanding fine-grained datasets.
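As a rough sketch of the recipe described above (a generic classifier trained directly on noisily labeled web images), the following hedged example fine-tunes a standard torchvision model on images grouped into one folder per search query. The directory layout, model choice, and hyperparameters are our assumptions, not the authors' actual pipeline.

# Hedged sketch: fine-tune a generic image model on noisy web-crawled
# images, where each category's search query names its image folder.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Assumed layout: web_images/<category_name>/*.jpg, one folder per query.
data = datasets.ImageFolder("web_images", transform=tfm)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet50(weights="IMAGENET1K_V2")
model.fc = nn.Linear(model.fc.in_features, len(data.classes))
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass shown; train longer in practice
    opt.zero_grad()
    loss = loss_fn(model(images), labels)  # noisy labels used as-is
    loss.backward()
    opt.step()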
165 - Roman Jackiw 1996
Quantum field theory offers physicists a tremendously wide range of application; it is both a language with which a vast variety of physical processes can be discussed and a model for fundamental physics, the so-called "standard model", which thus far has passed every experimental test. No other framework exists in which one can calculate so many phenomena with such ease and accuracy. Nevertheless, today some physicists have doubts about quantum field theory, and here I want to examine these reservations.
73 - Alfred Galichon 2021
Optimal transport has become part of the standard quantitative economics toolbox. It is the framework of choice to describe models of matching with transfers, but beyond that, it allows one to: extend quantile regression; identify discrete choice models; provide new algorithms for computing the random coefficient logit model; and generalize the gravity model in trade. This paper offers a brief review of the basics of the theory, its applications to economics, and some extensions.
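As a toy illustration (ours, not from the review) of the matching-with-transfers application: with finitely many agents and one unit of mass each, optimal transport reduces to the assignment problem, which SciPy solves directly. The surplus matrix below is invented.

# Toy example: optimal matching with transfers as a discrete
# optimal-transport (assignment) problem. Surplus values are invented.
import numpy as np
from scipy.optimize import linear_sum_assignment

# surplus[i, j]: joint surplus if worker i matches with firm j.
surplus = np.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

# Maximize total surplus over one-to-one matchings.
rows, cols = linear_sum_assignment(surplus, maximize=True)
print(list(zip(rows, cols)), surplus[rows, cols].sum())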
133 - Huy Ha, Shuran Song 2021
High-velocity dynamic actions (e.g., fling or throw) play a crucial role in our everyday interaction with deformable objects by improving our efficiency and effectively expanding our physical reach range. Yet, most prior works have tackled cloth manipulation using exclusively single-arm quasi-static actions, which require a large number of interactions for challenging initial cloth configurations and strictly limit the maximum cloth size by the robot's reach range. In this work, we demonstrate the effectiveness of dynamic flinging actions for cloth unfolding with our proposed self-supervised learning framework, FlingBot. Our approach learns how to unfold a piece of fabric from arbitrary initial configurations using a pick, stretch, and fling primitive for a dual-arm setup from visual observations. The final system achieves over 80% coverage within 3 actions on novel cloths, can unfold cloths larger than the system's reach range, and generalizes to T-shirts despite being trained on only rectangular cloths. We also finetuned FlingBot on a real-world dual-arm robot platform, where it increased cloth coverage over four times more than the quasi-static baseline did. The simplicity of FlingBot combined with its superior performance over quasi-static baselines demonstrates the effectiveness of dynamic actions for deformable object manipulation. Code, data, and simulation environment are available at https://flingbot.cs.columbia.edu .
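For reference, coverage in cloth-unfolding work typically means the fraction of the cloth's fully flattened area visible in a top-down view. A minimal sketch of one plausible way to compute it from a binary cloth mask follows; this is our illustration, not FlingBot's evaluation code.

# Hedged sketch: a plausible cloth-coverage metric from a top-down
# binary segmentation mask. Mask and areas are invented.
import numpy as np

def coverage(cloth_mask: np.ndarray, max_flat_area: float) -> float:
    """Fraction of the cloth's fully flattened area currently covered."""
    return float(cloth_mask.sum()) / max_flat_area

mask = np.zeros((64, 64), dtype=bool)
mask[8:40, 8:56] = True            # crumpled cloth covers part of the view
print(coverage(mask, max_flat_area=48 * 56))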
Despite the widely-spread consensus on the brain's complexity, sprouts of the single-neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning, the famous curse of dimensionality long seemed to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality has gradually become more and more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems. These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in the geometry of multidimensional data spaces. To explore and understand these reasons we revisit the background ideas of statistical physics, which over the course of the 20th century were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds. We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can a high-dimensional brain organise reliable and fast learning in a high-dimensional world of data using simple tools? Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and the emergence of static and associative memories in ensembles of single neurons.
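A small numerical illustration (ours, not from the paper) of the stochastic-separation effect the abstract refers to: as dimension grows, a random query point becomes almost surely separable from a large i.i.d. sample by a single hyperplane, which is what makes simple one-shot error correctors workable.

# Numerical illustration: in high dimension, a random point x is, with
# high probability, separated from a large Gaussian sample by the single
# hyperplane {y : <x, y> = <x, x>}.
import numpy as np

rng = np.random.default_rng(0)

def separable_fraction(dim, n_points=5_000, trials=20):
    hits = 0
    for _ in range(trials):
        sample = rng.standard_normal((n_points, dim))
        x = rng.standard_normal(dim)
        # x is linearly separated from the sample whenever
        # <x, y> < <x, x> for every sample point y.
        if (sample @ x).max() < x @ x:
            hits += 1
    return hits / trials

for dim in (10, 100, 1000):
    print(dim, separable_fraction(dim))  # fraction rises toward 1 with dim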