
A Case for Data Commons: Towards Data Science as a Service

Published by Robert Grossman
Publication date: 2016
Research field: Informatics Engineering
Research language: English





As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis, as new data are added and as scientific pipelines are refined. We describe our experience developing data commons (interoperable infrastructure that co-locates data, storage, and compute with common analysis tools) and present several case studies. Across these case studies, several common requirements emerge, including the need for persistent digital identifier and metadata services, APIs, data portability, pay-for-compute capabilities, and data peering agreements between data commons. Though many challenges remain, including sustainability and developing appropriate standards, interoperable data commons bring us one step closer to effective Data Science as a Service for the scientific research community.




Read also

139 - Anissa Tanweer 2017
Ethics in the emerging world of data science are often discussed through cautionary tales about the dire consequences of missteps taken by high profile companies or organizations. We take a different approach by foregrounding the ways that ethics are implicated in the day-to-day work of data science, focusing on instances in which data scientists recognize, grapple with, and conscientiously respond to ethical challenges. This paper presents a case study of ethical dilemmas that arose in a data science for social good (DSSG) project focused on improving navigation for people with limited mobility. We describe how this particular DSSG team responded to those dilemmas, and how those responses gave rise to still more dilemmas. While the details of the case discussed here are unique, the ethical dilemmas they illuminate can commonly be found across many DSSG projects. These include: the risk of exacerbating disparities; the thorniness of algorithmic accountability; the evolving opportunities for mischief presented by new technologies; the subjective and value-laden interpretations at the heart of any data-intensive project; the potential for data to amplify or mute particular voices; the possibility of privacy violations; and the folly of technological solutionism. Based on our tracing of the team's responses to these dilemmas, we distill lessons for an ethical data science practice that can be more generally applied across DSSG projects. Specifically, this case experience highlights the importance of: 1) setting the scene early on for ethical thinking; 2) recognizing ethical decision-making as an emergent phenomenon intertwined with the quotidian work of data science for social good; 3) approaching ethical thinking as a thoughtful and intentional balancing of priorities rather than a binary differentiation between right and wrong.
We describe an ecosystem for teaching data science (DS) to engineers which blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences, Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional environments. The ecosystem is distributed in a collaborative fashion across three departments in the above Faculty and includes postgraduate programmes, courses, professional diplomas, data repositories, laboratories, trainee programmes, and internships. By sharing our teaching principles and the innovative components of our approach to teaching DS, we hope our experience can be useful to those developing their own DS programmes and ecosystems. The open challenges and future plans for our ecosystem are also discussed at the end of the article.
Developing nations are particularly susceptible to the adverse effects of global warming. By 2040, 14 percent of global emissions will come from data centers. This paper presents early findings in the use of AI and digital twins to model and optimize data center operations.
A recent study by the Robotic Industries Association has highlighted how service robots are increasingly broadening our horizons beyond the factory floor. From robotic vacuums, bomb retrievers, exoskeletons and drones, to robots used in surgery, space exploration, agriculture, home assistance and construction, service robots are building a formidable resume. In just the last few years we have seen service robots deliver room service meals, assist shoppers in finding items in a large home improvement store, check in customers and store their luggage at hotels, and pour drinks on cruise ships. Personal robots are here to educate, assist and entertain at home. These domestic robots can perform daily chores, assist people with disabilities and serve as companions or pets for entertainment. By all accounts, the growth potential for service robotics is quite large.
Information and data exchange is an important aspect of scientific progress. In computational materials science, a prerequisite for smooth data exchange is standardization, which means using agreed conventions for, e.g., units, zero base lines, and file formats. There are two main strategies to achieve this goal. One accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, bio-physics, and materials science, by complying with the diverse ecosystem of computer codes, and thus develops converters for the input and output files of all important codes. These converters then translate the data of all important codes into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt for shaping their inputs, outputs, and restart files directly into the same code-independent format. We would like to emphasize in this paper that these two strategies can and should be regarded as complementary, if not even synergetic. The main concepts and software developments of both strategies are very much identical, and, obviously, both approaches should give the same final result. In this paper, we present the appropriate format and conventions that were agreed upon by two teams, the Electronic Structure Library (ESL) of CECAM and the NOMAD (NOvel MAterials Discovery) Laboratory, a European Centre of Excellence (CoE). This discussion also includes the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.