
Exploring Reproducibility and FAIR Principles in Data Science Using Ecological Niche Modeling as a Case Study

Publication date: 2019
Language: English





Reproducibility is a fundamental requirement of the scientific process, since it enables outcomes to be replicated and verified. Computational scientific experiments can benefit from improved reproducibility for many reasons, including validation of results and reuse by other scientists. However, designing reproducible experiments remains a challenge; hence the need for methodologies and tools that can support this process. Here, we propose a conceptual model for reproducibility that specifies its main attributes and properties, along with a framework that allows computational experiments to be findable, accessible, interoperable, and reusable. We present a case study in ecological niche modeling to demonstrate and evaluate the implementation of this framework.
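The framework is described at the conceptual level; as a minimal sketch of what a FAIR-oriented reproducibility record could look like in practice, the Python below writes a provenance manifest for one experiment run. The manifest fields and the write_manifest helper are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: capture the provenance needed to re-run a
# computational experiment as a FAIR-style metadata manifest.
# Field names and helpers are illustrative, not the paper's framework.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def file_checksum(path):
    """SHA-256 of an input file, so reruns can verify the same data."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(experiment_id, inputs, parameters, out_path):
    manifest = {
        "id": experiment_id,                     # findable: persistent identifier
        "created": datetime.now(timezone.utc).isoformat(),
        "environment": {                         # reusable: record the platform
            "python": sys.version.split()[0],
            "os": platform.platform(),
        },
        "parameters": parameters,                # interoperable: plain JSON
        "inputs": {p: file_checksum(p) for p in inputs},  # accessible, verifiable
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```

A rerun can then compare its own checksums and parameters against the manifest before trusting a comparison of outcomes.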



Related research

Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tradeoff between protecting privacy and the accuracy of decisions, we initiate a first-of-its-kind study into the impact of formally private mechanisms (based on differential privacy) on fair and equitable decision-making. We empirically investigate novel tradeoffs on two real-world decisions made using U.S. Census data (allocation of federal funds and assignment of voting rights benefits) as well as a classic apportionment problem. Our results show that if decisions are made using an $\epsilon$-differentially private version of the data, under strict privacy constraints (smaller $\epsilon$), the noise added to achieve privacy may disproportionately impact some groups over others. We propose novel measures of fairness in the context of randomized differentially private algorithms and identify a range of causes of outcome disparities.
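The mechanism behind an $\epsilon$-differentially private release of counts is easy to sketch: add Laplace noise with scale $1/\epsilon$, so stricter privacy (smaller $\epsilon$) means more noise. The threshold decision below is a toy stand-in for the funding and apportionment decisions studied in the paper, purely to show why small groups are affected most.

```python
# Minimal sketch of the privacy/accuracy tradeoff: smaller epsilon
# means more Laplace noise on published counts, hence noisier
# downstream decisions. The threshold rule is a made-up example,
# not the Census mechanism studied in the paper.
import numpy as np

def private_count(true_count, epsilon, rng):
    # Laplace mechanism for a counting query (sensitivity 1).
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(0)
groups = {"small_town": 180, "large_city": 95_000}
threshold = 200  # toy rule: allocate funds if the count exceeds this

for eps in (10.0, 0.1):
    decisions = {g: private_count(c, eps, rng) > threshold
                 for g, c in groups.items()}
    print(f"epsilon={eps}: {decisions}")
# With small epsilon the small group's decision can flip from run to
# run, while the large group is effectively unaffected; this is the
# kind of disparate impact the paper measures.
```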
Understanding the behaviour of hosts of SARS-CoV-2 is crucial to our understanding of the virus. A comparison of environmental features related to the incidence of SARS-CoV-2 with those of its potential hosts is critical. We examine the distribution of coronaviruses among bats. We analyse the distribution of SARS-CoV-2 in a nine-week period following lockdown in Italy, Spain, and Australia, and correlate its incidence with environmental variables, particularly ultraviolet radiation, temperature, and humidity. We establish a clear negative relationship between COVID-19 and ultraviolet radiation, modulated by temperature and humidity. We relate our results to data showing that the bat species most vulnerable to coronavirus infection are those that live in environmental conditions similar to those that appear most favourable to the spread of COVID-19. The SARS-CoV-2 ecological niche has been the product of long-term coevolution of coronaviruses with their host species. Understanding the key parameters of that niche in host species allows us to predict the circumstances in which its spread will be most favourable. Such conditions can be summarised under the headings of nocturnality and seasonality. High ultraviolet radiation, in particular, is proposed as a key limiting variable. We therefore expect the risk of spread of COVID-19 to be highest in winter conditions and in low-light environments. Human activities resembling those of highly social cave-dwelling bats (e.g. large nocturnal gatherings or high-density indoor activities) will only compound the problem of COVID-19.
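As a schematic of the correlation step described above, the snippet below computes Pearson correlations between weekly incidence and each environmental variable. All numbers are synthetic placeholders with a built-in negative UV effect; they are not the study's data.

```python
# Illustrative sketch of the correlation analysis: weekly case counts
# against ultraviolet radiation, temperature, and humidity.
# The data are synthetic placeholders, not the study's data.
import numpy as np

rng = np.random.default_rng(1)
weeks = 9
uv = rng.uniform(1, 8, weeks)          # UV index
temp = rng.uniform(5, 25, weeks)       # degrees C
humidity = rng.uniform(30, 90, weeks)  # percent
cases = 500 - 40 * uv + rng.normal(0, 30, weeks)  # synthetic negative UV effect

for name, var in [("UV", uv), ("temperature", temp), ("humidity", humidity)]:
    r = np.corrcoef(cases, var)[0, 1]  # Pearson correlation coefficient
    print(f"cases vs {name}: r = {r:+.2f}")
```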
As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis as new data are added and as scientific pipelines are refined. We describe our experience developing data commons (interoperable infrastructure that co-locates data, storage, and compute with common analysis tools) and present several case studies. Across these case studies, several common requirements emerge, including the need for persistent digital identifier and metadata services, APIs, data portability, pay-for-compute capabilities, and data peering agreements between data commons. Though many challenges remain, including sustainability and the development of appropriate standards, interoperable data commons bring us one step closer to effective Data Science as a Service for the scientific research community.
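As a hedged illustration of two of those requirements, persistent identifiers and metadata services, the sketch below registers a dataset record under a fake identifier and resolves it through a function standing in for an API. The schema and the doi-style prefix are invented for illustration.

```python
# Hypothetical sketch of the persistent-identifier-plus-metadata
# pattern the abstract lists as a common data-commons requirement.
# The schema, prefix, and resolver are invented for illustration.
import json
import uuid

REGISTRY = {}  # stand-in for a metadata service backed by real storage

def register_dataset(title, location, checksum):
    pid = f"doi:10.0000/demo.{uuid.uuid4().hex[:8]}"  # fake persistent ID
    REGISTRY[pid] = {
        "title": title,
        "location": location,   # where the bytes live (co-located storage)
        "checksum": checksum,   # lets compute jobs verify what they read
    }
    return pid

def resolve(pid):
    """API through which portals and compute services find the data."""
    return REGISTRY[pid]

pid = register_dataset("occurrence-records", "s3://commons/occ.csv", "sha256:...")
print(pid, "->", json.dumps(resolve(pid)))
```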
Alexandr Savinov, 2019
We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing, as opposed to the exclusive use of sets in conventional set-oriented models. Functions and function composition serve as the primary semantic units for describing data connectivity, instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed using (accumulate) functions, providing an alternative to group-by and reduce operations. The model was implemented in an open-source data processing toolkit, examples from which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when expressed with functions rather than with sets and set operations.
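The set-versus-function contrast can be made concrete in plain Python: below, a link function plays the role of a join, and an accumulate step plays the role of group-by plus sum. This is a generic illustration of the idea, not the API of the toolkit mentioned in the abstract.

```python
# Generic illustration of the concept-oriented idea: derive
# connectivity with functions (a link "column") and aggregate with
# an accumulate function, instead of join and group-by.
orders = [
    {"id": 1, "customer": "ann", "amount": 10.0},
    {"id": 2, "customer": "bob", "amount": 7.5},
    {"id": 3, "customer": "ann", "amount": 4.0},
]

def link(order):
    # A link "column": a function from an order to its customer.
    return order["customer"]

# Accumulate: fold each order into its group's running total,
# playing the role that group-by + sum plays in set-oriented models.
totals = {}
for order in orders:
    key = link(order)
    totals[key] = totals.get(key, 0.0) + order["amount"]

print(totals)  # {'ann': 14.0, 'bob': 7.5}
```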
Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process. Key insights:

* Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
* Important parts of data science are already being automated, especially in the modeling stages, where techniques such as automated machine learning (AutoML) are gaining traction.
* Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.
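As a concrete (if toy) example of the modeling-stage automation mentioned in the second point above, the scikit-learn snippet below automates hyperparameter selection with cross-validated grid search; the dataset, pipeline, and grid are arbitrary choices for illustration.

```python
# Toy AutoML sketch: automate the model-selection stage. The dataset,
# pipeline, and grid are arbitrary examples, not a recommendation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=500))])
grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, grid, cv=5)  # automated hyperparameter search
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Framing the problem, choosing the label, and judging whether the score is good enough remain the human's job, which is the article's point about context-dependent tasks.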