
Mind the Performance Gap: Examining Dataset Shift During Prospective Validation

Published by: Erkin Otles
Publication date: 2021
Research field: Informatics Engineering
Language: English





Once integrated into clinical care, patient risk stratification models may perform worse compared to their retrospective performance. To date, it is widely accepted that performance will degrade over time due to changes in care processes and patient populations. However, the extent to which this occurs is poorly understood, in part because few researchers report prospective validation performance. In this study, we compare the 2020-2021 (20-21) prospective performance of a patient risk stratification model for predicting healthcare-associated infections to a 2019-2020 (19-20) retrospective validation of the same model. We define the difference in retrospective and prospective performance as the performance gap. We estimate how i) temporal shift, i.e., changes in clinical workflows and patient populations, and ii) infrastructure shift, i.e., changes in access, extraction and transformation of data, both contribute to the performance gap. Applied prospectively to 26,864 hospital encounters during a twelve-month period from July 2020 to June 2021, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.767 (95% confidence interval (CI): 0.737, 0.801) and a Brier score of 0.189 (95% CI: 0.186, 0.191). Prospective performance decreased slightly compared to 19-20 retrospective performance, in which the model achieved an AUROC of 0.778 (95% CI: 0.744, 0.815) and a Brier score of 0.163 (95% CI: 0.161, 0.165). The resulting performance gap was primarily due to infrastructure shift and not temporal shift. So long as we continue to develop and validate models using data stored in large research data warehouses, we must consider differences in how and when data are accessed, measure how these differences may affect prospective performance, and work to mitigate those differences.
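The two metrics and the performance gap described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`auroc`, `brier`, `bootstrap_ci`, `performance_gap`), the percentile bootstrap used for the confidence interval, and the sign convention of the gap are all assumptions made for illustration, since the abstract does not state how its intervals were computed. Only the four point estimates at the bottom come from the abstract.

```python
import random


def auroc(y_true, y_score):
    """AUROC from the Mann-Whitney rank-sum statistic (ties get average ranks)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier(y_true, y_prob):
    """Brier score: mean squared error between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric, y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile-bootstrap CI (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples with a single class, where AUROC is undefined.
        if 0 < sum(ys) < n:
            stats.append(metric(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


def performance_gap(retrospective, prospective):
    """Gap as retrospective minus prospective (sign convention assumed here)."""
    return retrospective - prospective


# Point estimates reported in the abstract.
auroc_gap = performance_gap(0.778, 0.767)   # positive: AUROC dropped prospectively
brier_gap = performance_gap(0.163, 0.189)   # negative: Brier score rose (worse)
print(round(auroc_gap, 3), round(brier_gap, 3))
```

With per-encounter labels and predicted risks for each cohort, `bootstrap_ci(auroc, y_true, y_score)` would give an interval comparable in spirit to those reported. Note that AUROC is better when higher and the Brier score is better when lower, so the two gaps have opposite signs when both metrics degrade.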




Read also

With the inclusion of smart meters, electricity load consumption data can be fetched for individual consumer buildings at high temporal resolutions. Availability of such data has made it possible to study daily load demand profiles of the households. Clustering households based on their demand profiles is a primary and key component of such analysis. While many clustering algorithms/frameworks can be deployed to perform clustering, they usually generate very different clusters. In order to identify the best clustering results, various cluster validation indices (CVIs) have been proposed in the literature. However, it has been noticed that different CVIs often recommend different algorithms. This leads to the problem of identifying the most suitable CVI for a given dataset. Responding to the problem, this paper shows how the recommendations of validation indices are influenced by different data characteristics that might be present in a typical residential load demand dataset. Furthermore, the paper identifies the features of data that prefer/prohibit the use of a particular cluster validation index.
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy---coming from robust statistics and optimization---is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.
Recently, messaging applications, such as WhatsApp, have been reportedly abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse in WhatsApp relies on several manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of WhatsApp publicly accessible groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp for two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.
We study the problem of fairly allocating a divisible resource, also known as cake cutting, with an additional requirement that the shares that different agents receive should be sufficiently separated from one another. This captures, for example, constraints arising from social distancing guidelines. While it is sometimes impossible to allocate a proportional share to every agent under the separation requirement, we show that the well-known criterion of maximin share fairness can always be attained. We then establish several computational properties of maximin share fairness -- for instance, the maximin share of an agent cannot be computed exactly by any finite algorithm, but can be approximated with an arbitrarily small error. In addition, we consider the division of a pie (i.e., a circular cake) and show that an ordinal relaxation of maximin share fairness can be achieved.
The burgeoning of misleading or false information spread by untrustworthy websites has, without doubt, created a dangerous concoction. Thus, it is not a surprise that the threat posed by untrustworthy websites has emerged as a central concern on the public agenda in many countries, including Czechia and Slovakia. However, combating this harmful phenomenon has proven to be difficult, with approaches primarily focusing on tackling consequences instead of prevention, as websites are routinely seen as quasi-sovereign organisms. Websites, however, rely upon a host of service providers, which, in a way, hold substantial power over them. Notwithstanding the apparent power held by such tech stack layers, scholarship on this topic remains largely limited. This article contributes to this small body of knowledge by providing a first-of-its-kind systematic mapping of the back-end infrastructural support that makes up the tech stacks of Czech and Slovak untrustworthy websites. Our approach is based on collecting and analyzing data on top-level domain operators, domain name registrars, email providers, web hosting providers, and utilized website tracking technologies of 150 Czech and Slovak untrustworthy websites. Our findings show that the Czech and Slovak untrustworthy website landscape relies on a vast number of back-end services spread across multiple countries, but in key tech stack layers is nevertheless still heavily dominated by locally based companies. Finally, given our findings, we discuss various possible avenues of utilizing the numerous tech stack layers in combating online disinformation.
