
Examining the tech stacks of Czech and Slovak untrustworthy websites

Added by Jozef Michal Mintal
Publication date: 2021
Research language: English





The burgeoning of misleading or false information spread by untrustworthy websites has, without doubt, created a dangerous concoction. It is thus no surprise that the threat posed by untrustworthy websites has emerged as a central concern on the public agenda in many countries, including Czechia and Slovakia. Combating this harmful phenomenon has nevertheless proven difficult, with approaches primarily focusing on tackling consequences instead of prevention, as websites are routinely treated as quasi-sovereign organisms. Websites, however, rely on a host of service providers, which in turn hold substantial power over them. Notwithstanding the apparent power held by such tech stack layers, scholarship on this topic remains largely limited. This article contributes to this small body of knowledge by providing a first-of-its-kind systematic mapping of the back-end infrastructural support that makes up the tech stacks of Czech and Slovak untrustworthy websites. Our approach is based on collecting and analyzing data on top-level domain operators, domain name registrars, email providers, web hosting providers, and utilized website tracking technologies across 150 Czech and Slovak untrustworthy websites. Our findings show that the Czech and Slovak untrustworthy website landscape relies on a vast number of back-end services spread across multiple countries, yet in key tech stack layers remains heavily dominated by locally based companies. Finally, given our findings, we discuss possible avenues for utilizing the numerous tech stack layers in combating online disinformation.
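One tech-stack layer the mapping above covers is the email provider, which can be attributed from a domain's MX records. A minimal sketch of that classification step, assuming suffix matching against a hand-curated provider list (the providers and suffixes below are illustrative assumptions, not the article's actual dataset):

```python
# Illustrative provider-suffix table; entries are assumptions for the sketch,
# not the article's data.
PROVIDER_SUFFIXES = {
    "google.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
    "forpsi.com": "Forpsi",          # example Czech provider
    "websupport.sk": "WebSupport",   # example Slovak provider
}

def classify_email_provider(mx_hosts):
    """Map a list of MX hostnames to a known provider, else 'other/self-hosted'."""
    for host in mx_hosts:
        host = host.lower().rstrip(".")
        for suffix, provider in PROVIDER_SUFFIXES.items():
            if host == suffix or host.endswith("." + suffix):
                return provider
    return "other/self-hosted"
```

Repeating this kind of lookup per layer (registrar from WHOIS, hosting from IP allocation, trackers from page source) yields the per-layer provider counts the article aggregates.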


Related Research

The proliferation of misleading or false information spread by untrustworthy websites has emerged as a significant concern on the public agenda in many countries, including Slovakia. Despite the influence ascribed to such websites, their transparency and accountability remain an issue in most cases, with published work on mapping the administrators and connections of untrustworthy websites remaining limited. This article contributes to this body of knowledge (i) by providing an effective open-source tool to uncover untrustworthy website networks based on the utilization of the same Google Analytics/AdSense IDs, with the added ability to expose networks based on historical data, and (ii) by providing insight into the Slovak untrustworthy website landscape through a first-of-its-kind mapping of Slovak untrustworthy website networks. Our approach is based on a mixed-method design employing a qualitative exploration of data collected in a two-wave study conducted in 2019 and 2021, utilizing a custom-coded tool to uncover website connections. Overall, the study succeeds in exposing multiple novel website ties. Our findings indicate that while some untrustworthy website networks have been found to operate in the Slovak infosphere, most researched websites appear to be run by multiple mutually unconnected administrators. The resulting data also demonstrates that untrustworthy Slovak websites display a high content diversity in terms of connected websites, ranging from the websites of local NGOs and an e-shop selling underwear to a matchmaking portal.
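The core idea behind the ID-matching tool described above can be sketched briefly: extract Google Analytics/AdSense identifiers from page source and group sites that share one. This is a minimal illustration, not the article's actual tool; the regexes cover common ID formats (UA-, G-, ca-pub-) and the sample data shapes are assumptions:

```python
import re
from collections import defaultdict

# Common Google Analytics and AdSense ID formats (illustrative, not exhaustive).
GA_ID = re.compile(r"\b(UA-\d{4,10}-\d{1,4}|G-[A-Z0-9]{6,12})\b")
ADSENSE_ID = re.compile(r"\b(ca-pub-\d{10,20})\b")

def extract_ids(html):
    """Collect all GA/AdSense IDs present in a page's HTML source."""
    return set(GA_ID.findall(html)) | set(ADSENSE_ID.findall(html))

def group_by_shared_id(pages):
    """pages: dict of domain -> HTML source. Returns id -> set of domains
    for IDs that appear on at least two sites (candidate networks)."""
    networks = defaultdict(set)
    for domain, html in pages.items():
        for tracker_id in extract_ids(html):
            networks[tracker_id].add(domain)
    return {i: d for i, d in networks.items() if len(d) >= 2}
```

Running the same extraction against archived snapshots of a page would give the historical-network ability the article mentions.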
Websites with a hyper-partisan, left- or right-leaning focus offer content that is typically biased towards the expectations of their target audience. Such content often polarizes users, who are repeatedly primed with specific (extreme) content, usually reflecting hard party lines on political and socio-economic topics. Though this polarization has been extensively studied with respect to content, it is still unknown how it associates with the online tracking experienced by browsing users, especially when they exhibit certain demographic characteristics. For example, it is unclear how such websites enable the ad-ecosystem to track users based on their gender or age. In this paper, we take a first step toward shedding light on and measuring such potential differences in tracking imposed on users when visiting specific party-line websites. For this, we design and deploy a methodology to systematically probe such websites and measure differences in user tracking. This methodology allows us to create user personas with specific attributes like gender and age and automate their browsing behavior in a consistent and repeatable manner. Thus, we systematically study how personas are tracked by these websites and their third parties, especially when they exhibit particular demographic properties. Overall, we test 9 personas on 556 hyper-partisan websites and find that right-leaning websites tend to track users more intensely than left-leaning ones, depending on user demographics, using both cookies and cookie synchronization methods and leading to more costly ads being delivered.
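The persona-comparison idea above reduces to a measurable quantity: for each persona's crawl of the same sites, count the distinct third-party domains contacted. A hedged sketch of that measurement (not the paper's pipeline; the crawl-log shape and the naive suffix check for "third party" are assumptions, and real work would use eTLD+1 matching):

```python
from collections import Counter
from urllib.parse import urlparse

def third_party_domains(first_party, request_urls):
    """Hostnames of requests not belonging to the visited site.
    Naive suffix check; a real tool would compare registrable domains."""
    hosts = (urlparse(u).hostname for u in request_urls)
    return {h for h in hosts if h and not h.endswith(first_party)}

def tracking_intensity(crawl_log):
    """crawl_log: persona -> list of (first_party, request_urls) visits.
    Returns total distinct third-party contacts per persona."""
    intensity = Counter()
    for persona, visits in crawl_log.items():
        for first_party, urls in visits:
            intensity[persona] += len(third_party_domains(first_party, urls))
    return intensity
```

Comparing these counts between personas that differ only in one attribute (e.g. age) isolates the demographic effect the paper studies.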
With spatial analytic, econometric, and visualization tools, this book chapter investigates greenhouse gas emissions for the on-road passenger vehicle transport sector in the Boston metropolitan area in 2014. It compares greenhouse gas emission estimates from both the production-based and consumption-based perspectives using two large-scale administrative datasets: vehicle odometer readings from individual vehicles' annual inspections, and road inventory data containing segment-level geospatial and traffic information. Based on spatial econometric models that examine the socioeconomic and built-environment factors contributing to vehicle miles traveled (VMT) at the census tract level, it offers insights to help cities reduce VMT and the carbon footprint of passenger vehicle travel. Finally, it recommends a pathway for cities and towns in the Boston metropolitan area to curb VMT and mitigate carbon emissions to achieve climate goals of carbon neutrality.
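At its core, an odometer-based (consumption-style) estimate rests on a simple identity: emissions equal miles traveled times a fleet-average emission factor. A back-of-the-envelope sketch, assuming a commonly cited US fleet-average factor of roughly 404 g CO2 per mile (an assumption for illustration, not a number taken from the chapter):

```python
# Assumed fleet-average emission factor (g CO2 per mile); illustrative only.
GRAMS_CO2_PER_MILE = 404

def annual_co2_tonnes(odometer_start, odometer_end):
    """Metric tonnes of CO2 implied by one vehicle's year-over-year
    odometer readings, at the assumed fleet-average factor."""
    vmt = odometer_end - odometer_start
    return vmt * GRAMS_CO2_PER_MILE / 1_000_000
```

Summing this per-vehicle quantity over all vehicles registered in a census tract gives the tract-level consumption-side total that the chapter contrasts with road-inventory (production-side) estimates.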
During the past few years, mostly as a result of the GDPR and the CCPA, websites have started to present users with cookie consent banners. These banners are web forms where users can state their preference and declare which cookies they would like to accept, if such an option exists. Although requesting consent before storing any identifiable information is a good first step towards respecting user privacy, previous research has shown that websites do not always respect user choices. Furthermore, considering the ever-decreasing reliance of trackers on cookies and the actions browser vendors take by blocking or restricting third-party cookies, we anticipate a world where stateless tracking emerges, either because trackers or websites do not use cookies, or because users simply refuse to accept any. In this paper, we explore whether websites use more persistent and sophisticated forms of tracking in order to track users who said they do not want cookies. Such forms of tracking include first-party ID leaking, ID synchronization, and browser fingerprinting. Our results suggest that websites use such modern forms of tracking even before users have had the opportunity to register their choice with respect to cookies. To add insult to injury, when users choose to raise their voice and reject all cookies, tracking only intensifies. As a result, users' choices play very little role with respect to tracking: we measured that more than 75% of tracking activities happened before users had the opportunity to make a selection in the cookie consent banner, or when users chose to reject all cookies.
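One of the detection heuristics named above, ID synchronization, can be sketched simply: flag requests whose URL carries a known ID value but goes to a domain other than the one that set it. This is a minimal illustration under assumed data shapes, not the paper's actual detector; a length threshold guards against matching short, non-identifying strings:

```python
from urllib.parse import urlparse

def detect_id_sync(known_ids, requests, min_len=8):
    """known_ids: dict of id_value -> owner_domain (who set the ID).
    requests: list of request URLs observed during a page visit.
    Returns (id_value, owner, receiver) triples where an ID appears in a
    URL sent to a domain other than its owner."""
    leaks = []
    for url in requests:
        host = urlparse(url).hostname or ""
        for id_value, owner in known_ids.items():
            if len(id_value) >= min_len and id_value in url and host != owner:
                leaks.append((id_value, owner, host))
    return leaks
```

The same check applied to first-party cookie values flowing to third parties would cover the first-party ID leaking case.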
Once integrated into clinical care, patient risk stratification models may perform worse compared to their retrospective performance. To date, it is widely accepted that performance will degrade over time due to changes in care processes and patient populations. However, the extent to which this occurs is poorly understood, in part because few researchers report prospective validation performance. In this study, we compare the 2020-2021 (20-21) prospective performance of a patient risk stratification model for predicting healthcare-associated infections to a 2019-2020 (19-20) retrospective validation of the same model. We define the difference in retrospective and prospective performance as the performance gap. We estimate how i) temporal shift, i.e., changes in clinical workflows and patient populations, and ii) infrastructure shift, i.e., changes in access, extraction and transformation of data, both contribute to the performance gap. Applied prospectively to 26,864 hospital encounters during a twelve-month period from July 2020 to June 2021, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.767 (95% confidence interval (CI): 0.737, 0.801) and a Brier score of 0.189 (95% CI: 0.186, 0.191). Prospective performance decreased slightly compared to 19-20 retrospective performance, in which the model achieved an AUROC of 0.778 (95% CI: 0.744, 0.815) and a Brier score of 0.163 (95% CI: 0.161, 0.165). The resulting performance gap was primarily due to infrastructure shift and not temporal shift. So long as we continue to develop and validate models using data stored in large research data warehouses, we must consider differences in how and when data are accessed, measure how these differences may affect prospective performance, and work to mitigate those differences.
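The two metrics reported above are straightforward to compute from first principles, and the performance gap is simply the retrospective minus the prospective value (for AUROC here, 0.778 − 0.767 = 0.011). A minimal pure-Python sketch, using toy data rather than the study's:

```python
def brier_score(y_true, y_prob):
    """Mean squared error between binary outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auroc(y_true, y_prob):
    """Probability a random positive is ranked above a random negative
    (Mann-Whitney formulation; ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

Computing both metrics on the retrospective and prospective cohorts, then differencing, yields the performance gap the study decomposes into temporal and infrastructure shift.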
