أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Paul Schmitt

Understanding Model Drift in a Large Cellular Network

104 - Shinan Liu , Francesco Bronzino , Paul Schmitt 2021

Operational networks are increasingly using machine learning models for a variety of tasks, including detecting anomalies, inferring application performance, and forecasting demand. Accurate models are important, yet accuracy can degrade over time du e to concept drift, whereby either the characteristics of the data change over time (data drift) or the relationship between the features and the target predictor change over time (model drift). Drift is important to detect because changes in properties of the underlying data or relationships to the target prediction can require model retraining, which can be time-consuming and expensive. Concept drift occurs in operational networks for a variety of reasons, ranging from software upgrades to seasonality to changes in user behavior. Yet, despite the prevalence of drift in networks, its extent and effects on prediction accuracy have not been extensively studied. This paper presents an initial exploration into concept drift in a large cellular network in the United States for a major metropolitan area in the context of demand forecasting. We find that concept drift arises largely due to data drift, and it appears across different key performance indicators (KPIs), models, training set sizes, and time intervals. We identify the sources of concept drift for the particular problem of forecasting downlink volume. Weekly and seasonal patterns introduce both high and low-frequency model drift, while disasters and upgrades result in sudden drift due to exogenous shocks. Regions with high population density, lower traffic volumes, and higher speeds also tend to correlate with more concept drift. The features that contribute most significantly to concept drift are User Equipment (UE) downlink packets, UE uplink packets, and Real-time Transport Protocol (RTP) total received packets.

بنية الشبكات والإنترنت التعلم الآلي الأداء

A Pluralist Approach to Democratizing Online Discourse

366 - Jay Chen , Barath Raghavan , Paul Schmitt 2021

Online discourse takes place in corporate-controlled spaces thought by users to be public realms. These platforms in name enable free speech but in practice implement varying degrees of censorship either by government edict or by uneven and unseen co rporate policy. This kind of censorship has no countervailing accountability mechanism, and as such platform owners, moderators, and algorithms shape public discourse without recourse or transparency. Systems research has explored approaches to decentralizing or democratizing Internet infrastructure for decades. In parallel, the Internet censorship literature is replete with efforts to measure and overcome online censorship. However, in the course of designing specialized open-source platforms and tools, projects generally neglect the needs of supportive but uninvolved `average users. In this paper, we propose a pluralistic approach to democratizing online discourse that considers both the systems-related and user-facing issues as first-order design goals.

بنية الشبكات والإنترنت

Characterizing Service Provider Response to the COVID-19 Pandemic in the United States

108 - Shinan Liu , Paul Schmitt , Francesco Bronzino 2020

The COVID-19 pandemic has resulted in dramatic changes to the daily habits of billions of people. Users increasingly have to rely on home broadband Internet access for work, education, and other activities. These changes have resulted in correspondin g changes to Internet traffic patterns. This paper aims to characterize the effects of these changes with respect to Internet service providers in the United States. We study three questions: (1)How did traffic demands change in the United States as a result of the COVID-19 pandemic?; (2)What effects have these changes had on Internet performance?; (3)How did service providers respond to these changes? We study these questions using data from a diverse collection of sources. Our analysis of interconnection data for two large ISPs in the United States shows a 30-60% increase in peak traffic rates in the first quarter of 2020. In particular, we observe traffic downstream peak volumes for a major ISP increase of 13-20% while upstream peaks increased by more than 30%. Further, we observe significant variation in performance across ISPs in conjunction with the traffic volume shifts, with evident latency increases after stay-at-home orders were issued, followed by a stabilization of traffic after April. Finally, we observe that in response to changes in usage, ISPs have aggressively augmented capacity at interconnects, at more than twice the rate of normal capacity augmentation. Similarly, video conferencing applications have increased their network footprint, more than doubling their advertised IP address space.

بنية الشبكات والإنترنت

Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

110 - Francesco Bronzino , Paul Schmitt , Sara Ayoubi 2020

Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, a nd the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.

بنية الشبكات والإنترنت التعلم الآلي

Pretty Good Phone Privacy

59 - Paul Schmitt , Barath Raghavan 2020

To receive service in todays cellular architecture, phones uniquely identify themselves to towers and thus to operators. This is now a cause of major privacy violations, as operators now sell and leak identity and location data of hundreds of million s of mobile users. In this paper, we take an end-to-end perspective on the cellular architecture and find key points of decoupling that enable us to protect user identity and location privacy with no changes to physical infrastructure, no added latency, and no requirement of direct cooperation from existing operators. We describe Pretty Good Phone Privacy (PGPP) and demonstrate how our modified backend stack (NGC) works with real phones to provide ordinary yet privacy-preserving connectivity. We explore inherent privacy and efficiency tradeoffs in a simulation of a large metropolitan region. We show how PGPP maintains todays control overheads while significantly improving user identity and location privacy.

بنية الشبكات والإنترنت

Beyond the Trees: Resilient Multipath for Last-mile WISP Networks

80 - Bilal Saleem , Paul Schmitt , Jay Chen 2020

Expanding the reach of the Internet is a topic of widespread interest today. Google and Facebook, among others, have begun investing substantial research efforts toward expanding Internet access at the edge. Compared to data center networks, which ar e relatively over-engineered, last-mile networks are highly constrained and end up being ultimately responsible for the performance issues that impact the user experience. The most viable and cost-effective approach for providing last-mile connectivity has proved to be Wireless ISPs (WISPs), which rely on point-to-point wireless backhaul infrastructure to provide connectivity using cheap commodity wireless hardware. However, individual WISP network links are known to have poor reliability and the networks as a whole are highly cost constrained. Motivated by these observations, we propose Wireless ISPs with Redundancy (WISPR), which leverages the cost-performance tradeoff inherent in commodity wireless hardware to move toward a greater number of inexpensive links in WISP networks thereby lowering costs. To take advantage of this new path diversity, we introduce a new, general protocol that provides increased performance, reliability, or a combination of the two.

بنية الشبكات والإنترنت

Inferring Streaming Video Quality from Encrypted Traffic: Practical Models and Deployment Experience

79 - Paul Schmitt , Francesco Bronzino , Sara Ayoubi 2019

Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (ie, startup delay and res olution) for encrypted streaming video services. Our paper builds on previous work, but extends it in several ways. First, the model works in deployment settings where the video sessions and segments must be identified from a mix of traffic and the time precision of the collected traffic statistics is more coarse (eg, due to aggregation). Second, we develop a single composite model that works for a range of different services (i.e., Netflix, YouTube, Amazon, and Twitch), as opposed to just a single service. Third, unlike many previous models, the model performs predictions at finer granularity (eg, the precise startup delay instead of just detecting short versus long delays) allowing to draw better conclusions on the ongoing streaming quality. Fourth, we demonstrate the model is practical through a 16-month deployment in 66 homes and provide new insights about the relationships between Internet speed and the quality of the corresponding video streams, for a variety of services; we find that higher speeds provide only minimal improvements to startup delay and resolution.

بنية الشبكات والإنترنت

Oblivious DNS: Practical Privacy for DNS Queries

105 - Paul Schmitt , Anne Edmundson , Nick Feamster 2018

Virtually every Internet communication typically involves a Domain Name System (DNS) lookup for the destination server that the client wants to communicate with. Operators of DNS recursive resolvers---the machines that receive a clients query for a d omain name and resolve it to a corresponding IP address---can learn significant information about client activity. Past work, for example, indicates that DNS queries reveal information ranging from web browsing activity to the types of devices that a user has in their home. Recognizing the privacy vulnerabilities associated with DNS queries, various third parties have created alternate DNS services that obscure a users DNS queries from his or her Internet service provider. Yet, these systems merely transfer trust to a different third party. We argue that no single party ought to be able to associate DNS queries with a client IP address that issues those queries. To this end, we present Oblivious DNS (ODNS), which introduces an additional layer of obfuscation between clients and their queries. To do so, ODNS uses its own authoritative namespace; the authoritative servers for the ODNS namespace act as recursive resolvers for the DNS queries that they receive, but they never see the IP addresses for the clients that initiated these queries. We present an initial deployment of ODNS; our experiments show that ODNS introduces minimal performance overhead, both for individual queries and for web page loads. We design ODNS to be compatible with existing DNS protocols and infrastructure, and we are actively working on an open standard with the IETF.

بنية الشبكات والإنترنت

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد