MOOC participants often feel isolated and disconnected from their peers. Navigating meaningful peer interactions, generating a sense of belonging, and achieving social presence are all major challenges for MOOC platforms. MOOC users often rely on external social platforms for such connection and peer interaction; however, off-platform networking often distracts participants from their learning. To address this issue, we introduce PeerCollab, a web-based platform that provides affordances for creating communities and supports meaningful peer interactions, building close-knit groups of learners. We present an initial evaluation through a field study (n=56) over 6 weeks and a controlled experiment (n=22). The results offer insights into how learners build a sense of belonging and develop peer interactions that lead to close-knit learning circles. We find that PeerCollab can provide more meaningful interactions and create a community that brings a culture of social learning to decentralized and isolated MOOC learners.
Many researchers studying online social communities seek to make such communities better. However, understanding what "better" means is challenging, due to the divergent opinions of community members and the multitude of possible community values, which often conflict with one another. Community members' own values for their communities are not well understood, and how these values align with one another is an open question. Previous research has mostly focused on specific and comparatively well-defined harms within online communities, such as harassment, rule-breaking, and misinformation. In this work, we ask 39 community members on Reddit to describe their values for their communities. We gather 301 responses in members' own words, spanning 125 unique communities, and use iterative categorization to produce a taxonomy of 29 different community values across 9 major categories. We find that members value a broad range of topics, from technical features to the diversity of the community, and most frequently prioritize content quality. We identify important understudied topics such as content quality and community size, highlight where values conflict with one another, and call for research into governance methods for communities that protect vulnerable members.
This paper is the confluence of two streams of ideas in the literature on generating numerical invariants, namely: (1) template-based methods, and (2) recurrence-based methods. A template-based method begins with a template that contains unknown quantities, and finds invariants that match the template by extracting and solving constraints on the unknowns. A disadvantage of template-based methods is that they require fixing the set of terms that may appear in an invariant in advance. This disadvantage is particularly prominent for non-linear invariant generation, because the user must supply maximum degrees on polynomials, bases for exponents, etc. On the other hand, recurrence-based methods are able to find sophisticated non-linear mathematical relations, including polynomials, exponentials, and logarithms, because such relations arise as the solutions to recurrences. However, a disadvantage of past recurrence-based invariant-generation methods is that they are primarily loop-based analyses: they use recurrences to relate the pre-state and post-state of a loop, so it is not obvious how to apply them to a recursive procedure, especially if the procedure is non-linearly recursive (e.g., a tree-traversal algorithm). In this paper, we combine these two approaches and obtain a technique that uses templates in which the unknowns are functions rather than numbers, and the constraints on the unknowns are recurrences. The technique synthesizes invariants involving polynomials, exponentials, and logarithms, even in the presence of arbitrary control flow, including any combination of loops, branches, and (possibly non-linear) recursion. For instance, it is able to show that (i) the time taken by merge-sort is $O(n \log n)$, and (ii) the time taken by Strassen's algorithm is $O(n^{\log_2 7})$.
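The two bounds quoted at the end of the abstract above can be sanity-checked numerically. The sketch below is not the paper's invariant-generation technique; it only evaluates the standard textbook cost recurrences for the two algorithms and compares them with closed forms. The recurrences T(n) = 2T(n/2) + n and T(n) = 7T(n/2) + n^2, the base cases, the restriction to powers of two, and all function names are assumptions introduced for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def merge_sort_cost(n: int) -> int:
    """Assumed cost model: T(n) = 2*T(n/2) + n, with T(1) = 0 (n a power of two)."""
    if n <= 1:
        return 0
    return 2 * merge_sort_cost(n // 2) + n


@lru_cache(maxsize=None)
def strassen_cost(n: int) -> int:
    """Assumed cost model: T(n) = 7*T(n/2) + n^2, with T(1) = 1 (n a power of two)."""
    if n <= 1:
        return 1
    return 7 * strassen_cost(n // 2) + n * n


def closed_forms_hold(max_exp: int = 12) -> bool:
    """Compare both recurrences with their closed forms for n = 2^1 .. 2^max_exp."""
    for k in range(1, max_exp + 1):
        n = 2 ** k
        # Merge sort: T(n) = n * log2(n), i.e. n * k when n = 2^k.
        if merge_sort_cost(n) != n * k:
            return False
        # Strassen: T(n) = (7/3) * 7^k - (4/3) * 4^k, where 7^k = n^(log2 7).
        # Multiplied through by 3 to stay in exact integer arithmetic.
        if 3 * strassen_cost(n) != 7 * 7 ** k - 4 * 4 ** k:
            return False
    return True
```

Because 7^k equals n^{log_2 7} when n = 2^k, the second closed form is dominated by a constant multiple of n^{log_2 7}, which matches the stated bound under these assumed cost models.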
Learning data storytelling involves a complex web of skills. Professional and academic educational offerings typically focus on the computational literacies required, but professionals in the field employ many non-technical methods; sketching by hand on paper is a common practice. This paper introduces and classifies a corpus of 101 data sketches produced by participants as part of a guided learning activity in informal and formal settings. We manually code each sketch against 12 metrics related to visual encodings, representations, and story structure. We find evidence for preferential use of positional and shape-based encodings, frequent use of symbolic and textual representations, and a high prevalence of stories comparing subsets of data. These findings contribute to our understanding of how learners sketch with data. This case study can inform tool design for learners, and help create educational programs that introduce novices to sketching practices used by experts.
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms.
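As an illustration of the local (randomized response) model mentioned in the abstract above, here is a minimal sketch of the classic binary randomized-response mechanism with a debiased mean estimate. It is not taken from the paper: the function names, the choice of reporting the true bit with probability e^ε / (1 + e^ε), and the simulated data are assumptions made for this example. The noisy population average it recovers is the kind of aggregate a statistical query returns, which gives some intuition for the stated equivalence with the SQ model.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else the flipped bit.

    The ratio of the two report probabilities is e^eps, so each individual
    report satisfies eps-differential privacy in the local model.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_mean(reports: list, epsilon: float) -> float:
    """Debias the average of the noisy reports.

    If mu is the true mean, E[report] = (2p - 1) * mu + (1 - p),
    so mu is recovered as (observed - (1 - p)) / (2p - 1).
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(true_mean: float, n: int, epsilon: float, seed: int = 0) -> float:
    """Simulate n users with hypothetical private bits and return the estimate."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < true_mean else 0 for _ in range(n)]
    reports = [randomized_response(b, epsilon, rng) for b in bits]
    return estimate_mean(reports, epsilon)
```

For example, `simulate(0.3, 20000, 1.0)` should return a value close to 0.3; the estimate becomes noisier as `epsilon` shrinks, reflecting the usual trade-off between privacy and accuracy.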
In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference, prediction, or decision problem. In such circumstances, it is convenient to use a graphical model to represent the statistical dependencies, via a set of connected modules, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the estimate and update of others, often in unpredictable ways. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of approaches that restrict the information propagation between modules, for example by restricting propagation to only particular directions along the edges of the graph. In this article, we investigate why these modular approaches might be preferable to the full model in misspecified settings. We propose principled criteria to choose between modular and full-model approaches. The question arises in many applied settings, including large stochastic dynamical systems, meta-analysis, epidemiological models, air pollution models, pharmacokinetics-pharmacodynamics, and causal inference with propensity scores.