Instance-Optimality in the Noisy Value-and Comparison-Model --- Accept, Accept, Strong Accept: Which Papers get in?

203 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Frederik Mallmann-Trenn

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Vincent Cohen-Addad - Frederik Mallmann-Trenn - Claire Mathieu

بنى وهياكل البيانات والخوارزميات قواعد البيانات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Motivated by crowdsourced computation, peer-grading, and recommendation systems, Braverman, Mao and Weinberg [STOC16] studied the emph{query} and emph{round} complexity of fundamental problems such as finding the maximum (textsc{max}), finding all elements above a certain value (textsc{threshold-$v$}) or computing the top$-k$ elements (textsc{Top}-$k$) in a noisy environment. For example, consider the task of selecting papers for a conference. This task is challenging due the crowdsourcing nature of peer reviews: the results of reviews are noisy and it is necessary to parallelize the review process as much as possible. We study the noisy value model and the noisy comparison model: In the emph{noisy value model}, a reviewer is asked to evaluate a single element: What is the value of paper $i$? (eg accept). In the emph{noisy comparison model} (introduced in the seminal work of Feige, Peleg, Raghavan and Upfal [SICOMP94]) a reviewer is asked to do a pairwise comparison: Is paper $i$ better than paper $j$? In this paper, we show optimal worst-case query complexity for the textsc{max},textsc{threshold-$v$} and textsc{Top}-$k$ problems. For textsc{max} and textsc{Top}-$k$, we obtain optimal worst-case upper and lower bounds on the round vs query complexity in both models. For textsc{threshold}-$v$, we obtain optimal query complexity and nearly-optimal round complexity, where $k$ is the size of the output) for both models. We then go beyond the worst-case and address the question of the importance of knowledge of the instance by providing, for a large range of parameters, instance-optimal algorithms with respect to the query complexity. Furthermore, we show that the value model is strictly easier than the comparison model.

قيم البحث

660 - Aaron S. Hoffer , Megan Donahue , Amalia Hicks 2012

We present IR and UV photometry for a sample of brightest cluster galaxies (BCGs). The BCGs are from a heterogeneous but uniformly characterized sample, the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT), of X-ray galaxy clusters from the Chandra X-ray telescope archive with published gas temperature, density, and entropy profiles. We use archival GALEX, Spitzer, and 2MASS observations to assemble spectral energy distributions (SEDs) and colors for BCGs. We find that while the SEDs of some BCGs follow the expectation of red, dust-free old stellar populations, many exhibit signatures of recent star formation in the form of excess UV or mid-IR emission, or both. We establish a mean near-UV to 2MASS K color of 6.59 pm 0.34 for quiescent BCGs. We use this mean color to quantify the UV excess associated with star formation in the active BCGs. We use fits to a template of an evolved stellar population and library of starburst models and mid-IR star formation relations to estimate the obscured star formation rates. Many of the BCGs in X-ray clusters with low central gas entropy exhibit enhanced UV (38%) and mid-IR emission (43%), above that expected from an old stellar population. These excesses are consistent with on-going star formation activity in the BCG, star formation that appears to be enabled by the presence of high density, X-ray emitting gas in the the core of the cluster of galaxies. This hot, X-ray emitting gas may provide the enhanced ambient pressure and some of the fuel to trigger the star formation. This result is consistent with previous works that showed that BCGs in clusters with low central gas entropy host H{alpha} emission-line nebulae and radio sources, while clusters with high central gas entropy exhibit none of these features. UV and mid-IR measurements combined provide a complete picture of unobscured and obscured star formation occurring in these systems.

علم الكونيات والفيزياء الفلكية Nongalactic

DART: aDaptive Accept RejecT for non-linear top-K subset identification

398 - Mridul Agarwal , Vaneet Aggarwal , Christopher J. Quinn 2020

We consider the bandit problem of selecting $K$ out of $N$ arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among $binom {N}{K}$ options, making the action space large. To simplify the problem, existing works on combinatorial bandits {typically} assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-$K$ subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in $N$. Further, DART achieves a regret bound of $tilde{mathcal{O}}(Ksqrt{KNT})$ for a time horizon $T$, which matches the lower bound in bandit feedback up to a factor of $sqrt{log{2NT}}$. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.

التعلم الآلي التعلم الالي

X-ray calibration of SZ scaling relations with the ACCEPT catalogue of galaxy clusters observed by Chandra

342 - B. Comis 2011

We explore the scaling relation between the flux of the Sunyaev-Zeldovich (SZ) effect and the total mass of galaxy clusters using already reduced Chandra X-ray data present in the ACCEPT (Archive of Chandra Cluster Entropy Profile Tables) catalogue. The analysis is conducted over a sample of 226 objects, examining the relatively small scale corresponding to a cluster overdensity equal to 2500 times the critical density of the background universe, at which the total masses have been calculated exploiting the hydrostatic equilibrium hypothesis. Core entropy (K0) is strongly correlated with the central cooling time, and is therefore used to identify cooling-core (CC) objects in our sample. Our results confirm the self-similarity of the scaling relation between the integrated Comptonization parameter (Y) and the cluster mass, for both CC and NCC (non-cooling-core) clusters. The consistency of our calibration with recent ones has been checked, with further support for Y as a good mass proxy. We also investigate the robustness of the constant gas fraction assumption, for fixed overdensity, and of the Yx proxy (Kravstov et al. 2007) considering CC and NCC clusters, again sorted on K0 from our sample. We extend our study to implement a K0-proxy, obtained by combining SZ and X-ray observables, which is proposed to provide a CC indicator for higher redshift objects. Finally, we suggest that an SZ-only CC indicator could benefit from the employment of deprojected Comptonization radial profiles.

علم الكونيات والفيزياء الفلكية Nongalactic

How to Design Robust Algorithms using Noisy Comparison Oracle

100 - Raghavendra Addanki , Sainyam Galhotra , Barna Saha 2021

Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as $k$-center clustering and agglomerative hierarchical clustering. These techniques crucially re ly on accurate estimation of pairwise distance between records. However, computing exact features of the records, and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak supervision in the form of a comparison oracle that compares the relative distance between the queried points such as `Is point u closer to v or w closer to x?. However, it is possible that some queries are easier to answer than others using a comparison oracle. We capture this by introducing two different noise models called adversarial and probabilistic noise. In this paper, we study various problems that include finding maximum, nearest/farthest neighbor search under these noise models. Building upon the techniques we develop for these comparison operations, we give robust algorithms for k-center clustering and agglomerative hierarchical clustering. We prove that our algorithms achieve good approximation guarantees with a high probability and analyze their query complexity. We evaluate the effectiveness and efficiency of our techniques empirically on various real-world datasets.

بنى وهياكل البيانات والخوارزميات قواعد البيانات التعلم الالي

Instance Optimal Join Size Estimation

78 - Mahmoud Abo-Khamis , Sungjin Im , Benjamin Moseley 2020

We consider the problem of efficiently estimating the size of the inner join of a collection of preprocessed relational tables from the perspective of instance optimality analysis. The run time of instance optimal algorithms is comparable to the mini mum time needed to verify the correctness of a solution. Previously instance optimal algorithms were only known when the size of the join was small (as one component of their run time that was linear in the join size). We give an instance optimal algorithm for estimating the join size for all instances, including when the join size is large, by removing the dependency on the join size. As a byproduct, we show how to sample rows from the join uniformly at random in a comparable amount of time.

بنى وهياكل البيانات والخوارزميات قواعد البيانات