Predicting dataset popularity for the CMS experiment

Published by Valentin Kuznetsov
Publication date: 2016
Research field: Physics
Language: English





The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provides the foundation of a data-driven approach for the CMS computing infrastructure.




Read also

N. J. Ayres, G. Ban, G. Bison (2019)
Psychological bias towards, or away from, a prior measurement or a theory prediction is an intrinsic threat to any data analysis. While various methods can be used to avoid the bias, e.g. actively not looking at the result, only data blinding is a traceable and thus trustworthy method to circumvent the bias and to convince a public audience that there is not even an accidental psychological bias. Data blinding is nowadays a standard practice in particle physics, but it is particularly difficult for experiments searching for the neutron electric dipole moment, as several cross measurements, in particular of the magnetic field, create a self-consistent network into which it is hard to inject a fake signal. We present an algorithm that modifies the data without influencing the experiment. Results of an automated analysis of the data are used to change the recorded spin state of a few neutrons of each measurement cycle. The flexible algorithm is applied twice to the data, to provide different data to various analysis teams. This gives us the option to sequentially apply various blinding offsets for separate analysis steps with independent teams. The subtle modification of the data allows us to modify the algorithm and to produce a re-blinded data set without revealing the blinding secret. The method was designed for the 2015/2016 measurement campaign of the nEDM experiment at the Paul Scherrer Institute. However, it can be re-used with minor modifications for the follow-up experiment n2EDM, and may be suitable for comparable efforts.
CERN IT provides a set of Hadoop clusters featuring more than 5 PBytes of raw storage with different open-source, user-level tools available for analytical purposes. The CMS experiment started collecting a large set of computing meta-data, e.g. dataset and file access logs, in 2015. These records represent a valuable, yet scarcely investigated, set of information that needs to be cleaned, categorized and analyzed. CMS can use this information to discover useful patterns and enhance the overall efficiency of the distributed data, and to improve CPU and site utilization as well as task completion time. Here we present an evaluation of the Apache Spark platform for CMS needs. We discuss two main use cases, CMS analytics and ML studies, where efficient processing of billions of records stored on HDFS plays an important role. We demonstrate that both the Scala and Python (PySpark) APIs can be successfully used to execute extremely I/O intensive queries and provide valuable data insight from the collected meta-data.
The Indian Scintillator Matrix for Reactor Anti-Neutrino detection (ISMRAN) experiment aims to detect electron anti-neutrinos ($\bar{\nu}_e$) emitted from a reactor via the inverse beta decay (IBD) reaction. The setup, consisting of a 1 ton segmented array of plastic scintillators wrapped in Gadolinium foil, is planned for remote reactor monitoring and sterile neutrino searches. The detection of the prompt positron and the delayed neutron from IBD will provide the signature of a $\bar{\nu}_e$ event in ISMRAN. The number of segments with an energy deposit ($\mathrm{N_{bars}}$) and the sum total of these deposited energies are used as discriminants for identifying the prompt positron event and the delayed neutron capture event. However, a simple cut-based selection on these variables leads to a low $\bar{\nu}_e$ signal detection efficiency because the $\mathrm{N_{bars}}$ and sum energy distributions of the prompt and delayed events overlap. Multivariate analysis (MVA) tools, employing variables suitably tuned for discrimination, can be useful in such scenarios. In this work we report the results from an application of an artificial neural network, the multilayer perceptron (MLP), and in particular its Bayesian extension (MLPBNN), to the simulated signal and background events in ISMRAN. The results from applying the MLP to separate prompt positron events from delayed neutron capture events on Hydrogen and Gadolinium nuclei, and also from the typical reactor $\gamma$-ray and fast neutron backgrounds, are reported. An enhanced efficiency of $\sim$91$\%$ with a background rejection of $\sim$73$\%$ for the prompt selection, and an efficiency of $\sim$89$\%$ with a background rejection of $\sim$71$\%$ for the delayed capture event, is achieved using the MLPBNN classifier for the ISMRAN experiment.
S. Buontempo (2018)
We discuss a CMS eXtension for Studying Energetic Neutrinos (CMS-XSEN). Neutrinos at the LHC are abundant and have unique features: their energies reach out to the TeV range, and the contribution of the $\tau$ flavour is sizeable. The measurement of their interaction cross sections has much physics potential. The pseudorapidity range $4<|\eta|<5$ is of particular interest since leptonic W decays provide an additional contribution to the neutrino flux from b and c production. A modest detector of $4.1\times10^{27}$ nucleons/cm$^{2}$, placed in the LHC tunnel, 25 m from the interaction point, around the focusing magnet (Q1) closest to CMS, can cover that region. The hadronic calorimeter HF and the CMS forward shield will protect it from the debris of pp collisions. With a luminosity of 300/fb, foreseen for the LHC run in the years 2021-2023, the detector can observe over a thousand $\tau$ neutrino interactions, and a hundred TeV-neutrino interactions of all flavours. Several backgrounds are considered. A major source can be prompt muons from the interaction point. However, the CMS magnetic field and the structure of the Forward Shield make the estimation of their flux in the location of interest uncertain. Besides, machine-induced backgrounds are expected to vary rapidly while moving along and away from the beam line. We propose to acquire experience during the 2018 LHC run by a brief test with a small Neutrino Experiment Demonstrator, based on nuclear emulsions.
GERDA is an experiment searching for the neutrinoless $\beta\beta$ decay of Ge-76. The experiment uses an array of high-purity germanium detectors, enriched in Ge-76, directly immersed in liquid argon. GERDA recently started the physics data taking using eight enriched coaxial detectors. The status of the experiment has to be closely monitored in order to promptly identify possible instabilities or problems. The on-line slow control system is complemented by a regular off-line monitoring of data quality. This ensures that data are qualified to be used in the physics analysis and allows the rejection of data sets which do not meet the minimum quality standards. The off-line data monitoring is entirely performed within the software framework GELATIO. In addition, a relational database, complemented by a web-based interface, was developed to support the off-line monitoring and to automatically provide information for the daily assessment of data quality. The concept and the performance of the off-line monitoring tools were tested and validated during the one-year commissioning phase.