We discuss the traditional criterion for discovery in particle physics, which requires a significance of at least 5 sigma, and whether a more nuanced approach might be better.
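As a point of reference for the 5 sigma criterion, the short sketch below converts between a significance in Gaussian standard deviations and a one-sided tail probability. The one-sided convention and the function names `p_value_from_sigma` and `sigma_from_p_value` are assumptions made for illustration; they are not taken from the paper.

```python
import math
from statistics import NormalDist


def p_value_from_sigma(z):
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sigma_from_p_value(p):
    """Inverse conversion: significance in sigma for a one-sided p-value."""
    return -NormalDist().inv_cdf(p)


# With the one-sided convention, 5 sigma corresponds to p of about 2.87e-7.
p_five_sigma = p_value_from_sigma(5.0)
z_back = sigma_from_p_value(p_five_sigma)
```

With a two-sided convention the p-value would be twice as large, so the convention should be stated whenever a significance is quoted.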
High energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle these data and make analysis at universities possible. We examine some techniques that exploit recent developments in commodity hardware. We report on tests of redundant arrays of integrated drive electronics (IDE) disk drives for use in offline high energy physics data analysis. The price per terabyte of an IDE redundant array of inexpensive disks (RAID) is now lower than that of million-dollar tape robots. The arrays can be scaled to sizes affordable to institutions without robots and used when fast random access at low cost is important.
Differential measurements of particle collisions or decays can provide stringent constraints on physics beyond the Standard Model of particle physics. In particular, the distributions of the kinematical and angular variables that characterise heavy meson multibody decays are nontrivial and can be a signature of the underlying interaction physics. In the era of high luminosity opened by the advent of the Large Hadron Collider and of Flavor Factories, differential measurements are less and less dominated by statistical precision and require a precise determination of efficiencies that depend simultaneously on several variables and do not factorise in these variables. This document is a reflection on the potential of multivariate techniques for the determination of such multidimensional efficiencies. We carried out two case studies that show that multilayer perceptron neural networks can determine and correct for the distortions introduced by reconstruction and selection criteria in the multidimensional phase space of the decays $B^{0}\rightarrow K^{*0}(\rightarrow K^{+}\pi^{-})\mu^{+}\mu^{-}$ and $D^{0}\rightarrow K^{-}\pi^{+}\pi^{+}\pi^{-}$, at the price of a minimal analysis effort. We conclude that this method can already be used for measurements whose statistical precision does not yet reach the percent level and that, with more sophisticated machine learning methods, the potential of the approach is very promising.
In this paper, after a discussion of general properties of statistical tests, we present the construction of the most powerful hypothesis test for determining the existence of a new phenomenon in counting-type experiments where the observed Poisson process is subject to a Poisson distributed background with unknown mean.
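To make the setting concrete, here is a minimal sketch of one common way to test for a signal when the background mean is unknown. It assumes that an auxiliary background-only measurement is available: `n_off` events in a control region whose expected background is `tau` times that of the signal region. Conditioning on the total count then gives a binomial tail probability. This on/off setup, the parameter `tau`, and the function names are assumptions made for illustration and are not claimed to be the construction presented in the paper.

```python
import math


def binomial_tail(n_min, n_total, prob):
    """P(X >= n_min) for X ~ Binomial(n_total, prob)."""
    return sum(
        math.comb(n_total, k) * prob**k * (1.0 - prob) ** (n_total - k)
        for k in range(n_min, n_total + 1)
    )


def on_off_p_value(n_on, n_off, tau):
    """Conditional p-value for the background-only hypothesis.

    n_on  : events counted in the signal region
    n_off : events counted in the background-only control region
    tau   : ratio of expected background in the control region to that in
            the signal region (hypothetical input, assumed known exactly)

    Under the null hypothesis, given n_on + n_off, the count n_on follows
    a binomial distribution with success probability 1 / (1 + tau), which
    does not depend on the unknown background mean.
    """
    prob = 1.0 / (1.0 + tau)
    return binomial_tail(n_on, n_on + n_off, prob)


# Example with made-up counts: 20 events observed, control region of equal size.
p_example = on_off_p_value(n_on=20, n_off=10, tau=1.0)
```

The conditioning step is what removes the dependence on the unknown background mean; the resulting p-value can be converted to a significance in the usual way.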
An algorithm for optimization of signal significance or any other classification figure of merit suited for analysis of high energy physics (HEP) data is described. This algorithm trains decision trees on many bootstrap replicas of training data with each tree required to optimize the signal significance or any other chosen figure of merit. New data are then classified by a simple majority vote of the built trees. The performance of this algorithm has been studied using a search for the radiative leptonic decay B->gamma l nu at BaBar and shown to be superior to that of all other attempted classifiers including such powerful methods as boosted decision trees. In the B->gamma e nu channel, the described algorithm increases the expected signal significance from 2.4 sigma obtained by an original method designed for the B->gamma l nu analysis to 3.0 sigma.
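The procedure described above (bootstrap replicas, a learner per replica that optimizes the figure of merit directly, then a majority vote) can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm of the paper: single-cut decision stumps stand in for full decision trees, the figure of merit is taken to be S/sqrt(S+B), and the toy Gaussian data set and all function names are assumptions introduced here.

```python
import math
import random


def significance(s, b):
    """Figure of merit S / sqrt(S + B) for s signal and b background events."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def fit_stump(sample):
    """Find the cut (feature, threshold, sign) that maximizes the significance
    of the events satisfying sign * (x[feature] - threshold) >= 0."""
    best_z, best_cut = -1.0, (0, 0.0, 1)
    for j in range(len(sample[0][0])):
        for x, _ in sample:
            for sign in (1, -1):
                s = b = 0
                for f, label in sample:
                    if sign * (f[j] - x[j]) >= 0:
                        if label == 1:
                            s += 1
                        else:
                            b += 1
                z = significance(s, b)
                if z > best_z:
                    best_z, best_cut = z, (j, x[j], sign)
    return best_cut


def fit_bagged_stumps(data, n_stumps, rng):
    """Train one stump on each bootstrap replica of the training data."""
    stumps = []
    for _ in range(n_stumps):
        replica = rng.choices(data, k=len(data))  # sample with replacement
        stumps.append(fit_stump(replica))
    return stumps


def predict(stumps, x):
    """Classify x as signal (1) or background (0) by simple majority vote."""
    votes = sum(1 for j, t, sign in stumps if sign * (x[j] - t) >= 0)
    return 1 if 2 * votes > len(stumps) else 0


def make_toy_data(rng, n_per_class=60):
    """Toy sample: feature 0 separates the classes, feature 1 is pure noise."""
    signal = [((rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)), 1)
              for _ in range(n_per_class)]
    background = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0)
                  for _ in range(n_per_class)]
    return signal + background


rng = random.Random(12345)
toy = make_toy_data(rng)
stumps = fit_bagged_stumps(toy, n_stumps=11, rng=rng)
label_far_signal = predict(stumps, (5.0, 0.0))
label_far_background = predict(stumps, (-3.0, 0.0))
```

The key difference from conventional bagging is the split criterion: each learner is tuned on the analysis figure of merit itself rather than on a generic impurity measure. An odd number of learners avoids ties in the vote.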
The search for new particles of unknown mass poses the challenge of exploring a wide interval to look for the usual signature, an excess of events above the background. A side effect of such a broad-range search is that the traditional significance calculations, valid for signals of known location, are no longer applicable when such information is missing. In this note the specific signal search approach via observation windows sliding over the range of interest is considered; under the assumptions of known background and of fixed width of the exploring windows, the statistical implications of such a search scheme are described, with special emphasis on the correct significance assessment for a claimed discovery.
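The difference between the local significance of the most extreme window and the significance of the search as a whole can be illustrated with background-only pseudo-experiments. The sketch below assumes a binned spectrum with a known, flat background of `bkg_per_bin` expected events per bin and a window spanning a fixed number of bins. The binning, the numerical values, and the function names are hypothetical choices made for illustration; the note itself may treat the problem analytically.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_tail(n, mean):
    """P(N >= n) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    cdf = 0.0
    for k in range(n):
        cdf += term
        term *= mean / (k + 1)
    return max(0.0, 1.0 - cdf)


def max_window_count(counts, width):
    """Largest event count found by a window of `width` bins sliding bin by bin."""
    return max(sum(counts[i:i + width]) for i in range(len(counts) - width + 1))


def global_p_value(n_obs, n_bins, width, bkg_per_bin, n_toys, rng):
    """Fraction of background-only toys in which some window holds >= n_obs events."""
    hits = 0
    for _ in range(n_toys):
        toy = [poisson_sample(rng, bkg_per_bin) for _ in range(n_bins)]
        if max_window_count(toy, width) >= n_obs:
            hits += 1
    return hits / n_toys


rng = random.Random(2024)
n_obs, width, bkg_per_bin, n_bins = 10, 4, 1.0, 40

# Local p-value: the window location is treated as if it were fixed in advance.
p_local = poisson_tail(n_obs, width * bkg_per_bin)
# Global p-value: accounts for the freedom to place the window anywhere.
p_global = global_p_value(n_obs, n_bins, width, bkg_per_bin, n_toys=2000, rng=rng)
```

Because the window width and the background are the same everywhere in this toy setup, the most significant window is simply the one with the largest count, so comparing maximum counts is equivalent to comparing minimum local p-values. The overlapping windows are correlated, which is why the global p-value is smaller than the local p-value multiplied by the number of window positions.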