Fast data sorting with modified principal component analysis to distinguish unique single molecular break junction trajectories


Abstract in English

A simple and fast analysis method for sorting large data sets into groups with shared distinguishing characteristics is described and applied to single molecular break junction data of conductance versus electrode displacement. The method, based on principal component analysis, successfully sorted data sets according to the projection of the data onto the first or second principal component of the correlation matrix, without the need to assert any specific hypothesis about the features expected in the data. This is an improvement on the current correlation matrix analysis approach because the data are sorted automatically, which makes the analysis more objective and less time-consuming; the method is also applicable to a wide range of multivariate data sets. The method was demonstrated on two systems. First, it was applied to mixtures of two molecules with identical anchor groups and similar lengths but with either a $\pi$ (high conductance) or a $\sigma$ (low conductance) bridge. The mixed data were automatically sorted into two groups, each containing one molecule or the other. Second, it was applied to break junction data measured with the $\pi$-bridged molecule alone. Again the method distinguished two groups, which were tentatively assigned to different geometries of the molecule in the junction.
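The sorting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no details of the modification to principal component analysis, so the sketch assumes plain PCA on the correlation matrix of standardised traces and assumes that a trace is assigned to a group by the sign of its projection. The function names `sort_by_principal_component` and `make_demo_traces`, the bin layout, and the synthetic data are all hypothetical and serve only as an example.

```python
import numpy as np


def sort_by_principal_component(traces, component=0):
    """Split traces into two groups by the sign of their projection onto one
    principal component of the correlation matrix (assumed sorting rule).

    traces: array of shape (n_traces, n_bins), one row per trace.
    component: 0 for the first principal component, 1 for the second.
    """
    X = np.asarray(traces, dtype=float)
    # Standardise each column (bin) so that Z.T @ Z / n is the correlation matrix.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant bins at zero instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    corr = Z.T @ Z / X.shape[0]
    # eigh returns eigenvalues in ascending order; reorder so the largest is first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    pc = eigvecs[:, order[component]]
    # Projection of every trace onto the chosen principal component.
    scores = Z @ pc
    # The sign of an eigenvector is arbitrary, so label 0/1 carries no meaning
    # beyond "same group" versus "other group".
    labels = (scores > 0).astype(int)
    return labels, scores


def make_demo_traces(n_per_group=100, n_bins=50, seed=0):
    """Synthetic stand-in for a two-molecule mixture (hypothetical data):
    each group has extra signal in a different range of bins, plus noise."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, (n_per_group, n_bins))
    b = rng.normal(0.0, 1.0, (n_per_group, n_bins))
    a[:, 10:20] += 5.0  # feature present only in group A
    b[:, 30:40] += 5.0  # feature present only in group B
    return np.vstack([a, b])
```

Calling `labels, scores = sort_by_principal_component(make_demo_traces())` returns one label per trace. In this synthetic case the two feature ranges are anticorrelated across traces, so the first principal component carries opposite signs on them and the projections fall into two well-separated clusters. Real break junction data would first have to be reduced to one fixed-length vector per trace, a step the abstract does not specify.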
