Physics-based analysis of Affymetrix microarray data


Abstract in English

We analyze publicly available data from Affymetrix microarray spike-in experiments on the human HGU133 chip set, in which sequences are added in solution at known concentrations. The spike-in set contains sequences of bacterial, human and artificial origin. Our analysis is based on a recently introduced molecular-based model [E. Carlon and T. Heim, Physica A 362, 433 (2006)], which takes into account both probe-target hybridization and partial target-target hybridization in solution. The hybridization free energies are obtained from the nearest-neighbor model with experimentally determined parameters. The model suggests a rescaling under which the data at different concentrations should collapse onto a single universal curve. We indeed find such a collapse, with the same parameters as obtained previously for the older HGU95 chip set. The quality of the collapse varies with the probe set considered. Artificial sequences, chosen by Affymetrix to be as different as possible from any human genome sequence, generally show a much better collapse, and thus better agreement with the model, than all other sequences. This suggests that the observed deviations from the predicted collapse are related to the choice of probes or have a biological origin, rather than reflecting a problem with the proposed model.
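
To illustrate what a data collapse under rescaling means, the following is a minimal sketch that assumes a plain Langmuir isotherm, I = A x / (1 + x) with x = c exp(-ΔG / RT). This is an illustrative simplification and not the exact form of the cited model: it omits the target-target hybridization in solution, and the function names, the amplitude A, the temperature of 45 °C and the free energy values are assumptions chosen only for the example.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def rescaled_variable(conc, delta_g, temp=318.15):
    """Rescaled variable x = c * exp(-delta_g / (R T)).

    conc    : target concentration in mol/L (assumed units)
    delta_g : hybridization free energy in kcal/mol (negative for a stable duplex)
    temp    : temperature in kelvin (45 C assumed here)
    """
    return conc * math.exp(-delta_g / (R * temp))


def langmuir_intensity(conc, delta_g, temp=318.15, amplitude=1.0):
    """Langmuir isotherm I = A x / (1 + x); an assumed, simplified signal model."""
    x = rescaled_variable(conc, delta_g, temp)
    return amplitude * x / (1.0 + x)


# Two hypothetical probes at different concentrations. The second free energy
# is chosen so that both share the same rescaled variable x.
c1, dg1 = 1e-12, -16.0
c2 = 8e-12
dg2 = dg1 + R * 318.15 * math.log(c2 / c1)

x1, x2 = rescaled_variable(c1, dg1), rescaled_variable(c2, dg2)
i1, i2 = langmuir_intensity(c1, dg1), langmuir_intensity(c2, dg2)
print(f"x1={x1:.4g}, x2={x2:.4g}, I1={i1:.4g}, I2={i2:.4g}")
```

In this simplified picture, intensities measured at different concentrations and with different free energies fall on the same curve once plotted against x, which is the sense in which a collapse is used above.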
