We describe and discuss the selection procedure and statistical properties of the galaxy sample used by the Calar Alto Legacy Integral Field Area Survey (CALIFA), a public legacy survey of 600 galaxies using integral field spectroscopy. The CALIFA mother sample was selected from the Sloan Digital Sky Survey (SDSS) DR7 photometric catalogue to include all galaxies with an r-band isophotal major axis between 45 and 79.2 arcsec and with a redshift 0.005 < z < 0.03. The mother sample contains 939 objects, 600 of which will be observed in the course of the CALIFA survey. The selection of targets for observation is based solely on visibility and thus preserves the statistical properties of the mother sample. By comparison with a large set of SDSS galaxies, we find that the CALIFA sample is representative of galaxies over a luminosity range of -19 > M_r > -23.1 and over a stellar mass range between 10^9.7 and 10^11.4 Msun. In particular, within these ranges, the diameter selection does not lead to any significant bias against, or in favour of, intrinsically large or small galaxies. Only at luminosities fainter than M_r = -19 (or stellar masses < 10^9.7 Msun) is there a prevalence of galaxies with larger isophotal sizes, especially of nearly edge-on late-type galaxies, but such galaxies form < 10% of the full sample. We estimate volume-corrected distribution functions in luminosities and sizes and show that these are statistically fully compatible with estimates from the full SDSS when accounting for large-scale structure. We also present a number of value-added quantities determined for the galaxies in the CALIFA sample. We explore different ways of characterizing the environments of CALIFA galaxies, finding that the sample covers environmental conditions from the field to genuine clusters. Finally, we consider the expected incidence of active galactic nuclei among CALIFA galaxies.
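
The two cuts that define the mother sample can be illustrated with a minimal sketch. This is not the survey's actual selection code: the `Galaxy` record, its field names, and the helper functions are hypothetical, the size limits are taken here to be in arcsec, and treating the size limits as inclusive is an assumption (the redshift limits follow the strict inequalities quoted above).

```python
from dataclasses import dataclass

# Selection limits quoted in the abstract.
ISO_A_MIN, ISO_A_MAX = 45.0, 79.2  # r-band isophotal major axis (assumed arcsec)
Z_MIN, Z_MAX = 0.005, 0.03         # redshift window, strict inequalities


@dataclass
class Galaxy:
    """Hypothetical catalogue record; field names are illustrative only."""
    name: str
    iso_major: float  # r-band isophotal major axis
    z: float          # redshift


def in_mother_sample(g: Galaxy) -> bool:
    """Apply the size and redshift cuts (size limits assumed inclusive)."""
    size_ok = ISO_A_MIN <= g.iso_major <= ISO_A_MAX
    z_ok = Z_MIN < g.z < Z_MAX
    return size_ok and z_ok


def select(catalogue):
    """Return the galaxies that pass both cuts, preserving input order."""
    return [g for g in catalogue if in_mother_sample(g)]
```

Because both cuts are applied to apparent quantities (angular size and redshift), the resulting sample is diameter-limited rather than volume-limited, which is why volume corrections are needed before comparing distribution functions with the full SDSS.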