Euclid is a European-led cosmology space mission dedicated to a visible and near-infrared survey of the entire extragalactic sky. Its purpose is to deepen our knowledge of the dark content of our Universe. After an overview of the Euclid mission and science, this contribution describes how the community is getting organized to face the data analysis challenges, both in software development and in operational data processing. It ends with a more specific account of some of the main contributions of the Swiss Science Data Center (SDC-CH).
The Euclid, Rubin/LSST and Roman (WFIRST) projects will undertake flagship optical/near-infrared surveys in the next decade. By mapping thousands of square degrees of sky and covering the electromagnetic spectrum between 0.3 and 2 microns with sub-arcsec resolution, these projects will detect several tens of billions of sources, enable a wide range of astrophysical investigations by the astronomical community and provide unprecedented constraints on the nature of dark energy and dark matter. The ultimate cosmological, astrophysical and time-domain science yield from these missions will require joint survey processing (JSP) functionality at the pixel level that is outside the scope of the individual survey projects. The JSP effort scoped here serves two high-level objectives: 1) provide precise concordance multi-wavelength images and catalogs over the entire sky area where these surveys overlap, which account for source confusion and mismatched isophotes, and 2) provide a science platform to analyze concordance images and catalogs to enable a wide range of astrophysical science goals to be formulated and addressed by the research community. For the cost of about 200 WY, JSP will allow the U.S. (and international) astronomical community to manipulate the flagship data sets and undertake innovative science investigations ranging from solar system object characterization, exoplanet detections, nearby galaxy rotation rates and dark matter properties, to epoch of reionization studies. It will also allow for the ultimate constraints on cosmological parameters and the nature of dark energy, with far smaller uncertainties and a better handle on systematics than by any one survey alone.
Gaia is an ambitious ESA space astrometry mission whose main objective is to map the sky in astrometry and photometry down to magnitude 20 by the end of the next decade. While the mission is built and operated by ESA and an industrial consortium, the data processing is entrusted to a consortium drawn from the scientific community, which was formed in 2006 and formally selected by ESA one year later. The satellite will downlink around 100 TB of raw telemetry data over a mission duration of 5 years, from which a very complex iterative processing will lead to the final science output: astrometry with a final accuracy of a few tens of microarcseconds, epoch photometry in wide and narrow bands, and radial velocities and spectra for the stars brighter than 17 mag. We discuss the general principles and main difficulties of this very large data processing effort and present the organisation of the European Consortium responsible for its design and implementation.
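As a rough illustration of what the quoted telemetry volume implies, the short sketch below converts 100 TB over 5 years into a mean downlink rate. It assumes decimal terabytes (1 TB = 10^12 bytes) and a 365.25-day year; the function name `mean_downlink` is hypothetical and the result is only an order-of-magnitude average, not a statement about the actual downlink schedule.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # assumption: Julian year


def mean_downlink(total_bytes, years):
    """Return the mean volume per day and per second, both in bytes."""
    days = years * DAYS_PER_YEAR
    bytes_per_day = total_bytes / days
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_day, bytes_per_second


# Assumption: 1 TB = 1e12 bytes (decimal units).
per_day, per_second = mean_downlink(100e12, 5)
print(f"{per_day / 1e9:.1f} GB/day, {per_second / 1e6:.2f} MB/s")
```

Under these assumptions the average comes out at a few tens of gigabytes per day, which suggests that the difficulty lies less in the raw volume than in the iterative processing applied to it.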
The Pan-STARRS Data Processing System is responsible for the steps needed to download, archive, and process all images obtained by the Pan-STARRS telescopes, including real-time detection of transient sources such as supernovae and moving objects including potentially hazardous asteroids. With a nightly data volume of up to 4 terabytes and an archive of over 4 petabytes of raw imagery, Pan-STARRS is solidly in the realm of Big Data astronomy. The full data processing system consists of several subsystems covering the wide range of necessary capabilities. This article describes the Image Processing Pipeline and its connections to both the summit data systems and the outward-facing systems downstream. The latter include the Moving Object Processing System (MOPS) and the public database, the Published Science Products Subsystem (PSPS).
SCUBA-2 is the largest submillimetre array camera in the world and was commissioned on the James Clerk Maxwell Telescope (JCMT) with two arrays towards the end of 2009. A period of shared-risk observing was then completed, and the full planned complement of 8 arrays, 4 at 850 microns and 4 at 450 microns, is now installed and ready to be commissioned. SCUBA-2 has 10,240 bolometers, corresponding to a data rate of 8 MB/s when sampled at the nominal rate of 200 Hz. The pipeline produces useful maps in near real time at the telescope and often publication-quality maps in the JCMT Science Archive (JSA) hosted at the Canadian Astronomy Data Centre (CADC).
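The quoted SCUBA-2 data rate can be checked with a one-line calculation. The sketch below assumes each bolometer sample occupies 4 bytes, which is not stated in the abstract but is the sample size that reproduces the quoted 8 MB/s; the function name `data_rate_mb_per_s` is hypothetical.

```python
def data_rate_mb_per_s(n_bolometers, sample_rate_hz, bytes_per_sample):
    """Raw data rate in MB/s (1 MB = 1e6 bytes)."""
    return n_bolometers * sample_rate_hz * bytes_per_sample / 1e6


# Assumption: 4 bytes per sample (not given in the text).
rate = data_rate_mb_per_s(10_240, 200, 4)
print(f"{rate:.2f} MB/s")
```

With these inputs the result is about 8.2 MB/s, consistent with the rounded figure of 8 MB/s given above.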
The second Gaia data release is based on 22 months of mission data with an average of 0.9 billion individual CCD observations per day. A data volume of this size and granularity requires a robust and reliable but still flexible system to achieve the demanding accuracy and precision constraints that Gaia is capable of delivering. The internal Gaia photometric system was initialised using an iterative process that is solely based on Gaia data. A set of calibrations was derived for the entire Gaia DR2 baseline and then used to produce the final mean source photometry. The photometric catalogue contains 2.5 billion sources in three different grades, depending on the availability of colour information and the procedure used to calibrate them: 1.5 billion gold, 144 million silver, and 0.9 billion bronze. These figures reflect the results of the photometric processing; the content of the data release will be different due to the validation and data quality filters applied during the catalogue preparation. The photometric processing pipeline, PhotPipe, implements all the processing and calibration workflows in terms of Map/Reduce jobs based on the Hadoop platform. This is the first example of a processing system for a large astrophysical survey project to make use of these technologies. The improvements in the generation of the integrated G-band fluxes, in the attitude modelling, in the cross-matching, and in the identification of spurious detections led to a much cleaner input stream for the photometric processing. This, combined with the improvements in the definition of the internal photometric system and calibration flow, produced high-quality photometry. Hadoop proved to be an excellent platform choice for the implementation of PhotPipe in terms of overall performance, scalability, downtime, and manpower required for operations and maintenance.
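To make the Map/Reduce formulation concrete, the following is a minimal in-memory sketch of how per-observation fluxes could be grouped by source and reduced to a mean source flux. It is not PhotPipe code and does not use Hadoop: the function names, the `(source_id, flux, flux_error)` tuple layout, and the choice of an inverse-variance weighted mean are all assumptions made for illustration.

```python
from collections import defaultdict


def map_observation(obs):
    """Map step: emit (source_id, (weighted flux, weight)) for one observation."""
    source_id, flux, flux_error = obs
    weight = 1.0 / flux_error**2  # assumption: inverse-variance weighting
    return source_id, (flux * weight, weight)


def reduce_source(values):
    """Reduce step: combine all values for one source into a weighted mean flux."""
    sum_weighted_flux = sum(wf for wf, _ in values)
    sum_weights = sum(w for _, w in values)
    return sum_weighted_flux / sum_weights


def mean_source_photometry(observations):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for obs in observations:
        key, value = map_observation(obs)
        groups[key].append(value)
    return {key: reduce_source(vals) for key, vals in groups.items()}


# Hypothetical input: (source_id, flux, flux_error) per CCD observation.
observations = [
    (1, 100.0, 1.0),
    (1, 104.0, 1.0),
    (2, 50.0, 2.0),
]
means = mean_source_photometry(observations)
print(means)
```

The appeal of this decomposition is that the map step touches each observation independently and the reduce step touches each source independently, so both can be spread across many nodes by a framework such as Hadoop.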