Kaggle-like competitions – Astrostatistics and Astroinformatics Portal

A number of data analysis challenges and competitions have been announced in recent years to address difficult and important problems in statistical and computational astronomy. Most involve issues arising in cosmology. Older entries were obtained from the Cosmology meets Machine Learning Uninstitute.

ARIEL Mission Data Challenges

The Ariel space mission is a European Space Agency mission to be launched in 2028. Ariel will observe the atmospheres of 1000 extrasolar planets (planets around other stars) to determine how they are made, how they evolve, and how to put our own Solar System in the galactic context. Three challenges are in place. (1) Atmospheric Retrievals: This challenge is designed to facilitate the comparison between atmospheric forward models as well as atmospheric retrieval codes in preparation for the Ariel red book. The organizers provide several hot-Jupiter and super-Earth transmission/emission spectra, and will give blind spectra that reflect a range of cloud and chemical models. (2) Machine Learning: This challenge aims to identify and correct the effect of stellar spots (literally, spots on the surface of the star) in noisy transit lightcurves of extrasolar planets. This competition has a prize. (3) Data Analysis: This challenge invites you to take part in optimising the Ariel data analysis processes.

SKA Data Challenge Competition #1

The challenge set for the community is to undertake: source finding (RA, Dec) to locate the centroids and/or core positions, source property characterization (integrated flux density, possible core fraction, major and minor axis size, major axis position angle), and source population identification (one of SFG, AGN-steep, AGN-flat). The SKA Science Data Challenge #1 (SDC1) release consists of simulated SKA continuum images in total intensity of the same field at 3 frequencies and 3 telescope integrations (8, 100, 1000 h as representative of a single, medium-depth and deep integration). Ancillary data consist of primary beams and synthesized beams for each frequency. An explanatory supplement describes the data and the challenge that is set for the community. A training set is also released, which consists of truth catalogues listing the objects in the simulated 1000 h data and their properties for 5% of the field of view. The challenge starts on 26 November 2018 and the deadline for submitting results is 15 March 2019, after which results will be graded.
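The source-finding and flux-measurement steps described above can be illustrated with a minimal sketch. It is not the SDC1 reference method: it assumes the image is a plain 2-D list of pixel values, uses a single positive threshold, and groups pixels into 4-connected islands. The function name `find_sources` and the toy image are hypothetical. Converting pixel centroids to (RA, Dec) and summed pixel values to an integrated flux density would additionally need the image's coordinate metadata and beam area, which are not handled here.

```python
def find_sources(image, threshold):
    """Label 4-connected islands of pixels above a positive `threshold`.

    Returns one dict per island with the flux-weighted pixel centroid
    (x, y) and the summed pixel values ("flux").
    """
    ny, nx = len(image), len(image[0])
    seen = [[False] * nx for _ in range(ny)]
    sources = []
    for y0 in range(ny):
        for x0 in range(nx):
            if seen[y0][x0] or image[y0][x0] <= threshold:
                continue
            # Flood-fill the island that contains pixel (x0, y0).
            stack = [(x0, y0)]
            seen[y0][x0] = True
            flux = sx = sy = 0.0
            while stack:
                x, y = stack.pop()
                v = image[y][x]
                flux += v
                sx += v * x
                sy += v * y
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    xn, yn = x + dx, y + dy
                    if (0 <= xn < nx and 0 <= yn < ny
                            and not seen[yn][xn]
                            and image[yn][xn] > threshold):
                        seen[yn][xn] = True
                        stack.append((xn, yn))
            sources.append({"x": sx / flux, "y": sy / flux, "flux": flux})
    return sources


# Toy image: a 3x3 blob centred on (5, 5) and a single bright pixel at (15, 12).
toy = [[0.0] * 20 for _ in range(20)]
for y in range(4, 7):
    for x in range(4, 7):
        toy[y][x] = 1.0
toy[5][5] = 2.0
toy[12][15] = 5.0

toy_sources = find_sources(toy, threshold=0.5)
for src in toy_sources:
    print(src)
```

A realistic pipeline would estimate the threshold from the local noise level and fit a shape model to each island to obtain axis sizes and position angle; the sketch covers only detection, centroid and summed flux.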

LSST time series classification

The LSST Transient and Variable Stars Collaboration (TVS) and the Dark Energy Science Collaboration (DESC) have released PLAsTiCC: The Photometric LSST Astronomical Time-series Classification Challenge. PLAsTiCC is a large data challenge for which participants (astronomers and/or machine learning enthusiasts in other fields) are asked to classify astronomical time series, in preparation for the onslaught of data that LSST will bring us. The challenge is available via Kaggle at the following link: https://www.kaggle.com/c/PLAsTiCC-2018. We encourage you all to participate, not only because there is a $25,000 cash prize for the winner, but also because you truly care about making sense of the Universe!

Microlensing Data Challenge

WFIRST will complete our census of the planetary population by using microlensing to discover a large sample of planets between 1 and 10 AU from their host stars. We are challenging the community to develop new analysis techniques to tackle unresolved questions and maximize the science return from this mission! First dataset available now by joining: https://github.com/microlensing-data-challenge. Deadline: Oct 31, 2018. Newcomers to the field are encouraged!

The analysis and modeling of microlensing events has always been a computationally intensive and time-consuming task, traditionally requiring a powerful computer cluster as well as well-sampled lightcurves. While the number of interesting events with adequate data remained fairly low, it has been practical to perform a careful interactive analysis of each one. Even so, a number of challenges remain, particularly concerning the analysis of triple lenses. This is expected to change with next-generation surveys, especially with the launch of WFIRST. This mission is expected to detect thousands of microlensing events, including hundreds of planetary events. Clearly, our analysis techniques need an upgrade to fully exploit this dataset, and the currently small microlensing community needs to grow.

To stimulate research in this area, we are holding a series of data challenges, each based around the release of a large set of simulated WFIRST lightcurves. The first dataset was recently released, with a submission deadline of Oct 31, 2018. We are particularly keen to encourage participation by people from the astro-statistics and astro-informatics communities. For more information, please visit http://microlensing-source.org/data-challenge/.

Radial Velocity Fitting Challenge

Stellar signals are the main limitation for exoplanet detection from precise radial-velocity (RV) measurements. The m s−1 perturbation created by these signals prevents the detection and mass characterization of small-mass planetary candidates such as Earth-twins. Several methods have been proposed to mitigate stellar signals in RV measurements. The goal of the RV fitting challenge is to generate simulated RV data including stellar and planetary signals and to perform a blind test within the community to test the efficiency of the different methods proposed to recover planetary signals despite stellar signals. See Dumusque 2016 and Dumusque et al. 2017.

Observing Dark Worlds

There is more to the Universe than meets the eye. Out in the cosmos exists a form of matter that outnumbers the stuff we can see by almost 7 to 1, and we don’t know what it is. What we do know is that it does not emit or absorb light, so we call it Dark Matter. Although dark, it warps and bends spacetime such that any light from a background galaxy which passes close to the Dark Matter will have its path altered. This bending causes the galaxy to appear as an ellipse in the sky. This is an official Kaggle competition, now completed, here.

Mapping Dark Matter

Mapping Dark Matter is an image analysis competition whose aim is to encourage the development of new algorithms that can be applied to the challenge of measuring the tiny distortions in galaxy images caused by dark matter. The aim is to measure the shapes of galaxies to reconstruct the gravitational lensing signal in the presence of noise and a known Point Spread Function. The signal is a very small change in the galaxies’ ellipticity: an exactly circular galaxy image would be changed into an ellipse. Real galaxies, however, are not circular. The challenge is to measure the ellipticity of 100,000 simulated galaxies. This is an official Kaggle competition, now completed, here.
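To make "measuring the ellipticity" concrete, here is a minimal sketch of one common textbook estimator based on unweighted second (quadrupole) moments of the image. It is offered only as an illustration under simplifying assumptions: no noise, no Point Spread Function correction, and an image stored as a plain 2-D list. The names `quadrupole_ellipticity` and `gaussian_blob` are hypothetical, and the convention used, e1 = (Qxx − Qyy)/(Qxx + Qyy) and e2 = 2Qxy/(Qxx + Qyy), is one of several in use and may differ from the competition's own definition.

```python
import math


def quadrupole_ellipticity(image):
    """Return (e1, e2) from unweighted second moments of a 2-D list of fluxes."""
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("image has no positive total flux")
    # Flux-weighted centroid in pixel coordinates.
    xc = sum(v * x for row in image for x, v in enumerate(row)) / total
    yc = sum(v * y for y, row in enumerate(image) for v in row) / total
    qxx = qyy = qxy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - xc, y - yc
            qxx += v * dx * dx
            qyy += v * dy * dy
            qxy += v * dx * dy
    norm = qxx + qyy
    if norm == 0:
        raise ValueError("image has no spatial extent")
    return (qxx - qyy) / norm, 2.0 * qxy / norm


def gaussian_blob(size, sigma_x, sigma_y):
    """Noise-free elliptical Gaussian centred on a size x size grid (toy galaxy)."""
    c = (size - 1) / 2.0
    return [[math.exp(-0.5 * (((x - c) / sigma_x) ** 2 + ((y - c) / sigma_y) ** 2))
             for x in range(size)]
            for y in range(size)]


round_e1, round_e2 = quadrupole_ellipticity(gaussian_blob(41, 3.0, 3.0))
long_e1, long_e2 = quadrupole_ellipticity(gaussian_blob(41, 4.0, 2.0))
print(round_e1, round_e2)
print(long_e1, long_e2)
```

For the elongated toy galaxy, e1 should come out close to (σx² − σy²)/(σx² + σy²) = 0.6, while the round one gives values near zero. With realistic noise, unweighted moments become unstable, which is part of why the competition sought better algorithms.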

Challenges in interferometric image reconstruction

Image reconstruction in optical interferometry has gained considerable importance for astrophysical studies during the last decade. This has been mainly due to improvements in the imaging capabilities of existing interferometers and the expectation of new facilities in the coming years. However, despite the advances made so far, image synthesis in optical interferometry is still an open field of research. Since 2004, the community has organized a biennial contest to formally test the different methods and algorithms for image reconstruction. In 2016, we celebrated the 7th edition of the “Interferometric Imaging Beauty Contest”. This initiative represented an open call to participate in the reconstruction of a selected set of simulated targets with a wavelength-dependent morphology as they could be observed by the 2nd generation of VLTI instruments. See details here.

Challenges in visualization:

The VisIVO Contest 2014 (Visualization for the International Virtual Observatory) is a call to the worldwide scientific community to use technologies provided by the VisIVO Science Gateway to produce amazing images and movies from multi-dimensional datasets coming either from observations or numerical simulations. The package offers a framework for exploration of large-scale scientific datasets, particularly related to cosmological simulations.

Challenges in exoplanet detection:

The WFIRST Coronagraph Exoplanets Community Data Challenge seeks participation from teams with spectral retrieval expertise. The Challenge will run from Aug 15 to Nov 15, 2016, and it will consist of a blind spectral retrieval exercise using simulated extracted spectra for several “known RV” and/or hypothetical “discovery” exoplanets. The data will be served via the IPAC WFIRST Science Center. For the first five teams that complete the entire retrieval challenge (all five planets, all requested SNR and spectral resolution parameters) we are offering travel expenses to an exoplanets meeting. Contact: Margaret Turnbull, SETI Institute, WFIRST Coronagraph SIT Principal Investigator.

The Nearby Earth Astrometric Telescope (a proposed satellite mission) is designed to measure the tiny positional wobble of solar-like stars due to orbiting planets. NEAT scientists have designed a double-blind contest with realistic simulated time series with and without planetary signals.

Challenges for weak-lensing galaxy image analysis:

A GRavitational lEnsing Accuracy Testing (GREAT3) challenge is underway to test methods of weak lensing data analysis. This is similar to strong lensing (below) but the background objects are galaxies rather than quasars, and the statistical problem involves measuring subtle shearing of the galaxy shapes. Details are available here and here.

Mapping Dark Matter Kaggle Challenge (a more accessible version of GREAT10)

GREAT10 PASCAL Challenge contains a spatially varying kernel, and a kernel estimation challenge

GREAT08 PASCAL Challenge was the first shear measurement challenge aimed at MLers

Challenges for galaxy morphology classification:

Kaggle and GalaxyZoo joined to present The Galaxy Challenge for automated galaxy morphology classification. The $16,000 prize was won by graduate student and data scientist Sander Dieleman, who used a 7-layer neural network with 42M parameters. Code was written in Python with Theano wrappers for GPU implementation. See Kaggle's interview here. Kaggle sponsored an earlier galaxy imaging competition in 2011.

Challenges for photometric redshift estimation:

The PHAT challenge here and here.

Challenges for strong gravitational lensing time delay:

A Strong Lens Time Delay Challenge is now open for competition. Based on simulated LSST data of gravitational lensing of quasars lying behind foreground galaxies, the challenge is to accurately establish delays between the stochastic variations of two lensed quasar images from sparse, irregularly sampled time series.
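The delay-estimation task can be sketched with a deliberately simple approach: shift one light curve by a trial delay, linearly interpolate the other onto the shifted epochs, and keep the delay that minimises the mean squared mismatch. This is only a toy illustration, not a method endorsed by the challenge. The function names, the smooth deterministic test signal, and the sampling are all assumptions made for the example, and the sketch ignores photometric noise, magnitude offsets between images, and microlensing.

```python
import math
import random
from bisect import bisect_right


def interp(t, times, values):
    """Linear interpolation on sorted `times`; returns None outside the range."""
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(times):
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


def estimate_delay(t_a, y_a, t_b, y_b, trial_delays, min_overlap=10):
    """Return the trial delay minimising the mean squared mismatch.

    Convention (an assumption of this sketch): image B lags image A,
    so y_b(t) is compared with y_a(t - delay).
    """
    best_delay, best_cost = None, math.inf
    for delay in trial_delays:
        sq = []
        for t, y in zip(t_b, y_b):
            ref = interp(t - delay, t_a, y_a)
            if ref is not None:
                sq.append((y - ref) ** 2)
        if len(sq) < min_overlap:
            continue  # too few overlapping epochs to trust this trial delay
        cost = sum(sq) / len(sq)
        if cost < best_cost:
            best_delay, best_cost = delay, cost
    return best_delay


def intrinsic(t):
    """Hypothetical smooth source variability (stand-in for a real quasar)."""
    return math.sin(0.2 * t) + 0.5 * math.sin(0.07 * t + 1.0)


rng = random.Random(0)
t_a = sorted(rng.uniform(0.0, 200.0) for _ in range(200))
t_b = sorted(rng.uniform(0.0, 200.0) for _ in range(200))
true_delay = 7.0
y_a = [intrinsic(t) for t in t_a]
y_b = [intrinsic(t - true_delay) for t in t_b]

trial_delays = [0.25 * k for k in range(61)]  # 0 to 15 days in 0.25-day steps
recovered = estimate_delay(t_a, y_a, t_b, y_b, trial_delays)
print(recovered)
```

Real quasar variability is stochastic and the sampling has seasonal gaps, so practical methods typically model the light curves statistically rather than interpolating directly.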