New modes of discovery are enabled by the growth of data and computational resources in the sciences. The resulting cyberinfrastructure includes databases, virtual observatories (distributed data), high-performance computing, distributed computing (the Grid and the Cloud), intelligent search and discovery tools, and innovative visualization environments. In astronomy, data volumes from multiple sky surveys have grown from gigabytes into terabytes during the past decade, and will grow from terabytes into tens (and even hundreds) of petabytes in the next decade.
This plethora of new data both enables and challenges effective astronomical research, requiring new methods, algorithms, and skills. Astronomy and many other scientific disciplines are addressing these challenges by developing, supporting, and promoting new sub-disciplines that are so information-rich and data-intensive that they are now becoming (or have already become) recognized stand-alone research disciplines and full-fledged academic programs in their own right (e.g., Bioinformatics and Geoinformatics). The late Jim Gray emphasized the development of this new data-intensive science paradigm by naming it X-Informatics, where X refers to any science. In this context, informatics specifically means data science (including information science): the discipline of organizing, accessing, integrating, and mining data from multiple sources for discovery and decision support.
From these developments, Astroinformatics emerges as the new data-oriented paradigm for 21st-century astronomy research and education. Astroinformatics spans a broad spectrum of informatics specialties, including data-to-knowledge transformations, semantic data integration, information visualization, knowledge extraction, sky-based and catalog-based indexing techniques, information retrieval methods, data mining and knowledge discovery methods, content-based and context-based information representations, consensus semantic annotation tags, astronomical classification taxonomies, astronomical concept ontologies, data-intensive computing, and astrostatistics. These methodologies enable data integration, data mining, information retrieval, knowledge discovery, and scientific decision support (e.g., robotic telescope operations and object selection) across massive, heterogeneous data collections.
Significant research resources (measured in scientists’ time, graduate education programs, and grant funding) must be committed to creating and applying astronomy-specific data science algorithms that address the data-to-knowledge challenges of massive data collections and that enable us to discover the unknown unknowns hidden within the coming flood of data. Mining, analyzing, visualizing, and deriving scientific knowledge from large, complex data collections are essential skills for current, and especially for future, astronomers. A new model for interdisciplinary astronomy graduate education is envisioned, one that provides unique training in the fundamental astronomical and astrophysical topic areas required for research success, plus a rigorous suite of graduate courses in data-intensive computing, data mining, statistics, time-series analysis, and information science.
The most dramatic need for knowledge discovery from large data sets in astronomy will come with the start-up of the very large time-domain surveys of the next decade, such as the LSST sky survey. The research community must get these Astroinformatics methodologies right. In so doing, astronomers can hope to mine the vast data repositories of astronomy effectively and efficiently for the wealth of scientific discoveries hidden therein. The astronomy community, through AAS sponsorship, needs the research discipline of Astroinformatics in order to fulfill the promise of Carl Sagan’s declaration on scientific discovery: “Somewhere, something incredible is waiting to be known.”