# Q&As on the R statistical software system

R is the only *comprehensive, free and open-source* statistical software system. Started in the 1990s, it had become by ~2005 the dominant non-proprietary statistical software environment for developing and promulgating statistical methodology. Its Comprehensive R Archive Network (CRAN) of user-contributed code has grown exponentially since 2001 and now holds >3200 add-on packages with >50,000 statistical functionalities; a new package arrives daily. CRAN packages can be loaded on the fly during R sessions. R/CRAN has ~2 million users (2010) and ~100 instructional books on its use.

R has rarely been used in astronomy, yet its potential to bring advanced methodology to the community is huge. CRAN has, for example, ~100 packages devoted to Bayesian inference and ~110 packages devoted to multivariate clustering and classification. The main difficulty is finding what you want and understanding what you find, as R is not a didactic environment.

R is a high-level scripting language with a C-like syntax; it is very similar to the proprietary IDL system widely used in astronomy. R has uni- or bi-directional interfaces to other languages including BUGS, C, C++, Fortran, Java, JavaScript, Matlab, Python, Perl, Xlisp, and Ruby. For astronomy, R has a primitive FITS reader; more infrastructure is urgently needed.

This Forum starts with some links to further information on R for astronomy. But its main purpose is to field questions, and hopefully provide answers, on how specific statistical procedures can be performed using R. All ASAIP members are welcome to provide questions and answers.

**Some resources on R for astronomy:**

**-- A talk on R at the 2011 ADASS XXI conference (attached)**

**-- A thorough general introduction to R by the R Development Core Team (html, pdf)**

**-- R tutorials for astronomers: IIA, PSU-Hunter, PSU-Chakraborty, Birmingham-Sanderson**

**-- VOStat Rev 2 (Virtual Observatory statistical analysis Web service using R)**

In a way, R is to statistics what Matlab is to signal processing: an excellent environment for developing methodology, but not always ideal for software distribution. For instance, it is unclear how much of the LSST pipeline could be based on R code.

Although R was originally designed for developing and testing statistical methods, many people also use it to distribute software, rather than using a language better suited to building distributable software (e.g., C++). Therefore, while promoting R in astroinformatics surely has value, it should be done with care so that the methods are also of practical use to astronomers outside the astrostatistics community.

Ignorant question:

My job, in this context, is primarily managing data, much of which is stored in relational databases and can be retrieved flexibly and relatively quickly via SQL queries. Does R have recommended interfaces to the major databases (SQL Server, PostgreSQL, Oracle, MySQL, etc.), or would one write one's own using, say, Python client libraries?

I've read a few of the R books. Many are less useful to astronomers than they could be because they focus on examples from the life sciences.

But I did think the book "The Art of R Programming" by Norman Matloff was pretty good.

"R Cookbook" by Paul Teetor was useful too.

Anyone else got recommendations?

Simon,

In response to this need for astronomically oriented R tutorials and scripts, Jogesh Babu and I have written a volume, "Modern Statistical Methods for Astronomy with R Applications". It includes a "cookbook" of several dozen R and CRAN applications to ~20 real astronomical datasets. The book will be published in August by Cambridge University Press; see here. The R scripts and astronomical datasets are available here. (R grabs the datasets on the fly; they do not have to be downloaded manually.) The volume is based on our week-long Summer Schools in Statistics for Astronomers (see this year's program here) but is considerably broader in scope.

Eric

Our text and reference book introducing R for astronomical problems is now published: **Modern Statistical Methods for Astronomy with R Applications** by Eric D. Feigelson and G. Jogesh Babu (Cambridge University Press, July 2012). Using 19 contemporary astronomical datasets, from small to large, it provides short R scripts, explained in the text, that illustrate dozens of R and CRAN statistical functionalities. There is also an appendix surveying R/CRAN capabilities and programming (e.g., comparison with IDL, high-performance computing, links to Python/C/Fortran).

Each chapter covers a field of statistics, describing the astronomical context, statistical concepts, some important results, recommended readings, and R tutorials. Chapters cover probability, statistical inference, nonparametrics, density estimation, regression, multivariate analysis and classification, time series analysis, and spatial point processes.

Eric