Python or R ... a debate

  • Python or R ... the Cross Validated debate

    Posted by Feigelson, Eric at May 25. 2013

    Cross Validated is a question and answer site for statisticians, data analysts, data miners and data visualization experts. During 2011-12, it hosted an interesting exchange on the relative value of R and Python here.

    One contributor argues that, although Python "has reached maturity and is now much better than MATLAB in many respects ... there are just too many R packages I use on a daily basis that have no Python equivalent. The absence of ggplot2 is enough to be a showstopper ... [and] R has a better syntax for data analysis. ... I believe Python will limit the way you think about data analysis. It will take a few years ...  to produce the module replacements for the 100 essential R packages".

    Another contributor states "It's hard to ignore the wealth of statistical packages available in R/CRAN. That said, I spend a lot of time in Python land and would never dissuade anyone from having as much fun as I do." Another states "there is nothing you can't do in R that you can do in python."  Another says "I don't think there's any argument that the range of statistical packages in cran and Bioconductor far exceed anything on offer from other languages, however, that isn't the only thing to consider. In my research, I use R when I can but sometimes R is just too slow. For example, a large MCMC run. Recently, I combined python and C to tackle this problem."

    Another contributor discusses "the rpy project, which provides an interface between R and Python. You get a pythonic api to most of R's functionality while retaining the (I would argue nicer) syntax, data processing, and in some cases speed of Python. It's unlikely that Python will ever have as many bleeding edge stats tools as R, just because R is a dsl and the stats community is more invested in R than possibly any other language."  See the recent update rpy2, "a simple and efficient access to R from Python".

    Another respondent writes "I am a biostatistician in what is essentially an R shop (~80 of folks use R as their primary tool). Still, I spend approximately 3/4 of my time working in Python. I attribute this primarily to the fact that my work involves Bayesian and machine learning approaches to statistical modeling. Python hits much closer to the performance/productivity sweet spot than does R, at least for statistical methods that are iterative or simulation-based."  Another comments "There are areas of statistical computing (e.g. unstructured text analysis and computer vision) that a lot of functionality exists for in Python, and Python is seemingly the lingua franca in those sub-domains. I think where the Python community has to catch up on is improving the data structures and semantics around classical statistical modeling that R's design is so good at. The scikits.statsmodels developers are making a lot of progress on that front."

    One contributor writes: "I would like to say that from the standpoint of someone who relies heavily on linear models for my statistical work, and love Python for other aspects of my job, I have been highly disappointed in Python as a platform for doing anything but fairly basic statistics. ... Python feels a bit like the Wild West."

    Another writes: "There's really no need to give up R for Python anyway. If you use IPython with a full stack, you have R, Octave and Cython extensions, so you can easily and cleanly use those languages within your IPython notebooks. You also have support for passing values between them and your Python namespace. You can output your data as plots, using matplotlib, and as properly rendered mathematical expressions. There's tons of other features, and you can do all this in your browser. IPython has come a long way."

    Another promotes rpy2 as a "high-level interface ... designed to facilitate the use of R by Python programmers. R objects are exposed as instances of Python-implemented classes, with R functions as bound methods to those objects in a number of cases. ... [With rpy2] I can process my data using the flexibility of python , turn it into a matrix using numpy or pandas and do the computation in R, and get back r objects to do post processing. I use econometrics and python simply will not have the bleeding edge stats tools of R. And R will unlikely ever be as flexible as python. This does require you to understand R. Fortunately, it has a nice developer community.  Rpy2 itself is well supported and the gentleman supporting it frequents the SO [Stack Overflow] forums."

    Different contributors list Python libraries for statistical work, including: NumPy/SciPy for general tools; matplotlib for plotting and graphics; pandas, pydataframe, and PyTables for manipulations of tables and time series (see here for a quick reference to pandas); larry for labeled arrays; python-statlib for more statistics libraries; PyIMSL Studio for a large collection of mathematical and statistical algorithms; statsmodels for linear modeling; scikits for smoothing, optimization, and machine learning; PyMC for Bayesian MCMC computation; PyMix for mixture models; scikit-learn for machine learning; Theano for high performance computing; Cython for conversion of computationally intensive code to C; Sho for data analysis with Microsoft .NET compiled code; IPython for interactive, parallelized computing; and Sage for a collection of mathematical libraries.

    • Re: Python or R ... the Cross Validated debate

      Posted by Feigelson, Eric at September 27. 2015

      A lively exchange on R vs. Python appeared in the astro.R Facebook group in September 2015 here.

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at December 18. 2013

      A discussion of the growth of R and Python appears in the December 2013 r-bloggers.com.  Both are growing rapidly, perhaps exponentially.  Python appears to have a larger total community of users, but R may be growing more rapidly and may dominate among those dealing with data analysis.  The essayist concludes that both languages are valuable.   

       

      A similar comparison of R and Python users among readers of kdnuggets shows R is ahead of Python during 2011 and 2012.  

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at December 20. 2016

      from http://www.kdnuggets.com/2016/03/r-python-learning-both-datacamp.html

      R or Python? Consider learning both


      The key to becoming a data science professional is understanding the underlying data science concepts and working towards expanding your programming toolbox as much as you can. Hence, one should understand when to use Python and when to pick R, rather than mastering just one language.

      Learn Data Science, not Programming

      With the outbreak of the Data Science revolution a “war” between R enthusiasts and Python fanatics emerged. As a result Python and R have been compared and contrasted a thousand times with detailed listings of their respective advantages and weaknesses (e.g. see our infographic for a refresher).

      All this “warfare” led to the misconception that as a data science learner and enthusiast you should relentlessly focus on mastering either R or Python. This is bad advice. The actual key to becoming a data science professional is understanding the underlying data science concepts and working towards expanding your programming toolbox as much as you can. In other words you should aim to learn the fundamentals of both R (see Introduction to R) and Python (see Introduction to Python for Data Science), one after the other.

      So while it is certainly true that it is important to know the differences between R and Python, today it is more relevant to understand how you can leverage the knowledge of both based on your understanding of fundamental data science concepts. In this post we hope to explain to you why you should learn both, and give you some ideas about how to begin.

      R vs Python, different brushes

      Why are you choosing between R and Python in the first place?

      Most likely you are in need of a tool that will allow you to perform data analysis, do statistical computations, and in general be a data science practitioner. So knowing R or Python is just one component of a bigger whole, which is comprised of knowledge from disciplines such as statistics, computer science, engineering, mathematics, and even graphics design. There is a reason why most data science curricula begin with a computing tool, but never end with one.

      You should think of R and Python as two different brushes that will allow you to better express yourself in data science projects, and take advantage of their individual unique features. Surely the brushes have different grip and texture, but they are also very similar and will allow you to do so much more.

      Do not choose between R & Python, learn both

      In general, you shouldn’t be choosing between R and Python, but instead should be working towards having both in your toolbox. Investing your time into acquiring working knowledge of the two languages is worthwhile and practical for multiple reasons.

      It strengthens your data science communication skills

      Both R and Python have strong online communities such as R-bloggers and python.org dedicated to the respective languages. Looking at these sites you can get the impression that R and Python communities are completely disjoint. Needless to say, that is not the case.

      In the real world of data science, Python and R users intersect a lot. So whichever industry or discipline you are interested in you are likely to run into projects done in both languages. To appreciate it all you need to have at least a basic understanding of both R and Python. Furthermore, by mastering both, you have the advantage and versatility of presenting and communicating effectively regardless of whether your audience is more comfortable with R or Python. So if you strive to become a data scientist, you will eventually need to be fairly familiar with both languages, and most likely a whole lot more.

      It boosts your data science career

      Knowing both R and Python will open doors to more job opportunities. Some companies, or departments within companies, might prefer Python, while others like to work with R. Imagine that you are a perfect fit for the job, except that you know R while the company requires you to know Python. Wouldn’t that suck? Generally professionals from the industry encourage entrants to acquire as many tools and skills as they can. Most of the time you won’t be expected to be a complete master of R or Python, but displaying your commitment and passion by having learned at least some of both will only give you bonus points.

      It is not that hard

      You can think of Python and R as Spanish and Italian; they are both very different and very similar at the same time. They have a different syntax and have their own (technical) advantages, but at the same time they become very similar when appropriate Python packages are used (numpy, pandas, …). For example:

      Suppose you want to load CSV files. In R you have a couple of options, one of which is read_csv(…). In Python you can use a function from the pandas library with the code pd.read_csv(…). Spot the difference!
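      As a rough illustration of the Python half of this comparison, the sketch below loads a small table with pd.read_csv. The column names and values are invented for the example, and an in-memory string stands in for a file on disk; with a real file one would pass its path instead. (On the R side, read_csv is provided by the readr package rather than base R.)

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV file on disk.
csv_text = """name,mag,redshift
A,18.2,0.12
B,19.5,0.40
C,17.9,0.05
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # number of rows and columns
print(df["mag"].mean())  # column-wise summary, much as in R
```

      The resulting DataFrame plays roughly the role of an R data frame: columns are addressed by name and carry their own types.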

      Also, both Python and R are what are considered «scripting languages», which allow you to write snippets of executable code without having to use a compiler, as you would with Java for example. Next, they both have libraries and packages that you load into your environment to add functionality and do the tasks you need to complete. In addition, when working with both you will find that your workflow for the two languages is very similar, as are the documentation and communities surrounding them.

      Where the R and Python Worlds Cross

      In the past, one could argue that although R and Python are two very useful tools you could learn, it’s not true that one can paint on the same canvases with them. Today, thanks to new tools and technologies, that argument is becoming more and more invalid.

      More and more, we see that the R and Python universes are starting to overlap, thereby mitigating the need to choose between the two languages. Let's look at some examples of technologies and tools that allow you to leverage knowledge of both languages and thus cross the borders between the R and Python worlds.

      Jupyter Notebooks

      Let’s begin with the Jupyter project. The Jupyter Notebook is essentially a tool that allows you to write and share executable code in a variety of programming languages. The name «Ju-Pyt-er» is derived from Julia, Python, and R which immediately tells you that these three languages are the focus, though today these online notebooks support something like 40 different languages.

      When working on a project in Jupyter, you can document both Python and R in the same format and share these notebooks with your colleagues, clients, students, or whoever. Jupyter is not an IDE and doesn’t attempt to replace RStudio for R or Rodeo for Python. What Jupyter does is give you a universal space where you can display your work in either language, and hence organize your work more efficiently when using both R and Python for a project.

      If you are interested in how to use Jupyter with R, read these posts from Continuum Analytics and Revolution Analytics to get started, or see an example of what you can do with them. There is also a nice guide from quant-econ.net that you might find useful.

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at June 21. 2016

      A tutorial on running R from Python, and vice versa, appears here.  

      An infographic from DataCamp comparing R and Python, titled `Data Science Wars', appears here.

      A lengthy (2012-16) multi-author discussion of R vs. Python from Quora appears here.  Essentially, the race is a tie.

      A simple procedure for running R from Python, and vice versa, appears here.

      In mid-2016, R surpassed SAS in articles listed by Google Scholar.  SPSS is still strongest, though declining.  

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at March 28. 2014

      Here are more commentaries on the relative merits of R and Python:

      Daniel Gutierrez gives a sophisticated discussion in an article Data Science Wars: Python vs. R appearing in the inside-BIGDATA blog for data science professionals.  Here is his concluding paragraph: "Python lacks much of R’s richness for data analysis, data modeling and machine learning, but it is making progress. At this point, data science is a very technical area and in my mind you can’t give up R’s depth in favor of Python’s approachability and general-purpose nature. As I mentioned above, the two languages can and do live together nicely. Data science will always be the realm of “scientists who deal with data,” and that will not change anytime soon considering the overly simple nature of the recent “machine learning as a service” product offerings. Practitioners still maintain a firm foundation in mathematics and statistics, which is beyond mortal business analysts and others.  So at least for now, I’d like to douse the flames of the Python vs. R tech war. I think there’s plenty of room for two good choices in the pursuit of robust data science." 

       

      The blog ThinkOR (Think Operations Research) discusses online courses for learning R and Python, saying: "The Data Science programming / analytics languages to know are, R and Python. If you're in Operations Research or another analytics field that somewhat fits under the "Data Science" hat, you: a) already know them really well, b) want to brush up on them, or c) you probably should learn them now. Here I compile my thinking on how to learn R and Python from Beginner to the Intermediate and Advanced levels, based on having tried some of these course materials."

       

      Ron Pearson is an engineering data scientist and author of a recent book on the subject.  He blogs as follows: "While I greatly prefer R to Python for data analysis, I have found Python to be more suitable than R for a variety of extra-analytical tasks, including preliminary explorations of the contents of weakly structured data sources, as well as certain important reformatting and preprocessing tasks.  Like R, Python is an open-source language, freely available for a wide variety of computing environments.  Also like R, Python has numerous add-on packages that support an enormous variety of computational tasks (over 25,000 at this writing).  In my day job in a SAS-centric environment,...  I have found Python to be better suited than R to tasks that involve a combination of automatically generating simple programs in another language, data file management, text processing, simple data manipulation, and batch job scheduling."

       

      Math professor John Cook writes: "I prefer Python to R for mathematical computing because mathematical computing doesn’t exist in a vacuum; there’s always other stuff to do. I find doing mathematical programming in a general-purpose language is easier than doing general-purpose programming in a mathematical language. Also, general-purpose languages like Python have larger user bases, are better designed, have better tool support, etc."

       

      Data science blogger Karissa McKelvey has an article called My Data is Big Because It Doesn't Load Into R: Why Python is the Language of Web Science. She concludes: "A new field is being born, and it is Computation. This new field will sit along side statistics as an interdisciplinary foundation for analysis, visualization, and manipulation of data. It will also act as a platform to collect data, as the ability to scrape the web and create our own websites will become as commonplace as writing a paper. It’s best practices and teaching methodologies are still being discussed, theorized, brought into reality, and tested. And I bet the language they’ll adopt as the primary foundation will be Python.  Python is easy to use, the syntax is clear, the packages are abundant, and the community is open source (read: free).  It’s an exciting time to be a quant."

       

      The SwarmLab at New Jersey Institute of Technology is engaged in an extended `battle' between R and Python, where a problem is simultaneously coded in both languages.  Applications have been in the area of `Web scraping' of Hollywood movie data, somewhat distant from astrophysics.  Simon Garnier's conclusion after Round 2 is: "If you want my personal take on these two rounds, it is that the two languages don’t differ that much from each other. You can reach the same result with pretty much the same amount of effort. But there are already some noticeable differences between both languages, in particular in the way iterations are performed (I tend to think that R is more intuitive here). Also in this post we saw (briefly) that a big difference between the two languages will be the number and quality of available libraries. For this particular round, Python required less effort because an advanced IMDb-scraper already exists for this language. However when we’ll start playing with advanced statistics, it’s likely that R will be better equipped than Python."

       

      DataRobot is a company building a cloud-based app for predictive modeling.  They have a blog with a variety of resources on statistical modeling in R and Python.  For example, they are providing Python translations to the R scripts in the new textbook An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie & Tibshirani (Springer, 2013).  This text is the basis of a new MOOC (Massive Open Online Course) called StatLearning.  

       

      An extended discussion of R vs. Python for data analysis recently appeared in the Programmers Stack Exchange.  Snippets include:

      1. "I use both Python (for data analysis ofcourse including numpy and scipy) and R next to each other. However, I use R exclusively to perform data analysis, and Python for more generic programming tasks (e.g. workflow control of a computer model)."
      2. "I use R and Python for all my research (with Rcpp or Cython as needed), but I would rather avoid writing in C or C++ if I can avoid it. R is a wonderful language, in large part because of the incredible community of users. It was created by statisticians, which means that data analysis lies at the very heart of the language; I consider this to be a major feature of the language and a big reason why it won't get replaced any time soon. Python is generally a better overall language, especially when you consider its blend of functional programming with object orientation. Combined with Scipy/Numpy, Pandas, and statsmodels, this provides a powerful combination. But Python is still lacking a serious community of statisticians/mathematicians."
      3. " I think in terms of basic operations, say operations on arrays and the sort, R and Python + numpy are very comparable. It is in the very large library of statistical functions that R has an advantage. In addition, matplotlib does not seem to be as good as ggplot2"
      4. "Since the [R] language has been around for ever, lots of people have done things that you're likely to want to do. This means that, when faced with a hard problem, you can just download the package and get to work. And R "just works": you give it a dataset, and it knows what summary statistics are useful. You give it some results, and it knows what plots you want.  ... As nice as scipy/numpy/pandas/statsmodels/etc. are for Python, they're not at the level of the R standard library.  The main advantage of Python over R is that it's a real programming language in the C family. It scales easily, so it's conceivable that anything you have in your sandbox can be used in production. Python has Object Orientation baked in, as opposed to R where it feels like kind of an afterthought (because it is). There's other stuff that Python does nicely too: threading and parallel processing are pretty easy, and I'm not sure if that's the case in R. ... I'll add this as a bit of a kicker: since you're still in school, you should think about jobs. You'll find more job postings for highly skilled Python devs than you will for highly skilled R devs. In Austin, jobs for Django devs are kind of falling out of the sky."
      5. "So, I have primarily done data analysis in Matlab, but have done some in Python (and more used Python for general purpose) and also I've started a bit of R. I am going to go against the grain here and suggest you use Python. The reason why is because you are doing data analysis from a Machine Learning perspective, not stats (where R is dominant) or digital signal processing (where Matlab is dominant). There is obviously heavy overlap between Machine Learning and Stats. But overlap is not identity. ... Sure, you can compute a minimal spanning tree in R. It may look like an ugly mess though. Machine learning people will assume you have easy access to hash tables, binary search trees, and so on. ... The side benefits of Python for data analysis are much higher too. You will learn a real programming language at the same time, which can handle scripting, create larger applications, etc."
      6. "As an old school (over 50) scientist who has and continues to use a number of these tools I will add my two cents. ... Recent Fortran dialects (F90, F95, F2003, F2008) are IMHO, some of the best designed languages in existence. ...  I use a time tested suite of languages that work well for me. Fortran, C, Perl, R, and Scheme (with tcl for scripting VMD). I find the combination of R and Fortran and C to be very comfortable. ... It is good to learn many languages. Python is undoubtably an important language, but R is as well in it’s domain. But when the rubber really needs to meet the road in science Fortran and C (and C++ for some) will be hard to displace."
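      To make the remark in snippet 5 concrete (that machine learning code tends to assume cheap access to hash tables and similar structures), here is a minimal sketch of a minimum spanning tree in plain Python. It uses Kruskal's algorithm with a dict as the union-find table; the function name, the edge format, and the toy graph are all assumptions made for this illustration, not anything taken from the quoted discussion.

```python
def minimum_spanning_tree(edges):
    """Kruskal's algorithm; edges is an iterable of (weight, u, v) tuples."""
    parent = {}  # hash table mapping each node to its parent in the union-find forest

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root,
        # shortening the path as we go (path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(edges):  # consider the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # edge joins two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical toy graph with four nodes.
edges = [(4, "a", "b"), (1, "b", "c"), (3, "a", "c"), (2, "c", "d"), (5, "b", "d")]
mst = minimum_spanning_tree(edges)
total_weight = sum(weight for weight, _, _ in mst)
print(mst, total_weight)
```

      The whole routine rests on built-in dicts, tuples, and sorting, which is the kind of general-purpose convenience the contributor has in mind.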
    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at May 18. 2015

      The DataCamp Blog produced an elegant infographic comparing various characteristics of R and Python for data analysis and data science.  They conclude:  `And the Winner is ... It's a tie!  It's up to you, the data scientist, to pick the language that best fits your needs.'

    • Re: Python or R ... a debate

      Posted by Aldcroft, Thomas at May 27. 2013

      What attracts me to Python for my analysis work is the "full-stack" of tools that are available by virtue of being designed as a general purpose language vs. R as a domain specific language.  The actual data analysis is only part of the story, and Python has rich tools and a clean full-featured language to get from the beginning to the end in a single language (use of C/Fortran wrappers notwithstanding).

      On the front end, my work commonly starts with getting data from a variety of sources, including databases, files in various formats, or web scraping.  Python support for this is good and most database or common data formats have a solid, well-maintained library available for interface.  R seems to share a general richness for data I/O, though for FITS the R package appears not to be under active development (no release of FITSio in 2.5 years?).  A lot of the next stage of work typically occurs in the stage of organizing the data and doing pipeline-based processing with a lot of system-level interactions.

      On the back end, you need to be able to present large data sets in a tangible way, and for me this commonly means generating web pages.  For two projects I wrote significant Django web apps for inspecting the results of large Chandra survey projects.  This included a lot of scraping (multiwavelength catalogs) and so forth.  These were just used internally for navigating the data set and helping in source catalog generation, but they were invaluable in the overall project.

      Moving to the astronomy-specific functionality for analysis, it seems clear that the community is solidly behind Python.  This is seen in the depth of available packages and level of development activity, both at an individual and institutional level (http://www.astropython.org/resources).  Given this level of infrastructure that is available and in work, I think it makes sense to direct effort to porting the most useful R statistical tools for astronomy to Python.  This would complement the current capability to call R functions from Python via rpy2.

      Being part of a hugely popular language that is becoming a standard for scientific computing is a very good thing because it means developer resources (and money) are available.  Next generation projects like numba (http://www.slideshare.net/teoliphant/numba) or blaze (http://continuum.io/blog/blaze) are incredibly exciting and hold promise as the foundation for big data astronomy.  DARPA recently awarded $3M to the startup working on those (Continuum Analytics), and of course IPython Notebook got $1.15M from Sloan.  Also from Continuum, the bokeh project (https://github.com/ContinuumIO/Bokeh) is starting to bring some of the wonder of R’s ggplot2 to the Python world.



    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at June 02. 2013

      Tom,  

      Thanks for your engaging explanation of Python's strengths for astronomical data analysis. To continue the `debate', let me share thoughts on three characteristics of R for astronomy:

      1. As various commentators in the CrossValidated and StackOverflow discussion above have said, R's coverage of statistical methodology is far broader than Python's.  Many of R/CRAN's ~100,000 functions are narrowly focused on genomics or econometrics, but even the small fraction useful to astronomy is impressive.  And the CRAN packages are still growing exponentially.  R is a sweetly integrated environment with on-the-fly incorporation of CRAN packages (no compiling) and uniform documentation standards (with examples that always work).  So it is easy to try several statistical methods to compare the options within and between functions.  This is critical: astronomers can now try multiple sophisticated approaches to a challenging data analysis problem, so the science conclusion does not have unknown dependencies on the chosen statistical methodology.

      2. There is a widespread belief that R is computationally slow and can't handle Big Data problems. This was certainly true in the past, but it is less true today:

      • Since 2012, all base-R methods have been converted to `byte code' compilation, and a simple CRAN package (called `compiler') allows the user to convert their own functions in a similar fashion.  Many computationally intensive operations in R are already coded in C/Fortran for maximum speed, and users are encouraged to do the same for their specialized analyses.  There are well-established methods for avoiding slow loops and speeding up your R code; see (for example) the recent blog entry here.  
      • The CRAN Task View on High-Performance and Parallel Computing with R lists over 70 CRAN packages for: disk-based & streaming data; cloud processing; homogeneous and heterogeneous multicore processing; GPU cluster processing; improved linkage between R and C/C++/Fortran/Java; and other techniques for Big Data analysis.  Various protocols and libraries are supported: MPI, Hadoop, PVM, OpenMP, OpenCL, CUDA, HTCondor, MAGMA, slurm, etc. For example, at least three CRAN packages treat parallelized MCMC chains. Many of these packages are new, and it is not clear to me that these are easily used and integrated into a serious data analysis project.  But there definitely has been a lot of progress in bringing R to serve Big Data problems. 

      3. R's major weakness for our community, particularly when compared to Python, concerns astronomy-specific functionalities today.  Only a handful of CRAN packages treat astronomical issues (see here).  Our group is upgrading the FITS reader to the IAU-certified CFITSIO library, and we are translating many IDL astrolib utilities into R.  Although more infrastructure is needed, remember that R/CRAN is a full-service data analysis environment with full ability to manipulate files & data, including several large graphics packages, and of course statistical analysis.  So it does not take too much software to liaise astronomical data with R.  But I do believe that a natural approach for many astronomers is to live in Python and tap into the ~100,000 R/CRAN functions using rpy2.  

       

      Eric

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at October 27. 2013

      KDNuggets is a huge Web site for data mining and machine learning. In a recent blog entry, 7 Steps for Learning Data Mining and Data Science, they recommend learning three computer languages: R, Python and SQL.  This was based on a recent poll of KDNuggets readers.  Of 700 respondents, 61% use R, 39% use Python, 37% use SQL, and 20% use SAS.  R grew by 16% from 2012 to 2013; SQL showed 14% growth (perhaps due to interfaces to Hadoop and other Big Data systems); Matlab and Java show declining use.  R is often used with other languages.

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at October 27. 2013

      Python has a new capability for machine learning: the Pylearn2 library built on the Theano library. Theano is a linear algebra compiler closely coupled to NumPy that provides transparent use of GPUs and dynamic C code generation.  Recent improvements to Theano are described here. Pylearn2 operates on any operating system using the Vagrant Virtual Machine interface.  As described here, it allows the user to flexibly choose a training algorithm (e.g. gradient descent procedures), model estimation criteria (e.g. score matching, cross-entropy), models (e.g. neural nets, principal components analysis, k-nearest neighbor, Support Vector Machines), and datasets.  Theano and Pylearn2 are developed by the Laboratoire d'Informatique des Systemes Adaptifs at Universite de Montreal.

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at May 25. 2013

      Some excerpts from similar discussions in Stack Overflow (a language-independent collaboratively edited question and answer site for programmers) around 2009:

      What can be done in R that can't be done with Python/Numpy/SciPy?

      1. "I feel I can speak for most people in this forum that Python is a beautiful, robust, versatile language, and a pleasure to code in. ... But nowhere in Python can you find the same breadth and depth of statistic functionality that you have in R. There are at least four [written in 2009, today it is ~7] packages devoted to wavelets; ...  Grid, lattice and ggplot2 offer a flexibility and power that cannot be matched by matplotlib. And I am not even mentioning the beautiful, rather esoteric capabilities of Bioconductor packages. If all these capabilities were available in Python, there would be no need for R. ... Looking forward, I don't see Python gaining any advantage over R in the ML [Machine Learning] and Statistics communities."
      2. "Note that R has this quite rare special-purpose + open source nature which makes this a killer combination. ... R has the added advantage of competing against SAS, which many statisticians hate (very expensive, limited features, hard to extend, and non-open). Since extending R is trivial (usually!) compared to writing custom C code, the user base is eager to write modules for it."
      3. "... the languages have very different goals. I've used R quite a bit longer (around 10 yrs) and python (about 3). Python can do many things better than R. As a programming language it is much more versatile. However, R is a scriptable data analysis tool with 15+ years of code libraries written largely by the academic community for statistical and data analysis. R would be useless for writing a database application, let alone a website, content management system, word processor, image manipulation program, game, etc etc etc. R, on the other hand, is entirely devoted to data analysis tasks. You can push it a bit to do some rudimentary programming and OOP, but it was never written or conceived for that purpose. Probably anything that can be done in R can be done in python. There's an excellent chance it could be redone in a more elegant fashion. ... I would say for non-programmers who understand what they're doing with statistics, R is a good deal easier to get going. Perhaps simply the fact that every assignment creates a complete copy in a new object makes it easier to handle. It precludes many elegant programming constructs, but the vast majority of R users are scientists, statisticians and economists who may well be completely oblivious to those same elegant constructs ... While there may be few things (maybe nothing) that could strictly be done BETTER in R than in python, the fact is hundreds (or more realistically almost thousands) of things have already been done in R and are ready to use. So R is better for data analysis due to years of effort and the world-class pedigree of people writing cutting edge modules for it."
      4. "I've been a S+/R programmer for almost 15 years, and a Python programmer for 3 (with some Matlab in the interim). First and foremost to this discussion is the fact that R is a specialized program for statistical analysis. It has developed a huge code base and developer base over the years, and has become, because of Bioconductor, a preferred language in bioinformatics.  Python, on the other hand, is a general purpose tool, one segment of which (the Numpy/Scipy/matplotlib world) is geared towards statistical analysis. Even there, its strength is in random number generation, and there are not many modules to really do more involved statistical modeling (like glm, mixed effect models, survival analysis, let alone shrinkage methods.) Python has quite a bit of ML code available."
      5. "So I became an R programmer because it was the language that put the tools in my hands the fastest."
      6. " It isn't an issue of what one language can do that the other can't, but it just comes down to the age old notion that a programmer should use the right tool for the right job. I use both languages heavily (primarily Python now, used to be primarily R). I find myself bouncing between them (as well as some other languages) based on the task at hand."
      7. "... while in a simple Venn-diagram we may see the large overlap between R and Python (or Ruby or ...) the fact remains that R was designed for Programming with Data (to quote one of Chambers' books). With that in mind, it makes data exploration, analysis, programming, estimation, ... fairly easy. Of course, this does not mean you cannot do it in other languages (quite the opposite) but once you grasp some of the intrinsic advantages in R (working with data.frame object, missing values, modeling functions, visualization, ...) you may prefer to work in R. Chambers' most recent book Software for Data Analysis (2008) makes this point rather well."
      8. "The CRAN Task Views also give a nice overview of what is being done with R by different scientific communities. That takes nothing away from Python, but it may chip away at the somewhat pejorative notion of R as 'yet another DSL'. It has become the primary language for data analysis and statistical computing."
      9. "There are two kinds of data analysis: 1) applying data models 2) programming an analysis/numerical computing model. If you are doing the first one, R is superior to Python with Numpy/Scipy. Because R core is very stable, R also has a lot of 3rd party statistical packages contributed by researchers in academic. If you have heard a statistical procedure, you 98% could find it in R. You can build a decision tree/linear regression using only a few lines including loading your data, plotting the results. While in Python, there are not many packages, and some of the packages are more buggy. If you are mainly doing the second task, Python is better than R, because Python is a better programming language than R. Scipy package has a good support for matrices, linear algebra, optimization, which are the basics for scientific computing. Python is also a very expressive language for small-to-middle sized projects. So if you want to implement a decision tree in Python, it would be easier than R."
      10. "Both are very good choices, and I think that they have slightly overlapping spheres of influence in my work. I prefer to use Python to do generation, processing and cleanup of data. The rationale for this is that Python has a very well-defined object-oriented syntax, lots of libraries for processing common data formats, and if I'm interested in speeding things up it's easy for me to mix Python with C++. My co-workers also like Python very much, which is a very important factor when picking languages.  Once I have my data in a form that I like, I use R for analysis. It's concise, has good support for most statistical techniques, and can make beautiful graphics very concisely. Concise graphics are very useful when you're doing exploratory analysis, and examining the same data in several different ways is very important when you're exploring your data."
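The division of labor in the last excerpt (Python for cleanup, R for analysis) might look roughly like the sketch below on the Python side. The column names `name` and `flux`, the sample rows and the helper names are hypothetical. The one R-specific detail relied on is that `read.csv` treats the string `NA` as a missing value by default.

```python
import csv
import io

def clean_records(rows):
    """Normalize raw rows: strip whitespace, parse numbers, mark gaps as None."""
    cleaned = []
    for name, flux in rows:
        name = name.strip()
        if not name:
            continue  # drop rows without an identifier
        try:
            value = float(flux)
        except (TypeError, ValueError):
            value = None  # unparseable measurement
        cleaned.append((name, value))
    return cleaned

def to_csv(records):
    """Serialize to CSV text, writing missing values as the string NA."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "flux"])
    for name, value in records:
        writer.writerow([name, "NA" if value is None else value])
    return buffer.getvalue()

# Invented sample input with the usual problems: padding, junk, blank names
raw = [(" src1 ", "1.5"), ("src2", "n/a"), ("", "3.0"), ("src3", "2")]
csv_text = to_csv(clean_records(raw))
print(csv_text)
```

Writing the text to a file and reading it with `read.csv` on the R side would then hand a data frame with one missing `flux` value to whatever analysis follows.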
    • Re: Python or R ... the Cross Validated debate

      Posted by Feigelson, Eric at September 11. 2015

      The latest (mid-2015) polls, ratings, and comparisons of R vs Python for Data Science from KDNuggets can be found here, here and here.  Hour-long videos comparing the two languages appear here.  R is dominant in some polls, and Python in others. The general consensus is again that both languages are excellent, and can be chosen for the purpose at hand. 

    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at February 25. 2015

      Two surveys of programming languages used in industry have been released that reveal the popularity of Python and R in the context of more popular, general-purpose, computer languages:

      • CodeEval ranks Python as #1 in 2015, growing 3% since 2014.  Java and C++ are #2 and #3 with R rapidly rising to #17.
      • The TIOBE Index ranks Python at #8, while R is ranked at #18 and is again found to be rapidly rising.  C, Java and C++ are the most popular languages here.  
    • Re: Python or R ... a debate

      Posted by Feigelson, Eric at March 28. 2014

      The following article appears on the ProgrammingR Web site ... 

      Calling Python from R with rPython

      Python has generated a good bit of buzz over the past year as an alternative to R. Personal biases aside, an expert makes the best use of the available tools, and sometimes Python is better suited to a task. As a case in point, I recently wanted to pull data via the Reddit API. There isn’t an R package that provides easy access to the Reddit API, but there is a very well designed and documented Python module called PRAW (or, the Python Reddit API Wrapper). Using this module I was able to develop a Python-based solution to get and analyze the data I needed without too much trouble.

      However, I prefer working in R, so I was glad to discover the rPython package, which enables calling Python scripts from R. After finding rPython, I was able to rewrite my purely Python script as a primarily R-based program.

      If you want to use rPython there are a couple of prerequisites you’ll need to address if you haven’t already. No surprise, you’ll need to have Python installed. After that, you’ll need to install the PRAW module via pip install praw. Finally, install the rPython package from CRAN. (But see the note below first if you’re on Windows.)

      After you’ve completed those steps, it’s as easy as writing your Python script and adding a line or two to your R code.

      First create a Python script that imports the praw module and does the first data call:

      import praw
       
      # Set the user agent information
      # IMPORTANT: Change this if you borrow this code. Reddit has very strong
      # guidelines about how to report user agent information
      r = praw.Reddit('Check New Articles script based on code by ProgrammingR.com')
          
      # Create a (lazy) generator that will get the data when we call it below
      new_subs = r.get_new(limit=100)
       
      # Get the data and put it into a usable format
      new_subs = [str(x) for x in new_subs]

      Since the Python session is persistent, we can also create a shorter Python script that we can use to fetch updated data without reimporting the praw module:

      # Create a (lazy) generator that will get the data when we call it below
      new_subs = r.get_new(limit=100)
       
      # Get the data and create a list of strings
      new_subs = [str(x) for x in new_subs]
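The persistence that the shorter script relies on can be imitated in plain Python. The sketch below is not how rPython is implemented. It only assumes that both scripts run in one shared namespace, so names defined by the first are still visible to the second. The `fetch_new` function is a hypothetical stand-in for the Reddit call, and the two script strings stand in for the two files.

```python
# One dictionary plays the role of the persistent Python session.
session = {}

# Stand-in for the main script: sets up state, defines a (hypothetical)
# fetch function, and makes the first data call.
main_script = """
counter = 0

def fetch_new(limit):
    global counter
    counter += 1
    return ["item %d from call %d" % (i, counter) for i in range(limit)]

new_subs = fetch_new(limit=3)
"""

# Stand-in for the refresh script: no setup, it reuses what already exists.
refresh_script = """
new_subs = fetch_new(limit=3)
"""

exec(main_script, session)     # first load: defines fetch_new and new_subs
first = session["new_subs"]

exec(refresh_script, session)  # second load: same namespace, fresh data
second = session["new_subs"]

print(first)
print(second)
```

Running the refresh script in an empty namespace instead would raise a `NameError` for `fetch_new`, which is why the shorter script only works after the main one has been loaded.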

      Finally, some R code that calls the Python script and gets the data from the Python variables we create:

      library(rPython)
       
      # Load/run the main Python script
      python.load("GetNewRedditSubmissions.py")
       
      # Get the variable
      new_subs_data <- python.get("new_subs")
       
      # Load/run re-fetch script
      python.load("RefreshNewSubs.py")
       
      # Get the updated variable
      new_subs_data <- python.get("new_subs")
       
      head(new_subs_data)
