R, developed at the University of Auckland as a tool for statistical computing, reached version 1.0.0 on 29 February 2000. Two decades have passed since then, and this tool created by statisticians for statisticians has left the academic closet to rise and fall with the industrial tides. On R’s 20th anniversary, it is fitting that we look at its story and appreciate this colourful little tool that has grown into one of the most revered mediums of statistical computing and a go-to weapon in the data scientist’s arsenal.
Enough with the romanticising
R has been on the market longer than most tools that offer data wrangling and visualization. The best part is that you can perform certain tasks in R without any prior knowledge of programming. Its initial learning curve is even smoother than Python’s. You will hear people say that R is an advanced language and better suited to advanced users. The first part is true, but the second is not.
As Hadley Wickham said in an interview, R could do incredible things even back in the day when none of us had heard of it, and it still can. But it was designed for statisticians who were not programmers, hence the simple interface and ease of use. The point is that R is very well suited to object-oriented programming and advanced data analysis by advanced users, yet it is equally useful for the novice who just wants an impressive visualization that conveys some insight.
Data science lights the way
Making sense of numbers and data has been R’s purpose since the beginning. It was on the scene well before data science entered common awareness, and when data science arrived, R fit right in as the go-to tool. The world is becoming more and more reliant on data every day. Even ordinary people consume and create an astounding amount of data on a daily basis. All this data matters, whether the goal is to convert leads, push the right ads, fight terrorism or find a cure for a deadly virus. The applications of data science have been changing constantly. New tools and technologies have arrived and faded. R has stood its ground as one of the most preferred tools for advanced statistical computing and data science.
The less appreciated waters of visualization
When it comes to data visualization and some light-hearted analysis, people tend to shy away from R in favour of commercial point-and-click tools. These tools may make you pay more than you should have to, can often lack accuracy and almost always lack transparency. R has great charting and plotting libraries, and there is hardly a better tool for data wrangling and visualization. The best part is that, since R is driven by textual code, you can review the commands at any point to find bugs. Yes, debugging is a simpler affair in R than in most tools.
A blatant favorite
Almost all industries have used R at one point or another, but certain fields seem to be more impressed by its features than others. For instance, clinical laboratories are falling in love with R all over again after being let down by the various visualization and analysis tools they have purchased in recent times.
Clinical laboratories deal with data that can be the difference between life and death, so the accuracy and transparency of an analysis matter more there than almost anywhere else. That makes R a natural fit.
Experienced data scientists would not settle for anything other than R for statistical computing. If you have an eye on advanced analytics and data science, you cannot avoid R training.
Some healthy competition
Yes, Python. R and Python are close competitors for the title of the best data science tool. No, being the most popular does not mean being the best. R lacks the wide range of libraries that Python has. Python lacks the simplicity and superior computing capacity of R. While Python has a better repository of code for machine learning, R offers better statistical accuracy. While Python is a sleeker language without the braces and the parentheses, R is superb at metaprogramming. So there you have it: learn both.
Curtains
Hadley Wickham, the chief scientist at RStudio, has a clear vision of R’s future. R now has a sister dialect called the tidyverse. Libraries like ggplot2 and dplyr are getting better. JavaScript is going to be the key to R-based visualization, he thinks. And he hopes more people will realize the beauty of coding their way through a presentation.