R, an open-source programming language for statistics and data mining, is slowly but surely catching up in its race with commercial software like SAS and SPSS. I believe R will eventually replace SAS as the language of choice for modeling and analysis in most organizations. The primary reason is plainly commercial: most organizations are questioning the heavy annual cost of SAS on their P&L statements, and the availability of R as a free and viable replacement only sharpens that question. R is a highly advanced language with over 5000 add-on packages to assist in data management and analysis. Most senior analysts and analytics leaders have already started polishing their R skills. In this article, I will introduce the books and online resources that will help you teach yourself R and its applications. Before introducing these resources, let me explain why you need many resources for self-learning.
Non-Linear Self-Learning
Humans are obsessed with linearity. Look at our houses, furniture, televisions, photo frames or cabinets: they all follow linear designs. The reason is that linearity is simple; it is certainly not natural. Outside our houses, nature is flourishing with non-linearity. Trees, mountains, rivers and the human body all follow non-linear patterns and dynamics (to explore more, read about fractal geometry and chaos theory, or wait for some later articles on YOU CANalytics). Learning and teaching in schools and universities usually take the linear path; self-learning, in my opinion, is highly non-linear. Unlike school learning, self-learning is driven by purpose and need, so one tends to hop between books, chapters, and the internet. I say this from experience. Let me present the resources that have helped me the most while learning R. I have divided these resources into the following five categories:
- R for Reference : these books cover the most essential aspects of R and also serve well as reference books
- R with Theory : these books are great if you want to understand the fundamentals of statistics and machine learning while using R as the tool
- R with Applications : these books teach through case studies or application-based learning
- R Graphics and Programming : the focus of these books is on R graphics or programming
- Online Resource : short online courses and computer-based learning tools (I have also included the most important online data repository here)
1. R for Reference
R for Everyone: Advanced Analytics and Graphics – Jared P. Lander
YOU CANalytics Book Rating
(5 / 5)
Jared Lander wastes no time on base graphics (which come pre-installed with R) and jumps directly to the ggplot2 package (a much more advanced and sleek graphics package). This sets the tone for the book, i.e. don’t learn things you won’t use in real-life applications later. I highly recommend this book for fast-paced and relevant learning of R.
The R Book – Michael J. Crawley
YOU CANalytics Book Rating
(4.8 / 5)
With close to a thousand pages and vast coverage, ‘The R Book’ could be called the Bible for R. The book starts with simple concepts in R and gradually moves to highly advanced topics. Its breadth shows in dedicated chapters on topics as diverse as data frames, graphics, Bayesian statistics, and survival analysis. Essentially, this is a must-have reference book for any wannabe R programmer. For a beginner, however, the thickness of the book could be intimidating.
2. R with Theory
An Introduction to Statistical Learning: with Applications in R – Gareth James et al.
YOU CANalytics Book Rating
(5 / 5)
This book is a high-quality statistical text with R as the software of choice. If you want to get comfortable with fundamental concepts while learning R, then this is the book for you. Having said this, you will love this book even if you have studied advanced statistics. The book also covers some advanced machine learning concepts such as support vector machines (SVM) and regularization. A great book by all means.
Machine Learning with R – Brett Lantz
YOU CANalytics Book Rating
(4.5 / 5)
If you want to learn R from the machine learning perspective, then this is the book for you. Some people take a lot of interest in the fine demarcation between statistics and machine learning; for me, however, there is too much overlap between the two. I have given up on the distinction as it makes no difference from the applications perspective. The book introduces the RWeka package, an R interface to Weka, another open-source software package used extensively in academic research.
3. R with Applications
R and Data Mining: Examples and Case Studies – Yanchang Zhao
YOU CANalytics Book Rating
(4.3 / 5)
There are other books that use a case-study approach to teach R. I like this one because of the interesting topics it covers, including text mining, social network analysis and time series modeling. Having said this, the author could have put some effort into the formatting of the book, which is plain ugly. At times, while skimming through it, you will feel you are reading a master’s-level project report. However, once you get over this aspect, the content is really good.
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) – Graham Williams
YOU CANalytics Book Rating
(4.2 / 5)
Rattle is no SAS Enterprise Miner or SPSS Modeler (both commercial GUI-based data mining tools). However, trust me, apart from a few minor issues Rattle is not at all bad. The book is a great reference to Rattle (a GUI add-on package for R to mine data). I really hope they keep working on Rattle to make it better, as it has a lot of potential.
4. R Graphics and Programming
ggplot2: Elegant Graphics for Data Analysis (Use R!) – Hadley Wickham
YOU CANalytics Book Rating
(4 / 5)
‘ggplot2’ is an exceptional package for creating wonderful graphics in R. It is much better than the base graphics that come pre-installed with R, so I would recommend you start directly with ggplot2 without spending your time on base graphics. ‘R for Everyone’, the first book we discussed, has a good introduction to ggplot2. However, if you want to explore ggplot2 in greater depth, then this is the book for you.
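To make the comparison concrete, here is a minimal sketch of the same scatter plot drawn both ways. It is my own illustration rather than an example from either book, and it assumes that the ggplot2 package is installed and that the built-in `iris` data frame is a reasonable stand-in for your own data.

```r
# Assumption: ggplot2 is installed, e.g. via install.packages("ggplot2")
library(ggplot2)

# Base graphics: a single call, with styling passed as arguments
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")

# ggplot2: map columns to aesthetics, then add layers with `+`
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  labs(x = "Sepal length (cm)", y = "Petal length (cm)")
print(p)
```

The ggplot2 version adds a legend automatically and can be extended one layer at a time, which is the main reason I find it easier to work with.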
Though I prefer ggplot2, Lattice is another package on par with ggplot2. A good book to start with Lattice is ‘Lattice: Multivariate Data Visualization with R (Use R!)’ by
The Art of R Programming: A Tour of Statistical Software Design – Norman Matloff
YOU CANalytics Book Rating
(4.2 / 5)
If you want to learn the programming and coding aspect of R more than the analysis aspect, then this is the book for you. The author has extensive experience in R coding, and that is evident when you read the book. I must warn you that at times one wonders about the utility of some of the things Mr. Matloff talks about. Nevertheless, this is the best book on the market for learning R programming. The author also touches on parallel computing in R, a topic highly relevant in this day and age of big data.
5. Online Resource
YOU CANalytics Resource Rating
(4.9 / 5)
This is a wonderful place to learn R programming. Before jumping to the books, I recommend you take this free online course. You don’t need to install R on your system to complete this course. It will take you less than an hour to complete this course but will prepare you well for further learning. (Link)
Coursera : R Programming – Roger D. Peng
YOU CANalytics Resource Rating
(3.5 / 5)
I had really high expectations of this course on coursera.org. Expectations were high because Dr. Andrew Ng is associated with the site and his course on machine learning is delightful. However, the course by Dr. Roger D. Peng fell short of my expectations by some margin. The instructor is a good communicator and an expert in R, and the topics of the course are highly relevant for learning R. The biggest problem for me is its tone, which is highly didactic. If Dr. Peng could slightly redesign the course around applications and examples, it would become a fantastic course. (Link)
Lynda.com : R Statistics Essential Training –
YOU CANalytics Resource Rating
(4.5 / 5)
This course is not as comprehensive as the above course on Coursera. However, the tone of the course is much more applied and learner-friendly. (Link)
UCI Machine Learning Repository
YOU CANalytics Resource Rating
(5 / 5)
The UCI Machine Learning Repository has tons of freely available datasets. The site is not associated with R. However, the ‘datasets’ package in R includes some of the same classic datasets (iris, for example). The reason you may still want to go to this site is that it provides links to research papers that have used these datasets. (Link)
A few more great online resources to learn R:
1) OpenIntro (Link): This site has some really good tutorials for doing basic statistics in R.
2) R-tutor (Link): This is a good site to start learning R from scratch.
3) DataCamp (Link): Try this site for some interactive courses on R.
4) R-bloggers (Link): A great collection of blogs on R, though it may not be the place you want to visit first.
5) Kaggle (Link): This link has 3 good tutorials to learn R.
Sign-off Note
Let me draw a loose parallel between Excel and R to offer you some advice about learning R. As I mentioned earlier, R has more than 5000 add-on packages on CRAN and a vast number of functions for data analysis. This may sound a bit daunting to a new learner. Luckily, online help is plentiful, so the number of functions won’t be a challenge. Moreover, if you have worked in Excel, you will know that there are just a handful of functions that you use repeatedly, based on your style of analysis. The same pattern will emerge with R as well. Hence, don’t get intimidated by the number of functions.
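As a rough illustration of that handful, the sketch below uses only functions that ship with base R, applied to the built-in `iris` data frame. The choice of functions and of dataset is my own assumption about a typical first look at data; your own repeated handful may well differ.

```r
# Load a built-in dataset (ships with R in the 'datasets' package)
data(iris)

# Inspect structure and the first few rows
str(iris)
head(iris, 3)

# Summarise a single numeric column
summary(iris$Sepal.Length)

# Count the rows in each category
species_counts <- table(iris$Species)
print(species_counts)

# Group-wise mean, similar in spirit to a pivot table in Excel
mean_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(mean_by_species)
```

These five or six calls cover a surprising share of everyday exploratory work.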
Additionally, for most books mentioned above you will find evaluation copies online. This will help you choose books to suit your learning style.
Enjoy learning R! It is good fun.