In the past, I lost track of reality trying to track a gazillion links covering every data-science-friendly programming language under the sun. **shakes head** Bad idea. Since I program in R daily, I like to keep track of R and Posit / RStudio developments. I’m mostly going to share R resources that I find useful for analytics, statistical programming, machine learning, data science workflows, and web app development. I’ve been enjoying Python a lot more recently, so I’ll slowly build up this resources page with Python sub-topics that I find bookmark-worthy.

If you’re looking for the best place to start with data analysis, I recommend learning SQL. It is by far the most widely used data querying language across the corporate and academic landscapes, and if you master SQL, you’ve mastered most of the transformations that are possible for tabular numeric data sets. Nonetheless, I will not cover SQL resources here as I rarely write raw SQL anymore. Instead, I use R to establish data warehouse connections and query the raw data with the tidyverse collection of R packages, which translates my R code into SQL and runs it in the back-end (via the dbplyr package).
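
To make the SQL point a little more concrete, here is a minimal sketch of the kind of query I mean, written in Python with the standard library's sqlite3 module. The in-memory database is only a stand-in for a real data warehouse, and the `orders` table and its columns are made up for illustration. With dbplyr, a chain of dplyr verbs (filter, group_by, summarise, arrange) would be translated into SQL roughly along these lines.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented purely for this example.
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0), ("west", 45.5)],
)

# Filter, group, aggregate, and sort: the core tabular transformations.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount > 40
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for region, n_orders, total in rows:
    print(region, n_orders, total)
```

Filtering, grouping, aggregating, and sorting cover a large share of everyday data work, which is why SQL is such a solid foundation even if you end up writing it indirectly through R or Python.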

R is an open-source programming language built for statistical computing and graphics, and Python is an open-source, general-purpose language with a deep data science ecosystem. Both languages have friendly online (and in-person) communities devoted to making data science easier to consume, easier to apply, and more effective at solving business problems. One of the things that I like most about both languages is the thousands of available packages that make almost everything in R or Python a little easier, from ETL, to method chaining, to developing predictive models and interactive web apps. I certainly welcome any suggestions that you might have for the lists below!

R Books: Classics

  • R for Data Science: Phenomenal introduction to R, the RStudio IDE, and the tidyverse collection of packages
  • Advanced R: Covers concepts, methods, and advanced object-oriented structures for R
  • Mastering Shiny: Designed to teach the foundations of Shiny for web development along with more advanced concepts such as Shiny modules
  • R Packages: The definitive reference point for R package development “covering workflow and process, alongside the presentation of all the important moving parts that make up an R package”

R Books: Applied Resources

  • Tidy Modeling with R: Over the last few months, I’ve learned a lot from this A to Z resource on predictive modeling workflows using the tidymodels framework
  • Deep Learning with R, Second Edition: In-depth introduction to artificial intelligence and deep learning applications with R using the Keras library
  • Forecasting: Principles and Practice, Third Edition: Said best by the authors, “The book is written for three audiences: (1) people finding themselves doing forecasting in business when they may not have had any formal training in the area; (2) undergraduate students studying business; (3) MBA students doing a forecasting elective.”
  • Regression and Other Stories: Super applied textbook on advanced regression techniques, Bayesian inference, and causal inference
  • Supervised Machine Learning for Text Analysis in R: Written by Emil Hvitfeldt and Julia Silge, two Posit software engineers and incredible additions to the tidymodels team, this book is a masterclass in natural language processing that takes you from the basics of NLP to real-life applications including inference and prediction

R Packages

  • tidyverse: A collection of packages for data manipulation and functional programming (I use dplyr, stringr, and purrr on a daily basis)
  • tidymodels: Hands-down my preferred collection of packages for building reproducible machine learning recipes, workflows, model tuning, model stacking, and cross-validation
  • tidyverts: A collection of packages for time series analysis that comes out of Rob Hyndman’s lab
  • DT: This is an R implementation of the popular DataTables JavaScript library that lets you build polished, configurable tables for use in web reports, slides, and Shiny apps
  • bs4Dash: This R Shiny framework brings Bootstrap 4 + AdminLTE 3 dependencies to Shiny (including 1:1 support for shinydashboard functions), and it’s my go-to for developing enterprise-grade Shiny apps
  • leaflet: R implementation of the popular Leaflet JavaScript library for developing interactive maps
  • plotly: An extensive graphing library for creating interactive visualizations and 3D (WebGL) charts

Python Books

Python Packages

  • NumPy: Brings the computational power of C and Fortran to Python programmers for applying high-level mathematical functions to arrays and more
  • Pandas: This is the most popular package for data manipulation and analysis with extended operations available for tabular and time series data
  • Matplotlib: A comprehensive library for creating static, animated, and interactive visualizations in Python
  • scikit-learn: Built on top of NumPy, SciPy, and matplotlib, “sklearn” makes the development of predictive analysis workflows a simple and reproducible process
  • Beautiful Soup: The beautifulsoup4 library makes web scraping HTML and XML data a breeze
  • Streamlit: Using pure Python, this package lets you build interactive web apps in minutes with no UI / front-end experience required