Intro to Tidymodels
About Tidymodels
The tidymodels framework for R is a collection of packages that brings tidy principles and a unified syntax to machine learning (“ML”) for R programmers, enabling end-to-end reproducibility for your ML workflows. I’ve been using this framework for five years and it continues to improve. Posit PBC funds a software engineering team dedicated to developing this framework, so its packages are feature-rich, regularly maintained, and current with ML trends. For Python users unfamiliar with R tools, the tidymodels framework is very similar to Python’s scikit-learn.
The core tidymodels packages include the following:
- rsample: provides infrastructure for efficient data splitting and resampling
- parsnip: a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages
- recipes: a tidy interface to data pre-processing tools for feature engineering
- workflows: expands the traditional model-only recipe to a much more holistic blueprint for pre-processing, modeling, post-processing, and evaluation
- dials: creates and manages tuning parameters and parameter grids
- tune: helps you optimize the hyperparameters of your model and pre-processing steps
- yardstick: measures the effectiveness of models using performance metrics
- broom: converts the information in common statistical R objects into user-friendly, predictable formats
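To make the division of labor among these packages concrete, here is a minimal sketch of a logistic regression workflow. It is not the code from the presentation: the `two_class_dat` dataset (shipped with the modeldata package, which tidymodels loads), the 75/25 split, the normalization step, and the object names are all assumptions chosen for illustration.

```r
library(tidymodels)

set.seed(123)

# Example data (an assumption for this sketch): two numeric predictors, A and B,
# and a two-level factor outcome, Class
data(two_class_dat, package = "modeldata")

# rsample: stratified train/test split
data_split <- initial_split(two_class_dat, prop = 0.75, strata = Class)
train <- training(data_split)
test  <- testing(data_split)

# recipes: declare the outcome, the predictors, and the pre-processing steps
rec <- recipe(Class ~ A + B, data = train) |>
  step_normalize(all_numeric_predictors())

# parsnip: specify the model independently of the engine's own syntax
spec <- logistic_reg() |>
  set_engine("glm")

# workflows: bundle the recipe and the model, then fit them together
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(spec) |>
  fit(data = train)

# broom: model coefficients as a tidy tibble
coefs <- tidy(wf_fit)

# yardstick: score predictions on the held-out test set
preds <- augment(wf_fit, new_data = test)
acc <- accuracy(preds, truth = Class, estimate = .pred_class)
```

The same workflow object could be passed to tune, with a grid from dials, if the model specification contained tunable parameters; this sketch skips that step because a plain `glm` fit has none.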
I was thrilled to present on tidymodels last week to the Department of Mathematics, Statistics and Data Science at Loyola Marymount University. Their students and faculty were engaged, and I had a great time covering a logistic regression problem with this framework.
Other Tools Explored
- Positron: A fresh, open-source coding environment purpose-built for data analysis and modeling, including all the best bells and whistles from VS Code and RStudio.
Embedded Presentation
- Fullscreen web slides: Intro to Tidymodels
- GitHub repo: JavOrraca/intro-to-tidymodels