R is an open‑source programming language and environment specifically designed for statistical computing, data analysis, and graphics. It is widely used in academia, research, and industry for tasks ranging from classical statistical modeling to modern data science, machine learning, and reproducible research.
This tutorial accompanies a structured course of up to 9 ECTS (72 hours of teaching and guided practice) that introduces you step by step to the fundamentals of R programming, data manipulation, exploratory analysis, and machine learning. The material is organized as a sequence of self‑contained R Markdown lessons designed to help you build solid, reproducible workflows for analyzing and interpreting data.
R: a statistical computing environment for data science.
Focused on statistics and data analysis
R provides a rich set of tools for statistical modeling, hypothesis
testing, time‑series analysis, classification, clustering, and advanced
graphics in a single environment.
Central in data science and research
R is a standard tool in many scientific disciplines and is used by data
science teams in technology, healthcare, finance, energy, and other
sectors for data‑driven decision making and experimentation.
Reproducible and extensible
With R Markdown and a large ecosystem of packages, R supports fully
reproducible analyses and literate programming, and it integrates with
version control through Git and hosting platforms such as GitHub.
This tutorial is organized into lessons that gradually increase in complexity and mirror the structure of a full academic course in R for data analysis.
Core R fundamentals
Syntax, objects and data types (vectors, matrices, factors, lists, data
frames), indexing and basic operations.
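As a minimal sketch of the object types listed above, the following base R snippet creates one of each and shows a few indexing patterns. The object names (x, m, f, l, df) are illustrative only and do not come from the course lessons.

```r
# Atomic vector: all elements share a single type
x <- c(4.2, 1.5, 3.8, 2.0)

# Matrix: a vector with a dim attribute, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# Factor: categorical data stored as integer codes with explicit levels
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))

# List: a container whose elements may have different types
l <- list(values = x, label = "example", flag = TRUE)

# Data frame: a list of equal-length columns
df <- data.frame(id = 1:4, score = x, level = f)

# Basic indexing
x[2]                            # second element: 1.5
x[x > 2]                        # logical indexing: 4.2 and 3.8
m[2, 3]                         # row 2, column 3: 6
l$label                         # element by name: "example"
df[df$level == "low", "score"]  # scores where level is "low": 4.2 and 2.0
levels(f)                       # "low" "medium" "high"
```

Note that R indexing starts at 1, and that a logical condition inside square brackets selects every element for which the condition is TRUE.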
Data preparation and programming
Data import/export, handling missing values, data cleaning and
transformation, user‑defined functions, control structures (if/else,
loops), vectorized operations, and the apply family of functions.
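The following is a small sketch of these ideas on a toy data frame built inline (the data and the helper name `standardize` are hypothetical). It counts and drops missing values, defines a function with an if/else guard, uses vectorized arithmetic, and compares an explicit loop with an apply-family equivalent.

```r
# Toy data with missing values (NA), created inline for illustration
raw <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  value = c(10, NA, 14, 8, NA)
)

# Handling missing values: count them, then keep only complete rows
sum(is.na(raw$value))              # 2 missing values
clean <- raw[complete.cases(raw), ]

# A user-defined function with an if/else guard
standardize <- function(v) {
  if (length(v) < 2) {
    stop("need at least two values")
  } else {
    (v - mean(v)) / sd(v)          # vectorized: no explicit loop needed
  }
}
clean$z <- standardize(clean$value)

# Group means with a for loop...
means_loop <- numeric(0)
for (g in unique(clean$group)) {
  means_loop[g] <- mean(clean$value[clean$group == g])
}

# ...and the same result with an apply-family function
means_apply <- sapply(split(clean$value, clean$group), mean)
```

Both approaches give a named vector of group means; the `sapply()` version is shorter and avoids managing the result vector by hand.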
Exploratory data analysis and visualization
Descriptive statistics, correlation, scatter plots, boxplots, bar
charts, histograms, and basic inferential procedures (t‑tests, one‑way
and two‑way ANOVA) implemented directly in R.
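A minimal sketch of these procedures, assuming the built-in `mtcars` dataset rather than any dataset used in the lessons: descriptive statistics, a correlation, a Welch two-sample t-test, and one-way and two-way ANOVA. The plots are wrapped in `interactive()` so that the script also runs non-interactively without opening a graphics device.

```r
# Built-in mtcars data: fuel economy (mpg) and other features for 32 cars
summary(mtcars$mpg)

# Correlation between fuel economy and weight (negative: heavier cars
# travel fewer miles per gallon)
cor_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Welch two-sample t-test: does mpg differ between automatic (am = 0)
# and manual (am = 1) transmissions?
tt <- t.test(mpg ~ am, data = mtcars)

# One-way ANOVA: mpg across cylinder counts, treated as a factor
fit1 <- aov(mpg ~ factor(cyl), data = mtcars)
summary(fit1)

# Two-way ANOVA with an interaction between cylinders and transmission
fit2 <- aov(mpg ~ factor(cyl) * factor(am), data = mtcars)
summary(fit2)

# Plots are drawn only in interactive sessions (e.g., RStudio)
if (interactive()) {
  plot(mpg ~ wt, data = mtcars, main = "Fuel economy vs weight")
  boxplot(mpg ~ cyl, data = mtcars, main = "mpg by number of cylinders")
  hist(mtcars$mpg, main = "Distribution of mpg")
}
```

The formula interface (`response ~ predictors`) used by `t.test()` and `aov()` is the same one used later for regression models with `lm()` and `glm()`.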
Machine learning foundations
Introduction to supervised learning (linear and generalized linear
models, penalized regression, decision trees, random forests, gradient
boosting, support vector machines) and unsupervised learning (k‑means,
hierarchical and density‑based clustering, model‑based clustering,
spectral clustering, and core dimensionality‑reduction methods) using R as the
main analysis environment.
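As a base-R sketch of the supervised and unsupervised workflow (assuming the built-in `mtcars` and `iris` datasets; penalized regression, trees, boosting, and SVMs need add-on packages and are not shown), the snippet below fits a linear model and a logistic regression on a random train/test split, then runs k-means and principal component analysis.

```r
set.seed(123)  # reproducible train/test split and k-means starting points

# Supervised learning: random 75/25 train/test split of mtcars
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.75 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Linear regression of mpg on weight and horsepower, evaluated by test RMSE
lm_fit <- lm(mpg ~ wt + hp, data = train)
pred <- predict(lm_fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

# Generalized linear model: logistic regression for manual transmission (am = 1)
glm_fit <- glm(am ~ wt, data = train, family = binomial)
prob_manual <- predict(glm_fit, newdata = test, type = "response")

# Unsupervised learning: k-means with 3 clusters on standardized iris features
X <- scale(iris[, 1:4])
km <- kmeans(X, centers = 3, nstart = 25)
table(cluster = km$cluster, species = iris$Species)

# Dimensionality reduction: PCA on the same standardized features
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)  # proportion of variance explained by each component
```

The species labels are used only to inspect the k-means result after clustering; they play no part in fitting it.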
Deep learning track (optional, advanced)
A dedicated set of lessons introduces deep learning in R using keras3
and TensorFlow, covering deep neural networks for tabular data,
autoencoders for representation learning, LSTM/GRU models for sequences
(e.g., electricity demand forecasting and text classification), and
CNNs for image classification.
The deep learning lessons are designed to run locally (a CPU is
sufficient; a GPU is optional). Before running the training code, you
should have working installations of the keras3 and tensorflow R
packages, together with the Python backend they rely on.
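A quick way to check these prerequisites before starting is sketched below. It only tests whether the two R packages can be loaded; it does not verify the Python backend. The variable names are illustrative, and the setup hint in the message assumes the `install_keras()` helper provided by keras3.

```r
# Check whether the R packages are installed. requireNamespace() loads a
# package's namespace if it is present, but neither attaches nor installs it.
pkgs <- c("keras3", "tensorflow")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

if (!all(installed)) {
  message("Missing: ", paste(pkgs[!installed], collapse = ", "),
          ". Install them with install.packages(), then run ",
          "keras3::install_keras() to set up the Python backend.")
} else {
  message("keras3 and tensorflow are installed.")
}
```

The Python side is configured through reticulate, so a package that loads successfully can still fail at training time if the backend is not set up.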
Get ready to build a principled R workflow, from raw data to models and interpretable results, leveraging a modern statistical computing environment.
A work by Gianluca Sottile
gianluca.sottile@unipa.it