Welcome to the R Programming Tutorial

R is an open‑source programming language and environment specifically designed for statistical computing, data analysis, and graphics. It is widely used in academia, research, and industry for tasks ranging from classical statistical modeling to modern data science, machine learning, and reproducible research.

This tutorial accompanies a structured course that can support up to 9 ECTS / 72 hours of teaching and guided practice, introducing you step by step to the fundamentals of R programming, data manipulation, exploratory analysis, and machine learning. The material is organized as a sequence of self‑contained R Markdown lessons designed to help you build solid, reproducible workflows for analyzing and interpreting data.

R: a statistical computing environment for data science.

Why Learn R?

  • Focused on statistics and data analysis
    R provides a rich set of tools for statistical modeling, hypothesis testing, time‑series analysis, classification, clustering, and advanced graphics in a single environment.

  • Central in data science and research
    R is a standard in many scientific disciplines and is used by data science teams in technology, healthcare, finance, energy, and other sectors for data‑driven decision making and experimentation.

  • Reproducible and extensible
    With R Markdown and a large ecosystem of packages, R supports fully reproducible analyses, literate programming, and integration with version control systems such as Git and GitHub.

Course Highlights

This tutorial is organized into lessons that gradually increase in complexity and mirror the structure of a full academic course in R for data analysis.

  • Core R fundamentals
    Syntax, objects and data types (vectors, matrices, factors, lists, data frames), indexing and basic operations.

  • Data preparation and programming
    Data import/export, handling missing values, data cleaning and transformation, user‑defined functions, control structures (if/else, loops), and vectorized programming with the apply family (apply, lapply, sapply).

  • Exploratory data analysis and visualization
    Descriptive statistics, correlation, scatter plots, boxplots, bar charts, histograms, and basic inferential procedures (t‑tests, one‑way and two‑way ANOVA) implemented directly in R.

  • Machine learning foundations
    Introduction to supervised learning (linear and generalized linear models, penalized regression, decision trees, random forests, gradient boosting, support vector machines) and unsupervised learning (k‑means, hierarchical and density‑based clustering, model‑based clustering, spectral clustering, and core dimensionality reduction) using R as the main analysis environment.

  • Deep learning track (optional, advanced)
    A dedicated set of lessons introduces deep learning in R using keras3 and TensorFlow, covering deep neural networks for tabular data, autoencoders for representation learning, LSTM/GRU models for sequences (e.g., electricity demand forecasting and text classification), and CNNs for image classification.
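
To give a flavor of the first four topic areas, here is a minimal sketch in base R. It uses the built‑in mtcars dataset purely for illustration; the dataset, the variable names, and the artificially introduced missing value are assumptions of this example and are not taken from the lessons themselves.

```r
# Illustrative sketch using base R only; mtcars ships with every R installation.
data(mtcars)

# Core fundamentals: data frames, factors, and indexing
cars <- mtcars[, c("mpg", "wt", "am")]
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Data preparation: introduce one missing value (for illustration) and drop it
cars$mpg[3] <- NA
complete <- cars[!is.na(cars$mpg), ]

# Vectorized programming with the apply family: column means without a loop
col_means <- sapply(complete[, c("mpg", "wt")], mean)

# Exploratory analysis: correlation, a two-sample t-test, and a boxplot
r  <- cor(complete$mpg, complete$wt)
tt <- t.test(mpg ~ am, data = complete)
boxplot(mpg ~ am, data = complete, ylab = "Miles per gallon")

# Machine learning foundations: a linear model and k-means clustering
fit <- lm(mpg ~ wt, data = complete)
set.seed(1)  # k-means starts from random centers
km <- kmeans(scale(complete[, c("mpg", "wt")]), centers = 2, nstart = 10)
```

Inspecting col_means, tt, summary(fit), and table(km$cluster, complete$am) at the console is a quick way to explore each result.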

Notes on prerequisites (deep learning lessons)

The deep learning lessons are designed to run locally (CPU is sufficient, GPU optional). You should have a working installation of keras3 and tensorflow in your R environment before running the training code.
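
One way to check this prerequisite before starting is sketched below. This pre‑flight check is a hypothetical helper, not part of the course material, and it only confirms that the two R packages are installed; it does not verify that the underlying Python backend is configured.

```r
# Hypothetical pre-flight check (an assumption of this example, not course code).
required  <- c("keras3", "tensorflow")
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  message("The keras3 and tensorflow R packages are available.")
} else {
  message("Missing R packages: ", paste(required[!installed], collapse = ", "))
  # One possible setup route (run manually; details depend on your system):
  # install.packages("keras3")
  # keras3::install_keras()
}
```

The installation commands are left commented out on purpose, so that running the check never modifies your environment without an explicit decision.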

Get ready to build a principled R workflow, from raw data to models and interpretable results, leveraging a modern statistical computing environment.

A work by Gianluca Sottile

gianluca.sottile@unipa.it