CSV (comma-separated values) is one of the most common data formats.
Notes:
stringsAsFactors = FALSE; keeping it can be useful for
portability.\\) or use
forward slashes (/).PATH <- "raw_data/mtcars.csv"
# Keep stringsAsFactors explicitly for compatibility across R versions
df <- read.csv(PATH, header = TRUE, sep = ",", stringsAsFactors = FALSE)
ncol(df)## [1] 12
## 'data.frame': 32 obs. of 12 variables:
## $ X : chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : int 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : int 0 0 1 1 0 1 0 1 1 1 ...
## $ am : int 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: int 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: int 4 4 1 1 2 1 4 2 2 4 ...
The readxl package reads both .xls and
.xlsx files.
readxl_example() lists example spreadsheets shipped with
the package.
## [1] "clippy.xls" "clippy.xlsx" "datasets.xls" "datasets.xlsx"
## [5] "deaths.xls" "deaths.xlsx" "geometry.xls" "geometry.xlsx"
## [9] "type-me.xls" "type-me.xlsx"
## [1] "/home/runner/work/_temp/Library/readxl/extdata/datasets.xlsx"
Examples (sheet by name or index):
example <- readxl_example("datasets.xlsx")
df1 <- read_excel(example) # default first sheet
sheets <- excel_sheets(example) # list sheets
sheets## [1] "mtcars" "chickwts" "quakes"
quakes_a <- read_excel(example, sheet = "quakes")
quakes_b <- read_excel(example, sheet = 3)
identical(quakes_a, quakes_b)## [1] TRUE
| mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|
| 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
## [1] 4 2
cell_rows() and cell_cols() give fine
control over the imported rectangle.
## [1] 4 11
## [1] 32 2
The haven package imports (and exports) common file
formats from other statistical software, including SAS, Stata, and
SPSS.
| ADMIT | GRE | GPA | RANK |
|---|---|---|---|
| 0 | 380 | 3.61 | 3 |
| 1 | 660 | 3.67 | 3 |
| 1 | 800 | 4.00 | 1 |
| 1 | 640 | 3.19 | 4 |
| 0 | 520 | 2.93 | 4 |
| 1 | 760 | 3.00 | 2 |
| admit | gre | gpa | rank |
|---|---|---|---|
| 0 | 380 | 3.61 | 3 |
| 1 | 660 | 3.67 | 3 |
| 1 | 800 | 4.00 | 1 |
| 1 | 640 | 3.19 | 4 |
| 0 | 520 | 2.93 | 4 |
| 1 | 760 | 3.00 | 2 |
snake_case.NA) and declare it via na = ... when
importing.| Library | Objective | Function | Key_args |
|---|---|---|---|
| utils (base R) | Read CSV | read.csv() | file, header=TRUE, sep=‘,’ |
| readxl | Read Excel | read_excel() | path, sheet, range, col_names, n_max, na |
| haven | Read SAS | read_sas() | data_file |
| haven | Read Stata | read_dta() | file |
| haven | Read SPSS | read_sav() | file |
A work by Gianluca Sottile
gianluca.sottile@unipa.it