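A data frame is a table-like object where each row is an observation and each column is a variable stored as a vector, so all columns have the same length.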
A matrix can only store one type (all numeric, all character, etc.), while a data frame can mix types.
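The difference can be checked directly. The following is a minimal sketch (the object names `m` and `d` are illustrative only) that builds the same two columns once as a matrix and once with data.frame(), which is introduced next:

```r
# A matrix is one atomic vector with dimensions, so mixed inputs are coerced
# to a single common type.
m <- cbind(ID = c(10, 20), item = c("book", "pen"))
typeof(m)  # "character": the numbers were converted to strings

# A data frame is a list of columns, so each column keeps its own type.
d <- data.frame(
  ID = c(10, 20),
  item = c("book", "pen"),
  stringsAsFactors = FALSE
)
sapply(d, class)  # ID is "numeric", item is "character"
```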
You can create a data frame with data.frame() or
(recommended for modern workflows) tibble::tibble().
Note: Since R 4.0.0 the default is
stringsAsFactors = FALSE, so character columns stay character. The argument can still be set explicitly for
compatibility with older R versions or for teaching.
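As a small illustration of the note, the sketch below sets the argument both ways on a one-column data frame (the names `items`, `d_chr` and `d_fct` are made up for this example):

```r
items <- c("book", "pen", "textbook", "pencil_case")

# Explicit FALSE matches the R >= 4.0 default: strings stay character.
d_chr <- data.frame(item = items, stringsAsFactors = FALSE)
class(d_chr$item)  # "character"

# Explicit TRUE reproduces the pre-4.0 default: strings become factors.
d_fct <- data.frame(item = items, stringsAsFactors = TRUE)
class(d_fct$item)  # "factor"
```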
```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)
df
```

| ID | item | store | price |
|---|---|---|---|
| 10 | book | TRUE | 2.5 |
| 20 | pen | FALSE | 8.0 |
| 30 | textbook | TRUE | 10.0 |
| 40 | pencil_case | FALSE | 7.0 |

```r
str(df)
## 'data.frame': 4 obs. of 4 variables:
## $ ID : num 10 20 30 40
## $ item : chr "book" "pen" "textbook" "pencil_case"
## $ store: logi TRUE FALSE TRUE FALSE
## $ price: num 2.5 8 10 7
```
```r
library(tibble)

df_tbl <- tibble(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)
df_tbl
```

| ID | item | store | price |
|---|---|---|---|
| 10 | book | TRUE | 2.5 |
| 20 | pen | FALSE | 8.0 |
| 30 | textbook | TRUE | 10.0 |
| 40 | pencil_case | FALSE | 7.0 |

```r
str(df_tbl)
## tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
## $ ID : num [1:4] 10 20 30 40
## $ item : chr [1:4] "book" "pen" "textbook" "pencil_case"
## $ store: logi [1:4] TRUE FALSE TRUE FALSE
## $ price: num [1:4] 2.5 8 10 7
```
Indexing uses df[rows, cols]:
- rows blank means “all rows”
- cols blank means “all columns”

| ID | item | store | price |
|---|---|---|---|
| 10 | book | TRUE | 2.5 |
| 20 | pen | FALSE | 8.0 |
| 30 | textbook | TRUE | 10.0 |
| 40 | pencil_case | FALSE | 7.0 |

| ID | item | store | price |
|---|---|---|---|
| 10 | book | TRUE | 2.5 |
| 20 | pen | FALSE | 8.0 |

```
## [1] 10 20 30 40
```
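The calls that produced the three outputs above are not shown. Assuming they were the full table, the first two rows, and the first column, a sketch of the df[rows, cols] pattern that gives the same results is:

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

all_cells  <- df[, ]     # both indices blank: all rows, all columns
first_two  <- df[1:2, ]  # rows 1 and 2, all columns
first_col  <- df[, 1]    # all rows, first column (returned as a plain vector)
```

Note that selecting a single column of a base data frame this way drops the table structure and returns a vector.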
Select columns by name:
| ID | store |
|---|---|
| 10 | TRUE |
| 20 | FALSE |
| 30 | TRUE |
| 40 | FALSE |
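A minimal sketch of selecting by name, assuming the same df as before; a character vector of column names goes in the column position and the row index is left blank:

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# Two named columns, all rows; the result is still a data frame.
id_store <- df[, c("ID", "store")]
id_store
```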
Tip: extracting a single column can be done three ways:

```
## [1] 10 20 30 40
## [1] 10 20 30 40
## [1] 10 20 30 40
```
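The three calls themselves are missing from the text. The usual base R candidates, all of which return the ID column as a numeric vector for this data frame, are sketched below (the variable names `by_dollar`, `by_double` and `by_bracket` are illustrative):

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

by_dollar  <- df$ID       # dollar operator
by_double  <- df[["ID"]]  # double brackets with a column name
by_bracket <- df[, "ID"]  # single brackets with a blank row index
```

For a tibble the third form behaves differently: it returns a one-column tibble rather than a vector, so df$ID or df[["ID"]] is the safer choice when a vector is needed.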
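A new column must have the same number of rows as the data frame. (Base R recycles a shorter vector only when its length divides the row count evenly, for example a single value.)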
| ID | item | store | price | quantity |
|---|---|---|---|---|
| 10 | book | TRUE | 2.5 | 10 |
| 20 | pen | FALSE | 8.0 | 35 |
| 30 | textbook | TRUE | 10.0 | 40 |
| 40 | pencil_case | FALSE | 7.0 | 5 |
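The table above can be reproduced with a plain assignment to a new column name. This is a sketch using the quantity values shown in the table; the exact call used by the author is not shown:

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# One value per row: 4 rows, 4 values.
df$quantity <- c(10, 35, 40, 5)
df
```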
If lengths don’t match, R errors:

```
## Error in `$<-.data.frame`:
## ! replacement has 3 rows, data has 4
```
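To see this failure without stopping a script, the assignment can be wrapped in tryCatch(). The sketch below assumes a three-value vector, matching the "3 rows" in the message above; the values themselves are made up:

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# Three values for four rows: 3 does not divide 4, so nothing is recycled.
result <- tryCatch(
  {
    df$quantity <- c(10, 35, 40)
    "assigned"
  },
  error = function(e) conditionMessage(e)
)
result  # the text of the error message
```

Because the assignment fails, df is left unchanged and has no quantity column.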
Modern alternative (nice in pipelines):
| ID | item | store | price | quantity |
|---|---|---|---|---|
| 10 | book | TRUE | 2.5 | 10 |
| 20 | pen | FALSE | 8.0 | 35 |
| 30 | textbook | TRUE | 10.0 | 40 |
| 40 | pencil_case | FALSE | 7.0 | 5 |
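The text does not say which function it has in mind. One common choice is mutate() from the dplyr package, so the sketch below assumes dplyr is installed; `df_new` is an illustrative name:

```r
library(dplyr)  # assumption: dplyr is the "modern alternative" meant here

df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# mutate() returns a new data frame and leaves df itself untouched.
df_new <- df %>%
  mutate(quantity = c(10, 35, 40, 5))
df_new
```

Since the result is returned rather than assigned in place, further steps can be chained onto the same pipeline.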
Use a logical condition inside the row index. This style is explicit and robust.
| | ID | item | store | price | quantity |
|---|---|---|---|---|---|
| 2 | 20 | pen | FALSE | 8 | 35 |
| 3 | 30 | textbook | TRUE | 10 | 40 |
| 4 | 40 | pencil_case | FALSE | 7 | 5 |
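The condition behind the table above is not shown. As an assumption, price > 5 is used below because it keeps exactly rows 2, 3 and 4; `expensive` is an illustrative name:

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7),
  quantity = c(10, 35, 40, 5)
)

keep <- df$price > 5      # hypothetical condition: one TRUE/FALSE per row
expensive <- df[keep, ]   # rows where keep is TRUE, all columns
expensive
```

The original row names (2, 3, 4) are kept, which is why they appear as the unlabeled first column of the table.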
subset() can be convenient for quick exploration, but
many workflows prefer bracket indexing for explicitness.
| | ID | item | store | price | quantity |
|---|---|---|---|---|---|
| 2 | 20 | pen | FALSE | 8 | 35 |
| 3 | 30 | textbook | TRUE | 10 | 40 |
| 4 | 40 | pencil_case | FALSE | 7 | 5 |
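A sketch of the subset() form, again assuming the hypothetical condition price > 5, which reproduces the rows shown; column names can be used directly inside the call without the df$ prefix:

```r
df <- data.frame(
  ID = c(10, 20, 30, 40),
  item = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7),
  quantity = c(10, 35, 40, 5)
)

# price is looked up inside df (hypothetical condition).
via_subset  <- subset(df, price > 5)
via_bracket <- df[df$price > 5, ]
```

With this data the two forms give the same rows. They differ when the condition contains NA: subset() drops those rows, while bracket indexing returns rows filled with NA.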
A work by Gianluca Sottile
gianluca.sottile@unipa.it