-
Notifications
You must be signed in to change notification settings - Fork 0
/
data_save.Rmd
65 lines (47 loc) · 2.02 KB
/
data_save.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
---
title: "Subsetting and Saving Data"
author: "Aleya Khalifa"
date: "`r Sys.Date()`"
output: html_document
---
```{r, include = FALSE}
library(tidyverse)
```
## Import data
Note: `met_10` is a random sample of 10% of The Met's original dataset _MetObjects_, which comes in a text file. Here, we clean up the dataset a bit before using it in analysis. The cleaned dataset is named `met`.
```{r}
load("data/met_10.RData")
met <- met_10 %>%
janitor::clean_names()
```
### Check missingness
The variable `accession_year` has high completeness. And `culture` has moderate completeness. Here is a table of the % rows with missing values by selected column:
```{r}
sapply(met, function(x) sum(is.na(x))/nrow(met))
```
### Clean up object types to create more general classes
Note that the `object_name` variable can be more detailed than is necessary. Here, I try to create more general categories of objects.
```{r}
met <- met %>%
mutate(object_name = ifelse(
grepl("Textile", object_name), "Textile",
ifelse(grepl("Painting", object_name), "Painting",
ifelse(grepl("Relief", object_name), "Relief",
ifelse(grepl("Print", object_name), "Print",
ifelse(grepl("aseball card", object_name), "Baseball card",
ifelse(grepl("Vase", object_name), "Vase",
ifelse(grepl("rnament", object_name), "Vase",
ifelse(grepl("arring", object_name), "Earring",
ifelse(grepl("ecklace", object_name), "Necklace",
ifelse(grepl("hotograph", object_name), "Photograph",
ifelse(grepl("tatue", object_name), "Statue",
object_name))))))))))))
```
## Save R data file
Chosen subset: Objects with complete data for `object_name` and `accession_year` - two crucial variables for our analysis.
```{r}
met <- met %>%
filter(!is.na(object_name) & !is.na(accession_year))
save(met,
file = "data/met.RData")
```