---
title: "Adding continent and country names with {countrycode}, and subsetting a data frame using sample()"
description: |
  Data wrangling and exploration to plot electricity production according to energy source and continent using the
  #TidyTuesday data set for week 29 of 2022
  (19/7/2022): "Technology Adoption"
author:
  - name: Ronan Harrington
    url: https://github.com/rnnh/
date: 2022-07-21
repository_url: https://github.com/rnnh/TidyTuesday/
preview: technology-adoption_files/figure-html5/fig1-1.png
output:
  distill::distill_article:
    self_contained: false
    toc: true
---
```{r knitr, include=FALSE}
knitr::opts_chunk$set(include = TRUE)
knitr::opts_chunk$set(fig.height = 6)
knitr::opts_chunk$set(fig.width = 9)
```
## Introduction
In this post, the [Technology Adoption](https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-19/readme.md) data set is used to illustrate data exploration in [R](https://www.r-project.org/) and adding information using the [{countrycode}](https://cran.rstudio.com/web/packages/countrycode/) package.
During data exploration, the `tt$technology` data set is filtered to select for the "Energy" category, and the distinct values for "variable" and "label" are printed.
A random subset is then created to test adding full country names and the corresponding continents, based on the three-letter ISO codes in the data set, using the `countrycode()` function.
The full data set is then wrangled into two tibbles for fossil fuel and low-carbon electricity production: the distribution for each energy source is plotted according to the corresponding continent.
The full source for this blog post is [available on GitHub](https://github.com/rnnh/TidyTuesday).
## Setup
Loading the [R](https://www.r-project.org/) libraries and
[data set](https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-19/readme.md).
```{r setup}
# Loading libraries
library(tidytuesdayR)
library(countrycode)
library(tidyverse)
library(ggthemes)
# Loading data
tt <- tt_load("2022-07-19")
```
## Exploring tt$technology: selecting distinct values after filtering, and testing adding a "continent" variable
```{r explore}
# Printing a summary of tt$technology
tt$technology
# Printing the distinct "variable" and "label" pairs for the "Energy" category
## This will be used as a reference to create the "energy_type" column/variable
tt$technology %>%
  filter(category == "Energy") %>%
  select(variable, label) %>%
  distinct()
# Setting a seed to make results reproducible
set.seed(20220719)
# Using sample() to select six row numbers of tt$technology at random
sample_rows <- sample(x = nrow(tt$technology), size = 6)
# Creating a subset using the random rows
technology_sample <- tt$technology[sample_rows, ]
# Printing a summary of the randomly sampled subset
technology_sample
# Adding continent and country name columns/variables to the sample subset,
# using the countrycode::countrycode() function
technology_sample <- technology_sample %>%
  mutate(continent = countrycode(iso3c, origin = "iso3c",
                                 destination = "continent"),
         country = countrycode(iso3c, origin = "iso3c",
                               destination = "country.name"))
# Selecting the country ISO code, continent and country name of the sample
# subset, to confirm that countrycode() worked as intended
technology_sample %>% select(iso3c, continent, country)
```
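The sampling and lookup steps above can be tried on a small hand-made data frame before running them on the full data set. The sketch below is illustrative only: `toy_tbl` and its values are made up, and `"XYZ"` is a deliberately invalid code, included to show that `countrycode()` returns `NA` (with a warning) for codes it cannot match.

```r
library(countrycode)

# Hypothetical toy data: made-up values keyed by three-letter ISO codes
toy_tbl <- data.frame(
  iso3c = c("IRL", "JPN", "BRA", "KEN", "AUS", "XYZ"),
  value = c(1.2, 3.4, 5.6, 7.8, 9.1, 2.3)
)

# Setting a seed so the same rows are drawn on every run
set.seed(20220719)
sample_rows <- sample(nrow(toy_tbl), size = 3)
toy_sample <- toy_tbl[sample_rows, ]

# Looking up continents and country names for every row of the toy data;
# suppressWarnings() hides the warning raised for the unmatched "XYZ" code
toy_tbl$continent <- suppressWarnings(
  countrycode(toy_tbl$iso3c, origin = "iso3c", destination = "continent"))
toy_tbl$country <- suppressWarnings(
  countrycode(toy_tbl$iso3c, origin = "iso3c", destination = "country.name"))

# The "XYZ" row has NA in both new columns
toy_tbl
```

Checking for `NA` in the new columns is a quick way to spot codes in the real data that are not valid ISO codes.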
## Wrangling tt$technology into two electricity production tibbles: fossil fuels and low-carbon sources
```{r wrangling}
# Adding the corresponding continent for each country in tt$technology;
# filtering to select for the "Energy" category; adding a more succinct
# "energy_type" variable; and dropping rows with missing values
energy_tbl <- tt$technology %>%
  mutate(continent = countrycode(iso3c, origin = "iso3c",
                                 destination = "continent")) %>%
  filter(category == "Energy") %>%
  mutate(energy_type = fct_recode(variable,
    "Consumption" = "elec_cons", "Coal" = "elec_coal", "Gas" = "elec_gas",
    "Hydro" = "elec_hydro", "Nuclear" = "elec_nuc", "Oil" = "elec_oil",
    "Other renewables" = "elec_renew_other", "Solar" = "elec_solar",
    "Wind" = "elec_wind", "Output" = "elecprod",
    "Capacity" = "electric_gen_capacity")) %>%
  drop_na()
# Printing a summary of energy_tbl
energy_tbl
# Filtering energy_tbl for fossil fuel rows
fossil_fuel_tbl <- energy_tbl %>%
  filter(energy_type %in% c("Coal", "Gas", "Oil"))
# Printing a summary of the tibble
fossil_fuel_tbl
# Filtering energy_tbl for low-carbon energy source rows: everything except
# the fossil fuels and the consumption, output and capacity totals
low_carbon_tbl <- energy_tbl %>%
  filter(!energy_type %in% c("Consumption", "Output", "Capacity",
                             "Coal", "Gas", "Oil"))
# Printing a summary of the tibble
low_carbon_tbl
```
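The recode-then-filter pattern used above can be checked on a few rows in isolation. This is a minimal sketch with made-up values: `toy_energy`, `fossil_fuels`, `toy_fossil` and `toy_low_carbon` are hypothetical names, and only four of the variable codes from `tt$technology` are used.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data using four of the variable codes from tt$technology
toy_energy <- tibble(
  variable = c("elec_coal", "elec_solar", "elec_cons", "elec_gas"),
  value = c(10, 2, 50, 8)
)

# Recoding the terse variable codes into readable labels ("new" = "old")
toy_energy <- toy_energy %>%
  mutate(energy_type = fct_recode(variable,
    "Coal" = "elec_coal", "Solar" = "elec_solar",
    "Consumption" = "elec_cons", "Gas" = "elec_gas"))

# Splitting the rows with %in% and its negation
fossil_fuels <- c("Coal", "Gas", "Oil")
toy_fossil <- toy_energy %>%
  filter(energy_type %in% fossil_fuels)
toy_low_carbon <- toy_energy %>%
  filter(!energy_type %in% c(fossil_fuels, "Consumption"))

toy_fossil
toy_low_carbon
```

Keeping the fossil fuel labels in one vector means both filters are driven by the same definition, so the two subsets cannot overlap.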
## Plotting distributions of electricity produced from fossil fuels and low-carbon sources
```{r fig1, fig.cap = "Box plots of electricity produced from fossil fuels, faceted by continent."}
# Plotting distributions of electricity produced from fossil fuels
fossil_fuel_tbl %>%
  ggplot(aes(x = fct_reorder(energy_type, value), y = value,
             fill = energy_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  scale_y_log10() +
  facet_wrap(~continent, scales = "free") +
  labs(
    title = "Electricity generated from fossil fuels by continent",
    y = "Output in terawatt-hours (TWh), log10 scale",
    x = "Source")
```
```{r fig2, fig.cap = "Box plots of electricity produced from low-carbon energy sources, faceted by continent."}
# Plotting distributions of electricity produced from low-carbon sources
low_carbon_tbl %>%
  ggplot(aes(x = fct_reorder(energy_type, value), y = value,
             fill = energy_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  scale_y_log10() +
  facet_wrap(~continent, scales = "free") +
  labs(
    title = "Electricity generated from low-carbon sources by continent",
    y = "Output in terawatt-hours (TWh), log10 scale",
    x = "Source")
```
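Both plots use `scale_y_log10()`, so it helps to recall how a base-10 logarithm treats the values involved. The following is a minimal base R sketch with made-up values (`toy_twh` is hypothetical). Zero maps to `-Inf`, and ggplot2 removes such non-finite values with a warning when drawing box plots, so any rows with zero output would not appear in the figures.

```r
# Hypothetical electricity outputs in TWh, including a zero
toy_twh <- c(0, 0.5, 10, 1000)

# Base-10 logarithm of each value; log10(0) is -Inf
log_twh <- log10(toy_twh)
log_twh

# Only values with a finite logarithm can be placed on a log10 axis
finite_twh <- toy_twh[is.finite(log_twh)]
finite_twh
```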