Some tips and tricks for working with date and time data #33
ernestguevarra
started this conversation in
Hackathon 2024
-
Using dependencies to deal with date data types
This is where package dependencies, in my opinion, give a lot more utility for dealing with date data types. Here is example code to deal with datasets with date data types using package dependencies:
library(dplyr) ## for data wrangling
library(tidyr) ## for data wrangling adjunct to dplyr
library(lubridate) ## for working with date data types
library(ggplot2) ## for plotting
malaria <- read.table("https://raw.githubusercontent.com/OxfordIHTM/teaching_datasets/main/malaria.dat", header = TRUE)
malaria <- malaria %>%
  dplyr::mutate(Time = my(Time)) %>%
  tidyr::pivot_longer(Cases:Rain, names_to = "variable", values_to = "n")
malaria %>%
  dplyr::filter(variable == "Cases") %>%
  ggplot(mapping = aes(x = Time, y = n, group = variable)) +
  geom_line() +
  scale_x_date(
    breaks = seq(from = min(malaria$Time), to = max(malaria$Time), by = "2 month"),
    labels = paste(
      seq(from = min(malaria$Time), to = max(malaria$Time), by = "2 month") %>%
        lubridate::month(label = TRUE) %>%
        as.character(),
      seq(from = min(malaria$Time), to = max(malaria$Time), by = "2 month") %>%
        lubridate::year() %>%
        as.character()
    )
  ) +
  scale_y_continuous(
    breaks = seq(
      from = 0, to = max(malaria$n[malaria$variable == "Cases"]), by = 100
    )
  ) +
  labs(
    title = "Malaria cases over time",
    subtitle = "July 1997 to July 1999",
    x = NULL, y = "n"
  ) +
  theme_bw() +
  theme(axis.text.x.bottom = element_text(angle = 90, vjust = 0.5, hjust = 1))
This gives a line plot of malaria cases over time, from July 1997 to July 1999, with x-axis labels every two months.
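The plotting code in this reply builds the same two-monthly date sequence three times. As a rough sketch, the breaks and labels could be computed once in base R and then reused. The start and end dates below are assumptions taken from the plot subtitle (the first day of July 1997 and of July 1999), and the names axis_breaks and axis_labels are made up for illustration:

```r
# Assumed date range, taken from the plot subtitle rather than from the data.
start_date <- as.Date("1997-07-01")
end_date   <- as.Date("1999-07-01")

# Compute the two-monthly axis breaks once and reuse them.
axis_breaks <- seq(from = start_date, to = end_date, by = "2 months")

# Build "Mon YYYY" labels. month.abb holds English month abbreviations,
# which avoids locale-dependent output from format(x, "%b").
axis_labels <- paste(
  month.abb[as.integer(format(axis_breaks, "%m"))],
  format(axis_breaks, "%Y")
)

axis_labels
```

These two vectors could then be passed as scale_x_date(breaks = axis_breaks, labels = axis_labels). ggplot2 also offers the date_breaks and date_labels arguments of scale_x_date() as a shorter route, although the breaks it picks may not start exactly at the first observation.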
-
Thanks Ernest.
-
This guide uses a real-world example dataset on malaria. It contains data on rainfall (in mm) and the number of cases of malaria reported from health centres in an administrative district of Ethiopia between July 1997 and July 1999.
The dataset is available from the Oxford IHTM teaching datasets repository and can be read into R as follows:
The dataset looks like this (all records):
For this guide, we will focus on how we work with the date information found in the Time variable in the dataset.
In this dataset, the date information is recorded in character class. You can check this by using the class() function:
As a general principle, we would like to keep date data in Date class because this is the format in which R recognises how to handle this data appropriately. There are many implications of this, but in this guide we will first show the implication of not having date information in Date class when it comes to plotting.
The malaria dataset is a time series dataset, and the most basic analysis we can perform on it is to create a time series plot showing the trend of cases and/or rainfall over time (per month). If we create this plot using base R plotting functions without processing the Time variable (keeping it as is), we get an error:
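A minimal sketch of the class check and the failed plot, using a small hypothetical data frame in place of the real malaria records. The values, the MM-YYYY text format of Time, and the name malaria_demo are all assumptions made for illustration:

```r
# Hypothetical stand-in for the malaria data: the numbers and the
# "MM-YYYY" text format of Time are assumptions, not the real records.
malaria_demo <- data.frame(
  Time  = c("07-1997", "08-1997", "09-1997", "10-1997"),
  Cases = c(120, 150, 310, 280),
  Rain  = c(210, 180, 95, 20),
  stringsAsFactors = FALSE
)

# Time is stored as text, so R has no notion of its chronological order.
class(malaria_demo$Time)

# Base R plotting cannot place character x values on a numeric axis.
# try() captures the error so the rest of the script keeps running.
plot_attempt <- try(
  suppressWarnings(
    plot(x = malaria_demo$Time, y = malaria_demo$Cases, type = "l")
  ),
  silent = TRUE
)

# TRUE when the plot call failed.
inherits(plot_attempt, "try-error")
```

With text on the x axis, base R typically reports that it needs finite 'xlim' values, because the character strings cannot be converted to numbers.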
This error indicates that R doesn't know how to plot the Time variable as it is, hence the complaint about the x values.
So, we need to process the Time variable so that we can use it for plotting. R should be able to recognise the Time variable as data with a chronological order (i.e., months go from January to December, years go from lowest to highest and, if the data has days, days go from lowest to highest). The data type that gives the Time variable these characteristics is the Date class. To read more about this data type, issue ?as.Date on your R console to read the help file.
So, we will now transform the Time variable into a date rather than a character class as follows:
Checking the data type of the Time variable, we get:
Now, this should address the issue with the plotting. So, we try plotting again and we get:
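One possible way to do this conversion in base R is sketched below on a small hypothetical data frame. The values and the MM-YYYY text format of Time are assumptions, so the format string would need adjusting to match how Time is actually written in the real dataset:

```r
# Hypothetical stand-in for the malaria data (values and the "MM-YYYY"
# format of Time are assumptions, not the real records).
malaria_demo <- data.frame(
  Time  = c("07-1997", "08-1997", "09-1997", "10-1997"),
  Cases = c(120, 150, 310, 280),
  stringsAsFactors = FALSE
)

# as.Date() needs a full calendar date, so prepend a day ("01-") and
# describe the resulting text layout with a format string.
malaria_demo$Time <- as.Date(
  paste0("01-", malaria_demo$Time),
  format = "%d-%m-%Y"
)

# Time is now of class Date.
class(malaria_demo$Time)

# With a Date on the x axis, base R can draw the time series.
plot(
  x = malaria_demo$Time, y = malaria_demo$Cases, type = "l",
  xlab = "", ylab = "Cases"
)
```

If the month is written as an abbreviation such as Jul, the %b conversion code can be used in place of %m, though how it is matched depends on the locale of the R session.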
This plot looks a lot more like what we expect.
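The chronological ordering that the Date class provides can also be seen by sorting. This is a small sketch with made-up month-year strings (the MM-YYYY format is an assumption): sorted as text they come out in alphabetical order, while sorted as dates they come out in time order.

```r
# Hypothetical month-year strings, deliberately out of order.
time_text <- c("12-1997", "01-1998", "07-1997")

# Sorted as text: alphabetical order, which puts January 1998 first.
sorted_text <- sort(time_text)
sorted_text

# Converted to Date (first day of each month) and sorted: time order.
time_date <- as.Date(paste0("01-", time_text), format = "%d-%m-%Y")
sorted_date <- sort(time_date)
sorted_date
```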