ggplot2 - Time Series



A time series is a graphical plot which represents the series of data points in a specific time order. A time series is a sequence taken with a sequence at a successive equal spaced points of time. Time series can be considered as discrete-time data. The dataset which we will use in this chapter is “economics” dataset which includes all the details of US economic time series.

The dataframe includes following attributes which is mentioned below −

Date Month of data collection
Psavert Personal savings rate
Pce Personal consumption expenditure
Unemploy Number of unemployed in thousands
Unempmed Median duration of unemployment
Pop Total population in thousands

Load the required packages and set the default theme to create a time series.

> library(ggplot2)
> theme_set(theme_minimal())
> # Demo dataset
> head(economics)
# A tibble: 6 x 6
date           pce            pop          psavert       uempmed       unemploy
<date>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
1 1967-07-01    507.          198712        12.6          4.5           2944
2 1967-08-01    510.          198911        12.6          4.7           2945
3 1967-09-01    516.          199113        11.9          4.6           2958
4 1967-10-01    512.          199311        12.9          4.9           3143
5 1967-11-01    517.          199498        12.8          4.7           3066
6 1967-12-01    525.          199657        11.8          4.8           3018

Create a basic line plots which creates a time series structure.

> # Basic line plot
> ggplot(data = economics, aes(x = date, y = pop))+
+ geom_line(color = "#00AFBB", size = 2)
Time Series Structure

We can plot the subset of data using following command −

> # Plot a subset of the data
> ss <- subset(economics, date > as.Date("2006-1-1"))
> ggplot(data = ss, aes(x = date, y = pop)) +
+ geom_line(color = "#FC4E07", size = 2)
Subset Of Data

Creating Time Series

Here we will plot the variables psavert and uempmed by dates. Here we must reshape the data using the tidyr package. This can be achieved by collapsing psavert and uempmed values in the same column (new column). R function: gather()[tidyr]. The next step involves creating a grouping variable that with levels = psavert and uempmed.

> library(tidyr)
> library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:ggplot2’: vars
The following objects are masked from ‘package:stats’: filter, lag
The following objects are masked from ‘package:base’: intersect, setdiff, setequal, union
> df <- economics %>%
+    select(date, psavert, uempmed) %>%
+    gather(key = "variable", value = "value", -date)
> head(df, 3)
# A tibble: 3 x 3
date          variable     value
<date> <chr>   <dbl>
1 1967-07-01   psavert       12.6
2 1967-08-01   psavert       12.6
3 1967-09-01   psavert       11.9

Create a multiple line plots using following command to have a look on the relationship between “psavert” and “unempmed” −

> ggplot(df, aes(x = date, y = value)) +
+    geom_line(aes(color = variable), size = 1) +
+    scale_color_manual(values = c("#00AFBB", "#E7B800")) +
+    theme_minimal()
Multiple Line Plots
Advertisements