Manipulating Time Series Data in R with xts & zoo


xts and zoo are two R packages that provide classes and functions for manipulating time series data. Both packages offer functions for creating, indexing, and manipulating time series data, which may originate from sources such as CSV or Excel files. We start by introducing the xts and zoo classes, then cover basic manipulations, merging and modifying time series, and finally applying and aggregating by time.

xts and zoo Classes

Syntax

In R, the xts class extends the zoo class. An xts object is essentially a matrix of observations indexed by a time-based object. We can create an xts object using the following syntax:

xts(myData, order.by)

Here myData represents the data and order.by represents a vector of date/time type used to index the data.

Note that one may also attach metadata to an xts object by passing name-value pairs such as birthdayDate = as.POSIXct("2000-09-07").

Example

Consider the example below, which creates an xts object from a vector of data indexed by a vector of dates. It also uses a name-value pair to add metadata:

# Importing the xts library
library(xts)
# Creating the data
myData <- xts(x = rnorm(n = 10),
              order.by = seq(as.Date("2022-01-01"), length.out = 10, by = "days"),
              born = as.POSIXct("2000-09-07"))
# Print the data
print(myData)

Output

                 [,1]
2022-01-01 -0.7307375
2022-01-02  0.2299910
2022-01-03 -0.6965284
2022-01-04 -0.2002072
2022-01-05  1.1121364
2022-01-06 -0.6601843
2022-01-07  0.2926226
2022-01-08  1.2273859
2022-01-09 -0.5464344
2022-01-10  0.1108407

As you can see in the output, we have created an xts object that is indexed by a date vector containing the dates from 2022-01-01 to 2022-01-10.
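The metadata attached through the name-value pair does not appear in the printed output. A minimal sketch of how it might be read back, assuming the xts package is installed and using its xtsAttributes() function (the variable name meta is only illustrative):

```r
# Load the xts package
library(xts)

# Same construction as above, with one user-defined attribute named "born"
myData <- xts(x = rnorm(n = 10),
              order.by = seq(as.Date("2022-01-01"), length.out = 10, by = "days"),
              born = as.POSIXct("2000-09-07"))

# xtsAttributes() returns the user-added attributes as a named list
meta <- xtsAttributes(myData)
print(names(meta))
print(meta$born)
```

The attribute travels with the object, so it can serve as a lightweight place to record provenance alongside the series.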

The zoo package provides the coredata() and index() functions, which separate the core data from the index for analysis and manipulation purposes.

Example

Consider the following program −

# Importing the xts library
library(xts)
# Creating the data
myData <- xts(x = rnorm(n = 10),
              order.by = seq(as.Date("2022-01-01"), length.out = 10, by = "days"),
              born = as.POSIXct("2000-09-07"))
# Using the coredata() function
print(coredata(myData))

Output

            [,1]
 [1,] -2.4213283
 [2,] -0.8433878
 [3,]  2.0066340
 [4,]  0.2640308
 [5,] -0.3049552
 [6,]  0.2998816
 [7,] -0.4239970
 [8,]  0.5577881
 [9,] -0.5870677
[10,]  0.6856740

Using the index() function to get the actual dates

print(index(myData))

Output

 [1] "2022-01-01" "2022-01-02" "2022-01-03" "2022-01-04"
 [5] "2022-01-05" "2022-01-06" "2022-01-07" "2022-01-08"
 [9] "2022-01-09" "2022-01-10"

As you can see in the output, the core data is printed first, followed by the actual dates. Because rnorm() draws new random values on every run, the numbers differ from those in the earlier output.
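Separating the two parts is useful when the values need to be transformed on their own and then re-attached to the same dates. A small sketch under that assumption, using deterministic placeholder values instead of random ones (the names values, stamps, and rebuilt are illustrative):

```r
# Load the xts package
library(xts)

# Ten daily observations with known values 1..10
dates <- seq(as.Date("2022-01-01"), length.out = 10, by = "days")
myData <- xts(x = 1:10, order.by = dates)

# Split the object into its two parts
values <- coredata(myData)  # plain matrix with one column
stamps <- index(myData)     # vector of Date values

# Transform the values and rebuild an xts object on the same index
rebuilt <- xts(x = values * 2, order.by = stamps)
print(head(rebuilt, 3))
```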

Basic Time Series Data Manipulations

In this section, we will discuss basic operations such as extracting observations by index and using a forward slash between two dates or times to select a range.

Extraction on the basis of index

An xts object allows us to extract values based on the index. Let us consider the example below:

Example

# Importing library
library(xts)
# Create myData
myData <- rnorm(n = 500)
dates <- seq(as.Date("2022-01-01"), length.out = 500, by = "days")
# Creating myObject
myObject <- xts(x = myData, order.by = dates)
# Print the number of rows whose index falls in 2022
nrow(myObject["2022"])

Output

[1] 365
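The same string-based subsetting also works at finer granularity. The sketch below, which assumes the same 500-day index but replaces the random data with day counters so the results are easy to check, selects a single month and a single day:

```r
# Load the xts package
library(xts)

# 500 daily observations; each value is simply the day number (1..500)
dates <- seq(as.Date("2022-01-01"), length.out = 500, by = "days")
myObject <- xts(x = seq_along(dates), order.by = dates)

# All observations in March 2022
marchRows <- nrow(myObject["2022-03"])
print(marchRows)   # 31

# The single observation on 15 March 2022 (the 74th day of the series)
oneDay <- myObject["2022-03-15"]
print(oneDay)
```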

Using forward slash (/) to create an interval

We can place a forward slash between two dates in the subscript of an xts object to select all observations that fall between them, with both ends included. For example, consider the program given below:

Example

# Importing library
library(xts)
# Create myData
myData <- rnorm(n = 500)
dates <- seq(as.Date("2022-01-01"), length.out = 500, by = "days")
# Creating myObject (the data is myData, indexed by dates)
myObject <- xts(x = myData, order.by = dates)
# Print the number of observations between the two dates
nrow(myObject["2022-01-01/2022-03-01"])

Output

[1] 60

As you can see in the output, the number of observations between 2022-01-01 and 2022-03-01 (inclusive) is printed on the console.

We can also use a forward slash between two date-times:

# Importing library
library(xts)
# 20 days of data by minute
values <- rnorm(n = 60*24*20)
dateTimes <- as.POSIXct("2022-11-01") + (1:(60*24*20))*60
# Creating myObject
myObject <- xts(x = values, order.by = dateTimes)
# Print the first observations between
# 2022-11-01 4AM and 2022-11-01 6AM
head(myObject["2022-11-01T04:00/2022-11-01T06:00"])

Output

                          [,1]
2022-11-01 04:00:00 -0.4277830
2022-11-01 04:01:00  0.6544654
2022-11-01 04:02:00  0.4196311
2022-11-01 04:03:00 -0.1766988
2022-11-01 04:04:00 -1.8570621
2022-11-01 04:05:00  0.3229214

As you can see in the output, the first observations that fall between the two specified times are printed on the console.
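Either side of the slash may also be left empty, in which case the range runs from the start of the series or to its end. A brief sketch of this, assuming the 500-day daily series from the earlier examples with day counters as placeholder values:

```r
# Load the xts package
library(xts)

# 500 daily observations starting on 2022-01-01 (the last date is 2023-05-15)
dates <- seq(as.Date("2022-01-01"), length.out = 500, by = "days")
myObject <- xts(x = seq_along(dates), order.by = dates)

# Everything from the start of the series up to and including 2022-01-10
upToJan10 <- myObject["/2022-01-10"]
print(nrow(upToJan10))   # 10

# Everything from 2023-05-01 to the end of the series
fromMay <- myObject["2023-05-01/"]
print(nrow(fromMay))     # 15
```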

Merge and Modify Time Series Data

Syntax

The xts package provides a merge() method that joins one xts object to another on the index, or joins a vector of dates to an xts object. This function has the following syntax:

merge(object1, object2, ..., objectN, join = typeOfJoin, fill = integerValue)

The leading arguments are the objects that you want to merge. The join argument is the type of join to be performed ("outer", "inner", "left", or "right"). The optional fill argument specifies the value to use in place of the NA values that the join produces.

Performing an Inner Join

Now let us consider a program that performs an inner join between two xts objects indexed by dates:

Example

# Importing library
library(xts)
# Creating an object of xts class
myObject1 <- xts(x = rnorm(n = 4),
                 order.by = as.Date(c("2022-11-01", "2022-11-04", "2022-11-10", "2022-11-23")))
# Creating another object of xts class
myObject2 <- xts(x = rnorm(n = 4),
                 order.by = as.Date(c("2022-11-04", "2022-11-10", "2022-11-15", "2022-11-21")))
# Performing inner join
merge(myObject1, myObject2, join = "inner")

Output

           myObject1 myObject2
2022-11-04 0.9151754  1.332591
2022-11-10 0.4244563 -1.494515

Similarly, we can perform a left outer join and a right outer join by passing join = "left" or join = "right".
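As a rough illustration of the left and right joins, the sketch below reuses the two date vectors from the inner join example but swaps the random data for small integers so that the matched rows are easy to follow (the names leftJoin and rightJoin are illustrative):

```r
# Load the xts package
library(xts)

# Two series that share only the dates 2022-11-04 and 2022-11-10
myObject1 <- xts(x = 1:4,
                 order.by = as.Date(c("2022-11-01", "2022-11-04", "2022-11-10", "2022-11-23")))
myObject2 <- xts(x = 5:8,
                 order.by = as.Date(c("2022-11-04", "2022-11-10", "2022-11-15", "2022-11-21")))

# Left join: keep every date of myObject1, NA where myObject2 has no match
leftJoin <- merge(myObject1, myObject2, join = "left")
print(leftJoin)

# Right join: keep every date of myObject2, NA where myObject1 has no match
rightJoin <- merge(myObject1, myObject2, join = "right")
print(rightJoin)
```

Each result has four rows: the left join follows the index of the first object and the right join follows the index of the second.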

Performing Full Outer Join

Now we will see a program demonstrating the full outer join of two objects of xts class −

Example

# Importing library
library(xts)
# Creating an object of xts class
myObject1 <- xts(x = rnorm(n = 4),
                 order.by = as.Date(c("2022-11-01", "2022-11-04", "2022-11-10", "2022-11-23")))
# Creating another object of xts class
myObject2 <- xts(x = rnorm(n = 4),
                 order.by = as.Date(c("2022-11-04", "2022-11-10", "2022-11-15", "2022-11-21")))
# Performing full outer join
merge(myObject1, myObject2, join = "outer")

Output

            myObject1   myObject2
2022-11-01 -0.1080882          NA
2022-11-04  0.6906676 -0.75314257
2022-11-10 -0.3375777  1.29528001
2022-11-15         NA  0.09088094
2022-11-21         NA  0.20408394
2022-11-23 -1.7205721          NA

Performing Full Outer Join with fill

Consider another program below that also performs a full outer join of the two objects. This time we pass fill = 0 as a third argument to the merge() function, so all the NA values are replaced by 0 in the output:

Example

# Importing library
library(xts)
# Creating an object of xts class
myObject1 <- xts(x = rnorm(n = 4),
                 order.by = as.Date(c("2022-11-01", "2022-11-04", "2022-11-10", "2022-11-23")))
# Creating another object of xts class
myObject2 <- xts(x = rnorm(n = 4),
                 order.by = as.Date(c("2022-11-04", "2022-11-10", "2022-11-15", "2022-11-21")))
# Performing full outer join, replacing NA values with 0
merge(myObject1, myObject2, join = "outer", fill = 0)

Output

             myObject1 myObject2
2022-11-01 -0.27983799  0.000000
2022-11-04  0.56771575  1.079353
2022-11-10  0.09849405  1.169731
2022-11-15  0.00000000 -1.022448
2022-11-21  0.00000000  1.031976
2022-11-23  0.99577871  0.000000

Apply and aggregate by time

The xts package also provides the endpoints() function, which returns the locations of the last observation in each interval specified by the on argument. Accepted values include,

on = c("years", "quarters", "months", "weeks", "days", "hours", "minutes")

Example

Let us consider a program below that creates an xts object with dates in the range 2022-01-01 to 2022-01-10:

# Importing the xts library
library(xts)
# Creating the data
myData <- xts(x = rnorm(n = 10),
              order.by = seq(as.Date("2022-01-01"), length.out = 10, by = "days"),
              born = as.POSIXct("2000-09-07"))
# Print myData
print(myData)

Output

                  [,1]
2022-01-01 -0.71176135
2022-01-02  0.07589876
2022-01-03 -0.06607525
2022-01-04  0.53143095
2022-01-05  0.11743337
2022-01-06 -0.29164378
2022-01-07 -0.04782661
2022-01-08 -1.93776118
2022-01-09 -0.04961253
2022-01-10 -0.45633307

Now we can use the endpoints() function on myData to print the observations on the Sundays that end each week (2022-01-02 and 2022-01-09) and on the final date of the series (2022-01-10):

# Importing the xts library
library(xts)
# Creating the data
myData <- xts(x = rnorm(n = 10),
              order.by = seq(as.Date("2022-01-01"), length.out = 10, by = "days"),
              born = as.POSIXct("2000-09-07"))
# Get endpoints
endPoints <- endpoints(myData, on = "weeks")
# Print the endpoints data from myData
myData[endPoints]

Output

                   [,1]
2022-01-02  0.399972612
2022-01-09 -0.009547296
2022-01-10 -0.622855139
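The endpoint locations become more useful when they are passed to an aggregation function. A minimal sketch of this, assuming the period.apply() function from xts and using the values 1 to 10 in place of random data so that each weekly sum can be verified by hand:

```r
# Load the xts package
library(xts)

# Ten daily observations with known values 1..10
dates <- seq(as.Date("2022-01-01"), length.out = 10, by = "days")
myData <- xts(x = 1:10, order.by = dates)

# Locations of the last observation in each week (with a leading 0)
endPoints <- endpoints(myData, on = "weeks")
print(endPoints)

# Apply sum() to the observations between consecutive endpoints
weeklySum <- period.apply(myData, INDEX = endPoints, FUN = sum)
print(weeklySum)
```

Each row of the result is stamped with the last date of its interval, so the first row covers 2022-01-01 to 2022-01-02, the second covers the full week ending 2022-01-09, and the third holds the single remaining day.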

Conclusion

In this tutorial, we discussed how to manipulate time series data in R using xts and zoo. We covered the xts and zoo classes, basic manipulation, merging and modifying time series, and finally how to apply and aggregate by time. I hope this tutorial has helped you strengthen your knowledge in the field of data science.

Updated on: 17-Jan-2023
