Introduction

This article is a quick-start guide to working with time-dependent data in R.

Time indices

In R, there are several options for dealing with time indices.

Date Objects

For dates, the class Date in the base package can be useful.
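
As a minimal sketch of what a Date object is, the snippet below uses only base R. The variable names d and month_starts are illustrative and do not come from the examples that follow.

```r
# A Date is stored as the number of days since 1970-01-01.
d <- as.Date("2010-01-01")
class(d)              # "Date"
unclass(d)            # 14610

# Adding a number moves the date forward by that many days.
format(d + 30)        # "2010-01-31"

# seq() can step by calendar units as well as by days.
month_starts <- seq(d, by = "month", length.out = 3)
format(month_starts)  # "2010-01-01" "2010-02-01" "2010-03-01"

# weekdays() returns a locale-dependent name, so no output is shown here.
weekdays(d)
```

Because the underlying value is just a day count, sorting, differencing, and plotting Date vectors behave much like the same operations on numbers.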

A plotting example

start_date <- as.Date("2010-01-01")
end_date <- as.Date("2011-12-31")
dates <- seq(start_date, end_date, by=1)
set.seed(100)
net_worth <- cumsum(rnorm(length(dates)))
plot(dates, net_worth, type="l",
     main="Johnny's Net Worth (Fictional)",
     xlab="Date",
     ylab="Value (in US dollars)")

# Want a more informative x-axis
plot(dates, net_worth, type="l",
     main="Johnny's Net Worth (Fictional)",
     xlab="Date",
     ylab="Value (in US dollars)",
     xaxt="n")
axis.Date(1, at=seq(start_date, end_date, by="3 mon"), format="%Y-%m")

Formatting

Dates often do not arrive in the form YYYY-MM-DD, and calling as.Date() on such strings without specifying a format can fail with an error. Conversely, we sometimes want to print a date in a layout other than YYYY-MM-DD, which is what format() is for.

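as.Date("12/18/1993", format="%m/%d/%Y")          # month/day/4-digit year
as.Date("September 2, 2013", format="%B %d, %Y")  # %B: full month name (locale-dependent)
as.Date("03Aug2014", format="%d%b%Y")             # %b: abbreviated month name
format(Sys.Date(), "%b. %d, %Y")                  # Date to character string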

For more information on how to specify the date format, see the documentation for strptime.
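
The following sketch illustrates the two failure modes that are easy to confuse: a format that does not match the input, and no format at all. It uses only numeric format codes so that the results do not depend on the locale. The names d, bad, and msg are illustrative.

```r
# Parse a non-ISO string by describing its layout, then print it two ways.
d <- as.Date("12/18/1993", format = "%m/%d/%Y")
format(d, "%Y-%m-%d")   # "1993-12-18"
format(d, "%d.%m.%y")   # "18.12.93"

# A format that does not match the input yields NA rather than an error
# (18 is not a valid month).
bad <- as.Date("18/12/1993", format = "%m/%d/%Y")
is.na(bad)              # TRUE

# With no format, a string that is not in a standard layout raises an error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(
  {
    as.Date("12/18/1993")
    "parsed"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Since a mismatched format produces NA silently, it is worth checking anyNA() on the result after converting a whole column of strings.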

POSIXt objects

POSIX stands for “Portable Operating System Interface”. There are two subclasses of POSIXt:

  1. POSIXct (calendar time): a signed number of seconds since 1970-01-01 (UTC).
  2. POSIXlt (local time): a named list of components, including the second (sec), minute (min), hour (hour), day of the month (mday), month (mon, counted from 0), year (year, counted from 1900), day of the week (wday, with Sunday as 0), day of the year (yday, counted from 0), and daylight saving time flag (isdst).
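
To make the difference between the two representations concrete, here is a small base R sketch. The time zone is fixed to UTC purely so that the results are the same on every machine; that choice, and the names ct and lt, are assumptions of this example.

```r
# POSIXct: a single number, the seconds elapsed since 1970-01-01 00:00:00 UTC.
ct <- as.POSIXct("2004-04-03 16:25:00", tz = "UTC")
as.numeric(ct)

# POSIXlt: the same instant broken into named components.
lt <- as.POSIXlt(ct)
lt$hour   # 16
lt$min    # 25
lt$mday   # 3
lt$mon    # 3   (months are counted from 0, so 3 means April)
lt$year   # 104 (years are counted from 1900)
lt$wday   # 6   (days are counted from Sunday = 0, so 6 means Saturday)
lt$yday   # 93  (days of the year are counted from 0)

# Converting back gives the original instant.
as.POSIXct(lt) == ct   # TRUE
```

POSIXct is usually the more convenient choice for data frame columns and arithmetic, while POSIXlt is handy when individual components such as the hour are needed.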

heart_rate1 <- read.table("http://ecg.mit.edu/time-series/hr.11839")[, 1]
heart_rate2 <- read.table("http://ecg.mit.edu/time-series/hr.7257")[, 1]
numObs <- length(heart_rate1)
# Suppose the first observation was recorded at 4:25pm on Apr 3, 2004,
# and that consecutive observations are 0.5 seconds apart.
times <- as.POSIXct("2004-04-03 16:25:00") + seq(0, by=0.5, length.out=numObs)

summary(heart_rate1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   73.44   88.85   92.21   92.60   96.40  106.80
summary(heart_rate2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   80.21   92.42   98.24   96.64  101.40  104.90
var(heart_rate1)
## [1] 30.13284
var(heart_rate2)
## [1] 32.344
plot(heart_rate1 ~ times, type="l",
     xlab="time", ylab="heart rate",
     main="Heart Rates")
lines(heart_rate2 ~ times, col="red")
legend("bottom", legend=c("Subject 1", "Subject 2"), col=c("black", "red"),
       lty=1)

Arithmetic

Doing arithmetic (addition and subtraction) with date-time objects is almost as easy as doing arithmetic with ordinary numbers. The package lubridate makes some of these operations more readable.

# POSIXct objects
ex1 <- as.POSIXct("2014-02-03 16:00:00")
# Add 10 seconds (a plain number added to a POSIXct object is interpreted as seconds)
ex2 <- ex1 + 10
print(ex2)
## [1] "2014-02-03 16:00:10 PST"
# Add 10 hours
ex3 <- ex1 + 10 * 60 * 60
print(ex3)
## [1] "2014-02-04 02:00:00 PST"
# Add 10 days
ex4 <- ex1 + 10 * 24 * 60 * 60
print(ex4)
## [1] "2014-02-13 16:00:00 PST"
library(lubridate)
# Add 10 hours
ex5 <- ex1 + hours(10)
print(ex5)
## [1] "2014-02-04 02:00:00 PST"
# Add 10 days
ex6 <- ex1 + days(10)
print(ex6)
## [1] "2014-02-13 16:00:00 PST"
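
The examples above cover addition only. As a sketch of subtraction in base R, the snippet below takes the difference of two date-times, which yields a difftime object. The time zone is set to UTC for reproducibility, and the names t1, t2, gap, and gap_hours are illustrative.

```r
t1 <- as.POSIXct("2014-02-03 16:00:00", tz = "UTC")
t2 <- as.POSIXct("2014-02-13 18:30:00", tz = "UTC")

# Subtracting two date-times gives a difftime; R picks the units automatically.
gap <- t2 - t1
units(gap)                        # "days"

# difftime() lets us choose the units explicitly.
gap_hours <- difftime(t2, t1, units = "hours")
as.numeric(gap_hours)             # 242.5

# An existing difftime can also be converted to a number in other units.
as.numeric(gap, units = "mins")   # 14550

# Date objects work the same way, with the difference measured in days.
as.numeric(as.Date("2011-12-31") - as.Date("2010-01-01"))   # 729
```

Because the automatically chosen units depend on the size of the gap, stating the units explicitly is the safer habit when the number is used in further calculations.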

ts objects

The ts class is useful for analyzing regularly spaced time series.

print(nottem)
##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
## 1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
## 1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
## 1923 41.8 40.1 42.9 45.8 49.2 52.7 64.2 59.6 54.4 49.2 36.3 37.6
## 1924 39.3 37.5 38.3 45.5 53.2 57.7 60.8 58.2 56.4 49.8 44.4 43.6
## 1925 40.0 40.5 40.8 45.1 53.8 59.4 63.5 61.0 53.0 50.0 38.1 36.3
## 1926 39.2 43.4 43.4 48.9 50.6 56.8 62.5 62.0 57.5 46.7 41.6 39.8
## 1927 39.4 38.5 45.3 47.1 51.7 55.0 60.4 60.5 54.7 50.3 42.3 35.2
## 1928 40.8 41.1 42.8 47.3 50.9 56.4 62.2 60.5 55.4 50.2 43.0 37.3
## 1929 34.8 31.3 41.0 43.9 53.1 56.9 62.5 60.3 59.8 49.2 42.9 41.9
## 1930 41.6 37.1 41.2 46.9 51.2 60.4 60.1 61.6 57.0 50.9 43.0 38.8
## 1931 37.1 38.4 38.4 46.5 53.5 58.4 60.6 58.2 53.8 46.6 45.5 40.6
## 1932 42.4 38.4 40.3 44.6 50.9 57.0 62.1 63.5 56.3 47.3 43.6 41.8
## 1933 36.2 39.3 44.5 48.7 54.2 60.8 65.5 64.9 60.1 50.2 42.1 35.8
## 1934 39.4 38.2 40.4 46.9 53.4 59.6 66.5 60.4 59.2 51.2 42.8 45.8
## 1935 40.0 42.6 43.5 47.1 50.0 60.5 64.6 64.0 56.8 48.6 44.2 36.4
## 1936 37.3 35.0 44.0 43.9 52.7 58.6 60.0 61.1 58.1 49.6 41.6 41.3
## 1937 40.8 41.0 38.4 47.4 54.1 58.6 61.4 61.8 56.3 50.9 41.4 37.1
## 1938 42.1 41.2 47.3 46.6 52.4 59.0 59.6 60.4 57.0 50.7 47.8 39.2
## 1939 39.4 40.9 42.4 47.8 52.4 58.0 60.7 61.8 58.2 46.7 46.6 37.8
class(nottem)
## [1] "ts"
tsp(nottem) # start time, end time, frequency
## [1] 1920.000 1939.917   12.000
plot(nottem, main="Average air temperature at Nottingham Castle",
  xlab="Time",
  ylab="Temperature (Fahrenheit)")

# Seasonal-trend decomposition using loess; we will discuss this in the future.
plot(stl(nottem, s.window="periodic"))
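
The nottem data set comes ready-made as a ts object. As a sketch of how such an object might be built and manipulated by hand, the example below creates a monthly series from the numbers 1 to 24, which are placeholder values chosen only to make the results easy to check.

```r
# Two years of monthly observations starting in January 2020.
x <- ts(1:24, start = c(2020, 1), frequency = 12)
frequency(x)          # 12
start(x)              # 2020 1
end(x)                # 2021 12

# window() extracts a sub-period, here June to August 2020.
summer <- window(x, start = c(2020, 6), end = c(2020, 8))
as.numeric(summer)    # 6 7 8

# aggregate() changes the frequency, here from monthly to annual means.
annual <- aggregate(x, nfrequency = 1, FUN = mean)
as.numeric(annual)    # 6.5 18.5
```

The start argument takes a (year, period) pair, so c(2020, 6) refers to the sixth month of 2020 when the frequency is 12.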

ggplot2

For visualization, the R package ggplot2, developed by Hadley Wickham, is often used as an alternative to base graphics. While ggplot2 is a bit harder to learn than base graphics, the resulting graphics are often more visually appealing. For a slightly more in-depth introduction to ggplot2, see here.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Redo heart rate example.
heart_rate_data <- data.frame(times, heart_rate1, heart_rate2)
colnames(heart_rate_data) <- c("Time", "Subject 1", "Subject 2")
library(reshape2)
heart_rate_data_long <- melt(heart_rate_data,
                             id.vars="Time")
ggplot(heart_rate_data_long,
        aes(x=Time, y=value, color=variable)) +
  geom_line() +
  xlab("time") +
  ylab("heart rate") +
  scale_color_discrete(guide=guide_legend(title = "Subject")) +
  ggtitle("Heart rate time series") +
  theme(
    plot.title = element_text(size=24),
    axis.title.x = element_text(size=14),
    axis.title.y = element_text(size=14),
    axis.text.x = element_text(size=12),
    axis.text.y = element_text(size=12),
    legend.title = element_text(size=14),
    legend.text = element_text(size=12)
  )
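
For Date data on the x-axis, ggplot2 offers scale_x_date() as a counterpart to axis.Date() in base graphics. The sketch below redraws the fictional net worth series with tick marks every three months. It rebuilds the data so that it can be run on its own, and the data frame name net_worth_data and the object name p are illustrative.

```r
library(ggplot2)

# Rebuild the fictional daily series in a data frame.
set.seed(100)
net_worth_data <- data.frame(
  date = seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
)
net_worth_data$value <- cumsum(rnorm(nrow(net_worth_data)))

# date_breaks sets the tick spacing; date_labels uses strptime-style codes.
p <- ggplot(net_worth_data, aes(x = date, y = value)) +
  geom_line() +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  labs(title = "Johnny's Net Worth (Fictional)",
       x = "Date",
       y = "Value (in US dollars)")
print(p)
```

For POSIXct data such as the heart rate times, scale_x_datetime() plays the same role.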

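economics_mod <- economics
# Centered 5-month moving average; the two values at each end have no
# complete window and are filled with NA.
economics_mod$unemploy_ma <- rollmean(economics$unemploy, k=5,
                              fill=list(NA, NULL, NA))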
ggplot(economics_mod, aes(x=date)) +
  geom_line(aes(y=unemploy, color="reality")) +
  geom_line(aes(y=unemploy_ma, color="moving average")) +
  scale_colour_manual("Lines", values=c("reality"="black",
                                        "moving average"="red")) +
  ggtitle("US Unemployment") +
  xlab("Date") +
  ylab("Unemployment (in thousands)")
## Warning: Removed 4 rows containing missing values (geom_path).
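
To see what a centered moving average of width 5 computes, and why four values end up missing, here is a sketch on a short made-up vector using stats::filter() from base R in place of zoo's rollmean(). The names x, k, and ma are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
k <- 5

# sides = 2 centers the window: each output is the mean of the value itself,
# the two values before it, and the two values after it.
ma <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
print(ma)
# The first two and last two entries are NA because their windows are incomplete.

# The third entry matches the mean of the first five values.
all.equal(ma[3], mean(x[1:5]))   # TRUE
```

With k = 5 there are (k - 1) / 2 = 2 incomplete windows at each end, which accounts for the 4 rows that ggplot2 reports as removed.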

Other useful packages

There are many more ways in R to deal with time series data. For irregularly spaced time series, consider the packages zoo, xts, and timeSeries.
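
As a brief sketch of what working with an irregular series might look like, the example below uses zoo, assuming the package is installed. The dates and values are made up for illustration, and the names obs_dates, z, daily_grid, and z_daily are not from any data set used above.

```r
library(zoo)

# Three observations taken on irregularly spaced days.
obs_dates <- as.Date(c("2014-01-02", "2014-01-05", "2014-01-11"))
z <- zoo(c(10.2, 11.5, 9.8), order.by = obs_dates)

index(z)      # the time stamps
coredata(z)   # the observed values

# Put the series on a regular daily grid. Merging with an empty zoo series
# adds the missing days as NA, and na.approx() fills them by linear
# interpolation between the neighboring observations.
daily_grid <- seq(start(z), end(z), by = "day")
z_daily <- na.approx(merge(z, zoo(, daily_grid)))
print(z_daily)
```

Unlike ts, a zoo object keeps the actual time stamps as its index, so nothing about the spacing has to be assumed in advance.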

References

  1. ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham.
  2. Working with Financial Time Series Data in R by Eric Zivot (2014).