Introduction

We are going to study the dataset hours using the techniques described in Chapter 3 of Cryer and Chan, following Problems 3.4 and 3.10 closely.

library(TSA)
## Loading required package: leaps
## Loading required package: locfit
## locfit 1.5-9.1    2013-03-22
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.
## Loading required package: tseries
## 
## Attaching package: 'TSA'
## 
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## 
## The following object is masked from 'package:utils':
## 
##     tar
# Load the data.
hours <- read.table("http://homepage.divms.uiowa.edu/~kchan/TSA/Datasets/hours.dat",
                    header=T)

Preliminary analysis

As described in the text, the dataset contains monthly values of the average hours worked per week in the U.S. manufacturing sector for July 1982 through June 1987. Our first step is to plot the series with monthly plotting symbols.

hours <- ts(hours$hours, start=c(1982, 7), frequency=12)
plot(hours,
     main="Average Hours Worked Per Week\nin the U.S. manufacturing sector",
     ylab="Hours")
# Append monthly plotting symbols.
points(as.vector(time(hours)), hours, pch=as.vector(season(hours)))

Trend Fitting

For simplicity, consider only linear trend and quadratic trend.

Linear Trend

linear_time_trend <- lm(hours ~ time(hours))
# Regression output.
summary(linear_time_trend)
## 
## Call:
## lm(formula = hours ~ time(hours))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03634 -0.24446 -0.03871  0.34405  1.11314 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -556.31190   86.31998  -6.445 2.49e-08 ***
## time(hours)    0.30062    0.04349   6.913 4.11e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4861 on 58 degrees of freedom
## Multiple R-squared:  0.4517, Adjusted R-squared:  0.4423 
## F-statistic: 47.79 on 1 and 58 DF,  p-value: 4.106e-09
# Study standardized residuals.
plot(as.vector(time(hours)), rstudent(linear_time_trend), type="l",
     xlab="Time", ylab="hours", main="Residuals Versus Time (Linear Trend)")
# Append monthly plotting symbols.
points(as.vector(time(hours)), rstudent(linear_time_trend),
       pch=as.vector(season(hours)))

Quadratic trend

quadratic_time_trend <- lm(hours ~ time(hours) + I(time(hours) ^ 2))
# Regression output.
summary(quadratic_time_trend)
## 
## Call:
## lm(formula = hours ~ time(hours) + I(time(hours)^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00603 -0.25431 -0.02267  0.22884  0.98358 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -5.122e+05  1.155e+05  -4.433 4.28e-05 ***
## time(hours)       5.159e+02  1.164e+02   4.431 4.31e-05 ***
## I(time(hours)^2) -1.299e-01  2.933e-02  -4.428 4.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.423 on 57 degrees of freedom
## Multiple R-squared:  0.5921, Adjusted R-squared:  0.5778 
## F-statistic: 41.37 on 2 and 57 DF,  p-value: 7.97e-12
# Study standardized residuals.
plot(as.vector(time(hours)), rstudent(quadratic_time_trend), type="l",
     xlab="Time", ylab="hours", main="Residuals Versus Time (Quadratic Trend)")
# Append monthly plotting symbols.
points(as.vector(time(hours)), rstudent(quadratic_time_trend),
       pch=as.vector(season(hours)))

Residual Analysis

We restrict our attention to residuals from fitting the quadratic trend.

Runs Test (for detecting non-randomness)

For runs test, the null hypothesis is that the observations are independently drawn from the same distribution. A small p-value rejects the null and provides evidence of (nontrivial) dependence.

res <- rstudent(quadratic_time_trend)
runs(res)
## $pvalue
## [1] 0.00012
## 
## $observed.runs
## [1] 16
## 
## $expected.runs
## [1] 30.96667
## 
## $n1
## [1] 31
## 
## $n2
## [1] 29
## 
## $k
## [1] 0

Correlogram / ACF plot

Correlogram (also known as an ACF plot) provides a nice graphical display of the sample autocorrelation as a function of time lag. The horizontal dotted lines are at \(\pm 1.96/\sqrt{n}\), where \(n\) is the number of observations in the time series. If all the vertical bars lie within the dotted lines, we fail to reject the null hypothesis that \(\rho_k = 0\) for the values of \(k\) displayed.

acf(res, main="Residuals")

Histogram

Histograms can be useful for studying normality. If the histogram is roughly bell-shaped, it might be reasonable to conclude that the deviation from normality is not too severe.

hist(res, main="Histogram of Residuals")

Normal Quantile-Quantile plot

A more careful method for checking normality is the normal quantile-quantile plot. If the data is generated from a normal distribution, the plot resembles a straight line.

qqnorm(res)
qqline(res)

Reference

  1. Time Series Analysis (With Applications in R) by Jonathan D. Cryer and Kung-Sik Chan