We are going to study the dataset hours
using the techniques described in Chapter 3 of Cryer and Chan, following Problems 3.4 and 3.10 closely.
library(TSA)
## Loading required package: leaps
## Loading required package: locfit
## locfit 1.5-9.1 2013-03-22
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.
## Loading required package: tseries
##
## Attaching package: 'TSA'
##
## The following objects are masked from 'package:stats':
##
## acf, arima
##
## The following object is masked from 'package:utils':
##
## tar
# Load the data.
hours <- read.table("http://homepage.divms.uiowa.edu/~kchan/TSA/Datasets/hours.dat",
header=T)
As described in the text, the dataset contains monthly values of the average hours worked per week in the U.S. manufacturing sector for July 1982 through June 1987. Our first step is to plot the series with monthly plotting symbols.
hours <- ts(hours$hours, start=c(1982, 7), frequency=12)
plot(hours,
main="Average Hours Worked Per Week\nin the U.S. manufacturing sector",
ylab="Hours")
# Append monthly plotting symbols.
points(as.vector(time(hours)), hours, pch=as.vector(season(hours)))
For simplicity, consider only linear trend and quadratic trend.
linear_time_trend <- lm(hours ~ time(hours))
# Regression output.
summary(linear_time_trend)
##
## Call:
## lm(formula = hours ~ time(hours))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03634 -0.24446 -0.03871 0.34405 1.11314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -556.31190 86.31998 -6.445 2.49e-08 ***
## time(hours) 0.30062 0.04349 6.913 4.11e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4861 on 58 degrees of freedom
## Multiple R-squared: 0.4517, Adjusted R-squared: 0.4423
## F-statistic: 47.79 on 1 and 58 DF, p-value: 4.106e-09
# Study standardized residuals.
plot(as.vector(time(hours)), rstudent(linear_time_trend), type="l",
xlab="Time", ylab="hours", main="Residuals Versus Time (Linear Trend)")
# Append monthly plotting symbols.
points(as.vector(time(hours)), rstudent(linear_time_trend),
pch=as.vector(season(hours)))
quadratic_time_trend <- lm(hours ~ time(hours) + I(time(hours) ^ 2))
# Regression output.
summary(quadratic_time_trend)
##
## Call:
## lm(formula = hours ~ time(hours) + I(time(hours)^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00603 -0.25431 -0.02267 0.22884 0.98358
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.122e+05 1.155e+05 -4.433 4.28e-05 ***
## time(hours) 5.159e+02 1.164e+02 4.431 4.31e-05 ***
## I(time(hours)^2) -1.299e-01 2.933e-02 -4.428 4.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.423 on 57 degrees of freedom
## Multiple R-squared: 0.5921, Adjusted R-squared: 0.5778
## F-statistic: 41.37 on 2 and 57 DF, p-value: 7.97e-12
# Study standardized residuals.
plot(as.vector(time(hours)), rstudent(quadratic_time_trend), type="l",
xlab="Time", ylab="hours", main="Residuals Versus Time (Quadratic Trend)")
# Append monthly plotting symbols.
points(as.vector(time(hours)), rstudent(quadratic_time_trend),
pch=as.vector(season(hours)))
We restrict our attention to residuals from fitting the quadratic trend.
For runs test, the null hypothesis is that the observations are independently drawn from the same distribution. A small p-value rejects the null and provides evidence of (nontrivial) dependence.
res <- rstudent(quadratic_time_trend)
runs(res)
## $pvalue
## [1] 0.00012
##
## $observed.runs
## [1] 16
##
## $expected.runs
## [1] 30.96667
##
## $n1
## [1] 31
##
## $n2
## [1] 29
##
## $k
## [1] 0
Correlogram (also known as an ACF plot) provides a nice graphical display of the sample autocorrelation as a function of time lag. The horizontal dotted lines are at \(\pm 1.96/\sqrt{n}\), where \(n\) is the number of observations in the time series. If all the vertical bars lie within the dotted lines, we fail to reject the null hypothesis that \(\rho_k = 0\) for the values of \(k\) displayed.
acf(res, main="Residuals")
Histograms can be useful for studying normality. If the histogram is roughly bell-shaped, it might be reasonable to conclude that the deviation from normality is not too severe.
hist(res, main="Histogram of Residuals")
A more careful method for checking normality is the normal quantile-quantile plot. If the data is generated from a normal distribution, the plot resembles a straight line.
qqnorm(res)
qqline(res)