# Project You are a statistical consultant in which Custom Research Paper Assistance

STA 137 Project Description of the data: This project focusing on the data of average monthly milk production per cow in the US from January 1994 till December 2005. This data is found in the milk datatset in the R language library which called TSA . We going to remove the trend and seasonal components in the data and fit a proper time series model for the data and finally, forcast the future data. > require(TSA) > data(milk) > x = as.vector(cmort.part) > polt(milk) > n = length(x) > t = 1:n Deterministic components: We see an upward trend and seasonality from the plot. Therefore, the we need to choose a trend to fit the data. We use simple linear trend and remove the tred #remove the trend > trend.fit = lm(x~t) > y.trend = resid(trend.fit) > ts.plot(x) >lines(fitted(trend.fit), col = red ) #obtain residuals after trend is removed > y = residuals(trend.fit) > plot(t,y, type= l , main= Trend Removed , ylab= ) The second step, We estimate the seasonal component using a sum of harmonics. Since the period is 12, we will attempt to t 6 harmonics and then chose which model are signicant for the data by using stepwise regression with an AIC criterion. #remove the seasonal component > t = (t) / n > d=12 > n.harm = 6 #set to [d/2] > harm = matrix(nrow=length(t), ncol=2*n.harm) > for(i in 1:n.harm){ harm[,i*2-1] = sin(n/d * i *2*pi*t) harm[,i*2] = cos(n/d * i *2*pi*t) } > colnames(harm)= paste0(c( sin , cos ), rep(1:n.harm, each = 2)) #fit on all of the sines and cosines > dat = data.frame(y.trend, harm) > fit = lm(y.trend~., data=dat) > summary(fit) # setup the full model and the model with only an intercept > full = lm(y.trend~.,data=dat) > reduced = lm(y.trend~1, data=dat) #stepwise regression starting with the full model > fit.back = step( full, scope = formula(reduced), direction = both ) #get back the original t so that we can plot over this range > t = 1:n #plot the estimated seasonal components > plot(t,y.trend, type= l , col= darkgrey , ylab= ) > lines(t, fitted(fit.back), col= red ) #plot the residuals after seasonal component is removed >ts.plot(residuals(fit.back), main= After seasonal componenets removed , ylab = , xlab= t ) Time series model:After the deterministic components have been removed, plot the ACF and PACF for the residuals. Since ACF trail off and PACF cut of at lag 1. We should consider to use AR(1) model. The best model after the using an AIC criteria is ARIMA(3,0,2). Furthermore, we Plot the residuals after the model is fitted, we saw both ACF and PACF plot are within significance limits, and the box-ljung test also suggests that since the p-value is 0.59, it is not significant, then we cannot reject the hypothesis that the noise is independent. #plot the acf and pacf of the residuals > y = residuals(fit.back) > acf(y) > pacf(y) # Find the best model using an AIC criteria > require(forecast) > arma.fit = auto.arima(resid(fit.back),allowmean = F) > z = resid(fit.y) > acf(z) pacf(z) Forecasting: > fc = forecast(fit.y, h=10, level = .95) #plot the noise forecast > plot(fc) #zoom in on the forecast > plot(fc, xlim=c(140,154)) #forecast the seasonal component and noise > season.fc = fit.back$fitted.values[1:10]+fc$mean #forecast the trend > trend.fc = predict(trend.fit, newdata = data.frame(t=145:154)) #add the seasonal and noise forecasts > x.hat = season.fc+trend.fc > plot(t, milk, xlim = c(1,154), type= l ) > lines(144:154, c(milk[154], x.hat), col= red ) #add the forecast intervals > lines(144:154, c(cmort.part[154], x.hat+fc$lower), col= green , lty=2) > lines(144:154, c(cmort.part[154], x.hat+fc$upper), col= green , lty=2)