
Time series forecasting occurs when you make scientific predictions based on historical, time-stamped data. It involves building models through historical analysis and using them to make observations and drive future strategic decision-making. An important distinction in forecasting is that, at the time of the work, the future outcome is completely unavailable and can only be estimated through careful analysis and evidence-based priors.

What is time series forecasting?

Time series forecasting is the process of analyzing time series data using statistics and modeling to make predictions that inform strategic decision-making. It is not always an exact prediction, and the reliability of forecasts can vary widely, especially when dealing with the commonly fluctuating variables in time series data and with factors outside our control. However, forecasting provides insight into which outcomes are more likely, or less likely, to occur than other potential outcomes. Often, the more comprehensive the data we have, the more accurate the forecasts can be.

While forecasting and “prediction” generally mean the same thing, there is a notable distinction: in some industries, forecasting might refer to data at a specific future point in time, while prediction refers to future data in general.

Time series forecasting is often used in conjunction with time series analysis. Time series analysis involves developing models to gain an understanding of the data and its underlying causes; it can provide the “why” behind the outcomes you are seeing. Forecasting then takes the next step, using that knowledge to extrapolate what might happen in the future.

Applications of time series forecasting

Forecasting has practical applications across a wide range of industries, including weather, climate, economics, healthcare, engineering, finance, retail, business, environmental studies, social studies, and more. Essentially, anyone who has consistent historical data can analyze that data with time series analysis methods and then model, forecast, and predict. For some industries, the entire point of time series analysis is to facilitate forecasting. Some technologies, such as augmented analytics, can even automatically select forecasting from among other statistical algorithms if it offers the most certainty.

When you should use time series forecasting

Naturally, there are limitations when dealing with the unpredictable and the unknown. Time series forecasting isn’t infallible and isn’t appropriate or useful for all situations. Because there is no explicit set of rules for when you should or should not use forecasting, it is up to analysts and data teams to know the limitations of their analysis and what their models can support. Not every model will fit every data set or answer every question. Data teams should use time series forecasting when they understand the business question and have the appropriate data and forecasting capabilities to answer it. Good forecasting works with clean, time-stamped data and can identify the genuine trends and patterns in historical data. Analysts can distinguish random fluctuations and outliers from genuine insights, and can separate those insights from seasonal variation. Time series analysis shows how data changes over time, and good forecasting can identify the direction in which the data is changing.

The first thing to consider is the amount of data at hand: the more points of observation you have, the better your understanding. This is a constant across all types of analysis, and time series forecasting is no exception. However, forecasting relies heavily on the amount of data, possibly even more so than other analyses, because it builds directly off of past and current data. The less data you have to extrapolate from, the less accurate your forecasting will be.

Time horizons

The time frame of your forecast also matters. This is known as a time horizon—a fixed point in time where a process (like the forecast) ends. It’s much easier to forecast a shorter time horizon with fewer variables than it is a longer time horizon. The further out you go, the more unpredictable the variables will be. Alternatively, having less data can sometimes still work with forecasting if you adjust your time horizons. If you’re lacking long-term recorded data but you have an extensive amount of short-term data, you can create short-term forecasts.

Dynamic and static states

The state of your forecasting and data makes a difference as to when you want to use it. Will the forecast be dynamic or static? If the forecast is static, it is set in stone once it is made, so make sure your data is adequate for a forecast. However, dynamic forecasts can be constantly updated with new information as it comes in. This means you can have less data at the time the forecast is made, and then get more accurate predictions as data is added.

Data quality

As always with analysis, the best analysis is only useful if the data is of a usable quality. Data that is dirty, poorly processed, overly processed, or improperly collected can significantly skew results and create wildly inaccurate forecasts. The typical guidelines for data quality apply here. Make sure the data:

  • is complete,
  • is not duplicated or redundant,
  • was collected in a timely and consistent manner,
  • is in a standard and valid format,
  • is accurate for what it is measuring,
  • and is uniform across sets.

When dealing with time series analysis, it is even more important that the data was collected at consistent intervals over the period of time being tracked. This helps account for trends in the data, cyclic behavior, and seasonality. It also can help identify if an outlier is truly an outlier or if it is part of a larger cycle. Gaps in the data can hide cycles or seasonal variation, skewing the forecast as a result.
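As a small, hypothetical illustration (not from the original article), the following R sketch checks whether observations were collected at consistent intervals and whether any records are duplicated; the data frame df and its timestamp column are assumed names.

# Minimal sketch: checking collection intervals and duplicate records.
# Assumes a data frame `df` with a Date or POSIXct column `timestamp` (hypothetical names).
gaps <- diff(sort(df$timestamp))    # spacing between consecutive observations
table(gaps)                         # a single spacing value indicates consistent intervals
any(duplicated(df$timestamp))       # TRUE would indicate duplicated or redundant records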

Examples of time series forecasting

Here are several examples from a range of industries to make the notions of time series analysis and forecasting more concrete:

  • Forecasting the closing price of a stock each day.
  • Forecasting product sales in units sold each day for a store.
  • Forecasting unemployment for a state each quarter.
  • Forecasting the average price of gasoline each day.

Things that are random will never be forecast accurately, no matter how much data we collect or how consistently we collect it. For example, we can observe data every week for every lottery winner, but we can never forecast who will win next. Ultimately, it is up to your data and your time series analysis to determine when you should use forecasting, because its reliability depends on many factors. Use your judgment and know your data. Keep this list of considerations in mind so you always have an idea of how successful forecasting will be.

It is important to evaluate forecast accuracy using genuine forecasts. Because residuals are calculated on the same data used to fit the model, their size is not a reliable indication of how large true forecast errors are likely to be. The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.

When choosing models, it is common practice to separate the available data into two portions, training and test data, where the training data is used to estimate any parameters of a forecasting method and the test data is used to evaluate its accuracy. Because the test data is not used in determining the forecasts, it should provide a reliable indication of how well the model is likely to forecast on new data.


The size of the test set is typically about 20% of the total sample, although this value depends on how long the sample is and how far ahead you want to forecast. The test set should ideally be at least as large as the maximum forecast horizon required. The following points should be noted.

  • A model which fits the training data well will not necessarily forecast well.
  • A perfect fit can always be obtained by using a model with enough parameters.
  • Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.

Some references describe the test set as the “hold-out set” because these data are “held out” of the data used for fitting. Other references call the training set the “in-sample data” and the test set the “out-of-sample data”. We prefer to use “training data” and “test data” in this book.
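As a rough illustration of the guideline above, the following sketch (an assumption-laden example, not from the text; it presumes the forecast and fpp2 packages are loaded so that ausbeer and the subset() function discussed below are available) holds out roughly the last 20% of a quarterly series as test data:

n <- length(ausbeer)
train <- subset(ausbeer, end = floor(0.8 * n))        # first ~80% used to estimate parameters
test  <- subset(ausbeer, start = floor(0.8 * n) + 1)  # last ~20% held out for evaluating accuracy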

The window() function introduced in Chapter 2 is useful when extracting a portion of a time series, such as we need when creating training and test sets. In the window() function, we specify the start and/or end of the portion of time series required using time values. For example,

window(ausbeer, start=1995)

extracts all data from 1995 onward.

Another useful function is subset() which allows for more types of subsetting. A great advantage of this function is that it allows the use of indices to choose a subset. For example,

subset(ausbeer, start=length(ausbeer)-4*5)

extracts the last 5 years of observations from ausbeer. It also allows extracting all values for a specific season. For example,

subset(ausbeer, quarter = 1)

extracts the first quarters for all years.

Finally, head() and tail() are useful for extracting the first few or last few observations. For example, the last 5 years of ausbeer can also be obtained using

tail(ausbeer, 4*5)

A forecast “error” is the difference between an observed value and its forecast. Here “error” does not mean a mistake, it means the unpredictable part of an observation. It can be written as \[ e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}, \] where the training data is given by \(\{y_1,\dots,y_T\}\) and the test data is given by \(\{y_{T+1},y_{T+2},\dots\}\).

Note that forecast errors are different from residuals in two ways. First, residuals are calculated on the training set while forecast errors are calculated on the test set. Second, residuals are based on one-step forecasts while forecast errors can involve multi-step forecasts.

We can measure forecast accuracy by summarising the forecast errors in different ways.

The forecast errors are on the same scale as the data. Accuracy measures that are based only on \(e_{t}\) are therefore scale-dependent and cannot be used to make comparisons between series that involve different units.

The two most commonly used scale-dependent measures are based on the absolute errors or squared errors: \[\begin{align*} \text{Mean absolute error: MAE} & = \text{mean}(|e_{t}|),\\ \text{Root mean squared error: RMSE} & = \sqrt{\text{mean}(e_{t}^2)}. \end{align*}\] When comparing forecast methods applied to a single time series, or to several time series with the same units, the MAE is popular as it is easy to both understand and compute. A forecast method that minimises the MAE will lead to forecasts of the median, while minimising the RMSE will lead to forecasts of the mean. Consequently, the RMSE is also widely used, despite being more difficult to interpret.

The percentage error is given by \(p_{t} = 100 e_{t}/y_{t}\). Percentage errors have the advantage of being unit-free, and so are frequently used to compare forecast performances between data sets. The most commonly used measure is: \[ \text{Mean absolute percentage error: MAPE} = \text{mean}(|p_{t}|). \] Measures based on percentage errors have the disadvantage of being infinite or undefined if \(y_{t}=0\) for any \(t\) in the period of interest, and having extreme values if any \(y_{t}\) is close to zero. Another problem with percentage errors that is often overlooked is that they assume the unit of measurement has a meaningful zero. For example, a percentage error makes no sense when measuring the accuracy of temperature forecasts on either the Fahrenheit or Celsius scales, because temperature has an arbitrary zero point.
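To make the definitions concrete, here is a small sketch with made-up numbers (the observed values and forecasts are hypothetical, not taken from the text) computing MAE, RMSE and MAPE from a vector of forecast errors:

y    <- c(432, 401, 475, 512)   # hypothetical observed test values
yhat <- c(420, 410, 460, 500)   # hypothetical forecasts
e <- y - yhat                   # forecast errors
mae  <- mean(abs(e))            # mean absolute error
rmse <- sqrt(mean(e^2))         # root mean squared error
p    <- 100 * e / y             # percentage errors (undefined if any y is zero)
mape <- mean(abs(p))            # mean absolute percentage error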

Percentage errors also have the disadvantage that they put a heavier penalty on negative errors than on positive errors. This observation led to the use of the so-called “symmetric” MAPE (sMAPE) proposed by Armstrong (1978, p. 348), which was used in the M3 forecasting competition. It is defined by \[ \text{sMAPE} = \text{mean}\left(200|y_{t} - \hat{y}_{t}|/(y_{t}+\hat{y}_{t})\right). \] However, if \(y_{t}\) is close to zero, \(\hat{y}_{t}\) is also likely to be close to zero. Thus, the measure still involves division by a number close to zero, making the calculation unstable. Also, the value of sMAPE can be negative, so it is not really a measure of “absolute percentage errors” at all.

Hyndman & Koehler (2006) recommend that the sMAPE not be used. It is included here only because it is widely used, although we will not use it in this book.

Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to using percentage errors when comparing forecast accuracy across series with different units. They proposed scaling the errors based on the training MAE from a simple forecast method.

For a non-seasonal time series, a useful way to define a scaled error uses naïve forecasts: \[ q_{j} = \frac{\displaystyle e_{j}} {\displaystyle\frac{1}{T-1}\sum_{t=2}^T |y_{t}-y_{t-1}|}. \] Because the numerator and denominator both involve values on the scale of the original data, \(q_{j}\) is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naïve forecast computed on the training data. Conversely, it is greater than one if the forecast is worse than the average naïve forecast computed on the training data.

For seasonal time series, a scaled error can be defined using seasonal naïve forecasts: \[ q_{j} = \frac{\displaystyle e_{j}} {\displaystyle\frac{1}{T-m}\sum_{t=m+1}^T |y_{t}-y_{t-m}|}. \]

The mean absolute scaled error is simply \[ \text{MASE} = \text{mean}(|q_{j}|). \]
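Continuing the sketch above with a hypothetical training series, the scaling factor is the MAE of naïve forecasts computed on the training data (or of seasonal naïve forecasts for seasonal data):

train <- c(395, 410, 430, 420, 440, 450, 445)  # hypothetical training series
scale <- mean(abs(diff(train)))                # mean of |y_t - y_{t-1}| over the training data
q <- e / scale                                 # scaled errors, using the test-set errors e above
mase <- mean(abs(q))                           # mean absolute scaled error
# For a seasonal series with period m, use seasonal naive differences instead:
# scale <- mean(abs(diff(train, lag = m)))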

beer2 <- window(ausbeer, start=1992, end=c(2007,4))
beerfit1 <- meanf(beer2, h=10)
beerfit2 <- rwf(beer2, h=10)
beerfit3 <- snaive(beer2, h=10)
autoplot(window(ausbeer, start=1992)) +
  autolayer(beerfit1, series="Mean", PI=FALSE) +
  autolayer(beerfit2, series="Naïve", PI=FALSE) +
  autolayer(beerfit3, series="Seasonal naïve", PI=FALSE) +
  xlab("Year") + ylab("Megalitres") +
  ggtitle("Forecasts for quarterly beer production") +
  guides(colour=guide_legend(title="Forecast"))


Figure 3.9: Forecasts of Australian quarterly beer production using data up to the end of 2007.

Figure 3.9 shows three forecast methods applied to the quarterly Australian beer production using data only to the end of 2007. The actual values for the period 2008–2010 are also shown. We compute the forecast accuracy measures for this period.

beer3 <- window(ausbeer, start=2008)
accuracy(beerfit1, beer3)
accuracy(beerfit2, beer3)
accuracy(beerfit3, beer3)

                        RMSE    MAE   MAPE  MASE
Mean method            38.45  34.83   8.28  2.44
Naïve method           62.69  57.40  14.18  4.01
Seasonal naïve method  14.31  13.40   3.17  0.94

It is obvious from the graph that the seasonal naïve method is best for these data, although it can still be improved, as we will discover later. Sometimes, different accuracy measures will lead to different results as to which forecast method is best. However, in this case, all of the results point to the seasonal naïve method as the best of these three methods for this data set.

To take a non-seasonal example, consider the Google stock price. The following graph shows the 200 observations ending on 6 Dec 2013, along with forecasts of the next 40 days obtained from three different methods.

googfc1 <- meanf(goog200, h=40)
googfc2 <- rwf(goog200, h=40)
googfc3 <- rwf(goog200, drift=TRUE, h=40)
autoplot(subset(goog, end = 240)) +
  autolayer(googfc1, PI=FALSE, series="Mean") +
  autolayer(googfc2, PI=FALSE, series="Naïve") +
  autolayer(googfc3, PI=FALSE, series="Drift") +
  xlab("Day") + ylab("Closing Price (US$)") +
  ggtitle("Google stock price (daily ending 6 Dec 13)") +
  guides(colour=guide_legend(title="Forecast"))


Figure 3.10: Forecasts of the Google stock price from 7 Dec 2013.

googtest <- window(goog, start=201, end=240)
accuracy(googfc1, googtest)
accuracy(googfc2, googtest)
accuracy(googfc3, googtest)

              RMSE     MAE    MAPE   MASE
Mean method   114.21  113.27  20.32  30.28
Naïve method   28.43   24.59   4.36   6.57
Drift method   14.08   11.67   2.07   3.12

Here, the best method is the drift method (regardless of which accuracy measure is used).

A more sophisticated version of training/test sets is time series cross-validation. In this procedure, there are a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. Since it is not possible to obtain a reliable forecast based on a small training set, the earliest observations are not considered as test sets.

The following diagram illustrates the series of training and test sets, where the blue observations form the training sets, and the red observations form the test sets.

[Diagram: successive training sets (blue observations) and single-observation test sets (red observations).]

The forecast accuracy is computed by averaging over the test sets. This procedure is sometimes known as “evaluation on a rolling forecasting origin” because the “origin” at which the forecast is based rolls forward in time.
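Before turning to the built-in function described below, it may help to see the procedure written out by hand. The following sketch (assuming the forecast and fpp2 packages are loaded, so that goog200 and naive() are available) rolls the forecast origin forward one observation at a time and collects the one-step forecast errors:

n <- length(goog200)
errors <- rep(NA, n - 1)
for (i in 20:(n - 1)) {                     # start at 20 so the first training set is not too small
  train <- subset(goog200, end = i)         # all observations up to the forecast origin
  fc <- naive(train, h = 1)                 # one-step forecast made from that origin
  errors[i] <- goog200[i + 1] - fc$mean[1]  # forecast error on the single test observation
}
sqrt(mean(errors^2, na.rm = TRUE))          # accuracy (RMSE) averaged over the test sets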

With time series forecasting, one-step forecasts may not be as relevant as multi-step forecasts. In this case, the cross-validation procedure based on a rolling forecasting origin can be modified to allow multi-step errors to be used. Suppose that we are interested in models that produce good \(4\)-step-ahead forecasts. Then the corresponding diagram is shown below.

[Diagram: training and test sets for evaluating 4-step-ahead forecasts on a rolling origin.]

Time series cross-validation is implemented with the tsCV() function. In the following example, we compare the RMSE obtained via time series cross-validation with the residual RMSE.

e <- tsCV(goog200, rwf, drift=TRUE, h=1)
sqrt(mean(e^2, na.rm=TRUE))
#> [1] 6.233
sqrt(mean(residuals(rwf(goog200, drift=TRUE))^2, na.rm=TRUE))
#> [1] 6.169

As expected, the RMSE from the residuals is smaller, as the corresponding “forecasts” are based on a model fitted to the entire data set, rather than being true forecasts.

A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation.
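For example, a minimal sketch (using the same goog200 series) that compares the naïve and drift methods by their cross-validated one-step RMSE might look as follows; the method with the smaller value would be preferred:

e_naive <- tsCV(goog200, rwf, h = 1)                # random walk (naïve) forecasts
e_drift <- tsCV(goog200, rwf, drift = TRUE, h = 1)  # random walk with drift
sqrt(mean(e_naive^2, na.rm = TRUE))                 # CV RMSE of the naïve method
sqrt(mean(e_drift^2, na.rm = TRUE))                 # CV RMSE of the drift method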

The ugliness of the nested R code in the tsCV() example above makes this a good opportunity to introduce some alternative ways of stringing R functions together. In that code, we are nesting functions within functions within functions, so you have to read the code from the inside out, making it difficult to understand what is being computed. Instead, we can use the pipe operator %>% as follows.

goog200 %>% tsCV(forecastfunction=rwf, drift=TRUE, h=1) -> e
e^2 %>% mean(na.rm=TRUE) %>% sqrt()
#> [1] 6.233
goog200 %>% rwf(drift=TRUE) %>% residuals() -> res
res^2 %>% mean(na.rm=TRUE) %>% sqrt()
#> [1] 6.169

The left hand side of each pipe is passed as the first argument to the function on the right hand side. This is consistent with the way we read from left to right in English. When using pipes, all other arguments must be named, which also helps readability. When using pipes, it is natural to use the right arrow assignment -> rather than the left arrow. For example, the third line above can be read as “Take the goog200 series, pass it to rwf() with drift=TRUE, compute the resulting residuals, and store them as res”.

We will use the pipe operator whenever it makes the code easier to read. In order to be consistent, we will always follow a function with parentheses to differentiate it from other objects, even if it has no arguments. See, for example, the use of sqrt() and residuals() in the code above.

The goog200 data, plotted in Figure 3.5, includes daily closing stock price of Google Inc from the NASDAQ exchange for 200 consecutive trading days starting on 25 February 2013.

The code below evaluates the forecasting performance of 1- to 8-step-ahead naïve forecasts with tsCV(), using MSE as the forecast error measure. The plot shows that the forecast error increases as the forecast horizon increases, as we would expect.

e <- tsCV(goog200, forecastfunction=naive, h=8)
# Compute the MSE values and remove missing values
mse <- colMeans(e^2, na.rm = TRUE)
# Plot the MSE values against the forecast horizon
data.frame(h = 1:8, MSE = mse) %>%
  ggplot(aes(x = h, y = MSE)) + geom_point()

[Figure: MSE of naïve forecasts plotted against the forecast horizon, h = 1 to 8.]

Armstrong, J. S. (1978). Long-range forecasting: From crystal ball to computer. John Wiley & Sons.

Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688.