WFB Fractional CFO

Forecasting - Statistical Methods and the ARIMA model

1/30/2026

0 Comments

 
Statistical Methods: This is a quick example of an ARIMA model on a QoQ revenues dataset that I chose because it is particularly ugly. While quarter-over-quarter revenue is not the typical horizon for forecasting revenues from operations, it was a readily available set. Statistical ARIMA forecasting is particularly useful when there are no readily available associated variables to use as predictors, as you might have in a multivariate regression model.
 
We are going to look at time-series data for forecasting.  Time-series simply refers to data that has been collected at regular intervals over a time period.  Because it is collected regularly, this data has both its value and the time it was collected as characteristics.  Examples are closing stock prices (end of day - daily), monthly revenue (monthly), and hourly electricity usage (hourly). Using time-series data requires an understanding of Stationarity. 
 
The most important thing to understand about forecasts is that they rely on historical data and the composition of that data.  This is a foundational assumption of any forecast: you take the historical relationships that produced your time-series values and project those same relationships into the future to produce forecasted values.  You can certainly adjust these relationships, but then you may have to decompose your values by identifying those relationships before producing forecasts … not always an easy task.  This is why nearly all forecasts begin with the assumption that whatever influences created past values will continue in the same fashion into the future.
When most people think of forecasting, the first thing that comes to mind is the weather and the models derived from equations of inputs and outputs.  While meteorology is indeed stacked with mathematical equations for forecasting weather events, similar equations and methods are used in business and finance to forecast everything from unit demand (electricity usage, ticket sales, subscribers, and retail products) to financial quantities (interest rates, stock prices, bond prices, and derivatives).
 
Forecasting can be a lot of fun with the right tools.  With a platform such as R, Python, or Stata you can easily create statistical forecasts of time-series data, a capability which, quite frankly, most companies lack.  I can’t tell you how many times I have discussed forecasting using statistical means such as regression models (models that regress on predictor variables), ARIMA models (lagged regressions of errors and/or observed values), or ETS models (models that use exponential smoothing) and got blank stares.  I presented decomposition of data into trend, seasonality, cyclic activity, and randomness and was generally brushed off because those asking about my forecasting approach knew almost nothing about forecasting.  It is truly unfortunate that the planners and analysts, who through correlated activity would also conduct the forecasting, often lack the knowledge, and in many instances the capability, to create statistical forecasts.  (We must begin to realistically assess personnel and skillsets, otherwise there will never be improvement.)
 
Using a simple moving average, tacking on an “x% increase for the year”, or the ‘go-to’ of the ‘Delphi Method’ (which in practice often amounts to the gut feeling of management) is not generally the best approach.  These methods ignore the fact that historical data and its composition of randomness, trend, cycle, and/or seasonality are highly likely to continue and need to be understood individually.  For example, you might forecast the seasonality and cyclic activity first and then layer in the trend, arriving at a composition that offers more accurate forecasts relative to historic patterns, with the added benefit of being able to defend the basis for the forecasted values and offer a confidence interval by which to judge the model’s effectiveness.  I have seen far too many companies, at the very least, continually revise budgets because of poor forecasting or, at worst, suffer financially to the point of straining solvency.
 
The data that will be used is a dataset of quarterly revenues for a beverage company, plotted below.  We will use the data as-is without adjusting for acquisitions or divestitures over time; we rely only on the statistical decomposition.
[Figure: quarterly revenues for the beverage company]
Forecasting is predictive modeling.  When undertaking predictive modeling you have two broad routes.  First, you can use associated variables as predictors to arrive at a predicted value, as in multivariate regression models.  This is common for predicting housing prices from lot size, lot location (corner lot), square footage, number of bathrooms, and number of bedrooms.  Second, you can take the time-series data that you would like to forecast and decompose it into its component parts: residuals, trend, and/or seasonality.  From there you can use lagged regression in an ARIMA model or exponential smoothing in an ETS model.
 
It should be mentioned that either route may utilize linear relationships, nonlinear relationships, or a combination of the two to offer predicted values.  The usage is noted in the naming of the model used and it is good practice to record the methods used for future comparisons. 
 
Examples of the notation for the first route are lin-lin, lin-log, log-lin, and log-log, referencing the linear or logarithmic treatment of either side of the regression equation.  A lin-lin (linear-to-linear) regression represents unit changes in the independent variables mapped to unit changes in the dependent variable.  A log-log regression measures elasticity, as it relates percent changes in the independent variables to percent changes in the dependent variable.
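The elasticity reading of a log-log regression can be checked numerically.  A minimal Python sketch with made-up data (the relationship y = 3·x^0.8 and the noise level are assumptions for illustration): the slope of log y on log x should recover the exponent.

```python
import math
import random

# Hypothetical data (an assumption for illustration): y = 3 * x^0.8
# with small multiplicative noise, so the log-log slope (the
# elasticity) should come out near 0.8.
random.seed(1)
xs = [10 + i for i in range(50)]
ys = [3 * x ** 0.8 * math.exp(random.gauss(0, 0.01)) for x in xs]

# Ordinary least squares of log(y) on log(x): the slope is the elasticity.
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
mx = sum(lx) / len(lx)
my = sum(ly) / len(ly)
beta = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum(
    (a - mx) ** 2 for a in lx
)
print(round(beta, 2))
```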

Since this post focuses on the second route, that notation will be covered as the writing progresses.
Check for Stationarity
 
If you are forecasting time-series data, which covers pretty much everything in business, from operations to finance to marketing, then you must first evaluate whether you have variables that can be used as predictors for your forecasting target.  In the case of revenues, you may track drivers such as appointments, sit-downs, web clicks, and so on, but these may not be terribly effective at predicting revenues.  A multivariate regression would probably produce unreliable predictions and a low F-stat.  In this type of situation, a lagged regression or an exponential smoothing method is the better option.  It is not unrealistic to assume that historic patterns in revenue will continue into the future.
 
When using time-series data and its own patterns to predict its future values, as opposed to using associated data as predictors, the first thing we should do is check for stationarity.  If you do not check for stationarity, your forecasts will consistently be wrong.  Determining stationarity tells you whether your data series has trend and seasonality components that you can use in your forecast.  Recall the earlier mention of decomposing your data into its components of residuals, trend, and seasonality.
 
Nonstationary data: the series has a seasonality and/or a trend component.
 
Stationary data: the mean and variance are constant over time.  No trend or seasonality; the data looks like random noise.
 
Example: Below are the additive (linear) and multiplicative (nonlinear) decompositions of the quarterly revenues.
[Figures: additive and multiplicative decompositions of the quarterly revenues]
To get a decomposition in R, you simply need to load the ‘forecast’ package and use the ‘decompose’ function.
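The post runs the decomposition in R; as a sketch of what classical additive decomposition actually computes, here is a minimal Python version for quarterly data.  The series is synthetic (a linear trend plus a known quarterly pattern, not the revenue data), so the recovered seasonal component can be checked against the one we built in.

```python
# Classical additive decomposition for quarterly data (period m = 4),
# mirroring the moving-average approach of R's decompose().
m = 4
seasonal_true = [10, -5, -8, 3]  # assumed quarterly pattern, sums to 0
y = [100 + 2 * t + seasonal_true[t % m] for t in range(24)]

# Trend: centered moving average of order m (a 2x4 MA for even m).
trend = [None] * len(y)
for t in range(m // 2, len(y) - m // 2):
    window = y[t - m // 2: t + m // 2 + 1]
    trend[t] = (window[0] / 2 + sum(window[1:-1]) + window[-1] / 2) / m

# Seasonal: average the detrended values by quarter, then center them.
detrended = {q: [] for q in range(m)}
for t, tr in enumerate(trend):
    if tr is not None:
        detrended[t % m].append(y[t] - tr)
raw = [sum(v) / len(v) for q, v in sorted(detrended.items())]
mean_adj = sum(raw) / m
seasonal = [s - mean_adj for s in raw]
print([round(s, 1) for s in seasonal])  # → [10.0, -5.0, -8.0, 3.0]
```

Because the centered moving average removes a zero-sum seasonal pattern exactly, the detrended values reproduce the built-in seasonal component.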
 
But we are getting a little ahead of ourselves.  We would first like to check whether the data is stationary.  As we can see from the decompositions, it is nonstationary, but how do we know?  We can use either a unit root test or the ACF (autocorrelation function); the unit root test is the easier statistical check.
 
In R we can use the KPSS unit root test, though there are others, such as the Dickey-Fuller unit root test.  Running it on ‘Revenues QoQ’, the test statistic exceeds the critical value at the 10% (0.6376 > 0.347), 5% (0.6376 > 0.463), and 2.5% (0.6376 > 0.574) levels.
[Figure: KPSS unit root test output for the Revenues QoQ series]
In the KPSS test, the null hypothesis is that the data is stationary.  Because the test statistic exceeds the critical values, we reject that null hypothesis and conclude that the data is nonstationary.  We can easily see that this is true from the decomposition.  So how do we create a stationary data series?
 
This is where the I in the ARIMA model comes into play.  The I stands for Integrated and, while it represents the putting together of the differences (hence integration), it is easier to think about it in terms of differencing the data series.  Differencing will create a stationary series.  We use arithmetic differencing of the values when looking to stabilize the mean, and logarithmic differencing of the values when looking to stabilize the variance.
 
For the ‘Revenue QoQ’ series, the differencing needs to be done using consecutive values and seasonal equivalent values.  Let’s show these as equations.
 
arithmetic consecutive differencing (first difference):
y′_t = y_t − y_(t−1)
arithmetic seasonal differencing (first difference):
y′_t = y_t − y_(t−m)
Where ‘m’ is the seasonal lag.  Monthly data with annual seasonality has m = 12, quarterly data m = 4, and so on.
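The two differencing equations are easy to sketch in code.  A minimal Python version with made-up values (not the actual revenue series):

```python
# First (consecutive) and seasonal differencing, as in the equations above.
def diff(y, lag=1):
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

y = [112, 118, 132, 129, 121, 135, 148, 148]  # hypothetical quarterly values

d1 = diff(y)                        # y_t - y_(t-1)
d4 = diff(y, lag=4)                 # y_t - y_(t-4), quarterly seasonal difference
d1_then_d4 = diff(diff(y), lag=4)   # both, as used for the KPSS retest below
print(d1)          # → [6, 14, -3, -8, 14, 13, 0]
print(d4)          # → [9, 17, 16, 19]
print(d1_then_d4)  # → [8, -1, 3]
```

Note that each difference shortens the series by the lag, which is why heavily differenced series need a reasonable amount of history.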
 
Let’s see what the test looks like with only an arithmetic consecutive difference:
[Figure: KPSS test output after one consecutive difference]
​This is a pretty good test-statistic, but can we make it better?
 
Let’s try a consecutive difference and a seasonal difference (quarterly), since we can see that there is a seasonal component in the decomposition.
[Figure: KPSS test output after consecutive and seasonal differencing]
​The test-statistic implies that the time-series, using a seasonal difference and a consecutive difference, is essentially just white noise, a random series. This means that all of our trend and seasonality information is removed from the data series.  You can see how easy it is to check the Unit Root in R once you have built some familiarity with the application.
 
Recall that the naming convention for the forecasting method, such as ARIMA(p,d,q), enables us to track the transformations to our dataset so that we can recreate it and others can follow our steps.  We have introduced one consecutive difference along the entire series, so d = 1.  But we have also introduced a seasonal difference.  To notate the seasonal difference, another set of parentheses with capital letters, s(P,D,Q), is used, with ‘s’ marking the seasonal period.  It looks like ARIMA(p,d,q) s(P,D,Q).  Since our seasonality is quarterly, s = 4, and our notation up to this point is:
 
ARIMA(0,1,0) 4(0,1,0)
 
We would now like to find out whether any orders of AR or MA are beneficial to our model.  We can quickly get the orders using the ACF (autocorrelation function) on our differenced series, but let’s first look at the AR and MA components for a basic conceptual understanding.
AR(p) – Auto-Regressive
​Auto-regression does exactly what it says, it regresses the data series on itself but in a lagged fashion.  An AR(1) regresses the data series on itself, one period back.  An AR(2) regresses the data series on itself, two periods back.  The first thing you should notice, when recalling that a regression requires equal length data series, is that you will have to adjust the primary series when regressing against a lagged version of itself.  The second thing that you should note is that a regression is done to produce coefficients for the variables in a regression equation.  In a data series, this would be similar to a linear multivariate regression equation. 
 
Using the AR equation for a model maps out the pattern of time-series data, assigning a coefficient to the lagged data values relative to their predictive ability of future values.   
 
The equation for an AR(p) model should look familiar if you have done any regression analysis:
y_t = c + φ_1·y_(t−1) + φ_2·y_(t−2) + … + φ_p·y_(t−p) + ε_t
Instead of ‘x’ variables, you see the lagged ‘y’, because the regression occurs on the series itself.  The epsilon values are the residuals for the fit of the observed data versus the values predicted by the equation; they are the error terms that are minimized in the sum-of-squared-errors assessment for regressions.  If all available information is drawn from the data series, the epsilon values should look like white noise: i.i.d. (independent and identically distributed), normally distributed with a mean of zero and constant variance.
 
Occasionally, you will see AR models of order p = 3, but you will very rarely need orders higher than p = 2, especially when including seasonality.
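To make the AR idea concrete, here is a small Python sketch with synthetic data (the coefficient 0.7 is an assumption for illustration): simulate an AR(1) process and recover its coefficient by regressing the series on its own first lag.

```python
import random

# Simulate an AR(1) series y_t = 0.7 * y_(t-1) + e_t with Gaussian noise.
random.seed(42)
y = [0.0]
for _ in range(5000):
    y.append(0.7 * y[-1] + random.gauss(0, 1))

# OLS slope through the origin of y_t on y_(t-1): the AR(1) coefficient.
num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
phi = num / den
print(round(phi, 2))  # close to the true coefficient 0.7
```

Higher orders work the same way, with one lagged copy of the series per coefficient, which is why the series must be trimmed to equal lengths before regressing.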
 
Again, the orders, p and P, can be found by using the ‘ACF’ module (auto-correlation function).  This will be done right after the MA explanation.
MA(q) – Moving-Average
The Moving Average of an ARIMA model is different from the moving average used when computing a 50-day moving average on stock prices.  An ARIMA model’s MA component is a lagged regression of the past forecast residuals (also called the error values) used to predict the next value in the data series,
y_t = c + ε_t + θ_1·ε_(t−1) + θ_2·ε_(t−2) + … + θ_q·ε_(t−q)
where, again, epsilon is our white noise.  Note the epsilon in both the AR and the MA equations.  Any stationary AR(p) model can be written as an MA(∞) model through some math that is beyond the scope here but can be found with a simple search.
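A sketch of the MA idea in Python with synthetic data (the coefficient 0.6 is assumed for illustration): an MA(1) series has autocorrelation at lag 1 but essentially none beyond it, which is exactly the signature the ACF is used to read.

```python
import random

# Simulate an MA(1) series y_t = e_t + 0.6 * e_(t-1). Theory gives a
# lag-1 autocorrelation of 0.6 / (1 + 0.6^2) ~ 0.44 and ~0 at lag 2.
random.seed(0)
e = [random.gauss(0, 1) for _ in range(20001)]
y = [e[t] + 0.6 * e[t - 1] for t in range(1, len(e))]

def acf(series, lag):
    # Sample autocorrelation at the given lag.
    mean = sum(series) / len(series)
    c0 = sum((v - mean) ** 2 for v in series)
    ck = sum((series[t] - mean) * (series[t - lag] - mean)
             for t in range(lag, len(series)))
    return ck / c0

print(round(acf(y, 1), 2), round(acf(y, 2), 2))
```

The sharp cut-off after lag q is what distinguishes an MA(q) signature from the slowly decaying ACF of an AR process.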
 
When we combine AR with MA we arrive at the formula for predicting future values.
y_t = c + φ_1·y_(t−1) + … + φ_p·y_(t−p) + θ_1·ε_(t−1) + … + θ_q·ε_(t−q) + ε_t
​If using a differenced series it is written as
y′_t = c + φ_1·y′_(t−1) + … + φ_p·y′_(t−p) + θ_1·ε_(t−1) + … + θ_q·ε_(t−q) + ε_t
​where y’ is the differenced value of the series.
 
Another note: you could utilize the formula and do this in Excel, checking candidate models by comparing their summed squared errors.  This would take some time depending on how many orders of AR and MA are required, but it is possible.
 
How do we determine the number of orders of AR and MA?  There is another test or two that we can run that will help to determine the orders of AR and MA, the ACF (autocorrelation function) and PACF (partial autocorrelation function) tests.
[Figures: ACF and PACF of the differenced series]
It is a bit of an art to read ACFs, but given the significant value at lag 8 (it crosses the significance line), I could begin my modeling with a seasonal MA(2), an ARIMA(0,1,0) 4(0,1,2).
[Figure: candidate seasonal ARIMA model output]
This was a good first estimate, but I ended with an ARIMA(0,1,0) 4(2,1,2), which had the lowest AIC.
[Figures: ARIMA(0,1,0) 4(2,1,2) model summary and forecast]
Auto-ARIMA from R
Of course, we don’t have to do all of this work.  There is an auto.arima function in R’s forecast package.  If we use it, our forecast for 8 quarters looks like this:
[Figures: auto.arima model output and 8-quarter forecast]
We could also use an ETS model.
[Figure: ETS model output]
But the AIC is far too high compared to our seasonal ARIMA model.
 
​
Please Remember
 
 
ALL MODELS ARE EQUATIONS.
​Other interesting basic forecasts …
ARIMA(0,0,0)                  = White Noise
ARIMA(0,1,0) with no constant = Random Walk
ARIMA(0,1,0) with constant    = Random Walk with Drift
ARIMA(0,0,0) m(0,1,0)         = Seasonal Naive
A RANDOM WALK

This method was the basis from which several stock price models were developed.  You can probably see why.  The last observed value moves up or down one unit in each period with a 50/50 probability.  An important characteristic of the random walk is that its long-run mean is zero (starting from zero) while its variance grows over time rather than staying constant.  This makes the data non-stationary, meaning the data has a trend, cyclic, or seasonal component, or a changing variance relative to the time period.  Determining whether the data is stationary is a must when working with time-series data and is often overlooked in forecasting.  This point will be reiterated several times.
[Figure: simulated random walk]
It isn’t too difficult to imagine using different probabilities to randomly determine the movement up or down in this basic model, just remember that they must sum to one (prob(up) + prob(down) = 1). This basic concept is actually used as one of the computational components for modeling the pricing of options using the binomial method.   
 
The random walk is a Markov process, a restricted process in which the future value depends only on the most recent value.  Because of this, it is known as memoryless.  ‘Memoryless’ is a neat concept that also appears in the Poisson process, which models the occurrence of events over time.  There, the memoryless attribute comes from the fact that the time already waited has no influence on the timing of the next occurrence; in that sense the process is unrestricted, with no dependence even on the most recent observation.  There is a lot of interest in modeling randomness.
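The ±1 random walk described above is easy to simulate, and its two key properties, a mean near zero and a variance that grows with the number of steps, can be verified numerically.  A minimal Python sketch:

```python
import random

# Simulate many +-1 random walks and compare spread at 100 vs 400 steps.
random.seed(7)

def walk(steps):
    pos = 0
    for _ in range(steps):
        pos += random.choice((-1, 1))
    return pos

finals_100 = [walk(100) for _ in range(2000)]
finals_400 = [walk(400) for _ in range(2000)]

mean_100 = sum(finals_100) / len(finals_100)
var_100 = sum(v ** 2 for v in finals_100) / len(finals_100)  # theory: ~100
var_400 = sum(v ** 2 for v in finals_400) / len(finals_400)  # theory: ~400
print(round(mean_100, 1), round(var_100), round(var_400))
```

The variance scaling in proportion to the number of steps is exactly the non-constant variance that makes the series non-stationary.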
 
​
 
A RANDOM WALK with DRIFT
 
Where the expectation for the plain random walk is a mean of zero, the drift component is the amount by which the mean shifts each period.  This drift produces a trend, which makes the data non-stationary, meaning the data has a trend, cyclic, or seasonal component, or a changing variance relative to the time period.  Determining whether the data is stationary is a must when working with time-series data and is often overlooked in forecasting.
[Figure: simulated random walk with drift]
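A sketch of the drift variant in Python (the drift constant of 0.5 per period is arbitrary): the mean of the walk now shifts by the drift amount each period, so after n steps it sits near drift × n.

```python
import random

# Random walk with drift: a constant c is added on top of the +-1 move.
random.seed(7)
c = 0.5  # hypothetical drift per period

def drift_walk(steps):
    pos = 0.0
    for _ in range(steps):
        pos += c + random.choice((-1, 1))
    return pos

finals = [drift_walk(100) for _ in range(2000)]
mean_final = sum(finals) / len(finals)
# Theory: the expected final value is c * 100 = 50.
print(round(mean_final, 1))
```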
​WHITE NOISE
 
This component is the residual, also called the error term: the difference between the observed values of the historical data and the fitted values of the forecasting formulas.  When forecasting, people underestimate the value of the information found in the residuals.  Residuals are so significant that one could say forecasting is a study in residuals.
 
As mentioned before, one measures a forecasting method by the cumulative sum of squared residuals.  By definition, residuals should be independent values, typically normally distributed with a mean of zero and constant variance.  If the residuals are not, then it is likely that there is additional information to be gleaned from the data to produce a better forecasting model.  Set out to find random residuals.
Simply generating a sample set for white noise:
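A minimal Python sketch of generating such a sample (the post’s version was produced in R): i.i.d. Gaussian draws with mean zero and constant variance, the benchmark the residuals of a good model should resemble.

```python
import random

# Gaussian white noise: independent draws, mean 0, constant variance 1.
random.seed(123)
noise = [random.gauss(0, 1) for _ in range(10000)]

mean = sum(noise) / len(noise)
var = sum((v - mean) ** 2 for v in noise) / len(noise)
print(round(mean, 2), round(var, 2))  # both close to the theoretical 0 and 1
```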
[Figure: white-noise sample plot]
​You can see that it looks similar to static that you might hear from a radio, and if measured on an oscilloscope it would also look similar.
 
Closing
 
Forecasting is more than automating your ERP system to create future values.  There is plenty of math and statistics that goes into creating a forecast but, like any process, if you know the appropriate steps and applications you can complete it swiftly and to the best benefit of the company.  My hope is always that the process will encourage further study into the methods and the math.  Statistical methods create a genuinely quantitative approach to forecast evaluation, beyond the very general budget-vs-actuals, so that you can track and improve upon the composition of your forecasts.  There is also more advanced dynamic modeling that brings in additional variables to help improve ARIMA forecasting.  If you are forecasting and have not put in the effort to understand this area of statistics, doing so could add considerable value to your skills and your organization’s capabilities.
 
 
 
Post Note:
 
More companies are ruined by “years of experience” ignorance when selecting a workforce than by any other strategic endeavor.  The issue always arises when individuals who have little understanding of the relevant skillsets are the ones evaluating and assessing your skilled individuals.  Forecasting is not a topic that is learned “on the job” and, as a result, those selected for the task often know very little about forecasting methods, have no math or statistics grounding, and will simply repeat the exact same process they used at a prior company, courtesy of their “years of experience”.  This is never useful.  Such an individual will never understand how to assess, rank, and adjust forecasts to make them better, and many organizations have no real way to evaluate these skillsets because they lack them themselves.
 
If you are lucky, the individual(s) tasked with the forecasting at least understand the basics such as random walks, random walks with drift, trend, seasonality, cyclic activity, white noise, simple moving average, and exponential moving average.  If they can define these terms, it at least demonstrates a basic understanding of forecasting since you can’t go through any tutelage on the topic without covering these terms.  If they can’t define these topics, then it is highly unlikely that they can explain to you, at a root level, what needs to be done to create better forecasts and therefore it is highly unlikely that your forecasting, budgets and models will improve. 
    Contact
    All case studies and blog writings are written by:
    William F Bryant
    MSc MBA CMA
Copyright © 2025