Sales forecasting is the process of estimating future sales and revenue in order to enable companies to make informed business decisions and predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. The problem of sales forecasting can be classified as a time-series forecasting, because the time is the domain in which the data (sales or revenue) got changed.

## Time Series Analysis

A time series is a sequence of data points ordered by the time. Time series analysis is a methodology for extracting useful and meaningful information from these data points. Any time series can be decomposed into three components:- Trend: it means the regression of the data points with time. For example, a time series with a positive trend means that the values of the data points at (t+n) is larger than the ones at time (t). Here the value of the data dependes on the Time rather than the previous values.
- Seasonality (Cycle): it means the repetition of the data over the time domain. In other words, the data values at time (t+n) is the same as the data at time (t), where n is the seasonality or cycle length
- Noise (Random Walk): this is a time independent component (non-systematic) that is added (or subtracted) to the data points.

Based on this we can classify the time series into two classes:

- Non-Stationary: data points with means or variance and covariance that change over the time. This is interpreted as trends or cycles or combination of them.
- Stationary: data points that its means and variance and covariance does not change over the time

The theories behind non-stationary signals and forecasting is not mature and modeling it is complex, which leads to inaccurate results. Luckily, non-stationary series can be transformed into stationary using common techniques (e.g. differentiation). The idea of differentiation is to subtract the data value from its predecessor, so the new series will lose a component of the time-dep

## Sales-Force Forecasting

Salesforce.com (abbreviated as SF or SFDC) is a could computing company that purchase customer relationship management (CRM) products. Salesforce.com's CRM service is broken down into several broad categories: Sales Cloud, Service Cloud, Data Cloud, Marketing Cloud, Community Cloud, Analytics Cloud, App Cloud, and IoT with over 100,000 customers.In this post we will analyze and forecast a sample sales data from salesforce CRM that shows the sales grows between 2013 and 2016, and we will predict the sales values for two business quarters. We will use two models for forecasting: ARIMA and HoltWinters, and will demonstrate how to do that using R language.

The methodology that we will follow is:

- Aggregate the data per month
- Construct the model using 75% of the data as training set
- Check the model accuracy using 25% of the data, and calculate the root mean square error
- Forecast the data for the next two quarters

- CloseDate: date of closing the oppertunity (Measure field)
- Amount: the monotary amount optained
- IsWon and IsClosed: flags for if the opportunity win/lost and closed/opened

## Sales Forecasting using R

We need to install two packages: RJDBC for connecting to DB and retrieve the data, and forecast for data modeling, analysis and forecasting. The following R script shows the sales forecasting using ARMIA.ARIMA model is an abbreviation for Autoregressive Integrated Moving Average, so it is a combination of multiple techniques:

- Auto-regression (AR)
- Integration (I)
- Moving Average (MA)

library(RJDBC)

library(forecast)

rmse <- function(sim, obs){

return(sqrt(mean((sim - obs)^2, na.rm = TRUE)))

}

construct_model <- function(data){

data.start = strsplit(data$CloseDate[1], "-")

data.end = strsplit(data$CloseDate[nrow(data)], "-")

data.ts = ts(data$Amount,

start=c(as.integer(data.start[[1]][1]),

as.integer(data.start[[1]][2])),

end=c(as.integer(data.end[[1]][1]),

as.integer(data.end[[1]][2])),

frequency = 12)

model = auto.arima(data.ts)

summary(model)

return(model)

}

get_forecast_model <- function(close.win.opp){

# Train with 75% of data

N = ceiling(0.75*nrow(close.win.opp))

train.data = close.win.opp[1:N,]

model = construct_model(train.data)

# Test with 25% of data

test.data = close.win.opp[(N+1):nrow(close.win.opp),]

predicted = forecast(model, length(test.data$Amount))

cat("RMSE=", rmse(predicted$mean, test.data$Amount), "\n")

# Train with all data

model = construct_model(close.win.opp)

return(model)

}

drv <- JDBC("com.mysql.jdbc.Driver", classPath="./mysql-connector-java-5.1.41-bin.jar")

conn <- dbConnect(drv, "jdbc:mysql://my-db-domain:3306/sfdc", "my-db-username", "my-db-password")

close.win.opp = dbGetQuery(conn, "SELECT DATE_FORMAT(CloseDate,'%Y-%m') as CloseDate, SUM(Amount) as Amount FROM opportunity WHERE IsWon='true' AND IsClosed='true' And CloseDate < '2016-09-01' GROUP BY DATE_FORMAT(CloseDate,'%Y-%m') ORDER BY CloseDate")

model = get_forecast_model(close.win.opp)

predicted = forecast(model, 6)

plot(predicted)

library(forecast)

rmse <- function(sim, obs){

return(sqrt(mean((sim - obs)^2, na.rm = TRUE)))

}

construct_model <- function(data){

data.start = strsplit(data$CloseDate[1], "-")

data.end = strsplit(data$CloseDate[nrow(data)], "-")

data.ts = ts(data$Amount,

start=c(as.integer(data.start[[1]][1]),

as.integer(data.start[[1]][2])),

end=c(as.integer(data.end[[1]][1]),

as.integer(data.end[[1]][2])),

frequency = 12)

model = auto.arima(data.ts)

summary(model)

return(model)

}

get_forecast_model <- function(close.win.opp){

# Train with 75% of data

N = ceiling(0.75*nrow(close.win.opp))

train.data = close.win.opp[1:N,]

model = construct_model(train.data)

# Test with 25% of data

test.data = close.win.opp[(N+1):nrow(close.win.opp),]

predicted = forecast(model, length(test.data$Amount))

cat("RMSE=", rmse(predicted$mean, test.data$Amount), "\n")

# Train with all data

model = construct_model(close.win.opp)

return(model)

}

drv <- JDBC("com.mysql.jdbc.Driver", classPath="./mysql-connector-java-5.1.41-bin.jar")

conn <- dbConnect(drv, "jdbc:mysql://my-db-domain:3306/sfdc", "my-db-username", "my-db-password")

close.win.opp = dbGetQuery(conn, "SELECT DATE_FORMAT(CloseDate,'%Y-%m') as CloseDate, SUM(Amount) as Amount FROM opportunity WHERE IsWon='true' AND IsClosed='true' And CloseDate < '2016-09-01' GROUP BY DATE_FORMAT(CloseDate,'%Y-%m') ORDER BY CloseDate")

model = get_forecast_model(close.win.opp)

predicted = forecast(model, 6)

plot(predicted)

The script splits the data-set into training data (75%) and verification data (25%). Next, it build the model based on the training data. R has an implementation for ARIMA model featured with automatic detection of parameters. For the before mentioned SFDC dataset, we obtained the following model, with root mean square error = 233560

ARIMA(0,1,0)(0,1,0)[12]

sigma^2 estimated as 4.254e+10: log likelihood=-218.49

AIC=438.98 AICc=439.27 BIC=439.76

Next, we build a model using all the data-set, the obtained model is
sigma^2 estimated as 4.254e+10: log likelihood=-218.49

AIC=438.98 AICc=439.27 BIC=439.76

ARIMA(0,1,1)(0,1,0)[12]

sigma^2 estimated as 5.288e+10: log likelihood=-343.82

AIC=691.64 AICc=692.18 BIC=694.07

Finally, we forecast the next 6 months using this model at Line 45. The data forecasting is
sigma^2 estimated as 5.288e+10: log likelihood=-343.82

AIC=691.64 AICc=692.18 BIC=694.07

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95

Sep 2016 1894667 1599973 2189362 1443970.6 2345364

Oct 2016 1520401 1201883 1838919 1033270.3 2007532

Nov 2016 1545809 1205130 1886488 1024785.6 2066833

Dec 2016 1477773 1116289 1839257 924930.8 2030616

Jan 2017 1517143 1135988 1898299 934216.2 2100070

Feb 2017 1825764 1425904 2225624 1214230.9 2437297

Sep 2016 1894667 1599973 2189362 1443970.6 2345364

Oct 2016 1520401 1201883 1838919 1033270.3 2007532

Nov 2016 1545809 1205130 1886488 1024785.6 2066833

Dec 2016 1477773 1116289 1839257 924930.8 2030616

Jan 2017 1517143 1135988 1898299 934216.2 2100070

Feb 2017 1825764 1425904 2225624 1214230.9 2437297

R supports different time series forecasting models. In the code above, you can easily change the forecasting model by changing Line 18. Fore example to use HoltWinters model change the code to

model = HoltWinters(data.ts)

The root mean square error was 301992.9, and the predictions were in this case
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95

Sep 2016 1760990 1507244 2014735 1372919.6 2149059

Oct 2016 1363874 1107298 1620450 971474.8 1756273

Nov 2016 1381785 1120706 1642865 982498.9 1781072

Dec 2016 1321675 1054109 1589240 912468.3 1730881

Jan 2017 1376777 1100499 1653055 954246.8 1799307

Feb 2017 1687538 1400158 1974919 1248028.3 2127049

Sep 2016 1760990 1507244 2014735 1372919.6 2149059

Oct 2016 1363874 1107298 1620450 971474.8 1756273

Nov 2016 1381785 1120706 1642865 982498.9 1781072

Dec 2016 1321675 1054109 1589240 912468.3 1730881

Jan 2017 1376777 1100499 1653055 954246.8 1799307

Feb 2017 1687538 1400158 1974919 1248028.3 2127049

As we see, using HoltWinters model for our data is more appropriate than ARMIA (less mean square error)