What is Demand Forecasting?

Demand for any product can be forecasted from the shipments an organization has made in the past. This allows us to proactively optimize our production schedule so that we don't lose potential business, while also not investing in excess inventory, leading to better cash flow and a lower possibility of overstock. Demand forecasting has been around for a while, but it has mostly been a haphazard, ad-hoc exercise in which business analysts forecast demand with simple time series algorithms, and the traditional process leveraged only historical datasets. Now, with the power of the modern cloud and an in-memory big data engine such as Apache Spark, we can also correlate any external factors that may be responsible for driving a part of our demand, such as:

  • Demography
  • Weather conditions
  • Impact of competition

This correlation tends to give us better insight into our demand, leading to better accuracy. Moreover, we can now leverage modern AI/ML algorithms such as:

  • Random Forest
  • Gradient Boosting
  • Neural networks

and run them at scale on our enterprise data, rather than being limited to traditional time series algorithms.

Methodology to Forecast the Demand

We first need to get our enterprise data from SAP ERP into Azure so that we can apply the latest and greatest algorithms on a scalable, distributed platform such as Azure Databricks (a managed service for Apache Spark). After getting the data into Azure, we clean and massage it so that we can train a predictive machine learning model on it. After training the model, we visualize the forecasts in a comprehensive BI tool such as Power BI, and we also deploy the model on Azure ML so that our forecasts can be called from any third-party application.

To quickly summarize, we are going to do the following:

  • Pull data into Azure using a managed ETL/ELT service called Azure Data Factory
  • Read the data from Azure Data Lake into a Spark DataFrame
  • Train the machine learning model in Azure Databricks
  • Visualize the results on Power BI
  • Deploy the model on Azure ML

Detailed Steps:

Azure Data Factory Pipeline

In the pipeline depicted in the image above, our source is SAP HANA, and we are using the SAP HANA connector that Azure Data Factory provides as a standard offering.

Databricks Notebooks

Once the data is in Azure Data Lake, we can read it into a Spark DataFrame with a single line of code.

SAP Data in a Spark Data Frame
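As a minimal sketch of that one-liner, assuming the Data Factory pipeline landed the SAP extract as Parquet files (the storage account, container, and path below are placeholders):

    # Read the SAP shipment extract from Azure Data Lake into a Spark DataFrame.
    # The path is hypothetical; "spark" is predefined in a Databricks notebook.
    df = spark.read.parquet("abfss://sap-data@mydatalake.dfs.core.windows.net/shipments/")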

After this, we apply some very simple Spark-based transformations to convert the categorical variables into numerical ones so that we can build a machine learning model on the data. In this scenario we use GBT (gradient-boosted tree) regression from Apache Spark MLlib and then forecast the demand.
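As a sketch of those transformations, here is one way to index the categorical columns and define the GBT regressor with Spark MLlib; the column names (region, product, temperature, shipments) are hypothetical stand-ins for the actual SAP fields:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.ml.regression import GBTRegressor

    # Convert each categorical column into a numerical index
    indexers = [
        StringIndexer(inputCol=c, outputCol=c + "_idx", handleInvalid="keep")
        for c in ["region", "product"]
    ]

    # Assemble the indexed categoricals and numeric features into one vector
    assembler = VectorAssembler(
        inputCols=["region_idx", "product_idx", "temperature"],
        outputCol="features",
    )

    # Gradient-boosted tree regressor from Spark MLlib
    gbt = GBTRegressor(featuresCol="features", labelCol="shipments", maxIter=50)

    pipeline = Pipeline(stages=indexers + [assembler, gbt])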

We finally train the model, generate our forecasts, and quickly visualize the results in a Databricks graph.

Quick Visualization of ML Model results
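A sketch of that final training step, continuing from the pipeline above; the time-based split on a hypothetical year column is an assumption, and display() is the built-in Databricks charting call:

    from pyspark.ml.evaluation import RegressionEvaluator

    # Train on history, forecast the most recent period (hypothetical split)
    train_df = df.filter(df.year < 2019)
    test_df = df.filter(df.year >= 2019)

    model = pipeline.fit(train_df)
    forecasts = model.transform(test_df)

    # RMSE on the held-out period
    evaluator = RegressionEvaluator(
        labelCol="shipments", predictionCol="prediction", metricName="rmse"
    )
    print("RMSE:", evaluator.evaluate(forecasts))

    # Databricks renders this as an interactive chart
    display(forecasts.select("year", "shipments", "prediction"))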

Power BI Dashboard

Using Databricks, we write the forecast results to an Azure SQL Database and then visualize them on a Power BI dashboard.
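One way to do that write from Databricks is Spark's JDBC data source; the server, database, table, and secret-scope names below are placeholders:

    # Hypothetical Azure SQL connection; credentials come from a Databricks
    # secret scope rather than being hard-coded in the notebook.
    jdbc_url = (
        "jdbc:sqlserver://myserver.database.windows.net:1433;"
        "database=forecastdb"
    )

    (forecasts
        .select("region", "product", "year", "prediction")
        .write
        .format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.demand_forecasts")
        .option("user", dbutils.secrets.get("sql-scope", "sql-user"))
        .option("password", dbutils.secrets.get("sql-scope", "sql-password"))
        .mode("overwrite")
        .save())

Power BI can then connect to this table directly, so the dashboard always reflects the latest forecast run.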