Machine learning for the masses: bringing models to the king of office tools(a.k.a Excel)
Context :
Nowadays, companies of all sizes are using machine learning not only to improve their business performance and make strategic decisions, but also to automate and simplify daily tasks workflow of their employees. However our experience shows that it’s not always easy to design and industrialize data driven solutions that can be integrated into a legacy system.
Hopefully, the most commonly used productivity tool, aka Excel, once used to store, analyze data and create reports and dashboards, is now ready to be used in combination with a really powerful tool: Azure Machine Learning Studio.
The latest brings additional and important features to Excel by allowing the automated integration of the results from machine learning models directly into an Excel sheet — i.e allow any company to harness the power of ML directly from their excel sheet.
In this article, we will detail how we built an end-to-end machine learning workflow using only Excel and Azure ML Studio, for a client who needed a model (predict the probability of selling a product), and whose underlying data was living in his native tool: Excel. This task requires building a statistical model that estimates the survival probability using different features of the data. We also considered developing a dedicated application, but regarding the context (restricted number of users, simple data schema), we decided to stay in Excel and bring it some Machine Learning power.
In this specific use case, the target workflow consists in assigning a probability to each product based on its specifications so that the employee can visualize in the same tool the data and the estimated probability side by side.
Technical stack
ML on Azure
ML Studio simplifies the workflow of a machine learning project by providing a collaborative UI in which you can drag and drop datasets, models and analysis modules and connect them together to form an experiment, which you run in Machine Learning Studio.
Once your predictive analytics model is ready, you can convert the training experiment into a predictive experiment and then publish it as a web service hosted on Azure so that it can be consumed easily by web apps or Excel.
Project workflow
As written before, the aim of our experiment is to predict the selling probability of a product at some point in the future based on multiple features. Our project follows the default workflow for an Azure experiment.
1. Get and prepare the data
Once you have set up the experiment in Azure ML, the first thing to do is to import the data which can be a local csv file, online data source or from an on-premise SQL Server database.
First, we need to apply preprocessing steps in order to make it consistent with the input of our model. So, we use the Data Transformation section of the module palette for modules that perform these functions.
The data is processed in order to make it model ingestible.
Azure ML pro tip: An easy way to debug the pipeline is to view or download the data produced by a module at each point in your experiment by clicking on the output port.
2. Choose and train the model
Now that the data is ready, we can use it to train the model.
Azure ML Studio provides a rich workspace of statistical models, grouped by use case: supervised learning, unsupervised learning, time series, text analytics, etc.
From this module palette we can easily drag and drop machine learning models into the experiment and link it to the other steps of the pipeline.
However, real life use cases usually require custom solutions and modelling approaches that are not always available in Azure ML workspace.
So, in case we don’t find the required estimator in the module palette, we can add an R or Python script that embeds the code of the estimator, and finally link it to the rest of the experiment. This allows us to customize the modelling approach.
In our context, we had to build a survival function S(t) that computes the probability that a subject survives longer than time t. For the math-savvy among you, we choosed the Kaplan-Meier (KM) curve. The estimator is given by:
- with ti a time when at least one event happened,
- di the number of events (i.e., deaths) that happened at time ti
- ni the individuals known to survive (have not yet had an event or been censored) at time ti .
This process is different from the normal workflow of an Azure ML experiment, because the Kaplan-Meier estimator is not available in the module palette of algorithms. That is why we just added an R Script that embeds the code of the estimator and finally linked it to the rest of the experiment.
3. Deploy the model as a web service
To get this model ready for deployment, we need to convert the training experiment to a predictive one. This involves some light plumbery : saving the model, trimming the experiment to remove modules that were only needed for training and defining where the web service will accept input and where it generates output.
All these steps can be accomplished by clicking Set Up Web Service at the bottom of the experiment canvas (and selecting the Predictive Web Service option).
Finally, you can download an Excel file containing appropriate sample values and Azure ML add in. From this Excel file you can call the webservice and get predictions about survival probabilities for your products.
4. Consume ML web service in Excel
Microsoft provides an add-in for Excel that allows you to use web services directly in Excel. You just need to know the URL and the API key for the service to connect to though. Azure ML makes consuming a model as a web service and using it in Excel so easy :
- Once you have deployed a webservice, you will get an API key and a Request Response URL link. These credentials will be needed to access your web service through Excel and predict beyond your new dataset.
- Then, you can open Excel and download Azure Machine Learning add-in for Excel from Office Add-ins. ( Go to the ‘Insert tab’ and select the ‘Store’ icon ).
- Then all you need to do is to click the ‘+ Add web service’ button and paste in your Response Request URL and your secure API key, so that only your team can access the service.
- Finally, you should select the data on which you need to run the model and then click on ‘predict’ button in order to call the ML web service. And boom, voilà!
Conclusion
Azure Machine Learning Studio provides an intuitive and a collaborative workspace that allows you to create and deploy machine learning experiments. You can deploy a ML web service without writing a single line of code, thanks to the modules of Data Preprocessing, Model Training and Model Deployment that are available in the tool. In case you need to build more customized models, you can integrate R or Python scripts in the workflow that embeds your code.
This feature could be a game changer to many organizations that still rely on Excel as their main productivity tool.