“Designing proper data collection today improves the quality of ML outcomes tomorrow”
Machine learning may have all sorts of use cases, but forecasting? In honor of the upcoming ML Conference, we talked to Philipp Beer about how data scientists can utilize ML in statistical forecasting. We talk about the advantages and disadvantages of modern vs. classical methods, how to decide between the two, and where to turn when you need good predictions for your business KPIs.
JAXenter: Classical statistical forecasting is still used in many businesses. Please give us an example where it is still used and where using it still makes sense.
Philipp Beer: Brad Efron once said, “Those who ignore Statistics are condemned to reinvent it.”
Machine learning algorithms, with proper design, can have great power of generalization. Feature selection, feature engineering and encoding are key steps that can lead to algorithms usable for different datasets.
On the other hand, statistical models require a lot of effort for tuning parameters and finding optimal models. Therefore, they are found where models are well understood and are driven by theory. Statistical methods also generate reasonable results where the available data volume is not large enough for machine learning. Consequently, their computational needs are limited as well. That same thriftiness cannot be attributed to machine learning algorithms.
JAXenter: You’re talking about “data hungry” machine learning algorithms. Forecasting in times of ML needs a lot of data. Where is this data coming from? Who collects it?
Philipp Beer: Data needs to come from necessities. Often organizations ask themselves how they can collect more information to harness the power of machine learning. From my point of view, this approach will not yield good results.
A more insightful approach is to ask good questions. Which kind of data do you need to tackle your most pressing tasks at hand? Knowing the answer to that usually helps to identify where the data should be coming from. It also helps a great deal to understand that the data does not necessarily need to be generated in-house. Third-party data that can be licensed (e.g. commodity prices) or open data (e.g. weather, population change) may be the place to go to fulfill your business needs.
Having big data is no longer the question today. Having the right data, however, is not always a given. Designing proper data collection today improves the quality of outcomes tomorrow.
With this perspective an organization is in a good position to tackle all of the five W’s regarding the needed data.
JAXenter: How can one decide which method – classical statistics or machine learning – fits their own case?
Philipp Beer: Whatever yields the best results! An a priori guide can only give very general guidance.
To identify the right method, developers need to conduct a detailed exploration and analysis of the data. That will allow them to rule out certain methods and approaches and leave a smaller subset that may yield good results. This will be true for statistical and ML approaches.
In order to get a definitive answer, the remaining methods have to be compared on the results they produce. In the case of a time series, the models need to compete side by side on their predictions so that their predictive power can be determined. If both of them give good results for prediction, there’s no need to choose; just use both of them.
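As a rough illustration of such a side-by-side comparison, the following Python sketch scores two forecasters on the same held-out tail of a series using one-step-ahead rolling evaluation. Everything in it is an assumption made for illustration and not taken from the interview: the synthetic monthly series, the seasonal naive baseline standing in for a classical method, the small nearest-neighbour forecaster standing in for a learned model, and the choice of mean absolute error as the metric.

```python
import math


def seasonal_naive(history, season=12):
    """Classical-style baseline: repeat the value observed one season ago."""
    return history[-season]


def knn_forecast(history, window=12, k=3):
    """Simple learned model: average what followed the k most similar past windows."""
    query = history[-window:]
    candidates = []
    for start in range(len(history) - window):
        past = history[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query))
        candidates.append((dist, history[start + window]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def rolling_mae(series, forecaster, n_test):
    """Mean absolute error of one-step-ahead forecasts over the last n_test points."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        prediction = forecaster(series[:t])  # only data before t is visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def make_series(n=120):
    """Hypothetical monthly KPI: linear trend plus yearly seasonality."""
    return [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(n)]


series = make_series()
scores = {
    "seasonal_naive": rolling_mae(series, seasonal_naive, n_test=24),
    "knn": rolling_mae(series, knn_forecast, n_test=24),
}
best = min(scores, key=scores.get)  # method with the lowest error on the holdout
print(scores, best)
```

The point of the sketch is the evaluation loop, not the two models: every candidate sees exactly the same history at each step and is judged on the same unseen points, so the resulting errors are directly comparable. On real data the candidates would be whichever statistical and ML methods survived the exploratory analysis.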
JAXenter: Let’s make a different forecast. How long will it take for machine learning to take over the forecasting sector?
Philipp Beer: I don’t think that machine learning will supplant statistical methods, because both have complementary capabilities.
Machine learning will be an equally important component in time-series forecasting in the next 2 – 3 years. The adoption will be driven by convenience and integration. Machine learning needs to become accessible for all interested stakeholders – not only in time-series forecasting. As machine learning results become more seamlessly integrated into organizations, it will grow into a pillar of future development in all areas of our society.
Philipp Beer will be delivering a talk at ML Conference in Berlin on Wednesday, December 5 that goes over the advantages and disadvantages of modern vs. classical methods, how to decide between the two, and where to turn when you need good predictions for your business KPIs.