Industrialized analytics: Implications of large-scale predictive analytics models

Analytics has grown from being a “craft” activity to one that is capable of creating thousands, even millions of predictive models and embedding them within operational processes. But if we’re going to industrialize analytics, we need to figure out quality measures and controls just as the manufacturing sector has done for its products.

Analytics, for most of its relatively short lifespan, has been a "craft" activity. Decision-makers commissioned analysts to find some data and analyze it, and then report back with results. This batch-oriented process might take weeks or even months. If decision-makers actually used the results, it often led to better decisions. But in those early days, the slow pace of analytical work hindered its broad acceptance. Business sped up, and only recently has analytics managed to keep up.

Today, however, many organizations are “industrializing” analytics, dramatically accelerating their speed and scale. I’ve described this elsewhere as a key aspect of “Analytics 3.0.”1 With technologies like model management, in-database scoring, and machine learning (the use of which was a Deloitte Analytics Trend in 2014),2 they are creating thousands, even millions of models and embedding them within operational systems and processes. Decisions are being made without human intervention—or in some cases, without human comprehension. For the most part this is a good thing, but it means that both analysts and decision-makers will have to change their way of working.

Take, for example, the networking equipment company Cisco Systems. About a decade ago the company embarked upon the use of "propensity models"—analytical models to predict how likely a customer was to buy Cisco products or services. Knowing a customer's propensity to buy helped salespeople decide where to focus their efforts. A staff of three or four analysts developed the models, producing several per year to cover different Cisco offerings.

This approach was useful, but it wasn't up to dealing with the thousands of products and services Cisco offers, and the 170 million potential customers it would like to sell to. So Cisco adopted a more industrialized approach to propensity modeling. It now generates over 60,000 different models for different types of customers and products, and updates them every quarter.3 It uses the same three or four people to generate this vastly greater number of models through the "magic" of machine learning.
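The "model factory" pattern behind numbers like these is essentially a loop: instead of an analyst hand-building each model, code iterates over every product-and-segment combination, trains a model on that slice of history, and stores it for scoring. The sketch below is a minimal illustration of that idea, assuming scikit-learn and synthetic data; the product names, segments, and features are invented for the example, not Cisco's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_training_data(n=500, n_features=4):
    """Synthetic purchase history: customer features plus a buy/no-buy label."""
    X = rng.normal(size=(n, n_features))
    # Hidden rule for the toy data: a high first feature makes a purchase likely.
    y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

# "Factory" loop: one propensity model per (product, segment) pair.
# A real pipeline would pull each pair's own data and retrain quarterly.
products = ["router", "switch", "firewall"]
segments = ["smb", "enterprise"]
models = {}
for product in products:
    for segment in segments:
        X, y = make_training_data()
        models[(product, segment)] = LogisticRegression().fit(X, y)

# Scoring: one new customer's propensity to buy, per product and segment.
customer = rng.normal(size=(1, 4))
scores = {key: float(m.predict_proba(customer)[0, 1])
          for key, m in models.items()}
```

With three analysts curating the loop rather than building each model by hand, scaling from six models to 60,000 is a change to the iteration ranges, not to headcount.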

IBM takes a similar approach to propensity modeling. The company generates somewhat fewer models than Cisco—around 5,000 per quarter, leading to 11 billion scores for the propensities of particular customer executives. The vendor who created these models for IBM calls the approach a “model factory,” which is consistent with the idea of industrializing analytics.

For another type of example, take the process of determining which digital ads to show a particular customer on a particular publisher's website. Current methods allow for targeting ads to customers based on the cookies in their browsers, and "retargeting" based on sites they have recently visited. But matching ads to customers and publishers is no situation for conventional craft-based analytics. It's a complex calculation involving many variables, rapidly changing auction prices, and many different publishing alternatives—and it must be made in about 15 milliseconds. The leading firms in "programmatic buying," as this set of techniques is called, generate thousands of new models a week to deal with this level of complexity. Again, machine learning provides the means for such large-scale analytics development.
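What makes the 15-millisecond constraint tractable is that the expensive work—training the models—happens offline; at auction time the system only evaluates precomputed model weights and checks the clock. The toy sketch below illustrates that shape. The ad names, feature names, and weights are all invented for illustration, and real programmatic systems are far more elaborate.

```python
import time

# Precomputed (offline-trained) per-ad models: simple linear weights.
AD_MODELS = {
    "travel_ad": {"visited_travel_site": 2.0, "clicked_travel_ad": 1.5, "bias": -1.0},
    "gadget_ad": {"visited_tech_site": 1.8, "clicked_tech_ad": 1.2, "bias": -1.0},
}

def score(weights, features):
    """Linear score: bias plus the dot product of weights and user features."""
    return weights["bias"] + sum(
        w * features.get(f, 0.0) for f, w in weights.items() if f != "bias"
    )

def choose_ad(features, budget_s=0.015):
    """Pick the best-scoring ad, stopping early if the latency budget is spent."""
    deadline = time.monotonic() + budget_s
    best_ad, best_score = None, float("-inf")
    for ad, weights in AD_MODELS.items():
        if time.monotonic() > deadline:
            break  # serve the best so far rather than miss the auction entirely
        s = score(weights, features)
        if s > best_score:
            best_ad, best_score = ad, s
    return best_ad

user = {"visited_travel_site": 1.0, "clicked_travel_ad": 1.0}
```

Here `choose_ad(user)` selects the travel ad, since the user's features match that model's weights; the deadline check is what keeps the decision inside the auction window.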

More and more of this type of industrialization is going on; it’s a sign that analytics has become so important that it needs to be integrated with daily or real-time operations. As the pace of data creation accelerates (for example, from Internet of Things sensors), we’ll need increasingly industrialized analytics to be able to handle it all at the needed pace.

All this sounds great, but as I mentioned at the beginning of this essay, there is a downside. Machine learning and (to a lesser degree) model management approaches employ so many variables, and move them in and out of models so rapidly, that it is almost impossible for decision-makers to understand what the models mean. And even skilled analysts may have a difficult time making sense of results. For the most part, given the number of models and results that industrialization creates, they don’t even try. “If it works we don’t try to interpret it,” one digital marketer admitted to me.

The problem is that we may not realize quickly enough that the world has changed, and so fail to modify our approach to modeling. In customer propensity models, for example, we may miss variables that affect some customers' buying behaviors more than others, such as changes in a particular country's economic status. It's unlikely that many Greek companies are in the market for a new router right now, for example.

In digital marketing, we also run the risk of missing the forest for the trees. Programmatic buying algorithms may be busily humming away to serve me that Airbnb ad, but they don't seem conscious of the fact that many "people" who click on those ads are not actually people. Shadowy organizations have created bots to robo-click on ads; some estimate that half of the impressions of digital ads are fake. For all the supposed precision of the models, that means a very high proportion of advertising spending is wasted.

Industrialized processes were developed, of course, in manufacturing. They have been refined for decades in that context, and manufacturing managers ultimately insisted that there must be effective quality measures and controls for the manufactured goods. If we’re going to industrialize analytics, we need to figure out the same types of measures and controls of quality for the analytics and the operations that they power.

Machine learning needs to become more transparent, and the assumptions behind the models it creates need to be clear. Models that drive operations, no matter how many and complex they may be, need to be understandable and understood. We are in the earliest days of understanding how to apply these new tools to business at industrial scale. We need to realize that it’s not how fast we can create models that matters, but how we make sound decisions based on them.
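One modest, concrete step toward that transparency is to make every auto-generated model report its own most influential inputs, so a human can sanity-check the assumptions it embodies. The sketch below assumes a simple linear model represented as a weight dictionary; the feature names and weights are invented for illustration.

```python
def top_features(weights, k=2):
    """Return the k inputs with the largest absolute weight, most influential first."""
    return sorted(weights, key=lambda f: abs(weights[f]), reverse=True)[:k]

# Hypothetical weights from one auto-generated propensity model.
propensity_model = {
    "country_gdp_growth": 1.9,
    "recent_support_calls": -0.8,
    "site_visits": 0.4,
}

print(top_features(propensity_model))
# → ['country_gdp_growth', 'recent_support_calls']
```

A report like this would have flagged the Greek-economy example above: a reviewer seeing GDP growth ranked as the dominant driver can judge whether the model still fits a world that has changed.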