Simulation: the Bedrock of AI

AI is automating and solving critical business problems. Companies are extracting more value than ever before from the data they collect, and these trends look set to continue. But AI still struggles to help business leaders in industries like finance, transportation and healthcare make better decisions in the complex systems in which they operate.

Andrew Ng, one of the leading figures in the world of AI and machine learning, explains how the leading companies in the AI space are committing huge amounts of resources to generating the data they need to train their algorithms. Uber, Google and Tesla, for example, are spending billions of dollars covering millions of kilometers in their self-driving cars to perfect the algorithms that drive them. They, along with many others, are also leveraging an alternative way of generating accurate training data, and it is a technology that has been around for a number of years.

Computational Simulation

In a recent episode of the McKinsey & Co. podcast, McKinsey Global Institute partner Michael Chui explains the role that simulation plays in training AI:

“To try to improve the speed at which you can learn some of those things, one of the things you can do is simulate environments. By creating these virtual environments — basically within a data center, basically within a computer — you can run a whole bunch more trials and learn a whole bunch more things through simulation. So, when you actually end up in the physical world, you’ve come to the physical world with your AI already having learned a bunch of things in simulation.”

Simulation complements Artificial Intelligence

To more deeply understand why this is, we can think of machine learning and simulation as being two different approaches for understanding and predicting the behavior of complex adaptive systems. Individually they are both powerful modeling paradigms, but it is together, working in concert, that they show the greatest promise.

Machine learning automates the building of analytical models. The models are constructed using algorithms that learn from data. These algorithms can update models in real time, which lets them generate real-time predictions. What’s more, they can learn from past predictions, outcomes, and errors.

This makes them incredibly powerful. But these models are limited to forecasting effects of events that are similar to what has already happened in the past. The models are likely to produce inaccurate results once they extrapolate beyond previously observed bounds.
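A minimal sketch of this extrapolation problem, under stated assumptions: the "true" system is a hypothetical quadratic, the data-driven model is an ordinary least-squares line, and the training data only cover inputs between 0 and 1. None of these choices comes from a real dataset; they are only meant to illustrate how a model that looks adequate inside the observed range can be badly wrong outside it.

```python
def true_system(x):
    # Hypothetical nonlinear system, assumed purely for illustration.
    return x ** 2

def fit_line(xs, ys):
    """Ordinary least squares for a straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# "Historical" observations: inputs only between 0.0 and 1.0.
xs = [i / 10 for i in range(11)]
ys = [true_system(x) for x in xs]
slope, intercept = fit_line(xs, ys)

def predict(x):
    return slope * x + intercept

# Worst error inside the observed range versus the error far outside it.
in_range_error = max(abs(predict(x) - true_system(x)) for x in xs)
out_of_range_error = abs(predict(5.0) - true_system(5.0))

print(f"worst error inside observed range: {in_range_error:.2f}")
print(f"error at x = 5.0 (never observed): {out_of_range_error:.2f}")
```

Inside the observed range the straight line is a tolerable approximation; at an input five times larger than anything in the data, the error is more than a hundred times worse.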

If the future looks different from the past, if data are missing, or if the data are biased, then data-driven methods are seriously flawed. Many bankers found this out to their considerable cost when the financial crisis hit in 2008.

An over-reliance on data-driven models for decision-making in complex systems is risky; their key deficiency is the scarcity of observations during crisis times. This makes them fragile under the definition given by Nassim Nicholas Taleb:

“Anything that has more upside than downside from random events (or certain shocks) is antifragile; the reverse is fragile.”

In other words, when exposed to previously unseen events, data-driven models tend to do worse than when history is repeating itself.


Simulation’s key advantage over data-driven methods is that it allows us to forecast things that have never happened before and to run scenarios outside of historical bounds — including crisis scenarios.

This is no panacea. The caveat is that we need a good theory and causal hypotheses about how the system we are studying works. Simulation works best when the processes of the system under study are well understood, so that high-fidelity simulations can reproduce the system's observed behavior.

As long as our theory is sound, we can make startlingly accurate predictions about states of the world we have never seen before. When reality pans out along one of these simulated trajectories, we already know what to do because we have already test-driven our decisions in the virtual world.

Machines learning from simulations

Probably the most familiar example of simulation is the flight simulator. Pilots expose themselves to previously unseen states of the world in a realistic model of flight in order that they may make better decisions when flying in the real world.

Just as pilots are ‘trained’, so too are machine learning models. We talk about ‘training’ them on data, just as the pilot’s brain is learning from the large amounts of visual and sensory data being generated by the flight simulator.

Unleashing machine learning models on accurate simulations is similarly powerful. Rather than have that model fail when reality moves outside the known parameter set, we can generate vast quantities of data using our simulation — even training the model in the extreme tails of the distribution — in order that our data-driven models can provide useful insight during crisis times.
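As a rough sketch of the idea, assume a hypothetical simulator in which losses only appear once a shock exceeds a buffer, and assume the simulator is an accurate stand-in for reality. The rule, the numbers, and the nearest-neighbour predictor below are all illustrative assumptions, not a method described in this article. One model is trained on "historical" calm periods only; the other is trained on simulated data that deliberately covers the tail.

```python
def simulate_loss(shock):
    """Hypothetical causal rule: losses start once the shock exceeds a buffer of 2.0."""
    return max(0.0, shock - 2.0) * 10.0

def nearest_neighbour(train, x):
    """Predict with the outcome of the closest training example (1-NN)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# "Historical" data: calm periods only, shocks from 0.00 to 1.00.
historical = [(i * 0.01, simulate_loss(i * 0.01)) for i in range(101)]

# Simulated data: shocks swept from 0.00 to 6.00, including the extreme tail.
simulated = [(i * 0.05, simulate_loss(i * 0.05)) for i in range(121)]

# A crisis-sized shock that never appears in the historical record.
crisis_shock = 5.0
truth = simulate_loss(crisis_shock)

error_hist = abs(nearest_neighbour(historical, crisis_shock) - truth)
error_sim = abs(nearest_neighbour(simulated, crisis_shock) - truth)

print(f"error of model trained on history:    {error_hist:.1f}")
print(f"error of model trained on simulation: {error_sim:.1f}")
```

The history-trained model has never seen a loss, so it predicts none in the crisis. The simulation-trained model has seen nearby cases and stays close to the truth, but only to the extent that the simulator's causal rule is right.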

Simulations learning from machines

We can also use machine learning to build better simulations. Recall that in order to provide sensible predictions, simulations need to accurately reflect the processes at work in the system of study. We can use machine learning here to inform behaviors in our simulation models.

Less well-understood processes can be approximated by machine learning models themselves; where a behavioral component of a simulation model is hard to pin down, we can use data-driven black box functions to stand in for stronger theoretical foundations.

This type of ‘hybrid’ modeling — mixing different modeling paradigms — allows us to alloy them into something more powerful.
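The hybrid idea above can be sketched in a few lines. In this illustrative example, every name and number is hypothetical: the stock-keeping logic is the part we understand from theory, while the demand response to price is the hard-to-pin-down behavior, so it is replaced by a function fitted to a handful of made-up observations. A simple least-squares line stands in for whatever black-box learner would be used in practice.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical observations of (price, units demanded per day).
observed = [(1.0, 90.0), (2.0, 80.0), (3.0, 70.0), (4.0, 60.0)]
slope, intercept = fit_line([p for p, _ in observed], [d for _, d in observed])

def learned_demand(price):
    """Data-driven stand-in for the poorly understood behavior."""
    return max(0.0, slope * price + intercept)

def simulate_stock(price, initial_stock, restock, days):
    """Theory-driven part: stock balance is bookkeeping we understand exactly."""
    stock = initial_stock
    history = []
    for _ in range(days):
        sales = min(stock, learned_demand(price))  # cannot sell more than we hold
        stock = stock - sales + restock
        history.append(stock)
    return history

print(simulate_stock(price=2.5, initial_stock=100.0, restock=70.0, days=3))
# [95.0, 90.0, 85.0]
```

The design choice is that the learned component is isolated behind one function, so it can later be swapped for a better-founded behavioral rule without touching the rest of the simulation.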

Machines adapting to a changing environment

The most challenging aspect of making decisions in a complex adaptive system is its adaptive nature.

Any human knows that the actors, or agents, operating in these systems are not fixed and unresponsive. Instead, they change their behavior as the incentives set by their environment change.

Agent-based simulation models are even able to account for the ability of agents to deviate from rationality, optimize, or exploit their environment. Given a policy change, these models can predict how a system will reconfigure itself — allowing machine intelligence to stay ahead of the human responses.

One real-world example of how this can be applied is detecting fraud. We can train machine intelligence to spot fraud on a simulation composed of “intelligent fraudsters” that are able to learn fraudulent strategies that humans may not yet have devised in the real world.
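A toy sketch of adaptive agents, with every quantity assumed for illustration: fraudsters pick a transaction amount from a fixed menu, a detection rule flags any amount at or above a threshold, and being flagged costs the fraudster a fixed penalty. Each agent simply tries every strategy and keeps the most profitable one, which is a deliberately crude stand-in for the learning that a real agent-based model would include.

```python
AMOUNTS = [100, 300, 500, 700, 900]  # hypothetical fraud strategies (amount per attempt)
PENALTY = 1000                       # hypothetical cost to a fraudster of being caught

def payoff(amount, threshold):
    """A fraudster keeps the amount unless the rule flags it."""
    return -PENALTY if amount >= threshold else amount

def best_response(threshold):
    """The agent tries every strategy and keeps the most profitable one."""
    return max(AMOUNTS, key=lambda amount: payoff(amount, threshold))

def undetected_loss(strategy, threshold, n_agents=10):
    """Total stolen by agents whose attempts slip under the rule."""
    return n_agents * strategy if strategy < threshold else 0

old_rule, new_rule = 1000, 600

# Static forecast: assume fraudsters keep the habit they formed under the old rule.
habit = best_response(old_rule)
static_forecast = undetected_loss(habit, new_rule)

# Adaptive forecast: let the agents re-optimize against the new rule first.
adapted = best_response(new_rule)
adaptive_forecast = undetected_loss(adapted, new_rule)

print(f"static forecast of undetected loss:   {static_forecast}")
print(f"adaptive forecast of undetected loss: {adaptive_forecast}")
```

The static forecast says the new rule stops all fraud, because it assumes behavior is frozen. Once the simulated fraudsters adapt by moving just under the threshold, the forecast changes completely, and that adapted behavior is what a detector would need to be trained on.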


It is not a question of simulation or machine learning, but the combination of simulation and machine learning that offers the greatest potential. Simulation is a foundational technology for businesses wanting to exploit AI.

The technology to build powerful simulations already exists. And rather than collecting data in the real world, more and more companies are turning to virtual worlds to collect lifetimes’ worth of data for a fraction of the cost.

The winners in the race to build AI will be those companies best able to leverage simulation to train it.

Chloe Hibbert