
After Action Reviews

A proposal for systematically learning from past investment decisions — borrowing from the military, medicine, and software to build a feedback loop into the value process.

What AlphaGo got right

In 2016, Google DeepMind’s AlphaGo defeated Lee Sedol at Go, a game so complex that many researchers thought AI mastery was decades away. A key ingredient was reinforcement learning: the system played millions of games against itself, recorded the outcomes, and updated its strategy in response. Every single game fed back into the next decision. Nothing was lost.

Reinforcement learning is not a new idea. It is a formalised, systematic version of something humans have always done: learn from experience. Try something, observe the outcome, adjust the approach. The difference is that DeepMind built the infrastructure to guarantee that the feedback loop was complete, consistent, and free of the biases that humans bring to self-assessment.

In investing, the feedback loop is broken. Most investors learn from experience in a haphazard, ad-hoc way — no structured record of what was expected versus what happened, no systematic review, no formal mechanism to close the loop. The industry lacks the learning infrastructure that DeepMind built for Go, and that other high-stakes professions — military, medicine, aviation — have long adopted.

The compounding value of systematic learning

To see why systematic learning matters, consider a simple simulation. A stock has a hidden true probability of outperforming, and each period produces a noisy outcome. Two investors process this information differently: one ignores it, and one accumulates evidence in a structured way, like a decision journal.

Two investors, one shifting signal

A stock has a hidden probability of outperforming that shifts over time (a regime change roughly every 25 periods). Two investors face the same noisy outcomes. The coin-flip investor makes random buy/pass decisions each period, with no learning and no memory. The systematic learner keeps a decision journal, accumulates evidence, and adapts as conditions change. The simulation shows who converges on the truth and who captures the value.

[Interactive simulation. Settings shown: true probability 65%, 100 periods. Charts: probability estimate over time; cumulative P&L from position sizing. Series: coin-flip investor (no learning), systematic learner (decision journal), true probability (shifts with regime).]

The systematic learner tracks the shifting true probability because it retains and weights evidence from past decisions. The coin-flip investor ignores the signal entirely — their decisions are disconnected from reality. As the true probability shifts with regime changes, the learner adapts; the coin-flipper never notices. Over enough decisions, the gap in cumulative P&L is dramatic — and it compounds. This is the case for a decision journal: not because any single review is transformative, but because the accumulated learning shifts the probability of being right on every future decision.
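The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the simulation: the exponential discounting of old evidence (`decay`), the range that the hidden probability is redrawn from at each regime change, and the rule that sizes the learner's position by its conviction are all assumptions made for the example.

```python
import random


def simulate(periods=100, regime_length=25, decay=0.9, seed=0):
    """Compare a coin-flip investor with a systematic learner.

    Every parameter and rule here is an illustrative assumption.
    """
    rng = random.Random(seed)
    true_p = 0.65                # hidden probability of outperforming (first regime)
    wins, losses = 1.0, 1.0      # learner's journal, starting from a flat prior
    pnl_learner = 0.0
    pnl_coin = 0.0
    abs_error = 0.0

    for t in range(periods):
        # Regime change: the hidden probability shifts (assumed uniform redraw).
        if t > 0 and t % regime_length == 0:
            true_p = rng.uniform(0.3, 0.8)

        # Learner's belief before seeing this period's outcome.
        estimate = wins / (wins + losses)
        abs_error += abs(estimate - true_p)

        # Noisy outcome: +1 if the stock outperforms, -1 if it underperforms.
        outcome = 1.0 if rng.random() < true_p else -1.0

        # Learner buys only when the estimate is above 50%, sized by conviction.
        position_learner = max(0.0, 2.0 * estimate - 1.0)
        # Coin-flip investor buys or passes at random, ignoring all history.
        position_coin = 1.0 if rng.random() < 0.5 else 0.0

        pnl_learner += position_learner * outcome
        pnl_coin += position_coin * outcome

        # Journal update: discount old evidence, then record the new outcome.
        wins = decay * wins + (1.0 if outcome > 0 else 0.0)
        losses = decay * losses + (1.0 if outcome < 0 else 0.0)

    return {
        "learner_pnl": pnl_learner,
        "coin_flip_pnl": pnl_coin,
        "learner_mean_abs_error": abs_error / periods,
    }
```

With `decay=0.9` the learner's effective memory is roughly the last ten outcomes, which is what lets it follow a regime change rather than averaging over the whole history. Any single run is noisy, so a fair comparison would average the results of `simulate` over many seeds rather than rely on one path.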

Why learning from decisions is hard

We often have a sample size of one, and we tell ourselves never to repeat that type of mistake. We then draw on these limited experiences to inform our future decisions, whether we like it or not. This is potentially dangerous, because the story we remember about why something went wrong tends to protect our self-esteem rather than teach the lesson that a dispassionate analysis of the data would.

A decision journal — a structured record of expectations at the point of each investment decision — can overcome many of the psychological biases that otherwise corrupt the feedback loop. Narrative fallacy (recalling an easy story from memory), hindsight bias (adjusting our thoughts for what subsequently transpired), and misremembering (our memories being poor at recalling past reasoning accurately) are all potential pitfalls that a contemporaneous written record can defuse.

Given we learn from experience, why do we not have a more systematic approach to learning from past decisions — and inevitably, the mistakes we have made?

The constraints on learning

There are large technical and psychological problems to overcome:

• Our first major hurdle is that we may find it hard. Psychologically, the value archive will be jam-packed with mistakes we have made and errors of judgement. We are not set up to deal with cognitive dissonance: the mental discomfort we experience when confronted with an original thesis that has gone wrong.
• Investing sits at some unknown place on the skill/luck continuum. A bad outcome may be down to a poor decision or to bad luck. Any system for learning should acknowledge that luck can play a meaningful role in any one outcome.
• Sample size: the more luck involved in an activity, the greater the sample size required to arrive at good conclusions. Building a decision journal with sufficient depth over time is the solution to this.
• Time: an effective learning process has an action followed by an outcome. In investing, the time required for effective feedback can be years.
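The sample-size point can be made concrete with a rough calculation. The sketch below uses the normal approximation to the binomial and assumes independent decisions, a 50% baseline, and a threshold of about two standard errors (`z=1.96`). The function name and these defaults are illustrative assumptions, and the result should be read as a rough lower bound, not a full power analysis.

```python
import math


def decisions_needed(true_hit_rate, z=1.96, baseline=0.5):
    """Rough number of independent decisions before a hit rate stands out from luck.

    Solves edge >= z * sqrt(baseline * (1 - baseline) / n) for n.
    """
    edge = true_hit_rate - baseline
    if edge <= 0:
        raise ValueError("true_hit_rate must exceed baseline")
    standard_dev = math.sqrt(baseline * (1.0 - baseline))
    return math.ceil((z * standard_dev / edge) ** 2)


for rate in (0.55, 0.60, 0.65):
    print(rate, decisions_needed(rate))
# 0.55 -> 385, 0.60 -> 97, 0.65 -> 43
```

Under these assumptions, an investor who is right 55% of the time needs several hundred recorded decisions before the record clearly separates from a coin flip, which is why the journal has to be built up over years.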

Borrowing from other fields

Outside of investment, there are many examples of organisations using post-mortem style reviews of past decisions. These reviews go by different names in medicine, politics, and other fields, but essentially all revolve around revisiting a large mistake and trying to learn from it. Many of the examples draw on the Japanese ‘Kaizen’ approach of continuous improvement, which fed into the idea of a ‘Learning Organisation’ in the 1990s.

The example that appeared most frequently was the After Action Review (AAR), a review process developed for the US Army. It has since been adopted by a number of other organisations, such as the NHS. The aim of the AAR is to answer a set of questions about past decisions and actions in a systematic manner.

Constellation Software, whose activity (private equity investing) is very different from ours, has a mandatory review of every acquisition after one year to assess how things have progressed versus the original assumptions. The same exercise can be done for equity investing.

The proposal: Plan + Reflection = Progress

The main aim of learning from the past should be to identify ways to improve our future decision making. Identifying and learning from mistakes in a systematic manner should create a positive feedback loop that leads to better long-term performance.

The approach needs to be built with a positive frame of mind, not as a tool to blame people for mistakes. It needs to recognise that any outcome reflects an unknown mix of skill and luck, and so should not overweight any one example.

This thinking leads naturally to the importance of keeping a decision journal — a systematic record of every stock-level decision, the reasoning behind it, and the expected outcome, written at the point of decision and revisited once the outcome is known. Over time, this kind of journal builds the sample size needed to distinguish skill from luck, identify recurring patterns of error, and create a genuine feedback loop between past judgement and future decisions.
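One way to picture such a journal is as a small structured record per decision. The sketch below is a hypothetical layout, not a prescribed template: the field names, the `JournalEntry` class, and the default one-year review horizon are assumptions chosen to mirror the elements described above (decision, reasoning, expected outcome, later review).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class JournalEntry:
    """One stock-level decision, written at the point of decision (hypothetical schema)."""
    ticker: str
    decision: str                   # e.g. "buy", "pass", "sell"
    decided_on: date
    thesis: str                     # reasoning behind the decision
    expected_outcome: str           # what we expect to happen, stated up front
    review_after_days: int = 365    # assumed review horizon
    actual_outcome: Optional[str] = None   # filled in at review
    lessons: Optional[str] = None          # filled in at review

    @property
    def review_due(self) -> date:
        """Date on which the entry should be revisited."""
        return self.decided_on + timedelta(days=self.review_after_days)

    def is_due(self, today: date) -> bool:
        """True if the entry has not been reviewed and its review date has passed."""
        return self.actual_outcome is None and today >= self.review_due

    def record_review(self, actual_outcome: str, lessons: str) -> None:
        """Close the loop: record what happened next to what was expected."""
        self.actual_outcome = actual_outcome
        self.lessons = lessons


entry = JournalEntry(
    ticker="XYZ",  # placeholder ticker
    decision="buy",
    decided_on=date(2024, 1, 1),
    thesis="Margins recover as input costs normalise.",
    expected_outcome="Operating margin back above 15% within a year.",
)
print(entry.review_due, entry.is_due(date(2025, 1, 1)))
```

Keeping `expected_outcome` separate from `actual_outcome` is the design point: the expectation is fixed in writing before the result is known, so hindsight cannot quietly rewrite it.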

Key Takeaway

Systematic learning from past decisions is one of the highest-leverage activities in investment management — and one of the least practised. The After Action Review framework, adapted from military and industrial contexts, provides a structured way to overcome the psychological barriers to honest self-assessment. Plan + Reflection = Progress.