Andy Evans
Research & Writing · 10 min read

Information Theory and Value Investing

Applying Shannon's information theory to the uncertainty inherent in value investing — entropy, mutual information, and the channel capacity of fundamental analysis.

The opposite of uncertainty

In 1948, Claude Shannon published “A Mathematical Theory of Communication” and, in doing so, founded an entire field. His central insight was deceptively simple: information is the resolution of uncertainty. A message only carries information to the extent that it tells you something you did not already know. The more surprised you are by an outcome, the more information it contains.

Shannon formalised this with a quantity he called entropy, borrowing the term from thermodynamics. For a discrete random variable X with possible outcomes x1, x2, …, xn and associated probabilities p(x1), p(x2), …, p(xn), the entropy is:

H(X) = −∑ p(xi) log2 p(xi)

Entropy is maximised when all outcomes are equally likely — maximum uncertainty — and equals zero when one outcome is certain. It measures, in bits, the average amount of surprise you should expect from the system. This essay argues that Shannon's framework maps naturally onto the problem of value investing over a three-to-five-year horizon, and that thinking in terms of entropy, information gain, and channel capacity offers a more rigorous vocabulary for reasoning about the uncertainty we face.
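As a concrete illustration, here is a minimal Python sketch of the entropy calculation (the helper name shannon_entropy is ours):

```python
import numpy as np

def shannon_entropy(probs):
    """Entropy in bits: H(X) = -sum p(x) * log2 p(x)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                          # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: four equally likely outcomes
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))      # ~0 bits: a certain outcome carries no surprise
```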

Entropy of a value thesis

Consider a stock trading at 8× earnings that you believe is worth 12×. What is the uncertainty around that thesis? A naive approach would be to express it as a standard deviation of returns. But standard deviation assumes a shape — typically Gaussian — and treats upside and downside symmetrically. Entropy makes no such assumption. It works on the full probability distribution, whatever its shape.

Imagine discretising your five-year return expectations into buckets: a 20% probability of a permanent value trap (−40%), a 30% probability of modest re-rating (+30%), a 35% probability of full re-rating (+60%), and a 15% probability of a positive catalyst driving a premium valuation (+100%). The entropy of this distribution is approximately 1.93 bits — close to the maximum of 2.0 bits for four equally-likely outcomes, telling us this is a genuinely uncertain situation.

Compare this with a high-quality compounder trading at fair value. Perhaps there is a 70% probability of a 7–10% annualised return, a 20% probability of a 3–5% return, and a 10% probability of a loss. The entropy here is roughly 1.16 bits — meaningfully less uncertain. The compounder has lower entropy not because it is a better investment, but because its range of outcomes is more concentrated. The value stock has higher entropy because the distribution of possible futures is wider and flatter.
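Reusing the shannon_entropy helper from the sketch above, the two theses work out as follows (the scenario weights are the illustrative ones from the text, not fitted to data):

```python
# Scenario weights from the text (illustrative, not fitted to data).
value_stock = [0.20, 0.30, 0.35, 0.15]   # trap / modest / full re-rating / catalyst
compounder  = [0.70, 0.20, 0.10]         # 7-10% p.a. / 3-5% p.a. / loss

print(f"value stock: {shannon_entropy(value_stock):.2f} bits")   # 1.93
print(f"compounder:  {shannon_entropy(compounder):.2f} bits")    # 1.16
```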

Why entropy, not variance?

Variance is a second-moment summary that depends on the distance of outcomes from the mean. Entropy depends only on the probability mass function itself. This makes it sensitive to multi-modality, skew, and the number of materially different outcomes — exactly the features that characterise value investing scenarios, where the distribution is often bimodal (it works, or it is a trap) rather than bell-shaped.
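A small numerical example makes the distinction concrete. The two distributions below are engineered to have identical variance, yet entropy tells them apart (a minimal sketch; the numbers are invented for illustration):

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def variance(x, p):
    x, p = np.asarray(x, float), np.asarray(p, float)
    mu = np.sum(p * x)
    return float(np.sum(p * (x - mu) ** 2))

# Engineered so both return distributions share the same variance.
xa, pa = [-0.50, 0.50], [0.50, 0.50]                  # pure coin flip
xb, pb = [-0.707, 0.00, 0.707], [0.25, 0.50, 0.25]    # three outcomes, fatter tails

print(variance(xa, pa), variance(xb, pb))   # 0.25 vs ~0.25 -- variance cannot tell them apart
print(entropy_bits(pa), entropy_bits(pb))   # 1.0 vs 1.5 bits -- entropy can
```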

Entropy by Valuation Quintile

Shannon entropy of return distributions across valuation quintiles. Cheap stocks tend to have higher entropy — a wider, flatter distribution of outcomes — than expensive stocks at short horizons, converging at longer horizons.

Based on ~44,000 US stock-years from the EODHD universe (1990–2024). Entropy computed from discretised return distributions using 20 bins. Cheap stocks (Q5) consistently show higher entropy than expensive stocks (Q1), confirming that value investing operates in a higher-uncertainty regime.

Information gain: what research actually does

If entropy is the measure of uncertainty, then information is its reduction. Every hour spent reading a 10-K, every conversation with a supplier, every analysis of a competitor’s pricing — each of these is an attempt to reduce the entropy of the outcome distribution. Shannon called this reduction mutual information:

I(X; Y) = H(X) − H(X | Y)

Here X is the uncertain outcome (the stock's three-to-five year return) and Y is the signal (the research input). The mutual information I(X; Y) tells you how many bits of uncertainty about X are resolved by observing Y. The conditional entropy H(X | Y) is what remains after you have incorporated the signal.
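A minimal sketch of the computation, using a hypothetical 2×2 joint distribution between a binary outcome and a binary research signal (all numbers invented):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint distribution: rows = outcome X (trap, works),
# columns = signal Y (bearish read, bullish read).
joint = np.array([[0.25, 0.05],
                  [0.10, 0.60]])

p_x = joint.sum(axis=1)                     # marginal over outcomes
p_y = joint.sum(axis=0)                     # marginal over signals
H_x_given_y = sum(p_y[j] * H(joint[:, j] / p_y[j]) for j in range(len(p_y)))

print(f"H(X)   = {H(p_x):.3f} bits")                 # ~0.881
print(f"H(X|Y) = {H_x_given_y:.3f} bits")            # ~0.556
print(f"I(X;Y) = {H(p_x) - H_x_given_y:.3f} bits")   # ~0.325 resolved by the signal
```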

This framework immediately suggests a discipline: rank your research activities by their mutual information with the outcome. Some signals are highly informative — management’s capital allocation track record, insider buying patterns, the trajectory of return on invested capital. Others are noise dressed up as signal — quarterly earnings surprises, broker price target revisions, short-term price momentum. The value investor’s edge lies in spending time on high-mutual-information signals while the market over-indexes on low-mutual-information noise.

Note an important asymmetry: the mutual information of a signal depends on what you already know. If you have already read the annual report and modelled the cash flows, the marginal information from a second broker note on the same company is close to zero. Information gain is diminishing — the first few bits of research reduce entropy sharply, but subsequent work yields progressively less. This maps onto the practitioner intuition that the first 80% of conviction comes from 20% of the work, and the remaining 20% of conviction absorbs 80% of the effort.

Layered Information Gain

How much information does each layer of fundamental analysis add? Starting with valuation alone, we progressively add profitability, balance sheet, growth, and cash flow quality metrics — measuring the marginal information gain at each step.

The biggest jump in information comes from combining multiple valuation metrics. Subsequent layers (profitability, balance sheet, growth) add information but with diminishing returns — consistent with the 80/20 intuition that the first few signals carry the most weight.
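One way to approximate this layering, sketched below under assumptions: bin each fundamental into terciles, extend a joint cell index as each layer is added, and measure the plug-in mutual information with binned forward returns. The column names (earnings_yield and so on) are hypothetical, and plug-in estimates are biased upward in small samples:

```python
import numpy as np
import pandas as pd

def mi_bits(x_codes, y_codes):
    """Plug-in estimate of I(X;Y) in bits from two integer-coded series."""
    joint = pd.crosstab(x_codes, y_codes, normalize=True).to_numpy()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def layered_gain(df, layers, target, bins=3):
    """Marginal information gain as each feature layer is added."""
    y = pd.qcut(df[target], bins, labels=False, duplicates="drop")
    code, prev = None, 0.0
    for name in layers:
        b = pd.qcut(df[name], bins, labels=False, duplicates="drop")
        code = b if code is None else code * bins + b   # joint cell index
        mi = mi_bits(code, y)
        print(f"+ {name:<14s} I = {mi:.3f} bits (gain {mi - prev:+.3f})")
        prev = mi

# layered_gain(df, ["earnings_yield", "roic", "leverage", "fcf_yield"], "fwd_5y_return")
```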

Channel capacity and time horizon

Shannon’s noisy channel theorem established that every communication channel has a maximum rate at which information can be transmitted reliably — its channel capacity. Below this rate, near-perfect communication is possible. Above it, errors are inevitable.

Financial markets can be thought of as a noisy channel between fundamentals (the “transmitted signal”) and price (the “received message”). Over short time horizons — days, weeks, even quarters — the channel is extremely noisy. Sentiment, flows, positioning, and reflexive dynamics dominate the signal. The noise power swamps the fundamental signal, and the effective channel capacity for fundamental information is low.

As the time horizon extends to three, five, or ten years, the noise power diminishes relative to the signal. Earnings growth, return on capital, and competitive position increasingly determine the price path. The channel capacity for fundamental analysis rises. This is, in information-theoretic terms, why value investing demands patience: you are waiting for the channel to become clear enough for the signal to come through.

The practical corollary is that trying to extract fundamental information from short-horizon price action is like transmitting above the channel capacity — you will inevitably introduce errors. The value investor who trades on a twelve-month earnings revision is operating on a noisy channel. The one who underwrites a five-year free cash flow trajectory is operating on a channel where the signal-to-noise ratio is far more favourable.

A testable claim

If the channel capacity argument is correct, then the mutual information between value metrics (P/E, P/B, EV/EBITDA) and subsequent returns should increase monotonically with the holding period. This is empirically testable with historical data — and the chart below confirms it.

Channel Capacity: Information Rises with Horizon

Mutual information and rank correlation between valuation metrics and subsequent returns, measured at 1, 3, and 5-year horizons. The upward slope confirms the channel capacity argument — fundamental signals become more informative over longer horizons.

All four valuation metrics show increasing predictive power with horizon length. Book/Price and Earnings Yield carry the strongest fundamental signal. This is the “channel capacity” effect — noise dominates at short horizons, but fundamentals increasingly drive prices over 3–5 years.
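As a self-contained illustration of the mechanics behind this test, the toy simulation below shrinks the channel noise on a synthetic signal as the horizon lengthens and shows the estimated mutual information rising accordingly (reusing mi_bits from the layered-gain sketch; all numbers invented):

```python
import numpy as np
import pandas as pd

# Toy channel: the fundamental signal is fixed, and the noise standard
# deviation shrinks as the horizon lengthens (invented numbers).
rng = np.random.default_rng(0)
signal = pd.Series(rng.standard_normal(5000))         # stand-in valuation signal
for label, noise in {"1y": 4.0, "3y": 2.0, "5y": 1.0}.items():
    fwd = signal + noise * rng.standard_normal(5000)  # toy forward return
    xq = pd.qcut(signal, 10, labels=False)
    yq = pd.qcut(fwd, 10, labels=False)
    print(f"{label}: {mi_bits(xq, yq):.3f} bits")     # rises as the channel clears
```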

Conditional entropy and regime dependence

One of the most useful constructs in information theory is conditional entropy — the uncertainty that remains about an outcome after conditioning on some observable state. For value investing, the relevant conditioning variables are macroeconomic regimes: interest rate environments, credit cycles, inflation regimes, and earnings cycles.

The claim is that H(value returns | regime) < H(value returns). That is, knowing the macro regime reduces the uncertainty about value strategy outcomes. The difference — the mutual information I(value returns; regime) — tells you how much of the uncertainty in value investing is explained by the environment rather than by stock-specific factors.
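Computing the decomposition is straightforward once returns and regimes are discretised. A minimal sketch, with synthetic data standing in for the real panel (the variable and function names are ours):

```python
import numpy as np
import pandas as pd

def h_bits(codes):
    """Entropy in bits of a series of discrete labels."""
    p = codes.value_counts(normalize=True).to_numpy()
    return float(-np.sum(p * np.log2(p)))

def h_conditional_bits(codes, regimes):
    """H(X | regime) = sum over regimes r of p(r) * H(X | regime = r)."""
    return sum((len(grp) / len(codes)) * h_bits(grp)
               for _, grp in codes.groupby(regimes))

# Synthetic stand-ins: decile-coded value returns and three macro regimes.
rng = np.random.default_rng(1)
returns = pd.Series(rng.integers(0, 10, 10_000))
regimes = pd.Series(rng.integers(0, 3, 10_000))
h, h_r = h_bits(returns), h_conditional_bits(returns, regimes)
print(f"H(X) = {h:.3f}  H(X|regime) = {h_r:.3f}  I = {h - h_r:.3f} bits")
# ~0 here because the synthetic data are independent; real data would show the gap.
```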

This is operationally important. If regime dependence is high, then position sizing, factor tilts, and portfolio construction should be conditioned on the regime. If it is low — if most of the entropy is idiosyncratic — then the focus should be on individual stock selection and diversification. The information-theoretic framework provides a principled way to decompose uncertainty into its systematic and idiosyncratic components, without assuming linearity or normality.

Regime Dependence: How Much Does Macro Explain?

Comparing unconditional entropy H(X) with conditional entropy H(X|Regime) for value stock returns. The gap is the mutual information between macro regime and value outcomes — telling us how much of the uncertainty is systematic versus idiosyncratic.

Macro regime explains only 1–3% of value return uncertainty, with the effect strongest at the 1-year horizon. This suggests that the overwhelming majority of value investing uncertainty is idiosyncratic — stock selection and diversification matter far more than regime timing.

The Kelly criterion: Shannon’s bet

Shannon did not only theorise about information — he also applied it to gambling and investing. Working with John Kelly at Bell Labs, he helped develop the Kelly criterion, which determines the optimal fraction of capital to wager on a favourable bet. The formula, in its simplest form, is:

f* = (bp − q) / b

where b is the odds received, p is the probability of winning, and q = 1 − p. The connection to information theory is deep: Kelly showed that maximising the expected logarithm of wealth — the criterion that produces the Kelly fraction — is equivalent to maximising the information rate of the “channel” between your edge and your capital growth.

For value investors, the implications are direct. The Kelly fraction is proportional to your edge — the mutual information between your research signal and the outcome. A high-conviction, high-entropy position (where you believe the market is significantly wrong and the range of outcomes is wide) warrants a larger allocation than a low-conviction, low-entropy position (where you agree with the market on most scenarios).

But Kelly sizing is aggressive. In practice, most institutional investors use fractional Kelly — typically one-quarter to one-half Kelly — because the penalty for overestimating your edge is severe (ruin), while the penalty for underestimating it is merely suboptimal growth. This connects to the question of ergodicity: in a non-ergodic system, survival dominates optimality. Shannon himself, who managed his personal portfolio for decades with impressive results, is reported to have used aggressive but diversified position sizing informed by these principles.
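In code, the full and fractional Kelly fractions for an illustrative bet look like this (the probabilities and odds are invented for the example):

```python
def kelly_fraction(p, b):
    """Full Kelly f* = (b*p - q) / b for a bet paying b-to-1 with win probability p."""
    q = 1.0 - p
    return (b * p - q) / b

full = kelly_fraction(p=0.60, b=1.5)        # invented: 60% win odds, 1.5-to-1 payoff
print(f"full Kelly:    {full:.1%}")         # 33.3% of capital
print(f"half Kelly:    {0.5 * full:.1%}")   # 16.7%
print(f"quarter Kelly: {0.25 * full:.1%}")  # 8.3%
```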

Signal decay: the half-life of information

Information theory tells us that a signal’s value is not static. A piece of fundamental research carries a certain number of bits at the point of purchase, but that information degrades over time as the world moves on. The question is: how quickly?

We can measure this directly. Take a stock bought at year zero based on its fundamentals, then re-measure those same fundamentals at years one, two, three, and four. At each point, compute the mutual information between the signal and the remaining returns. The result is an information decay curve — the half-life of each research signal.
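A sketch of the measurement, assuming a DataFrame with one row per position, the purchase-date signal, and annual return columns ret_1 through ret_5 (hypothetical names); it uses rank correlation as a simpler stand-in for mutual information:

```python
import pandas as pd

def decay_curve(df, signal, horizon=5):
    """Rank correlation between the purchase-date signal and remaining returns.

    Assumes annual return columns ret_1 .. ret_{horizon} (hypothetical names)."""
    rhos = []
    for t in range(horizon):
        cols = [f"ret_{y}" for y in range(t + 1, horizon + 1)]
        remaining = (1 + df[cols]).prod(axis=1) - 1   # compound return still to come
        rhos.append(df[signal].corr(remaining, method="spearman"))
    return pd.Series(rhos, index=range(horizon), name=f"{signal}_decay")
```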

This has direct operational implications. If a signal decays rapidly, the initial research work depreciates fast and must be refreshed frequently. If it holds up, the original thesis remains load-bearing and the investor can afford to be patient. The shape of the decay curve also tells you something about the nature of the signal itself — whether it captures a durable structural feature of the business or a transient mispricing.

Signal Decay: How Information Ages

Tracking how each fundamental signal’s predictive power for remaining returns evolves over a 5-year holding period. Year 0 is the point of purchase; each subsequent year re-measures the signal’s correlation with the returns still to come.

Earnings yield starts as the strongest signal (ρ = 0.18) and remains dominant throughout, though its predictive power for remaining returns naturally decays. ROE holds up well as a durable quality signal. Revenue growth and leverage carry essentially no predictive information at any point. The practical implication: re-underwriting the valuation signal midway through a holding period still tells you something — but less than it did at inception.

Kullback-Leibler divergence: measuring surprise

A related quantity from information theory is the Kullback-Leibler divergence, which measures how one probability distribution diverges from a reference distribution:

DKL(P ‖ Q) = ∑ P(x) log(P(x) / Q(x))

In investing, Q might be our ex-ante distribution of expected outcomes for a stock, and P the distribution we would assign with the benefit of hindsight. The KL divergence tells us how “wrong” our original distribution was — not just whether the point estimate was off, but whether the entire shape of our uncertainty was miscalibrated.

This is a far more demanding standard than asking “was I right or wrong?” A well-calibrated investor may be wrong on individual stocks frequently but assign probability distributions that, over many decisions, closely match realised outcomes. The KL divergence between predicted and realised distributions, averaged across a portfolio’s history, could serve as a measure of forecasting skill that goes beyond simple hit rates. It penalises both overconfidence (distributions that are too narrow) and under-confidence (distributions that are too wide and waste information).
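A minimal sketch of the calibration metric, applied to the scenario weights from the value-thesis example earlier (the hindsight weights are invented for illustration):

```python
import numpy as np

def kl_bits(p, q, eps=1e-12):
    """D_KL(P || Q) in bits: P = hindsight distribution, Q = ex-ante forecast."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / np.maximum(q[nz], eps))))

forecast  = [0.20, 0.30, 0.35, 0.15]   # ex-ante scenario weights from earlier
hindsight = [0.35, 0.30, 0.25, 0.10]   # invented hindsight weights for illustration
print(f"{kl_bits(hindsight, forecast):.3f} bits of miscalibration")   # ~0.103
```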

Towards an empirical programme

The ideas above are not merely theoretical. They suggest a concrete empirical programme that can be built on historical financial data:

  • Compute Shannon entropy of return distributions for value stocks versus growth stocks across different holding periods (1, 3, 5, and 10 years). Does value investing genuinely carry higher entropy, and does this entropy decline with the holding period as the channel capacity argument predicts?
  • Measure the mutual information between common value signals (earnings yield, book-to-price, free cash flow yield, dividend yield) and subsequent 3-to-5-year returns. Which signals carry the most information? How does mutual information change across market regimes?
  • Estimate conditional entropy of value returns given macro regimes (rate cycles, credit spreads, earnings growth phases). How much of the uncertainty is systematic versus idiosyncratic?
  • Apply KL divergence as a calibration metric for probabilistic financial models — comparing projected distributions against realised outcomes to assess model fidelity beyond point-estimate accuracy.
  • Build entropy-aware position sizing that adjusts allocations based on the entropy of the thesis distribution and the mutual information of available signals, implementing a practical fractional-Kelly framework.

Each of these is tractable with the financial data available in standard databases. The first three are measurement exercises that require nothing more than historical prices and fundamentals. The fourth integrates with existing Monte Carlo simulation frameworks. The fifth translates the theory into a portfolio construction tool.
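As a gesture at the fifth item, the heuristic below scales a fractional-Kelly allocation by the share of thesis entropy that research has resolved. It is a sketch of one possible rule, not a derived optimum, and every input is an assumption:

```python
def entropy_aware_fraction(full_kelly, signal_mi_bits, thesis_entropy_bits,
                           base_fraction=0.5):
    """Heuristic: scale a fractional-Kelly stake by the share of thesis
    entropy that research has actually resolved (capped at 1)."""
    resolved = min(signal_mi_bits / max(thesis_entropy_bits, 1e-9), 1.0)
    return base_fraction * full_kelly * resolved

# Invented inputs: full Kelly 33%, thesis entropy 1.93 bits, research worth 0.5 bits.
print(f"{entropy_aware_fraction(0.333, 0.5, 1.93):.1%}")   # ~4.3%
```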

Key Takeaway

Shannon’s information theory provides a rigorous framework for reasoning about the uncertainty inherent in value investing. Entropy quantifies the width of the outcome distribution without assuming its shape. Mutual information measures the true value of research — not whether you have a view, but whether that view reduces uncertainty. Channel capacity explains why fundamental analysis works better over longer horizons. And the Kelly criterion connects information advantage directly to position sizing. These are not metaphors — they are measurable quantities that can be computed from data and used to sharpen both the research process and portfolio construction.