Markov fought an intellectual battle to prove that the Law of Large Numbers could apply to dependent events, challenging the prevailing view that it held only for independent ones.

Conditions for dependent events to follow the Law of Large Numbers:

For dependent events, the conditions become more complex, but generally, they involve some form of “weak dependence” or “decaying dependence” over time. The key idea is that the influence of past events on future events must diminish as the events get further apart.
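One standard way to make this “decaying dependence” precise (a textbook L² condition, not Markov’s original formulation) is to require the covariances to vanish as the events move apart:

```latex
% Weak LLN under decaying dependence (sketch, standard L^2 argument):
% if X_1, X_2, ... share mean \mu, have uniformly bounded variances,
% and Cov(X_i, X_j) -> 0 as |i - j| -> infinity, then
\[
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j)
  \longrightarrow 0,
\]
\[
\text{and Chebyshev's inequality then gives}\quad
\frac{1}{n}\sum_{i=1}^{n} X_i \;\xrightarrow{P}\; \mu .
\]
```

An ergodic Markov chain of the kind discussed below is a canonical example: the influence of the current state on later states, and hence the covariance between them, decays geometrically.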

Markov’s Challenge: Dependent Events and the Law of Large Numbers

Before Markov, many mathematicians believed that independence was a necessary condition for the Law of Large Numbers to hold. Pavel Nekrasov, in particular, used this belief to support philosophical arguments about free will. He suggested that if social phenomena (like crime rates or marriage rates) followed the Law of Large Numbers, it implied that the individual choices leading to these phenomena must be independent acts of free will.

Andrey Markov found this theological connection to mathematics absurd. He set out to prove that the Law of Large Numbers could also apply to dependent events, thereby severing the presumed link between mathematical independence and free will.

Markov’s Proof Using “Eugene Onegin”

Markov’s brilliant approach involved analyzing the sequence of vowels (V) and consonants (C) in the first 20,000 letters of Alexander Pushkin’s poem “Eugene Onegin.” This was a painstaking manual task!

Here’s how he demonstrated that dependent events can still follow the LLN:

  1. Challenging Independence:

    • Observation: Markov first calculated the overall proportion of vowels and consonants in the text. Let’s say he found the overall probability of a letter being a vowel, P(V), and a consonant, P(C). (For instance, one source mentions 8638 vowels and 11362 consonants out of 20,000 letters, so P(V) = 8638/20000 ≈ 0.432 and P(C) = 11362/20000 ≈ 0.568.)

    • Independence Assumption Test: If letters were independent, then the probability of a vowel followed by a vowel, P(V then V), would simply be P(V)×P(V). Similarly, P(V then C) would be P(V)×P(C), and so on.

    • The Discrepancy: Markov then meticulously counted the actual occurrences of these pairs in the text. He found that the observed frequencies of these letter pairs differed significantly from what would be expected if the letters were truly independent. For example, the probability of a vowel being followed by another vowel was much lower than P(V)×P(V), while a vowel being followed by a consonant was much higher than P(V)×P(C). This clearly showed that the occurrence of a letter depended on the preceding letter (a small counting sketch appears after this list).

  2. Introducing the “Markov Chain” Concept (Dependent Probabilities):

    • Since independence was disproven, Markov needed a way to model these dependencies. He introduced the idea that the probability of the next letter depends only on the current letter, not on the entire history of letters before it. This is the core concept of a Markov chain – a “memoryless” property.

    • He calculated conditional probabilities:

      • P(Vowel next | Current is Vowel)

      • P(Consonant next | Current is Vowel)

      • P(Vowel next | Current is Consonant)

      • P(Consonant next | Current is Consonant)

    • These conditional probabilities represented the “transition rules” of his system (see the second sketch after this list for how they can be estimated).

  3. Simulating and Proving the LLN:

    • Markov then set up a theoretical “prediction machine” based on these observed conditional probabilities. He could start with any initial letter (vowel or consonant) and then, based on the transition probabilities, generate a long sequence of letters.

    • What he showed was that even though each letter’s choice depended on the previous one, the overall proportion of vowels and consonants in a very long generated sequence still converged to the same overall proportions (approximately 43% vowels and 57% consonants) that he observed in Pushkin’s original text.

    • This proved that the Law of Large Numbers could apply to a system with dependent events. He wasn’t just generating random letters; he was generating a random sequence governed by the rules of dependence he had discovered (the transition probabilities). The simulation sketch after this list reproduces this convergence.
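To see step 1 in code, here is a minimal Python sketch of the tally Markov performed by hand. It assumes an English vowel set and ASCII text purely for illustration (Markov worked with the Cyrillic original, and the function names here are mine):

```python
from collections import Counter

VOWELS = set("aeiou")  # illustrative English vowel set; Markov used Cyrillic


def vc_sequence(text):
    """Map each alphabetic character to 'V' (vowel) or 'C' (consonant)."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]


def independence_check(text):
    """Compare observed pair frequencies against the independence prediction."""
    seq = vc_sequence(text)
    n = len(seq)
    marginals = {s: c / n for s, c in Counter(seq).items()}
    pairs = Counter(zip(seq, seq[1:]))  # counts of adjacent (current, next) pairs
    for a in "VC":
        for b in "VC":
            observed = pairs[(a, b)] / (n - 1)
            expected = marginals.get(a, 0) * marginals.get(b, 0)  # if independent
            print(f"P({a} then {b}): observed {observed:.3f}, independent {expected:.3f}")
```

On natural-language text, the observed P(V then V) comes out far below the independence prediction, which is exactly the discrepancy described in step 1.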
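Step 2 falls out of the same pair counts: the four conditional probabilities are just row-normalized pair frequencies. A sketch, reusing vc_sequence from the previous snippet:

```python
from collections import Counter


def transition_probabilities(seq):
    """Estimate P(next type | current type) from adjacent pairs in a V/C sequence."""
    pairs = Counter(zip(seq, seq[1:]))
    trans = {}
    for current in "VC":
        total = pairs[(current, "V")] + pairs[(current, "C")]
        trans[current] = {nxt: pairs[(current, nxt)] / total for nxt in "VC"}
    return trans  # trans["V"]["C"] is P(Consonant next | Current is Vowel), etc.
```

Each row of the resulting table sums to 1, and its four entries are exactly the four conditional probabilities listed in step 2.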
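Finally, step 3 as a simulation. The default transition values below (P(V|V) ≈ 0.128, P(V|C) ≈ 0.663) are the figures commonly quoted for Markov’s Onegin counts; treat them as illustrative defaults rather than verified data:

```python
import random


def simulate_vowel_fraction(p_v_given_v=0.128, p_v_given_c=0.663,
                            steps=200_000, start="C", seed=42):
    """Run the two-state V/C chain and return the long-run fraction of vowels."""
    rng = random.Random(seed)
    state, vowel_count = start, 0
    for _ in range(steps):
        p_vowel = p_v_given_v if state == "V" else p_v_given_c
        state = "V" if rng.random() < p_vowel else "C"
        vowel_count += (state == "V")
    return vowel_count / steps


print(simulate_vowel_fraction(start="V"))  # ~0.432, regardless of start state
print(simulate_vowel_fraction(start="C"))  # ~0.432
```

The agreement is no accident: the stationary vowel proportion of this chain is P(V|C) / (P(V|C) + P(C|V)) = 0.663 / (0.663 + 0.872) ≈ 0.432, matching the marginal frequency P(V) observed in the text, which is precisely the convergence Markov proved.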