Intro to probability theory – part 2

This post was translated from its original French version: Probabilités – Partie 2.

After the previous post about probability theory, here’s the second part, in which I’ll talk about random variables.

The idea of random variables is to have some way of dealing with events for which we do not know exactly what will happen (for instance, we roll a die), but for which we still want to have some idea of what can happen. The die example is pretty simple, so using random variables may be a bit overkill, but let’s keep examples simple for now.

For a given experiment, we consider a variable, called X, and look at all the values it can take, along with their associated probabilities. If my experiment is “rolling a die and looking at its value”, I can define a random variable on the value of a 6-sided die and call it X. For a full definition of X, I need to provide all the possible values of X (what we call the random variable’s domain) and their associated probabilities. For a 6-sided die, the values are the numbers from 1 to 6; for a fair (non-loaded) die, the probabilities are all equal to \displaystyle \frac 1 6. We can write that as follows:

\displaystyle \forall i \in \{1,2,3,4,5,6\}, \Pr[X = i] = \frac 1 6

which reads “for all i in the set of values {1,2,3,4,5,6}, the probability that X takes the value i equals \displaystyle \frac 1 6”.
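To make this concrete, here’s a small Python sketch (my own illustration, not from the original post) that encodes this distribution as a dictionary mapping each value of the domain to its probability, and checks that the probabilities sum to 1:

```python
from fractions import Fraction

# Domain of X: the faces of a fair 6-sided die, each with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Sanity check: the probabilities over the whole domain must sum to 1.
assert sum(die.values()) == 1
print(die)
```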

One of the basic ways to have an idea about the behaviour of a random variable is to look at its expectation. The expectation of a random variable can be seen as its average value, or as “suppose I roll my die 10000 times, and I average all the results (summing all the results and dividing by 10000), what result would I typically get?”

This expectation (written E[X]) can be computed with the following formula:

E[X] = \displaystyle \sum_{i \in \text{dom}(X)} \Pr[X = i] \times i

which can be read as “sum, for all elements i in the domain of X, of the probability that X takes the value i, times i”. In the die example, since the domain is all the integer numbers from 1 to 6, I can write

\displaystyle \sum_{i=1}^6 \Pr[X = i] \times i

which I can in turn expand as follows:

\begin{aligned}E[X] &= 1 \times \Pr[X = 1] + 2 \times \Pr[X = 2] + 3 \times \Pr[X = 3] \\ &\quad + 4 \times \Pr[X = 4] + 5 \times \Pr[X = 5] + 6 \times \Pr[X = 6]\end{aligned}

Since, for my die, all the probabilities are equal to \displaystyle \frac 1 6, I can conclude with

\displaystyle E[X] = \frac 1 6 \times (1 + 2+3+4+5+6) = \frac{21}{6} = 3.5

So the average value of a die over a large number of experiments is 3.5, as most tabletop gamers would know 😉
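As a quick sanity check, here’s a small Python sketch (my own, not from the original post) that computes the expectation from the formula above and compares it with the empirical average of 10,000 simulated rolls:

```python
import random

# Expectation from the formula: sum over the domain of Pr[X = i] * i.
expectation = sum(i * (1 / 6) for i in range(1, 7))
print(expectation)  # 3.5

# Empirical check: the average of 10000 simulated rolls should be close to 3.5.
rolls = [random.randint(1, 6) for _ in range(10_000)]
print(sum(rolls) / len(rolls))
```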

Now let’s look at a slightly more complicated example. Suppose that I have n dice, and that I want to know how many 6s I can expect among my n dice. From a handwavy point of view, we know that we will not get the same count every time we roll the n dice, but that we can still get a rough answer. There’s no reason there should be more or fewer 6s than 1s, 2s, 3s, 4s or 5s, so generally speaking the dice should be distributed approximately equally among the 6 numbers, and there should be approximately \displaystyle \frac n 6 6s over n dice. (The exception to that being me playing Orks in Warhammer 40k, in which case the expected number is approximately 3 6s over 140 dice.) Let us prove that intuition properly.

I define Y as the random variable representing the number of 6s over n dice. The domain of Y is all the numbers from 0 to n. It’s possible to compute the probability of getting, for example, exactly three 6s over n dice, and even to get a general formula for exactly k 6s, but I’m way too lazy to compute all that and sum over all the values of k and so on. So let’s be clever.

There’s a very neat trick called linearity of expectation that says that the expectation of the sum of several random variables is equal to the sum of the expectations of said random variables, which we write

E[A + B] = E[A] + E[B]

This is true for all random variables A and B, even when they are not independent. Beware, though: in general it only works for addition. We cannot say in general that E[A \times B] = E[A] \times E[B]: that equality does hold when the variables are independent, but not in general.
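To see both claims in action, here’s a small simulation sketch (my own example, with made-up variables): A is the value of a die and B = 7 - A, so B is completely dependent on A. Linearity of expectation still gives E[A + B] = E[A] + E[B], while E[A \times B] clearly differs from E[A] \times E[B]:

```python
import random
from statistics import mean

random.seed(0)
N = 100_000

# A is a fair die; B = 7 - A is fully determined by A (so they are NOT independent).
samples = [(a, 7 - a) for a in (random.randint(1, 6) for _ in range(N))]

E_A = mean(a for a, _ in samples)
E_B = mean(b for _, b in samples)
E_sum = mean(a + b for a, b in samples)
E_prod = mean(a * b for a, b in samples)

print(E_sum, E_A + E_B)    # equal (both are exactly 7): E[A + B] = E[A] + E[B]
print(E_prod, E_A * E_B)   # different (about 9.3 vs 12.25): E[A * B] != E[A] * E[B] here
```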

Now we’re going to define n variables, called Y_1, Y_2, ..., Y_n, so that Y is the sum of all these variables. We give Y_1 the domain {0,1}, and we say that Y_1 is equal to 1 if and only if die number 1 shows a 6. The other variables Y_i are defined similarly, one for each die. Since I have n variables, each taking value 1 when its associated die shows a 6, I can write

\displaystyle Y = \sum_{i = 1}^n Y_i
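In code, these Y_i are just indicator variables. Here’s a small sketch (my own illustration) for one roll of n dice, checking that the sum of the indicators is indeed the number of 6s:

```python
import random

random.seed(1)
n = 10
dice = [random.randint(1, 6) for _ in range(n)]   # one roll of n dice

# Y_i is 1 if and only if die number i shows a 6, and 0 otherwise.
indicators = [1 if value == 6 else 0 for value in dice]

# Y, the number of 6s, is exactly the sum of the indicators.
Y = sum(indicators)
assert Y == dice.count(6)
print(dice, indicators, Y)
```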

This is where I use linearity of expectation:

\displaystyle E[Y] = E\left[\sum_{i=1}^n Y_i\right] = \sum_{i=1}^n E[Y_i]

The main trick here is that the variables Y_i are much simpler to deal with than Y. With probability \displaystyle \frac 1 6, they take the value 1; with probability \displaystyle \frac 5 6, they take the value 0. Consequently, the expectation of Y_i is also much easier to compute:

\displaystyle E[Y_i] = 1 \times \frac 1 6 + 0 \times \frac 5 6 = \frac 1 6

Plugging that into the previous result, we get the expectation of Y:

\displaystyle E[Y] = E\left[\sum_{i=1}^n Y_i\right] = \sum_{i=1}^n E[Y_i] = \sum_{i=1}^n \frac 1 6 = \frac n 6

which is the result we expected.
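And a last Monte Carlo sanity check (again my own sketch, not from the original post): roll n dice a large number of times, count the 6s each time, and compare the average count with n/6:

```python
import random

random.seed(2)
n = 140          # number of dice per roll
trials = 10_000  # number of times we roll all n dice

# Average number of 6s over many rolls of n dice; should be close to n / 6 (about 23.3).
counts = [sum(1 for _ in range(n) if random.randint(1, 6) == 6) for _ in range(trials)]
print(sum(counts) / trials, n / 6)
```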

Now, my examples are pretty simple, but we can use these kinds of tools in much more complicated situations. And there are a fair number of other tools that allow us to estimate things about random variables, and to get a fairly good idea of what’s happening… even if we involve dice in the process.
