# 2.4.2: Examples Involving Randomness 涉及随机性的例子

Lets work with a few examples that involve randomness.

This is also an opportunity for us to practice some simple data visualization techniques.

Our first example is to roll the die 100 times and plot a histogram of the outcomes, meaning a histogram that shows how frequent the numbers from 1 to 6 appeared in the 100 samples.

Your observations will be integers between 1 and 6.

We will simulate the die using the random module.

And we will plot the histogram using the plt.hist function.

So let’s work with this example.

The first thing we want to do is to import the random module,if you haven’t done so already.

I’m going to take this line of code here and just keep it here for future.

The second thing we want to be able to do is throw a single die.

We already know how to do this.

We did that by using the random choice function using a list with the numbers 1, 2, 3, 4, 5, and 6.

What this line does is it throws one die just one time.

I’ll move that line up here also for future use.

What I would like to be able to do is roll a die not just once, but 100 times.

Let’s first see how I could do that.

Because we want to repeat the rolling of a single die 100 times,this seems to call for a for loop.

In this case, we could use a loop variable– let’s say k.

We’d like to repeat this action 100 times.

And the action we’d like to do is to roll a die.

So we just type random.choice and we have our die here.

In this case, what we’re doing is we’re rolling a die 100 times.

But we’re not storing the results anywhere.

Let me move this code up here for future use.

It appears that we need some variable that will contain the results,will hold the results, from each of the 100 die rolls.

I’m going to call that rolls.

And I’m going to build that up as an empty list.

Every time I roll a die, I’d like to be able to append the new roll to my rolls list.

So I type roll.append.

What gets appended is the outcome of a new die roll.

Let’s try running this code now.

In this case, the code runs.

If we look at the length of rolls, we will see that we have 100 items, 100 objects, in there.

We can also look at the actual numbers.

And it seems to be working.

The final piece that’s missing from our example is the drawing of the histogram.

We’ll do that using plt.hist.

Our variable is called rolls and I’d like to be able to specify the locations of the bins.

Remember, that’s something we can do with the keyword argument bins.

I’m again going to be using NumPy linspace for this.

The starting point is going to be 0.5.

The ending point is going to be 6.5.

And because I would like to have six bins,I need seven evenly spaced points.

Let’s try running the histogram line.

This completes our histogram example.

Here we would have intuitively expected a relatively flat histogram.

But what’s a more rigorous justification for this result?

The law of large numbers, which is a theorem of probability,tells us that we should expect more or less the same number of 1s and 2s all the way to the 6s because they each have the same probability.

And we’ve repeated the experiment– the role of a die– a large number of times.

Well, actually 100 is not such a large number.

So we can see what happens if we increase that number.

Let’s try rerunning our example using 10,000 repetitions.

In this case I will go back to my code as before.

I’m going to add a semi-colon to suppress the printing of the output object the histogram gives me.

And then I will increase the number of data points to 10,000.

I’m going to run or rerun the code and in this case, it appears that we get a histogram that’s more flat.

Let’s now do this 1 million times.

So we add two more zeros here.

We rerun the code.

This takes a bit longer.

And you can see that the histogram is almost completely flat.

Just to recap, we learned how to simulate a die, how to throw a die any number of times, how to visualize the output as a histogram,and by evoking the law of large numbers, we have an understanding of what should happen, which in this case was confirmed by our simulation.

Considering now rolling not one die, but 10 independent dies denoted with x1 to x10.

We’re going to define a new random variable called y, which is the sum of all of the 10x variables.

In other words, our new random variable y is going to be equal to x1 plus x2 plus all the way up to x10.

We’d like to understand the distribution of the random variable y by simulating its values a large number of times,

and then plotting a histogram.

The histogram will give us a reasonably good sense about the distribution of y.

And the larger number of samples of y we use,the smoother the histogram becomes.

Before proceeding, let’s try to anticipate what the histogram might look like.

This is generally a very useful thing to do.

In other words, before you start writing the code,it’s useful to think about what you would expect the result to be.

First, since each x variable is at least 1, and we’re summing 10 of them together, the least value y can assume is 10.

By similar logic, the greatest value of y is 60.

Let’s now think about these two extremes– say, the number 60.

The only way that can occur if all 10 dice give a 6, which is very unlikely.

But if we think about some intermediate value such as number 30,there are many combinations of die rolls that could give us that value.

Because rolling 10 6s is just as likely or unlikely as rolling 10 1s,or 10 of anything, we would expect the histogram to peak at the center.

And we’d also expect it to be very thin towards the ends as we get closer to either 10 or 60.

Let’s now simulate this process to see what happens.

We already know how to roll one die.

So the first task would seem to be how do we roll 10 dice.

We know that we can do that by doing a random choice from 1 to 6.

I’m going to call this x, because that was the notation that we used before.

Our y variable is the sum of several x variables.

So one way to proceed is to construct a loop in which we draw a new value of x 10 times

and we keep building up our variable y.

This would seem to call for a for loop.

I’m going to be using k as my loop variable.

We’ll just type for k in range 10.

We want to repeat this 10 times.

We want to be careful about indenting our code,right?

And then we also need a variable y.

I’m going to define it before the loop.

Initially the value of y is going to be 0.

So what happens is the following:

First, I set y to be equal to 0,I then enter the loop,and I then draw a new value for x.

The final step that’s missing is to update the value of y.

So the new value of y is going to be equal to the old value plus whatever the value of x happens to be.

Let’s then see how we can roll this die multiple times and keep track of those rolls.

Let’s draw all y variables in a list called ys.

Let’s first create our list ys.

That’s an empty list.

The code we have underneath here so far gives us just one realization of the random variable y.

What we’d like to be able to do is have 100 such realizations.

This suggests that we need to run this code 100 times which calls for a for loop.

I’m going to build another for loop and nest my existing for loop inside the new for loop.

The new dummy variable is going to be called rep.

We’re going to be doing this operation 100 times.

In this case, I need to indent the code because I want first to run the outer loop 100 time,and for each time, I want to run the inner loop 10 times.

A key point to realize is at what point of the code should I append the new value of y to ys.

For example, if I type ys.append here, the following is going to happen:

The new value of y is going to be appended to ys every time the inner loop runs.

This is not correct.

We therefore need to de-indent this line.

Now we only append y once we’ve rolled the die 10 times.

Let’s now try running this code.

I’m going to make one alteration here.

It’s usually a good idea to start small.

So instead of doing this 100 times, let me just first do it 5 times.

The code runs, which is a good sign.

I can look at the length of my ys,and I have five numbers in there.

If I print out the values, the numbers seem reasonable.

I can now go back to my code.

Instead of doing this five times, I’m going to do this 100 times.

And I’ll rerun the code.

Now I will have a new set of y variables stored in ys.

what is the minimum value that I have,or what is the maximum value that I have?

In this case, both the minimum value and the maximum value is within expected bounds.

To complete the example, we need to plot the histogram.

Type plt.hist and ys.

And we already have y values stored so we can just try running plt.hist.

Let’s now try rerunning this code.

But instead of doing it 100 times, let’s do it 10,000 times.

I will rerun all of the code and this is the output I get.

Let’s run this one more time.

Again, I’m going to add the semi-colon at the end of plt.hist,which suppresses the output.

Just a couple of arrays that plt returns to me.
plt返回给我的只是几个阵列。
And I’m then going to add two 0s to my range, which means that I will be repeating this process 1 million times.

Let me run this.

This will take a couple of seconds.

And in this case, what we see is a beautiful histogram.

You can see that the shape of the histogram looks a bit like what we anticipated.

And you can get a better sense of the shape by varying the number of pins that you’re using to plot the histogram.

But to understand what’s happening here, we can again get some insights from probability theory.

The so-called central limit theorem, or CLT,states that the sum of a large number of random variables regardless of their distribution will approximately

There are some additional considerations that we will not get into but the main point is the following:

You can sum together many random variables whose distribution is nothing like a normal distribution like die rolls, or even coin flips.

And yet, the distribution of the sum will get closer and closer to a normal distribution as the number of random variables that are added together increases.

The central limit theorem not only helps us understand our simulation results,but it also explains why the normal distribution, sometimes called a Gaussian distribution, occurs so often.

For example, the height of a person probably depends on a large number of factors that are related to things like genetics, nutrition, environment, and so on.

If we think of height as being a random variable that itself consists of a large number of other random variables that are added together,

we would expect the height of a person in a population to follow the normal distribution.

That is, in fact, what we know to be the case from empirical data.