Python数据分析(中英对照)·Examples Involving Randomness 涉及随机性的例子

Author: jamin. Published: 2020-10-18.

2.4.2: Examples Involving Randomness 涉及随机性的例子

Let’s work with a few examples that involve randomness.
我们来举几个涉及随机性的例子。
This is also an opportunity for us to practice some simple data visualization techniques.
这也是我们练习一些简单数据可视化技术的机会。
Our first example is to roll a die 100 times and plot a histogram of the outcomes, that is, a histogram showing how frequently each of the numbers 1 to 6 appeared in the 100 samples.
我们的第一个例子是掷骰子100次并绘制结果的直方图,即显示1到6各个数字在100个样本中出现频率的直方图。
Your observations will be integers between 1 and 6.
您的观察值将是介于1和6之间的整数。
We will simulate the die using the random module.
我们将使用random模块来模拟骰子。
And we will plot the histogram using the plt.hist function.
我们将使用plt.hist函数绘制直方图。
So let’s work with this example.
让我们来看看这个例子。
The first thing we want to do is to import the random module, if you haven’t done so already.
我们要做的第一件事是导入随机模块,如果您还没有这样做的话。
I’m going to take this line of code here and just keep it here for future.
我将把这行代码放在这里,并把它保存在这里,以备将来使用。
The second thing we want to be able to do is throw a single die.
我们想做的第二件事就是扔一个骰子。
We already know how to do this.
我们已经知道如何做到这一点。
We did that by calling the random.choice function on a list containing the numbers 1, 2, 3, 4, 5, and 6.
我们通过对包含数字1、2、3、4、5和6的列表调用random.choice函数来实现这一点。
What this line does is it throws one die just one time.
这条线的作用是一次抛出一个骰子。
I’ll move that line up here also for future use.
我也会把这条线移到这里,以备将来使用。
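As a minimal sketch of the two lines described so far (the import and a single throw), assuming the die is represented as a plain list; the names `faces` and `roll` are chosen here for illustration and do not come from the lecture:

```python
import random

# One roll of a fair six-sided die: pick uniformly from the six faces.
faces = [1, 2, 3, 4, 5, 6]
roll = random.choice(faces)
print(roll)
```

Each run prints a single integer between 1 and 6.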
What I would like to be able to do is roll a die not just once, but 100 times.
我想做的是掷骰子不是一次,而是100次。
Let’s first see how I could do that.
让我们先看看我是怎么做到的。
Because we want to repeat the rolling of a single die 100 times, this seems to call for a for loop.
因为我们想把掷一个骰子的动作重复100次,这似乎需要一个for循环。
In this case, we could use a loop variable– let’s say k.
在这种情况下,我们可以使用一个循环变量——比如k。
We’d like to repeat this action 100 times.
我们想重复这个动作100次。
And the action we’d like to do is to roll a die.
我们要做的就是掷骰子。
So we just type random.choice and we have our die here.
所以我们只需输入random.choice,骰子就有了。
In this case, what we’re doing is we’re rolling a die 100 times.
在这种情况下,我们所做的就是掷骰子100次。
But we’re not storing the results anywhere.
但我们不会把结果存储在任何地方。
Let me move this code up here for future use.
让我把这个代码移到这里以备将来使用。
It appears that we need a variable that will hold the results from each of the 100 die rolls.
看来我们需要一个变量来保存100次掷骰子中每一次的结果。
I’m going to call that rolls.
我要把它叫做rolls。
And I’m going to initialize it as an empty list.
我将把它建立为一个空列表。
Every time I roll a die, I’d like to be able to append the new roll to my rolls list.
每次我掷骰子时,我希望能够将新的骰子附加到我的骰子列表中。
So I type rolls.append.
所以我输入rolls.append。
What gets appended is the outcome of a new die roll.
追加的是新模具卷的结果。
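The loop just described might look like the following sketch. It follows the narration (an empty list called rolls, a loop variable k, and random.choice inside the loop); the two print calls at the end are an added check rather than part of the lecture code.

```python
import random

rolls = []                      # will hold the outcome of every roll
for k in range(100):            # k is only a counter; its value is not used
    rolls.append(random.choice([1, 2, 3, 4, 5, 6]))

print(len(rolls))               # number of stored outcomes
print(rolls[:10])               # first few outcomes
```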
Let’s try running this code now.
现在让我们试着运行这段代码。
In this case, the code runs.
在本例中,代码将运行。
If we look at the length of rolls, we will see that we have 100 items, 100 objects, in there.
如果我们看一下rolls的长度,我们会看到里面有100个元素,100个对象。
We can also look at the actual numbers.
我们也可以看看实际的数字。
And it seems to be working.
而且它似乎在起作用。
The final piece that’s missing from our example is the drawing of the histogram.
我们示例中缺少的最后一个部分是直方图的绘制。
We’ll do that using plt.hist.
我们将使用plt.hist来实现这一点。
Our variable is called rolls and I’d like to be able to specify the locations of the bins.
我们的变量叫做rolls,我希望能够指定各个区间(bin)的位置。
Remember, that’s something we can do with the keyword argument bins.
记住,这可以通过关键字参数bins来实现。
I’m again going to be using NumPy linspace for this.
我将再次使用NumPy linspace进行此操作。
The starting point is going to be 0.5.
起点是0.5。
The ending point is going to be 6.5.
终点将是6.5。
And because I would like to have six bins, I need seven evenly spaced points.
因为我想要六个区间,所以我需要七个均匀分布的点。
Let’s try running the histogram line.
让我们试着运行直方图行。
This completes our histogram example.
这就完成了我们的直方图示例。
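Putting the pieces together, a sketch of the full histogram example could read as follows. It assumes matplotlib and NumPy are installed and imported under the usual aliases plt and np; the list comprehension is just a compact stand-in for the for loop above, and the names `edges` and `counts` are illustrative.

```python
import random

import matplotlib.pyplot as plt
import numpy as np

rolls = [random.choice([1, 2, 3, 4, 5, 6]) for _ in range(100)]

# Seven evenly spaced edges from 0.5 to 6.5 give six unit-width bins,
# each centred on one face of the die.
edges = np.linspace(0.5, 6.5, 7)
counts, _, _ = plt.hist(rolls, bins=edges)
print(counts)                   # how many times each face appeared
```

In a notebook the figure is drawn automatically; in a script, a call to plt.show() would be needed to display it.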
Here we would have intuitively expected a relatively flat histogram.
在这里,我们会直觉地预期一个相对平坦的柱状图。
But what’s a more rigorous justification for this result?
但对于这个结果,有什么更为严格的理由呢?
The law of large numbers, which is a theorem of probability, tells us that we should expect more or less the same number of 1s and 2s all the way to the 6s because they each have the same probability.
大数定律是概率论中的一个定理,它告诉我们,1、2一直到6出现的次数应该大致相同,因为它们出现的概率相同。
And we’ve repeated the experiment, the roll of a die, a large number of times.
我们已经把这个实验(掷一次骰子)重复了很多次。
Well, actually 100 is not such a large number.
嗯,实际上100不是一个很大的数字。
So we can see what happens if we increase that number.
所以我们可以看到如果我们增加这个数字会发生什么。
Let’s try rerunning our example using 10,000 repetitions.
让我们尝试使用10000次重复来重新运行我们的示例。
In this case I will go back to my code as before.
在这种情况下,我将像以前一样回到我的代码。
I’m going to add a semi-colon to suppress the printing of the output object the histogram gives me.
我将添加一个分号来抑制直方图提供的输出对象的打印。
And then I will increase the number of data points to 10,000.
然后我会将数据点的数量增加到10000。
I’m going to run or rerun the code and in this case, it appears that we get a histogram that’s more flat.
我将运行或重新运行代码,在本例中,我们得到了一个更平坦的直方图。
Let’s now do this 1 million times.
现在让我们做一百万次。
So we add two more zeros here.
所以我们在这里再加两个零。
We rerun the code.
我们重新运行代码。
This takes a bit longer.
这需要更长的时间。
And you can see that the histogram is almost completely flat.
你可以看到直方图几乎是完全平坦的。
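The flattening can also be checked numerically, without a plot. The sketch below is not from the lecture: it swaps in random.choices (which draws many samples in one call) and collections.Counter, and prints the relative frequency of each face for the three sample sizes used above. The last column is the gap between the most and least frequent face, which the law of large numbers suggests should shrink as the sample grows.

```python
import random
from collections import Counter

faces = [1, 2, 3, 4, 5, 6]
for n in (100, 10_000, 1_000_000):
    counts = Counter(random.choices(faces, k=n))   # n independent rolls
    freqs = [counts[f] / n for f in faces]         # relative frequency per face
    spread = max(freqs) - min(freqs)
    print(n, [round(f, 3) for f in freqs], round(spread, 4))
```

Every frequency should approach 1/6 (about 0.167) as n increases.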
Just to recap, we learned how to simulate a die, how to throw a die any number of times, how to visualize the output as a histogram, and by invoking the law of large numbers, we have an understanding of what should happen, which in this case was confirmed by our simulation.
简单回顾一下,我们学习了如何模拟骰子,如何多次掷骰子,如何将输出可视化为直方图,并且通过引用大数定律,我们了解了应该发生什么,而我们的模拟也证实了这一点。
Consider now rolling not one die, but 10 independent dice, denoted x1 to x10.
现在考虑掷的不是一个骰子,而是10个相互独立的骰子,记为x1到x10。
We’re going to define a new random variable called y, which is the sum of all 10 of the x variables.
我们将定义一个新的随机变量y,它是全部10个x变量的总和。
In other words, our new random variable y is going to be equal to x1 plus x2 plus all the way up to x10.
换句话说,我们的新随机变量y将等于x1加x2,一直到x10。
We’d like to understand the distribution of the random variable y by simulating its values a large number of times,
我们想通过多次模拟随机变量y的值来了解其分布,
and then plotting a histogram.
然后绘制直方图。
The histogram will give us a reasonably good sense about the distribution of y.
柱状图将使我们对y的分布有一个相当好的了解。
And the more samples of y we use, the smoother the histogram becomes.
我们使用的y样本越多,直方图就越平滑。
Before proceeding, let’s try to anticipate what the histogram might look like.
在继续之前,让我们尝试预测直方图可能是什么样子。
This is generally a very useful thing to do.
这通常是一件非常有用的事情。
In other words, before you start writing the code, it’s useful to think about what you would expect the result to be.
换句话说,在开始编写代码之前,考虑一下预期结果是什么是很有用的。
First, since each x variable is at least 1, and we’re summing 10 of them together, the least value y can assume is 10.
首先,因为每个x变量至少是1,我们把其中的10个加起来,所以y可以假设的最小值是10。
By similar logic, the greatest value of y is 60.
根据类似的逻辑,y的最大值为60。
Let’s now think about these two extremes– say, the number 60.
现在让我们考虑这两个极端——比如,数字60。
The only way that can occur is if all 10 dice give a 6, which is very unlikely.
如果所有10个骰子都给出6,那么这是唯一可能发生的方法,这是非常不可能的。
But if we think about some intermediate value such as the number 30, there are many combinations of die rolls that could give us that value.
但是如果我们考虑某个中间值,比如数字30,有许多种掷骰子的组合可以给出这个值。
Because rolling ten 6s is just as likely, or unlikely, as rolling ten 1s, or ten of any other number, we would expect the histogram to be symmetric and to peak at the center.
因为掷出10个6与掷出10个1(或10个任何相同点数)的可能性一样大(或一样小),所以我们预计直方图是对称的,并在中心达到峰值。
And we’d also expect it to be very thin towards the ends as we get closer to either 10 or 60.
我们还预计,当我们接近10或60时,它将非常薄。
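This intuition can be made exact by counting. The sketch below is an addition to the lecture: it tallies, for every possible sum, how many of the 6**10 equally likely ordered outcomes produce it, building the table one die at a time. The dictionary name `ways` is illustrative.

```python
# ways[s] = number of ordered outcomes of the dice added so far that sum to s
ways = {0: 1}
for _ in range(10):                       # add one die at a time
    new_ways = {}
    for total, count in ways.items():
        for face in range(1, 7):
            new_ways[total + face] = new_ways.get(total + face, 0) + count
    ways = new_ways

print(min(ways), max(ways))               # smallest and largest possible sums
print(ways[10], ways[60])                 # outcomes giving each extreme
print(ways[30], ways[35])                 # outcomes giving intermediate sums
```

Dividing any entry by 6**10 gives the exact probability of that sum, so the table is a reference against which the simulated histogram can be compared.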
Let’s now simulate this process to see what happens.
现在让我们模拟这个过程,看看会发生什么。
We already know how to roll one die.
我们已经知道如何掷骰子了。
So the first task would seem to be how do we roll 10 dice.
因此,第一个任务似乎是如何掷10个骰子。
Let’s start with the rolling of just one die.
让我们先从掷一个骰子开始。
We know that we can do that by doing a random choice from 1 to 6.
我们知道我们可以通过从1到6的随机选择来做到这一点。
I’m going to call this x, because that was the notation that we used before.
我将把这个叫做x,因为这是我们以前使用的符号。
Our y variable is the sum of several x variables.
我们的y变量是几个x变量的总和。
So one way to proceed is to construct a loop in which we draw a new value of x 10 times
一种方法是构造一个循环,在这个循环中我们抽取10次新的x值,
and we keep building up our variable y.
我们不断地建立变量y。
This would seem to call for a for loop.
这似乎需要一个for循环。
I’m going to be using k as my loop variable.
我将使用k作为循环变量。
We’ll just type for k in range(10).
我们只需要输入for k in range(10)。
We want to repeat this 10 times.
我们想重复10次。
We want to be careful about indenting our code, right?
我们要小心缩进代码,对吗?
And then we also need a variable y.
然后我们还需要一个变量y。
I’m going to define it before the loop.
我将在循环之前定义它。
Initially the value of y is going to be 0.
最初,y的值将为0。
So what happens is the following:
因此,发生的情况如下:
First, I set y to be equal to 0, I then enter the loop, and I then draw a new value for x.
首先,我将y设置为等于0,然后进入循环,然后为x绘制一个新值。
The final step that’s missing is to update the value of y.
缺少的最后一步是更新y的值。
So the new value of y is going to be equal to the old value plus whatever the value of x happens to be.
所以y的新值等于旧值加上x的值。
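A sketch of the loop as narrated, producing one realization of y; the final print is added here only to show the result.

```python
import random

y = 0                                     # running total, defined before the loop
for k in range(10):                       # ten dice
    x = random.choice([1, 2, 3, 4, 5, 6]) # one new roll
    y = y + x                             # old value plus the new roll
print(y)
```

The printed value is always between 10 and 60.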
Let’s then see how we can roll this die multiple times and keep track of those rolls.
然后,让我们看看如何多次掷骰子,并记录这些结果。
Let’s store all of the y values in a list called ys.
让我们把所有y值存放在一个名为ys的列表中。
Let’s first create our list ys.
让我们首先创建我们的列表。
That’s an empty list.
那是一张空名单。
The code we have underneath here so far gives us just one realization of the random variable y.
到目前为止,我们在下面的代码只给出了随机变量y的一个实现。
What we’d like to be able to do is have 100 such realizations.
我们希望能够做到的是有100个这样的实现。
This suggests that we need to run this code 100 times which calls for a for loop.
这表明我们需要将这段代码运行100次,这将调用for循环。
I’m going to build another for loop and nest my existing for loop inside the new for loop.
我将构建另一个for循环,并将现有的for循环嵌套在新的for循环中。
The new dummy variable is going to be called rep.
新的虚拟变量将被称为rep。
We’re going to be doing this operation 100 times.
我们要把这个操作执行100次。
In this case, I need to indent the code because I want first to run the outer loop 100 times, and for each time, I want to run the inner loop 10 times.
在本例中,我需要缩进代码,因为我希望首先运行外循环100次,每次运行内循环10次。
A key question is at what point in the code I should append the new value of y to ys.
要实现的一个关键点是,我应该在代码的哪一点将y的新值附加到ys。
For example, if I type ys.append here, the following is going to happen:
例如,如果我在此处键入ys.append,将发生以下情况:
The new value of y is going to be appended to ys every time the inner loop runs.
每次运行内部循环时,y的新值都将附加到ys。
This is not correct.
这是不对的。
We therefore need to de-indent this line.
因此,我们需要取消这一行的缩进。
Now we only append y once we’ve rolled the die 10 times.
现在我们只在掷骰子10次后追加y。
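The nested loops might be laid out as in the sketch below. One detail is an assumption, since the narration does not state it: y is reset to 0 inside the outer loop, so that each repetition starts a fresh sum instead of continuing the previous one.

```python
import random

ys = []                                   # one entry per repetition
for rep in range(100):                    # outer loop: 100 realizations of y
    y = 0                                 # assumed: reset for every repetition
    for k in range(10):                   # inner loop: ten dice
        x = random.choice([1, 2, 3, 4, 5, 6])
        y = y + x
    ys.append(y)                          # de-indented: runs once per repetition

print(len(ys))
```

If the append line were indented one level further, ys would receive ten partial sums per repetition and end up with 1,000 entries instead of 100.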
Let’s now try running this code.
现在让我们试着运行这段代码。
I’m going to make one alteration here.
我要在这里做一个改动。
It’s usually a good idea to start small.
从小处着手通常是个好主意。
So instead of doing this 100 times, let me just first do it 5 times.
因此,与其做100次,不如让我先做5次。
The code runs, which is a good sign.
代码运行,这是一个好迹象。
I can look at the length of my ys, and I have five numbers in there.
我可以看看ys的长度,里面有五个数字。
If I print out the values, the numbers seem reasonable.
如果我打印出这些值,这些数字似乎是合理的。
I can now go back to my code.
我现在可以回到我的代码了。
Instead of doing this five times, I’m going to do this 100 times.
我不会做五次,而是要做100次。
And I’ll rerun the code.
我会重新运行代码。
Now I will have a new set of y variables stored in ys.
现在我将在ys中存储一组新的y变量。
To learn more about those values I can ask –
要了解更多关于这些价值观的信息,我可以问-
what is the minimum value that I have, or what is the maximum value that I have?
我拥有的最小值是多少,或者我拥有的最大值是多少?
In this case, both the minimum value and the maximum value are within expected bounds.
在这种情况下,最小值和最大值都在预期范围内。
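The same checks can be written down so they are easy to repeat. In this sketch the simulation is wrapped in a helper, `sum_of_dice`, which is a hypothetical name introduced here and not part of the lecture; the built-in min and max then answer the two questions above.

```python
import random


def sum_of_dice(n_dice=10):
    """Return the sum of n_dice independent rolls of a fair six-sided die."""
    return sum(random.choice([1, 2, 3, 4, 5, 6]) for _ in range(n_dice))


ys = [sum_of_dice() for rep in range(100)]

print(min(ys), max(ys))                   # both should lie between 10 and 60
```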
To complete the example, we need to plot the histogram.
为了完成这个例子,我们需要绘制直方图。
We type plt.hist and pass in ys.
输入plt.hist,并传入ys。
And we already have y values stored so we can just try running plt.hist.
我们已经存储了y值,所以我们可以试着运行plt.hist。
Let’s now try rerunning this code.
现在让我们尝试重新运行此代码。
But instead of doing it 100 times, let’s do it 10,000 times.
但与其做100次,不如让我们做10000次。
I will rerun all of the code and this is the output I get.
我将重新运行所有代码,这是我得到的输出。
Let’s run this one more time.
让我们再运行一次。
Again, I’m going to add the semi-colon at the end of plt.hist, which suppresses the output.
同样,我将在plt.hist的末尾添加分号,这将抑制输出。
That output is just a couple of arrays that plt.hist returns to me.
那些输出只是plt.hist返回给我的几个数组。
And I’m then going to add two 0s to my range, which means that I will be repeating this process 1 million times.
然后我将在我的范围内增加两个0,这意味着我将重复这个过程一百万次。
Let me run this.
让我来处理这个。
This will take a couple of seconds.
这需要几秒钟的时间。
And in this case, what we see is a beautiful histogram.
在这个例子中,我们看到的是一个漂亮的柱状图。
You can see that the shape of the histogram looks a bit like what we anticipated.
您可以看到直方图的形状看起来有点像我们预期的。
And you can get a better sense of the shape by varying the number of bins that you’re using to plot the histogram.
通过改变用于绘制直方图的区间(bin)数量,可以更好地了解形状。
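As one way to control the bins, the sketch below gives every possible sum from 10 to 60 its own unit-width bin, reusing the np.linspace idea from the single-die histogram. It assumes matplotlib and NumPy are available, uses 10,000 repetitions to keep the run short, and resets y inside the outer loop (an assumption, as the narration does not say where the reset goes).

```python
import random

import matplotlib.pyplot as plt
import numpy as np

ys = []
for rep in range(10_000):
    y = 0
    for k in range(10):
        y = y + random.choice([1, 2, 3, 4, 5, 6])
    ys.append(y)

# 52 evenly spaced edges from 9.5 to 60.5 give 51 unit-width bins,
# one centred on each possible sum from 10 to 60.
edges = np.linspace(9.5, 60.5, 52)
counts, _, _ = plt.hist(ys, bins=edges)
```

Passing an integer instead, for example bins=20, lets plt.hist choose equally wide bins over the range of the data, which is a quick way to see how the apparent shape depends on the bin count.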
But to understand what’s happening here, we can again get some insights from probability theory.
但是为了理解这里发生了什么,我们可以再次从概率论中获得一些见解。
The so-called central limit theorem, or CLT, states that the sum of a large number of random variables regardless of their distribution will approximately
所谓的中心极限定理(CLT)指出,大量随机变量的总和,无论其分布如何,都将
follow a normal distribution.
遵循正态分布。
There are some additional considerations that we will not get into but the main point is the following:
我们将不讨论其他一些考虑因素,但要点如下:
You can sum together many random variables whose distributions are nothing like a normal distribution, such as die rolls or even coin flips.
您可以把许多分布与正态分布毫不相似的随机变量相加,例如掷骰子,甚至抛硬币。
And yet, the distribution of the sum will get closer and closer to a normal distribution as the number of random variables that are added together increases.
然而,随着随机变量数量的增加,总和的分布会越来越接近正态分布。
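A rough numerical check of this claim for the dice example, offered as a sketch and not as part of the lecture: a single fair die has mean 3.5 and variance 35/12, so the sum of 10 independent dice has mean 35 and variance 350/12. The code compares the simulated share of sums between 30 and 40 with what a normal distribution with those parameters predicts, using statistics.NormalDist from the standard library.

```python
import random
from statistics import NormalDist, mean, pstdev

faces = [1, 2, 3, 4, 5, 6]
ys = [sum(random.choice(faces) for _ in range(10)) for _ in range(50_000)]

# Theory for the sum of 10 fair dice: mean 10 * 3.5, variance 10 * 35 / 12.
mu = 10 * 3.5
sigma = (10 * 35 / 12) ** 0.5
normal = NormalDist(mu, sigma)

# Share of sums between 30 and 40: simulated vs. normal approximation.
# The half-unit offsets account for y taking only integer values.
empirical = sum(30 <= y <= 40 for y in ys) / len(ys)
approx = normal.cdf(40.5) - normal.cdf(29.5)

print(round(mean(ys), 2), round(pstdev(ys), 2))   # compare with mu and sigma
print(round(empirical, 3), round(approx, 3))
```

The two shares should agree closely even with only 10 dice, and repeating the experiment with more dice per sum should bring them closer still.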
The central limit theorem not only helps us understand our simulation results, but it also explains why the normal distribution, sometimes called a Gaussian distribution, occurs so often.
中心极限定理不仅帮助我们理解我们的模拟结果,而且还解释了为什么正态分布,有时称为高斯分布,经常出现。
For example, the height of a person probably depends on a large number of factors that are related to things like genetics, nutrition, environment, and so on.
例如,一个人的身高可能取决于许多与遗传、营养、环境等相关的因素。
If we think of height as being a random variable that itself consists of a large number of other random variables that are added together,
如果我们认为高度是一个随机变量,它本身由大量其他随机变量加在一起组成,
we would expect the height of a person in a population to follow the normal distribution.
我们期望人口中的身高服从正态分布。
That is, in fact, what we know to be the case from empirical data.
事实上,这就是我们从经验数据中知道的情况。
