Python数据分析(中英对照)·Generating Histograms-生成直方图
2.3.4: Generating Histograms-生成直方图
There are many ways to generate histograms.
有许多方法可以生成直方图。
Let’s look at one of them, which uses the hist function from matplotlib.pyplot, imported here as plt.
让我们看看其中一种,即使用matplotlib.pyplot(此处导入为plt)中的hist函数。
To get started, let’s first generate some random numbers.
首先,让我们生成一些随机数。
In this case, we’ll use the np.random.normal function to generate 1,000 samples, or draws, from the standard normal distribution, which is the special case of the normal distribution with mean 0 and variance 1.
在这种情况下,我们将使用np.random.normal函数从标准正态分布中生成1000个样本(即抽样值);标准正态分布是正态分布的一个特例,其均值等于0,方差等于1。
Let me first import the matplotlib.pyplot library as plt.
首先让我将matplotlib.pyplot库作为plt导入。
And then I’m also going to import the NumPy library.
然后我还要导入NumPy库。
The first step is going to be to create random numbers.
第一步是创建随机数。
Here we’ll be using np.random.normal.
这里我们将使用np.random.normal。
And I need to specify how many numbers I’d like to draw from the standard normal distribution.
我需要指定从标准正态分布中提取多少个数字。
In this case, I’d like to have 1,000 numbers.
在这种情况下,我想要1000个数字。
And I will assign this array to a variable x.
我将把这个数组赋给变量x。
We’ll then call the plt.hist function.
然后我们将调用plt.hist函数。
And Python returns a histogram to us.
Python会向我们返回一个直方图。
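The steps so far can be put together in a short sketch. The variable names counts, edges, and patches are my own choice and do not come from the lecture; they simply capture the three values that plt.hist returns.

```python
import numpy as np
import matplotlib.pyplot as plt

# Draw 1,000 samples from the standard normal distribution (mean 0, variance 1).
x = np.random.normal(size=1000)

# plt.hist draws the histogram on the current axes and returns the bin counts,
# the bin edges, and the drawn bar objects.
counts, edges, patches = plt.hist(x)

# In a script, call plt.show() afterwards to display the figure.
```

With the default settings there are 10 bins, so counts has 10 entries and edges has 11.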
There are many optional parameters in the hist function.
hist函数中有许多可选参数。
And the best way to get to know them is to explore online documentation.
了解他们的最好方法是浏览在线文档。
You can just Google plt hist, and you’ll get many hits.
你可以在谷歌上搜索plt hist,你会得到很多结果。
To demonstrate some of the features of hist, I’ll show how to normalize the histogram and how to provide the locations of the bin edges that are used to construct the histogram.
为了演示hist的一些功能,我将演示如何规范化直方图,以及如何提供用于构造直方图的bin边的位置。
By default, hist uses 10 equal-width bins that span the range of the data, from the smallest value to the largest.
默认情况下,hist使用10个等宽的箱子,覆盖从数据最小值到最大值的范围。
But sometimes you really want to be able to specify the bins yourself.
但有时你确实希望能够自己指定箱子。
Let’s go back to our previous example where we had generated 1,000 random variables from the standard normal distribution.
让我们回到上一个例子,我们从标准正态分布生成了1000个随机变量。
We’ll continue working with our plt.hist example, except that we will first add one extra argument, density (called normed in older versions of Matplotlib, which removed that name in version 3.1).
我们将继续使用plt.hist示例,不过首先要添加一个额外的参数density(在旧版Matplotlib中称为normed,该名称已在3.1版中移除)。
When we set density to True, the y-axis no longer shows the number of observations that fall in each bin. Instead, the height of each bar is a probability density: the proportion of observations in the bin divided by the bin width, so that the areas of the bars sum to 1.
当我们将density设置为True时,y轴显示的不再是落入每个箱子的观测数量,而是概率密度:每个条形的高度等于该箱子中观测值所占的比例除以箱子宽度,因此所有条形的面积之和为1。
That’s what it means for a histogram to be normalized.
这就是直方图归一化的含义。
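Here is a minimal sketch of a normalized histogram. It uses the density keyword, which is the name current Matplotlib accepts; the normed keyword was removed in Matplotlib 3.1. The area check at the end is my own addition, included only to illustrate what normalization means.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(size=1000)

# With density=True the bar heights are probability densities, not counts.
density, edges, _ = plt.hist(x, density=True)

# Each bar's area is height times width; the areas add up to 1.
area = np.sum(density * np.diff(edges))
```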
To provide the locations of the bin edges, we use a keyword argument called bins.
为了提供bin的位置,我们使用了一个名为bins的关键字参数。
I’m going to construct the bins using the np.linspace function.
我将使用np.linspace函数来构造箱子。
Remember, the first argument is the starting point.
记住,第一个参数是起点。
I’m going to start at minus 5.
我将从-5开始。
I want to go all the way to plus 5.
我想一直到+5。
And I’d like to have 21 points.
我想要21个点。
In this case, you see that the histogram looks different.
在本例中,您会看到直方图看起来有所不同。
That’s because we have specified 20 bins between the numbers minus 5 and plus 5.
这是因为我们在-5和+5之间指定了20个箱子。
In the previous example, we generated 21 points to get 20 bins.
在上一个示例中,我们生成了21个点,得到了20个箱子。
Let’s make sure we understand why.
让我们确保我们了解原因。
If we start with the simplest example, where we have just one bin, we need to specify the start location of the bin and the end location of that bin.
如果我们先考虑只有一个箱子的最简单例子,那么我们需要指定这个箱子的开始位置和结束位置。
So to get one bin, we need two points.
所以要得到一个箱子,我们需要两个点。
If we had two bins we would need three points, and so on.
如果我们有两个箱子,我们需要三个点,以此类推。
This is why, to have 20 bins, we need 20 plus 1, or 21, points along the x-axis.
这就是为什么要有20个箱子,我们需要沿着x轴有20加1或21个点。
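The relationship between edge points and bins can be checked directly. This is a sketch rather than the lecture's exact code: the variable names are mine, and density=True stands in for the normalization option discussed earlier.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(size=1000)

# 21 evenly spaced edge points from -5 to 5 define 20 bins, each 0.5 wide.
bins = np.linspace(-5, 5, 21)

counts, edges, _ = plt.hist(x, density=True, bins=bins)

n_edges = len(edges)   # number of edge points
n_bins = len(counts)   # number of bins, always one fewer than the edges
```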
Let’s then examine another distribution, which is a bit more exotic, the so-called gamma distribution.
然后让我们研究另一个稍微少见一些的分布,即所谓的伽马分布。
It is a continuous probability distribution whose density function starts at 0 and goes all the way to positive infinity.
它是一个连续的概率分布,其密度函数从0开始一直到正无穷大。
The gamma distribution, like the normal distribution, has two parameters.
伽马分布与正态分布一样,有两个参数。
For now, we don’t need to be too concerned about the nature of the gamma distribution.
现在,我们不需要太担心伽马分布的性质。
We just know that it’s some type of probability distribution, and we’d like to investigate its shape using histograms.
我们只知道它是某种概率分布,我们想用直方图来研究它的形状。
In this case, we’re going to draw a large number of samples from the gamma distribution.
在这种情况下,我们将从伽马分布中提取大量样本。
Let’s go with 100,000 samples which would give us a very smooth histogram.
让我们用100000个样本,这将给我们一个非常平滑的直方图。
We’ll also meet here a plt function called subplot, which enables us to have several subplots within each figure.
我们还将在这里遇到一个名为subplot的plt函数,它使我们能够在每个图形中拥有多个子图。
The subplot function takes three arguments, where the first two specify the number of rows and the number of columns in the subplot grid, and the third argument gives the plot number.
subplot函数接受三个参数,其中前两个参数指定子图网格的行数和列数,第三个参数指定子图编号。
For example, if you specify two rows and three columns, then you will have six subplots.
例如,如果指定两行三列,则将有六个子图。
The plot number always starts at 1, so in a two by three subplot grid, the plot numbers range from 1 to 6, and they increase from left to right across each row before moving down to the next row.
子图编号始终从1开始,因此在两行三列的子图网格中,子图编号的范围为1到6,编号先沿每一行从左到右递增,然后再转到下一行。
Let’s see how this works.
让我们看看这是怎么回事。
Let’s look at a two by three subplot.
让我们看一个二乘三的子图网格。
In this case, we have two rows and three columns.
在本例中,我们有两行三列。
So this would be a two by three subplot.
所以这将是一个二乘三的子图网格。
The first panel in the top left corner is subplot number 1.
左上角的第一个面板是1号子图。
This is number 2.
这是2号。
And this is number 3.
这是3号。
Then we move on to the next row.
然后我们进入下一排。
We have number 4, number 5, and number 6.
我们有4号、5号和6号。
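The numbering scheme can be made visible with a small sketch that creates all six panels and labels each one with its plot number. The loop and the text labels are my own illustration, not part of the lecture.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# Plot numbers run from 1 to 6: left to right along the top row first,
# then left to right along the bottom row.
for number in range(1, 7):
    ax = plt.subplot(2, 3, number)
    ax.text(0.5, 0.5, str(number), ha="center", va="center")
```

Displaying the figure shows 1, 2, 3 across the top row and 4, 5, 6 across the bottom row.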
Let’s first draw our samples from the gamma distribution.
我们先从伽马分布中提取样本。
I’m going to call my variable "x".
我将把我的变量称为“x”。
We’re using the np.random.gamma function.
我们使用的是np.random.gamma函数。
The first two arguments are 2 and 3, which are the shape and scale parameters of the distribution.
前两个参数是2和3,分别是分布的形状参数和尺度参数。
And the third one specifies that we would like to have 100,000 samples from this specific gamma distribution.
第三个指定我们想要从这个特定的伽马分布中获得100000个样本。
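Drawing the samples is a single call. In np.random.gamma the first argument is the shape and the second is the scale, so this distribution has mean 2 × 3 = 6. The sample_mean variable is my own addition, used only as a quick sanity check.

```python
import numpy as np

# shape = 2, scale = 3, and 100,000 draws
x = np.random.gamma(2, 3, 100000)

# With this many draws, the sample mean should land close to shape * scale = 6.
sample_mean = x.mean()
```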
To learn about the different histogram options, we have code here that demonstrates four different ways of drawing a histogram.
为了了解不同的直方图选项,我们这里的代码演示了绘制直方图的四种不同方法。
I’ll walk you through now each one of them.
现在我会带你逐一看一遍。
We’ll then in the end create one plot that consists of four subplots to demonstrate the use of the subplot function.
最后,我们将创建一个由四个子图组成的图,以演示子图函数的使用。
First, we’ll just use the plt.hist function.
首先,我们将只使用plt.hist函数。
We’ll provide x, our input vector, and we specify the number of bins, which in this case is 30.
我们将提供x,我们的输入向量,并指定箱子的数量,在本例中为30。
And this is what the histogram looks like.
这就是直方图的样子。
If we’d like to normalize this histogram, we can use the keyword argument density (normed in older versions of Matplotlib) and set it equal to True.
如果我们想归一化这个直方图,我们可以使用关键字参数density(在旧版Matplotlib中为normed),并将其设置为True。
In this case, the histogram will be normalized.
在这种情况下,直方图将标准化。
We can also try looking at the cumulative histogram.
我们还可以尝试查看累积直方图。
So we’ll say cumulative equals true.
所以我们会说累积等于真。
And in this case, we get the cumulative histogram.
在这种情况下,我们得到了累积直方图。
We can also have both the density and cumulative options on at the same time.
我们还可以同时启用density选项和cumulative选项。
In this case, I can just add density equals True.
在这种情况下,我可以加上density等于True。
And I can also change the histogram type.
我还可以改变直方图的类型。
I can do that by using the histtype keyword argument.
我可以通过使用histtype关键字参数来实现这一点。
And I’d like to use a step histogram here.
我想在这里用一个阶跃直方图。
And this is the output that I get.
这是我得到的输出。
We can now pool all of these four different histograms into one figure.
我们现在可以将这四个不同的直方图合并到一个图中。
I will first create a figure by saying plt.figure.
我将首先通过说plt.figure创建一个图形。
And I then insert each of these histograms into its own subplot.
然后我将这些直方图插入到它自己的子图中。
Let’s see what happens.
让我们看看会发生什么。
In this case, we have created just one figure with four panels where each type of histogram appears in its own subplot.
在本例中,我们只创建了一个具有四个面板的图形,其中每种类型的直方图都显示在其自己的子图中。
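The four variants and the combined figure might look like the sketch below. The lecture does not show the exact code, so the layout (a two by two grid written as 221 to 224) and the variable names are assumptions of mine. The density keyword replaces normed, which current Matplotlib no longer accepts.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.gamma(2, 3, 100000)

fig = plt.figure()

# Panel 1: plain counts in 30 bins.
plt.subplot(221)
n1, e1, _ = plt.hist(x, bins=30)

# Panel 2: normalized so that the bar areas sum to 1.
plt.subplot(222)
n2, e2, _ = plt.hist(x, bins=30, density=True)

# Panel 3: cumulative counts; the last bar reaches the total sample size.
plt.subplot(223)
n3, e3, _ = plt.hist(x, bins=30, cumulative=True)

# Panel 4: normalized and cumulative, drawn as a step outline;
# the last value reaches 1.
plt.subplot(224)
n4, e4, _ = plt.hist(x, bins=30, density=True, cumulative=True, histtype="step")
```

The three-digit form 221 is shorthand for subplot(2, 2, 1), which works as long as each number is a single digit.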