The Normal distribution is arguably the most important continuous distribution. It is used throughout the sciences because of a remarkable result known as the central limit theorem, which is covered in the module Inference for means. Because of the phenomenon behind the central limit theorem, many variables have an empirical distribution that is close to a Normal distribution.

If \(X\) has a Normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then we write \(X \stackrel{\mathrm{d}}{=} \mathrm{N}(\mu,\sigma^2)\); the probability density function of \(X\) is given by \[ f_X(x) = \dfrac{1}{\sigma \sqrt{2\pi}} \exp\Bigl(\dfrac{-(x-\mu)^2}{2\sigma^2}\Bigr), \qquad\text{for } x \in \mathbb{R}. \]

This distribution is so important that it is well known in general culture, where it is often referred to as the bell curve; for example, in the controversial 1994 book by R. J. Herrnstein and C. Murray entitled The Bell Curve: Intelligence and Class Structure in American Life.

Figure 2: The pdf of a Normal random variable with mean \(\mu\) and standard deviation \(\sigma\).

Several properties of the Normal distribution are worth noting:
Figure 3: Probabilities of three intervals for the Normal distribution.
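As a quick check on the pdf formula, the following Python sketch evaluates \(f_X(x)\) directly from the definition and compares it with the standard library's statistics.NormalDist (Python 3.8 or later). The function name normal_pdf is our own, and the values mean 26 000 and standard deviation 5000 are illustrative only.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Evaluate the N(mu, sigma^2) pdf directly from the formula for f_X(x)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# The curve peaks at x = mu and is symmetric about mu,
# so f_X(mu - d) equals f_X(mu + d) for any d.
mu, sigma = 26_000, 5_000
for x in (16_000, 21_000, 26_000, 31_000, 36_000):
    ours = normal_pdf(x, mu, sigma)
    ref = NormalDist(mu, sigma).pdf(x)  # standard-library reference value
    print(f"x = {x:>6}: f_X(x) = {ours:.6e} (NormalDist: {ref:.6e})")
```

The symmetry about \(\mu\) and the peak height \(1/(\sigma\sqrt{2\pi})\) can both be read off the printed values.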
Suppose that \(f_X(x)\) is the pdf of a Normal random variable with mean \(\mu\) and standard deviation \(\sigma\).
Recall that, for continuous random variables, it is the cumulative distribution function (cdf) and not the pdf that is used to find probabilities, because we are always concerned with the probability of the random variable lying in an interval.

Before considering the cdf of \(X \stackrel{\mathrm{d}}{=} \mathrm{N}(\mu,\sigma^2)\), we explore a very useful feature of the Normal distribution. A random variable with the standard Normal distribution, commonly denoted by \(Z\), has mean zero and standard deviation one. That is, \(Z \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\). The pdf for the standard Normal distribution is \[ f_Z(z) = \dfrac{1}{\sqrt{2\pi}} \exp\bigl(-\tfrac{1}{2} z^2\bigr), \qquad\text{for } z \in \mathbb{R}. \]

Probabilities for any Normal distribution can be reduced to probabilities for the standard Normal distribution, using the device of standardisation, as shown by the following result.

Standardisation of a Normal distribution

If \(X \stackrel{\mathrm{d}}{=} \mathrm{N}(\mu,\sigma^2)\) and \(X_s = \dfrac{X-\mu}{\sigma}\), then \(X_s \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\).

Proof
The result is established by first considering the cdf of \(X_s\). We have \begin{align*} F_{X_s}(z) &= \Pr(X_s \leq z) \\ &= \Pr\Bigl(\dfrac{X-\mu}{\sigma} \leq z\Bigr) \\ &= \Pr(X \leq \sigma z + \mu) \\ &= F_X(\sigma z + \mu). \end{align*}Hence, \begin{align*} f_{X_s}(z) &= \dfrac{d}{dz} F_{X_s}(z) \\ &= \dfrac{d}{dz} F_X(\sigma z + \mu) \\ &= \sigma f_X(\sigma z + \mu) \qquad\qquad \text{(by the chain rule)} \\ &= \sigma \cdot \dfrac{1}{\sigma \sqrt{2\pi}} \exp\Bigl(\dfrac{-(\sigma z + \mu - \mu)^2}{2\sigma^2}\Bigr) \\ &= \dfrac{1}{\sqrt{2\pi}} \exp\bigl(-\tfrac{1}{2} z^2\bigr). \end{align*}It follows that \(X_s \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\). \(\Box\)

Finding probabilities for the standard Normal distribution requires technology: the cdf of \(Z \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\) is \[ F_Z(z) = \int_{-\infty}^z \dfrac{1}{\sqrt{2\pi}} \exp\bigl(-\tfrac{1}{2}t^2\bigr) \;dt. \]This integral does not have a closed form in terms of elementary functions, and must be evaluated numerically. It is available in statistical software, on many calculators, in Matlab and in Excel; here we describe the Excel function. It is \(\sf \text{NORM.S.DIST}\), which requires two arguments: the value \(z\) at which the function is evaluated, and a cumulative flag, which should be 1 (or TRUE) to obtain the cdf \(F_Z(z)\); setting it to 0 (or FALSE) returns the pdf \(f_Z(z)\) instead.
For example, to use Excel to find the value of \(F_Z(1.5)\), the cdf of the standard Normal distribution when \(z=1.5\), enter \[ \sf \text{=NORM.S.DIST(1.5, 1)} \]in a cell and hit return. You should obtain the value 0.9332.
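To make the idea of numerical evaluation concrete, here is a minimal Python sketch that approximates \(F_Z(z)\) with composite Simpson's rule. Simpson's rule is only one of many possible numerical methods and is not necessarily what Excel uses internally; the helper names std_normal_pdf and std_normal_cdf are our own. The sketch relies on the symmetry of \(f_Z\) about 0, which gives \(F_Z(0) = \tfrac{1}{2}\) and hence \(F_Z(z) = \tfrac{1}{2} + \int_0^z f_Z(t)\,dt\), and it compares the result with statistics.NormalDist().cdf from Python's standard library.

```python
import math
from statistics import NormalDist


def std_normal_pdf(t: float) -> float:
    """Standard Normal pdf f_Z(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float, n: int = 1000) -> float:
    """Approximate F_Z(z) = 1/2 + integral_0^z f_Z(t) dt by composite Simpson's rule.

    n is the number of subintervals and must be even. For z < 0 the step h is
    negative, so the integral is negative and the result falls below 1/2.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = z / n
    total = std_normal_pdf(0.0) + std_normal_pdf(z)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1
        total += weight * std_normal_pdf(i * h)
    return 0.5 + total * h / 3.0


print(round(std_normal_cdf(1.5), 4))    # compare with =NORM.S.DIST(1.5, 1)
print(round(NormalDist().cdf(1.5), 4))  # standard-library reference
```

Both lines should agree with the Excel value 0.9332 to four decimal places.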
Suppose that crowd size at home games for a particular football club follows a Normal distribution with mean \(26\;000\) and standard deviation \(5000\). What percentage of crowds are between \(31\;000\) and \(36\;000\)?

We standardise to solve this. Let \(X \stackrel{\mathrm{d}}{=} \mathrm{N}(26\;000, 5000^2)\). Then \(X_s = \dfrac{X - 26\;000}{5000} \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\), and therefore \begin{align*} \Pr(31\;000 < X < 36\;000) &= \Pr\Bigl(\dfrac{31\;000 - 26\;000}{5000} < \dfrac{X - 26\;000}{5000} < \dfrac{36\;000 - 26\;000}{5000}\Bigr) \\ &= \Pr(1 < X_s < 2) \\ &= F_{X_s}(2) - F_{X_s}(1) \\ &= 0.9772 - 0.8413 \\ &= 0.1359. \end{align*}So about 13.6% of crowds are between \(31\;000\) and \(36\;000\).

Note that, in this example, \(31\;000 = \mu + \sigma\) and \(36\;000 = \mu + 2 \sigma\). If the mean and the standard deviation were different from these, but we still sought the probability of being between one and two standard deviations greater than the mean, then the same probability would be obtained. This is illustrated in Figure 4, in which the same probability as that obtained in the example (\(0.1359\)) is found in all four cases.

Figure 4: Four Normal probability density functions.

The cdf of any Normal distribution can also be found, using technology, without first standardising. If \(X \stackrel{\mathrm{d}}{=} \mathrm{N}(\mu,\sigma^2)\), then the cdf of \(X\) is given by \[ \Pr(X \leq x) = F_X(x) = \int_{-\infty}^x \dfrac{1}{\sigma \sqrt{2\pi}} \exp\Bigl(\dfrac{-(t-\mu)^2}{2\sigma^2}\Bigr) \;dt, \qquad\text{for } x \in \mathbb{R}. \]One way to obtain this is in Excel using the function \(\sf \text{NORM.DIST}\). This function requires four arguments: the value \(x\), the mean \(\mu\), the standard deviation \(\sigma\), and a cumulative flag, which should be 1 (or TRUE) to obtain the cdf \(F_X(x)\).
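The two routes for the crowd-size example (standardising first, or using the cdf of \(X\) directly) can be compared with a short Python sketch. It uses statistics.NormalDist from Python's standard library in place of Excel: NormalDist(mu, sigma).cdf(x) plays the role of \(\sf \text{NORM.DIST}(x, \mu, \sigma, 1)\). The extra parameter pairs in the final loop are arbitrary illustrative choices, included to echo the point made by Figure 4.

```python
from statistics import NormalDist

mu, sigma = 26_000, 5_000  # crowd-size mean and standard deviation
a, b = 31_000, 36_000      # interval of interest

# Route 1: standardise, then use the standard Normal cdf F_Z.
Z = NormalDist(0, 1)
z_a = (a - mu) / sigma  # = 1
z_b = (b - mu) / sigma  # = 2
p_standardised = Z.cdf(z_b) - Z.cdf(z_a)

# Route 2: use the cdf of X ~ N(mu, sigma^2) directly.
X = NormalDist(mu, sigma)
p_direct = X.cdf(b) - X.cdf(a)

print(f"{p_standardised:.4f} {p_direct:.4f}")

# Between mu + sigma and mu + 2 sigma, every Normal distribution gives the same probability.
for m, s in [(0, 1), (100, 15), (-3, 0.5)]:
    Y = NormalDist(m, s)
    print(m, s, round(Y.cdf(m + 2 * s) - Y.cdf(m + s), 4))
```

Both routes give 0.1359, and so does every row of the final loop.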
We can use this function to find the required probabilities in the crowd-size example directly. For example, you should find that typing \[ \sf \text{=NORM.DIST(36000, 26000, 5000, 1)} \]returns the value 0.9772, which is \(\Pr(X \leq 36\;000) = F_{X_s}(2)\).

Sometimes we need to find a quantile of the Normal distribution. Let \(q\) be a number between 0 and 1. Then the \(q\) quantile, \(c_q\), of the Normal distribution with cdf \(F_X\) is defined by the equation \[ F_X(c_q) = q. \]To obtain the value of \(c_q\), we can use technology. In Excel, for example, the function is \(\sf \text{NORM.INV}\). It requires three arguments: the probability \(q\), the mean \(\mu\) and the standard deviation \(\sigma\).
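The defining equation \(F_X(c_q) = q\) can also be solved numerically. The following Python sketch does this by bisection, which works because a cdf is increasing; it is an illustration only, not a description of how Excel computes \(\sf \text{NORM.INV}\), and the name normal_quantile is our own. Python's standard library already provides the quantile directly as statistics.NormalDist(mu, sigma).inv_cdf(q), which the sketch uses as a check. The choice \(q = 0.9\) with the crowd-size parameters is illustrative.

```python
from statistics import NormalDist


def normal_quantile(q: float, mu: float, sigma: float, tol: float = 1e-10) -> float:
    """Find c_q with F_X(c_q) = q for X ~ N(mu, sigma^2), by bisection."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    F = NormalDist(mu, sigma).cdf
    # F is increasing, so widen [lo, hi] until F(lo) <= q <= F(hi).
    lo, hi = mu - sigma, mu + sigma
    while F(lo) > q:
        lo -= sigma
    while F(hi) < q:
        hi += sigma
    for _ in range(200):  # cap iterations to guard against floating-point stalls
        mid = 0.5 * (lo + hi)
        if F(mid) < q:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * sigma:
            break
    return 0.5 * (lo + hi)


q = 0.9
ours = normal_quantile(q, 26_000, 5_000)
ref = NormalDist(26_000, 5_000).inv_cdf(q)  # analogue of =NORM.INV(0.9, 26000, 5000)
print(round(ours, 2), round(ref, 2))
```

In practice the built-in inverse (NORM.INV in Excel, inv_cdf in Python) is the tool to use; bisection simply shows that a quantile is the solution of \(F_X(c_q) = q\).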
Suppose that the difference between the forecast maximum temperature and the actual maximum temperature (in degrees Celsius) in a city is Normally distributed with mean 0 and standard deviation 1.2.
Animals of a given weight are operated on in a veterinary hospital. The dose of anaesthetic \(A\) (in mg) required to render the animals suitably unconscious for the operation is Normally distributed with mean 120 and standard deviation 20. The lethal dose \(L\) (in mg) of the same anaesthetic for these animals is also Normally distributed, with mean 400 and standard deviation 50.
We shall study confidence intervals in the two modules Inference for proportions and Inference for means. In that context, we want to know the bounds of the central 95% of the distribution for \(Z \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\). That is, we want \(z\) such that \[ \Pr(-z < Z < z) = 0.95. \]We can find this \(z\) using the same techniques as for quantiles. Since the standard Normal distribution is symmetric about 0, we require \[ \Pr(Z \leq -z) = \tfrac{1}{2}(1 - 0.95) = 0.025 \qquad\text{and}\qquad \Pr(Z \geq z) = \tfrac{1}{2}(1 - 0.95) = 0.025. \]So we want \[ F_Z(z) = \Pr(Z \leq z) = 1 - 0.025 = 0.975. \]We can now find \(z\) in Excel using \[ \sf \text{=NORM.INV(0.975, 0, 1)}, \]which gives 1.96. This is illustrated in the following figure.

Figure 5: The standard Normal distribution, \(Z \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1)\).

More generally, if we are given a probability \(p\) and we want \(z\) with \(\Pr(-z < Z < z) = p\), then, since symmetry gives \(\Pr(-z < Z < z) = F_Z(z) - F_Z(-z) = 2F_Z(z) - 1\), we find \(z\) such that \[ F_Z(z) = \dfrac{p+1}{2}. \]
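The general rule \(F_Z(z) = (p+1)/2\) translates directly into code. This Python sketch uses statistics.NormalDist().inv_cdf in place of \(\sf \text{NORM.INV}(\cdot, 0, 1)\) and then checks that the central probability \(p\) is recovered; the helper name central_z and the chosen values of \(p\) are illustrative.

```python
from statistics import NormalDist


def central_z(p: float) -> float:
    """Return z with Pr(-z < Z < z) = p for Z ~ N(0, 1), using F_Z(z) = (p + 1) / 2."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((p + 1) / 2)


Z = NormalDist()
for p in (0.90, 0.95, 0.99):
    z = central_z(p)
    coverage = Z.cdf(z) - Z.cdf(-z)  # should recover p
    print(f"p = {p:.2f}: z = {z:.4f}, Pr(-z < Z < z) = {coverage:.4f}")
```

For \(p = 0.95\) the sketch reproduces the value 1.96 found with Excel above.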