Empirical Rule at Five Levels

  1. Child: Imagine we have a pile of apples. Most of them are pretty much the same size, but some are really big and some are really small. The Empirical Rule is like saying: if we line up all the apples from smallest to biggest, about 7 out of 10 will be close to the middle size. If we also count the ones that are a bit smaller or a bit bigger, that covers about 19 out of 20. And if we count the really small and really big ones too, that covers almost every apple in the pile.

  2. Teenager: Let’s say you are in a class, and you’ve got test scores from all students. Most students will score around the average, right? But some students will score very high or very low. The Empirical Rule is like a guideline that says if your class scores are spread out in a bell-shaped pattern (like a hill), about 68% of the scores will be within one standard deviation from the average score, 95% within two standard deviations, and almost all (99.7%) within three standard deviations.
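To make the classroom picture concrete, here is a small sketch that turns the rule into score ranges. The mean of 70 and the standard deviation of 10 are made-up numbers for a hypothetical class, and `score_band` is an illustrative helper name, not something from a library.

```python
# Hypothetical class: these two numbers are invented for illustration.
mean_score = 70.0
std_dev = 10.0

def score_band(k):
    """Return the (low, high) score range within k standard deviations of the mean."""
    return (mean_score - k * std_dev, mean_score + k * std_dev)

for k, share in [(1, "about 68%"), (2, "about 95%"), (3, "about 99.7%")]:
    low, high = score_band(k)
    print(f"{share} of scores fall between {low:.0f} and {high:.0f}")
```

With these assumed numbers, the three bands are 60 to 80, 50 to 90, and 40 to 100, and the percentages only hold if the scores really are roughly bell-shaped.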

  3. Undergrad majoring in the same subject: In statistics, the Empirical Rule, also known as the 68-95-99.7 rule, applies to a normal distribution (bell curve). It states that for a given dataset with a normal distribution, approximately 68% of the data points will fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. This rule provides a quick estimate of the probability of certain outcomes in a normal distribution.

  4. Grad student: The Empirical Rule, a staple of inferential statistics, pertains to the properties of the normal distribution. It provides a rapid method for assessing probabilities associated with a normally distributed variable. The rule, also known as the 68-95-99.7 rule, stipulates that about 68% of observations will fall within ±1 standard deviation of the mean, about 95% will fall within ±2 standard deviations, and about 99.7% will fall within ±3 standard deviations. The same reasoning underlies the familiar "two standard errors" heuristic used in hypothesis testing and in constructing approximate 95% confidence intervals.
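The rule's multipliers are rounded. As a rough sketch of how the rounded values relate to the exact critical values used in interval construction, the snippet below uses `statistics.NormalDist` from the Python standard library. The helper names `two_sided_z` and `central_coverage` are illustrative choices, not part of any library.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def two_sided_z(confidence):
    """z such that P(-z < Z < z) equals `confidence` for a standard normal Z."""
    tail = (1.0 - confidence) / 2.0
    return std_normal.inv_cdf(1.0 - tail)

def central_coverage(z):
    """P(-z < Z < z) for a standard normal Z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for conf in (0.68, 0.95, 0.997):
    print(f"{conf:.1%} central coverage -> z = {two_sided_z(conf):.3f}")

print(f"coverage at z = 2: {central_coverage(2.0):.4f}")
```

For a 95% two-sided interval the exact multiplier is about 1.96 rather than 2, which is why ±2 standard deviations covers slightly more than 95%.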

  5. Colleague (Fellow Researcher/Engineer): The Empirical Rule, or the 68-95-99.7 rule, is a fundamental aspect of normal distributions in statistical analysis. It provides a foundational shorthand for understanding the distribution of data within the standard deviations of the mean, allowing us to make rapid, informed estimates about our data. However, it’s essential to remember that this rule only applies to normal or near-normal distributions and should be applied cautiously when dealing with skewed or otherwise non-normal distributions. The rule’s reliance on standard deviations also highlights the importance of understanding the potential impact of outliers on standard deviation as a measure of dispersion.
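Both cautions can be checked with a quick simulation. The sketch below is only illustrative: the exponential distribution stands in for "skewed data", the sample sizes, the seed, and the outlier value of 50 are arbitrary assumptions, and `coverage` is a hypothetical helper.

```python
import random
import statistics

def coverage(values, k):
    """Fraction of values within k sample standard deviations of the sample mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) < k * sigma)
    return inside / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(20_000)]  # right-skewed

for k in (1, 2, 3):
    print(f"k={k}: normal {coverage(normal_sample, k):.3f}, "
          f"exponential {coverage(skewed_sample, k):.3f}")

# A single extreme value (arbitrary choice: 50) inflates the standard deviation.
small_sample = normal_sample[:1000]
with_outlier = small_sample + [50.0]
print(f"std without outlier: {statistics.pstdev(small_sample):.2f}")
print(f"std with outlier:    {statistics.pstdev(with_outlier):.2f}")
```

For an exponential distribution with rate 1, the share within one standard deviation of the mean is 1 - e^-2, roughly 86%, well above the 68% the rule would suggest. The last two lines show how one extreme point widens every "k standard deviations" band.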

Richard Feynman Explanation

Imagine you’re playing darts at a local pub. You’ve been practicing, so you’re not bad at it. Most of your darts hit pretty close to the bullseye. But you’re not perfect, so some darts stray a bit farther away. Now, imagine if we measured how far to the left or right of the bullseye each dart landed and made a graph of those distances.

What we’d likely see is a “bell curve,” also known as a normal distribution. The bullseye, where most darts land, is right in the center of the curve, and as we move away from the bullseye, fewer and fewer darts land — so the curve gets lower.

Now, here’s the cool part. The Empirical Rule gives us a quick way to estimate where our darts are likely to land. It says that about 68% of your darts (let’s not quibble about whether it’s 68.27% or 68.28%, let’s just call it 68% for simplicity) will land within one ‘standard deviation’ from the bullseye. In dart terms, let’s say a ‘standard deviation’ is the length of one dart.

So, 68% of your darts land within a dart’s length from the bullseye. About 95% land within two dart lengths. And about 99.7% land within three dart lengths. That’s the Empirical Rule! It’s a handy way to estimate probabilities for things that follow a normal distribution, like dart throws, heights of people, test scores, and a whole lot more. Just remember: 68 - 95 - 99.7. Easy peasy!

Code

The Empirical Rule, also known as the 68-95-99.7 rule, is a statistical rule which states that for a normal distribution, almost all data falls within three standard deviations of the mean. More precisely:

  • About 68% of the data falls within one standard deviation of the mean.
  • About 95% falls within two standard deviations.
  • About 99.7% falls within three standard deviations.
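The three percentages above are rounded values of a closed-form expression: for a normal distribution, the probability of landing within k standard deviations of the mean is erf(k / √2). A minimal sketch using only the standard library follows; the function name `normal_coverage` is an illustrative choice.

```python
import math

def normal_coverage(k):
    """P(|X - mu| < k * sigma) for a normally distributed X, via the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_coverage(k):.4%}")

# Prints:
# within 1 standard deviation(s): 68.2689%
# within 2 standard deviation(s): 95.4500%
# within 3 standard deviation(s): 99.7300%
```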

In Python, you can use packages such as numpy, scipy, and matplotlib to check this rule numerically and visualize it.

Here is a simple code example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate some data for this demonstration.
# A seeded generator makes the printed fractions repeatable between runs.
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 10000)  # mean = 0, std deviation = 1

# Fit a normal distribution to the data:
mu, std = norm.fit(data)

# Calculate the Empirical Rule percentages.
# The mean of a boolean mask is the fraction of points that satisfy it.
one_std = np.mean(np.abs(data - mu) < std)
two_std = np.mean(np.abs(data - mu) < 2 * std)
three_std = np.mean(np.abs(data - mu) < 3 * std)

print(f'Data within one standard deviation: {one_std:.1%}')
print(f'Data within two standard deviations: {two_std:.1%}')
print(f'Data within three standard deviations: {three_std:.1%}')

# Plot the histogram.
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
plt.plot(x, p, 'k', linewidth=2)

plt.title('Empirical Rule for Normally Distributed Data')
plt.xlabel('Data')
plt.ylabel('Probability density')  # density=True normalizes the histogram

plt.show()

This code generates a set of normally distributed data (with mean 0 and standard deviation 1), calculates the percentages of data within one, two, and three standard deviations of the mean, and then prints these percentages. It also plots a histogram of the data and the corresponding probability density function.

To run this code, you need to have numpy, matplotlib, and scipy installed in your Python environment, which you can do with pip:

pip install numpy matplotlib scipy