Pareto Principle

Child: You know how sometimes you wear your favorite two or three outfits most of the time, even though you have lots of clothes in your closet? That’s kind of what the Pareto Principle is about. It says that we often get 80% of our results from just 20% of our effort or things.
Teenager: Let’s imagine you’re part of a team in a school project. Often, it might seem like 20% of the team members do 80% of the work. This is a real-world example of the Pareto Principle, which says that in many situations, a small number of causes are responsible for a large portion of the effects.
Undergrad majoring in the same subject: The Pareto Principle, also known as the 80/20 rule, is an observation that most things in life are not distributed evenly. It can mean that, for example, 80% of your company’s profits come from 20% of your customers, or 80% of complaints come from 20% of clients. It’s an incredibly useful concept that can be applied in areas such as economics, business, and project management.
Grad student: The Pareto Principle is named after the economist Vilfredo Pareto. Also called the principle of factor sparsity, it states that roughly 80% of effects often come from 20% of causes. It is used in fields such as quality control (the vital few and the trivial many), project management (focusing on the 20% of tasks that will generate 80% of the benefits), and machine learning, where it can help prioritize the most impactful features.
Colleague (Fellow Researcher/Engineer): As you’re aware, the Pareto Principle, or the 80/20 rule, is an empirical observation, not a law of nature. While it doesn’t apply to every situation, it’s a useful heuristic in diverse fields such as business management, economics, and engineering. We can use it to prioritize tasks, manage resources, and identify areas where small changes can have a large impact. However, it’s critical to remember that it is a rough guide rather than an exact ratio and can vary based on the context.

Example in Code

The Pareto Principle, also known as the 80/20 rule, states that roughly 80% of the effects come from 20% of the causes. In a programming context, this might mean that 80% of the bugs come from 20% of the code, or that 80% of the usage of a software application comes from 20% of the features.

Here’s an example that illustrates the Pareto Principle in Python, in the context of a simple retail system:

# Let's assume we have a list of products and the number of sales each product made
product_sales = {
    'product1': 200,
    'product2': 800,
    'product3': 150,
    'product4': 75,
    'product5': 50,
    'product6': 50,
    'product7': 500,
    'product8': 100,
    'product9': 75,
    'product10': 100,
}

# Calculate total sales
total_sales = sum(product_sales.values())

# Sort products by sales, in descending order
sorted_product_sales = sorted(product_sales.items(), key=lambda item: item[1], reverse=True)

# Find the top 20% of products (at least one, so short lists still work)
top_20_percent_products = sorted_product_sales[:max(1, len(sorted_product_sales) // 5)]

# Calculate the sales from the top 20% of products
top_20_percent_sales = sum(sales for _, sales in top_20_percent_products)

print("Total sales:", total_sales)
print("Sales from top 20% products:", top_20_percent_sales)

print("Percentage of total sales from top 20% products: "
      f"{top_20_percent_sales / total_sales * 100:.1f}%")

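This code first calculates the total sales, then sorts the products by sales and selects the top 20% of products. It sums the sales from these top products and prints the percentage of total sales they account for. With the sample data above, the top 20% (two products) account for 1,300 of 2,100 sales, or about 62%, so this data set is less concentrated than a strict 80/20 split.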

Of course, this is a simplistic example and real-world scenarios can be much more complex. But this should give you an idea of how the Pareto Principle can be applied in a coding context.

Q&A

It may not necessarily be 80/20. It could be 90/10, 70/30, and so on. So how do you determine the current split?

Yes, you’re right. The Pareto Principle doesn’t always strictly follow the 80/20 rule. It is an empirical rule of thumb, which suggests that in many situations, a small proportion of causes lead to a large proportion of effects. The actual ratio can vary depending on the context.

If you wanted to determine the current split in a real-world situation, you’d need to gather data and analyze it. In the context of the example I gave, you’d want to gather data about the sales of each product over a certain period of time.

You could then sort the products by sales and calculate the cumulative sales as a proportion of total sales for each product. The point at which this cumulative proportion reaches a certain percentage (e.g., 80%) would give you the proportion of products that account for that percentage of sales.

Here’s an updated version of the Python script that calculates this:

import numpy as np

# Reuses product_sales and total_sales from the first script.
# Sort products by sales, in descending order
sorted_product_sales = sorted(product_sales.items(), key=lambda item: item[1], reverse=True)

# Calculate cumulative sales
cumulative_sales = np.cumsum([sales for product, sales in sorted_product_sales])

# Calculate cumulative proportion of total sales
cumulative_proportion = cumulative_sales / total_sales

# Find the point at which cumulative proportion reaches 80%
split_point = next(i for i, proportion in enumerate(cumulative_proportion) if proportion >= 0.8)

# The split is then between the products that contribute to 80% of sales and the rest
split = (split_point + 1) / len(product_sales)

print(f"The top {split * 100:.0f}% of products contribute at least 80% of sales.")

In this updated code, we use the numpy library to calculate the cumulative sum of the sales. We then calculate the cumulative proportion of total sales and find the first product at which this proportion reaches 80%. Dividing that product's rank by the number of products gives the share of products that account for 80% of sales. With the sample data, five of the ten products are needed, so the script reports 50%.

How do you know whether to split based on 70%, 80% or 90%?

Choosing the right split percentage (70%, 80%, 90%, etc.) depends on the particular problem or domain you’re working on, and what you consider a significant concentration.

The Pareto principle, or the 80/20 rule, is more of a guideline than a strict rule. It states that in many situations, 80% of the outcomes come from 20% of the causes. However, the actual percentages can vary depending on the context. For example, in a different context, you might find that 70% of your sales come from 30% of your products, or 90% of errors in your code come from 10% of the modules.
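
As a rough illustration, the sketch below computes what fraction of products is needed to reach several target shares of sales, using the sample figures from the retail example. The function name `fraction_needed` and the choice of 70%, 80%, and 90% targets are assumptions made for this example, not a standard API.

```python
from itertools import accumulate


def fraction_needed(values, target):
    """Fraction of items (largest first) needed to reach `target` share of the total.

    Hypothetical helper for illustration; assumes non-negative values.
    """
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    # Walk down the ranked list, keeping a running total
    for count, running in enumerate(accumulate(ranked), start=1):
        if running >= target * total:
            return count / len(ranked)
    return 1.0  # only reached for an empty list


# Sales figures from the retail example
sales = [200, 800, 150, 75, 50, 50, 500, 100, 75, 100]

splits = {target: fraction_needed(sales, target) for target in (0.7, 0.8, 0.9)}
for target, fraction in splits.items():
    print(f"{target:.0%} of sales comes from the top {fraction:.0%} of products")
```

With the sample data, the top 30% of products reach 70% of sales, the top 50% reach 80%, and the top 70% reach 90%, so this data set sits closer to 70/30 than 80/20.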

In general, the key is to identify a “vital few” causes that result in the majority of the effects. The specific percentage you choose to define the “vital few” and the “trivial many” can be adjusted based on your specific needs and observations.

If you have historical data, you could analyze it to find a percentage split that seems to hold over time. Alternatively, you could start with a percentage like 80%, and then adjust it up or down based on the results you observe and the particular needs of your project or business.
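
One way to check whether a split holds over time is to compute the same measure for each period and compare. The sketch below is a minimal example under stated assumptions: the helper `top_share` and the quarterly figures are hypothetical, with Q1 reusing the sample sales from the retail example and Q2 and Q3 made up purely for illustration.

```python
def top_share(values, fraction=0.2):
    """Share of the total contributed by the top `fraction` of items.

    Hypothetical helper; assumes a non-empty list with a positive total.
    """
    ranked = sorted(values, reverse=True)
    top_count = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_count]) / sum(ranked)


# Per-product sales by period. Q1 is the sample data from the retail example;
# Q2 and Q3 are invented figures, not real measurements.
sales_by_period = {
    "Q1": [200, 800, 150, 75, 50, 50, 500, 100, 75, 100],
    "Q2": [900, 650, 100, 90, 80, 60, 50, 30, 20, 20],
    "Q3": [700, 700, 250, 100, 80, 50, 50, 30, 20, 20],
}

shares = {period: top_share(values) for period, values in sales_by_period.items()}
for period, share in shares.items():
    print(f"{period}: top 20% of products -> {share:.1%} of sales")
```

If the share stays in a similar range from period to period, that range is a reasonable starting point for the split; if it swings widely, a longer time frame or a different grouping of causes may be needed.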

So I have to look at all causes over a sufficiently long period of time and trace how much each one contributed to the outcome, then separate the vital few from the rest. That gives me the percentage of vital few causes among all causes for a given time frame?

Yes, you’ve got it exactly right! This is typically how you would apply the Pareto principle in real-world scenarios.

The steps can be summarized as follows:

1. Identify and List All Causes: This step involves figuring out all possible causes that could be contributing to the outcome. This could involve brainstorming, data collection, observation, and so on.
2. Measure the Effect of Each Cause: This step involves quantifying the contribution of each cause to the outcome. This could involve data analysis, statistical methods, or other forms of measurement, depending on the context.
3. Rank the Causes from Most to Least Impactful: This step involves sorting your list of causes in order of their effect on the outcome, with the highest contributors at the top.
4. Calculate the Cumulative Effect: This step involves adding up the effects of the causes, starting from the top of your ranked list. For each cause, calculate what percentage of the total effect is made up by that cause and all the causes above it on the list.
5. Identify the Vital Few: Look at your list and see where a small number of causes make up a significant percentage of the effect (e.g., 70%, 80%, 90%). These are your “vital few”.
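
The steps above can be sketched in a few lines of Python. This is only an illustration: the function `pareto_analysis`, the 80% default threshold, and the defect counts are all hypothetical, and measuring each cause's effect is assumed to have been done already.

```python
def pareto_analysis(effects, threshold=0.8):
    """Rank causes by effect and pick the 'vital few' that reach `threshold`.

    `effects` maps each cause to its measured effect (assumed non-negative).
    Returns (rows, vital_few), where each row is
    (cause, effect, cumulative share of the total).
    """
    total = sum(effects.values())
    if total == 0:
        raise ValueError("total effect must be positive")

    # Rank the causes from largest to smallest effect
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)

    rows, vital_few, running = [], [], 0
    for cause, effect in ranked:
        # A cause is vital if the threshold had not been reached before it
        if running < threshold * total:
            vital_few.append(cause)
        # Cumulative effect of this cause and all causes ranked above it
        running += effect
        rows.append((cause, effect, running / total))
    return rows, vital_few


# Hypothetical defect counts per cause (invented for illustration)
defects = {
    "config errors": 50, "null handling": 32, "timeouts": 6, "race conditions": 4,
    "typos": 3, "encoding": 2, "logging": 2, "other": 1,
}

rows, vital_few = pareto_analysis(defects)
for cause, effect, cumulative in rows:
    print(f"{cause:16} {effect:3d}  {cumulative:.0%}")
print("Vital few:", vital_few)
```

In this made-up data, two of the eight causes account for 82% of the defects, so those two would be the vital few at an 80% threshold.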

The specific percentage (80/20, 70/30, 90/10) is less important than the general principle: that a small number of causes are often responsible for a large portion of the effect. So don’t get too caught up in the specific percentage, and focus more on understanding which are the critical few causes that you should focus on.