Non-Parametric Statistics

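Non-parametric statistics refers to statistical methods that do not assume the data follow a particular probability distribution, such as the normal distribution. Because they typically work with signs, ranks, or resampled data rather than distribution parameters, they remain usable in many situations where parametric methods break down.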

Some common non-parametric tests:

  • Sign test
  • Wilcoxon signed-rank test
  • Mann-Whitney U test
  • Kruskal-Wallis H test
  • Spearman’s rank correlation
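The sign test and the Wilcoxon signed-rank test both work on paired data through the differences between pairs. Below is a minimal Python sketch of each, assuming paired before/after measurements; `sign_test` and `wilcoxon_w_plus` are hypothetical helper names and the numbers are made up for illustration. The sign test uses only the direction of each difference, while the Wilcoxon test also uses the ranks of their magnitudes.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for paired data; zero differences are dropped."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)  # number of positive differences
    # Under H0 each nonzero difference is positive with probability 1/2, so
    # k ~ Binomial(n, 0.5); the two-sided p-value doubles the smaller tail.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return k, min(1.0, 2 * tail)


def wilcoxon_w_plus(before, after):
    """Wilcoxon signed-rank statistic W+ (sum of the ranks of positive differences)."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    mags = [abs(d) for d in diffs]
    # Average rank of each |d|: (# smaller) + (# equal + 1) / 2, so ties share a rank
    ranks = [sum(m < v for m in mags) + (sum(m == v for m in mags) + 1) / 2 for v in mags]
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Illustrative, made-up paired measurements
before = [10, 12, 9, 14, 11, 13]
after = [12, 15, 9, 13, 14, 16]
print(sign_test(before, after))        # (4, 0.375)
print(wilcoxon_w_plus(before, after))  # 14.0
```

The second function computes only W+; turning it into a p-value needs the statistic's null distribution (exact for small samples, a normal approximation for larger ones), which `scipy.stats.wilcoxon` handles.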

Java - Spearman’s correlation example:

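// Average 1-based ranks; ties share the mean rank (needs import java.util.Arrays).
static double[] ranks(double[] v) {
  Integer[] idx = new Integer[v.length];
  for (int i = 0; i < v.length; i++) idx[i] = i;
  Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));
  double[] r = new double[v.length];
  for (int i = 0, j; i < v.length; i = j) {
    for (j = i + 1; j < v.length && v[idx[j]] == v[idx[i]]; j++) { }
    for (int k = i; k < j; k++) r[idx[k]] = (i + j + 1) / 2.0;  // mean of ranks i+1..j
  }
  return r;
}

// rho = 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties.
static double spearmanCorr(double[] x, double[] y) {
  double[] xRanks = ranks(x), yRanks = ranks(y);
  double n = x.length, sumD2 = 0;  // n as a double avoids int overflow in n^3
  for (int i = 0; i < x.length; i++) {
    sumD2 += Math.pow(xRanks[i] - yRanks[i], 2);
  }
  return 1 - 6 * sumD2 / (n * (n * n - 1));
}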
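The shortcut formula 1 - 6*sum(d^2) / (n(n^2 - 1)) used in the Java example is exact only when there are no tied values. The general definition of Spearman's rho is the Pearson correlation of the two rank vectors, which handles ties correctly. A minimal Python sketch of that definition follows; `average_ranks` and `spearman_rho` are hypothetical names, and the O(n^2) ranking trick is chosen for brevity rather than speed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span (O(n^2))."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / (ss_x * ss_y) ** 0.5


print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

Without ties this agrees with the shortcut formula. With the tied inputs x = [1, 2, 2, 3] and y = [1, 3, 2, 4], `spearman_rho` gives about 0.949 (the square root of 0.9), while the shortcut formula applied to the same average ranks gives 0.95.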

C++ - Mann-Whitney U test example:

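#include <algorithm>
#include <vector>

// Returns the U statistic min(U1, U2); tied values receive average ranks.
double mannWhitneyUTest(const std::vector<double>& x, const std::vector<double>& y) {
  std::size_t n1 = x.size(), n2 = y.size(), n = n1 + n2;
  std::vector<double> data = x;  // pool both samples
  data.insert(data.end(), y.begin(), y.end());
  std::vector<std::size_t> idx(n);  // indices into data, sorted by value
  for (std::size_t i = 0; i < n; ++i) idx[i] = i;
  std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) { return data[a] < data[b]; });
  double rankSumX = 0;  // sum of the pooled ranks that belong to x
  for (std::size_t i = 0, j; i < n; i = j) {
    for (j = i + 1; j < n && data[idx[j]] == data[idx[i]]; ++j) {}
    for (std::size_t k = i; k < j; ++k) if (idx[k] < n1) rankSumX += (i + j + 1) / 2.0;
  }
  double U1 = rankSumX - n1 * (n1 + 1) / 2.0;
  double U2 = static_cast<double>(n1) * n2 - U1;
  return std::min(U1, U2);
}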

Python - Kruskal-Wallis H test example:

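import scipy.stats as stats

def kruskal_wallis(groups):
  # groups: a sequence of 1-D array-likes (lists, NumPy arrays, pandas Series), one per group
  return stats.kruskal(*groups)  # result holds the H statistic and the p-value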
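To see what `scipy.stats.kruskal` computes, the H statistic can be written out directly: H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where N is the total number of observations, n_i the size of group i, and R_i the sum of group i's ranks in the pooled data. The Python sketch below (hypothetical name `kruskal_h`) omits the tie correction; SciPy additionally divides H by a tie-correction factor, so the two should agree only when there are no ties. The p-value, which comes from a chi-squared distribution with k - 1 degrees of freedom for k groups, is not computed here.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H from pooled average ranks, without a tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)

    def rank(v):
        # Average 1-based rank of v in the pooled data; ties share the mean rank
        return sum(w < v for w in pooled) + (sum(w == v for w in pooled) + 1) / 2

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    weighted = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


print(kruskal_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # approximately 7.2
```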

Non-parametric methods make only minimal assumptions (most still require independent observations), so they apply broadly to messy real-world data.

Put differently, non-parametric methods do not assume the data come from a family of distributions described by a fixed set of parameters (such as the mean and standard deviation of a normal distribution), so they make fewer assumptions about the underlying distribution.

Some common non-parametric methods include:

  • Rank-based tests, e.g. the Wilcoxon signed-rank test
  • Permutation tests
  • Bootstrapping
  • Kernel density estimation
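Permutation tests and bootstrapping replace distributional assumptions with resampling of the observed data. The Python sketch below is a generic illustration, not a definitive implementation: `permutation_test` estimates a two-sided p-value for a difference in means by randomly reassigning group labels, and `bootstrap_median_ci` builds a percentile bootstrap interval for the median by resampling with replacement. The function names, the default of 10,000 resamples, and the fixed seed are assumptions chosen for illustration.

```python
import random
import statistics


def _mean(values):
    return sum(values) / len(values)


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Monte Carlo two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(_mean(x) - _mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(_mean(pooled[:len(x)]) - _mean(pooled[len(x):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point round-off
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


def bootstrap_median_ci(data, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    # Median of each resample drawn with replacement, sorted for percentile lookup
    meds = sorted(statistics.median(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = meds[int(alpha / 2 * n_boot)]
    hi = meds[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

For example, `permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])` returns roughly 0.008, because only the original split and its mirror image (2 of the 252 possible splits) produce a difference in means as large as the observed one. Kernel density estimation is not covered by this sketch.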

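Non-parametric methods are useful when the data do not meet parametric assumptions (for example, heavily skewed data, outliers, or ordinal measurements) or when those assumptions are questionable, as with small samples where normality cannot be checked reliably. They therefore apply to a wider range of problems.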

Solution

Here is an example non-parametric test in Python:

from scipy.stats import mannwhitneyu

x = [1.3, 2.4, 7.6, 5.1]
y = [2.1, 4.2, 6.1, 5.5]

# Two-sided test of H0: both samples come from the same distribution
stat, p = mannwhitneyu(x, y, alternative="two-sided")

if p < 0.05:
  print("Reject null hypothesis")
else:
  print("Fail to reject null hypothesis")

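This applies the Mann-Whitney U test to compare the two samples without assuming normality. Note that the test does not compare means: it checks whether values from one sample tend to be larger than values from the other, which amounts to a comparison of medians only when both distributions have the same shape.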
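The U statistic also has a direct interpretation: for sample x, it counts the pairs (x_i, y_j) in which the x value is larger, with ties counting one half. A minimal Python sketch of that definition (hypothetical function name `mann_whitney_u`), applied to the same x and y:

```python
def mann_whitney_u(x, y):
    """U for sample x: number of pairs (x_i, y_j) with x_i > y_j, ties counting 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


x = [1.3, 2.4, 7.6, 5.1]
y = [2.1, 4.2, 6.1, 5.5]
u_x = mann_whitney_u(x, y)
u_y = len(x) * len(y) - u_x  # U_x + U_y = n1 * n2
print(u_x, u_y)  # 7.0 9.0
```

The two counts always sum to n1 * n2 = 16, which is why implementations often report only one of them or their minimum. Comparing all pairs costs O(n1 * n2), so rank-based formulas are preferable for large samples.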

Non-parametric methods offer more flexibility because they require fewer assumptions about the data.

Description: Non-Parametric Statistics

In statistics, non-parametric methods are used when your data don't meet the normality assumptions of parametric tests, or when you have ordinal or nominal data. Many non-parametric tests rank the data before analyzing it, which makes them less sensitive to outliers: an extreme value simply receives the highest or lowest rank, however far out it lies.
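The ranking step itself is short. Below is a minimal Python sketch (the name `rank_data` is hypothetical) that assigns 1-based ranks and gives tied values the average of the ranks they occupy, the same "average" convention that `scipy.stats.rankdata` uses by default. The last line shows why ranking tames outliers: replacing 4 with 1000 leaves every rank unchanged.

```python
def rank_data(values):
    """Return 1-based ranks; tied values get the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(rank_data([10, 20, 20, 5]))                          # [2.0, 3.5, 3.5, 1.0]
print(rank_data([3, 1, 2, 4]) == rank_data([3, 1, 2, 1000]))  # True
```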

Solution:

Let’s consider a simple example of the Mann-Whitney U test, which is a non-parametric test used to compare two independent samples to determine if they come from the same distribution.
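In formula form, with n_1 and n_2 the sample sizes and R_1, R_2 the sums of each sample's ranks in the pooled data (tied values taking average ranks), the statistic is computed as follows; the second line is an illustrative worked example, not data from the text:

```latex
U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad
U_2 = R_2 - \frac{n_2(n_2+1)}{2} = n_1 n_2 - U_1, \qquad
U = \min(U_1, U_2)

% Example: samples {1, 3, 5} and {2, 4, 6} occupy pooled ranks {1, 3, 5} and {2, 4, 6}
R_1 = 1 + 3 + 5 = 9, \quad U_1 = 9 - \frac{3 \cdot 4}{2} = 3, \quad U_2 = 3 \cdot 3 - 3 = 6, \quad U = 3
```

The identity U_1 + U_2 = n_1 n_2 follows from R_1 + R_2 = N(N+1)/2 with N = n_1 + n_2.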

Java

Java doesn’t have built-in statistical libraries for non-parametric tests, but you can implement the Mann-Whitney U test from scratch.

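import java.util.Arrays;

public class MannWhitneyU {
    // Returns min(U_A, U_B). Both samples are pooled and ranked together;
    // tied values receive the average of the ranks they span.
    public static double calculateU(int[] sampleA, int[] sampleB) {
        int nA = sampleA.length, nB = sampleB.length, n = nA + nB;
        int[] pooled = new int[n];
        System.arraycopy(sampleA, 0, pooled, 0, nA);
        System.arraycopy(sampleB, 0, pooled, nA, nB);

        // Sort indices 0..n-1 by pooled value
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Integer.compare(pooled[a], pooled[b]));

        // Sum the ranks of sampleA's values (indices < nA), averaging over ties
        double sumRanksA = 0;
        for (int i = 0, j; i < n; i = j) {
            for (j = i + 1; j < n && pooled[idx[j]] == pooled[idx[i]]; j++) { }
            double avgRank = (i + j + 1) / 2.0;  // mean of ranks i+1..j
            for (int k = i; k < j; k++) {
                if (idx[k] < nA) sumRanksA += avgRank;
            }
        }

        // U_A from A's rank sum; U_A + U_B = nA * nB
        double uA = sumRanksA - nA * (nA + 1) / 2.0;
        double uB = (double) nA * nB - uA;

        return Math.min(uA, uB);
    }
}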

C++

The C++ standard library likewise has no statistical tests. Here's how you could implement the Mann-Whitney U test manually.

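#include <algorithm>
#include <cstddef>
#include <vector>

// Returns min(U_A, U_B). Both samples are pooled and ranked together;
// tied values receive the average of the ranks they span.
double calculateU(const std::vector<int>& sampleA, const std::vector<int>& sampleB) {
    const std::size_t nA = sampleA.size(), nB = sampleB.size(), n = nA + nB;
    std::vector<int> pooled(sampleA);
    pooled.insert(pooled.end(), sampleB.begin(), sampleB.end());

    // Sort indices 0..n-1 by pooled value
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = i;
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t a, std::size_t b) { return pooled[a] < pooled[b]; });

    // Sum the ranks of sampleA's values (indices < nA), averaging over ties
    double sumRanksA = 0;
    for (std::size_t i = 0, j; i < n; i = j) {
        for (j = i + 1; j < n && pooled[idx[j]] == pooled[idx[i]]; ++j) {}
        const double avgRank = (i + j + 1) / 2.0;  // mean of ranks i+1..j
        for (std::size_t k = i; k < j; ++k) {
            if (idx[k] < nA) sumRanksA += avgRank;
        }
    }

    // U_A from A's rank sum; U_A + U_B = nA * nB
    const double uA = sumRanksA - nA * (nA + 1) / 2.0;
    const double uB = static_cast<double>(nA) * nB - uA;

    return std::min(uA, uB);
}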

Python

In Python, the third-party scipy library provides the Mann-Whitney U test as scipy.stats.mannwhitneyu.

from scipy.stats import mannwhitneyu

sampleA = [1, 3, 5]
sampleB = [2, 4, 6]

statistic, p_value = mannwhitneyu(sampleA, sampleB)

print("U Statistic:", statistic)
print("P Value:", p_value)
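The Java and C++ implementations above return only the U statistic. For small samples without ties, an exact two-sided p-value can be obtained by enumerating every way the pooled ranks could be split between the two groups, since under the null hypothesis each split is equally likely. The Python sketch below (hypothetical name `exact_mwu_pvalue`) does this; it assumes all values are distinct and is practical only for small samples, because the number of splits, C(n1 + n2, n1), grows very quickly.

```python
from itertools import combinations


def exact_mwu_pvalue(a, b):
    """Exact two-sided Mann-Whitney p-value by full enumeration (distinct values only)."""
    n1, n2 = len(a), len(b)
    rank = {v: i + 1 for i, v in enumerate(sorted(list(a) + list(b)))}  # assumes no ties
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(rank[v] for v in a) - offset
    mean_u = n1 * n2 / 2  # E[U] under the null hypothesis
    # Under H0 every set of n1 ranks is equally likely to belong to sample a
    splits = list(combinations(range(1, n1 + n2 + 1), n1))
    extreme = sum(abs(sum(s) - offset - mean_u) >= abs(u_obs - mean_u) for s in splits)
    return extreme / len(splits)


print(exact_mwu_pvalue([1, 3, 5], [2, 4, 6]))  # 0.7
```

For sampleA = [1, 3, 5] and sampleB = [2, 4, 6], 14 of the 20 possible rank splits give a U at least as far from its null mean of 4.5 as the observed U = 3, hence 0.7.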

Key Takeaways

  • Non-parametric statistics are used when the data do not meet the assumptions of parametric methods.
  • Non-parametric methods can be more robust and less sensitive to outliers.
  • You can implement basic non-parametric tests like Mann-Whitney U manually if a library isn’t available.
  • Libraries like scipy in Python make non-parametric tests easy to perform.