Random Sampling

Random sampling refers to selecting a subset of elements from a population by chance, with the goal of estimating properties of the whole population from the sample. Some key algorithms:

  • Simple random sampling - Sample with equal probability
  • Stratified sampling - Partition into subgroups and sample from each
  • Reservoir sampling - Sample fixed size subset from stream

Java - Simple random sample:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
int[] randomSample(int[] population, int k) {
  Random rand = new Random();

  int[] sample = new int[k];
  for (int i = 0; i < k; i++) {
    int j = rand.nextInt(population.length);
    sample[i] = population[j];
  }

  return sample;
}

C++ - Stratified sampling:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
vector<int> stratifiedSample(vector<int> population, int k) {
  int n = population.size();
  vector<int> sample(k);

  int ptsPerStrata = k / STRATA_NUM;

  for(int i = 0; i < STRATA_NUM; i++) {
    int start = i * (double)n / STRATA_NUM;
    int end = (i+1) * (double)n / STRATA_NUM;
    // Sample ptsPerStrata from subpopulation  
  }

  return sample;
}

Python - Reservoir sampling:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import random

def reservoir_sample(population, k):
  reservoir = []
  for i in range(k):
    reservoir.append(population[i])
  
  for i in range(k, len(population)):
    j = random.randint(0, i)
    if j < k:
      reservoir[j] = population[i]

  return reservoir

Random sampling provides unbiased estimation of population characteristics from a subset. Useful in statistics, surveys, machine learning.

Random sampling is the process of selecting a subset of items from a larger set, where each item has an equal chance of being chosen. It’s a key technique in statistics and machine learning for estimating population parameters and training models. The primary takeaway is that random sampling provides a quick, unbiased representation of the larger set.

Java Code for Random Sampling

In Java, you can use the java.util.Random class for generating random numbers. The following code randomly samples k elements from an array arr of integers.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import java.util.Random;

public class RandomSampling {
    public static void sample(int[] arr, int k) {
        Random rand = new Random();
        for (int i = 0; i < k; i++) {
            int pos = i + rand.nextInt(arr.length - i);
            int temp = arr[pos];
            arr[pos] = arr[i];
            arr[i] = temp;
        }
    }
}
  1. sample(int[] arr, int k) randomizes the positions of k elements in the array arr.

C++ Code for Random Sampling

In C++, you can use <random> library. The function below performs random sampling on a C++ vector.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
#include <random>
#include <vector>
#include <algorithm>

void sample(std::vector<int>& arr, int k) {
    std::random_device rd;
    std::mt19937 gen(rd());
    
    for (int i = 0; i < k; ++i) {
        std::uniform_int_distribution<> dis(i, arr.size() - 1);
        int pos = dis(gen);
        std::swap(arr[i], arr[pos]);
    }
}
  1. sample(std::vector<int>& arr, int k) swaps k random positions in the vector arr.

Python Code for Random Sampling

In Python, the random library can be used. Below is a Python function that performs the same task.

1
2
3
4
5
6
import random

def sample(arr, k):
    for i in range(k):
        pos = random.randint(i, len(arr) - 1)
        arr[i], arr[pos] = arr[pos], arr[i]
  1. sample(arr, k) replaces k elements in the list arr with randomly selected elements from the list.

All these implementations use the Fisher-Yates shuffle algorithm to select a random sample. The sample function modifies the original array or vector, placing the random sample in the first k positions.