Rolling Hash

Concept of Rolling Hash

Rolling Hash is an algorithmic technique often used for pattern matching or substring searching. The primary idea is to use a hash function that allows for efficient recalculation when moving from one substring to the next. This is accomplished by “rolling” through the string, updating the hash value incrementally instead of recomputing it from scratch.

In many applications, like the Rabin-Karp algorithm for string matching, Rolling Hash enables us to achieve a time complexity better than a naive approach would allow. It is particularly useful when dealing with large strings or multiple queries.


Example Code

Java

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
public class RollingHash {
  private static final int BASE = 256;
  private static final int MOD = 1000000007;

  public static int calculateRollingHash(String str) {
    int hashValue = 0;
    for (char ch : str.toCharArray()) {
      hashValue = (hashValue * BASE + ch) % MOD;
    }
    return hashValue;
  }
  
  public static void main(String[] args) {
    String str = "hello";
    System.out.println("Rolling Hash: " + calculateRollingHash(str));
  }
}

C++

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#include <iostream>
#include <string>

const int BASE = 256;
const int MOD = 1000000007;

int calculateRollingHash(const std::string& str) {
  int hashValue = 0;
  for (char ch : str) {
    hashValue = (hashValue * BASE + ch) % MOD;
  }
  return hashValue;
}

int main() {
  std::string str = "hello";
  std::cout << "Rolling Hash: " << calculateRollingHash(str) << std::endl;
  return 0;
}

Python

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
BASE = 256
MOD = 1000000007

def calculate_rolling_hash(str):
  hash_value = 0
  for ch in str:
    hash_value = (hash_value * BASE + ord(ch)) % MOD
  return hash_value

str = "hello"
print("Rolling Hash:", calculate_rolling_hash(str))

Key Takeaways

  1. Incremental Update: Rolling Hash allows for efficient recalculation of hash values as you slide through substrings.

  2. Modular Arithmetic: The use of a modulo operation ensures that the hash value stays within a certain range, avoiding overflow.

  3. Versatility: Rolling Hash is an essential part of algorithms like Rabin-Karp and can be applied in various string processing problems.

  4. Constants: The BASE and MOD constants can be chosen according to the needs of the problem. Common choices are powers of 256 for BASE and large prime numbers for MOD.

The Rolling Hash technique simplifies complex string operations, making it a valuable tool in algorithmic problem-solving.

A rolling hash is a technique to efficiently calculate hash codes for sliding windows in a sequence. As the window slides, the hash is updated in O(1) time.

Some applications are:

  • String matching
  • Plagiarism detection
  • Data stream processing

Java example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
long hash(String s, int start, int end) {
  long h = 0;
  for (int i = start; i <= end; i++) {
    h += s.charAt(i) * PRIME^i; // Compute hash
  }
  return h;
}

// Update hash in O(1) as window slides
long oldHash = hash(s, 0, 3); 
long newHash = oldHash - s.charAt(0) * PRIME^0; // Remove first
newHash *= PRIME; // Scale 
newHash += s.charAt(4); // Add newest

C++ example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
long hash(const string& s, int start, int end) {
  long h = 0;
  for (int i = start; i <= end; i++) {
    h += s[i] * PRIME^i; 
  }
  return h;
}

// Update hash when window slides
long oldHash = hash(s, 0, 3);
long newHash = oldHash - s[0] * POW(PRIME, 0);
newHash *= PRIME;
newHash += s[4];

Python example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def hash(s, start, end):
  h = 0
  for i in range(start, end+1):
    h += ord(s[i]) * PRIMES[i-start]
  return h

# Update hash  
old_hash = hash(s, 0, 3)
new_hash = old_hash - ord(s[0]) * PRIMES[0]  
new_hash *= PRIMES[1]
new_hash += ord(s[4])

Rolling hashes efficiently compute and update hash codes for sliding windows.

The rolling hash is an efficient string matching algorithm used to find duplicate substrings or patterns in a continuously changing stream of text.

The key ideas behind a rolling hash are:

  • It represents a string as a numerical value called a hash.
  • It can compute the new hash of a string after appending or removing a character in O(1) time by doing simple math on the current hash.
  • By comparing hash values, it can detect if two strings are identical without comparing character by character.

Some example pseudocode:

prime = large prime number 
window_size = length of substring to match

init_hash = 0
for i = 0 to window_size:
  init_hash += s[i] * prime^i 

for i = window_size to end:
  // Remove oldest char from hash
  curr_hash = curr_hash - s[i - window_size] * prime^(window_size-1)  

  // Add newest char 
  curr_hash = curr_hash * prime + s[i]

  if curr_hash == init_hash:
    print "Match found"

The key steps are:

  • Initialize hash based on first substring
  • Update hash by removing oldest char and adding newest char
  • Compare current hash to target to check for match

This avoids needing to compare strings directly each time. Overall time complexity is O(n) for string of length n.

Rolling hashes are useful for plagiarism detection, DNA sequence analysis, and other applications requiring efficient string matching.