Count Substrings with Only One Distinct Letter

The problem requires us to find all substrings that contain only one distinct letter. A brute-force solution would involve iterating through the string and checking all possible substrings, but this approach would be very inefficient.

Instead, we can use a sliding window approach, which is more efficient and elegant. Here’s a simple step-by-step solution:

Initialize two pointers at the start of the string.
Increment the end pointer until you encounter a different character.
For each window of identical characters of length ’len’, there are ’len * (len + 1) / 2’ substrings.
Move the start pointer to the end pointer’s location, and repeat steps 2-4 until you’ve traversed the entire string.

Python solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Solution:
    def countLetters(self, s: str) -> int:
        start, end = 0, 0
        result = 0
        while end < len(s):
            # Extend the window
            while end < len(s) and s[end] == s[start]:
                end += 1
            # Count the number of substrings in the window
            window_len = end - start
            result += (window_len * (window_len + 1)) // 2
            # Move the start pointer to the end pointer's location
            start = end
        return result

In the code, (window_len * (window_len + 1)) // 2 counts the number of substrings in the window using the formula for the sum of the first ’n’ natural numbers. This is because a substring of length ’n’ has ’n’ substrings of length 1, ’n - 1’ substrings of length 2, and so on, until 1 substring of length ’n’. This corresponds to the sum of the first ’n’ natural numbers, which is ’n * (n + 1) / 2'.

The number of substrings in ‘aaaa’ equals 10 = 1+2+3+4 which is a sum of arithmetic progression

If a letter repeats n times, it forms n * (n + 1) / 2 valid substrings:

“aaaa”: “a” * 4, “aa” * 3, “aaa” * 2, and “aaaa” * 1 = 4 * (4 + 1) / 2 = 10.

1
2
3
4
5
6
7
8
9
int countLetters(string S, int res = 0) {
  for (auto i = 1, j = 0; i <= S.size(); ++i) {
    if (i == S.size() || S[i] != S[j]) {
        res += (i - j + 1) * (i - j) / 2;
        j = i;
    } 
  }
  return res;
}

The C++ function countLetters calculates the number of substrings in S that have only one unique character.

Here’s how the function works:

It initializes two pointers i and j to 1 and 0, respectively, and res to 0. res is used to store the total count of valid substrings, i is the forward pointer, and j is the start index of the current contiguous block of the same characters.
It then enters a loop that continues until i is greater than the size of the string S.
Inside the loop, it checks whether i equals the size of S or the characters at indices i and j of S are different. If either condition is true, it means we have reached the end of a block of the same characters.
It then calculates the number of valid substrings in this block, which is (i - j + 1) * (i - j) / 2, and adds this to res. This formula comes from the mathematical formula for the sum of the first n integers, since for a block of length n, there are n substrings of length 1, n - 1 substrings of length 2, and so on, down to 1 substring of length n.
It then updates j to i, to start a new block of the same characters.
After the loop, it returns res, which is the total count of valid substrings.

This function effectively counts the valid substrings in S in a single pass, resulting in a time complexity of O(n), where n is the size of the string S. The space complexity is O(1), as it uses a fixed amount of space.

Building Blocks

Loops: Loops are used to iterate over the string. Here, a while loop is used to traverse through the string.
Two Pointers/Sliding Window: This technique is used to keep track of a “window” in the string where all characters are the same. This window is defined by two pointers - ‘start’ and ’end’. The ’end’ pointer moves ahead to expand the window until a different character is found. Then, the ‘start’ pointer moves to the ’end’ pointer’s location, essentially moving the window to the next character.
Arithmetic Series Formula: The formula for the sum of the first ’n’ natural numbers, n * (n + 1) / 2, is used to count the number of substrings in the current window.
String Indexing: In Python, strings are arrays of bytes representing unicode characters. This allows us to use indices to access the elements of the string.
Accumulator Pattern: The variable ‘result’ is used as an accumulator to sum the total number of substrings that only contain one distinct letter.

Q&A

Does the solution use Sum Equation of Arithmetic Sequence ?

Yes, the solution to this problem does use the sum equation of an arithmetic sequence, specifically, the formula for the sum of the first ’n’ natural numbers, which is n * (n + 1) / 2.

This formula is used to calculate the number of substrings that can be formed from a string of length ’n’ where all characters are the same. In this context, ’n’ would be the length of the “window” or sequence of characters that are the same.

This is because when all characters are the same, a substring can be any contiguous sequence of characters within the window. For a window of length ’n’, there can be 1 substring of length ’n’, 2 substrings of length ’n-1’, 3 substrings of length ’n-2’, and so on, up to ’n’ substrings of length 1. The total number of these substrings is given by the sum of the first ’n’ natural numbers, hence the use of the formula.