Count Substrings with Only One Distinct Letter
The problem requires us to find all substrings that contain only one distinct letter. A brute-force solution would involve iterating through the string and checking all possible substrings, but this approach would be very inefficient.
Instead, we can use a sliding window approach, which is more efficient and elegant. Here’s a simple step-by-step solution:
- Initialize two pointers at the start of the string.
- Increment the end pointer until you encounter a different character.
- For each window of identical characters of length ’len’, there are ’len * (len + 1) / 2’ substrings.
- Move the start pointer to the end pointer’s location, and repeat steps 2-4 until you’ve traversed the entire string.
Python solution:
|
|
In the code, (window_len * (window_len + 1)) // 2
counts the number of substrings in the window using the formula for the sum of the first ’n’ natural numbers. This is because a substring of length ’n’ has ’n’ substrings of length 1, ’n - 1’ substrings of length 2, and so on, until 1 substring of length ’n’. This corresponds to the sum of the first ’n’ natural numbers, which is ’n * (n + 1) / 2'.
The number of substrings in ‘aaaa’ equals 10 = 1+2+3+4 which is a sum of arithmetic progression
If a letter repeats n times, it forms n * (n + 1) / 2 valid substrings:
“aaaa”: “a” * 4, “aa” * 3, “aaa” * 2, and “aaaa” * 1 = 4 * (4 + 1) / 2 = 10.
|
|
The C++ function countLetters
calculates the number of substrings in S
that have only one unique character.
Here’s how the function works:
It initializes two pointers
i
andj
to1
and0
, respectively, andres
to0
.res
is used to store the total count of valid substrings,i
is the forward pointer, andj
is the start index of the current contiguous block of the same characters.It then enters a loop that continues until
i
is greater than the size of the stringS
.Inside the loop, it checks whether
i
equals the size ofS
or the characters at indicesi
andj
ofS
are different. If either condition is true, it means we have reached the end of a block of the same characters.It then calculates the number of valid substrings in this block, which is
(i - j + 1) * (i - j) / 2
, and adds this tores
. This formula comes from the mathematical formula for the sum of the firstn
integers, since for a block of lengthn
, there aren
substrings of length1
,n - 1
substrings of length2
, and so on, down to1
substring of lengthn
.It then updates
j
toi
, to start a new block of the same characters.After the loop, it returns
res
, which is the total count of valid substrings.
This function effectively counts the valid substrings in S
in a single pass, resulting in a time complexity of O(n), where n is the size of the string S
. The space complexity is O(1), as it uses a fixed amount of space.
Building Blocks
Loops: Loops are used to iterate over the string. Here, a while loop is used to traverse through the string.
Two Pointers/Sliding Window: This technique is used to keep track of a “window” in the string where all characters are the same. This window is defined by two pointers - ‘start’ and ’end’. The ’end’ pointer moves ahead to expand the window until a different character is found. Then, the ‘start’ pointer moves to the ’end’ pointer’s location, essentially moving the window to the next character.
Arithmetic Series Formula: The formula for the sum of the first ’n’ natural numbers,
n * (n + 1) / 2
, is used to count the number of substrings in the current window.String Indexing: In Python, strings are arrays of bytes representing unicode characters. This allows us to use indices to access the elements of the string.
Accumulator Pattern: The variable ‘result’ is used as an accumulator to sum the total number of substrings that only contain one distinct letter.
Q&A
Does the solution use Sum Equation of Arithmetic Sequence ?
Yes, the solution to this problem does use the sum equation of an arithmetic sequence, specifically, the formula for the sum of the first ’n’ natural numbers, which is n * (n + 1) / 2
.
This formula is used to calculate the number of substrings that can be formed from a string of length ’n’ where all characters are the same. In this context, ’n’ would be the length of the “window” or sequence of characters that are the same.
This is because when all characters are the same, a substring can be any contiguous sequence of characters within the window. For a window of length ’n’, there can be 1 substring of length ’n’, 2 substrings of length ’n-1’, 3 substrings of length ’n-2’, and so on, up to ’n’ substrings of length 1. The total number of these substrings is given by the sum of the first ’n’ natural numbers, hence the use of the formula.