Substring with Concatenation of All Words

We can approach this problem by creating a sliding window over the string s and checking if the substring within that window is a valid concatenation of the words in the words array.

Here’s a step-by-step solution:

  1. Calculate Parameters: Determine the length of a single word and the total length of all the words concatenated together.

  2. Create Word Count Map: Create a dictionary to count the occurrences of each word in words.

  3. Slide Window and Check: For each possible starting position, slide a window of the total length of all the words. Break the window into words of the given length and check if the count of words matches the count in words.

  4. Collect Results: If the words match, collect the starting index of the window.

Here’s the code implementing this algorithm:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
from collections import Counter

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        if not words:
            return []

        word_len = len(words[0])
        total_len = word_len * len(words)
        word_count = Counter(words)

        result = []
        for i in range(word_len): # Loop over each possible starting position
            left = i
            right = i
            window_count = Counter()

            while right + word_len <= len(s):
                word = s[right:right+word_len]
                right += word_len
                if word in word_count:
                    window_count[word] += 1
                    while window_count[word] > word_count[word]:
                        window_count[s[left:left+word_len]] -= 1
                        left += word_len
                    if right - left == total_len:
                        result.append(left)
                else:
                    window_count.clear()
                    left = right

        return result

This solution has a time complexity of (O(N \cdot M)), where (N) is the length of the string s, and (M) is the length of a word in words.

10 Prerequisite LeetCode Problems

“30. Substring with Concatenation of All Words” involves the use of hashmaps, sliding windows, and string manipulation.

Here are 10 problems to help you prepare:

  1. “3. Longest Substring Without Repeating Characters”: This problem introduces the sliding window technique, which is very useful in dealing with substrings.

  2. “76. Minimum Window Substring”: This is another problem on sliding windows, but is a little more difficult than the previous one.

  3. “438. Find All Anagrams in a String”: This problem is very similar to the target problem as it also requires finding specific substrings.

  4. “567. Permutation in String”: This problem also asks you to determine if the string contains a permutation of a given string, it’s another form of the substring problem.

  5. “209. Minimum Size Subarray Sum”: A problem to practice the sliding window technique with an array instead of a string.

  6. “49. Group Anagrams”: This problem gives practice on using hashmaps to solve string problems.

  7. “383. Ransom Note”: This problem also requires hashmap to solve, and it helps understand the concept of character counting which is important in solving the target problem.

  8. “242. Valid Anagram”: This is a simpler problem to understand the concept of checking if one string is a permutation of another string.

  9. “395. Longest Substring with At Least K Repeating Characters”: This problem requires both sliding window and hashmap to solve.

  10. “159. Longest Substring with At Most Two Distinct Characters”: This problem also requires sliding window and hashmap, and it is a bit more challenging than the previous problems.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        n_word = len(words)
        n_char = len(words[0])
        word2freq = {}
        for word in words:
            if word in word2freq:
                word2freq[word] += 1
            else:
                word2freq[word] = 1

        all_start_ind = []
        for start_ind in range(n_char):
            curr_map = {}
            curr_total = 0
            excessive = False
            for i in range(start_ind, len(s), n_char):
                word = s[i:i+n_char]
                if word in word2freq: # a valid word for permutation 
                    curr_total += 1
                    if word in curr_map: # found valid word
                        curr_map[word] += 1
                        if curr_map[word] > word2freq[word]: 
                            excessive = word
                    else:
                        curr_map[word] = 1

                    earliest_ind = i - (curr_total-1)*n_char
                    while excessive:
                        earliest_word = s[earliest_ind: earliest_ind+n_char]
                        curr_map[earliest_word] -= 1
                        curr_total -= 1
                        if earliest_word == excessive:
                            excessive = False
                            break
                        earliest_ind += n_char
                    if curr_total == n_word:
                        earliest_ind = i - (n_word-1)*n_char

                        all_start_ind.append(earliest_ind)

                        earliest_word = s[earliest_ind: earliest_ind+n_char]
                        curr_total -= 1
                        curr_map[earliest_word] -= 1
                else:
                    curr_total = 0
                    curr_map = {}
        return all_start_ind
    
    def check_map_equal(self, curr_map, ref_map) -> bool:
        for word, freq in curr_map.items():
            if word not in ref_map or freq != ref_map[word]:
                return False
        return True

Problem Classification

This problem can be classified under the Pattern Recognition and Matching category. In this problem, we are looking for a specific pattern (a substring that includes all given words concatenated together) within a larger context (the main string). This category is applicable in fields like text analysis or DNA sequencing, where you might need to find specific sequences of characters within a larger dataset.

Language Agnostic Coding Drills

Here’s a breakdown of the learning concepts utilized in this code and the order in which they should be understood for better comprehension of the solution:

  1. Understanding of basic language syntax and data structures: In this case, the script is written in Python, so one needs to understand the basic syntax of Python, how to define functions, loops, conditional statements, and use of basic data structures like Lists and Dictionaries.

  2. Understanding of strings: The problem is about finding substrings in a string. So one should understand how strings are represented, accessed, and manipulated in the language. This includes indexing, slicing, and concatenation.

  3. List Comprehensions and Loops: One should understand how to write loops to iterate through lists and strings, and how to use list comprehensions, which is a more concise way to create lists in Python.

  4. Dictionary Manipulation: The solution uses dictionaries (also called maps in some languages) to store frequencies of words. One should understand how to create, update, and access elements in a dictionary.

  5. Creating and using helper functions: In this case, the function check_map_equal is a helper function that is used inside the main function. One should understand how to define such functions and why they are useful.

  6. Understanding and implementing sliding window technique: This is a common technique used to solve array/string problems where we maintain a ‘window’ of elements and slide it based on certain conditions. In this case, the window is a sequence of words, and we slide it when we have excessive words or words not in the words list.

Arranging the drills in increasing level of difficulty:

  1. Working with strings and lists: Write functions to perform basic string and list operations, like slicing, concatenation, accessing elements, etc.

  2. Dictionary operations: Practice creating dictionaries, adding elements, removing elements, checking if a key exists, etc.

  3. Creating helper functions: Create a function that performs a certain task (like check_map_equal), and then use it inside another function.

  4. Implementing the sliding window technique: Start with a simpler problem where you need to find a subarray or substring that meets certain conditions, and try to solve it using the sliding window technique.

Problem-solving approach:

  1. Problem statement understanding: Understand the problem statement thoroughly. You are given a string and a list of words, and you need to find all the starting indices of the substrings that are a concatenation of all the words.

  2. Solution formulation: Create a frequency map of the words. Then use a sliding window to iterate over the string and check if the words in the current window match the frequency map.

  3. Implementing the solution: Write the code for the solution. Use helper functions as required. Test your solution with various test cases to ensure it’s working as expected.

  4. Optimization (if required): Look for any opportunities to optimize your solution, like removing unnecessary operations, using more efficient data structures, etc.

Targeted Drills in Python

  1. Working with strings and lists:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# Create a string
s = "Hello, world!"
print(s)

# Access specific characters in a string
print(s[0])  # Outputs: H
print(s[-1])  # Outputs: !

# Slice a string
print(s[0:5])  # Outputs: Hello
print(s[7:])  # Outputs: world!

# Concatenate strings
s1 = "Hello"
s2 = "World"
s3 = s1 + ", " + s2 + "!"
print(s3)  # Outputs: Hello, World!
  1. Dictionary operations:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Create a dictionary
d = {}

# Add elements to the dictionary
d["apple"] = 1
d["banana"] = 2

# Check if a key exists
if "apple" in d:
    print("Apple exists")

# Remove an element
del d["apple"]
  1. Creating helper functions:
1
2
3
4
5
6
7
8
9
# Define a helper function
def square(n):
    return n ** 2

# Use the helper function
def square_and_add_five(n):
    return square(n) + 5

print(square_and_add_five(3))  # Outputs: 14
  1. Implementing the sliding window technique:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Find the maximum sum of 'k' consecutive elements in a list
def max_sum(nums, k):
    max_sum = float('-inf')
    window_sum = 0

    for i in range(len(nums)):
        window_sum += nums[i]
        if i >= k - 1:
            max_sum = max(max_sum, window_sum)
            window_sum -= nums[i - k + 1]
    return max_sum

print(max_sum([2, 3, 4, 1, 5], 3))  # Outputs: 10

For problem-specific drills, a good exercise would be to manually create a frequency map of words from a given list, and then write a function to automatically create this frequency map. This will be helpful in solving the final problem. Here’s an example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
words = ["apple", "banana", "apple", "orange", "banana", "banana"]

# Manual frequency map
freq_map = {
    "apple": 2,
    "banana": 3,
    "orange": 1
}

# Function to create frequency map
def create_freq_map(words):
    freq_map = {}
    for word in words:
        if word in freq_map:
            freq_map[word] += 1
        else:
            freq_map[word] = 1
    return freq_map

print(create_freq_map(words))  # Outputs: {'apple': 2, 'banana': 3, 'orange': 1}