Most Common Word

We can break down the task of finding the most common word (that is not banned) in the paragraph into the following steps:

  1. Preprocess Paragraph: Replace all punctuation with spaces and convert the entire paragraph to lowercase to ensure that words are case-insensitive.
  2. Split Paragraph: Split the paragraph into individual words.
  3. Count Words: Count the frequency of each word, ignoring the ones that are in the banned list.
  4. Find Most Frequent Word: Identify the most frequent word from the count that is not in the banned list.

Here is the code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # Step 1: Replace punctuation with space and convert to lowercase
        for ch in "!?',;.":
            paragraph = paragraph.replace(ch, " ")
        paragraph = paragraph.lower()

        # Step 2: Split paragraph into words
        words = paragraph.split()

        # Step 3: Count frequency of non-banned words
        word_count = {}
        for word in words:
            if word not in banned:
                if word not in word_count:
                    word_count[word] = 0
                word_count[word] += 1

        # Step 4: Find the most frequent non-banned word
        max_count = 0
        most_common_word = ""
        for word, count in word_count.items():
            if count > max_count:
                max_count = count
                most_common_word = word

        return most_common_word

Explanation:

  • We replace the punctuation with spaces so that words are separated correctly, regardless of punctuation.
  • We keep a count of non-banned words using a dictionary (word_count).
  • We iterate through the dictionary to find the most frequent non-banned word.

The code returns the result in lowercase, as required, and ignores banned words. The time complexity is O(N), where N is the length of the paragraph.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
def most_common_word(paragraph, banned):
    words = paragraph.lower().split(" ")
    word_count = {}
    for word in words:
        word = ''.join(e for e in word if e.isalnum())
        if word not in word_count:
            word_count[word] = 1
        else:
            word_count[word] += 1

    for word in banned:
        if word in word_count:
            del word_count[word]

    res = None
    for word in word_count:
        if res is None or word_count[word] > word_count[res]:
            res = word

    return res

Problem Classification

The problem statement is primarily related to the domain of text processing and statistics in the field of computer science.

The “What” components:

  1. A paragraph of text is provided as a string.
  2. A list of banned words is provided as a string array.
  3. Words in the paragraph are case-insensitive.
  4. The task is to find the most frequent word in the paragraph that is not included in the list of banned words.
  5. The returned answer should be in lowercase.
  6. It is guaranteed that there will be at least one non-banned word in the paragraph, and the most frequent non-banned word is unique.

Classifying the problem:

  1. Text Analysis: The problem involves processing and analyzing a block of text, which involves operations like splitting the text into words, converting to lowercase, etc.
  2. Counting and Filtering: The problem requires counting the frequency of each word and then excluding certain words (banned words).
  3. Search and Decision Making: The problem requires searching for the most frequent word (max counting/frequency) that is not in the banned list, and making a decision based on these criteria.
  4. Result Presentation: The solution requires that the result be presented in a specific format (lowercase).

This problem combines elements of text analysis, statistical counting, and decision-making, and result presentation. The challenge lies in efficiently processing the text and implementing the counting and decision-making mechanisms.

Language Agnostic Coding Drills

  1. Distinct Concepts:

    • String Manipulation: This includes changing the case of the string, splitting the string into words, and removing punctuation from words.
    • Data Structures (List and Dictionary): The code involves the use of a list (for the words in the paragraph and banned words) and a dictionary (to keep track of word count).
    • Loops and Iteration: The code uses for-loops to iterate over the words and banned words list, as well as the word_count dictionary.
    • Conditionals: The if-else statements are used to check the existence of a word in the dictionary, increment its count, remove banned words, and find the most frequent word.
    • Algorithm for finding the maximum value: The code uses a basic algorithm for finding the key with the maximum value in a dictionary.
  2. Order of Increasing Difficulty:

    • String Manipulation: Beginner level. Basic operations like changing case, splitting strings, and removing punctuation marks.
    • Data Structures (List and Dictionary): Beginner to intermediate level. It requires understanding how to use lists and dictionaries, including adding and removing elements.
    • Loops and Iteration: Beginner to intermediate level. It involves iterating over different types of data structures and understanding the flow of control in a program.
    • Conditionals: Beginner level. Basic use of if-else statements to perform different actions based on a condition.
    • Algorithm for finding the maximum value: Intermediate level. Requires understanding of how to keep track of a maximum value and its corresponding key in a dictionary.
  3. Problem-Solving Approach:

    • Begin by converting the paragraph into a list of words. This involves changing the case to lower and splitting the paragraph based on space.
    • For each word in the list, remove any punctuation marks.
    • Count the frequency of each word using a dictionary.
    • Remove the banned words from the dictionary.
    • Finally, iterate over the dictionary to find the word with the highest frequency. This is the final result.

    Each of these drills contributes to a part of the solution. String manipulation helps prepare the data, data structures store the intermediate and final results, loops and conditionals are used to process the data, and the maximum value algorithm is used to determine the final output.

Targeted Drills in Python

  1. Python-Based Coding Drills for Identified Concepts:

    • String Manipulation:

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      
      # Sample string
      s = "Hello, World!"
      
      # Change case
      s = s.lower()
      
      # Split string into words
      words = s.split(" ")
      
      # Remove punctuation from words
      words = [''.join(e for e in word if e.isalnum()) for word in words]
      
    • Data Structures (List and Dictionary):

      1
      2
      3
      4
      5
      6
      7
      8
      9
      
      # List
      lst = [1, 2, 3]
      lst.append(4)  # Add element to list
      lst.remove(2)  # Remove element from list
      
      # Dictionary
      d = {"a": 1, "b": 2}
      d["c"] = 3  # Add element to dictionary
      del d["a"]  # Remove element from dictionary
      
    • Loops and Iteration:

      1
      2
      3
      4
      5
      6
      7
      
      # Iterate over list
      for i in lst:
          print(i)
      
      # Iterate over dictionary
      for key, value in d.items():
          print(key, value)
      
    • Conditionals:

      1
      2
      3
      4
      5
      
      x = 10
      if x > 5:
          print("x is greater than 5")
      else:
          print("x is not greater than 5")
      
    • Algorithm for finding the maximum value:

      1
      2
      3
      4
      5
      6
      7
      
      max_value = None
      max_key = None
      for key, value in d.items():
          if max_value is None or value > max_value:
              max_value = value
              max_key = key
      print("Key with maximum value: ", max_key)
      
  2. Problem-Specific Concepts:

    • Word Frequency Count: In this problem, we need to count the frequency of each word in the paragraph. This can be done using a dictionary where the keys are the words and the values are the counts.
      1
      2
      3
      4
      5
      6
      7
      
      word_count = {}
      words = ["apple", "banana", "apple"]
      for word in words:
          if word not in word_count:
              word_count[word] = 1
          else:
              word_count[word] += 1
      
  3. Integration of Drills:

    • Start by changing the case of the paragraph and splitting it into words (String Manipulation).
    • Remove punctuation from the words (String Manipulation).
    • Initialize an empty dictionary for word count (Data Structures).
    • Loop over the words and increment the count in the dictionary for each word (Loops and Iteration, Conditionals, and Word Frequency Count).
    • Loop over the banned words and remove them from the dictionary (Loops and Iteration, Conditionals, and Data Structures).
    • Finally, find the word with the maximum count in the dictionary (Loops and Iteration, Conditionals, and Algorithm for Finding Maximum Value). This is the most common non-banned word.