Remove All Adjacent Duplicates In String

This problem can be solved by using a stack data structure to keep track of the characters in the string.

Here’s the basic idea:

  • Initialize an empty stack.
  • Iterate over the characters in the input string.
    • If the stack is not empty and the current character is the same as the top character in the stack, it means we’ve encountered a pair of adjacent and equal letters. In this case, we pop the top character from the stack to ‘remove’ the pair.
    • If the stack is empty or the current character is not the same as the top character in the stack, we push the current character into the stack.
  • At the end of the iteration, the stack will contain the final string after all duplicate removals, but in reverse order. So, we need to reverse the stack and join its contents to get the final string.

Here is a Python code snippet that implements this approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Solution:
    def removeDuplicates(self, s: str) -> str:
        stack = []

        for char in s:
            if stack and char == stack[-1]:
                stack.pop()
            else:
                stack.append(char)

        return ''.join(stack)

In this solution, stack[-1] is used to access the top element of the stack. If the stack is empty, this will not raise an IndexError, but the stack and part of the condition ensures that we don’t attempt to access stack[-1] if the stack is empty.

When char == stack[-1] is true, we’ve found a pair of adjacent and equal letters, so we perform a ‘duplicate removal’ by popping the top character from the stack with stack.pop().

If char != stack[-1], we push the current character onto the stack with stack.append(char).

After we’ve iterated over all the characters in s, we join the remaining characters in the stack into a string with ''.join(stack). The join operation turns the list of characters in the stack into a string.

The removeDuplicates function takes a string s as input and returns the final string after all possible duplicate removals. This solution works for all valid inputs due to the problem’s guarantee that an answer exists and is unique.

This solution has a time complexity of O(n), where n is the length of the string, since we’re making one pass over the string. The space complexity is also O(n), in the worst case, when all characters are distinct, so the stack ends up containing all the characters from the string.

How did you decide that stack will be useful to solve this problem?

A stack is a data structure that follows the Last-In-First-Out (LIFO) principle, meaning the last element added to the stack will be the first one to be removed. It’s a very useful tool when you need to maintain a specific order of operations, and especially when you need to backtrack or ‘undo’ operations.

In this specific problem, the need to find and remove adjacent duplicate characters suggests that we need to look at the current character in the context of the ones that came before it. More specifically, we need to compare the current character to the most recent character that has not been removed.

This is where the stack comes into play. By adding characters to the stack as we traverse the string, we can easily compare the current character to the one on the top of the stack, which represents the most recent non-removed character. If the current character is the same as the top character on the stack, we can remove the top character from the stack, effectively removing the pair of adjacent duplicates.

In addition, the stack makes it easy to handle multiple rounds of removals. After a pair of duplicates is removed, the new top of the stack represents the next most recent non-removed character, allowing us to handle cases where removing a pair of duplicates reveals another pair.

So the decision to use a stack was based on the nature of the problem: the need to compare each character to its most recent non-removed predecessor and to handle multiple rounds of removals in an efficient manner.

Can you breakdown the solution into smaller exercises to separate different concepts used to solve the problem?

What are the building blocks of this solution?

At a higher level of abstraction, the comparison between data in different data structures, in this case, comparing an element from an iterable (string) with the top element of a stack, is a form of Cross-Referencing.

Cross-Referencing is a strategy that allows you to verify or coordinate elements from different data structures to achieve a desired result. Here, we’re essentially cross-referencing the current character from the string with the top element of the stack to identify and remove duplicate pairs.

The building blocks for this solution are:

  1. Iteration over a string: Here, we traverse the given string from left to right, examining each character in turn.

  2. Stack data structure: The stack is used to keep track of the characters we’ve examined so far. The top of the stack always holds the most recent character that has not been removed.

  3. Comparison of current character with top of stack: For each character, we compare it to the character at the top of the stack. If they are the same, it means we’ve found a pair of adjacent duplicates and we can remove them.

  4. Removal of characters from the stack: If the current character is the same as the one at the top of the stack, we remove the top character from the stack, effectively removing the pair of adjacent duplicates.

  5. Adding characters to the stack: If the current character is not the same as the top character on the stack, or if the stack is empty, we push the current character onto the stack.

  6. Conversion of final stack to string: Once we’ve processed all the characters, the stack will contain the characters of the final string in reverse order. We need to convert this to a string, reversing the order of characters in the process.

I also observed processing on the fly by manipulating two different data structures, how to name this kind of processing of data?

This type of processing is often referred to as Dynamic Processing. In this case, we are dynamically processing the string by actively manipulating a stack data structure as we iterate through the string. This allows us to handle and solve the problem in an efficient manner, dealing with each element once and updating our data structure(s) accordingly as we progress.

In the context of ‘on the fly’ computations, we’re typically referring to calculations that are done concurrently with data input or data reception. This might occur, for example, when we are processing a stream of incoming data in real time.

Consider the following simple example. Let’s say we want to calculate a running average from a stream of incoming numbers.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def calculate_running_average(stream):
    total, count = 0, 0
    for number in stream:
        total += number
        count += 1
        yield total / count

# Assume we have a stream of incoming numbers
numbers = [10, 20, 30, 40, 50]
stream = iter(numbers)

for average in calculate_running_average(stream):
    print(average)

In this code, calculate_running_average is a generator function that calculates the running average of the numbers in stream on the fly. It does this by maintaining a total sum and count of the numbers it has processed so far, and yielding the average each time it processes a new number.

This is an example of dynamic or ‘on the fly’ processing, where we perform calculations concurrently with data input.

I think it is better to call it as On the Fly Computation to avoid confusing it with Dynamic Programming

You’re right. The term “On the Fly Computation” is indeed a more suitable term to describe the process of performing calculations concurrently with data input or reception. This term doesn’t carry the same connotations as “Dynamic Programming”, which is a specific strategy for solving problems by breaking them down into overlapping sub-problems, storing the results of these sub-problems, and reusing them when needed.

The reason is the name is neither too abstract nor too concrete.

Indeed, the term “On the Fly Computation” hits a nice balance in the abstraction scale. It is not as abstract as to lose its specificity, but also not so concrete that it only applies to a narrow subset of situations.

It conveys the idea of calculating or processing things immediately as data is being read or received, which is indeed what we’re doing in the solution of this problem: as we read each character from the string, we immediately decide whether to add it to the stack or to remove the top element from the stack.