Longest Substring Without Repeating Characters

To find the length of the longest substring without repeating characters in a given string, you can use the sliding window technique. Here’s a code that does this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        max_length = 0
        start = 0
        char_index_map = {}  # To keep track of the characters and their last occurrence

        for end, char in enumerate(s):
            # If the character is found in the map and its index is greater than or equal to start, update start
            if char in char_index_map and char_index_map[char] >= start:
                start = char_index_map[char] + 1

            # Update the current character's index
            char_index_map[char] = end

            # Update the maximum length if the current window is larger
            max_length = max(max_length, end - start + 1)

        return max_length

Here’s a brief explanation of how it works:

  1. Use two pointers, start and end, to define a sliding window of unique characters.
  2. Move the end pointer to the right and check if the character is in the char_index_map.
  3. If the character is found and its last occurrence is within the current window, move the start pointer to the right of that occurrence.
  4. Keep track of the maximum length of the window, which gives you the length of the longest substring without repeating characters.

Abstract Representation of the Problem

In an abstract sense, this problem is about finding the longest contiguous subsequence in a given sequence where all elements are unique. The sequence can contain various types of elements such as letters, digits, symbols, and spaces. The measure of the solution is the length of this longest subsequence.

Key elements:

  • A sequence (array or string) of elements.
  • A subsequence that has unique elements.
  • Length as the measure of the solution.

Constraints:

  • Length of the sequence is within a certain limit.
  • Elements in the sequence can be from a specified set (English letters, digits, symbols, spaces).

The goal is to identify this subsequence and report its length.

Problem Classification

This problem falls under the category of sequence analysis. Specifically, it involves analyzing a sequence of characters to find a particular kind of sub-sequence (substring in this case) that satisfies certain uniqueness conditions. Sequence analysis is common in computer science and bioinformatics but can be applied in various other domains as well.

Cues

I have identified the following two elements as the cues from analyzing the problem:

  • Subsequence Extraction
  • Validate Uniqueness of Characters

These two elements serve as cues for approaching the problem.

  1. Subsequence Extraction: The goal is to extract a substring, which is a contiguous sequence of characters from the given string.

  2. Validate Uniqueness of Characters: While extracting, you need to ensure that the characters in the substring are unique.

Understanding these cues can guide your problem-solving strategy. For instance, you might decide to employ a sliding window technique to handle subsequence extraction while using a data structure like a set to validate the uniqueness of characters within that window.

This is about Subsequence Extraction where the characters must be unique and the longest we could find in the given input.

The problem is focused on subsequence extraction, but with the constraint that the characters in the subsequence must be unique. Furthermore, the objective is to find the longest such subsequence within the given input string. This is a more specialized form of sequence analysis, aimed at maximizing the length while maintaining character uniqueness.

Loop Invariant Structure for the Problem

Initialization

Before the loop starts, we create a set to store unique characters and two pointers start and end to indicate the boundaries of our current substring. At this point, the set is empty, and both pointers are at the 0th index. Our loop invariant is that the set contains all unique characters between start and end.

Maintenance

During each iteration, we move the end pointer one step forward and add the new character to our set. If we find a duplicate, we remove the character at the start pointer from the set and move start one step forward. This maintains the loop invariant: the set contains unique characters between start and end.

Termination

When the loop ends, the set will contain unique characters from the longest substring without repeating characters. The distance between start and end will give us the length of this substring, ensuring the problem is solved correctly.

By following these steps, we prove that our loop invariant holds at Initialization, is maintained during each iteration, and gives us a useful property (the longest substring without repeating characters) at Termination.

Tabular Representation

Here’s a table illustrating these steps for the string “pwwkew”:

Step NoStartEndCurrent CharSetActionLongest Substring
100p{p}Add ‘p’p
201w{p, w}Add ‘w’pw
302w{p}Remove ‘p’, Move Startpw
412w{w}Remove ‘w’, Move Startpw
522w{w}Add ‘w’w
623k{w, k}Add ‘k’wk
724e{w, k, e}Add ’e’wke
825w{k, e}Remove ‘w’, Move Startwke
935w{k, e, w}Add ‘w’kew

The “Longest Substring” column keeps track of the longest substring without repeating characters that we’ve seen so far. In this case, “wke” is the longest substring without repeating characters, and its length is 3.