Find K Pairs with Smallest Sums

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import heapq

class Solution:
    def kSmallestPairs(self, nums1, nums2, k):
        if not nums1 or not nums2:
            return []
        heap = [(nums1[0] + nums2[0], 0, 0)]
        visited = set([(0, 0)])
        pairs = []

        while heap and len(pairs) < k:
            _, i, j = heapq.heappop(heap)
            pairs.append([nums1[i], nums2[j]])

            if i + 1 < len(nums1) and (i + 1, j) not in visited:
                heapq.heappush(heap, (nums1[i + 1] + nums2[j], i + 1, j))
                visited.add((i + 1, j))

            if j + 1 < len(nums2) and (i, j + 1) not in visited:
                heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
                visited.add((i, j + 1))

        return pairs

Problem Classification

This problem falls under the domain “Heap” and “Sorting” problems. The problem involves dealing with two sorted integer arrays and the task is to find pairs with the smallest sums.

The “What” components of the problem are:

  1. Input: Two sorted integer arrays nums1 and nums2, and an integer k.
  2. Output: The k pairs of integers, one from each array, that have the smallest sums.
  3. Constraints: The lengths of the arrays are between 1 and 105, the array elements can be any integer between -109 and 109, the arrays are sorted in ascending order, and k is between 1 and 104.

The problem can be further classified as a “Pairing problem with priority on sum” because we’re interested in pairs of elements from the two arrays, but the pairs must be chosen based on their sum, with a priority on smaller sums.

Understanding these aspects of the problem can guide the design of a solution. The sorted nature of the arrays suggests that there may be an efficient algorithm that doesn’t need to examine every possible pair of elements. The focus on the smallest sums and the number k suggests that a data structure like a heap could be useful, as it can efficiently keep track of the smallest elements seen so far.

Constraints

In this problem, there are a few characteristics and constraints that can be used to develop an efficient solution:

  1. Sorted arrays: nums1 and nums2 are already sorted in ascending order. This implies that pairs with smaller sums can be found towards the front of the arrays. Thus, we don’t necessarily need to consider all pairs, especially those that involve larger numbers.

  2. Limit on the number of pairs: The integer k limits the number of pairs we need to find. This can significantly reduce the computational complexity if k is much smaller than the product of the lengths of the two arrays. Again, this supports an approach where we prioritize considering pairs involving smaller numbers.

  3. Range of numbers: The elements of the arrays can range from -109 to 109. The negative numbers could potentially affect the sums, and hence the order of the pairs. Therefore, this problem doesn’t just involve finding the k smallest pairs in terms of the individual numbers, but the k pairs with the smallest sums.

  4. Size of arrays: The size of the arrays can be up to 105. This implies that a solution with a time complexity that is quadratic or worse in the size of the arrays may not be efficient enough.

These characteristics suggest that we may be able to use a heap or priority queue to maintain the k pairs with the smallest sums, which we continuously update as we consider pairs in some order dictated by the sorted arrays. The heap or priority queue allows us to efficiently find and remove the pair with the largest sum, which we can replace with a new pair if its sum is smaller. This approach can keep the time complexity manageable, even for large arrays.

Thought Process

The key insight from the problem statement is the need to find the k smallest pairs based on their sums from two sorted arrays. This is a clear cue that some form of priority-based selection is needed, potentially using a data structure like a min-heap.

Furthermore, the arrays are sorted in ascending order, which is a strong hint that we should always consider the pair with the smallest sum at each step. This is a common pattern in problems that involve sorted arrays or sequences.

The challenge here is to efficiently generate and consider the pairs in order of their sums, without having to explicitly generate all possible pairs, which could be very large.

With these insights in mind, let’s approach the problem in the following steps:

  1. Use a min-heap to store pairs of elements (with their sums as the priority). Initialize the heap with the pair consisting of the first elements from both arrays.

  2. Pop the heap to get the current smallest sum, and push the next two potential pairs into the heap. The next two pairs are made by pairing the popped elements’ neighbors with the other element.

  3. Repeat this process until we’ve found k pairs or the heap is empty.

  4. The k pairs popped from the heap form our answer.

Here is the Python3 code implementing this approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import heapq

class Solution:
    def kSmallestPairs(self, nums1, nums2, k):
        if not nums1 or not nums2:
            return []
        heap = [(nums1[0] + nums2[0], 0, 0)]
        visited = set([(0, 0)])
        pairs = []

        while heap and len(pairs) < k:
            _, i, j = heapq.heappop(heap)
            pairs.append([nums1[i], nums2[j]])

            if i + 1 < len(nums1) and (i + 1, j) not in visited:
                heapq.heappush(heap, (nums1[i + 1] + nums2[j], i + 1, j))
                visited.add((i + 1, j))

            if j + 1 < len(nums2) and (i, j + 1) not in visited:
                heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
                visited.add((i, j + 1))

        return pairs

The visited set ensures that a pair is not inserted into the heap more than once. The heapq module in Python standard library provides an efficient implementation of a heap data structure, which supports the necessary operations, including pushing an element (heappush), popping the smallest element (heappop), and creating a heap from a list (heapify).

Solution Approach and Analysis

To solve this problem, one might be tempted to generate all possible pairs of elements from the two arrays, compute their sums, sort the pairs by their sums, and finally select the first k pairs. However, this approach can be quite inefficient, especially when the arrays are large, because it involves a sorting step with a time complexity of O(n log n), where n is the number of pairs, which can be up to the product of the lengths of the two arrays. Furthermore, it doesn’t take full advantage of the fact that the arrays are already sorted.

A more efficient approach is to use a min-heap or priority queue, which is a data structure that allows us to efficiently find and remove the element with the smallest (or largest) value. The elements in this heap will be pairs of numbers, one from each array, and they will be sorted by their sums. The size of the heap will never exceed k, which means that we will be maintaining the k pairs with the smallest sums at any given time.

Here is a step-by-step breakdown of the process:

  1. Initialization: Initialize the min-heap with the first pair of numbers (i.e., the first number from each array). Also, maintain a set (or another suitable data structure) to keep track of the pairs that we have already added to the heap to avoid duplicates. The pair of indices (not the actual numbers) can be used as a unique identifier for each pair.

  2. Heap construction: Remove the pair with the smallest sum from the heap, add it to the result list, and add the next two pairs involving the next number from either array to the heap (if they have not been added before). Repeat this process until we have found k pairs or have examined all possible pairs.

  3. Result extraction: The pairs in the result list will be the k pairs with the smallest sums. Note that they may not be in order of increasing sums because of the way we have added pairs to the heap. If the pairs need to be sorted by their sums, we can perform a sorting step at the end. However, this is not required by the problem statement.

This approach works because the arrays are sorted in ascending order, which means that pairs involving a smaller number from either array are likely to have a smaller sum. Therefore, by always considering the pair with the smallest sum (which can be efficiently found with the min-heap) and its “neighbors” in the next steps, we are exploring the pairs in roughly increasing order of their sums.

Let’s illustrate this process with an example:

Example 1:

  • nums1 = [1,7,11]
  • nums2 = [2,4,6]
  • k = 3

Step 1: Initialize the heap with the pair (1,2). The heap and result list are as follows:

  • Heap: [(1,2)]
  • Result: []

Step 2: Remove the pair with the smallest sum from the heap (i.e., (1,2)), add it to the result list, and add the next two pairs involving the next number from either array to the heap. The heap and result list are now:

  • Heap: [(1,4), (7,2)]
  • Result: [(1,2)]

Step 3: Repeat the process:

  • Heap: [(1,6), (7,2), (7,4)]
  • Result: [(1,2), (1,4)]

Step 4: Repeat the process again:

  • Heap: [(7,2), (7,4), (11,2)]
  • Result: [(1,2), (1,4), (1,6)]

At this point, we have found k = 3 pairs with the smallest sums

, so we stop and return the result list.

Note that if the value of k was larger, we would continue the process until we found k pairs or examined all possible pairs.

The time complexity of this approach is O(k log k) because we perform k insertions and removals from a heap of size k. The space complexity is also O(k) for the heap and the set.

This approach is quite flexible and can handle changes in the problem’s parameters well. If the arrays are not sorted, we would need to sort them first, which would add an O(n log n) time complexity step. If we need to find the k pairs with the largest sums instead, we can use a max-heap instead of a min-heap. If the numbers can be negative, this approach still works because it correctly handles pairs with negative sums.

Language Agnostic Coding Drills

  1. Concept Identification

    Let’s dissect the code and identify distinct concepts it encompasses:

    • Variable initialization: It includes creating and initializing the lists, tuples, and sets.
    • Condition checks: This involves the use of ‘if’ statements to check certain conditions like the presence of elements in lists.
    • Heap operations: The code uses a heap data structure to keep track of the smallest pairs, and the operations include heap push and pop.
    • Looping: ‘while’ loop is used for repetition until a certain condition is met.
    • List operations: This includes operations like list indexing and appending elements to the list.
    • Set operations: Involves adding elements to a set and checking membership in a set.
  2. Concept Difficulty Leveling

    Arranging these concepts in increasing order of difficulty:

    • Variable initialization: This is a basic skill necessary for almost any programming task. Hence, it is categorized as an easy level.
    • Condition checks: This is slightly more complex than variable initialization but still basic to programming. Therefore, it’s at the easy level.
    • List operations: These are fundamental to programming, especially in Python, and can be classified as easy to medium.
    • Looping: This is an intermediate-level concept as efficient use of loops often requires understanding of logic and control flow.
    • Set operations: Using sets efficiently involves understanding of data structures and can be categorized as medium difficulty.
    • Heap operations: This is a more advanced concept because heaps are more complex data structures that are not as commonly used as lists or sets. This would be classified as a high difficulty level.
  3. Problem-solving Approach

    To arrive at the solution, we start by initializing our variables. Then we check if the provided lists are empty. If they are, we immediately return an empty list as there are no pairs to be found.

    Next, we start our ‘while’ loop, which will continue as long as we have not found ‘k’ pairs and our heap is not empty. In each iteration of the loop, we pop the smallest element from the heap, which gives us the smallest sum and its corresponding indices. We add this pair to our result list.

    We then check if we can form a valid new pair by incrementing the index of nums1 or nums2, and if we haven’t already visited this pair. If we can, we add the pair’s sum and indices to the heap and mark the pair as visited.

    The loop continues until we have found ‘k’ pairs or exhausted all possible pairs. Finally, we return the list of smallest pairs.

    Each identified coding concept contributes to the final solution by performing a specific task, be it maintaining the data structure that allows us to efficiently find the smallest sums (heap operations), ensuring we do not consider the same pair twice (set operations), or simply controlling the overall flow of the program (looping and condition checks).

Similarity

It is hard for me to understand at the beginning, but I finally came up with an idea to view this question, hope it can help you!

It is actually the same as how we merge k sorted list, in this question, the following are the k sorted list(each number in nums1[] full-mesh with the numbers in nums2[].

(1,2) -> (1,9) -> (1,10) -> (1,15) (7,2) -> (7,9) -> (7,10) -> (7,15) (11,2) -> (11,9) -> (11,10) -> (11,15) (16,2) -> (16,9) -> (16,10) -> (16,15)

Remember how we do in “merge k sorted list”? We simply add the head of the list into the heap and when a node is poll(), we just add the node.next.

Your understanding is absolutely correct! This problem is very similar to “merge k sorted lists”, except that here we’re merging all possible pairs of two sorted arrays instead of merging sorted lists.

In “merge k sorted lists” problem, we put the first element of each list into a min heap, and then keep popping the smallest element from the heap and adding the next element of the list from which we’ve just removed the element.

We can use a similar approach in this problem, where we treat each possible pair (nums1[i], nums2[j]) as a “list”, where the “next” element of the pair is (nums1[i], nums2[j+1]) if j+1 is in the boundary of nums2.

Here’s how we can do this:

  1. Create a min heap to keep track of the next smallest pair, where the “smallness” is defined by the sum of the two numbers in the pair. Initialize the heap with the first pair (nums1[0], nums2[0]), along with their indices as (0, 0).

  2. Pop the heap to get the current smallest pair, and then push the next pair into the heap. The next pair is (nums1[i], nums2[j+1]) where (i, j) are the indices of the current smallest pair. Remember to check the boundary of nums2.

  3. Also, to make sure we do not have duplicate pairs in the heap, we can use a set to keep track of whether a pair of indices (i, j) has been visited. So whenever we push a new pair of indices into the heap, we first check if it has been visited. If not, we mark it as visited by adding it into the set.

  4. Repeat step 2-3 until we’ve got k pairs.

  5. The time complexity of this solution is O(k log min(k, n, m)), where n and m are the lengths of nums1 and nums2 respectively. This is because we do k times of heap push and pop, each of which takes log(min(k, n, m)) time. The space complexity is O(min(k, n, m)) for the heap and the set.

Targeted Drills in Python

Here are the individual coding drills for each of the concepts identified in the problem:

Concept 1: Understanding Data Structures - Arrays

In Python, arrays are implemented using lists. A list is an ordered collection of items. We can define a list as follows:

1
2
3
# Defining a list
nums1 = [1, 7, 11]
nums2 = [2, 4, 6]

Concept 2: Iterating over Arrays

We can iterate over a Python list using a for loop:

1
2
3
# Iterating over a list
for num in nums1:
    print(num)

Concept 3: Understanding Data Structures - Priority Queue (Heap)

In Python, a heap is implemented using the heapq library. A heap is a binary tree where parent nodes are less than or equal to their child nodes. It’s typically used to create a min heap.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import heapq

# Defining a heap
heap = []

# Adding items to the heap
heapq.heappush(heap, (1, (0, 0)))

# Popping the smallest item from the heap
smallest = heapq.heappop(heap)

Concept 4: Understanding Data Structures - Sets

In Python, a set is an unordered collection of unique items. We can define a set as follows:

1
2
3
4
5
6
7
8
9
# Defining a set
visited = set()

# Adding items to the set
visited.add((0, 0))

# Checking if an item is in the set
if (0, 0) in visited:
    print("Visited")

Concept 5: Problem-Specific Concept - Generating and Tracking Pairs

In the problem, we need to generate pairs from two arrays and track whether a pair of indices has been visited.

1
2
3
4
5
6
7
8
9
# Generating pairs and adding to the heap
for i in range(len(nums1)):
    for j in range(len(nums2)):
        pair = (nums1[i], nums2[j])
        heapq.heappush(heap, (sum(pair), (i, j)))
        
        # Checking if the pair of indices has been visited
        if (i, j) not in visited:
            visited.add((i, j))

To solve the initial problem, we need to integrate these pieces together in the correct order:

  1. Define the heap and the set.
  2. Add the first pair to the heap and its indices to the set.
  3. Pop the heap to get the current smallest pair and push the next pair into the heap.
  4. Repeat the previous step until we’ve got k pairs or the heap is empty.
  5. Return the k pairs.

Q&A

I’m sorry to hear about your experience, it can be challenging to communicate our thought processes in high-pressure situations like job interviews.

You’re absolutely right about the rationale behind adding only [i, j + 1] and not [i + 1, j] to the queue. This has to do with how the problem is structured and how we ensure all valid pairs are considered without redundancy.

The algorithm works under the premise that we always add pairs from the sequence in a sorted order. As you pointed out, initially we add pairs [0, 0], [1, 0], … [min(k, n), 0] to the queue, which ensures that for any i, there is always a pair [i + 1, x] in the queue (except for the last pair).

As we proceed and start polling the queue, for any pair [i, j] polled out, we add the next pair [i, j + 1] to the queue. At this point, the pair [i + 1, j] would have already been considered and added to the queue in previous steps, because we started with pairs where the second element of the pair is zero and we are incrementing it. Adding [i, j + 1] ensures we’re considering all valid pairs that follow [i, j] in the sorted order.

This approach allows us to maintain a sorted sequence without having to generate all possible pairs upfront. It’s essentially a lazy approach where we generate the next smallest pair only when needed.

It is important to note that this problem is an example of using a Priority Queue to merge sorted sequences, and it might help to frame the explanation in those terms, emphasizing that we ensure each possible pair is considered exactly once.

Hope this helps you for your future interviews. Don’t be too hard on yourself. Sometimes, the communication gap is not because of a flaw in your explanation but due to the listener’s preconceived notions or lack of familiarity with the approach. Your understanding of the problem and the solution is spot on.

I’m really confused… when we poll pair (i, j), the next candidates are (i+1, j) and (i, j+1), the code just add (i, j+1), how can we ensure that (i+1, j) is in the heap???? Can anyone explain it to me. thks!

That’s a great question, and I understand why it might be a bit confusing. Let’s walk through it.

The key point here is to understand that we initially add all pairs of the form [i, 0] to the priority queue, where i ranges from 0 to min(k, n). So, for every i-th element in nums1, the pair [i, 0] will be in the queue at the beginning.

Then, during the polling process, for each pair [i, j] that gets polled out of the queue, we add the next pair [i, j + 1] to the queue (if it exists).

Let’s take an example:

Let nums1 = [1,7,11], nums2 = [2,4,6], and k = 3.

At the start, we add [0, 0], [1, 0], and [2, 0] to the priority queue, which correspond to pairs (1,2), (7,2), and (11,2) respectively.

Now let’s poll out the smallest pair, which is [0, 0] (i.e., (1,2)). Next, we add [0, 1] to the queue, which corresponds to (1,4).

At this point, the queue contains [1, 0] (i.e., (7,2)), [2, 0] (i.e., (11,2)), and [0, 1] (i.e., (1,4)).

As we keep polling out the smallest pairs and adding the next pair, we ensure that all possible pairs are considered. When we poll out [1, 0], we will add [1, 1] (i.e., (7,4)) to the queue. Hence, the pair [2, 0] (i.e., (11,2)), which might seem missing when [1, 0] was polled out, will already be in the queue before we poll [1, 0].

So, the [i+1, j] pair will have been already considered in a previous step, and we do not need to add it again when we poll [i, j].

I hope this clarifies your confusion!

I read the popular solutions using a priority queue and understand how it works. However, intuitively, doesn’t this problem seem like it can be done with 2 pointers? One pointer for each array, advance them in a certain way. I can’t get it to work and I guess it cannot be done. Can anyone explain why?

Intuitively, a two-pointer approach seems viable because you’re looking to pair elements from two sorted arrays. Usually, for problems involving sorted arrays or linked lists, a two-pointer strategy can be efficient.

However, in this case, a two-pointer strategy won’t work. Let’s see why.

Consider the two arrays nums1 = [1,7,11] and nums2 = [2,4,6]. The smallest possible sum comes from pairing the first elements of both arrays, 1 and 2. So you might think you could initialize two pointers at the start of both arrays, add the pair to your result, and then move the pointer of nums2 forward to create the next pair (1,4).

But this strategy falls apart quickly because the second smallest sum comes from pairing 7 (second element of nums1) and 2 (first element of nums2), not from the pair (1,4).

The problem here is that moving forward in both arrays independently doesn’t guarantee you’ll get the smallest sums because every element in nums1 can be paired with every element in nums2. The two-pointer strategy fails to account for these cross pairings that might yield a smaller sum.

That’s why we need a priority queue. It enables us to keep track of all potential pairings in a systematic way. It allows us to consider all possible sums and ensures we’re always choosing the smallest one available.

There will be many variables, two pointer wont work that way. If i,j in num1,num2: Once we move forward we have to maintain additional pointers for i+1,j-1 combination, and moving ahead,i+2,j-2,j-1 and i+1,j,j+1 and so on. Things will get more tricky if sum1==sum2.

Indeed, you are correct. A two-pointer approach would indeed become too complex and cumbersome for this problem. The main reason behind this is that for each element in the first array, we have potential pairs with each element in the second array, and vice versa. This leads to a combinatorial explosion of possible pairings.

Even if we maintain additional pointers for the combinations like (i+1, j-1), (i+2, j-2), etc., we’ll face difficulties. The number of such combinations will continue to grow as we proceed, and managing these pointers will become increasingly tricky. Also, as you pointed out, if we have equal sums, deciding which pointer to move becomes even more complex.

The problem essentially requires us to maintain a global ordering of the sums, which is why a priority queue based approach is better suited here. The priority queue automatically handles the ordering of the sums and allows us to systematically explore the pairings in increasing order of their sums. Therefore, it offers a much cleaner and more manageable solution compared to a multi-pointer approach.

I’ll try to put this in a simple way. If we keep 2 pointers both pointers at 0th index for each array. Now if we advance the pointer at 2nd array, we lost all the possible pairs that can be made with the other elements present in the 1st array and the element present in the 0th index of the 2nd array that could be a smaller pair.

Exactly, that’s a great way to illustrate why a two-pointer approach won’t work for this problem. If we were to advance the pointer in one of the arrays, we’d potentially be missing out on pairs that could yield a smaller sum.

This problem is essentially about finding the smallest possible sums from all possible pairings between two arrays. When we advance one pointer, we’re essentially ignoring all the potential pairings that could be made with the element at the current index in the other array.

To ensure we consider all possible pairs and find the smallest sums, we need a way to keep track of all potential pairs that could be considered next. This is why a data structure like a heap (priority queue) is more appropriate for this problem: it lets us keep track of all potential pairings in an organized way, based on the sum of each pair.

Thank you for this simple and clear explanation! It helps highlight why it’s important to consider all potential pairings and why a two-pointer approach falls short in this case.

[1, 2, 2, 4] [1, 2, 2, 4]

pq solution is like a two pointer one, suppose we are doing 2pointer l = 0, r = 3 now we see l = 1, r = 0 is better (smaller) then we have to save the state of l=0,r=3 we might need it again, now suppose we reach at l=1,r=1 but though being better than l=0,r=3, not better than l=2,r=0 now we have to manage the which state occupies which position we can do binary search and add at the relevant position or just use pq.

pq seems less implementation heavy.

Yes, your explanation makes it very clear why a priority queue is a good choice for this problem!

A two-pointer approach could in theory work, but as you’ve pointed out, it would involve managing multiple different “states” or pairs of pointers, and determining which state to consider next could involve something as complex as a binary search.

The priority queue (pq) solution, on the other hand, automatically keeps track of the “best” (i.e., smallest) sums that can be achieved with the current pairs. This way, you always have the next best pair ready to be processed without having to manually search for it among all possible pairs.

With the pq approach, you just keep adding new pairs to the queue, and it does all the work of maintaining the order of the pairs based on their sums. This greatly simplifies the implementation, which is a big plus in coding problems where simplicity and readability matter.

So, indeed, a priority queue approach is less implementation heavy and more efficient for this problem. Thank you for your insightful observation!

I though the same way. The idea is to place one pointer on nums1 and another on nums2, you advance nums2 pointer as long as the sum of both pointers is less than pointer1+nums2[0], that is sum after advancing nums1 pointer and the start of nums2. I am also curious for some test cases, k>n^2. How come that be possible? You can only form n^2 pairs from vectors of size n, this is insane.

The two-pointer approach you mentioned does seem intuitive, but the problem with it is that it’s not always guaranteed to give you the smallest possible sums. When you advance the pointer in nums2 as long as the sum of both pointers is less than nums1[pointer1] + nums2[0], you might miss out on combinations that involve the initial elements of nums2 with later elements of nums1 that could give smaller sums. The priority queue solution overcomes this problem by considering all possible pairs, not just adjacent or near-adjacent ones.

As for your question about k > n^2, the problem constraints do indeed allow k to be up to 10^4, while the lengths of nums1 and nums2 can be up to 10^5. In most real-world cases, you’d expect k to be less than or equal to the total number of possible pairs (n^2), as there are only n^2 unique pairs to choose from if nums1 and nums2 both have length n. However, the problem might be allowing for k > n^2 to test how your code handles edge cases where you’re asked for more pairs than actually exist. In such cases, your code should still return the correct output, which would be all n^2 unique pairs.

In case anyone else started down the same wrong path I did - I thought I could use two pointers and compare the values of the next numbers in each list to find the smaller value. But this will not work because it will not search all possible pairs - need to think of a new approach

Indeed, a two-pointer approach isn’t suitable for this problem. The problem is looking for the smallest sums of pairs, not the smallest individual elements. This means that a smaller number in one list could pair with a larger number in the other list to make a smaller sum than a pair of larger numbers from the opposite positions in the lists.

This problem is better approached using a priority queue (heap), where each entry in the queue is a pair consisting of the sum of a pair of numbers (one from each list), and the indices of those numbers in their respective lists. The queue is ordered by the sums, so that the pair with the smallest sum is always at the front.

Here’s the step-by-step approach:

  1. Create a min-heap to keep track of the sum of pairs (along with their indices). The smaller sums are arranged at the top.

  2. Initialize the heap with the smallest possible sum pair, i.e., nums1[0] + nums2[0], and its indices (0,0).

  3. Pop the heap to get the current smallest sum, along with the indices of the elements that make up the sum. Let’s say the indices are (i, j).

  4. Next, insert only the next possible pairs in the heap, which could potentially have the next smallest sum. They can be either (i+1, j) or (i, j+1). Make sure you don’t repeat any pair.

  5. Keep repeating steps 3 and 4 until we’ve found ‘k’ pairs.

This approach ensures that all potential pairs are considered without duplication, and always selects the pair with the smallest sum, making it an efficient solution to the problem.

the example gave: [1,1],[1,1],[1,2],[2,1],[1,2],[2,2],[1,3],[1,3],[2,3]

what about: [1,1],[1,1],[1,2],[1,2],[2,1],[1,3],[1,3],[2,2],[2,3] ?

Both of your sequences are valid and correctly ordered, but they represent different sequences of choices for pairs. Let me explain:

The sequence you gave: [1,1],[1,1],[1,2],[1,2],[2,1],[1,3],[1,3],[2,2],[2,3]

This sequence starts by choosing both [1,1] pairs, then choosing both pairs with sum 3 (i.e., [1,2]), then choosing [2,1], then choosing both pairs with sum 4 (i.e., [1,3]), then choosing [2,2], and finally choosing [2,3].

The sequence from the example: [1,1],[1,1],[1,2],[2,1],[1,2],[2,2],[1,3],[1,3],[2,3]

This sequence also starts by choosing both [1,1] pairs, but then it alternates between choosing the pairs with sum 3 (i.e., [1,2] and [2,1]), then it alternates between the pairs with sum 4 (i.e., [1,3] and [2,2]), and finally it chooses [2,3].

Both sequences are correct in the sense that they contain the same pairs and the sums of the pairs are non-decreasing. However, the order in which the pairs are chosen may differ depending on the specifics of the implementation. Different valid implementations may produce either sequence.

Why does order within the pairs matter for this?

The order of the pairs matters in this problem because the problem statement asks for the “k pairs with the smallest sums”, and this implies an order of the sums of the pairs. However, the order of the numbers within each pair doesn’t matter as long as they’re correctly formed from the two input lists.

In other words, if the problem asks for the first 3 pairs with the smallest sums from the lists [1,7,11] and [2,4,6], it expects an output like [[1,2],[1,4],[1,6]] and not like [[1,6],[1,4],[1,2]], because the sum of the pairs in the first list is in increasing order, whereas in the second list, it is not.

It’s also worth noting that if there are multiple pairs with the same sum, they can appear in any order relative to each other. For example, if the problem asks for the first 2 pairs from the lists [1,1,2] and [1,2,3], both [[1,1],[1,1]] and [[1,1],[1,1]] are valid outputs because both pairs have the same sum.

If you’re stuck mentally thinking about two pointers and can’t figure out how to get to the answer- here’a a hint:

Each time you can either take a step down nums1 or nums2, a bit like traversing a binary tree What’s the best type of search to traverse a tree without going to the end of each branch? Remember to make sure you don’t visit the same node twice!

Your thought process is going in the right direction. Given that you’re considering each pair of elements from nums1 and nums2 and comparing their sums, it’s similar to traversing a binary tree or a grid, where at each step, you can choose to move to the right or downwards.

The type of search that allows you to traverse through such a data structure without necessarily going to the end of each branch is called a Breadth-First Search (BFS). BFS allows you to explore all the neighbors at the present depth before moving on to nodes at the next depth level. This is in contrast to a Depth-First Search (DFS) which dives deep into a branch before coming back up and moving to another branch.

A Priority Queue (a queue where elements are processed based on their priority rather than their insertion order) is often used in such situations to always process the pair with the smallest sum first. This is essentially implementing a version of the Dijkstra’s algorithm, which is an algorithm for finding the shortest paths in a weighted graph.

In this problem, the Priority Queue would store pairs of indices of elements from nums1 and nums2, along with the sum of the elements these indices point to. The Priority Queue is ordered by the sum, so the pair with the smallest sum always gets processed first.

As for avoiding visiting the same pair of indices twice, you can use a Set data structure to store the pairs of indices that you’ve already visited. Before adding a new pair of indices to the Priority Queue, you can check if it’s already in the Set.

It’s important to note that BFS with a Priority Queue is also known as a Best-First Search, which is an informed search algorithm that uses an evaluation function to decide which adjacent node is most promising and then explores. It’s used in route optimization algorithms and in machine learning.

Strict way to solve:

We need function F to find how many pairs with sum less than given value. Use two pointers and there will be O(n+m) time complexity.

You should find the exact Value. Condition: F(Value) <= k < F(Value+1). For this you will use binary search. It needs maximum log(4 * 1e9) = 34 iterations. So, it is just huge constant value and does not depend on input size.

With two pointers collect all pairs with sum less than Value. And add some pairs with sum equal to Value if it is needed. Time complexity O(k + n + m).

So, overall time complexity is O(k + n + m); As result: this solution beats 99% other solutions by time and 95% by memory.

Here is a Python solution based on your approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
class Solution:
    def kSmallestPairs(self, nums1, nums2, k):
        if not nums1 or not nums2:
            return []
        
        # Define binary search function
        def binary_search():
            left, right = nums1[0] + nums2[0], nums1[-1] + nums2[-1]
            while left < right:
                mid = left + (right - left) // 2
                cnt, j = 0, len(nums2) - 1
                for num in nums1:
                    while j >= 0 and num + nums2[j] > mid:
                        j -= 1
                    cnt += j + 1
                if cnt < k:
                    left = mid + 1
                else:
                    right = mid
            return left
        
        # Find the value
        value = binary_search()
        
        # Find the pairs with sum less than or equal to value
        result, j = [], len(nums2) - 1
        for num in nums1:
            while j >= 0 and num + nums2[j] > value:
                j -= 1
            for i in range(j + 1):
                result.append([num, nums2[i]])
                
        # Sort and return the k smallest pairs
        result.sort(key=sum)
        return result[:k]

This solution first uses binary search to find the smallest value that can be achieved by the sum of a pair of elements from nums1 and nums2 such that there are at least k pairs with a sum smaller or equal to this value. It then finds all the pairs with a sum less than or equal to this value, sorts them, and returns the k smallest pairs.

This solution has a time complexity of O(k + n + m) as described in your strict way to solve. The binary search operation takes O(log(sum_max - sum_min)) which is constant, the two-pointer approach to count the number of pairs with a sum less than or equal to the mid value takes O(n + m), and finding the k smallest pairs also takes O(n + m) and sorting them takes O(k log k). So, the overall time complexity is dominated by O(k + n + m).

Let’s take two arrays: [1,7,11] and [2,4,30]. What’s wrong with having a pointer at 1 and 2, then at each iteration comparing, for example, (1,4) vs. (7,2). If 1,4 greater than 7,2, we can them compare (1,30) vs 7,2 and continue like this k times?

Your approach is based on comparing the sums of two potential pairs and then moving the pointer accordingly. However, this approach doesn’t guarantee that you will find the k smallest sums.

In your example, your first pair will be (1,2). Now, you want to compare (1,4) and (7,2). Although (1,4) is smaller than (7,2), it does not necessarily mean that (7,2) is not part of the k smallest pairs. You would have skipped it in your algorithm, but (7,2) could potentially be a smaller pair than the other pairs you would encounter later in the algorithm.

The problem is that you’re making local decisions (i.e., choosing the smallest sum at each step), but these decisions do not necessarily lead to a global optimum (i.e., finding the k smallest pairs). This is why we use a priority queue in the algorithm. The queue always gives the smallest sum seen so far, and we don’t miss any potential smaller sums because we add all possible next steps to the queue.

In essence, the proposed approach might miss some valid smaller pairs, and thus it can’t be reliably used to solve this problem.

I tried this approach at first but the problem is once you’ve chosen to increment the pointers you can’t go back. You won’t iterate through all possible pairs, only a subset. But I only tried with one pointer for each array, not two like you suggested. So maybe there’s a way to make that work?

The approach you mentioned, also known as the “two-pointer” method, is indeed very useful for certain problems such as finding a pair with a specific sum. However, for this problem, it’s not applicable because the aim is to find the k pairs with the smallest sums from all possible combinations of pairs, not just adjacent or sequential ones.

Here’s why the two-pointer method does not work in this case:

  1. Once you’ve incremented a pointer, you cannot go back. Hence, you cannot create all possible pairs, only a subset of them.

  2. This problem requires us to consider pairs formed by elements from all positions in the arrays, not just from the current and next positions of the pointers. The two-pointer approach will miss out on potential pairs that could have smaller sums but are not being considered because they don’t fall next to the current pointer positions.

  3. The smallest sums might not necessarily come from numbers that are close together in the arrays, as the arrays themselves are sorted individually and not in relation to each other. So, the two-pointer method might skip potential pairs that could be part of the k smallest sums.

Instead, a priority queue-based approach ensures that you consider all potential pairs. It continually generates new pairs from the given arrays and always selects the pair with the smallest sum, guaranteeing the correct solution.

The smallest sum will be formed by index 0 of nums1 and index 0 of nums2. The next smaller sum will be either from {index 0 of nums1 and index1 of nums2} or {index 1 of nums1 and index 0 of nums2}.

Yes, you’re correct. In this problem, given two sorted lists, the smallest possible sum is indeed formed by the first elements of each list. The next smallest sum would either be the sum of the first element of the first list and the second element of the second list, or the sum of the second element of the first list and the first element of the second list.

This can be visualized as a grid where the cell at the i-th row and j-th column represents the pair (nums1[i], nums2[j]). The problem then boils down to efficiently traversing this grid to find the k smallest sums. The smallest sum is at the top left (0, 0). The next smallest sum is either one cell to the right (0, 1), or one cell down (1, 0).

But extending this logic, after selecting (0, 1), the next smallest could be at (0, 2) or (1, 1). It’s clear that we can’t simply iterate through the rows and columns, because the third smallest sum could be at (1, 1) or (0, 2). This is why a priority queue (or a heap) is a useful data structure for this problem: it can keep track of multiple “candidates” for the next smallest sum and efficiently select the smallest one.

Here’s how it might look in Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
class Solution:
    def kSmallestPairs(self, nums1, nums2, k):
        queue = []
        def push(i, j):
            if i < len(nums1) and j < len(nums2):
                heapq.heappush(queue, [nums1[i] + nums2[j], i, j])
        push(0, 0)
        pairs = []
        while queue and len(pairs) < k:
            _, i, j = heapq.heappop(queue)
            pairs.append([nums1[i], nums2[j]])
            push(i, j + 1)
            if j == 0:
                push(i + 1, 0)
        return pairs

In this code, for each pair, we “push” two new pairs into the priority queue: (i, j + 1) and (i + 1, j). The heap will automatically sort these pairs based on their sums, ensuring the smallest sums are popped first. We also need to ensure we don’t count any pair more than once, so we only push (i + 1, j) when j == 0.

can anyone tell me the what wrong with this solution .

//MY OUTPUT :[[-10,3],[-10,5],[-10,6],[-10,7],[-10,8],[-4,3],[-4,5],[-4,6],[-4,7],[0,3],[0,3]] //Expected :[[-10,3],[-10,5],[-10,6],[-10,7],[-10,8],[-4,3],[-4,5],[-4,6],[0,3],[0,3]]

public List kSmallestPairs(int[] num1, int[] num2, int k) {

Map<Integer,List<Integer> > map = new TreeMap<>();
List<List<Integer>>finalResult = new ArrayList<>();

for(int i=0;i<num1.length;i++)
{ 
    for(int j=0;j<num2.length;j++)
    {
        int sum = num1[i]+num2[j];
        List<Integer> l1 = new ArrayList<>();
        l1.add(num1[i]);
        l1.add(num2[j]);
        if(map.containsKey(sum))
        {
            List<Integer> l2 = map.get(sum);
            l2.addAll(l1);
            map.put(sum,l2);

        }
        else
        {
           map.put(sum,l1);
        }
       

    }
}

int counter=0;

for(Integer key : map.keySet())
{
    if(counter<k)
    {
       
        List<Integer> l1 = map.get(key);
        for(int i=0;i<l1.size();i+=2)
        {
            List<Integer> l2 = new ArrayList<>();
            l2.add(l1.get(i));
            l2.add(l1.get(i+1));
            finalResult.add(l2);    
            counter++;
            
        }
     
        
       
    }
    else
    {
        break;
    }
}

return finalResult;

}

The logic of your code is almost correct, but you need to consider that in your case you’re creating one pair every two numbers, but the map does not have a limit per key. This means that for each sum, you’re trying to put as many pairs as there are in the cross product of the two input arrays that sum to that key.

This can cause issues if, for instance, there are many pairs that have the same sum. In your case, the counter could increment more than once within the inner for loop because the if(counter<k) condition is checked only at the beginning of each key in your map. As a result, you might end up adding more than k pairs to your finalResult.

To fix this, you should check if you’ve added k pairs after adding each pair, not after adding all pairs of a specific sum. If counter is equal to k, you should break from both loops.

Here’s a corrected version of your code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
public List<List<Integer>> kSmallestPairs(int[] num1, int[] num2, int k) {
    Map<Integer,List<Integer>> map = new TreeMap<>();
    List<List<Integer>>finalResult = new ArrayList<>();

    for(int i=0;i<num1.length;i++) {
        for(int j=0;j<num2.length;j++) {
            int sum = num1[i]+num2[j];
            List<Integer> l1 = new ArrayList<>();
            l1.add(num1[i]);
            l1.add(num2[j]);
            map.putIfAbsent(sum, new ArrayList<>());
            map.get(sum).addAll(l1);
        }
    }

    int counter=0;
    outer: 
    for(Integer key : map.keySet()) {
        List<Integer> l1 = map.get(key);
        for(int i=0;i<l1.size();i+=2) {
            List<Integer> l2 = new ArrayList<>();
            l2.add(l1.get(i));
            l2.add(l1.get(i+1));
            finalResult.add(l2);
            counter++;
            if(counter==k)
                break outer;
        }
    }

    return finalResult;
}

In the corrected code, I added the outer: label and the break outer; statement to exit both loops once k pairs have been added to finalResult.

This solution can solve the problem, but it might not be the most efficient one. There’s an alternative way to solve this problem by using a priority queue, which can have a better time complexity. This approach avoids storing all pairs and only maintains the smallest pairs, which is a more efficient use of memory.

I am also curious for some test cases, k>n^2. How come that be possible? You can only form n^2 pairs from vectors of size n, this is insane.

You are correct in your observation that if you have two arrays of length n, you can only form a maximum of n^2 pairs. However, the parameter k in this problem doesn’t have to correspond to the actual number of possible pairs. k is simply the number of pairs the function is being asked to return.

For instance, if the arrays are of length 3 (n=3) so we have 3^2 = 9 possible pairs. But the function could be called with k = 20.

In such scenarios where k exceeds the number of possible pairs, the function should simply return all possible pairs, as it can’t return more pairs than exist.

So it’s not really “insane”, it’s just that in those cases k would be effectively ignored in terms of limiting the output size, since the number of possible pairs is the real limit.

`class Solution { public: vector kSmallestPairs(vector& nums1, vector& nums2, int k) { // Define a max-heap using the default greater comparator priority_queue<pair<int, pair<int, int»> maxHeap;

// Iterate over each pair (num1, num2)
for (int num1 : nums1) {
    for (int num2 : nums2) {
        // Calculate the sum of the pair (num1, num2)
        int sum = num1 + num2;

        // Add pair (sum, num1, num2) to the max-heap
        maxHeap.push(make_pair(sum, make_pair(num1, num2)));

        // If the size of max-heap exceeds K, remove the pair with the largest sum
        if (maxHeap.size() > k) {
            maxHeap.pop();
        }
    }
}

// Extract the pairs from max-heap in the reverse order to get the final result
vector<vector<int>> result;
while (!maxHeap.empty()) {
    vector<int> pair = {maxHeap.top().second.first, maxHeap.top().second.second};
    result.push_back(pair);
    maxHeap.pop();
}
reverse(result.begin(), result.end());  // Reverse to get the pairs in ascending order of sum

return result;

} }; `

Its showing TLE for a testcase, can someone help please :(

This solution is running into a time limit exceeded (TLE) error because it iterates over all possible pairs in nums1 and nums2 which has a time complexity of O(n^2) where n is the size of the arrays.

Also, for each pair, it is doing a push operation and a pop operation on the heap (when the heap size exceeds k). The time complexity of these operations is O(log(k)), so in total the time complexity will be O(n^2*log(k)). For large inputs of n and k, this could be too slow and result in a TLE.

Instead of iterating over all pairs and maintaining a heap of size k, a more efficient approach is to use a min heap and only consider potential candidates for smallest pair. A pair (i, j) is a potential candidate if at least one of its adjacent pair (i-1, j) or (i, j-1) is already in the heap.

Here is a Python solution following this approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import heapq

def kSmallestPairs(nums1, nums2, k):
    queue = []
    def push(i, j):
        if i < len(nums1) and j < len(nums2):
            heapq.heappush(queue, [nums1[i] + nums2[j], i, j])
    push(0, 0)
    pairs = []
    while queue and len(pairs) < k:
        _, i, j = heapq.heappop(queue)
        pairs.append([nums1[i], nums2[j]])
        push(i, j + 1)
        if j == 0:
            push(i + 1, 0)
    return pairs

This Python solution maintains a min-heap where the key is the sum. It initializes the min-heap with the smallest pair (0, 0) and then generates the next pair from the popped pair. If the popped pair is (i, j), the next pairs could be (i+1, j) and (i, j+1). Then it keeps pushing new pairs into the min-heap until we find the k pairs with the smallest sums.

This approach guarantees that the pairs are generated in correct order. It only pushes k pairs into the min heap, thus providing the same k smallest pairs but with a better time complexity. The time complexity is O(k log k), because each insertion operation (push) takes O(log k) and we do this at most k times.

Two Pointer approach fails Consider this testcase: nums1 = [1,1,2] nums2 = [1,2,3] k = 10

In this when you maintain two pointers (i and j), and say you reach i = 2 and j = 0. So you will add the pair [nums1[i], nums2[j]] to the answer vector and then increment j.

Now i = 2 and j = 1. But before [nums1[i], nums2[j]] can make pair of [2, 2], you also need pairs of [nums1[0], nums2[1]], [nums1[1], nums2[1]] i.e. [1,2] and [1,2].

By maintaining two pointer such as i and j, you will miss out on these pairs. Thus this approach fails.

You’re correct. The two pointer approach is indeed prone to missing some valid pairs, hence it fails in certain cases.

The problem with the two pointer approach here is that it’s not enough to keep track of a single pair of indices (i, j). When we move forward in the array (either incrementing i or j), we potentially miss out on pairs that could have been formed with the elements we just passed over. This is because, for each element in nums1[i], we should consider all elements in nums2[j] and vice versa, which means we need to explore all possible pairs.

Let’s illustrate this with the example you’ve given:

nums1 = [1,1,2] nums2 = [1,2,3] k = 10

If we start with i=0, j=0, we get the pair [1,1]. Now if we move the pointer in nums2 (i.e., increment j), we get the pair [1,2]. But at the same time, we also need to consider the pair with the second element in nums1 and the first element in nums2, i.e., [1,1] (formed by nums1[1], nums2[0]). So, there’s no easy way to choose which pointer to move forward without potentially missing some valid pairs.

A more appropriate way to solve this problem is to use a priority queue (or a heap data structure) that keeps track of the next smallest pair. This way, we can systematically generate all possible pairs without missing out any. The priority queue helps us maintain the smallest pairs based on the sum of elements. As we remove the smallest pair from the heap, we then insert the next potential pairs into the heap, hence always maintaining the k smallest pairs in the heap.

In case someone else did the same, I started with two pointer approach as I thought it would be similar to the procedure followed during merging in merge sort, but after many trials I couldn’t get it work.

That’s a common initial approach to consider and it makes sense given the sorted nature of the arrays. However, the two-pointer approach does not work in this particular problem. The reason is that the two-pointer approach assumes that once a pair is processed, you will never need to consider that pair again, which is not true in this case.

In this problem, for each element in the first array, you need to consider all the elements in the second array. So for an element in nums1 at position i, you will have to pair it with all elements in nums2 at position j (where j varies from 0 to nums2.length-1).

When you just increment either of the pointers, you lose all the other potential pairs that could have been formed with the element at the position that was just incremented.

For this reason, a better approach to solving this problem would be to use a min heap/priority queue where each element in the heap is a pair consisting of the pair of numbers from nums1 and nums2, their sum, and the index from nums2.

The heap is ordered by the sum of pairs. On each step, we pop the smallest pair from the heap and add the next pair from nums1 and nums2 to the heap. This guarantees that we generate all pairs in order of their sum and don’t miss out any valid pairs.

Should the k pairs be in ascending order in the output nested vector

The problem statement does not require the k pairs to be returned in ascending order based on the sums of the pairs. You only need to return the k pairs with the smallest sums.

Is there a unique answer in all cases? what if there are k+1 pairs with the lowest possible sum?

For the problem as stated, if there are more than k pairs with the smallest sums, you only need to return k pairs, and it does not matter which ones you choose among those pairs. In other words, if there are multiple valid solutions, you can return any of them.

To illustrate, let’s take an example:

nums1 = [1,2,3], nums2 = [1,2,3], and k = 2. There are three pairs with the smallest sum (2): [1,1], [2,2], and [3,3]. Any two pairs out of these three would form a valid solution.

However, if the problem required you to return the pairs in lexicographical order, or some other specified order, then the solution would be unique. In such a case, you would need to return the pairs in that specified order.

Why to use terms like “non-decreasing” when you cant use it properly, here the arrays are non-decreasing and not increasing.

The terms “non-decreasing” and “increasing” are actually not identical in the realm of mathematics and computer science.

When we say an array is “increasing”, we are implying that each subsequent element is strictly larger than the previous one. For example, [1, 2, 3] is an increasing array. However, [1, 1, 2, 3] is not considered as an increasing array because the second element isn’t strictly larger than the first.

On the other hand, when we say an array is “non-decreasing”, it means each subsequent element is either larger than or equal to the previous one. So, [1, 1, 2, 3] is a non-decreasing array.

The problem statement uses the term “non-decreasing” to make it clear that the input arrays can contain repeated elements. In a strictly “increasing” array, you wouldn’t be able to have repeating elements, but in a “non-decreasing” array, repeating elements are allowed.

For anyone who gets a TLE using heap/priority queue, you need to store the sum into the heap besides the indices. It works though I have no idea why I should do it.

The reason storing the sum in the heap can improve performance is because it allows the heap operations to execute more efficiently.

A priority queue, or heap, works by keeping its elements partially sorted. The time complexity of adding an element, or “pushing” onto the heap, is O(log N), where N is the number of elements in the heap. This is because the heap needs to maintain its partially sorted property when a new element is added, and doing so requires log N operations.

However, if you were to calculate the sum each time you perform a heap operation (like pushing or popping), you would be adding extra time to each operation. Calculating the sum isn’t a big operation by itself, but when it’s done for each heap operation, it can add up.

By storing the sum in the heap instead, you only need to calculate it once, when you first create the pair of elements. This means that the heap operations can run at their optimal speed, because they no longer have to wait for the sum to be calculated each time.

This is a good illustration of a tradeoff between time and space in computer programming. By using a bit more space (to store the sum), you can save a significant amount of time.

i still havent figured out the solution. It seems like the trciky. i have some questions

Why we start filling the heap from the first list ? and why we go only untill k in case that the k is smaller than the length of the first list ? Why we only increment j ? beacuse its clear as the arrays are sorted and when we start from the first list i, i will never changed so plus few questions, i still dont have a clear vision of the algorithm, when i see this problem first, i thouhgt about two pointers approach, but i couldnt solve with them. Is anyone can explain to these questions.

  1. Why do we start filling the heap from the first list?

We need to start somewhere and because both the lists are sorted, it makes sense to start from the smallest possible sum. The smallest possible sum would be the sum of the first elements of both lists, since they are sorted in ascending order.

  1. Why do we go only until k in case that the k is smaller than the length of the first list?

We only need to find k pairs, not all possible pairs. Once we have found k pairs with the smallest sums, we can stop our process. This is an optimization to avoid unnecessary computation.

  1. Why do we only increment j?

We start by considering pairs consisting of the first element of the first array (nums1[0]) and each element in the second array. We push these into the priority queue/heap. Then, we pop off the heap (giving us the smallest current pair), and push the next pair into the heap, which could either be the next element from nums1 with the current element from nums2, or the next element from nums2 with the current nums1 element.

However, because we’re already considering all pairs with nums1[0] and each nums2[j], we only need to add the pair nums1[i + 1] and nums2[j] (where i is the current nums1 index and j is the current nums2 index) to the heap. This is why it appears we’re only incrementing j.

In general, this algorithm maintains a min-heap of pairs, where the “priority” is the sum. It starts with the smallest possible pair (nums1[0], nums2[0]) and “generates” the next potential smallest pairs, adding them to the heap. The generation process ensures each pair is added exactly once in the heap.

This problem can indeed be a bit tricky to grasp at first due to its heap-based approach.

Here’s how I understand this solution.

Think about the possible pairs in the question:

(nums1[0], nums2[0]), (nums1[0], nums2[1]), (nums1[0], nums2[2]), … (nums1[1], nums2[0]), (nums1[1], nums2[1]), (nums1[1], nums2[2]), … (nums1[2], nums2[0]), (nums1[2], nums2[1]), (nums1[2], nums2[2]), … . . . Since both nums1 and nums2 are sorted, we know each row and col must also be sorted in increasing sum value of the pairs.

This is now essentially a multiple-way sort just like the question 378.Kth Smallest Element in a Sorted Matrix.

Yes, you’re absolutely correct! You’ve hit the nail on the head.

Indeed, both the problems “Kth Smallest Element in a Sorted Matrix” and “Find K Pairs with Smallest Sums” have very similar patterns in their solutions. In both cases, the problem boils down to a “K-th smallest” problem on a sorted 2D grid.

In the “Find K Pairs with Smallest Sums” problem, the grid represents all possible pairs of elements from the two input arrays, with the sum of each pair as the value. And like you said, due to the sorted nature of the input arrays, this grid is also sorted row-wise and column-wise. This makes the problem an instance of the “K-th smallest” problem where we are required to find the K-th smallest element (pair, in this case) from this sorted grid.

The typical solution for such problems involves using a priority queue or heap data structure, allowing us to efficiently extract the smallest element (pair) each time and update our heap with the next smallest candidates (either the next element in the current row or the first element of the next row).

In this way, your understanding of transforming the problem to a more abstract “K-th smallest” problem is spot on and very insightful. This is indeed a common technique in solving such kind of problems.

The above solution is essentially a multiple-way merge sort See the hand drawing below:

A multiple-way merge sort is typically used to merge multiple sorted lists into one sorted list. In the context of this problem, it can be applied by visualizing each combination of the first list element with all the elements of the second list as separate sorted lists.

For example, consider the two lists: nums1 = [1,2,3] nums2 = [4,5,6]

We can consider the pairs with nums1[0], nums1[1] and nums1[2] as three separate lists, which are:

Pairs with nums1[0]: [(1,4), (1,5), (1,6)] Pairs with nums1[1]: [(2,4), (2,5), (2,6)] Pairs with nums1[2]: [(3,4), (3,5), (3,6)]

Each of these lists is sorted. Now we can apply a multiple-way merge sort. We take the smallest element from each list and put them in a heap. Then we pop out the smallest one (which is the heap top), push the next element from the same list into the heap, and so on, until we have found the k smallest pairs.

This process is similar to the standard merge step in a merge sort, but applied to multiple (in this case, three) lists. This is why it’s referred to as a “multiple-way” merge sort.

I hope this explanation helps clarify the concept. If you need more help, feel free to ask!