Optimal Insertion Cost in a Dynamic Sorted Array

Given an integer array instructions, you are asked to create a sorted array from the elements in instructions. You start with an empty container nums. For each element from left to right in instructions, insert it into nums. The cost of each insertion is the minimum of the following:

The number of elements currently in nums that are strictly less than instructions[i]. The number of elements currently in nums that are strictly greater than instructions[i]. For example, if inserting element 3 into nums = [1,2,3,5], the cost of insertion is min(2, 1) (elements 1 and 2 are less than 3, element 5 is greater than 3) and nums will become [1,2,3,3,5].

Return the total cost to insert all elements from instructions into nums. Since the answer may be large, return it modulo 109 + 7

Example 1:

Input: instructions = [1,5,6,2] Output: 1 Explanation: Begin with nums = []. Insert 1 with cost min(0, 0) = 0, now nums = [1]. Insert 5 with cost min(1, 0) = 0, now nums = [1,5]. Insert 6 with cost min(2, 0) = 0, now nums = [1,5,6]. Insert 2 with cost min(1, 2) = 1, now nums = [1,2,5,6]. The total cost is 0 + 0 + 0 + 1 = 1. Example 2:

Input: instructions = [1,2,3,6,5,4] Output: 3 Explanation: Begin with nums = []. Insert 1 with cost min(0, 0) = 0, now nums = [1]. Insert 2 with cost min(1, 0) = 0, now nums = [1,2]. Insert 3 with cost min(2, 0) = 0, now nums = [1,2,3]. Insert 6 with cost min(3, 0) = 0, now nums = [1,2,3,6]. Insert 5 with cost min(3, 1) = 1, now nums = [1,2,3,5,6]. Insert 4 with cost min(3, 2) = 2, now nums = [1,2,3,4,5,6]. The total cost is 0 + 0 + 0 + 0 + 1 + 2 = 3. Example 3:

Input: instructions = [1,3,3,3,2,4,2,1,2] Output: 4 Explanation: Begin with nums = []. Insert 1 with cost min(0, 0) = 0, now nums = [1]. Insert 3 with cost min(1, 0) = 0, now nums = [1,3]. Insert 3 with cost min(1, 0) = 0, now nums = [1,3,3]. Insert 3 with cost min(1, 0) = 0, now nums = [1,3,3,3]. Insert 2 with cost min(1, 3) = 1, now nums = [1,2,3,3,3]. Insert 4 with cost min(5, 0) = 0, now nums = [1,2,3,3,3,4]. ​​​​​​​Insert 2 with cost min(1, 4) = 1, now nums = [1,2,2,3,3,3,4]. ​​​​​​​Insert 1 with cost min(0, 6) = 0, now nums = [1,1,2,2,3,3,3,4]. ​​​​​​​Insert 2 with cost min(2, 4) = 2, now nums = [1,1,2,2,2,3,3,3,4]. The total cost is 0 + 0 + 0 + 0 + 1 + 0 + 1 + 0 + 2 = 4.

Constraints:

1 <= instructions.length <= 10^5 1 <= instructions[i] <= 10^5

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#include <bits/stdc++.h>
using namespace std;

class SparseTable
{
    vector<vector<int>> table;

public:
    SparseTable(const vector<int>& arr)
    {
        int n = arr.size();
        int k = log2(n) + 1;
        table.assign(n, vector<int>(k));

        for (int i = 0; i < n; i++)
        {
            table[i][0] = arr[i];
        }

        for (int j = 1; (1 << j) <= n; j++)
        {
            for (int i = 0; i + (1 << j) - 1 < n; i++)
            {
                table[i][j] = table[i][j - 1] + table[i + (1 << (j - 1))][j - 1];
            }
        }
    }

    int query(int l, int r)
    {
        int sum = 0;
        for (int j = log2(r - l + 1); j >= 0; j--)
        {
            if ((1 << j) <= (r - l + 1))
            {
                sum += table[l][j];
                l += 1 << j;
            }
        }
        return sum;
    }
};

class Solution
{
public:
    int createSortedArray(vector<int> &instructions)
    {
        int _max = *max_element(instructions.begin(), instructions.end());
        vector<int> freq(_max + 1, 0);

        unsigned int cost = 0;

        SparseTable st(freq);

        for (int i = 0; i < instructions.size(); i++)
        {
            int l = st.query(0, instructions[i] - 1);
            int r = i - st.query(0, instructions[i]);

            cost += min(l, r);

            freq[instructions[i]]++;
            st = SparseTable(freq);
        }
        return cost % 1000000007;
    }
};

The Sparse Table data structure is not an ideal fit for this problem due to the following reasons:

  1. Updates Are Slow: Sparse Table is designed for static arrays, where you construct the table once and make multiple queries. Since you’re updating the frequency array in every iteration, you have to reconstruct the Sparse Table, which is inefficient.

  2. Query Time: Sparse Table query time can add up if the number of elements is large, despite its good average time complexity.

  3. Space Complexity: Sparse Table also consumes quite a bit of space.

Instead, you can use a data structure like a Balanced Binary Search Tree (BBST), for example, an Ordered Set in C++ (which is implemented using a BBST). This will allow you to maintain a dynamic sorted sequence efficiently, with logarithmic time complexity for both insertion and queries. Below is an example that demonstrates this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;

int createSortedArray(vector<int> &instructions) {
    const int MOD = 1e9+7;
    ordered_set s;
    long long cost = 0;
    for(int i = 0; i < instructions.size(); ++i) {
        int less_count = s.order_of_key(instructions[i]);
        int greater_count = i - s.order_of_key(instructions[i] + 1);
        cost += min(less_count, greater_count);
        cost %= MOD;
        s.insert(instructions[i]);
    }
    return cost;
}

int main() {
    vector<int> instructions = {1,5,6,2};
    cout << createSortedArray(instructions) << endl;
    return 0;
}

In this code snippet, ordered_set acts like a set but with the ability to find the number of elements less than a given element efficiently. The createSortedArray function inserts each element in the instructions array into the ordered set while keeping track of the cost. The function returns the total cost modulo (10^9+7).