Contain Virus

Identifying Problem Isomorphism

“Contain Virus” can be mapped to “Surrounded Regions”.

“Contain Virus” involves a grid where each cell can be a wall, an uninfected area, or an infected area. The goal is to build walls to prevent the virus from spreading to the maximum number of cells. We are to return the total number of walls required.

In “Surrounded Regions”, the grid contains ‘X’ and ‘O’. Any ‘O’ that isn’t surrounded by ‘X’ or isn’t on the border gets flipped to ‘X’. The idea of region isolation and “spread” is common to both problems.

“Contain Virus” is more complex than “Surrounded Regions”. While “Surrounded Regions” only requires a basic search algorithm (DFS/BFS) to find the regions and check if they’re surrounded, “Contain Virus” requires not only finding the infected regions but also deciding where to build the walls, which introduces an additional layer of complexity.

10 Prerequisite LeetCode Problems

“Contain Virus” is a graph-based problem involving concepts such as Depth-First Search (DFS), Breadth-First Search (BFS), and some elements of Greedy Algorithm. The following problems are simpler as good preparation:

200. Number of Islands: This is a good introduction to basic DFS and BFS on a 2D grid.
994. Rotting Oranges: This problem helps you understand BFS in a grid and how to handle time steps.
733. Flood Fill: This problem provides practice for DFS on a grid.
130. Surrounded Regions: This problem is a more complicated application of DFS on a 2D grid.
542. 01 Matrix: This problem requires you to perform BFS from multiple sources at the same time, a useful technique for the “Contain Virus” problem.
207. Course Schedule: It provides practice for topological sorting and detecting cycles in a graph.
417. Pacific Atlantic Water Flow: This problem requires BFS or DFS from multiple sources, similar to the “Contain Virus” problem.
785. Is Graph Bipartite?: This problem is good for practicing graph coloring, a technique that can be used to mark different regions in the “Contain Virus” problem.
1020. Number of Enclaves: This problem also requires DFS on a grid and has a similar theme of isolating regions.
934. Shortest Bridge: This problem will require you to use DFS to identify a region, and then BFS to find the shortest distance to another region. This could be quite helpful for the “Contain Virus” problem.

Clarification Questions

What are the clarification questions we can ask about this problem?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
from typing import List
from queue import PriorityQueue

class Cluster:
    def __init__(self):
        self.c = set()  # Set of cells that are contaminated
        self.uc = set() # Set of cells that are next to this cluster and are uncontaminated
        self.wcnt = 0  # Walls required to contain this cluster

    def __lt__(self, other):
        return len(self.uc) < len(other.uc)

class Solution:
    def dfs(self, grid: List[List[int]], i: int, j: int, vis: List[List[bool]], cl: Cluster) -> None:
        if (i == len(grid) or i < 0 or j < 0 or j == len(grid[0]) or vis[i][j] or grid[i][j] == -1):
            return
        if grid[i][j] == 0:
            cl.wcnt += 1
            cl.uc.add((i,j))
            return
        cl.c.add((i,j))
        vis[i][j] = True
        self.dfs(grid, i+1, j, vis, cl)
        self.dfs(grid, i-1, j, vis, cl)
        self.dfs(grid, i, j+1, vis, cl)
        self.dfs(grid, i, j-1, vis, cl)

    def containVirus(self, grid: List[List[int]]) -> int:
        n = len(grid)
        m = len(grid[0])
        ans = 0

        while True:
            vis = [[False]*m for _ in range(n)]
            pq = PriorityQueue()

            for i in range(n):
                for j in range(m):
                    if not vis[i][j] and grid[i][j] == 1:
                        cl = Cluster()
                        self.dfs(grid, i, j, vis, cl)
                        pq.put((-len(cl.uc), cl))

            if pq.qsize() == 0:
                break

            _, cl = pq.get()
            for x in cl.c:
                grid[x[0]][x[1]] = -1
            ans += cl.wcnt

            while pq.qsize() != 0:
                _, cl = pq.get()
                for x in cl.uc:
                    grid[x[0]][x[1]] = 1
        return ans

The Cluster class is used to represent each cluster of infected cells along with the number of walls required to contain it, and the set of uninfected cells around it. In the containVirus function, a Priority Queue is used to process the clusters with most threatening regions first (those with the most uninfected neighbors). The dfs function is used to traverse the grid recursively to find all infected cells and the uninfected cells around them.

In Python, PriorityQueue by default serves the smallest item first. So we put the negative of the length to get the largest item first. Also, we use tuple to put into queue. The first element is the priority and the second element is the actual data.

Problem Classification

This problem can be classified as a simulation problem from the domain of Graph theory and Depth-First Search.

Here are the ‘What’ components of the problem:

Input: The problem involves an m x n grid that represents the world. Each cell in this grid could either be infected with a virus (represented by 1) or be uninfected (represented by 0).
Process: Every day, a wall can be installed between any two 4-directionally adjacent cells. At night, the virus spreads to all four adjacent cells unless there’s a wall blocking it. The wall can be installed around only one region each day, specifically, the region that threatens the most uninfected cells.
Output: The problem asks us to return the number of walls used to quarantine all the infected regions. If the world will become fully infected, the output should be the number of walls used till that point.

The problem can be classified as a Graph theory problem as it involves finding a path (or block of cells) in a grid, which represents a graph. It involves determining the boundary of a region of cells, which requires traversing the grid in a way that is typical of Depth-First Search. Therefore, it is a problem in the domain of Depth-First Search as well. The simulation aspect comes from the fact that the problem is iterative with actions happening (walls built, virus spread) at each iteration (day/night).

It’s also a problem in priority queues as we need to prioritize the region that threatens the most uninfected cells each day. This involves a comparison of the areas that could be infected by different regions, which is a typical use case for priority queues.

Thought Process

This problem can be solved by combining depth-first search and a priority queue.

The depth-first search is used to identify all the infected regions in the grid. For each infected region, we calculate the number of walls required to quarantine it and the number of uninfected cells it can infect. We store this information for all regions in a priority queue where regions are ordered by the number of uninfected cells they can infect.

At each step, we select the region that can infect the most uninfected cells and quarantine it. This is done by removing the region from the priority queue and adding the number of walls required to quarantine it to our total.

This process is repeated until there are no more regions that can infect any more cells. If at some point all cells are infected, we stop the process and return the total number of walls used so far.

Language Agnostic Coding Drills

Dissect the code and identify each distinct concept:

Here are some key coding concepts that are in use:

a. Classes and Object-Oriented Programming (OOP): The code defines two classes, Cluster and Solution. The former is a data structure that encapsulates a set of related variables, whereas the latter includes methods that operate on the Cluster instances.

b. Methods and recursion: The code includes several methods, one of which (dfs) is recursive. Recursive methods are those that call themselves, either directly or indirectly.

c. Conditional statements: These are used extensively to control the flow of the program, particularly within the dfs method.

d. Data Structures: The code uses complex data structures such as lists, sets, and priority queues, as well as 2D arrays.

e. Looping Constructs: The code uses several types of loops, including for loops and while loops.

f. Type Hinting: The code uses Python’s type hinting feature to specify the expected types of function arguments and return values.

List out coding concepts or drills in order of increasing difficulty:
a. Conditional Statements: These are fundamental to any programming language and are generally one of the first things learned by new programmers.
b. Looping Constructs: While also fundamental, these can be slightly more complex due to issues such as off-by-one errors or understanding when to use which type of loop.
c. Classes and OOP: While Python makes OOP relatively simple, it can still be complex for beginners due to the need to understand concepts like self, methods, and instances.
d. Methods and recursion: Writing methods is a basic task, but understanding recursion can be difficult for beginners because it involves thinking about problems in a different way.
e. Data Structures: Using lists and sets are relatively simple, but priority queues and 2D arrays can be harder to understand.
f. Type Hinting: This is a relatively advanced concept that is not necessary for writing functional code but can make code easier to understand and debug.
Describe the problem-solving approach:

The overall problem-solving approach here is a combination of depth-first search and greedy algorithms.

Initialization of Classes: Before starting the main logic, we initialize the Cluster and Solution classes. The Cluster class is a custom data structure used to store the relevant information about each cluster of infected cells.
Depth-First Search (DFS): The main logic of the program starts by running a DFS from each cell in the grid that contains a virus and hasn’t been visited yet. This DFS is used to identify all the cells in the same cluster as the starting cell and to calculate the number of walls that would be needed to quarantine this cluster.
Priority Queue: The clusters are then added to a priority queue, sorted by the number of uninfected cells that could be infected by each cluster. This is the greedy part of the algorithm: by always dealing with the most dangerous cluster first, we can minimize the number of walls needed to quarantine all the viruses.
Building Walls and Spreading Virus: We then “build walls” around the most dangerous cluster and let the virus spread from the remaining clusters. This involves updating the grid to reflect these changes and incrementing a counter to keep track of how many walls have been built.
Iteration: This process repeats until there are no more infected cells that are not quarantined.

Each of these steps can be thought of as a separate coding drill. First, one would learn how to implement DFS. Then, one would learn how to use priority queues and how to sort objects based on a custom criterion. Finally, one would learn how to update the grid and the counter based on the current state of the simulation.

Understanding how these drills fit together to solve the problem requires an understanding of both the specific coding concepts and the high-level algorithm. The key is to realize that each cluster of infected cells can be dealt with independently, and that the best strategy is to always deal with the most dangerous cluster first. The coding drills are the tools that allow us to implement this strategy.

Targeted Drills in Python

Separate pieces of Python code for each identified concept:

a. Conditional Statements:

1
2
3
4
5
6
7
x = 10
if x > 0:
    print("x is positive")
elif x < 0:
    print("x is negative")
else:
    print("x is zero")

b. Looping Constructs:

1
2
3
4
5
6
7
for i in range(5):
    print(i)

i = 0
while i < 5:
    print(i)
    i += 1

c. Classes and OOP:

1
2
3
4
5
6
7
8
9
class MyClass:
    def __init__(self, x):
        self.x = x

    def print_x(self):
        print(self.x)

my_instance = MyClass(10)
my_instance.print_x()

d. Methods and recursion:

1
2
3
4
5
6
7
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

e. Data Structures:

1
2
3
my_list = [1, 2, 3]
my_set = {1, 2, 3}
my_2d_array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

f. Type Hinting:

1
2
def add_numbers(a: int, b: int) -> int:
    return a + b

Problem-specific concepts and their coding drills:

a. Depth-First Search (DFS):

This is used to explore all the cells in the same cluster starting from a certain infected cell.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def dfs(graph, start):
    visited = []
    stack = [start]

    while stack:
        node = stack.pop()
        if node not in visited:
            visited.append(node)
            neighbours = graph[node]
            for neighbour in neighbours:
                stack.append(neighbour)
    return visited

b. Priority Queue:

This is used to always process the cluster with the most uninfected neighbours first.

1
2
3
4
5
6
7
8
import heapq
pq = []
heapq.heappush(pq, (3, 'task3'))
heapq.heappush(pq, (1, 'task1'))
heapq.heappush(pq, (2, 'task2'))
while pq:
    priority, task = heapq.heappop(pq)
    print(task)

Integration of these drills:

Now that we have coded each individual drill, let’s see how these pieces can be integrated together to solve the initial problem:

a. Classes and OOP: We start by defining our Cluster and Solution classes. Inside the Solution class, we include methods that perform DFS and solve the main problem.

b. Methods and recursion: DFS is implemented as a recursive method in the Solution class.

c. Conditional Statements: Inside our DFS and main problem-solving methods, we use conditional statements to control the flow of the program based on the current state of the grid and the clusters.

d. Looping Constructs: We use loops to iterate through the grid and process each cell. In the main problem-solving method, we use a while loop to keep processing the clusters until all viruses are quarantined.

e. Data Structures: We use lists to represent the

grid, sets to store contaminated and uncontaminated cells in each cluster, and a priority queue to process the clusters in the desired order.

f. Type Hinting: We use type hinting to make our code clearer and catch any potential type errors early.

g. DFS and Priority Queue: DFS is used to identify and process the clusters, and a priority queue is used to always quarantine the most dangerous cluster first.

As you can see, each of these drills contribute to the final solution. They help us model the problem, explore the grid, identify and process clusters, and keep track of the state of the simulation.