LFU Cache

import collections

class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.freq = 1
        self.prev = self.next = None

class DLinkedList:
    """An implementation of a doubly linked list.

    Two APIs are provided:

    append(node): append the node at the head of the linked list.
    pop(node=None): remove the referenced node.
                    If None is given, remove the node at the tail,
                    which is the least recently used.

    Both operations run in O(1) time.
    """
    def __init__(self):
        self._sentinel = Node(None, None) # dummy node
        self._sentinel.next = self._sentinel.prev = self._sentinel
        self._size = 0
    
    def __len__(self):
        return self._size
    
    def append(self, node):
        node.next = self._sentinel.next
        node.prev = self._sentinel
        node.next.prev = node
        self._sentinel.next = node
        self._size += 1

    def pop(self, node=None):
        if self._size == 0:
            return

        if not node:
            node = self._sentinel.prev

        node.prev.next = node.next
        node.next.prev = node.prev
        self._size -= 1

        return node

class LFUCache:
    def __init__(self, capacity):
        """
        :type capacity: int

        Three things to maintain:

        1. a dict, named `self._node`, holding a reference to every node by key.
           That is, O(1) time to retrieve a node given its key.

        2. Each frequency has a doubly linked list, stored in `self._freq`, where the
           key is the frequency and the value is a `DLinkedList` object.

        3. The minimum frequency across all nodes. We can maintain this in O(1) time by
           taking advantage of the fact that a frequency can only increment by 1.
           Use the following two rules:

           Rule 1: Whenever the DLinkedList of the current min frequency becomes empty,
                   the min frequency must increment by 1.

           Rule 2: Whenever a new (key, value) pair is put in, the min frequency must
                   be 1 (the new node starts at frequency 1).

        """
        self._size = 0
        self._capacity = capacity

        self._node = dict() # key: Node
        self._freq = collections.defaultdict(DLinkedList)
        self._minfreq = 0

    def _update(self, node):
        """
        A helper function used in the following two cases:

            1. when `get(key)` is called; and
            2. when `put(key, value)` is called and the key exists.

        What these two cases have in common:

            1. no new node comes in, and
            2. the node is visited one more time -> node.freq changes ->
               so the node's place must change.

        The logic of this function:

            1. pop the node from the old DLinkedList (with freq `f`)
            2. append the node to the new DLinkedList (with freq `f+1`)
            3. if the old DLinkedList now has size 0 and self._minfreq is `f`,
               update self._minfreq to `f+1`

        All of the above operations take O(1) time.
        """
        freq = node.freq

        self._freq[freq].pop(node)
        if self._minfreq == freq and not self._freq[freq]:
            self._minfreq += 1

        node.freq += 1
        freq = node.freq
        self._freq[freq].append(node)

    def get(self, key):
        """
        Through self._node[key], we can get the node in O(1) time.
        Just perform self._update, then return the node's value.

        :type key: int
        :rtype: int
        """
        if key not in self._node:
            return -1

        node = self._node[key]
        self._update(node)
        return node.val

    def put(self, key, value):
        """
        If `key` already exists in self._node, we do the same operations as `get`, except
        updating the node.val to new value.

        Otherwise, the following logic will be performed

        1. if the cache reaches its capacity, pop the least frequently used item. (*)
        2. add new node to self._node
        3. add new node to the DLinkedList with frequency 1
        4. reset self._minfreq to 1

        (*) How do we pop the least frequently used item? Two facts:

        1. we maintain self._minfreq, the minimum frequency present in the cache.
        2. all keys with the same frequency are stored in one DLinkedList, in
           recency order (we always append at the head).

        Consequence ==> the tail of the DLinkedList for self._minfreq is the least
                        recently used among the least frequently used keys; pop it.

        :type key: int
        :type value: int
        :rtype: void
        """
        if self._capacity == 0:
            return

        if key in self._node:
            node = self._node[key]
            self._update(node)
            node.val = value
        else:
            if self._size == self._capacity:
                node = self._freq[self._minfreq].pop()
                del self._node[node.key]
                self._size -= 1

            node = Node(key, value)
            self._node[key] = node
            self._freq[1].append(node)
            self._minfreq = 1
            self._size += 1
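As a sanity check, the problem's example sequence can be replayed. The driver below uses a compact alternative LFU sketch built on `OrderedDict` buckets (one per frequency, with insertion order doubling as recency order); it illustrates the same bookkeeping rules as the implementation above, but it is a sketch, not a drop-in replacement.

```python
from collections import defaultdict, OrderedDict

class LFU:
    """Compact LFU sketch: one OrderedDict per frequency bucket.
    Keys in a bucket are ordered oldest -> newest, so the leftmost
    key of the min-frequency bucket is the eviction candidate."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.key_freq = {}                       # key -> use counter
        self.buckets = defaultdict(OrderedDict)  # freq -> {key: value}
        self.minfreq = 0

    def _bump(self, key):
        # Move `key` from bucket f to bucket f+1 (Rule 1 for minfreq).
        f = self.key_freq[key]
        val = self.buckets[f].pop(key)
        if not self.buckets[f] and self.minfreq == f:
            self.minfreq = f + 1
        self.key_freq[key] = f + 1
        self.buckets[f + 1][key] = val
        return val

    def get(self, key):
        if key not in self.key_freq:
            return -1
        return self._bump(key)

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.key_freq:
            self._bump(key)
            self.buckets[self.key_freq[key]][key] = value
            return
        if len(self.key_freq) >= self.capacity:
            # Evict the LRU key of the min-frequency bucket.
            old, _ = self.buckets[self.minfreq].popitem(last=False)
            del self.key_freq[old]
        self.key_freq[key] = 1
        self.buckets[1][key] = value
        self.minfreq = 1          # Rule 2: a fresh key always has frequency 1

# Replay the example from the problem statement (capacity 2).
lfu = LFU(2)
lfu.put(1, 1)
lfu.put(2, 2)
assert lfu.get(1) == 1   # cnt(1)=2
lfu.put(3, 3)            # evicts key 2 (cnt=1 is the smallest)
assert lfu.get(2) == -1
assert lfu.get(3) == 3
lfu.put(4, 4)            # cnt tie between 1 and 3; key 1 is LRU, evicted
assert lfu.get(1) == -1
assert lfu.get(3) == 3
assert lfu.get(4) == 4
print("example sequence OK")
```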

Identifying Problem Isomorphism

“LFU Cache” involves designing a cache system with a specific eviction policy, least frequently used (LFU). The crux of the problem lies in managing the cache under its capacity constraint while keeping the use frequency of each entry up to date.

A simpler problem sharing this theme is “LRU Cache”. In this problem, the cache system design is similar, but the eviction policy is least recently used (LRU), which is simpler to implement compared to LFU. Here, entries need to be updated based on recency of use, rather than frequency.

A more complex problem is “Design In-Memory File System”. This problem requires design of a more complex data structure (a file system), maintaining different types of entries (files and directories) and implementing various operations.

The reasoning behind these choices:

  • “LRU Cache” is simpler than “LFU Cache” as it only requires tracking of the recency of usage, which can be achieved with a doubly linked list and a hash map.
  • “Design In-Memory File System” is more complex because it involves design of multiple interacting parts (files, directories, and the operations on them), making it a step further in complexity.

From simpler to more complex:

  1. “LRU Cache”
  2. “LFU Cache”
  3. “Design In-Memory File System”

This classification is based on the complexity of the data structure that needs to be designed and the operations that need to be supported. As always, this mapping is an approximation, and actual difficulty can vary based on individual understanding and familiarity with the concepts involved.

10 Prerequisite LeetCode Problems

“LFU Cache” involves data structure design and implementation. Specifically, hash maps, doubly-linked lists, and caching algorithms.

Here are 10 problems to build these foundational skills:

  1. “Two Sum” (LeetCode Problem #1): This problem introduces you to the use of hash maps for efficient lookups.

  2. “LRU Cache” (LeetCode Problem #146): This problem will help you understand the LRU caching mechanism, which is a simpler version of LFU.

  3. “Design Doubly Linked List” (LeetCode Problem #707): This problem requires you to design a basic doubly-linked list, which is an essential part of the LFU cache.

  4. “Implement Stack using Queues” (LeetCode Problem #225): This problem involves implementing a basic data structure using another, similar to how you might implement an LFU cache using a hash map and a doubly-linked list.

  5. “Valid Anagram” (LeetCode Problem #242): This problem further practices the use of hash maps to count elements.

  6. “Contains Duplicate” (LeetCode Problem #217): This problem provides practice in using hash maps to track the existence of items.

  7. “Design HashMap” (LeetCode Problem #706): This problem requires you to design a basic hash map, which is an essential part of the LFU cache.

  8. “Design Circular Queue” (LeetCode Problem #622): This problem introduces you to the design of more complex data structures, which is a good step up towards the LFU cache.

  9. “Design Linked List” (LeetCode Problem #707): This problem has you design a simple linked list. Understanding linked lists is crucial for the LFU cache problem, as doubly-linked lists are a kind of linked list.

  10. “Design Circular Deque” (LeetCode Problem #641): This problem involves the design of a more complex linked list structure, which is a great stepping stone towards the LFU cache problem.

Design and implement a data structure for a Least Frequently Used (LFU) cache.

Implement the LFUCache class:

  • LFUCache(int capacity): initializes the object with the capacity of the data structure.
  • int get(int key): gets the value of the key if the key exists in the cache; otherwise, returns -1.
  • void put(int key, int value): updates the value of the key if present, or inserts the key if not already present. When the cache reaches its capacity, it should invalidate and remove the least frequently used key before inserting a new item. When there is a tie (i.e., two or more keys with the same frequency), the least recently used key is invalidated.

To determine the least frequently used key, a use counter is maintained for each key in the cache. The key with the smallest use counter is the least frequently used key.

When a key is first inserted into the cache, its use counter is set to 1 (due to the put operation). The use counter for a key in the cache is incremented whenever a get or put operation is called on it.

The functions get and put must each run in O(1) average time complexity.

Example 1:

Input
["LFUCache", "put", "put", "get", "put", "get", "get", "put", "get", "get", "get"]
[[2], [1, 1], [2, 2], [1], [3, 3], [2], [3], [4, 4], [1], [3], [4]]

Output
[null, null, null, 1, null, -1, 3, null, -1, 3, 4]

Explanation
// cnt(x) = the use counter for key x
// cache=[] shows the last-used order for tiebreakers (leftmost element is most recent)
LFUCache lfu = new LFUCache(2);
lfu.put(1, 1);  // cache=[1,_], cnt(1)=1
lfu.put(2, 2);  // cache=[2,1], cnt(2)=1, cnt(1)=1
lfu.get(1);     // return 1; cache=[1,2], cnt(2)=1, cnt(1)=2
lfu.put(3, 3);  // 2 is the LFU key because cnt(2)=1 is the smallest; invalidate 2.
                // cache=[3,1], cnt(3)=1, cnt(1)=2
lfu.get(2);     // return -1 (not found)
lfu.get(3);     // return 3; cache=[3,1], cnt(3)=2, cnt(1)=2
lfu.put(4, 4);  // Both 1 and 3 have the same cnt, but 1 is LRU; invalidate 1.
                // cache=[4,3], cnt(4)=1, cnt(3)=2
lfu.get(1);     // return -1 (not found)
lfu.get(3);     // return 3; cache=[3,4], cnt(4)=1, cnt(3)=3
lfu.get(4);     // return 4; cache=[4,3], cnt(4)=2, cnt(3)=3

Constraints:

1 <= capacity <= 10^4
0 <= key <= 10^5
0 <= value <= 10^9
At most 2 * 10^5 calls will be made to get and put.

from collections import deque, defaultdict

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.freq = 1

class LFUCache:

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.minFreq = 0
        self.keyToNode = dict()
        self.freqToList = defaultdict(deque)
        self.freqToKey = defaultdict(set)

    def get(self, key: int) -> int:
        if key not in self.keyToNode:
            return -1
        
        node = self.keyToNode[key]
        self.touch(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if self.capacity == 0:
            return
        
        if key in self.keyToNode:
            node = self.keyToNode[key]
            node.value = value
            self.touch(node)
            return
        
        if len(self.keyToNode) == self.capacity:
            keyToEvict = self.freqToList[self.minFreq].pop()
            self.freqToKey[self.minFreq].remove(keyToEvict)
            del self.keyToNode[keyToEvict]
        
        self.minFreq = 1
        self.freqToList[1].appendleft(key)
        self.freqToKey[1].add(key)
        self.keyToNode[key] = Node(key, value)
        

    def touch(self, node):
        prevFreq = node.freq
        newFreq = node.freq + 1
        # Caveat: deque.remove scans the bucket, so this step is O(n), not O(1).
        self.freqToList[prevFreq].remove(node.key)
        self.freqToKey[prevFreq].remove(node.key)

        if len(self.freqToList[prevFreq]) == 0:
            del self.freqToList[prevFreq]
            del self.freqToKey[prevFreq]
            if prevFreq == self.minFreq:
                self.minFreq += 1

        # defaultdict creates the new bucket on first access, so no
        # explicit initialization of freqToList[newFreq] is needed.
        self.freqToList[newFreq].appendleft(node.key)
        self.freqToKey[newFreq].add(node.key)
        node.freq = newFreq
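One caveat about this version: `deque.remove(node.key)` in `touch` scans the whole bucket, so an operation can take O(n) time rather than the O(1) the problem requires. A hedged sketch of the usual fix: store each frequency bucket as an `OrderedDict` instead of a `deque`, which supports both deletion by key and popping the least-recent end in O(1).

```python
from collections import OrderedDict

# A frequency bucket as an OrderedDict: keys appended on the right are
# the most recently used; the leftmost key is the least recently used.
bucket = OrderedDict()
bucket[10] = None
bucket[20] = None
bucket[30] = None

del bucket[20]                           # O(1) removal by key (deque.remove is O(n))
lru_key, _ = bucket.popitem(last=False)  # O(1) eviction of the least recent key
print(lru_key)  # 10
```

With `OrderedDict` buckets, the separate `freqToKey` sets also become unnecessary, since membership and removal by key are handled by the bucket itself.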

LFU Cache - The problem involves designing a cache that evicts the least frequently used item when it’s full. This is a Cache Design Problem.

Language Agnostic Coding Drills

The problem is to implement an LFU (Least Frequently Used) cache, which requires some knowledge of data structures and algorithms. Here is a breakdown of the code:

  1. Understanding Classes and Objects: In Object-Oriented Programming (OOP), objects are instances of classes. Here, there are two classes - Node and LFUCache. The Node class represents a single node with a key, value, and frequency. The LFUCache class is the actual cache with several methods to manage the nodes (get, put, touch).

  2. Using Built-in Collections: The code uses Python’s built-in collections deque and defaultdict. deque allows efficient appends and pops at both ends, and defaultdict automatically initializes missing keys. (OrderedDict, which remembers insertion order, appears again in the LRU drill below.)

  3. Implementing the LFU Cache: The idea is a least-frequently-used cache system where each operation (get, put) is as efficient as possible. The get method retrieves a value if it exists, and the put method inserts a value. When the cache is at full capacity, the least frequently used item is removed.

  4. Touch Method: This method updates the frequency of a node when it is accessed (through either get or put). If a frequency bucket becomes empty after removing a node, it is deleted, and the minimum frequency is advanced when necessary.

  5. Managing Frequencies: For each frequency, the program maintains a list of keys with that frequency, and also keeps track of the minimum frequency to efficiently evict the least frequently used item when necessary.

As a learning path, you’d want to first get comfortable with classes and objects in OOP, then dive into Python’s collections module. After that, understand the logic behind LFU Cache, its implementation, and how each of its operations work. Finally, review the touch method and the management of frequencies.

To solve this problem, start by understanding what an LFU Cache is and how it works. Then, initialize your LFUCache with the necessary data structures. Implement the get and put methods, making sure to update the nodes’ frequencies and remove the least frequently used item when the cache is full. Finally, make sure your touch method correctly updates a node’s frequency and maintains the frequency lists.

Targeted Drills in Python

1. Understanding Classes and Objects:

Python Drill: Implement a simple Book class, where each book object has a title and author. Implement a method get_info that returns a string of the book’s title and author.

class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author
    
    def get_info(self):
        return f"'{self.title}' by {self.author}"

book1 = Book("1984", "George Orwell")
print(book1.get_info())

2. Using Built-in Collections:

Python Drill: Implement a program using deque to append values to both ends and pop values from both ends.

from collections import deque

d = deque()

d.append(1)
d.appendleft(2)
print(d)  # deque([2, 1])

d.pop()
d.popleft()
print(d)  # deque([])

Python Drill: Use defaultdict to count the frequency of characters in a string.

from collections import defaultdict

def char_frequency(s):
    freq = defaultdict(int)
    for char in s:
        freq[char] += 1
    return freq

print(char_frequency("hello world"))
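OrderedDict, the third collection from the imports above, is worth a quick drill of its own, since the LRU cache drill below leans on it. This sketch exercises the two operations an LRU cache needs: move_to_end to mark a key as most recently used, and popitem(last=False) to evict the least recent.

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

od.move_to_end('a')              # 'a' is now the most recently used key
print(list(od))                  # ['b', 'c', 'a']

oldest_key, oldest_val = od.popitem(last=False)  # evict the least recently used
print(oldest_key)                # 'b'
```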

3. Managing Frequencies

Python Drill: Implement a frequency counter for a list of numbers using dictionary.

def frequency_counter(nums):
    freq = {}
    for num in nums:
        if num in freq:
            freq[num] += 1
        else:
            freq[num] = 1
    return freq

print(frequency_counter([1, 1, 2, 3, 3, 3, 4, 4, 4, 4]))
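For what it's worth, the standard library already packages this pattern as collections.Counter, which behaves like the defaultdict(int) drill earlier but adds helpers such as most_common:

```python
from collections import Counter

freq = Counter([1, 1, 2, 3, 3, 3, 4, 4, 4, 4])
print(freq[4])              # 4
print(freq.most_common(1))  # [(4, 4)]
```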

4. Implementing LFU Cache Operations

Python Drill: Create a class MyCache with methods get and put. get retrieves a value from the cache and put inserts a value into the cache. If the cache is at its limit, evict the least recently used item. (This is the simpler LRU policy, a stepping stone toward LFU.)

from collections import OrderedDict

class MyCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        
    def get(self, key):
        if key in self.cache:
            value = self.cache.pop(key)
            self.cache[key] = value
            return value
        else:
            return -1
        
    def put(self, key, value):
        if key in self.cache:
            self.cache.pop(key)
        elif len(self.cache) == self.capacity:
            self.cache.popitem(last=False)
        self.cache[key] = value

my_cache = MyCache(2)
my_cache.put(1, 1)
my_cache.put(2, 2)
print(my_cache.get(1))  # returns 1
my_cache.put(3, 3)  # evicts key 2
print(my_cache.get(2))  # returns -1 (not found)

The problem-specific drills would be creating a Node class and managing frequencies. The Node class is integral to how the LFU Cache is implemented and managing frequencies is key to the LFU Cache’s operations. These concepts are covered in the LFU Cache operations drill.