Tries

excerpt: This covers the implementation of insert operation in a Trie.

The word trie comes from retrieval. Trie is a tree that holds strings. Each internal node represents a single letter. The leaf nodes may represent more than one letter. A path from the root to a leaf node corresponds to a string.

A partial path from the root to an internal node forms a prefix for longer strings, so tries are sometimes called prefix trees. A path that represents a key string has an associated value. The diagram shows a trie that holds the keys and values shown in table.

Consider the path from the root to the node E. The nodes visited correspond to the letters W, A, N and E, so that node represents the key WANE. That key’s value is 29. For another example, consider the path to the node T. The nodes visited correspond to the letters W, A, N and T, so the node represents the key WANT. That key’s value is 36.

The path to the node N forms the string WAN, which is a prefix of both WANE and WANT. The leaf node may represent more than one letter. In this example, the node ISP represents three letters. The path from the root to that node represents the key WISP and has value 72.

A new key to a trie can be added by using the key’s letters to follow the appropriate path through the trie. If the leaf node is reached and the key has still more letters that are not represented by the current path, a new child node can be added to represent the rest of the key.

For example, to add the key WANTED to the trie, follow the path through the nodes W, A, N and T. The key still has the letters ED, so a new node is added to hold them. The diagram shows the new trie.

Sometimes the key is found early when adding a new key to the trie. For example, the trie shown in the diagram already has the nodes needed to represent the key WAN. In that case, just add a value to the appropriate node as shown in the diagram.

Add an Item to Trie

The Trie will have one root. The root will not store any character. It is going to have pointers. The TrieNode has metadata (has links to other trie nodes). Array can store links to trie nodes. The is_end flag is initialized to false.

We can use HashMap as an alternative to using an array. The key will be the character to be stored on the value will be TrieNode. The link attribute will link a character and a reference to a trie node.

Take each letter from the word. For the first character start with root. For subsequent characters operate on the TrieNode you just added to the root.

Trie is also known as the prefix tree. We can efficiently match prefixes. Why trie? It uses less space than hashtable. The outline of the solution for implementing the insert operation in a Trie:

  1. How to create a root?
    • The size of the links is 26 (will hold the characters)
  2. Create empty links and define a end_of_word boolean in the constructor.
  3. For the insert operation: Traverse the characters in the word Pick the first character Check if it is already in the root in the links held by the root If it is not: Create a new trie node instance and set it’s character value to the first letter Outside the loop, mark the end_of_word = true for the last character

The code shows how an item can be added to a trie.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
class TrieNode
  attr_reader :is_end, :children
    
  def initialize    
    @is_end = false
    @children = []
  end

  def set_word
    @is_end = true
  end  
end

class Trie
  attr_reader :root
    
  def initialize
    @root = TrieNode.new  
  end

  def insert(word)      
    node = root
    word.chars.each_with_index do |c, index|        
      children = node.children
      index = word[0].ord - 'a'.ord
      node = TrieNode.new
      if children[index].nil?
        children[index] = node
      end
      if (index == word.size - 1)
        node.set_word
      end
    end
  end
end

Each level has its own children array (possible to have 26 characters). In the first level, we calculate index for the first letter, store a trienode at that level. In the second level: retrieve the children array for this level and calculate the index for the second character and store a trienode. We need to consider two cases:

Case 1

  • Nothing in that index, put the node in that space

Case 2

  • If the space is not available, don’t add any trie node