Detect Anagrams

excerpt: This covers the building block, Word Signature tags: word-signature

A word w is an anagram of a word v if a permutation of the letters transforming w into v exists. Given a set of n words of length at most k, detect all possible anagrams.

Input

below the car is a rat drinking cider and bending its elbow while this thing is an arc that can act like a cat which cried during the night caused by pain in its bowel

Output

{bowel below elbow}, {arc car}, {night thing}, {cried cider}, {act cat}

Solution

Compute a signature for each word, so that two words are anagrams of each other if and only if they have the same signature. This signature is simply a new string made up of the same letters sorted in lexicographical order. The data structure used is a dictionary associating with each signature the list of words with this signature.

Implementation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def anagrams(strings)
  # Map string s to list of words with signature t
  # Set default value of all keys to empty array
  map = Hash.new { |h, k| h[k] = [] }
  
  # group words according to the signature
  for word in strings
    # Calculate the signature
    signature = word.chars.sort.join
		# We want to store only unique words for a given signature
		# Otherwise we will have: 'eht' => ['the', 'the']
    if map[signature].include?(word)
      map[signature] = [word]
    else
      map[signature] << word
    end 
  end
	
  # extract anagrams, ingoring anagram groups of size 1
	# We want to ignore entries like this: 'eht' => ['the']
  map.values.select{|w| w.size > 1}
end

strings = %w{below the car is a rat drinking cider and bending its elbow while this thing is an arc that can act like a cat which cried during the night caused by pain in its bowel}

p anagrams(strings)

This prints:

[["below", "elbow", "bowel"], ["car", "arc"], ["cider", "cried"], ["thing", "night"], ["act", "cat"]]

Complexity Analysis

The time complexity is O(n*k logk) on average and O(n2k log k) in the worst case, due to the use of a dictionary.