How to Write a Spelling Corrector in Ruby

Peter Norvig wrote a simple spelling corrector in 20 lines of Python 2.5,
so I thought I’d see what it looks like in Ruby. Here are some areas I’m not pleased with:

  1. List comprehensions in Python made the edits1 function more elegant IMO.
  2. The boolean expression in the correct function evaluates empty sets/arrays as false in Python but not in Ruby, so I had to add the “result.empty? ? nil : result” expression to several functions. I expect there’s a better way to handle this also.

Otherwise, the translation was pretty straightforward.

Here’s a link to Norvig’s page:
http://www.norvig.com/spell-correct.html

That page includes a link to a text file that I saved locally as
holmes.txt: http://www.norvig.com/holmes.txt

def words text
  text.downcase.scan(/[a-z]+/)
end

def train features
  model = Hash.new(1)
  features.each {|f| model[f] += 1 }
  return model
end

NWORDS = train(words(File.new('holmes.txt').read))
LETTERS = ("a".."z").to_a.join

def edits1 word
  n = word.length
  deletion = (0...n).collect {|i| word[0...i]+word[i+1..-1] }
  transposition = (0...n-1).collect {|i| word[0...i]+word[i+1,1]+word[i,1]+word[i+2..-1] }
  alteration = []
  n.times {|i| LETTERS.each_byte {|l| alteration << word[0...i]+l.chr+word[i+1..-1] } }
  insertion = []
  (n+1).times {|i| LETTERS.each_byte {|l| insertion << word[0...i]+l.chr+word[i..-1] } }
  result = deletion + transposition + alteration + insertion
  result.empty? ? nil : result
end

def known_edits2 word
  result = []
  edits1(word).each {|e1| edits1(e1).each {|e2| result << e2 if NWORDS.has_key?(e2) }}
  result.empty? ? nil : result
end

def known words
  result = words.find_all {|w| NWORDS.has_key?(w) }
  result.empty? ? nil : result
end

def correct word
  (known([word]) or known(edits1(word)) or known_edits2(word) or
    [word]).max {|a,b| NWORDS[a] <=> NWORDS[b] }
end

After you’ve saved the holmes.txt file, load the code into irb and call the correct function with a string as follows:

badkins:~/sync/code/ruby$ irb
irb(main):001:0> require 'spelling_corrector.rb'
=> true
irb(main):002:0> correct "whree"
=> "where"

Tags: ,

  1. Antonio Cangiano’s avatar

    Brian, is it okay with you if I include your code in the Ruby Benchmark Suite? It’s an open source project that’s released under the MIT license. Your file would therefore be released as MIT as well, carry a copyright line with your name, and a link back to this post.

  2. Brian Adkins’s avatar

    @Antonio – sure, as long as the copyright is retained.

  3. Ray Pereda’s avatar

    I know both Python and Ruby, and IMHO your Ruby edition is much more readable. As pretty as a typical elementary number theory theorem and proof.

  4. McGee’s avatar

    for the alteration (and insertion) you could also use this syntax, for consistency:

    alteration = (0…n).collect { |i| (0…26).collect { |l| word[0...i] + LETTERS[l].chr+word[i+1..-1] } }.flatten

  5. Mark Wilden’s avatar

    I think this is a bit more Rubyish:

    !result.empty? && result

  6. Michael Dvorkin’s avatar

    result.any? && result

    Also, result = words.find_all {|w| NWORDS.has_key?(w) } can be rewritten as:

    result = words & HWORDS.keys

    which is faster and removes duplicates.

  7. alejandrodumas’s avatar

    thanks for the ruby version, remember that in the
    train method you don’t need to put return model as in
    other languages.