Saturday, August 18, 2007

Word ladder in Haskell

Another followup to my earlier post about searching for word ladders, this time using Haskell!

This blog post is a Haskell program, written using the "literate comment" convention.

First a bit of front matter. This implementation of the word-ladder search will use the State and list monads.


> module Main where
> import Control.Monad.State
> import Data.Char
> import Data.List (find)
> import Data.Set (Set, member, difference)
> import qualified Data.Set as Set
> import System.Environment (getArgs)
> import System.Exit

The idea is simple: read the dictionary, search for the desired ladder, and show it to the user:


> main :: IO ()
> main = do
>   (start, goal, dict) <- getArgs >>= parse
>   fullDictionary <- readDictionary dict
>   print $ search start goal (trim fullDictionary start)

Remember that getArgs is an action that returns the list of command-line arguments. We bind this action to the following:


>   where parse [start,goal,dict] = return (start,goal,dict)
>         parse [start,goal]      = return (start,goal,"/usr/dict/words")
>         parse _ =
>           putStrLn "Usage: ladder start goal [ dictionary ]" >>
>           exitWith (ExitFailure 1)

Haskell's pattern matching shows that the program takes two or three arguments. The first two are the start and goal words. The optional third argument is the path to a dictionary (one word per line) to use.

We condition the dictionary by eliminating words whose lengths differ from the length of the start word and also converting everything to lowercase.


>         trim :: [String] -> String -> [String]
>         trim words start = filter (sameLength start) (lc words)
>
>         sameLength start = (== length start) . length
>
>         lc = map (map toLower)

The search can fail, so result is of type Maybe [String]. Handling both cases is straightforward:


>         print Nothing = putStrLn "No ladder found."
>         print (Just a) = mapM_ putStrLn a

The dictionary's format is simple, so reading it is a matter of extracting the lines from the file:


> readDictionary :: FilePath -> IO [String]
> readDictionary path = liftM lines $ readFile path

Now for the fun bits. Imagine a graph where nodes are words from the dictionary and where edges are between words that are "one hop' from each other, i.e., words that could be on consecutive "rungs" of a ladder.

Beginning with the start word, the program performs a breadth-first search of this graph. We call the set of words reached in the most recent iteration the "fringe." When the fringe contains the goal word, we're done.

The state monad simulates destructive update in imperative programming languages. (Haskell is purely functional.) Without it, we'd have to explicitly thread the state value through the call chain, but with it, we retrieve and update the state value with get and put as below:


> search start goal words =
>   evalState (loop [[start]]) (Set.fromList $ filter (/=start) words)
>   where
>     loop :: [[String]] -> State (Set String) (Maybe [String])
>     loop [] = return Nothing
>     loop paths = do
>       next <- step paths
>       let newFringe = fringe next
>       words <- get
>       put $ words `difference` newFringe
>       if goal `member` newFringe
>         then return $ Just (winner next)
>         else loop next

The list monad is handy for representing nondeterministic computations. In concept at least, the search carries around a list of lists that has all of the partial results computed so far.

For example, if the start word is dog, the state value on the second iteration might be [["dog", "dig"], ["dog", "fog"], ["dog", "bog"]]. This approach might seems to be a memory pig, but it remains surprisingly frugal.

To proceed to the next iteration of the search, for each partial result (one ladder beginning with the start word) we find the as-yet unseen neighbors of its last element (a member of the current fringe) and replace the current partial result with new ones for each of the neighbors. Again, consider the partial results at the second iteration in the previous paragraph.


>     step :: [[String]] -> State (Set String) [[String]]
>     step paths = do
>       words <- get
>       return $ paths >>= augment words
>
>     augment :: Set String -> [String] -> [[String]]
>     augment words path = [ path ++ [n] | n <- ns ]
>       where ns = Set.elems $ neighbors (last path) words
>     
>     neighbors :: String -> Set String -> Set String
>     neighbors word words = Set.filter (oneHop word) words
>       where oneHop [] [] = False
>             oneHop (x:xs) (y:ys) | x /= y = xs == ys
>                                  | otherwise = oneHop xs ys

As described above, the fringe is the set of words at the ends of the partial ladders computed so far:

>     fringe :: [[String]] -> Set String
>     fringe paths = Set.fromList (map last paths)

Once we've seen the goal in the fringe, we return the ladder that ends with the goal word:

>     winner :: [[String]] -> [String]
>     winner paths =
>       case (find ((== goal) . last) paths) of
>         Nothing -> undefined
>         Just a -> a

Saturday, August 04, 2007

Word ladder in Python

In an earlier post, I described an implementation in Common Lisp of a breadth-first search to find word ladders.

This time I practiced the kata using Python. Python's list comprehensions help to make the solution concise, but apparently the lunch isn't free. For example, I could have written one_hop as

    def one_hop(a, b):
      return len([aa for aa, bb in zip(a, b) if aa != bb]) == 1

but that resulted in about a twenty percent slowdown.

The code falls out pretty easily:

#! /usr/bin/env python

"""Usage: %(prog)s start-word goal-word [ dictionary-path ]
"""

import sys

prog = sys.argv[0]

def read_words(path):
  words = []

  try:
    f = open(path, "r")
  except IOError, (errno, error):
    sys.stderr.write("%s: open %s: %s\n" % (prog, path, error))
    sys.exit(1)

  for word in f:
    words.append(word[:-1].lower())

  return words

def unpack_args(args):
  dict = "/usr/dict/words"

  if len(args) < 2 or len(args) > 3:
    sys.stderr.write(__doc__ % globals())
    sys.exit(1)

  start, goal = args[0:2]
  if len(args) == 3:
    dict = args[2]

  return (start, goal, dict)

def one_hop(a, b):
#  return len([aa for aa,bb in zip(a, b) if aa != bb]) == 1
  hops = 0
  for aa, bb in zip(a, b):
    if aa != bb:
      hops += 1

  return hops == 1

def rungs(start, goal, begat):
  path = [goal]
  while path[-1] != start:
    path.append(begat[path[-1]])
  path.reverse()

  return path

def ladder(start, goal, dict):
  if len(start) != len(goal):
    return None

  words = read_words(dict)
  candidates = set([w for w in words if len(w) == len(start)])

  start = start.lower()
  goal  = goal.lower()

  begat = {}

  last = [start]
  while len(last) > 0:
    fringe = []

    for w in last:
      neighbors = [n for n in candidates if one_hop(n, w)]

      for n in neighbors:
        begat[n] = w
        candidates.remove(n)

      if goal in neighbors:
        return rungs(start, goal, begat)
      else:
        fringe.extend(neighbors)

    last = fringe
  else:
    return None

def main(args):
  start, goal, dict = unpack_args(args)

  path = ladder(start, goal, dict)
  if path is None:
    print "%s: no path from '%s' to '%s'" % (prog, start, goal)
  else:
    for w, i in zip(path, range(1, len(path) + 1)):
      print "%3d. %s" % (i, w)

  return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))