Thursday, July 09, 2009

Gyrigrams

One way to hide spoilers or off-color comments in plain sight is ROT13. The popular Usenet newsreader trn even has a builtin command to unmask text protected in this fashion.

To understand ROT13, imagine an analog clock face. Instead of the numbers one to twelve, this face has the letters A to Z. To get the secret code for any letter, find the letter on the clock face and advance 13 spots. For example, A becomes N, and X becomes K.

Implementing ROT13 is straightforward with the tr command in Unix:

tr A-Za-z N-ZA-Mn-za-m
The word anagram comes from a Greek word for shuffling letters. What about gyrigrams, pairs of words equivalent up to ROT13? (The Greek word γυρίζω means turn or return, so it indicates rotation and also the cipher's symmetry.)

This post is a literate Haskell program that will find interesting gyrigrams in a dictionary file. Copy-and-paste it into a file named gyrigram.lhs to get a runnable program.

Some front matter:

> module Main where
> import Data.Char (toLower)
> import Data.List (sort)
> import qualified Data.Map as M
> import qualified Data.Set as S
> import System.Environment (getArgs, getProgName)
> import System.Exit (ExitCode(ExitFailure), exitWith)
> import System.IO (hPutStrLn, stderr)
To run the program, either provide the path to your dictionary file as the sole command-line argument, or omit it to use /usr/share/dict/words:
> usage :: IO a
> usage = do
>   me <- getProgName
>   hPutStrLn stderr $ "Usage: " ++ me ++ " [ dictionary ]"
>   exitWith (ExitFailure 1)
The implementation of rot13 below performs a table lookup for all characters in the input. Characters outside the set [A-Za-z] pass through unchanged.
> rot13 :: String -> String
> rot13 = map $ \c -> maybe c id (M.lookup c table)
>   where table = M.fromList $ zip (uc ++ lc) (uc' ++ lc')
>         (uc,  lc)  = (['A'..'Z'], ['a'..'z'])
>         (uc', lc') = (rot uc,     rot lc)
>         rot xs = [drop,take] >>= \f -> f 13 xs
To find all gyrigrams, we stuff the input list, normalizing to lowercase, in a Set for quick lookups. Then for each word in the input, probe for its rot13 counterpart and add hits to the result. Removing matches from the dictionary prevents duplicated values. Note also that we ignore single-letter words.
> gyrigrams :: [String] -> [(String,String)]
> gyrigrams xs = go dict xs
>   where go _ [] = []
>         go d (w:ws)
>           | d `has` w' = (w,w') : go d' ws
>           | otherwise  =          go d  ws
>           where has = flip $ S.member . lc
>                 w' = rot13 w
>                 d' = foldr (S.delete . lc) d [w,w']
>         dict = S.fromList $ map lc $ filter ((>1) . length) xs
>         lc = map toLower
The main program reads the input and prints a sorted list of pairs:
> main :: IO ()
> main =
>   getPath >>= readFile >>= mapM_ (putStrLn . show') .
>                              sort . gyrigrams . lines
>   where show' (a,b) = a ++ " => " ++ b
Argument processing:
> getPath :: IO FilePath
> getPath = getArgs >>= go
>   where go [path] = return path
>         go []     = return "/usr/share/dict/words"
>         go _      = usage
One pair is especially interesting because they're both gyrigrams and synonyms: irk and vex.

2 comments:

Anonymous said...

I like "green" and "Terra" as anagrams. Also "Pres" and "Cerf" -- Vint for president?

Here's my version of the program:

import Data.Char
import Data.List
import qualified Data.Set as S

rot13 :: Char->Char
rot13 c = chr $ xa + ((ord c - xa + 13)`mod`26)
where
xa | isUpper c = ord 'A'
| otherwise = ord 'a'

rot13w :: String -> String
rot13w = map rot13

main = do
f <- readFile "/usr/share/dict/words"
let ws = [map toLower w | w <- words f, length w > 2]
(front, back) = break (>= "n") (sort ws)
anagrams = (S.fromList front
`S.intersection` S.fromList (map rot13w back))
print [(a, rot13w a) | a<-S.toList anagrams]

Anonymous said...

Well that paste didn't come out too well. Here's a pastebin:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=6784