Thursday, October 22, 2009

While you're wondering

Hey Vols, this weekend while a delightful little tune is being sung in your honor, the more curious among your number may want to know just what a yellowhammer is. See for yourselves:
Image credit: BBC
The yellowhammer has been the state bird of Alabama since 1927. According to the Alabama Department of Archives & History,
Alabama has been known as the “Yellowhammer State” since the Civil War. The yellowhammer nickname was applied to the Confederate soldiers from Alabama when a company of young cavalry soldiers from Huntsville, under the command of Rev. D.C. Kelly, arrived at Hopkinsville, KY, where Gen. Forrest's troops were stationed. The officers and men of the Huntsville company wore fine, new uniforms, whereas the soldiers who had long been on the battlefields were dressed in faded, worn uniforms. On the sleeves, collars and coattails of the new calvary troop were bits of brilliant yellow cloth. As the company rode past Company A, Will Arnett cried out in greeting “Yellowhammer, Yellowhammer, flicker, flicker!” The greeting brought a roar of laughter from the men and from that moment the Huntsville soldiers were spoken of as the “yellowhammer company.” The term quickly spread throughout the Confederate Army and all Alabama troops were referred to unofficially as the “Yellowhammers.”

Saturday, September 19, 2009

Haskell craps

A Haskell neophyte at $WORK talked about writing a craps simulator as a learning exercise. The rules, limiting consideration to pass-line bets, are complex enough to make it an interesting kata. Designing a processor for the game's complex prop bets, on the other hand, might make a good interview discussion.

Front matter:

> module Craps ( games
>              , rolls
>              , runTests
>              , Game
>              , Roll
>              ) where

> import Data.List ((\\))
> import System.Random (randomRs,Random,RandomGen)
> import Test.QuickCheck (choose,forAll,oneof,sized,Arbitrary(..),Gen,Property)
> import Test.QuickCheck.Batch (defOpt,run,TestOptions(..))
> import qualified Test.QuickCheck.Batch as QC
Craps is played with two dice:
> data Roll = Roll Int Int
>   deriving (Show)
At the pass line, the bettor can win two ways and lose two ways. With no point, rolls of 7 or 11 win (“natural”), and rolls of 2, 3, or 12 lose (“craps”). Any other roll becomes the point, and the shooter continues until she rolls the point again (“pass” or “win”) or 7 (“seven out”).
> data Game = Natural Roll
>           | Pass [Roll]
>           | CrapOut Roll
>           | SevenOut [Roll]
>   deriving Show
To generate a lazy list of rolls, pass a random-number generator (created, for example, with newStdGen, and use as many as you need. The second case in the definition of go silences a partial-function warning (“Pattern match(es) are non-exhaustive” with ghc), viz. empty and singleton lists. We'll always have at least two elements because randomRs produces an infinite list of bounded random numbers.
> rolls :: RandomGen g => g -> [Roll]
> rolls g = go $ randomRs (1,6) g
>   where
>     go (a:b:xs) = Roll a b : go xs
>     go _ = undefined
Now that we have as many rolls as we want, let's separate them into games. For the trivial case, if you aren't rolling, you aren't playing:
> games :: [Roll] -> [Game]
> games [] = []
Before the shooter establishes a point, we watch for magic numbers:
> games (r:rs) | any (rolled r) [7,11]   = Natural r : games rs
> games (r:rs) | any (rolled r) [2,3,12] = CrapOut r : games rs
Otherwise, whatever the shooter rolled becomes the point. The game ends when the shooter rolls 7 or makes the point.
> games (pt:rs) = go rest : games rs'
>   where
This inner go is also partial. If the list of rolls is finite, every point must be resolved, either pass or seven out. Note that the roll that ends the round will be the first element of the snd of the pair we get from break, so we use pattern matching to grab it and tack it on the end of the round.
>     go xs@(final:_) = outcome $ reverse xs
>       where outcome | final `rolled` 7 = SevenOut
>                     | otherwise        = Pass
>     go _ = undefined
>     (ensuing,x:rs') = break (\r -> r `rolled` 7 || r `eq` pt) rs
>     rest = x : reverse (pt : ensuing)
rolled is a simple helper for testing whether the shooter rolled a particular number, e.g., r `rolled` 7 as seen above.
> rolled :: Roll -> Int -> Bool
> rolled r = (== total r)
Two rolls are equal if they have the same total (yes, Lispers, I should have spelled it equal):
> eq :: Roll -> Roll -> Bool
> a `eq` b = total a == total b

> total :: Roll -> Int
> total (Roll a b) = a + b
Everything below is for testing with classic QuickCheck. Earlier iterations used this Arbitrary instance, but now it's window dressing.
> instance Arbitrary Roll where
>   arbitrary = do a <- choose (1,6)
>                  b <- choose (1,6)
>                  return $ Roll a b
>   coarbitrary = undefined
vectorOf turns a generator's crank a few times. We'll use this to generate multiple non-point rolls, for example. Note the use of sequence to allow pseudo-random number generator state to update between rolls.
> vectorOf :: Int -> Gen a -> Gen [a]
> vectorOf n gs = sequence [ gs | _ <- [1..n] ]
After the come-out roll establishes a point, the difference between a win and a loss is whether the game's last roll is 7 or the point. If pass is true, we generate a winner, otherwise a loser.
> afterComeOut :: Bool -> Int -> Gen [Roll]
> afterComeOut pass n = do
>   n' <- choose (1,n)
>   pt <- oneof points
>   rs <- vectorOf n' (oneof $ noPoint pt)
>   let rollpt = mkRoll pt
>       final = if pass then rollpt else seven
>   return $ rollpt : rs ++ [final]
>   where
>     noPoint p = mayroll $ except [7,p]
>     points = map return $ except [2,3,7,11,12]
>     seven = Roll 3 4
>     except = ([2..12] \\)
Our testing strategy will be to generate games of all four types and then make sure they're correctly recognized. For example, the test for passes will use expect isPass ...
> expect :: (Game -> Bool) -> [Roll] -> Bool
> expect what = all what . games
mayroll creates a list of generators ultimately for use with oneof, e.g., mayroll [2,3,12] in the crap-out property.
> mayroll :: [Int] -> [Gen Roll]
> mayroll = map (return . mkRoll)
QuickCheck opens the throttle on the size of testcases with sized, and many connects to this hook.
> many :: [Gen Roll] -> Int -> Gen [Roll]
> many what n = do
>   n' <- choose (1,n)
>   vectorOf n' (oneof what)
mkRoll starts from a roll total and backs into the individual components. An obvious improvement would be adding choices other than 1 and 6.
> mkRoll :: Int -> Roll
> mkRoll t = Roll less (t - less)
>   where less | t <= 6    = 1
>              | otherwise = 6
Now we get to the properties that use QuickCheck's forAll to generate random test data of the appropriate class and check for the expected results.
> prop_crapOut :: Property
> prop_crapOut =
>   forAll allCraps $ expect isCrapOut
>   where isCrapOut (CrapOut _) = True
>         isCrapOut _ = False
>         allCraps = sized $ many craps
>         craps = mayroll [2,3,12]

> prop_natural :: Property
> prop_natural =
>   forAll allNats $ expect isNat
>   where isNat (Natural _) = True
>         isNat _ = False
>         allNats = sized $ many nats
>         nats = mayroll [7,11]

> prop_sevenOut :: Property
> prop_sevenOut =
>   forAll allSevenOuts $ expect is7Out
>   where is7Out (SevenOut _) = True
>         is7Out _ = False
>         allSevenOuts = sized $ afterComeOut False

> prop_pass :: Property
> prop_pass =
>   forAll allPasses $ expect isPass
>   where isPass (Pass _) = True
>         isPass _ = False
>         allPasses = sized $ afterComeOut True
Finally, a simple test driver so we don't have to check them one-by-one:
> runTests :: IO ()
> runTests = do
>   let opts = defOpt { no_of_tests = 200 }
>   QC.runTests "crap out"  opts [ run prop_crapOut ]
>   QC.runTests "natural"   opts [ run prop_natural ]
>   QC.runTests "seven out" opts [ run prop_sevenOut ]
>   QC.runTests "pass"      opts [ run prop_pass ]

Wednesday, September 16, 2009

MediaWiki Collection extension: load saved books

At work, we're using the Collection extension for MediaWiki to render PDF versions of our documentation.

Installation went smoothly. We were able to generate nice-looking PDFs for single pages, create “books” with chapters and subsections, and save the books. After we'd gone to all the trouble of organizing the books, the reader might understand our frustration in seeing no obvious way to load our books later.

After much Googling and prodding around the source, I noticed a passage at the bottom of the README:

The Wikipedia template has lots of nice chrome, but for a quick fix, edit Template:Saved_book to have the following contents:

<div align="center">
<span class="plainlinks">
[ [{{fullurl:Special:Book/load_collection/|colltitle={{FULLPAGENAMEE}}}} load book] ] &nbsp;&nbsp;
[ [{{fullurl:Special:Book/render_collection/|colltitle={{FULLPAGENAMEE}}&amp;writer=rl}} PDF] ] &nbsp;&nbsp;
[ [{{fullurl:Special:Book/render_collection/|colltitle={{FULLPAGENAMEE}}&amp;writer=odf}} OpenOffice] ] &nbsp;&nbsp;
[ [[:Category:Books|bookshelf]] &nbsp;]
</span>
</div>

This will be transcluded into your saved book pages (via {{saved_book}} at the very top) and produce handy links for loading, generating PDF, generating ODT, or browsing the rest of your saved books, as in

load  book ] [ PDF ] [ OpenOffice ] [ bookshelf ]

Note that to enable ODT option, you'll need to modify LocalSettings.php along the lines of

$wgCollectionFormats = array(
  'rl' => 'PDF',
  'odf' => 'ODT',
);

Tuesday, September 15, 2009

Don't repeat yourself!

Jose Rey demonstrates a few features of Perl 5.10, but all the nearly identical actions scream for smart matching!

# ...

my %func;
@func{qw( count   geometric_mean  harmonic_mean
          max     maxdex          mean
          median  min             mindex
          mode    sample_range    standard_deviation
          sum     trimmed_mean    variance           )} = ();

my $s = Statistics::Descriptive::Full->new();
while (1) {
    print "Listo> ";
    my $command = readline(STDIN) // last;
    $command =~ s/^\s+//; $command =~ s/\s+$//;
    given ($command) {
        when ( looks_like_number($_) ) { $s->add_data($command) }
        when (%func)                   { say "$command = " . $s->$command() }
        when (/^(exit|quit)$/)         {last}
        default                        { say SYNTAX_ERROR }
    }
}

As the smart-match table shows, $scalar ~~ %hash tests for hash-key existence. In this case, given ($command) followed by when (%func) checks whether the current command is a builtin and, when it is, invokes the method with the same name.

Monday, August 31, 2009

Finding duplicates with Perl and Haskell

A coworker wanted to check a family of log files to be sure that a given task never appeared on multiple nodes at the same time. Log entries are on single, whitespace-separated lines, and the last field records a task's start time, e.g.,
1251475056672590000_1732248586_4
Of the three underscore-separated fields, the first is a timestamp, the second we don't care about, and the third is a task identifier.

This task is straightforward with Perl. The diamond operator (or null filehandle, as described in the "I/O Operators" section of the perlop manpage) takes care of the boilerplate for iterating over the paths on the command line, opening them, and reading each line. The scalar $ARGV contains the name of the current file.

By default, split separates fields by whitespace, so (split)[-1] gives us the last field, from which we then grab the time and task with a regular expression and record its presence by pushing the entry's path and line number onto an array associated with that time/task pair. After we've processed the logs, these arrays should all be singletons.

The continue clause is a little weird but necessary because the special variable $., the current line number, does not reset on <>'s implicit opens. ARGV is a handle on the file being read.

With this data structure, detecting duplicates is a search for time/task pairs with multiple hits. We count duplicates and let the user know what we found.

#! /usr/bin/perl

use warnings;
use strict;

# e.g., $hits = @{ $seen{$time}{$task} };
my %seen;

sub num { $a <=> $b }

while (<>) {
  if ((split)[-1] =~ /^(\d+)_\d+_(\d+)$/) {
    my($time,$task) = ($1,$2);
    push @{ $seen{$time}{$task} } => "$ARGV:$.";
  }
  else {
    die "$0: $ARGV:$.: bad timestamp/task field\n";
  }
}
continue {
  close ARGV if eof;
}

my $duplicates = 0;
foreach my $time (sort num keys %seen) {
  foreach my $task (sort num keys %{ $seen{$time} }) {
    my @hits = @{ $seen{$time}{$task} };
    next if @hits == 1;

    $duplicates += @hits - 1;
    warn "$0: duplicates for time=$time, task=$task:\n",
         map "    - $_\n", @hits;
  }
}

my $s = $duplicates == 1 ? "" : "s";
print "$0: $duplicates duplicate$s detected.\n";

exit $duplicates == 0 ? 0 : 1;

For comparison, I implemented the same log checker in Haskell. The function allInputs emulates Perl's diamond operator, and instead of a multi-level hash, the association is more direct: time/task pair to a list of hits.

module Main where

import Control.Monad (liftM)
import Data.List (sort)
import Data.Map (empty,filter,fromListWith,toList,unionWith)
import Prelude hiding (filter)
import System.Environment (getArgs,getProgName)
import System.Exit (ExitCode(..),exitWith)
import Text.Printf (printf)

type Time = String
type Task = String
data Duplicates =
  Duplicates { timestamp :: Time
             , taskId    :: Task
             , locations :: [(FilePath, Int)]
             }

main :: IO ()
main = do
  logs <- allInputs
  let multi = dups logs
      n = sum $ map (subtract 1 . length . locations) multi
  mapM_ (msg . lines . dupmsg) multi
  msg $ ndups n
  exitWith $ if n == 0
               then ExitSuccess
               else ExitFailure 1
  where
    msg info = do me <- getProgName
                  putStrLn $ me ++ ": " ++ head info
                  mapM_ putStrLn (tail info)

    ndups 1 = ["1 duplicate detected"]
    ndups n = [show n ++ " duplicates detected"]

    dupmsg (Duplicates tm task ls) = unlines $
      printf "duplicates for time=%s, task=%s:" tm task :
      map (\(path,n) -> printf "    - %s:%d" path n) ls

allInputs :: IO [(FilePath, String)]
allInputs = getArgs >>= go
  where go [] = ((:[]) . (,) "-"`liftM` getContents
        go fs = mapM readFile fs >>= return . zip fs

dups :: [(FilePath, String)] -> [Duplicates]
dups = map (\((tm,task),ds) -> Duplicates tm task ds) .
       sort .
       toList .
       filter ((> 1. length) .
       foldl (unionWith (++)) empty .
       map (\(path, contents) ->
              fromListWith (++$
              map (wrap path . getTimeTask) $
              zip [1..$ lines contents)
  where
    wrap path (tm,task,n) = ((tm,task), [(path,n)])

getTimeTask :: (Int,String) -> (Time,Task,Int)
getTimeTask (n,line) = (tm,tsk,n)
  where
    [tm,_,tsk] = splitBy '_' (last $ words line)

    splitBy :: Eq a => a -> [a] -> [[a]]
    splitBy _ [] = []
    splitBy x xs = h : splitBy x t
      where (h,rest) = break (== x) xs
            t = drop 1 rest

Thursday, August 27, 2009

Parker Griffith radio interview

U.S. Representative Parker Griffith, whom I've written about before in this space, was recently on WBHP.

I like his positions on the stimulus, bailouts, cap & tax, and healthcare: all no!

I was happy to hear him speak out against the AMA, which many doctors can't do out of fear of losing their licenses. The AMA cartel is a big source of what's wrong with healthcare in America.

He was also correct that the so-called reform proposals are really "overreach" and a "power grab," nothing to do with improving health care. In a 1961 speech, Ronald Reagan warned, "One of the traditional methods of imposing statism or socialism on a people has been by way of medicine."

So far so good. But then he got to his solution.

I understand the poor guy was sleep deprived, but he fell back into the usual feel-good blah-blah-blah about making health insurance companies "play by the same rules." In real life, this will mean more federal oversight, higher prices, less competition, and therefore less consumer satisfaction.

As a physician, he ought to understand the folly of treating the symptom. The real problem is adults still believe in Santa Claus but use the code name "health insurance." Imagine if State Farm were expected to pony up every time you took your car in for an oil change or other routine maintenance. Health insurance is no such thing: it's really a sort of pre-paid entitlement program, but entirely perverse. For example, legislation requires everyone to pay the same premium in employer-sponsored group insurance. This means healthy people are overcharged, and the couch potato gets a free ride. The system subsidizes and therefore encourages poor health!

In the market, sellers compete against sellers, but buyers also compete against buyers. Throwing deep-pocketed "insurers" into the mix inevitably drives prices beyond affordable levels, especially given the way "insurance" destroys patient price-sensitivity.

I like Peter Schiff's proposal much better. Health "insurers" exist only because of aberration in the tax code. This special tax treatment is why open-market consumers can't purchase their products for reasonable prices. Schiff proposes ending this subsidy but simultaneously raising the personal exemption so as to make the change a wash for taxpayers. The true costs of health "insurance" would then be laid bare. The much cheaper option of major-medical policies would become more attractive. Insurers would have to compete with each other on price, terms, service, and so on. We'd no longer be chained to our employer-sponsored plans.

This would create enormous downward pressure on healthcare prices because people would pay for oil changes.. err, routine doctor visits out of pocket from money that they gasp saved in anticipation of such expenses. Of course when the out-of-pocket limit hits, their bona fide insurance would kick in and cover the rest. Economic arbitrage doing its work means that such a system would also benefit those who opt not to purchase insurance policies.

The jingoistic breast-beating about money versus commitment was poor cover for the fact that Uncle Sam is drowning in debt and flat-out broke. The feds can barely afford to pay their bills now. How will they afford a new program that will dwarf Social Security and Medicare?

Monday, August 24, 2009

Press hit

A good friend of mine rated a mention in a Wall Street Journal article about blatant bias in Heisman voting:
"It seems like it's always something with us," says Patrick Bobo, a Tennessee fan who contributes to a site called Third Saturday in Blogtober. "A lot of Tennessee fans say they don't want the stupid award because it's a joke."
(Back in college, he wore FSU gear but then started pulling for the Vols when they did so well under Peyton Manning, thus earning plenty of good-natured ribbing about bandwagon-hopping.)

As Darren Everson notes in "The South's Heisman Trophy Grudge," no Crimson Tide player has ever won the Heisman even though the program is one of the all-time elites in college football. Bobo pulls for the wrong team but is right on about what a joke Heisman voting is.

The name of the blog is a nod to the third Saturday in October, the annual renewal of the big Alabama-Tennessee rivalry—which happens to take place on the fourth Saturday in October of this year, but whatever. Bloggers from both sides fuel the fire with some good ol' trash-talk.

Even people outside the South know of the Alabama-Auburn rivalry, but the Vols run a very close second. A few years ago, for example, I said I'd rather beat UT than the Tigers if I had to choose. Maybe it has something to do with living in north Alabama and having to endure a higher rate of Tennessee puke-orange sightings. The inset is Daniel Moore's "Rocky Stop," which depicts Roman Harper's (#41) fumble-causing, game-saving hit in an ugly 6-3 matchup.

Congratulations, but I'm looking forward to enjoying this year's victory cigar, Bobo!