Tuesday, November 24, 2009

GMT crontab

Dealing with time is a problem domain where everything seems like it ought to be dead-simple, but getting all the fiddly details correct is never trivial.

Below is a sketch at converting simple crontabs whose times are expressed in GMT to the host's local time. This blog post wishes it were a literate Haskell program.

In general, if you care about timezones, represent times internally in some universal format and convert times for display purposes only.

Front matter:

#! /usr/bin/perl

use warnings;
use strict;

use feature qw/ switch /;

use Time::Local qw/ timegm /;

Given a five-field job time in GMT, gmtoday returns the hour in the local timezone and the day offset. The function's name comes from its implementation, nearly always a terrible practice. It uses the time the program started ($^T), decomposes it with gmtime, substitutes the hour from cron, and goes the other direction with timegm.

Now that I think about it, this probably doesn't handle the day-of-week wraparound: Sunday is 0 and Saturday is 6, but the days are adjacent.

sub gmtoday {
  my($gmmin,$gmhr,$gmmday,$gmmon,$gmwday) = @_;

  my @gmtime = gmtime $^T;
  my(undef,undef,$hour,$mday,$mon,$year,$wday) = @gmtime;

  my @args = (
    0,  # sec
    $gmmin eq "*" ? "0" : $gmmin,
    $gmhr,
    $mday,                        
    $mon,
    $year,
  );

  my($lhour,$lwday) = (localtime timegm @args)[2,6];

  ($lhour, $lwday - $wday);
}

Given the five-field time specification from the current cronjob, localcron converts it from GMT to local time. Note that a fully general implementation would support 32 (i.e., 2 ** 5) cases.

This is a nice use of given-when, new in perl-5.10, and resembles a familiar shell idiom.

sub localcron {
  my($gmmin,$gmhr,$gmmday,$gmmon,$gmwday) = @_;

  given ("$gmmin,$gmhr,$gmmday,$gmmon,$gmwday") {
    # trivial case: no adjustment necessary
    when (/^\d+,\*,\*,\*,\*$/) {
      return ($gmmin,$gmhr,$gmmday,$gmmon,$gmwday);
    }

    # hour and maybe minute
    when (/^(\d+|\*),\d+,\*,\*,\*$/) {
      my($lhour) = gmtoday @_;
      return ($gmmin,$lhour,$gmmday,$gmmon,$gmwday);
    }

    # day of week, hour, and maybe minute
    when (/^(\d+|\*),\d+,\*,\*,\d+$/) {
      my($lhour,$wdoff) = gmtoday @_;
      return ($gmmin,$lhour,$gmmday,$gmmon,$gmwday+$wdoff);
    }

    default {
      warn "$0: unhandled case: $gmmin $gmhr $gmmday $gmmon $gmwday";
      return;
    }
  }
}

Finally, the main loop reads each line from the input and generates the appropriate output. Note that we do not throw away unhandled times: they instead appear in the output as comments.

while (<>) {
  if (/^\s*(?:#.*)?$/) {
    print;
    next;
  }

  chomp;
  my @gmcron = split " ", $_, 6;

  my $cmd = pop @gmcron;
  my @localcron = localcron @gmcron;

  if (@localcron) {
    print join(" " => @localcron), "\t", $cmd, "\n"
  }
  else {
    print "# ", $_, "\n";
  }
}

For this sorta-crontab

33  * * * * minute only
 0  0 * * * minute and hour
 0 10 * * 1 minute, hour, and wday (same day)
 0  2 * * 1 minute, hour, and wday (cross day)
the output is the following when run in the US Central timezone:
33 * * * *  minute only
0 18 * * *  minute and hour
0 4 * * 1   minute, hour, and wday (same day)
0 20 * * 0  minute, hour, and wday (cross day)

Sunday, November 22, 2009

Sweet potato casserole recipe

If you have any other recipes for this dish, throw them out. This is the last you'll ever need.
For everyone who's tired of having to set a good example for the kids and wait until after dinner to eat dessert, add this recipe to your Thanksgiving feast.

Ingredients:

  • 3 cups cooked potatoes (about 4 medium potatoes)
  • 2 eggs
  • ¼ teaspoon salt
  • 1 teaspoon vanilla
  • ½ cup milk
  • ¼ cup butter
  • 1 cup sugar

Directions

Blend all ingredients with mixer and pour into casserole dish. Combine dry topping ingredients and then add melted butter. Mix until crumbly and sprinkle over potato mixture. Bake at 350° for 30 minutes.

Topping:

  • 1 cup brown sugar
  • ¼ cup flour (all-purpose)
  • ¼ cup butter
  • 1 cup chopped pecans

From Hazel Green Elementary School cookbook with modifications by Samantha Bacon.

Thursday, October 22, 2009

While you're wondering

Hey Vols, this weekend while a delightful little tune is being sung in your honor, the more curious among your number may want to know just what a yellowhammer is. See for yourselves:
Image credit: BBC
The yellowhammer has been the state bird of Alabama since 1927. According to the Alabama Department of Archives & History,
Alabama has been known as the “Yellowhammer State” since the Civil War. The yellowhammer nickname was applied to the Confederate soldiers from Alabama when a company of young cavalry soldiers from Huntsville, under the command of Rev. D.C. Kelly, arrived at Hopkinsville, KY, where Gen. Forrest's troops were stationed. The officers and men of the Huntsville company wore fine, new uniforms, whereas the soldiers who had long been on the battlefields were dressed in faded, worn uniforms. On the sleeves, collars and coattails of the new calvary troop were bits of brilliant yellow cloth. As the company rode past Company A, Will Arnett cried out in greeting “Yellowhammer, Yellowhammer, flicker, flicker!” The greeting brought a roar of laughter from the men and from that moment the Huntsville soldiers were spoken of as the “yellowhammer company.” The term quickly spread throughout the Confederate Army and all Alabama troops were referred to unofficially as the “Yellowhammers.”

Saturday, September 19, 2009

Haskell craps

A Haskell neophyte at $WORK talked about writing a craps simulator as a learning exercise. The rules, limiting consideration to pass-line bets, are complex enough to make it an interesting kata. Designing a processor for the game's complex prop bets, on the other hand, might make a good interview discussion.

Front matter:

> module Craps ( games
>              , rolls
>              , runTests
>              , Game
>              , Roll
>              ) where

> import Data.List ((\\))
> import System.Random (randomRs,Random,RandomGen)
> import Test.QuickCheck (choose,forAll,oneof,sized,Arbitrary(..),Gen,Property)
> import Test.QuickCheck.Batch (defOpt,run,TestOptions(..))
> import qualified Test.QuickCheck.Batch as QC
Craps is played with two dice:
> data Roll = Roll Int Int
>   deriving (Show)
At the pass line, the bettor can win two ways and lose two ways. With no point, rolls of 7 or 11 win (“natural”), and rolls of 2, 3, or 12 lose (“craps”). Any other roll becomes the point, and the shooter continues until she rolls the point again (“pass” or “win”) or 7 (“seven out”).
> data Game = Natural Roll
>           | Pass [Roll]
>           | CrapOut Roll
>           | SevenOut [Roll]
>   deriving Show
To generate a lazy list of rolls, pass a random-number generator (created, for example, with newStdGen, and use as many as you need. The second case in the definition of go silences a partial-function warning (“Pattern match(es) are non-exhaustive” with ghc), viz. empty and singleton lists. We'll always have at least two elements because randomRs produces an infinite list of bounded random numbers.
> rolls :: RandomGen g => g -> [Roll]
> rolls g = go $ randomRs (1,6) g
>   where
>     go (a:b:xs) = Roll a b : go xs
>     go _ = undefined
Now that we have as many rolls as we want, let's separate them into games. For the trivial case, if you aren't rolling, you aren't playing:
> games :: [Roll] -> [Game]
> games [] = []
Before the shooter establishes a point, we watch for magic numbers:
> games (r:rs) | any (rolled r) [7,11]   = Natural r : games rs
> games (r:rs) | any (rolled r) [2,3,12] = CrapOut r : games rs
Otherwise, whatever the shooter rolled becomes the point. The game ends when the shooter rolls 7 or makes the point.
> games (pt:rs) = go rest : games rs'
>   where
This inner go is also partial. If the list of rolls is finite, every point must be resolved, either pass or seven out. Note that the roll that ends the round will be the first element of the snd of the pair we get from break, so we use pattern matching to grab it and tack it on the end of the round.
>     go xs@(final:_) = outcome $ reverse xs
>       where outcome | final `rolled` 7 = SevenOut
>                     | otherwise        = Pass
>     go _ = undefined
>     (ensuing,x:rs') = break (\r -> r `rolled` 7 || r `eq` pt) rs
>     rest = x : reverse (pt : ensuing)
rolled is a simple helper for testing whether the shooter rolled a particular number, e.g., r `rolled` 7 as seen above.
> rolled :: Roll -> Int -> Bool
> rolled r = (== total r)
Two rolls are equal if they have the same total (yes, Lispers, I should have spelled it equal):
> eq :: Roll -> Roll -> Bool
> a `eq` b = total a == total b

> total :: Roll -> Int
> total (Roll a b) = a + b
Everything below is for testing with classic QuickCheck. Earlier iterations used this Arbitrary instance, but now it's window dressing.
> instance Arbitrary Roll where
>   arbitrary = do a <- choose (1,6)
>                  b <- choose (1,6)
>                  return $ Roll a b
>   coarbitrary = undefined
vectorOf turns a generator's crank a few times. We'll use this to generate multiple non-point rolls, for example. Note the use of sequence to allow pseudo-random number generator state to update between rolls.
> vectorOf :: Int -> Gen a -> Gen [a]
> vectorOf n gs = sequence [ gs | _ <- [1..n] ]
After the come-out roll establishes a point, the difference between a win and a loss is whether the game's last roll is 7 or the point. If pass is true, we generate a winner, otherwise a loser.
> afterComeOut :: Bool -> Int -> Gen [Roll]
> afterComeOut pass n = do
>   n' <- choose (1,n)
>   pt <- oneof points
>   rs <- vectorOf n' (oneof $ noPoint pt)
>   let rollpt = mkRoll pt
>       final = if pass then rollpt else seven
>   return $ rollpt : rs ++ [final]
>   where
>     noPoint p = mayroll $ except [7,p]
>     points = map return $ except [2,3,7,11,12]
>     seven = Roll 3 4
>     except = ([2..12] \\)
Our testing strategy will be to generate games of all four types and then make sure they're correctly recognized. For example, the test for passes will use expect isPass ...
> expect :: (Game -> Bool) -> [Roll] -> Bool
> expect what = all what . games
mayroll creates a list of generators ultimately for use with oneof, e.g., mayroll [2,3,12] in the crap-out property.
> mayroll :: [Int] -> [Gen Roll]
> mayroll = map (return . mkRoll)
QuickCheck opens the throttle on the size of testcases with sized, and many connects to this hook.
> many :: [Gen Roll] -> Int -> Gen [Roll]
> many what n = do
>   n' <- choose (1,n)
>   vectorOf n' (oneof what)
mkRoll starts from a roll total and backs into the individual components. An obvious improvement would be adding choices other than 1 and 6.
> mkRoll :: Int -> Roll
> mkRoll t = Roll less (t - less)
>   where less | t <= 6    = 1
>              | otherwise = 6
Now we get to the properties that use QuickCheck's forAll to generate random test data of the appropriate class and check for the expected results.
> prop_crapOut :: Property
> prop_crapOut =
>   forAll allCraps $ expect isCrapOut
>   where isCrapOut (CrapOut _) = True
>         isCrapOut _ = False
>         allCraps = sized $ many craps
>         craps = mayroll [2,3,12]

> prop_natural :: Property
> prop_natural =
>   forAll allNats $ expect isNat
>   where isNat (Natural _) = True
>         isNat _ = False
>         allNats = sized $ many nats
>         nats = mayroll [7,11]

> prop_sevenOut :: Property
> prop_sevenOut =
>   forAll allSevenOuts $ expect is7Out
>   where is7Out (SevenOut _) = True
>         is7Out _ = False
>         allSevenOuts = sized $ afterComeOut False

> prop_pass :: Property
> prop_pass =
>   forAll allPasses $ expect isPass
>   where isPass (Pass _) = True
>         isPass _ = False
>         allPasses = sized $ afterComeOut True
Finally, a simple test driver so we don't have to check them one-by-one:
> runTests :: IO ()
> runTests = do
>   let opts = defOpt { no_of_tests = 200 }
>   QC.runTests "crap out"  opts [ run prop_crapOut ]
>   QC.runTests "natural"   opts [ run prop_natural ]
>   QC.runTests "seven out" opts [ run prop_sevenOut ]
>   QC.runTests "pass"      opts [ run prop_pass ]

Wednesday, September 16, 2009

MediaWiki Collection extension: load saved books

At work, we're using the Collection extension for MediaWiki to render PDF versions of our documentation.

Installation went smoothly. We were able to generate nice-looking PDFs for single pages, create “books” with chapters and subsections, and save the books. After we'd gone to all the trouble of organizing the books, the reader might understand our frustration in seeing no obvious way to load our books later.

After much Googling and prodding around the source, I noticed a passage at the bottom of the README:

The Wikipedia template has lots of nice chrome, but for a quick fix, edit Template:Saved_book to have the following contents:

<div align="center">
<span class="plainlinks">
[ [{{fullurl:Special:Book/load_collection/|colltitle={{FULLPAGENAMEE}}}} load book] ] &nbsp;&nbsp;
[ [{{fullurl:Special:Book/render_collection/|colltitle={{FULLPAGENAMEE}}&amp;writer=rl}} PDF] ] &nbsp;&nbsp;
[ [{{fullurl:Special:Book/render_collection/|colltitle={{FULLPAGENAMEE}}&amp;writer=odf}} OpenOffice] ] &nbsp;&nbsp;
[ [[:Category:Books|bookshelf]] &nbsp;]
</span>
</div>

This will be transcluded into your saved book pages (via {{saved_book}} at the very top) and produce handy links for loading, generating PDF, generating ODT, or browsing the rest of your saved books, as in

load  book ] [ PDF ] [ OpenOffice ] [ bookshelf ]

Note that to enable ODT option, you'll need to modify LocalSettings.php along the lines of

$wgCollectionFormats = array(
  'rl' => 'PDF',
  'odf' => 'ODT',
);

Tuesday, September 15, 2009

Don't repeat yourself!

Jose Rey demonstrates a few features of Perl 5.10, but all the nearly identical actions scream for smart matching!

# ...

my %func;
@func{qw( count   geometric_mean  harmonic_mean
          max     maxdex          mean
          median  min             mindex
          mode    sample_range    standard_deviation
          sum     trimmed_mean    variance           )} = ();

my $s = Statistics::Descriptive::Full->new();
while (1) {
    print "Listo> ";
    my $command = readline(STDIN) // last;
    $command =~ s/^\s+//; $command =~ s/\s+$//;
    given ($command) {
        when ( looks_like_number($_) ) { $s->add_data($command) }
        when (%func)                   { say "$command = " . $s->$command() }
        when (/^(exit|quit)$/)         {last}
        default                        { say SYNTAX_ERROR }
    }
}

As the smart-match table shows, $scalar ~~ %hash tests for hash-key existence. In this case, given ($command) followed by when (%func) checks whether the current command is a builtin and, when it is, invokes the method with the same name.

Monday, August 31, 2009

Finding duplicates with Perl and Haskell

A coworker wanted to check a family of log files to be sure that a given task never appeared on multiple nodes at the same time. Log entries are on single, whitespace-separated lines, and the last field records a task's start time, e.g.,
1251475056672590000_1732248586_4
Of the three underscore-separated fields, the first is a timestamp, the second we don't care about, and the third is a task identifier.

This task is straightforward with Perl. The diamond operator (or null filehandle, as described in the "I/O Operators" section of the perlop manpage) takes care of the boilerplate for iterating over the paths on the command line, opening them, and reading each line. The scalar $ARGV contains the name of the current file.

By default, split separates fields by whitespace, so (split)[-1] gives us the last field, from which we then grab the time and task with a regular expression and record its presence by pushing the entry's path and line number onto an array associated with that time/task pair. After we've processed the logs, these arrays should all be singletons.

The continue clause is a little weird but necessary because the special variable $., the current line number, does not reset on <>'s implicit opens. ARGV is a handle on the file being read.

With this data structure, detecting duplicates is a search for time/task pairs with multiple hits. We count duplicates and let the user know what we found.

#! /usr/bin/perl

use warnings;
use strict;

# e.g., $hits = @{ $seen{$time}{$task} };
my %seen;

sub num { $a <=> $b }

while (<>) {
  if ((split)[-1] =~ /^(\d+)_\d+_(\d+)$/) {
    my($time,$task) = ($1,$2);
    push @{ $seen{$time}{$task} } => "$ARGV:$.";
  }
  else {
    die "$0: $ARGV:$.: bad timestamp/task field\n";
  }
}
continue {
  close ARGV if eof;
}

my $duplicates = 0;
foreach my $time (sort num keys %seen) {
  foreach my $task (sort num keys %{ $seen{$time} }) {
    my @hits = @{ $seen{$time}{$task} };
    next if @hits == 1;

    $duplicates += @hits - 1;
    warn "$0: duplicates for time=$time, task=$task:\n",
         map "    - $_\n", @hits;
  }
}

my $s = $duplicates == 1 ? "" : "s";
print "$0: $duplicates duplicate$s detected.\n";

exit $duplicates == 0 ? 0 : 1;

For comparison, I implemented the same log checker in Haskell. The function allInputs emulates Perl's diamond operator, and instead of a multi-level hash, the association is more direct: time/task pair to a list of hits.

module Main where

import Control.Monad (liftM)
import Data.List (sort)
import Data.Map (empty,filter,fromListWith,toList,unionWith)
import Prelude hiding (filter)
import System.Environment (getArgs,getProgName)
import System.Exit (ExitCode(..),exitWith)
import Text.Printf (printf)

type Time = String
type Task = String
data Duplicates =
  Duplicates { timestamp :: Time
             , taskId    :: Task
             , locations :: [(FilePath, Int)]
             }

main :: IO ()
main = do
  logs <- allInputs
  let multi = dups logs
      n = sum $ map (subtract 1 . length . locations) multi
  mapM_ (msg . lines . dupmsg) multi
  msg $ ndups n
  exitWith $ if n == 0
               then ExitSuccess
               else ExitFailure 1
  where
    msg info = do me <- getProgName
                  putStrLn $ me ++ ": " ++ head info
                  mapM_ putStrLn (tail info)

    ndups 1 = ["1 duplicate detected"]
    ndups n = [show n ++ " duplicates detected"]

    dupmsg (Duplicates tm task ls) = unlines $
      printf "duplicates for time=%s, task=%s:" tm task :
      map (\(path,n) -> printf "    - %s:%d" path n) ls

allInputs :: IO [(FilePath, String)]
allInputs = getArgs >>= go
  where go [] = ((:[]) . (,) "-"`liftM` getContents
        go fs = mapM readFile fs >>= return . zip fs

dups :: [(FilePath, String)] -> [Duplicates]
dups = map (\((tm,task),ds) -> Duplicates tm task ds) .
       sort .
       toList .
       filter ((> 1. length) .
       foldl (unionWith (++)) empty .
       map (\(path, contents) ->
              fromListWith (++$
              map (wrap path . getTimeTask) $
              zip [1..$ lines contents)
  where
    wrap path (tm,task,n) = ((tm,task), [(path,n)])

getTimeTask :: (Int,String) -> (Time,Task,Int)
getTimeTask (n,line) = (tm,tsk,n)
  where
    [tm,_,tsk] = splitBy '_' (last $ words line)

    splitBy :: Eq a => a -> [a] -> [[a]]
    splitBy _ [] = []
    splitBy x xs = h : splitBy x t
      where (h,rest) = break (== x) xs
            t = drop 1 rest

Thursday, August 27, 2009

Parker Griffith radio interview

U.S. Representative Parker Griffith, whom I've written about before in this space, was recently on WBHP.

I like his positions on the stimulus, bailouts, cap & tax, and healthcare: all no!

I was happy to hear him speak out against the AMA, which many doctors can't do out of fear of losing their licenses. The AMA cartel is a big source of what's wrong with healthcare in America.

He was also correct that the so-called reform proposals are really "overreach" and a "power grab," nothing to do with improving health care. In a 1961 speech, Ronald Reagan warned, "One of the traditional methods of imposing statism or socialism on a people has been by way of medicine."

So far so good. But then he got to his solution.

I understand the poor guy was sleep deprived, but he fell back into the usual feel-good blah-blah-blah about making health insurance companies "play by the same rules." In real life, this will mean more federal oversight, higher prices, less competition, and therefore less consumer satisfaction.

As a physician, he ought to understand the folly of treating the symptom. The real problem is adults still believe in Santa Claus but use the code name "health insurance." Imagine if State Farm were expected to pony up every time you took your car in for an oil change or other routine maintenance. Health insurance is no such thing: it's really a sort of pre-paid entitlement program, but entirely perverse. For example, legislation requires everyone to pay the same premium in employer-sponsored group insurance. This means healthy people are overcharged, and the couch potato gets a free ride. The system subsidizes and therefore encourages poor health!

In the market, sellers compete against sellers, but buyers also compete against buyers. Throwing deep-pocketed "insurers" into the mix inevitably drives prices beyond affordable levels, especially given the way "insurance" destroys patient price-sensitivity.

I like Peter Schiff's proposal much better. Health "insurers" exist only because of aberration in the tax code. This special tax treatment is why open-market consumers can't purchase their products for reasonable prices. Schiff proposes ending this subsidy but simultaneously raising the personal exemption so as to make the change a wash for taxpayers. The true costs of health "insurance" would then be laid bare. The much cheaper option of major-medical policies would become more attractive. Insurers would have to compete with each other on price, terms, service, and so on. We'd no longer be chained to our employer-sponsored plans.

This would create enormous downward pressure on healthcare prices because people would pay for oil changes.. err, routine doctor visits out of pocket from money that they gasp saved in anticipation of such expenses. Of course when the out-of-pocket limit hits, their bona fide insurance would kick in and cover the rest. Economic arbitrage doing its work means that such a system would also benefit those who opt not to purchase insurance policies.

The jingoistic breast-beating about money versus commitment was poor cover for the fact that Uncle Sam is drowning in debt and flat-out broke. The feds can barely afford to pay their bills now. How will they afford a new program that will dwarf Social Security and Medicare?

Monday, August 24, 2009

Press hit

A good friend of mine rated a mention in a Wall Street Journal article about blatant bias in Heisman voting:
"It seems like it's always something with us," says Patrick Bobo, a Tennessee fan who contributes to a site called Third Saturday in Blogtober. "A lot of Tennessee fans say they don't want the stupid award because it's a joke."
(Back in college, he wore FSU gear but then started pulling for the Vols when they did so well under Peyton Manning, thus earning plenty of good-natured ribbing about bandwagon-hopping.)

As Darren Everson notes in "The South's Heisman Trophy Grudge," no Crimson Tide player has ever won the Heisman even though the program is one of the all-time elites in college football. Bobo pulls for the wrong team but is right on about what a joke Heisman voting is.

The name of the blog is a nod to the third Saturday in October, the annual renewal of the big Alabama-Tennessee rivalry—which happens to take place on the fourth Saturday in October of this year, but whatever. Bloggers from both sides fuel the fire with some good ol' trash-talk.

Even people outside the South know of the Alabama-Auburn rivalry, but the Vols run a very close second. A few years ago, for example, I said I'd rather beat UT than the Tigers if I had to choose. Maybe it has something to do with living in north Alabama and having to endure a higher rate of Tennessee puke-orange sightings. The inset is Daniel Moore's "Rocky Stop," which depicts Roman Harper's (#41) fumble-causing, game-saving hit in an ugly 6-3 matchup.

Congratulations, but I'm looking forward to enjoying this year's victory cigar, Bobo!

Tuesday, August 18, 2009

We have universal legal care

Wow, is it awesome too!
After his appeal had failed and before he was exonerated, Lloyd filed a complaint with the state. He told them Slameka [Lloyd's public defender] never gave him the time of day. Harlin still has a copy of Slameka's rebuttal, which she read out loud:
“This is a sick individual who raped, kidnapped and strangled a young woman on her way to school. His claim of my wrongdoing is frivolous, just as is his existence. Both should be terminated.”
When asked if he really thought his own client should be executed, Slameka said yes, that's what he wrote at the time and that's how he felt.
Read the full story at NPR.

After having 17 years of his life stolen over a horrible crime that DNA evidence showed he didn't commit, Lloyd got out but died two short years later.

His sister, Ruth Harlin, says if someone close to her needed a defense lawyer today, she would do whatever it took to pay for a lawyer.

“I'd mortgage the house,” Harlin says. “I would not let a public defender defend anyone in my family ever again. Ever, ever again.”

Universal legal care's problems—chronic underfunding, unmanageable caseloads, top talent unwilling to participate in the insanity, poor standard of care, and on and on and on—are built in despite its proponents' good intentions.

Economics 101: Incentives Matter!

Sunday, August 16, 2009

Simple analogy for lazy evaluation

In my talk on Friday, I was explaining Haskell's laziness using the following example:
evens (_:x:xs) = x : evens xs
I used promises and thunks to show that the recursive application of evens xs is not evaluated immediately and that we could define the unbounded list of all positive even numbers with evens [1..] without running off into an infinite loop. Some in the mostly imperative-minded audience were still confused.

Someone asked, “Is it kind of like a recipe?” Again my wife's oatmeal cream pies found a fun place in Haskell and this time provided a helpful analogy!

In our pantry we have ingredients such as oatmeal, sugar, and so on. She has her trusty recipe. As long as we provide ingredients and are willing to wait, we can get as many oatmeal cream pies as we want.

The analogy is imperfect. Diminishing marginal utility doesn't weigh on a CPU's execution of programs, but smiling, appreciative faces and delighted exclamations of "Mmm!"—although powerful—may ultimately need to be augmented with shoe-shopping gift certificates to keep the party going.

עִמָּנוּאֵל

For a child will be born to us, a son will be given to us;
  And the government will rest on His shoulders;
  And His name will be called Wonderful Counselor, Mighty God,
  Eternal Father, Prince of Peace.

Isaiah 9:6

Monday, August 10, 2009

10Log10 Tech Seminar: Haskell

Come sharpen your skills with a beginner-friendly introduction to Haskell, the advanced functional language that will completely rewire your brain—for the better! I'll give an overview of the language, using the type system to prevent bugs, automatic property-based testing with QuickCheck, and how to improve your procedural or object-oriented code by borrowing from the functional paradigm.

Join us on Friday, August 14, from 11:30am to 1pm in dB's large conference room:

deciBel Research, Inc.
1525 Perimeter Parkway, Suite 500
Huntsville, Alabama 35806
Phone: 256-716-0787
via Google Maps
RSVP to margie at dbresearch dot net. This is a lunchtime event, so feel free to brownbag it!

Saturday, August 08, 2009

Dangerous

His speech was smooth as butter,
    yet war was in his heart;
his words were softer than oil,
    yet they were drawn swords.

Psalm 55:21

Friday, August 07, 2009

git: shrinking Subversion import

At $WORK, we've been attempting for years—but fairly infrequently—to do distributed development with centralized Subversion. We finally had enough and decided to move to git.

Part of that move involved importing a couple of projects with 6+ years of history. Early revisions carried lots of binary test data, so git svn clone produced repositories weighing in at 3.5 and 4.5 gigabytes.

Another less than satisfactory result was the umpteen bazillion git branches corresponding to git tags. Some of the git branches formed families with names of the form name-x.y.z@1234, where name-x.y.z is the name of a Subversion release tag and 1234 was a Subversion revision that modified the tag. A happy design choice made the branch name-x.y.z (with no @nnn) the head revision of that Subversion tag, so we easily picked off some targets:

$ git branch -r -D `git branch -r | grep @`
Cribbing from svn2git, converting the git branches to git tags was a series of commands of the form
$ git checkout 1.2.3
$ git tag -a -m "Tagging release 1.2.3" v1.2.3
$ git branch -r -D 1.2.3
Then to make the Subversion trunk the git master branch:
$ git branch -D master
$ git checkout trunk
$ git checkout -f -b master
$ git branch -r -D trunk
Here's a good point to checkpoint your work in case you hose something later.

Using Antony Stubbs's script to find the biggest objects in a repo's packs, we determined that much of the bulk came from huge HDF5-format test baselines along with a few others. So we cut them out:

$ git filter-branch -d /dev/shm/scratch --index-filter \
  "git rm --cached -f --ignore-unmatch '*.h5'; \
   git rm --cached -f --ignore-unmatch '*.sig'; \
   git rm --cached -f --ignore-unmatch '*.2dsc'" \
  --tag-name-filter cat -- --all
The use of --index-filter makes the long process (remember, it has to hit all possible revisions) quicker because it operates directly on the index rather than checking out every snapshot, munging the filesystem, and shoving the new snapshot back in. Also, /dev/shm is a tmpfs mount for better throughput, and the directory named with -d shouldn't exist.

The git filter-branch manpage has a checklist for shrinking a repository that recommends running filter-branch and then cloning to leave behind the cruft.

Cloning with a filesystem path makes hardlinks, so use a URL:

$ git clone file:///home/gbacon/src/dBTools.git
Even after doing this, some big unnamed blobs had survived the clone. Thanks to #git on freenode for the suggestion to excise the reflog:
$ git reflog expire --verbose --expire=0 --all
$ git gc --prune=0
Note that these options will require a fairly recent git.

After all these steps, the git repositories were went from gigabytes to 75 and 100 megabytes, much nicer!

Monday, August 03, 2009

Friday, July 31, 2009

Mrs. B's tasty cooking

I was enjoying a wonderful homemade lunch today and tweeted, “Sam's chicken pot pie == delicious.”

Over on Facebook, someone commented

if (sam's oatmeal cream pies = delicious)
    goto "mmm mmmm yummy!"
else
    try another one
end
Her oatmeal cream pies are great, but her taco salad will make you forget your troubles.

Geeky, yes—“the geekiest thing ANYONE has ever done” was someone else's rejoinder—but I responded with

mapM_ putStrLn $
samsOatmealCreamPies >>=
(>> return "mmm mmmm yummy!") . guard . delicious
Assimilation may be complete.

Monday, July 27, 2009

Simple FitNesse example with CSlim

Tinkering with the FitNesse acceptance testing system, I cloned the CSlim repo, but its README was short on detail about getting the thing to talk to FitNesse.

Attempting to build (on Ubuntu karmic) out of the box failed:

$ make
compiling ListExecutor.c
compiling SlimConnectionHandler.c
compiling SlimList.c
compiling SlimListDeserializer.c
compiling SlimListSerializer.c
compiling SlimUtil.c
compiling StatementExecutor.c
compiling SymbolTable.c
compiling SocketServer.c
compiling TcpComLink.c
Building archive lib/libCSlim.a
ar: lib/libCSlim.a: No such file or directory
make: *** [lib/libCSlim.a] Error 1
The workaround is straightforward: mkdir lib followed by make.

So now what?

$ ./CSlim_server --help
getaddrinfo: Servname not supported for ai_socktype
Following the recommendation in the SLiM docs, I created a page called CslimTest (by editing FrontPage to contain a new line with CslimTest) that contained merely
!define TEST_SYSTEM {slim}
After I clicked the Test button, the page cleared, paused for a few seconds, and then gave a red box with "Testing was interupted and results are incomplete." Clicking Output Captured, I saw a stacktrace that ended with java.lang.ClassNotFoundException: fitnesse.slim.SlimService.

Maybe I could crib from the docs for another server. The FitNesse download page points to servers for various languages, and from there I arrived at a Getting Started page for fitSharp, a C# server for FitNesse, which had the following config:

!define TEST_SYSTEM {slim}
!path c:\myfolder\mytest.dll
!define COMMAND_PATTERN {%m -r fitSharp.Slim.Service.Runner,c:\program files\fitsharp\fitsharp.dll %p}
!define TEST_RUNNER {c:\program files\fitsharp\Runner.exe}
So next I try
!define TEST_SYSTEM {slim}
!define TEST_RUNNER {/home/gbacon/src/cslim/CSlim_server}
I got a similar failure, but this time the class that failed to load was .home.gbacon.src.cslim.CSlim_server.

Even though the fitSharp used !path for test assemblies, Drew suggested pointing it to CSlim_server as in

!define TEST_SYSTEM {slim}
!path /home/gbacon/src/cslim/CSlim_server
But the test still died with ClassNotFoundException.

I monkeyed more with COMMAND_PATTERN, and finally got a quick error using

!define TEST_SYSTEM {slim}
!path /home/gbacon/src/cslim/CSlim_server
!define COMMAND_PATTERN {%m %p}
Instead of Output Captured, I see Tests Executed OK. I notice in src/Main/DecisionTableExample.c there's a Division fixture that seems to match the example in the two-minute example, so I copy-and-paste to get
!define TEST_SYSTEM {slim}
!path /home/gbacon/src/cslim/CSlim_server
!define COMMAND_PATTERN {%m %p}

|eg.Division|
|setNumerator|setDenominator|Quotient?|
|10          |2             |5        |
|12.6        |3             |4.2      |
|100         |4             |33       |
Red box still, and the tests aren't running. Not much help in the output log, but FitNesse complained: "Cannot run program "fitnesse.slim.SlimService".

Remove %m from COMMAND_PATTERN. Progress! Nothing's running, but I see a bunch of exceptions and text with yellow backgrounds, such as "eg.Division Could not find class eg.Division." Maybe it doesn't like the leading eg.

!define TEST_SYSTEM {slim}
!path /home/gbacon/src/cslim/CSlim_server
!define COMMAND_PATTERN {%p}

|Division|
|setNumerator|setDenominator|Quotient?|
|10          |2             |5        |
|12.6        |3             |4.2      |
|100         |4             |33       |
Now Division goes green, but it doesn't like the input methods ("Method setSetNumerator[1] not found in Division." for example). The method names in the fixture are setNumerator, setDenominator, and Quotient, so mangling must be happening somewhere.

That's what I get for trying to think ahead of it:

!define TEST_SYSTEM {slim}
!define COMMAND_PATTERN {%p}
!path /home/gbacon/src/cslim/CSlim_server

|Division|
|numerator|denominator|Quotient?|
|10       |2          |5        |
|12.6     |3          |4.2      |
|100      |4          |33       |
Now the first two rows pass, and the third fails as expected.

Wednesday, July 22, 2009

Hosed at the gas pump — by your debit card

Watch out for this gotcha when using a debit card at the gas pump where funds in your checking account can be "reserved" for days after making gasoline purchases.

My wife and I load Wal-Mart gift cards with our gas budget, which we then use to buy gas at Sam's. This gives us the best of both worlds: the wisdom of the envelope system with the convenience of plastic.

Sunday, July 19, 2009

First thing Monday morning!

Pennsylvania state employees' most recent paychecks were some 30% short, and legislators are waving the threat of goose eggs on the next round. Flat-broke California has already been paying in IOUs, but banks are starting to balk at their worthless paper. Many other states were slow to make good on refunds due taxpayers.

Now turn the tables. We all know people who've been there, and some have been so unfortunate as to experience it personally. The taxman demands payment, and pleas of hardship or budget problems—the same appeals California and Pennsylvania are making now—find no mercy.

“If you don't pay up, we'll garnish your wages.” Revenue agents may even threaten property seizure or jail time.

Government jobs are widely considered to be the among the safest. Civil servants tend to make less money, but the perception of increased security is part of the compensation. GEICO, the Government Employees Insurance Company, was founded with a business model limiting policies to government employees on the bet that they're less likely to be risk takers.

These tough times where even governments can't make payroll help us all appreciate the importance of having an emergency fund as cushion against these sort of blows. How well could you absorb the hits that folks in California and Pennsylvania have taken? Personal-finance advisor Dave Ramsey recommends setting aside 3 to 6 months of expenses at Baby Step 3. But where's the extra money to sock away supposed to come from?

Which brings me to my tip. How big were your income-tax refunds last year, state and federal? Big refunds are not money from heaven: you just paid too much in taxes. Using refunds as a savings program is a rotten plan because the government doesn't pay you interest on your generous loan to them. Now states are earning slow-pay credit ratings, so don't leave your state or the feds owing you money you may need.

First thing Monday morning go to your accounting or human resources department and ask for a new W-4 form to adjust your federal withholding and the appropriate form for your state withholding. The tables on these forms are conservative and will cause you to overpay. TurboTax, for example, has a withholding calculator that gets you close to zero, not much owed either way. Use your most recent paystub and last year's returns to fill in the blanks and get your new withholdings. Easy-peasy.

The point is more money in each paycheck. By increasing the allowances you claim on these forms, less money will be deducted every payday as prepayment on next year's tax bill. So instead of sending that money to the IRS and then waiting—and hoping—for them to send it back next April, keep your hard-earned money. Think of it as getting your refund early. Use it to bulk up your emergency fund, increase 401k or IRA contributions, invest in your professional development, or whatever purpose you see fit.

If you have questions or concerns, be sure to consult with your tax professional.