A Problem

I think I have a Bash problem.  What follows is an actual command from my history.

cat /usr/share/dict/words | fgrep -v "'" | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' | tee lookup.txt | perl -pe 's/^([^ ]+) .*/\1/g' | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | uniq -c | sort -nr | egrep "^[^0-9]+2 " | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | perl -pe 's/[ 0-9]//g' | xargs -i grep {} lookup.txt | perl -pe 's/[^ ]+ //g' | tail -n2

It’s just so hard to bite the bullet, admit that the problem has grown in scope, and move it to its own Perl/Python script.  (P.S. The Guinness Book is wrong.  “Conservationalists” is not a real word.)

Edit: to those who are competing in the comments to improve (shorten) the above command: when pasting code, use the <code> tag to override WordPress quote formatting.
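Following the post's own idea of moving the search into a script, here is a minimal Python sketch of what the pipeline computes. It makes these assumptions:

- The input is a newline-separated word list such as /usr/share/dict/words.
- It follows the pipeline's rules: skip words containing an apostrophe, sort letters case-sensitively, and keep only letter-keys shared by exactly two words.
- It prints the longest such pair.

The script name anagrams.py and the function names are illustrative. Ties between equally long pairs may be resolved differently than by `tail -n2`.

```python
import sys
from collections import defaultdict


def anagram_groups(words):
    """Map each sorted-letters key to the words that share it.

    Mirrors the pipeline's first steps: drop words containing an
    apostrophe (fgrep -v "'") and sort letters case-sensitively (perl).
    """
    groups = defaultdict(list)
    for line in words:
        word = line.strip()
        if not word or "'" in word:
            continue
        groups["".join(sorted(word))].append(word)
    return groups


def longest_anagram_pair(words):
    """Return the longest pair of anagrams as a list, or None if there is none.

    Like egrep "^[^0-9]+2 ", only keys shared by exactly two words count.
    """
    pairs = [g for g in anagram_groups(words).values() if len(g) == 2]
    if not pairs:
        return None
    # Every word in a group has the same length, so compare the first one.
    return max(pairs, key=lambda g: len(g[0]))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: anagrams.py WORDLIST", file=sys.stderr)
    else:
        # errors="replace" keeps stray non-UTF-8 bytes from aborting the run.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            pair = longest_anagram_pair(f)
        if pair:
            print("\n".join(pair))
```

Run it as `python3 anagrams.py /usr/share/dict/words`. A dictionary replaces the pipeline's tee/grep round trip through lookup.txt, because each key keeps its own list of words.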

Joey Comeau has a new book out based on Overqualified, which has long been one of my favorite things on the internet.  He writes cover letters to companies.  They each sound businesslike enough for the first paragraph or so, and then you gradually realize you are reading something that is in no way a normal cover letter.  An excerpt from one to Nintendo:

We need a new Mario game, where you rescue the princess in the first ten minutes, and for the rest of the game you try and push down that sick feeling in your stomach that she’s “damaged goods”, a concept detailed again and again in the profoundly sex negative instruction booklet, and when Luigi makes a crack about her and Bowser, you break his nose and immediately regret it. When Peach asks you, in the quiet of her mushroom castle bedroom “do you still love me?” you pretend to be asleep. You press the A button rhythmically, to control your breath, keep it even.

#2 (NeoPost), #28 (Phone surveys) and #58 (MySpace) are three of my favorites.

133 thoughts on “A Problem”

  1. @Katie:

    I believe you intended
    cowsay -n "STACK OVERFLOW" | cowsay -n | cowsay -n | cowsay -n

  2. Many months later, this thread came up in another discussion, so I came back to look at it again and saw Randall’s response to me. So, belatedly, I clarify: that “47” was actually “\047” (backslash, 0, 4, 7), which is the PCRE octal escape for the single quote; the leading “\0” apparently got eaten. If you put that back in, it works (I believe) as Randall intended.

  3. This should do it in Haskell (composed at the interactive prompt, much easier to play with than Bash), unless I got the problem description wrong:


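    import Data.Ord (comparing)
    import Data.List (sort, sortBy, groupBy)

    main :: IO ()
    main = interact (unlines . map show . anagrams . lines)

    -- Group words that share the same letters, longest groups first.
    -- groupBy only merges adjacent elements, so sort by letters first.
    anagrams :: [String] -> [[String]]
    anagrams = reverse
             . sortBy (comparing (length . head))
             . filter (not . null . tail)
             . groupBy (\a b -> sort a == sort b)
             . sortBy (comparing sort)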

    Save as anagrams.hs and cat /usr/share/dict/words | runhaskell anagrams.hs

  4. Oh, a couple of weeks late... oh well:

    perl -l15 -ne 'push @{$x{lc join"",sort split//}},$_;END{map{print join(" ",length,@{$x{$_}})."\n"}sort{length($a)<=>length($b)}grep{@{$x{$_}}>1}keys%x;}'</usr/share/dict/words|tail

    perl -e'print map{@{$x{$_}}>1&&join(" ",length,@{$x{$_}})."\n"}sort{length($a)<=>length($b)}map{chomp;push@{$x{$k=lc join"",sort split//}},$_;@{$x{$k}}<2&&$k}<>;'</usr/share/dict/words|tail

  5. Here’s my variant, which gets rid of almost all the Perl and does the rest with Unixy programs. The only thing left that needs Perl is sorting the letters within a string, which I couldn’t figure out how to do simply :)

    </usr/share/dict/words grep -v \' | perl -ne 'chomp;print length."\t$_\t".(join "", sort split "")."\n"' | sort -k 3 | uniq -Df 2 | sort -srnk 1 | head -n 20 | cut -f 2

  6. I won’t go into the code, but depending on what you are trying to accomplish, other word lists may be helpful. For instance, this page links to the allowable Scrabble word lists, which vary by location and intent: TWL2 is the current US Tournament Word List, and OSPD is the Official Scrabble Players Dictionary, which is essentially TWL2 with potentially offensive words removed for family players:

    http://www.zyzzyva.net/wordlists.shtml

    I used similar command lines to debunk an email forwarded from a friend a while ago. Just put it up on my (fresh, still default-themed) blog here:

    http://www.iliveinashinyworld.com/content/debunking-word-trivia-email

  7. @Matt Hickford

    fsck is just “filesystem check” :) mtab is the mount table. I’d pronounce usr as “user”, which is what it originally stood for; “Unix System Resources” is a later backronym.


  9. This article is about the video game character. For the platform game series featuring the character, see Super Mario (series). For the franchise featuring the character, see Mario (franchise). For other uses, see Mario (disambiguation).

  10. This is the correct formula:
    perl -pe 's/^([^ ]+) .*/\1/g' | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | uniq -c | sort -nr | egrep "^[^0-9]+2 " | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | perl -pe 's/[ 0-9]//g' | xargs -i grep {} lookup.txt | perl -pe 's/[^ ]+ //g' | tail -n2

  11. You should be proud of yourself for being able to write down some really wonderful tips and hints. Great articles; I think this would be a good asset.

  12. This is a very high-quality blog. Great content and a clean design. Please keep creating great posts like this; I’m sure all users would find them as valuable as I did.

  13. This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work!
