A Problem

Randall 2009-04-27 137 Comments

I think I have a Bash problem. What follows is an actual command from my history.

cat /usr/share/dict/words | fgrep -v "'" | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' | tee lookup.txt | perl -pe 's/^([^ ]+) .*/\1/g' | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | uniq -c | sort -nr | egrep "^[^0-9]+2 " | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | perl -pe 's/[ 0-9]//g' | xargs -i grep {} lookup.txt | perl -pe 's/[^ ]+ //g' | tail -n2

It’s just so hard to bite the bullet, admit that the problem has grown in scope, and move it to its own Perl/Python script. (P.S. The Guinness Book is wrong. “Conservationalists” is not a real word.)
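For readers wondering what the standalone script might look like, here is a minimal Python sketch of what the pipeline appears to compute: the longest pair of words that are anagrams of each other and of nothing else. The function name `longest_anagram_pair` and the sample word list are illustrative assumptions, not anything from the original command, and ties between equally long pairs may resolve differently than `tail -n2` would.

```python
from collections import defaultdict


def longest_anagram_pair(words):
    """Return the longest two words that share a letter signature with
    each other and with no third word, or None if there is no such pair."""
    groups = defaultdict(list)
    for word in words:
        if "'" in word:                  # same effect as: fgrep -v "'"
            continue
        key = "".join(sorted(word))      # sorted letters, like the first perl step
        groups[key].append(word)
    # Keep signatures seen exactly twice, like: uniq -c | egrep "^[^0-9]+2 "
    pairs = [group for group in groups.values() if len(group) == 2]
    if not pairs:
        return None
    return tuple(max(pairs, key=lambda group: len(group[0])))


# Hypothetical sample input; a real run would read a word list file instead.
sample = ["listen", "silent", "enlist", "conservation", "conversation", "don't"]
print(longest_anagram_pair(sample))
```

Pointing it at a real word list would only need the words read one per line with the trailing newline stripped.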

Edit: to those who are competing in the comments to improve (shorten) the above command: when pasting code, use the <code> tag to override WordPress quote formatting.

Joey Comeau has a new book out based on Overqualified, which has long been one of my favorite things on the internet. He writes cover letters to companies. They each sound businesslike enough for the first paragraph or so, and then you gradually realize you are reading something that is in no way a normal cover letter. An excerpt from one to Nintendo:

We need a new Mario game, where you rescue the princess in the first ten minutes, and for the rest of the game you try and push down that sick feeling in your stomach that she’s “damaged goods”, a concept detailed again and again in the profoundly sex negative instruction booklet, and when Luigi makes a crack about her and Bowser, you break his nose and immediately regret it. When Peach asks you, in the quiet of her mushroom castle bedroom “do you still love me?” you pretend to be asleep. You press the A button rhythmically, to control your breath, keep it even.

#2 (NeoPost), #28 (Phone surveys) and #58 (MySpace) are three of my favorites.

137 replies on “A Problem”

Tinctorius says:

2009-04-30 at 1:47 pm

tinctorius@grass:~$ ## Insert weird xkcd command here.
grep: invalid option -- t
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
grep: invalid option -- t
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
grep: unknown directories method
grep: OWeeimpr: invalid context length argument
grep: conflicting matchers specified
grep: OWegiiknrtu: invalid context length argument
grep: unknown directories method
grep: invalid option -- W
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
grep: unknown directories method
grep: unknown directories method
grep: unknown directories method
grep: unknown directories method
grep: Naaaddeeiilllnnnrsst: invalid context length argument
grep: unknown directories method
grep: unrecognized option `--eeeeegiiiklnnnnrrsstvw'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
belastingverplichtingen betalingsverplichtingen

I do not know what just happened. Too many dashes in my word list (Dutch), perhaps?

Matt Anderson says:

2009-04-30 at 11:38 pm

import solution
Hooray for python!

plash says:

2009-05-01 at 12:16 am

Matt wins!

CAPTCHA: Minutes studying

Gorgonzola says:

2009-05-01 at 1:21 am

Meh…

spirov92 says:

2009-05-01 at 4:31 am

Nice! BTW what is this script supposed to do? Spewing?
Randall: you should save such long things in a shell script, you know. I wouldn’t risk losing this if something happens to ~/.bash_history .
Also, why not write the whole thing in Perl or python? I would probably write it in PHP, since it’s the only language I know well enough 🙂
reCAPTCHA: delicacy pervert 😀 nice

Sven Neumann says:

2009-05-01 at 4:33 am

Hi Randall, didn’t know how to reach u so I’ll drop a comment here. you GOTTA check out the Scarpar. I left a link in the “Website” field. Hope you agree 😉
Sorry, I couldn’t help but think of you when I saw that vid.

ruds says:

2009-05-01 at 5:25 am

The biggest performance gain for the smallest edit that I saw was to move the tail up to right after the last sort. i.e.

cat /usr/share/dict/words | fgrep -v "'" | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' | tee lookup.txt | perl -pe 's/^([^ ]+) .*/\1/g' | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | uniq -c | sort -nr | egrep "^[^0-9]+2 " | awk '{ print length, $0 }' | sort -n | tail -n1 | awk '{$1=""; print $0}' | perl -pe 's/[ 0-9]//g' | xargs -i grep {} lookup.txt | perl -pe 's/[^ ]+ //g'

This prevents grepping all the patterns when all you really care about is the longest one.
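The saving can be sketched in Python under some simplifying assumptions: `lookup` stands in for one `grep {} lookup.txt` pass (matching the key exactly rather than as a substring), and the helper names and sample words are made up for illustration. Both variants return the same pair, but the late-tail version scans the table once per candidate key while the early-tail version scans it once in total.

```python
from collections import Counter


def build_table(words):
    """Rows of (sorted-letter key, word), playing the role of lookup.txt."""
    return [("".join(sorted(w)), w) for w in words]


def candidate_keys(table):
    """Keys seen exactly twice, shortest first (like the sort -n stage)."""
    counts = Counter(key for key, _ in table)
    return sorted((k for k, n in counts.items() if n == 2), key=len)


def lookup(key, table):
    """One full scan of the table, standing in for one grep over lookup.txt."""
    return [word for k, word in table if k == key]


def late_tail(table):
    """Original order: look up every key, keep the last two lines."""
    found, scans = [], 0
    for key in candidate_keys(table):
        found.extend(lookup(key, table))
        scans += 1
    return found[-2:], scans


def early_tail(table):
    """Reordered: keep only the longest key, then look it up once."""
    keys = candidate_keys(table)
    if not keys:
        return [], 0
    return lookup(keys[-1], table), 1


table = build_table(["tea", "eat", "stale", "least", "dog", "god",
                     "conservation", "conversation"])
print(late_tail(table))
print(early_tail(table))
```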

PS cygwin is slow.

Joshua Gross says:

2009-05-02 at 3:13 am

I found the latest one about ebay awesome, and so: http://joshisgross.com/ebayclearance/

Erik says:

2009-05-02 at 3:46 am

It just made me remember… I wrote a python script for downloading all of the xkcd comics into the hard drive:

import urllib.request

def downloadImage(url):
    # Save the image in the current directory under its own file name
    sock = urllib.request.urlopen(url)
    img = open(url.split("/")[-1], "wb")
    img.write(sock.read())
    sock.close()
    img.close()

def findImage(url):
    # Fetch the comic page and copy out the image URL up to its closing quote
    sock = urllib.request.urlopen(url)
    html = sock.read().decode("utf-8")
    sock.close()
    i = html.find("http://imgs.xkcd.com/comics")
    image = ""
    while html[i] != '"':
        image += html[i]
        i += 1
    return image

for i in range(1, 576): #change this number for the actual one
    if i != 404:
        downloadImage(findImage('http://xkcd.com/%i' % i))

Python is so cool, heh.
BTW, it may seem like it’s stuck, but if you look at the directory you ran it from, you can see the images downloading.

Mason says:

2009-05-02 at 12:38 pm

@Aaron A: Yeah, some do, in the sense that they have a lot of extra fields in which you can put pretty much whatever you want. I think the comic is just referring to really specific genres though.

The Interobang Guy says:

2009-05-04 at 7:36 am

spirov92 : “I wouldn’t risk losing this if something happens to ~/.bash_history .”
Am i the only person who reads that with a menacing tone?
I can just picture the mafia bots in futurama: “hey, you best not be messing around with python, cuz that looks like it took an awful lot of work, I wouldn’t risk losing this if something happens to ~/.bash_history”

actually typing that out i just realized i pronounce ~ “home”, i think there’s something wrong with me ?

Matt Hickford says:

2009-05-04 at 1:53 pm

I pronounce ~ as home, / as root, we all should.

/etc as ‘et cetera’, fstab as footstab. fsck is harder.

ls is list.

Crimson says:

2009-05-04 at 1:58 pm

Updating xkcd every day?
I’m so thrilled. It’s great.

Thanks! You’re the best.

Cole Erickson says:

2009-05-04 at 11:46 pm

@Matt Hickford:

lol. Reminds me of how I pronounce HTML: hitmull, XML: ex-ihm-ull, SHTML: well. you can guess, SQL: Skewl (even though I’m well aware it’s sequel), as well as many other things like this.

smcwhtdtmc says:

2009-05-05 at 12:22 pm

/etc is quite clearly pronounced “et-see”. And ~ is “squiggle”. I feel so adamant about this that I’m willing to start a quasi-religious propaganda war. Any takers?

…and I will do it from emacs.

just kidding says:

2009-05-05 at 12:45 pm

Is this safe to run or do I need to worry about overwriting the kernel with a list of words from the dictionary?

hahaha

me says:

2009-05-07 at 3:47 am

Hey Randall, did ya hear? There’s gonna be a new Kindle:

If you turn it 90 degrees it will automatically switch to a 2 page side-to-side view like an original book. And it comes free with some newspaper subscriptions. Too bad we can’t get that thing in europe…

RiotingPacifist says:

2009-05-07 at 7:13 am

(please imagine me leading a charge while shouting nano, as my post will have much more effect that way)

How can you say ~ is “squiggle” (at least it should be tilde)? That’s like saying that … is “three dots” and 🙂 is “colon bracket”. While I won’t argue over /etc (ekd) or /usr (user) and honestly don’t care, I will see you on the battlefield regarding ~. Oh, and I compose all my posts in nano!!!!

James says:

2009-05-07 at 7:15 pm

Hi all. I didn’t quite shorten the code, but I did make it faster. 🙂

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORDLEN 30
#define MAXWORDS 1000000

struct line {
    char word[MAXWORDLEN];
    char sorted[MAXWORDLEN];
    int length;
};

struct pair {
    char first[MAXWORDLEN];
    char second[MAXWORDLEN];
    int length;
};

int comp_chars(const void *a, const void *b);
int comp_sorted(const void *a, const void *b);
int comp_len(const void *a, const void *b);

int main(int argc, char *argv[])
{
    /* Check correct no of arguments */
    if (argc != 2 && argc != 3) {
        printf("Usage: uniq_anag words_list number_of_pairs\n");
        printf("If no number given, default is one.\n");
        return -1;
    }

    /* Declarations */
    FILE *wordlist;
    struct line *lines = calloc(MAXWORDS, sizeof(struct line));
    int i, j;
    int c; /* int rather than char, so the comparison with EOF is reliable */
    int no_of_words;
    int no_to_return = (argc == 2 ? 1 : atoi(argv[2]));
    struct pair *pairs = calloc(MAXWORDS, sizeof(struct pair));

    /* Open the file */
    wordlist = fopen(argv[1], "r");

    /* Get the words */
    i = j = 0;
    while ((c = getc(wordlist)) != EOF) {
        if (c != '\n') {
            lines[i].word[j] = c;
            lines[i].sorted[j++] = c;
        } else {
            j = 0;
            ++i;
        }
    }
    no_of_words = i;

    /* Order the letters in the second copy */
    for (i = 0; i < no_of_words; ++i) {
        lines[i].length = strlen(lines[i].word);
        qsort(lines[i].sorted, lines[i].length, sizeof(char), comp_chars);
    }

    /* Sort the lines by the sorted words */
    qsort(lines, no_of_words, sizeof(struct line), comp_sorted);

    /* Look for pairs */
    char to_comp[2][MAXWORDLEN];
    strcpy(to_comp[0], lines[0].sorted);
    int matches = 0;
    j = 0;
    for (i = 1; i < no_of_words; ++i) {
        strcpy(to_comp[i%2], lines[i].sorted);
        if (strcmp(to_comp[0],to_comp[1]) == 0)
            matches++;
        else {
            if (matches == 1) {
                strcpy(pairs[j].first, lines[i-2].word);
                strcpy(pairs[j].second, lines[i-1].word);
                pairs[j++].length = lines[i-1].length;
            }
            matches = 0;
        }
    }
    int no_of_pairs = j;

    /* Sort the pairs we've found */
    qsort(pairs, no_of_pairs, sizeof(struct pair), comp_len);

    /* Print results */
    for (i = 0; i < no_to_return; i++) {
        if (pairs[i].length == 0)
            break;
        printf("%s %s\n", pairs[i].first, pairs[i].second);
    }

    /* Clean up */
    free((void *)lines);
    fclose(wordlist);
    return 0;
}

int comp_chars(const void *a, const void *b)
{
    if (*(char*)a < *(char*)b)
        return -1;
    else if (*(char*)a == *(char*)b)
        return 0;
    else
        return 1;
}

int comp_sorted(const void *a, const void *b)
{
    char sorted[2][MAXWORDLEN];
    strcpy(sorted[0],(*(struct line *)a).sorted);
    strcpy(sorted[1],(*(struct line *)b).sorted);
    return strcmp(sorted[0],sorted[1]);
}

int comp_len(const void *a, const void *b)
{
    int lengths[2];
    lengths[0] = (*(struct pair *)a).length;
    lengths[1] = (*(struct pair *)b).length;
    if (lengths[0] < lengths[1])
        return 1;
    else if (lengths[0] == lengths[1])
        return 0;
    else
        return -1;
}

James says:

2009-05-07 at 7:22 pm

Oh no! What happened to my precious code! For a nicer version, I’m cheekily hosting it at wikipedia for the time being: http://en.wikipedia.org/wiki/User:Jetekus/uniq_anag

stu says:

2009-05-08 at 3:25 am

maybe you should replace the fancy apostrophes like: ‘{ print length, $0 }’ (and elsewhere) with simple ‘ ‘ ones… this would allow one to copy/paste into a shell without error 🙂

stu says:

2009-05-08 at 3:26 am

hmmm. your blag borks my input! I enter basic apostrophes and it makes them fancy. bad blag!

benjam says:

2009-05-09 at 10:50 am

just thought you’d be interested to know that my AVG 8.5 Free blocks the first page of comments as having a virus called BAT/Deleter.

I know there is nothing malicious (or is there <_<), just thought it was funny.

Henrie Schnee says:

2009-05-10 at 6:22 pm

Have you considered a “Random”-functionality for the blag similar to the one in the comic-section… there’s several years worth of text, so giving the uninformed reader the option to perceive it in a more fatalistic way would be… awesome.

Respect from Germany & its liberal arts majors!

Joshua Wright says:

2009-05-11 at 11:21 pm

Useless use of cat; tsk, tsk.

fgrep -v "'" </usr/share/dict/words | …

Probably not the greatest of your worries, but hey, it’s a start.

-Josh

Viadd says:

2009-05-14 at 1:40 am

If I were trying to use minimal lines I could make this much less clear
def sortuple(word) :
    letters = [l for l in word.strip()]
    letters.sort()
    return tuple(letters)

bytuple={}
pairs=[]
for w in open('/usr/share/dict/words') :
    w = w.strip("\n")
    tup = sortuple(w)
    if not "'" in tup :
        if tup in bytuple :
            pairs.append( (len(w),bytuple[tup],w) )
        else :
            bytuple[tup] = w

pairs.sort()
for tup in pairs[-1:-100:-1] :
    print("%2d %s %s" % tup)

Which gives

% time python /Users/palmer/sameletters.py
22 hydropneumopericardium pneumohydropericardium
22 cholecystoduodenostomy duodenocholecystostomy
21 glossolabiopharyngeal labioglossopharyngeal
21 duodenopancreatectomy pancreatoduodenectomy
...
19 incontrovertibility introconvertibility
...
17 misrepresentation representationism
...
real 0m2.766s
user 0m2.695s
sys 0m0.063s

Viadd says:

2009-05-14 at 1:48 am

And of course, the whitespace is lost even with code tags, so you can’t just cut and paste.

I am using OS X so it is looking at almost a quarter million words with average length > 9.5 letters.

Kyzer says:

2009-05-14 at 1:29 pm

I like this idea! Hopefully wordpress won’t ruin the code

perl -ne 'END{for(values%x){next if(@_=keys%{$_})<2;$y{join(" ",@_)}=length$_[0]."\n"}print join"\n",sort{$y{$b}<=>$y{$a}}keys%y}chomp;$x{join"",sort split//,lc}{$_}++' /usr/share/dict/words|head

Ian says:

2009-05-15 at 4:07 am

How much fs would a fsck ck if a fsck could ck fs?

Aniviller says:

2009-05-16 at 11:11 am

I’m no Perl expert, but I see HUGE redundancy in the first perl call. $_ is the placeholder for the default argument, so you can skip it almost everywhere. Concatenating literals and variables is not necessary, since variables interpolate inside double quotes. You can also eliminate the intermediate variable. Here’s much cleaner code (only the beginning):

fgrep -v "'" /usr/share/dict/words | perl -ne 'chomp; print join("", sort(split(//)))." $_\n";'

There’s an even shorter option if you take advantage of the -p option and notice that the parentheses are optional for split//:

fgrep -v "'" /usr/share/dict/words | perl -pne 'chomp;$_=join("",sort(split//))." $_\n"'

I’m sure the same can be done for rest of the code.

Aniviller says:

2009-05-16 at 11:23 am

Even more, your use of
perl -pe 's/^([^ ]+) .*/\1/g'
for extracting first column is overkill (the same goes for extraction of last column later in code).

There’s nice utility which does just that.

cut -d' ' -f1

p.s. sorry for not using “code” before. Here’s repost:

fgrep -v "'" /usr/share/dict/words | perl -pne 'chomp;$_=join("",sort(split//))." $_\n"'

Caitlin says:

2009-05-19 at 7:44 am

The day I read this post about Overqualified I bought it. I bought it because you said it was funny, and it seemed pretty neat to me.. but the book is so much bigger than just funny cover letters.
As you read all the cover letters, it slowly reveals a story about the author’s brother’s death in a car accident and it’s really uniquely written.

I love it and I’m sharing it with everyone I know. Thanks!

Theyain says:

2009-05-19 at 1:32 pm

At least you’ve never walked in on your roomie masturbating to a bash prompt.

sfink says:

2009-05-25 at 6:23 pm

I know it really, really doesn’t matter, but I can’t resist. You don’t just have a bash problem; you have a perl and awk problem too.

I know that each of these commands has a zillion options and it’s pointless to memorize each one. But some are worth your while: perl -l, for example. chomp and "\n" are both annoying nuisances having nothing to do with the problem at hand.

perl -a is the other. perl -pe 's/^([^ ]+) .*/\1/' makes me cringe. You’re friendly with awk; why didn’t you use that? Anyway, I think you meant to say perl -ane 'print $F[0]'.

Your awk problem is that you use it. Clean out that corner of your brain, stick with Perl, and maybe you’ll have space to remember -a and -l.

That’s not to mention your regexp problem. What’s all this [^ ]+ nonsense? Seems like you probably meant \w+ anyway.

I confess I didn’t actually spend the time to figure out what all of that is trying to accomplish. Something like perl -lne '$s=join("",sort split(//)); push @{ $w{$s} }, $_; END { foreach (sort { length($w{$b}[0]) <=> length($w{$a}[0]) } keys %w) { next unless @{ $w{$_ } } > 1; print length($w{$_}[0]), " ", join(" ", @{ $w{$_} }) } }' /usr/share/dict/words | head I guess? (Or what that was before the blag ate it, I mean.)

Argh! I promised myself not to do that. Evil geekteasing slut! You knew we couldn’t help ourselves…

But yes, somewhere around the 5th or 6th stage in the pipeline, that should have been rewritten as a single script. I’ve committed similar sins, but I think you’ve beat my worst by a factor of 3 or 4. Probably something to do with my attention span.

Aniviller says:

2009-05-27 at 8:17 am

@sfink: I believe that this was put together in a hurry (like every oneliner is). So one automatically uses a tool he’s most comfortable with. But I do suggest usage of cut/paste/join/sed/grep if possible. Mostly because oneliners are more readable and portable if you can see what it’s doing. Perl is only for the steps that require serious string work.

I posted my suggested changes 2 posts above.

awk is IMO a fairly good tool, but pretty useless, because by the time there’s a need for it, the problem is already out of hand and should be moved to perl/python/…

Bob LaBla says:

2009-05-27 at 5:49 pm

That’s not really a Bash script, it’s a bunch of Awk and Perl strung together with minimal amount of shell commands. If you like Perl so much, why not just write a Perl script rather than this mess of Perl one-liners strung together?

You can write perfectly readable, clean code in Bash, but it takes a bit of discipline, and skill.

reCAPTCHA says:

2009-05-29 at 6:41 pm

ly comments

wastelandamerica says:

2009-05-30 at 1:18 am

Hey Randall! I want to use one of your comics on my blog…well…uh…I kind of already did. Is the citation the way you want it? I changed the rollover text, problem? Hit me back.

wastelandamerica.

Dan says:

2009-05-30 at 7:36 am

There’s nothing wrong with using Bash to string together this much manipulation, as long as:
1) Each portion of the pipe is an independent logical functional unit (which I am led to believe might not be the case by your “tee”, but it’s still possible).
2) The command line grew organically, as you tested each portion of the manipulation along the way and thought of the correct next step.
3) Speed of execution is not an issue.

For example, here’s a quick script I worked up the other day to reset a password, check accounts and group memberships, and verify sudoers settings:

FAILSTRING='\033[1;31mFAILURE\033[0;0m'; for user in user1 user2 user3 user4 anotheruser; do (grep $user /etc/passwd && groups $user | grep staff >> /dev/null) || /bin/echo -e $user: $FAILSTRING; done; grep -E '^%staff' /etc/sudoers || /bin/echo -e "Sudoers: $FAILSTRING"; if [ -z $pwreset ]; then passwd anotheruser; fi; export pwreset='n'

It works like a charm, and for what I needed it for, it was ideal. For example, by storing state in the shell, I could re-run it after I fixed any issues it identified and it wouldn’t repeat any destructive action.

SuccessfullyWasted2Hours says:

2009-06-02 at 2:31 pm

This is not at all shorter by any means, I admit. But my angle was different:
You only return the (two) longest matches. For languages with a more complex grammar (French, German), those are commonly of no use, because they are just two different inflections of the very same verb (different tenses etc.).

So, after I figured out what you were doing (given that I know almost nothing about perl, that was simply “first command, check output, add next, check output, …”), I dropped all but the first perl command (which I wouldn’t know how to do in pure shell…) and built up the rest in the command structures I am familiar with. 🙂
And when I was there, I added args to play around… min. length, min. multiplicity, sort by length or by multiplicity.
The result is then grouped by the key (making it better readable as the groups are more obvious).

It’s also a bit faster (~9s vs. ~35s for the original), but uses more files…
(Yeah, and it misses the whole “do in one line” thing as it is written now, but I believe with fixed args it *could* be crammed into one.)


SuccessfullyWasted2Hours says:

2009-06-02 at 2:46 pm

That *almost* copied right. D’oh. Mkay, replaced all redirections by HTML entities… If I knew how, I’d delete the previous post. 😀

#!/bin/bash
# sort behaves not as expected:
# it treats international chars as separators? if LC_ALL is not set to C
export LC_ALL=C

# now that this is a script, might as well give options...
arginvalid=""
sort="-bylength"
cntmin=3
lenmin=8
while [ $# -gt 0 ]; do
  if [ "$1" == "-bycount" -o "$1" == "-bylength" ]; then
    sort=$1
  else
    if [ "$1" == "-countmin" ]; then
      if [ "$2" -gt 1 -a "$2" -lt 10 ]; then
        cntmin=$2
      else
        echo Invalid argument to '-countmin': Must be followed by a number between 1 and 10.
      fi
      # taking 2 args away
      shift
    else
      if [ "$1" == "-lenmin" ]; then
        if [ "$2" -gt 1 ]; then
          lenmin=$2
        else
          echo Invalid argument to '-lenmin': Must be followed by a number over 1.
        fi
        # taking 2 args away
        shift
      else
        arginvalid=$1
      fi
    fi
  fi
  shift
done
if [ "$arginvalid" != "" ]; then
  echo Invalid argument given: "$arginvalid"
  echo Valid choices: '-bycount', '-bylength', '-countmin <number>', '-lenmin <number>'
fi
echo Running with: Sorting = "$sort", min. multiplicity = "$cntmin", min. length = "$lenmin"

# generate the lookup table and the sorted list of all keys in it
echo Generating tables...
cat /usr/share/dict/words | fgrep -v "'" | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' | sort | tee lookup.txt | cut -f1 -d' ' >keys.txt

# only at least triplicate (or $cntmin if overridden) keys are kept, result sorted by multiplicity descending
# min. accepted key length is ~8 (or $lenmin if overridden) chars (international chars are counted twice or more!)
echo Analyzing keys...
i=0; multifilter=":["
while [ $i -lt $cntmin ]; do
  multifilter=${multifilter}$i
  i=$(($i + 1))
done
multifilter=${multifilter}"]"
lenfilter="^[^:]\{"${lenmin}",\}:"
( count=0; j=""; for i in `cat keys.txt`; do if [ "$j" == "$i" ]; then count=$(($count + 1)); else echo $j:$count; count=1; j="$i"; fi; done; echo $j:$count ) | grep -v $multifilter | grep $lenfilter >keys_len8.txt
if [ "$sort" == "-bylength" ]; then
  # I don't know how to tell awk to count length only until the :...
  awk '{ print length, $0 }' <keys_len8.txt | sort -nr >keys_len8_sorted.txt
else
  sort -k2 -t: -r <keys_len8.txt >keys_len8_sorted.txt
fi

# list the keys and the list of the matches
total=`grep -c : keys_len8_sorted.txt`
j=0
pct=0
echo >&2 "Building result: (one dot equals 1%)"
( for i in `cut <keys_len8_sorted.txt -f2 -d' ' | cut -f1 -d:`; do echo $i:; grep "^$i " lookup.txt; echo; j=$(($j + 1)); pctnow=$(($j * 100 / $total)); if [ $pctnow -ne $pct ]; then echo >&2 -n .; pct=$pctnow; fi; done ) >values_len8plus.txt
echo >&2

echo --- Complete result following ---
cat values_len8plus.txt
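The filtering this script describes (minimum key length, minimum multiplicity, sorting by length or by count, output grouped by key) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `report`, its parameter names, and the sample words are invented here, with defaults chosen to echo the script's `lenmin=8` and `cntmin=3`.

```python
from collections import defaultdict


def report(words, min_len=8, min_count=3, by="length"):
    """Group words by sorted-letter key, keep the groups that are long
    enough and large enough, and order them by key length or group size."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    kept = [(key, members) for key, members in groups.items()
            if len(key) >= min_len and len(members) >= min_count]
    if by == "length":
        kept.sort(key=lambda item: len(item[0]), reverse=True)
    elif by == "count":
        kept.sort(key=lambda item: len(item[1]), reverse=True)
    else:
        raise ValueError("by must be 'length' or 'count'")
    return kept


# Hypothetical sample input
words = ["listen", "silent", "enlist", "tinsel",
         "alerting", "altering", "integral", "tea", "eat"]
for key, members in report(words, min_len=6, by="count"):
    print(key + ":", " ".join(members))
```

Keeping each group together under its key is what makes inflected forms of one verb easy to spot and skip.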

words says:

2009-06-04 at 4:48 pm

I pronounce / as “slash”, /etc as “ettckk” (like you have something in your mouth), ~ as “squiggly” (or occasionally “the little squiggly”). fsck is “eff suck”.

hemflit says:

2009-06-08 at 3:23 pm

For someone who doesn’t speak Perl and isn’t motivated to learn, can anyone please explain in English (or in clear pseudocode) what the original shell command sausage was trying to accomplish?

(Captcha: “10:30 blanche”. 10:30?)

Skips says:

2009-06-10 at 5:18 am

Am I the only one that pronounces everything (except WWW, which I pronounce as World Wide Web) as the actual letters?

SQL – Ess Que El
HTTP – Aich Tee Tee Pee

and then I pronounce the ones that aren’t letters the way I imagine they sound.

/ – Fshhhp
\ – Shhhhpf
. – Poit
~ – ooOOooeeeEEoo

Scmb says:

2009-06-11 at 9:23 am

In relation to the pronunciation of symbols I’m sure this has already come up:
http://lists.ding.net/geeks/96/dec/msg00005.html

DeathAnchor says:

2009-06-19 at 1:19 pm

Simply written in a few lines of Perl code without insanity:

#!/usr/local/bin/perl
use strict;
use warnings;

my $store;
while(<>){
    chomp;
    push @{$store->{join '', sort split //}}, $_;
}

foreach (sort { length($a) <=> length($b) } keys %{$store}){
    print "@{$store->{$_}}\n" if @{$store->{$_}} > 1;
}

And you just do: script.pl /usr/share/dict/words | tail -1

You can have it remove the names by ignoring anything with a capital in the first loop.
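As a rough illustration of that last suggestion, here is a Python sketch of the same grouping with an optional filter that skips any word starting with a capital letter. The function name `anagram_groups`, the `skip_capitalized` flag, and the sample words are assumptions made for this example.

```python
from collections import defaultdict


def anagram_groups(words, skip_capitalized=True):
    """Group words by their sorted letters, optionally ignoring any word
    that starts with a capital (a rough way to drop proper names)."""
    groups = defaultdict(list)
    for word in words:
        if skip_capitalized and word[:1].isupper():
            continue
        groups["".join(sorted(word))].append(word)
    # Only keys shared by more than one word are anagram groups
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical sample input
sample = ["Ronald", "bats", "Roland", "stab"]
print(anagram_groups(sample))
print(anagram_groups(sample, skip_capitalized=False))
```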

DeathAnchor says:

2009-06-19 at 1:22 pm

Oooo damn you webforms suck with coding:

inside the while (){ should be the wakas (or angle brackets)

if this works:
while(<>){

Katie says:

2009-07-06 at 6:00 pm

I think of ‘etc’ as “etts” for some reason. I guess my brain is lazy.
‘usr’ is “user.” I don’t really pronounce ‘/’ at all, I just say “user bin python” or “etts X11 X org dot conf.” Similarly, ‘~/.mozilla’ is just “dot mozilla” and ‘/home/katie’ is “home katie.”

Even more confusingly, ‘$HOME’ is just “home.” It’s a good thing I don’t often talk to people about Linux in person.

As for BASH scripts, I pretty much give up and move it to a Python script if it’s longer than one line or involves more than three pipes.

Fun times:
fortune | sed 's/you/& and Gary Busey/i' | cowsay -n

P.S.: reCAPTCHA: “cuddling York” – I’m not sure why this amuses me, but it does.

Katie says:

2009-07-06 at 6:02 pm

Related:
cowsay -n "STACK OVERFLOW" | cowsay -n | cowsay -n | cowsay -n


Comments are closed.
