A Problem

I think I have a Bash problem.  What follows is an actual command from my history.

cat /usr/share/dict/words | fgrep -v "'" | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' | tee lookup.txt | perl -pe 's/^([^ ]+) .*/\1/g' | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | uniq -c | sort -nr | egrep "^[^0-9]+2 " | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | perl -pe 's/[ 0-9]//g' | xargs -i grep {} lookup.txt | perl -pe 's/[^ ]+ //g' | tail -n2

It’s just so hard to bite the bullet, admit that the problem has grown in scope, and move it to its own Perl/Python script.  (P.S. The Guinness Book is wrong.  “Conservationalists” is not a real word.)
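
For anyone who would rather not trace the pipeline by hand, here is a minimal Python sketch of what it appears to compute: group words by their sorted letters, keep the letter signatures shared by exactly two words, and report the longest such pair. The function name `longest_anagram_pair` is made up for illustration, and this is one reading of the command above rather than a verified equivalent.

```python
from collections import defaultdict


def longest_anagram_pair(words):
    """Return the longest two words that are anagrams of each other and of no third word."""
    groups = defaultdict(list)
    for word in words:
        if "'" in word:
            # Mirrors the fgrep -v "'" step: skip possessives and contractions.
            continue
        # Words with the same sorted letters are anagrams of one another.
        groups["".join(sorted(word))].append(word)

    # Mirrors the egrep step: keep signatures that occur exactly twice.
    pairs = [group for group in groups.values() if len(group) == 2]
    if not pairs:
        return None
    # Mirrors the final sort and tail -n2: the longest pair wins.
    return tuple(max(pairs, key=lambda group: len(group[0])))
```

Calling it with the stripped lines of /usr/share/dict/words should give the same kind of answer as the `tail -n2` at the end of the pipeline, though the actual pair depends on the word list installed.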

Edit: to those who are competing in the comments to improve (shorten) the above command: when pasting code, use the <code> tag to override WordPress quote formatting.

Joey Comeau has a new book out based on Overqualified, which has long been one of my favorite things on the internet.  He writes cover letters to companies.  They each sound businesslike enough for the first paragraph or so, and then you gradually realize you are reading something that is in no way a normal cover letter.  An excerpt from one to Nintendo:

We need a new Mario game, where you rescue the princess in the first ten minutes, and for the rest of the game you try and push down that sick feeling in your stomach that she’s “damaged goods”, a concept detailed again and again in the profoundly sex negative instruction booklet, and when Luigi makes a crack about her and Bowser, you break his nose and immediately regret it. When Peach asks you, in the quiet of her mushroom castle bedroom “do you still love me?” you pretend to be asleep. You press the A button rhythmically, to control your breath, keep it even.

#2 (NeoPost), #28 (Phone surveys) and #58 (MySpace) are three of my favorites.

137 thoughts on “A Problem”

  1. tinctorius@grass:~$ ## Insert weird xkcd command here.
    grep: invalid option -- t
    Usage: grep [OPTION]... PATTERN [FILE]...
    Try `grep --help' for more information.
    grep: invalid option -- t
    Usage: grep [OPTION]... PATTERN [FILE]...
    Try `grep --help' for more information.
    grep: unknown directories method
    grep: OWeeimpr: invalid context length argument

    grep: conflicting matchers specified
    grep: OWegiiknrtu: invalid context length argument

    grep: unknown directories method
    grep: invalid option -- W
    Usage: grep [OPTION]... PATTERN [FILE]...
    Try `grep --help' for more information.
    grep: unknown directories method
    grep: unknown directories method
    grep: unknown directories method
    grep: unknown directories method
    grep: Naaaddeeiilllnnnrsst: invalid context length argument

    grep: unknown directories method
    grep: unrecognized option `--eeeeegiiiklnnnnrrsstvw'
    Usage: grep [OPTION]... PATTERN [FILE]...
    Try `grep --help' for more information.
    belastingverplichtingen
    betalingsverplichtingen

    I do not know what just happened. Too many dashes in my word list (Dutch), perhaps?

  2. Nice! BTW what is this script supposed to do? Spewing?
    Randall: you should save such long things in a shell script, you know. I wouldn’t risk losing this if something happens to ~/.bash_history .
    Also, why not write the whole thing in Perl or python? I would probably write it in PHP, since it’s the only language I know well enough 🙂
    reCAPTCHA: delicacy pervert 😀 nice

  3. Hi Randall, didn’t know how to reach u so I’ll drop a comment here. you GOTTA check out the Scarpar. I left a link in the “Website” field. Hope you agree 😉
    Sorry, I couldn’t help but think of you when I saw that vid.

  4. The biggest performance gain for the smallest edit that I saw was to move the tail up to right after the last sort. i.e.


    cat /usr/share/dict/words | fgrep -v "'" | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' | tee lookup.txt | perl -pe 's/^([^ ]+) .*/\1/g' | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}' | uniq -c | sort -nr | egrep "^[^0-9]+2 " | awk '{ print length, $0 }' | sort -n | tail -n1 | awk '{$1=""; print $0}' | perl -pe 's/[ 0-9]//g' | xargs -i grep {} lookup.txt | perl -pe 's/[^ ]+ //g'

    This prevents grepping all the patterns when all you really care about is the longest one.

    PS cygwin is slow.

  5. It just made me remember… I wrote a python script for downloading all of the xkcd comics into the hard drive:

    import urllib  # Python 2 urllib API

    def downloadImage(url):
        # Save the image in the current directory under its own file name
        sock = urllib.urlopen(url)
        img = open(url.split("/")[-1], "wb")
        img.write(sock.read())
        sock.close()
        img.close()

    def findImage(url):
        # Fetch the comic page and return the first image URL found in it
        sock = urllib.urlopen(url)
        html = sock.read()
        sock.close()
        i = html.find("http://imgs.xkcd.com/comics")
        image = ""
        while html[i] != '"':
            image += html[i]
            i += 1
        return image

    for i in range(1, 576):  # change this number for the actual one
        if i != 404:
            downloadImage(findImage('http://xkcd.com/%i' % i))

    Python is so cool, heh.
    BTW, it may look like it’s stuck, but if you look at the directory you ran it from you can see the images downloading.

  6. @Aaron A: Yeah, some do, in the sense that they have a lot of extra fields in which you can put pretty much whatever you want. I think the comic is just referring to really specific genres though.

  7. spirov92 : “I wouldn’t risk losing this if something happens to ~/.bash_history .”
    Am i the only person who reads that with a menacing tone?
    I can just picture the mafia bots in futurama: “hey, you best not be messing around with python, cuz that looks like it took an awful lot of work, I wouldn’t risk losing this if something happens to ~/.bash_history”

    actually typing that out i just realized i pronounce ~ “home”, i think there’s something wrong with me ?

  8. I pronounce ~ as home, / as root, we all should.

    /etc as ‘et cetera’, fstab as footstab. fsck is harder.

    ls is list.

  9. Updating xkcd every day?
    I’m so thrilled. It’s great.

    Thanks! You’re the best.

  10. @Matt Hickford:

    lol. Reminds me of how I pronounce HTML: hitmull, XML: ex-ihm-ull, SHTML: well. you can guess, SQL: Skewl (even though I’m well aware it’s sequel), as well as many other things like this.

  11. /etc is quite clearly pronounced “et-see”. And ~ is “squiggle”. I feel so adamant about this that I’m willing to start a quasi-religious propaganda war. Any takers?

    …and I will do it from emacs.

  12. Is this safe to run or do I need to worry about overwriting the kernel with a list of words from the dictionary?

    hahaha

  13. (please imagine me leading a charge while shouting nano, as my post will have much more effect that way)

    How can you say ~ is “squiggle” (at least it should be tilde)? That’s like saying that … is “three dots” and 🙂 is “colon bracket”. While I won’t argue over /etc (ekd) or /usr (user) and honestly don’t care, I will see you on the battlefield regarding ~. Oh, and I compose all my posts in nano!!!!

  14. Hi all. I didn’t quite shorten the code, but I did make it faster. 🙂


    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAXWORDLEN 30
    #define MAXWORDS 1000000

    struct line {
    char word[MAXWORDLEN];
    char sorted[MAXWORDLEN];
    int length;
    };

    struct pair {
    char first[MAXWORDLEN];
    char second[MAXWORDLEN];
    int length;
    };

    int comp_chars(const void *a, const void *b);
    int comp_sorted(const void *a, const void *b);
    int comp_len(const void *a, const void *b);

    int main(int argc, char *argv[])
    {
    /* Check correct no of arguments */
    if (argc != 2 && argc != 3) {
    printf("Usage: uniq_anag words_list number_of_pairs\n");
    printf("If no number given, default is one.\n");
    return -1;
    }

    /* Declarations */
    FILE *wordlist;
    struct line *lines = calloc(MAXWORDS, sizeof(struct line));
    int i, j;
    int c; /* int rather than char, so the comparison with EOF is reliable */
    int no_of_words;
    int no_to_return = (argc == 2 ? 1 : atoi(argv[2]));
    struct pair *pairs = calloc(MAXWORDS, sizeof(struct pair));

    /* Open the file */
    wordlist = fopen(argv[1], "r");
    /* Get the words */
    i = j = 0;
    while ((c = getc(wordlist)) != EOF) {
    if (c != '\n') {
    lines[i].word[j] = c;
    lines[i].sorted[j++] = c;
    }
    else {
    j = 0;
    ++i;
    }
    }
    no_of_words = i;

    /* Order the letters in the second copy */
    for (i = 0; i < no_of_words; ++i) {
    lines[i].length = strlen(lines[i].word);
    qsort(lines[i].sorted, lines[i].length, sizeof(char), comp_chars);
    }

    /* Sort the lines by the sorted words */
    qsort(lines, no_of_words, sizeof(struct line), comp_sorted);

    /* Look for pairs */
    char to_comp[2][MAXWORDLEN];
    strcpy(to_comp[0], lines[0].sorted);
    int matches = 0;
    j = 0;
    for (i = 1; i < no_of_words; ++i) {
    strcpy(to_comp[i%2], lines[i].sorted);
    if (strcmp(to_comp[0],to_comp[1]) == 0)
    matches++;
    else {
    if (matches == 1) {
    strcpy(pairs[j].first, lines[i-2].word);
    strcpy(pairs[j].second, lines[i-1].word);
    pairs[j++].length = lines[i-1].length;
    }
    matches = 0;
    }
    }
    int no_of_pairs = j;

    /* Sort the pairs we've found */
    qsort(pairs, no_of_pairs, sizeof(struct pair), comp_len);

    /* Print results */
    for (i = 0; i < no_to_return; i++) {
    if (pairs[i].length == 0)
    break;
    printf("%s %s\n", pairs[i].first, pairs[i].second);
    }

    /* Clean up */
    free((void *)lines);
    fclose(wordlist);
    return 0;
    }

    int comp_chars(const void *a, const void *b)
    {
    if (*(char*)a < *(char*)b)
    return -1;
    else if (*(char*)a == *(char*)b)
    return 0;
    else
    return 1;
    }

    int comp_sorted(const void *a, const void *b)
    {
    char sorted[2][MAXWORDLEN];
    strcpy(sorted[0],(*(struct line *)a).sorted);
    strcpy(sorted[1],(*(struct line *)b).sorted);
    return strcmp(sorted[0],sorted[1]);
    }

    int comp_len(const void *a, const void *b)
    {
    int lengths[2];
    lengths[0] = (*(struct pair *)a).length;
    lengths[1] = (*(struct pair *)b).length;
    if (lengths[0] < lengths[1])
    return 1;
    else if (lengths[0] == lengths[1])
    return 0;
    else
    return -1;
    }
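
    The sort-then-scan idea in the C program above (sort each word's letters, sort the words by that key, then look at neighbours) can be sketched in a few lines of Python. This is only an illustration of the approach, not a port of the program; the function name `anagram_pairs_by_sorting` is made up here.

```python
from itertools import groupby


def anagram_pairs_by_sorting(words):
    """Find groups of exactly two anagrams by sorting on the letter signature."""
    # Pair every word with its sorted letters, then sort so equal keys are adjacent.
    keyed = sorted(("".join(sorted(word)), word) for word in words)

    pairs = []
    for _, run in groupby(keyed, key=lambda item: item[0]):
        members = [word for _, word in run]
        if len(members) == 2:  # exactly two words share this signature
            pairs.append(tuple(members))

    # Longest pairs first, as the comp_len comparator does in the C version.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return pairs
```

    Sorting first means the grouping needs no hash table, which is the same trade the C version makes with qsort.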

  15. maybe you should replace the fancy apostrophes like: ‘{ print length, $0 }’ (and elsewhere) with simple ‘ ‘ ones… this would allow one to copy/paste into a shell without error 🙂

  16. just thought you’d be interested to know that my AVG 8.5 Free blocks the first page of comments as having a virus called BAT/Deleter.

    I know there is nothing malicious (or is there <_<), just thought it was funny.

  17. Have you considered a “Random”-functionality for the blag similar to the one in the comic-section… there’s several years worth of text, so giving the uninformed reader the option to perceive it in a more fatalistic way would be… awesome.

    Respect from Germany & its liberal arts majors!

  18. Useless use of cat; tsk, tsk.

    fgrep -v "'" </usr/share/dict/words | …

    Probably not the greatest of your worries, but hey, it’s a start.

    -Josh

  19. If I were trying to use minimal lines I could make this much less clear

    def sortuple(word) :
        letters = [l for l in word.strip()]
        letters.sort()
        return tuple(letters)

    bytuple={}
    pairs=[]

    for w in open('/usr/share/dict/words') :
        w = w.strip("\n")
        tup = sortuple(w)
        if not "'" in tup :
            if tup in bytuple :
                pairs.append( (len(w),bytuple[tup],w) )
            else :
                bytuple[tup] = w

    pairs.sort()
    for tup in pairs[-1:-100:-1] :
        print("%2d %s %s" % tup)

    Which gives

    % time python /Users/palmer/sameletters.py
    22 hydropneumopericardium pneumohydropericardium
    22 cholecystoduodenostomy duodenocholecystostomy
    21 glossolabiopharyngeal labioglossopharyngeal
    21 duodenopancreatectomy pancreatoduodenectomy
    ...
    19 incontrovertibility introconvertibility
    ...
    17 misrepresentation representationism
    ...
    real 0m2.766s
    user 0m2.695s
    sys 0m0.063s

  20. And of course, the whitespace is lost even with code tags, so you can’t just cut and paste.

    I am using OS X so it is looking at almost a quarter million words with average length > 9.5 letters.

  21. I like this idea! Hopefully wordpress won’t ruin the code

    perl -ne 'END{for(values%x){next if(@_=keys%{$_})<2;$y{join(" ",@_)}=length$_[0]."\n"}print join"\n",sort{$y{$b}<=>$y{$a}}keys%y}chomp;$x{join"",sort split//,lc}{$_}++' /usr/share/dict/words|head

  22. I’m no Perl expert, but I see HUGE redundancy in the first perl call. $_ is the placeholder for the default argument, so you can skip it almost everywhere. Concatenating literals and variables is not necessary, and you can also eliminate the intermediate variable. Here’s much cleaner code (only the beginning):

    fgrep -v "'" /usr/share/dict/words | perl -ne 'chomp; print join("", sort(split(//)))." $_\n";'

    There’s an even shorter option if you take advantage of the -p option and notice that the parentheses are optional at split//:

    fgrep -v "'" /usr/share/dict/words | perl -pne 'chomp;$_=join("",sort(split//))." $_\n"'

    I’m sure the same can be done for rest of the code.

  23. Even more, your usage of

    perl -pe 's/^([^ ]+) .*/\1/g'

    for extracting first column is overkill (the same goes for extraction of last column later in code).

    There’s nice utility which does just that.


    cut -d' ' -f1

    p.s. sorry for not using “code” before. Here’s repost:


    fgrep -v "'" /usr/share/dict/words | perl -pne 'chomp;$_=join("",sort(split//))." $_\n"'

  24. The day I read this post about Overqualified I bought it. I bought it because you said it was funny, and it seemed pretty neat to me.. but the book is so much bigger than just funny cover letters.
    As you read all the cover letters, it slowly reveals a story about the author’s brother’s death in a car accident and it’s really uniquely written.

    I love it and I’m sharing it with everyone I know. Thanks!

  25. I know it really, really doesn’t matter, but I can’t resist. You don’t just have a bash problem; you have a perl and awk problem too.

    I know that each of these commands has a zillion options and it’s pointless to memorize each one. But some are worth your while: perl -l, for example. chomp and “\n” are both annoying nuisances having nothing to do with the problem at hand.

    perl -a is the other. perl -pe 's/^([^ ]+) .*/\1/' makes me cringe. You’re friendly with awk; why didn’t you use that? Anyway, I think you meant to say perl -ane 'print $F[0]'.

    Your awk problem is that you use it. Clean out that corner of your brain, stick with Perl, and maybe you’ll have space to remember -a and -l.

    That’s not to mention your regexp problem. What’s all this [^ ]+ nonsense? Seems like you probably meant \w+ anyway.

    I confess I didn’t actually spend the time to figure out what all of that is trying to accomplish. Something like perl -lne '$s=join("",sort split(//)); push @{ $w{$s} }, $_; END { foreach (sort { length($w{$b}[0]) <=> length($w{$a}[0]) } keys %w) { next unless @{ $w{$_ } } > 1; print length($w{$_}[0]), " ", join(" ", @{ $w{$_} }) } }' /usr/share/dict/words | head I guess? (Or what that was before the blag ate it, I mean.)

    Argh! I promised myself not to do that. Evil geekteasing slut! You knew we couldn’t help ourselves…

    But yes, somewhere around the 5th or 6th stage in the pipeline, that should have been rewritten as a single script. I’ve committed similar sins, but I think you’ve beat my worst by a factor of 3 or 4. Probably something to do with my attention span.

  26. @sfink: I believe that this was put together in a hurry (like every oneliner is). So one automatically uses a tool he’s most comfortable with. But I do suggest usage of cut/paste/join/sed/grep if possible. Mostly because oneliners are more readable and portable if you can see what it’s doing. Perl is only for the steps that require serious string work.

    I posted my suggested changes 2 posts above.

    awk is IMO fairly good tool, but pretty useless because when there’s need for it, the problem is already out of hand and should be moved to perl/python/…

  27. That’s not really a Bash script, it’s a bunch of Awk and Perl strung together with minimal amount of shell commands. If you like Perl so much, why not just write a Perl script rather than this mess of Perl one-liners strung together?

    You can write perfectly readable, clean code in Bash, but it takes a bit of discipline, and skill.

  28. Hey Randall! I want to use one of your comics on my blog…well…uh…I kind of already did. Is the citation the way you want it? I changed the rollover text, problem? Hit me back.

    wastelandamerica.

  29. There’s nothing wrong with using Bash to string together this much manipulation, as long as:
    1) Each portion of the pipe is an independent logical functional unit (which I am led to believe might not be the case by your “tee”, but it’s still possible).
    2) The command line grew organically, as you tested each portion of the manipulation along the way and thought of the correct next step.
    3) Speed of execution is not an issue.

    For example, here’s a quick script I worked up the other day to reset a password, check accounts and group memberships, and verify sudoers settings:

    FAILSTRING='\033[1;31mFAILURE\033[0;0m'; for user in user1 user2 user3 user4 anotheruser; do (grep $user /etc/passwd && groups $user | grep staff >> /dev/null) || /bin/echo -e $user: $FAILSTRING; done; grep -E '^%staff' /etc/sudoers || /bin/echo -e "Sudoers: $FAILSTRING"; if [ -z $pwreset ]; then passwd anotheruser; fi; export pwreset='n'

    It works like a charm, and for what I needed it for, it was ideal. For example, by storing state in the shell, I could re-run it after I fixed any issues it identified and it wouldn’t repeat any destructive action.

  30. This is not at all shorter by any means, I admit. But my angle was different:
    You only return the (two) longest matches. For languages with a more complex grammar (French, German), those are commonly of no use, because they are just two different inflections of the very same verb (different tenses etc.).

    So, after I figured out what you were doing (given that I know almost nothing about perl, that was simply “first command, check output, add next, check output, …”), I dropped all but the first perl command (which I wouldn’t know how to do in pure shell…) and built up the rest in the command structures I am familiar with. 🙂
    And when I was there, I added args to play around… min. length, min. multiplicity, sort by length or by multiplicity.
    The result is then grouped by the key (making it better readable as the groups are more obvious).

    It’s also a bit faster (~9s vs. ~35s for the original), but uses more files…
    (Yeah, and it misses the whole “do in one line” thing as it is written now, but I believe with fixed args it *could* be crammed into one.)


    #!/bin/bash
    # sort behaves not as expected:
    # it treats international chars as separators? if LC_ALL is not set to C
    export LC_ALL=C

    # now that this is a script, might as well give options...
    arginvalid=""
    sort="-bylength"
    cntmin=3
    lenmin=8
    while [ $# -gt 0 ]; do
    if [ "$1" == "-bycount" -o "$1" == "-bylength" ]; then
    sort=$1
    else
    if [ "$1" == "-countmin" ]; then
    if [ "$2" -gt 1 -a "$2" -lt 10 ]; then
    cntmin=$2
    else
    echo Invalid argument to '-countmin': Must be followed by a number between 1 and 10.
    fi
    # taking 2 args away
    shift
    else
    if [ "$1" == "-lenmin" ]; then
    if [ "$2" -gt 1 ]; then
    lenmin=$2
    else
    echo Invalid argument to '-lenmin': Must be followed by a number over 1.
    fi
    # taking 2 args away
    shift
    else
    arginvalid=$1
    fi
    fi
    fi

    shift
    done

    if [ "$arginvalid" != "" ]; then
    echo Invalid argument given: "$arginvalid"
    echo Valid choices: '-bycount', '-bylength', '-countmin ', '-lenmin '
    fi

    echo Running with: Sorting = "$sort", min. multiplicity = "$cntmin", min. length = "$lenmin"

    # generate the lookup table and the sorted list of all keys in it
    echo Generating tables...
    cat /usr/share/dict/words \
    | fgrep -v "'" \
    | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' \
    | sort \
    | tee lookup.txt \
    | cut -f1 -d' ' >keys.txt

    # only at least triplicate (or $cntmin if overridden) keys are kept, result sorted by multiplicity descending
    # min. accepted key length is ~8 (or $lenmin if overridden) chars (international chars are counted twice or more!)
    echo Analyzing keys...
    i=0;
    multifilter=":["
    while [ $i -lt $cntmin ]; do
    multifilter=${multifilter}$i
    i=$(($i + 1))
    done
    multifilter=${multifilter}"]"
    lenfilter="^[^:]\{"${lenmin}",\}:"

    (
    count=0;
    j="";
    for i in `cat keys.txt`; do
    if [ "$j" == "$i" ]; then
    count=$(($count + 1));
    else
    echo $j:$count;
    count=1;
    j="$i";
    fi;
    done;
    echo $j:$count
    ) | grep -v $multifilter | grep $lenfilter >keys_len8.txt

    if [ "$sort" == "-bylength" ]; then
    # I don't know how to tell awk to count length only until the :...
    awk '{ print length, $0 }' keys_len8_sorted.txt
    else
    sort -k2 -t: -r keys_len8_sorted.txt
    fi

    # list the keys and the list of the matches
    total=`grep -c : keys_len8_sorted.txt`
    j=0
    pct=0
    echo >&2 "Building result: (one dot equals 1%)"
    (
    for i in `cut &2 -n .;
    pct=$pctnow;
    fi;
    done
    ) >values_len8plus.txt
    echo >&2

    echo --- Complete result following ---
    cat values_len8plus.txt

  31. That *almost* copied right. D’oh. Mkay, replaced all redirections by HTML entities… If I knew how, I’d delete the previous post. 😀

    This is not at all shorter by any means, I admit. But my angle was different:
    You only return the (two) longest matches. For languages with a more complex grammar (French, German), those are commonly of no use, because they are just two different inflections of the very same verb (different tenses etc.).

    So, after I figured out what you were doing (given that I know almost nothing about perl, that was simply “first command, check output, add next, check output, …”), I dropped all but the first perl command (which I wouldn’t know how to do in pure shell…) and built up the rest in the command structures I am familiar with. 🙂
    And when I was there, I added args to play around… min. length, min. multiplicity, sort by length or by multiplicity.
    The result is then grouped by the key (making it better readable as the groups are more obvious).

    It’s also a bit faster (~9s vs. ~35s for the original), but uses more files…
    (Yeah, and it misses the whole “do in one line” thing as it is written now, but it *could* be crammed into one, I believe.)


    #!/bin/bash
    # sort behaves not as expected:
    # it treats international chars as separators? if LC_ALL is not set to C
    export LC_ALL=C

    # now that this is a script, might as well give options...
    arginvalid=""
    sort="-bylength"
    cntmin=3
    lenmin=8
    while [ $# -gt 0 ]; do
    if [ "$1" == "-bycount" -o "$1" == "-bylength" ]; then
    sort=$1
    else
    if [ "$1" == "-countmin" ]; then
    if [ "$2" -gt 1 -a "$2" -lt 10 ]; then
    cntmin=$2
    else
    echo Invalid argument to '-countmin': Must be followed by a number between 1 and 10.
    fi
    # taking 2 args away
    shift
    else
    if [ "$1" == "-lenmin" ]; then
    if [ "$2" -gt 1 ]; then
    lenmin=$2
    else
    echo Invalid argument to '-lenmin': Must be followed by a number over 1.
    fi
    # taking 2 args away
    shift
    else
    arginvalid=$1
    fi
    fi
    fi

    shift
    done

    if [ "$arginvalid" != "" ]; then
    echo Invalid argument given: "$arginvalid"
    echo Valid choices: '-bycount', '-bylength', '-countmin <number>', '-lenmin <number>'
    fi

    echo Running with: Sorting = "$sort", min. multiplicity = "$cntmin", min. length = "$lenmin"

    # generate the lookup table and the sorted list of all keys in it
    echo Generating tables...
    cat /usr/share/dict/words \
    | fgrep -v "'" \
    | perl -ne 'chomp($_); @b=split(//,$_); print join("", sort(@b))." ".$_."\n";' \
    | sort \
    | tee lookup.txt \
    | cut -f1 -d' ' >keys.txt

    # only at least triplicate (or $cntmin if overridden) keys are kept, result sorted by multiplicity descending
    # min. accepted key length is ~8 (or $lenmin if overridden) chars (international chars are counted twice or more!)
    echo Analyzing keys...
    i=0;
    multifilter=":["
    while [ $i -lt $cntmin ]; do
    multifilter=${multifilter}$i
    i=$(($i + 1))
    done
    multifilter=${multifilter}"]"
    lenfilter="^[^:]\{"${lenmin}",\}:"

    (
    count=0;
    j="";
    for i in `cat keys.txt`; do
    if [ "$j" == "$i" ]; then
    count=$(($count + 1));
    else
    echo $j:$count;
    count=1;
    j="$i";
    fi;
    done;
    echo $j:$count
    ) | grep -v $multifilter | grep $lenfilter >keys_len8.txt

    if [ "$sort" == "-bylength" ]; then
    # I don't know how to tell awk to count length only until the :...
    awk '{ print length, $0 }' <keys_len8.txt | sort -nr >keys_len8_sorted.txt
    else
    sort -k2 -t: -r <keys_len8.txt >keys_len8_sorted.txt
    fi

    # list the keys and the list of the matches
    total=`grep -c : keys_len8_sorted.txt`
    j=0
    pct=0
    echo >&2 "Building result: (one dot equals 1%)"
    (
    for i in `cut <keys_len8_sorted.txt -f2 -d' ' | cut -f1 -d:`; do
    echo $i:;
    grep "^$i " lookup.txt;
    echo;
    j=$(($j + 1));
    pctnow=$(($j * 100 / $total));
    if [ $pctnow -ne $pct ]; then
    echo >&2 -n .;
    pct=$pctnow;
    fi;
    done
    ) >values_len8plus.txt
    echo >&2

    echo --- Complete result following ---
    cat values_len8plus.txt
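
    For comparison, the three knobs the script exposes (minimum key length, minimum multiplicity, sort by length or by count) might look like this in Python. This is a rough sketch, not a translation: the function name and parameter names are made up here, the defaults are copied from `lenmin=8` and `cntmin=3`, and lengths are counted in characters rather than bytes.

```python
from collections import defaultdict


def anagram_groups(words, min_len=8, min_count=3, by="length"):
    """Group words by sorted letters and filter the groups by key length and size."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)

    # Keep keys that are long enough and shared by enough words.
    kept = {
        key: members
        for key, members in groups.items()
        if len(key) >= min_len and len(members) >= min_count
    }

    if by == "count":
        def order(key):
            return len(kept[key])  # most anagrams first
    else:
        order = len  # longest key first

    return [(key, kept[key]) for key in sorted(kept, key=order, reverse=True)]
```

    Each returned entry is a key followed by its list of matches, which is the same grouping the script writes to values_len8plus.txt.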

  32. I pronounce / as “slash”, /etc as “ettckk” (like you have something in your mouth), ~ as “squiggly” (or occasionally “the little squiggly”). fsck is “eff suck”.

  33. For someone who doesn’t speak Perl and isn’t motivated to learn, can anyone please explain in English (or in clear pseudocode) what the original shell command sausage was trying to accomplish?

    (Captcha: “10:30 blanche”. 10:30?)

  34. Am I the only one that pronounces everything (except WWW, which I pronounce as World Wide Web) as the actual letters?

    SQL – Ess Que El
    HTTP – Aich Tee Tee Pee

    and then I pronounce the ones that arent letters the way I imagine they sound.

    / – Fshhhp
    \ – Shhhhpf
    . – Poit
    ~ – ooOOooeeeEEoo

  35. Simply written in a few lines of Perl code without insanity:


    #!/usr/local/bin/perl

    use strict;
    use warnings;

    my $store;
    while(){
    chomp;
    push @{$store->{join '', sort split //}}, $_;
    }

    foreach (sort {length($a) <=> length($b) } keys %{$store}){
    print "@{$store->{$_}}\n" if @{$store->{$_}} > 1;
    }

    And you just do: script.pl /usr/share/dict/words | tail -1

    You can have it remove the names by ignoring anything with a capital in the first loop.
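
    The capital-letter filter mentioned above could be sketched like this in Python. It assumes that "a capital" means any uppercase character anywhere in the word, and the function name is made up for illustration.

```python
from collections import defaultdict


def anagram_groups_lowercase_only(words):
    """Group anagrams, skipping any word that contains an uppercase letter."""
    store = defaultdict(list)
    for word in words:
        if any(ch.isupper() for ch in word):
            continue  # treat capitalised entries as names and ignore them
        store["".join(sorted(word))].append(word)
    # Only signatures shared by more than one word are anagram groups.
    return [group for group in store.values() if len(group) > 1]
```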

  36. Oooo damn you webforms suck with coding:

    inside the while (){ should be the wakas (or angle brackets)

    if this works:
    while(<>){

  37. I think of ‘etc’ as “etts” for some reason. I guess my brain is lazy.
    ‘usr’ is “user.” I don’t really pronounce ‘/’ at all, I just say “user bin python” or “etts X11 X org dot conf.” Similarly, ‘~/.mozilla’ is just “dot mozilla” and ‘/home/katie’ is “home katie.”

    Even more confusingly, ‘$HOME’ is just “home.” It’s a good thing I don’t often talk to people about Linux in person.

    As for BASH scripts, I pretty much give up and move it to a Python script if it’s longer than one line or involves more than three pipes.

    Fun times:

    fortune | sed '/you/& and Gary Busey/is' | cowsay -n

    P.S.: reCAPTCHA: “cuddling York” – I’m not sure why this amuses me, but it does.

Comments are closed.