Distraction Affliction Correction Extension

Lots of people have asked me for the system I used to implement the restriction in the alt-text of today’s comic.

At various times, I thought of doing it with an X modification, a Firefox extension, a Chrome add-on, an irssi script, etc., but none of them worked too well (or they involved a lot of sustained undistracted effort, which was sort of a Catch-22).  Then I hit on a much simpler solution:

I made it a rule that as soon as I finished any task, or got bored with it, I had to power off my computer.

I could turn it back on right away; this wasn’t about trying to use the computer less. The rule was just that the moment I finished (or lost interest in) the thing I was doing, and felt like checking Google News et al., before I had time to think too much, I’d start the shutdown process.  There was no struggle of willpower; I knew that after I hit the button, I could decide to do anything I wanted. But if I decided to look at a website, I’d have to wait through the startup, and once I was done, I’d have to turn it off again before doing anything else. (This works best if your ongoing activities are persistent online. For example, all my IRC chat is through irssi running in screen, so turning off my laptop doesn’t make me sign out.)

Other ‘honor system’ approaches have never worked for me.  Blocking the sites (or keeping the computer off) didn’t work: I could always find a way to argue with myself. I’d decide this day needed to be an exception for some reason, think of a project that required the computer, or just grow frustrated after a few hours and get really curious about something I’d seen on a website somewhere.

There’s some interesting research about novelty and dopamine, suggesting (tentatively) that for some people exposure to novelty may activate the same reward system that drug abuse does.  In my case, I felt like my problem was that whenever I was trying to focus on a (rewarding) project, these sites were always in the background offering a quicker and easier rush.  I’d sit down to write code, draw something, build something, or clean, and the moment I hit a little bump (math I wasn’t sure how to handle, a sentence I couldn’t word right, an electronic part I couldn’t find, or a sock without a mate), I’d find myself switching to one of these sites and refreshing.  Reward was briefly unavailable from the project, but constantly available from the internet.  Adding the time-delay removed the promise of instant novelty, and perhaps helped disconnect the action from the reward in my head.  Without that connection dominating my decisions, I could think more clearly about whether the task was really important to me.

Beyond that one rule, I put no other restrictions on myself.  Want to go read a 17-part Cracked article?  Fine!  Think you might have an important email?  Go check.  Feel like looking at Reddit for the 20th time today?  Go for it; you might find something interesting (hey, it’s where I found that dopamine article).  Want to play Manufactoria until your eyes bubble?  Absolutely.  The only catch is that you have to stare at a startup screen for 30-60 seconds first. (If you have one of those instant-boot laptops, you’re out of luck.)

It was remarkable how quickly the urges to constantly check those sites vanished. Also remarkable was that for the first time in years, I was keeping my room clean. Since the computer was no longer an instant novelty dispenser, when I got antsy or bored I’d look around my room for a distraction, and wind up picking up a random object and putting it away.

I’ve since relaxed this restriction; the family health situation I mentioned a while back has meant that I’ve had less free time lately, and when I do, mindless distractions have been welcome (thank you again to everyone who sent in games!). But just following this system for a short time was enough to break most of my distracting website habits completely, and when things return to normal around here I’ll probably start using it again.

There’s still a place for a browser extension, though.  A lot of people’s jobs require them to be on a computer running something all the time, or they can’t shut down for other reasons, so my quick turning-the-computer-off trick won’t work for them.  None of my abortive attempts are worth building on, but if someone’s looking for a quick project, building an extension like this might be a good one.  It could let you impose a delay like this on loading a new page, or a page outside the current domain, or refreshing a page you’re already on (and no, just running the browser under Vista on a Pentium-133 doesn’t count).  If anyone makes a good one, I’d be happy to share it here.  Just post a link in the comments!
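For anyone tempted by that project, here is a rough sketch of just the decision rule, written in Python to keep it short (a real extension would be JavaScript and would need the browser's own hooks, which this doesn't touch). The function name, the three mode names, and the 30-second figure are all made up for illustration; the only thing taken from the post is the list of three cases.

```python
from urllib.parse import urlsplit

# Hypothetical delay, loosely modeled on a 30-60 second startup screen.
DELAY_SECONDS = 30.0


def delay_for(current_url, target_url, mode="new_page"):
    """Return how many seconds to wait before loading target_url.

    mode is an invented setting naming which of the three rules is active:
      "new_page"   - delay any page other than the one already open
      "new_domain" - delay only pages outside the current domain
      "refresh"    - delay only reloads of the page already open
    """
    current = urlsplit(current_url)
    target = urlsplit(target_url)

    # "Domain" here means the exact hostname, so www.example.com and
    # example.com count as different. That is a simplifying assumption.
    same_domain = current.hostname == target.hostname
    same_page = same_domain and (current.path, current.query) == (
        target.path, target.query)

    if mode == "new_page":
        delayed = not same_page
    elif mode == "new_domain":
        delayed = not same_domain
    elif mode == "refresh":
        delayed = same_page
    else:
        raise ValueError(f"unknown mode: {mode}")
    return DELAY_SECONDS if delayed else 0.0
```

Keeping the rule as a pure function of two URLs means it can be tried out without a browser; the part that actually stalls the page load is the harder half and is left out here.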

Trochee Chart

Here’s something I made as I drew today’s comic.  It’s a chart of Google results for “X Y” (in quotes) where X and Y are words from the first panel of the strip.  The first word is on the top, the second down the side (the opposite of the intuitive way, of course).

"Doctor Doctor" and "Jesus Jesus" are highest. The highest non-repeating combo is "Pirate Captain", followed by "Robot Monkey" and "Penguin Zombie".

I generated this using a Google API variable search tool developed by Eviltwin on #xkcd. (I’m not linking to the tool, so as to avoid potentially getting his API key revoked.)  Edit: He now offers the source and says it can be run without a key, and is happy to let people use it until Google does something. Not only is the API helpful in making these kinds of charts (which I spend more time doing than I care to admit), it also gives a roughly accurate count of results, in contrast to the Google search page.
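The chart itself is simple to lay out once the counts are in hand. The following is a minimal sketch of that step only. It is not Eviltwin's tool and it calls no real search API: `count_hits` is a hypothetical function the caller supplies, and both function names are invented for this example.

```python
from itertools import product


def build_chart(words, count_hits):
    """Return {second_word: {first_word: hits}} for every ordered pair.

    count_hits is a caller-supplied (hypothetical) function that takes a
    query string and returns a result count.
    """
    chart = {second: {} for second in words}
    for first, second in product(words, repeat=2):
        query = f'"{first} {second}"'  # in quotes, so it is an exact phrase
        chart[second][first] = count_hits(query)
    return chart


def format_chart(words, chart):
    """Lay out the chart: first word across the top, second down the side."""
    cells = [str(chart[s][f]) for s in words for f in words]
    width = max(len(text) for text in list(words) + cells) + 2
    lines = [" " * width + "".join(w.rjust(width) for w in words)]
    for second in words:
        row = "".join(str(chart[second][first]).rjust(width) for first in words)
        lines.append(second.ljust(width) + row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in counter so the sketch runs offline; it just measures the query.
    fake_counter = len
    words = ["doctor", "pirate", "captain", "robot", "monkey"]
    print(format_chart(words, build_chart(words, fake_counter)))
```

Repeated pairs such as "doctor doctor" fall out naturally, because `product(words, repeat=2)` pairs every word with itself as well as with the others.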

The “number of results” count that Google gives when you search is clearly fabricated.  This is clear for a few reasons.  When Google says this:

Excellent!  That's a lot!

You can tell that it’s wrong first by scrolling to the end of the results.  When you get to page 32, it suddenly becomes:

I learned in AP Calculus that 316 is WAY less than 190,000.

This doesn’t usually matter, since nobody looks much past the first few pages of results, but it’s annoying if you’re trying to use the number of results as a measure of something.  When I was making the Numbers comic, I didn’t use the API, and there were a few graphs I had to throw out, crop, or put on an unnecessary log scale; otherwise, Google’s clumsy number-fudging made the graphs look nonsensical.  I can’t find a good example now (perhaps they’ve smoothed it out a bit) but when searching for things like “I was born in <X>”, the results for successive years would look something like this:

… 150 : 200 : 250 : 300 : 350 : 117,000 : 450 : 251,000 : 500 : 550 : 312,000 : 320,000 : 390,000 : 425,000 …

If you scrolled to the last page for each, you’d find that the smaller counts were roughly accurate, but the counts in the hundreds of thousands had no more actual results than their neighbors.
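One crude way to pick out the suspect numbers in a series like that, without paging to the end of every search, is to compare each count against the last one that looked believable. This is only a heuristic sketch: the function name and the factor of 10 are arbitrary choices for illustration, it assumes the first count is trustworthy and that real counts drift gradually from year to year, and the series is the made-up illustration from above rather than real data.

```python
def flag_inflated(counts, factor=10):
    """Split counts into (plausible, suspicious) lists.

    Walks left to right, remembering the last count that looked plausible.
    Any count more than `factor` times that value is treated as suspicious
    and does not update the remembered value.
    """
    plausible, suspicious = [], []
    last_good = None
    for count in counts:
        if last_good is not None and count > factor * last_good:
            suspicious.append(count)
        else:
            plausible.append(count)
            last_good = count
    return plausible, suspicious


# The illustrative series from the post (not real measurements).
series = [150, 200, 250, 300, 350, 117_000, 450, 251_000, 500, 550,
          312_000, 320_000, 390_000, 425_000]
plausible, suspicious = flag_inflated(series)
print(plausible)   # [150, 200, 250, 300, 350, 450, 500, 550]
print(suspicious)  # [117000, 251000, 312000, 320000, 390000, 425000]
```

Scrolling to the last page is still the real check; this just narrows down which searches are worth scrolling through.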

I suppose it’s remotely possible that these numbers are correct, there are no years with an in-between number of hits, and for some reason they’re just not showing you most of the promised pages when you try to flip through them.  But making this even less likely is the fact that the search API (which is apparently being deprecated and replaced right now) doesn’t return these bad numbers—it gives reasonable-looking results which seem to be roughly consistent with the number you come up with by navigating to the last search page.

So it really looks like there’s a certain threshold of result volume beyond which Google apparently says “screw it” and throws out a gigantic number.  I imagine this is probably due to incompetence rather than intentional deception; I’m sure it’s hard to generate pages quickly from many sources, and maybe for searches with a lot of results they don’t have time to get it all synced up.  So they fudge the numbers.  The fact that this makes it look like they have way more results than they do is presumably just an unintended bonus.

All in all, this isn’t a big deal and I don’t think there’s anything particularly evil about it. It does make it hard to use Google hits as an accurate gauge of anything, but I suppose if you’re trying to study something by seriously analyzing Google result counts, you have bigger methodological problems to worry about.

Edit: As Mankoff observes, it looks like the API sometimes *underestimates* the number of results, too.  For example, it still reports 0 results for “narwhal zombie”, when a regular search shows quite a few. Now, I notice, scrolling through them, that most either have some minor character/text in between the two words, or are related to the comic I just posted.  But at least one seems to date back to last year.