Trochee Chart

Randall 2011-02-04 144 Comments

Here’s something I made as I drew today’s comic. It’s a chart of Google results for “X Y” (in quotes) where X and Y are words from the first panel of the strip. The first word is on the top, the second down the side (the opposite of the intuitive way, of course).

"Doctor Doctor" and "Jesus Jesus" are highest. The highest non-repeating combo is "Pirate Captain", followed by "Robot Monkey" and "Penguin Zombie".

I generated this using a Google API variable search tool developed by Eviltwin on #xkcd. (I’m not linking to the tool so as to avoid potentially getting his API key revoked.) Edit: He now offers the source and says it can be run without a key, and is happy to let people use it until Google does something. Not only is the API helpful in making these kinds of charts (which I spend more time doing than I care to admit), it also gives a roughly accurate count of results, in contrast to the Google search page.
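For anyone who wants to build a similar chart, here is a minimal Python sketch of the bookkeeping. It is not Eviltwin's tool. It assumes a hypothetical `count_hits` callable that takes a quoted query string and returns a result count; where that number comes from (an API, a cached file) is left open. The names `build_pair_grid` and `top_non_repeating` are made up for this example.

```python
from typing import Callable, List


def build_pair_grid(words: List[str], count_hits: Callable[[str], int]) -> List[List[int]]:
    """Return a grid of hit counts for every ordered pair of words.

    Layout follows the chart: the first word runs across the top (columns),
    the second word runs down the side (rows).
    """
    grid = []
    for second in words:              # one row per second word
        row = []
        for first in words:           # one column per first word
            query = f'"{first} {second}"'   # quoted, so it is an exact-phrase search
            row.append(count_hits(query))
        grid.append(row)
    return grid


def top_non_repeating(words: List[str], grid: List[List[int]], n: int = 3) -> List[str]:
    """Return the n highest-count phrases, skipping "X X" repeats."""
    pairs = [
        (grid[r][c], f"{words[c]} {words[r]}")
        for r in range(len(words))
        for c in range(len(words))
        if r != c                     # the diagonal holds the repeated-word phrases
    ]
    pairs.sort(reverse=True)
    return [phrase for _, phrase in pairs[:n]]


if __name__ == "__main__":
    # Placeholder counter: query length stands in for a real hit count.
    demo_words = ["doctor", "pirate", "captain"]
    demo_grid = build_pair_grid(demo_words, len)
    print(top_non_repeating(demo_words, demo_grid))
```

Passing the counter in as a function keeps the grid logic separate from whatever search backend is available, which matters given how often those backends change.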

The “number of results” count that Google gives when you search is clearly fabricated. There are a few ways to see this. When Google says this:

Excellent! That's a lot!

You can tell that it’s wrong first by scrolling to the end of the results. When you get to page 32, it suddenly becomes:

I learned in AP Calculus that 316 is WAY less than 190,000.

This doesn’t usually matter, since nobody looks much past the first few pages of results, but it’s annoying if you’re trying to use the number of results as a measure of something. When I was making the Numbers comic, I didn’t use the API, and there were a few graphs I had to throw out, crop, or put on an unnecessary log scale; otherwise, Google’s clumsy number-fudging made the graphs look nonsensical. I can’t find a good example now (perhaps they’ve smoothed it out a bit) but when searching for things like “I was born in <X>”, the results for successive years would look something like this:

… 150 : 200 : 250 : 300 : 350 : 117,000 : 450 : 251,000 : 500 : 550 : 312,000 : 320,000 : 390,000 : 425,000 …

If you scrolled to the last page for each, you’d find that the smaller counts were roughly accurate, but the counts in the hundreds of thousands had no more actual results than their neighbors.
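One rough way to pick out the inflated counts in a series like this is to sort the values and look for the biggest multiplicative jump between neighbors. The sketch below does that with the illustrative numbers above (they are the "something like this" figures, not measured data). The function names are hypothetical, and the approach assumes there really is one dominant jump separating plausible counts from fudged ones.

```python
from typing import List, Tuple


def largest_gap(counts: List[int]) -> Tuple[int, int]:
    """Return (largest value below the gap, smallest value above it).

    The gap is the biggest ratio between adjacent values in sorted order.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts")
    ordered = sorted(counts)
    best = max(
        range(1, len(ordered)),
        key=lambda i: ordered[i] / max(ordered[i - 1], 1),  # avoid dividing by zero
    )
    return ordered[best - 1], ordered[best]


def suspicious_counts(counts: List[int]) -> List[int]:
    """Return the counts at or above the gap, in their original order."""
    _, first_big = largest_gap(counts)
    return [c for c in counts if c >= first_big]


# Illustrative series from the text, in year order.
SERIES = [150, 200, 250, 300, 350, 117_000, 450, 251_000,
          500, 550, 312_000, 320_000, 390_000, 425_000]

if __name__ == "__main__":
    low, high = largest_gap(SERIES)
    print(f"gap between {low} and {high}")
    print(suspicious_counts(SERIES))
```

Note that this always reports some gap, even for a smooth series, so the ratio still needs a sanity check by eye before calling anything fudged.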

I suppose it’s remotely possible that these numbers are correct, there are no years with an in-between number of hits, and for some reason they’re just not showing you most of the promised pages when you try to flip through them. But making this even less likely is the fact that the search API (which is apparently being deprecated and replaced right now) doesn’t return these bad numbers—it gives reasonable-looking results which seem to be roughly consistent with the number you come up with by navigating to the last search page.

So it really looks like there’s a certain threshold of result volume beyond which Google apparently says “screw it” and throws out a gigantic number. I imagine this is probably due to incompetence rather than intentional deception; I’m sure it’s hard to generate pages quickly from many sources, and maybe for searches with a lot of results they don’t have time to get it all synced up. So they fudge the numbers. The fact that this makes it look like they have way more results than they do is presumably just an unintended bonus.

All in all, this isn’t a big deal and I don’t think there’s anything particularly evil about it. It does make it hard to use Google hits as an accurate gauge of anything, but I suppose if you’re trying to study something by seriously analyzing Google result counts, you have bigger methodological problems to worry about.

Edit: As Mankoff observes, it looks like the API sometimes *underestimates* the number of results, too. For example, it still reports 0 results for “narwhal zombie”, when a regular search shows quite a few. Now, I notice, scrolling through them, that most either have some minor character/text in between the two words, or are related to the comic I just posted. But at least one seems to date back to last year.

144 replies on “Trochee Chart”

jaclaz says:

2011-06-05 at 11:03 am

JFYI, this particular aspect of Google results was analyzed some time ago, here is an FGA on the matter:
http://homepage.ntlworld.com./jonathan.deboynepollard/FGA/google-result-counts-are-a-meaningless-metric.html

jaclaz

Jonathan says:

2011-06-22 at 8:19 pm

Back in the good old days, they weren’t bot-checking so you could just load a thousand searches a minute and parse the html to find the number of results. It was pretty accurate but there was some kind of anomaly between 1,000-1,750 hits.

Click to access Googlewhack_Poster.pdf

zzz says:

2011-07-05 at 2:20 pm

@gormster

Just substitute the trochees into We Didn’t Start the Fire.

Works like a charm.

Steve says:

2011-07-11 at 9:50 am

Randall,

I’m not sure how to send you an email, so I figured this would be the best way to do it. Anyways, StumbleUpon put this link in my inbox a few minutes ago and I thought you of all people would enjoy it the most. Take a look!

ZOMBIE-PROOF HOUSE:

http://www.stumbleupon.com/su/9S9D73/all-that-is-interesting.com/post/4956385434/the-first-zombie-proof-house

-Steve

mac online says:

2011-07-26 at 2:12 am

I suppose it’s remotely possible that these numbers are correct, there are no years with an in-between number of hits! Very interesting!

IELTS Writing says:

2011-08-02 at 12:00 pm

we love this ielts resources. the color show me how to get higher band score in IELTS test

condominium says:

2011-08-13 at 10:33 am

A great article written with great hard work…i must say….a great work of your which shows…I like your site its quite informative and i would like to come here again as i get some time from my studies. And I will share it with my friends.

porno says:

2011-08-30 at 11:18 am

yes it’s amk

nice site

Pingback: Google, Apple, Microsoft: Who’s bad? | Erbloggtes
porno says:

2011-09-05 at 10:26 am

A great article written with great hard work…i must say….a great work of your which shows…I like your site its quite informative and i would like to come here again as i get some time from my studies. And I will share it with my friends.good

Chris says:

2011-10-19 at 5:36 am

hahah the coloring is funny to me… anyone else feel the same way??

Nioxin Recharging Complex says:

2011-11-13 at 9:28 pm

I love the chart, makes sense too.

Office 2010 says:

2011-12-14 at 4:00 am

Wow! I really loved reading your blog. It was very well written and simple to understand. Unlike additional blogs I have read.

Pingback: This is how quickly we give up on resolutions
دليل مواقع says:

2012-03-21 at 5:45 pm

In my personal opinion is subject follow-up and congratulate you for the beautiful and hot topics at the same time
Only accept greetings

Coach Outlet Canada says:

2012-03-29 at 9:38 pm

Thank you very much. I like this site.

Pingback: Attack of the narwhal zombies! » Aaron at the Internet Archive
Solet says:

2012-09-12 at 10:30 am

the number may be accurate. whenever you get to the last page it always tells you some were omitted because of duplication/similarity/etc

Kadin sisme says:

2013-01-11 at 11:35 am

the number may be accurate. whenever you get to the last page it always tells you some were omitted because of duplication

Pingback: The infographics of xkcd | halfblog.net
sohbet says:

2013-07-11 at 1:40 pm

ay is, perhaps, the best western movie about Budhism.
He is born, lives and dies everyday, over and over untill ove

کرکره برقی says:

2013-09-26 at 9:17 am

The fact that this makes it look like they have way more results than they do is presumably just an unintended bonus.

خرید سگ says:

2013-12-07 at 2:12 am

It was an interesting discussion. Starting this debate so thank you., I had not known about the plot. Much I thank you for sharing this.

Terapia Bowen Craiova says:

2014-02-23 at 3:03 pm

Really loved reading your blog. It was very well written and simple to understand. Unlike additional blogs I have read.

Casete Luminoase Bucuresti says:

2014-02-24 at 5:10 am

Many thanks for this one. I’ve been looking for this sort of information everywhere.

Firme Luminoase Bucuresti says:

2014-02-24 at 7:55 am

Fantastic work and very well-written article. I will recommend reading it to all my friends!

çizgi film says:

2014-02-26 at 3:55 am

Thank you very much. I like this site.

Corturi Nunti says:

2014-03-19 at 8:04 pm

Good post. I learn something totally new and challenging on websites I stumbleupon every day. It will always be useful to read articles from other writers and use something from other sites

siemensservisibursa.net says:

2014-03-29 at 7:38 am

we love this ielts resources

nacele says:

2014-05-23 at 5:13 pm

Thank you for presenting your points and providing this information. I have learned something about this topic.

grape cu discuri says:

2014-05-29 at 1:53 pm

This excellent article assisted me very much! Saved your site, extremely great categories everywhere that I read here! I really appreciate the info.

cabinete stomatologice craiova says:

2014-06-26 at 8:59 am

Thank you for presenting your points and providing this information. I have learned something about this topic.

pizza delivery dublin 4 says:

2014-06-27 at 4:06 am

Thank you very much. I like this site.

pensiuni ocnele mari says:

2014-06-27 at 5:52 pm

Your site is very useful.

parchet craiova says:

2014-06-27 at 7:09 pm

Superbly written article, if only all bloggers offered the same content as you, the internet would be a far better place.

درب اتوماتیک کرکره ای says:

2014-07-13 at 4:30 am

Thank you for presenting your points

paypal hosting says:

2014-08-29 at 5:01 pm

Thank you for a wonderful blog. Your suggestions are so helpful. I’ve been journaling for quite some time and it helps me to write down my prayer for the day and helps me focus.

audit craiova says:

2014-09-18 at 8:45 am

Thank you for taking the time to publish this information very useful!

Can be Found on This Site says:

2014-09-24 at 1:45 pm

This is great post, i really inspired from your blogging skills

florarie craiova says:

2014-09-26 at 4:35 pm

Your website is very good and this is a great, inspiring article.

studio videochat bucuresti says:

2014-09-26 at 4:40 pm

Your post was good and the information that you giving your post that was really cool. I like it very much. Thanks for posting this.

usi interior craiova says:

2014-10-10 at 3:50 pm

Your web log isn’t only useful but it is additionally really creative.

aparate cafea says:

2014-10-14 at 1:35 pm

This is the most amazing article that I’ve ever read in my whole life. I am really impressed by the work you’ve done on this topic. I want to learn more about this topic.

tapet printat says:

2014-11-12 at 4:04 pm

Your post was good and the information you give in your post was really cool. I like it very much. Thanks for posting this.

Comments are closed.
