ROBOT9000 and #xkcd-signal: Attacking Noise in Chat

Edit 2: Oh God Oh God 4chan has Robot9000. sup /r9k/. Have fun with the bot and do one last barrel roll for me.

Edit: As expected, with the huge flood of new traffic after this post went up, the channel is full of new folks coming in and playing with the bot. This is unavoidable and expected for these first few days, and ROBOT9000 is actually controlling the noise pretty well. Still, #xkcd-signal is a social channel — if you just want to play games with the moderator/concept, please use #moderator-sandbox. Thanks!

#xkcd has had about 250 chatters these days. Large communities suck. This problem is hard to solve, but we’ve come up with a fun attack on it — enforced originality (in a very narrow sense). My friend zigdon and I have put together an auto-moderation system in an experimental channel, #xkcd-signal, and it seems to work well, so we invite you all to take part.

When social communities grow past a certain point (Dunbar’s Number?), they start to suck. Be they sororities or IRC channels, there’s a point where they get big enough that nobody knows everybody anymore. The community becomes overwhelmed with noise from various small cliques and floods of obnoxious people and the signal-to-noise ratio eventually drops to near-zero — no signal, just noise. This has happened to every channel I’ve been on that started small and slowly got big.

There are a couple of standard ways to deal with this, and each one has problems. Here’s an outline of the major approaches (skip down if you just want to read about ROBOT9000):

  • Strict entry requirements: This is the secret club/sorority approach. You can vet every new person before they’re allowed to speak. This sucks. It reminds me of Feynman’s comment on resigning from the National Academy of Sciences: he said that he saw no point in belonging to an organization that spent most of its time deciding who to let in. The problems are apparent during sorority rush week on college campuses. Not only is the question of who does the vetting (and how) difficult, but the drama reaches horrifying levels as bitter counter-cliques rise up and do battle.
  • Moderators: This is the approach IRC channels and forums usually take. You designate a few ‘good’ people who can deal with noise as it happens, by muting, kicking, banning, or editing content as need be. There are a couple of problems here. The circle of moderators has to grow with the community. It eventually becomes fairly large, with complicated dynamics of its own, and the process of choosing moderators leads to sorority/NAS-esque drama and general obnoxiousness. I don’t like the elitism that inevitably develops, and prefer more egalitarian systems.
  • Running peer-moderation: When it’s possible, this is a good approach. It’s used to great effect on comment threads, with Slashdot pioneering the whole thing and sites like reddit stripping it down to an effective core. But it doesn’t work very well for live time-dependent things like IRC channels.
  • Splinter communities: This has happened on most IRC channels I’ve been on — small invite-only side channels sprout up with particular focuses. Often, the older core members of the community go off to create their own high-signal channel, which is generally kept quiet. But this is limited — it lacks the open mixing of the internet that often makes online communities work.

I was trying to decide what made a channel consistently enjoyable. A common factor in my favorite hangouts seemed to be a focus on original and unpredictable content on each line. It didn’t necessarily need to be useful, just interesting. I started trying to think of ways to encourage this.

And then I had an idea — what if you were only allowed to say sentences that had never been said before, ever? A bot with access to the full channel logs could kick you out when you repeated something that had already been said. There would be no “all your base are belong to us”, no “lol”, no “asl”, no “there are no girls on the internet”. No “I know rite”, no “hi everyone”, no “morning sucks.” Just thoughtful, full sentences.

There are a few obvious questions/objections, and I think each of them has been answered by experiment:

Q: Can’t you just tack a random set of letters on the end to ensure your line is unique (or misspell things, add in gibberish, etc)?

A: Of course. The moderator has plenty of holes if you’re acting in bad faith. But if you’re doing that, why are you in the channel at all? Folks who persist in doing this anyway earn (like any spammers) a prompt manual ban.

Q: Won’t it get harder and harder to chat as lines get “used up”?

A: You underestimate the number of possible sentences. We’ve been working off two years (2 million lines) of logs, and it’s not very hard at all. I expect the channel will be able to run for at least a decade before it becomes a problem, and probably long past that.

Q: What about common parts of conversation, like “yeah” and the like?

A: Surprisingly, it doesn’t seem to be a huge problem. In some cases, they can be done without entirely, and in others, you’re just forced to elaborate a little bit on what you’re agreeing with and why.

I talked it over with zigdon, a Perl guru, and he coded it up. We called the project ROBOT9000 (the most generic, unoriginal name for a bot that we could think of). Then we started a sister channel to #xkcd and put the bot in it. #xkcd-signal has been running for the last couple weeks (using the last two years of #xkcd logs) with about 60 reasonably active chatters, and it’s working beautifully — good, solid chat between relative strangers, with very little noise. (We’ll see how it handles the influx of people as we announce the experiment to the wider net.)

In zig’s implementation, the moderator bot mutes (-v) chatters for a period after every violation. The mute time starts at two seconds and quadruples with each subsequent violation, so you have five or six tries to get the hang of it. Your mute-time decays by half every six hours (we’re still tweaking the parameters). When looking for matches, the bot ignores punctuation, case, and nicks.
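The rules in that paragraph can be sketched roughly in code. This is an illustrative Python sketch, not zig's Perl implementation: the names `normalize` and `Moderator`, the in-memory set standing in for the channel logs, and the way the six-hour decay is applied (whole six-hour steps, counted from the last violation) are all assumptions. Only the numbers (two seconds, times four, halve every six hours) and the "ignore punctuation, case, and nicks" rule come from the description above.

```python
import string

SIX_HOURS = 6 * 60 * 60   # decay interval, in seconds
BASE_MUTE = 2.0           # first mute lasts two seconds
GROWTH = 4.0              # each further violation quadruples the mute

# Translation table that deletes ASCII punctuation (an assumption: the
# post does not say exactly which characters count as punctuation).
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(line, nicks=()):
    """Lowercase the line, drop punctuation and known nicks, collapse spaces."""
    known = {n.lower().translate(_PUNCT) for n in nicks}
    words = line.lower().translate(_PUNCT).split()
    return " ".join(w for w in words if w not in known)


class Moderator:
    """Hypothetical stand-in for the bot: remembers lines, hands out mutes."""

    def __init__(self):
        self.seen = set()    # normalized lines already said (the "logs")
        self.penalty = {}    # nick -> (last mute in seconds, time of violation)

    def check(self, nick, line, now, nicks=()):
        """Return the mute length in seconds, or 0.0 if the line is original."""
        key = normalize(line, nicks)
        if key not in self.seen:
            self.seen.add(key)
            return 0.0
        last, when = self.penalty.get(nick, (0.0, now))
        # Assumed decay model: halve once per full six hours since the
        # previous violation.
        halvings = int((now - when) // SIX_HOURS)
        decayed = last / (2 ** halvings)
        mute = max(BASE_MUTE, decayed * GROWTH)
        self.penalty[nick] = (mute, now)
        return mute


mod = Moderator()
print(mod.check("alice", "Hi everyone!", now=0))   # 0.0 (never said before)
print(mod.check("bob", "hi everyone", now=5))      # 2.0 (first violation)
print(mod.check("bob", "HI, EVERYONE", now=10))    # 8.0 (quadrupled)
```

With these parameters the mutes run 2 s, 8 s, 32 s, 128 s, 512 s, which is where the "five or six tries" comes from: by the sixth violation you are sitting out for over half an hour.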

The big problem we ran into, actually, was meta-discussion overwhelming the channel. Every new person wanted to speculate about the rules and their effect, and every violation was followed by a long postmortem. At first, we had a scoreboard showing who was the best at talking without violation, but this quickly turned into a competition, destroying actual chat. When we took down the scoreboard and banished meta-discussion of the channel to #meta-discussion, everything worked out nicely. (And, of course, for discussion of the concept of #meta-discussion people had to go to #meta-meta-discussion, and for chat about how silly that whole idea was, we created #meta-meta-meta-discussion …)

You’re welcome to come hang out with us. The moderator bot is running in #xkcd-signal on Foonetic (irc.foonetic.net or irc.xkcd.com). But again, it’s a social channel; take discussion of the concept to #meta-discussion.

If you’d like to run this bot in your own channel, zig has published an initial version of the code here:

http://irc.peeron.com/xkcd/ROBOT9000.html (Perl bot, SQL skeleton, Changelog)

1,377 replies on “ROBOT9000 and #xkcd-signal: Attacking Noise in Chat”

  1. “I would love to see this applied to 4chan.”

    It would bring it to its knees sadly 😦

  2. Pingback: moonbuggy
  3. The bot should also punish bad typing skills and I don’t mean typos, but “netsp33k” and streams of consciousness with no punctuation.

    For example, just check out the comments on YouTube or yahoo answers.
    Here’s what I got from an open YouTube window:

    “wtf happens?”

    “i didnt think it was THAT bad… i thought 2 girls 1 finger was the worst, but it only upset me @ the end… i dont know why it would make you barf unless you had just eaten chocolate ice cream or something”

    “I SO WANNA SEE IT, But I’m fucking afriad xD”

    “what happens in 2 girls 1 finger, i’ve seen 2 girls 1 cup btw… those girls are fu*king sick in there minds man….”

    “obvious u dont kno about searching for qqqqq & clicking im feeling lucky in google..”

    ..

  4. While affirmative phrases are often more noise than signal, negatory ones are not. While “No, because…” might sometimes be more illuminative, other times a simple “no” is all that needs to be said. It also keeps conversational flow feeling more natural when you can simply agree.

    Allowing phrases that have not been used in a longish period of time (maybe six weeks?) might also be useful in keeping it relatively easy for users to keep talking. Yes, there are a grillion possible sentences, but there are only a few thousand words that people use frequently in conversation. We tend to use these words in only a couple dozen different types of sentence. Someone could easily want to use the same sentence, especially if the bot ignores punctuation, but in regards to a different subject or whatever. It could reduce user frustration if they need to say something different from only the last 10,000 things said, rather than everything anyone’s ever said.

    But it’s a fucking brilliant idea! Well done.

  5. I think the major problem here is URLs. http://xkcd.com will be banned quick as a flash, any other Cool Thing™ will be banned within 30 seconds.

    Perhaps you can also bias based on ‘karma’ and previous usages of that sentence? (I.E., if it was used once 6 months ago it should not have the same impact as if it were used 5 times today and 65,535 [ooh, one more and it overflows] times the past month.)

  6. I have to agree with our furry friend Bob.

    How about making it a bit less extreme? Maybe hang on to log entries for a year instead of forever (bot performance would be better!) and give more leeway to nicks with a good record. Good users should be allowed *some* noise, simply because “noise” often serves a social purpose: greetings, good-nights, etc. Those sorts of messages are not thought-provoking, but they’re important for social animals like humans.

  7. why not take it one step forward (and lose simplicity): block not only the same exact utterances but all utterances that carry a similar meaning.
    use some levenshtein (edit) distance or check the information gain…

    but seriously – what about ironic utterances and the like?
    or, more theoretically, things like ‘Pierre Menard, Author of The Quixote’ where menard re-created the Quixote in the same exact words but with a totally different meaning?

  8. Thanks for publishing the code. I think I may need to port this to .NET and pay a visit to my company’s exchange server.

  9. Can you banish stupid shit like “LOL” (or the even stupider “lol”)?

  10. Hold the newsreader’s nose squarely, waiter, or friendly milk will countermand your trousers.

  11. Quite a solid idea, I’d say chaps!

    (trying for an original way to say “COOL!”)

    also – i thought that your captcha was an ad. almost ignored it.

  12. People here seem to be conjecturing problems and advocating less harsh systems.

    I counterconjecture that they’re overstating their case. But naturally none of us will log on to check.

  13. can’t connect to irc.xkcd.com!
    irssi says:

    11:39 -!- Irssi: Looking up irc.xkcd.com
    11:39 -!- Irssi: Connecting to irc.xkcd.com [216.93.242.10] port 6667
    11:39 -!- Irssi: Unable to connect server irc.xkcd.com port 6667 [Invalid argument]

    any ideas?

  14. When will you deem this a success or failure? Currently you have created a splinter channel, so unless the splinter grows as large as the main channel currently is, or you move the bot to the main channel, any success could not be pinpointed for certain on the bot.

    It’s an interesting idea, but I’m not sure it’s complete or how well it’ll work. I’ll drop by this evening and see how it’s going. Part of me likes the idea of applying this to URLs, as every now and then someone discovers something two years old that everyone else has seen and announces it loudly as the coolest thing ever, only to our disappointment when we click. On the other hand, if someone linked a really important/interesting/funny thing when it was 5 o’clock in the morning my time, it would be nice if someone could “repost” it at a more sociable hour without getting punished.

    The idea of making it server-side rather than a bot to decrease noise was a nice one by the way, Ben.

  15. Adarsh, I would think that running something like this on email might be counter-productive. I’d guess that useful messages like “See you at 12” would be blocked, and spam would be waved through (since most spam contains random garbage nowadays).

  16. Ok the words clearly are
    Areene ROWARTH
    I entered
    Aremme ROVARTH

    Comment Still accepted?

  17. Getting muted for a few seconds really isn’t the end of the world. If you really want to just reply with a simple yes/no to some statement, it’s only going to be a few seconds before you can speak again. unless of course you’ve been unoriginal a lot, in which case you’re probably not going to be missed.

  18. While discouraging outright repetition, it may encourage use of snowclones… Hell, it might even become a research tool for identifying them.

    “I love the smell of ROBOT9000 in the morning. It smells like… originality!”

  19. if you look at that new, fresh, never repeating examples of stupid comments on youtube, heheh i dunno if this would work for everybody, but it sure is an amazing idea…

  20. I think this is a neat idea even though I think depending on everyone to be fair is a bit harsh, and the build-up of frustration may turn away chatters not into the ‘gee whiz’ factor who simply want to have a discussion. The humorous part is by getting rid of those people, you probably create a community that will not exceed Dunbar’s Number.

    That all said, I think it is disgusting how some of the commenters, and most of the ping-backers grovel before Randall. I mean, I am sure he is a great guy, and in a sense he is ‘living the dream’ of geekdom, but I feel like there is a subset of the xkcd population who take it too far. Maybe I am just jaded by geek culture.

  21. Why not keep track of usernames and timestamps for each line in the line table?
    Also, why not use a honeypot instead of captcha? Pa-lease…

  22. It’s a great idea. I only wish I could connect to the server. X-Chat seems to want to connect to localhost. Did you set a redirect because of traffic?

  23. in signal, I declared that we needed a text based game of russian roulette
    each person types a word
    just one
    there’s a chance that they get muted
    that’s amusing
    the less letters it has, the higher the chance, but the higher the points
    it is, but I’m muted for another 2 minutes
    ironically, my word was Roulette

  24. jptix> user: try #moderator-sandbox
    -moderator:#xkcd-signal- jptix, you have been muted for 1 minute 4 seconds.

  25. I think a worthy thing to try for this project beyond “gee whiz” if you just desire to get rid of the constant reposting of memes would be to only look back a week or so.

  26. I wonder if this same technology could be used in other areas of life, perhaps conference calls. 🙂

    Still it is a great idea. Crap I didn’t read everything above this to know if people already told you it is a great idea.

    I wonder if noise is ever helpful.

Comments are closed.