ROBOT9000 and #xkcd-signal: Attacking Noise in Chat

Edit 2: Oh God Oh God 4chan has Robot9000. soup /r9k/. Have fun with the bot and do one last barrel roll for me.

Edit: As expected, with the huge flood of new traffic after this post went up, the channel is full of new folks coming in and playing with the bot. This is unavoidable and expected for these first few days, and ROBOT9000 is actually controlling the noise pretty well. Still, #xkcd-signal is a social channel — if you just want to play games with the moderator/concept, please use #moderator-sandbox. Thanks!

#xkcd has had about 250 chatters these days. Large communities suck. This problem is hard to solve, but we’ve come up with a fun attack on it — enforced originality (in a very narrow sense). My friend zigdon and I have put together an auto-moderation system in an experimental channel, #xkcd-signal, and it seems to work well, so we invite you all to take part.

When social communities grow past a certain point (Dunbar’s Number?), they start to suck. Be they sororities or IRC channels, there’s a point where they get big enough that nobody knows everybody anymore. The community becomes overwhelmed with noise from various small cliques and floods of obnoxious people and the signal-to-noise ratio eventually drops to near-zero — no signal, just noise. This has happened to every channel I’ve been on that started small and slowly got big.

There are a couple of standard ways to deal with this, and each one has problems. Here’s an outline of the major approaches (skip down if you just want to read about ROBOT9000):

  • Strict entry requirements: This is the secret club/sorority approach. You can vet every new person before they’re allowed to speak. This sucks. It reminds me of Feynman’s comment on resigning from the National Academy of Sciences: he said that he saw no point in belonging to an organization that spent most of its time deciding who to let in. The problems are apparent during sorority rush week on college campuses. Not only is the question of who does the vetting (and how) difficult, but the drama reaches horrifying levels as bitter counter-cliques rise up and do battle.
  • Moderators: This is the approach IRC channels and forums usually take. You designate a few ‘good’ people who can deal with noise as it happens, by muting, kicking, banning, or editing content as need be. There are a couple of problems here. The circle of moderators has to grow with the community; it eventually becomes fairly large, with complicated dynamics of its own, and the process of choosing moderators leads to sorority/NAS-esque drama and general obnoxiousness. I don’t like the elitism that inevitably develops, and prefer more egalitarian systems.
  • Running peer-moderation: When it’s possible, this is a good approach. It’s used to great effect on comment threads, with Slashdot pioneering the whole thing and sites like reddit stripping it down to an effective core. But it doesn’t work very well for live time-dependent things like IRC channels.
  • Splinter communities: This has happened on most IRC channels I’ve been on — small invite-only side channels sprout up with particular focuses. Often, the older core members of the community go off to create their own high-signal channel, which is generally kept quiet. But this is limited — it lacks the open mixing of the internet that often makes online communities work.

I was trying to decide what made a channel consistently enjoyable. A common factor in my favorite hangouts seemed to be a focus on original and unpredictable content on each line. It didn’t necessarily need to be useful, just interesting. I started trying to think of ways to encourage this.

And then I had an idea — what if you were only allowed to say sentences that had never been said before, ever? A bot with access to the full channel logs could kick you out when you repeated something that had already been said. There would be no “all your base are belong to us”, no “lol”, no “asl”, no “there are no girls on the internet”. No “I know rite”, no “hi everyone”, no “morning sucks.” Just thoughtful, full sentences.
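The core check is small enough to sketch in a few lines of Python. This is only an illustration of the idea, not zigdon's Perl bot: the class name `OriginalityFilter` and its `check` method are made up for the example, and it compares lines exactly, with no normalization of any kind.

```python
class OriginalityFilter:
    """Remembers every line ever said and flags exact repeats."""

    def __init__(self, past_lines=()):
        # Seed the memory with whatever channel logs already exist.
        self.seen = set(past_lines)

    def check(self, line):
        """Return True if the line is new (and record it), False if it repeats."""
        if line in self.seen:
            return False
        self.seen.add(line)
        return True


if __name__ == "__main__":
    bot = OriginalityFilter(past_lines=["lol", "hi everyone"])
    for said in ["hi everyone", "my cat just learned to open doors"]:
        verdict = "ok" if bot.check(said) else "repeat"
        print(f"{verdict}: {said}")
```

A set gives constant-time lookups on average, so the check stays cheap even as the log grows; what the bot does on a repeat (kick, mute, or something else) is a separate decision.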

There are a few obvious questions/objections, and I think each of them has been answered by experiment:

Q: Can’t you just tack a random set of letters on the end to ensure your line is unique (or misspell things, add in gibberish, etc)?

A: Of course. The moderator has plenty of holes if you’re acting in bad faith. But if you’re doing that, why are you in the channel at all? Folks who persist in doing this anyway earn (like any spammers) a prompt manual ban.

Q: Won’t it get harder and harder to chat as lines get “used up”?

A: You underestimate the number of possible sentences. We’ve been working off two years (2 million lines) of logs, and it’s not very hard at all. I expect the channel will be able to run for at least a decade before it becomes a problem, and probably long past that.

Q: What about common parts of conversation, like “yeah” and the like?

A: Surprisingly, it doesn’t seem to be a huge problem. In some cases, they can be done without entirely, and in others, you’re just forced to elaborate a little bit on what you’re agreeing with and why.

I talked it over with zigdon, a Perl guru, and he coded it up. We called the project ROBOT9000 (the most generic, unoriginal name for a bot that we could think of). Then we started a sister channel to #xkcd and put the bot in it. #xkcd-signal has been running for the last couple weeks (using the last two years of #xkcd logs) with about 60 reasonably active chatters, and it’s working beautifully — good, solid chat between relative strangers, with very little noise. (We’ll see how it handles the influx of people as we announce the experiment to the wider net.)

In zig’s implementation, the moderator bot mutes (-v) chatters for a period after every violation. The mute time starts at two seconds and quadruples with each subsequent violation, so you have five or six tries to get the hang of it. Your mute-time decays by half every six hours (we’re still tweaking the parameters). When looking for matches, the bot ignores punctuation, case, and nicks.
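Here's a rough Python sketch of that mute schedule and matching rule. The real bot is zigdon's Perl, and the names below (`Penalty`, `normalize`) are invented for illustration. Details the description above doesn't pin down are assumptions: the decay is modeled as continuous rather than stepped, the penalty never drops below the two-second starting value, and a "nick" is any whole word equal to a known nickname.

```python
import string
from dataclasses import dataclass

BASE_MUTE_SECONDS = 2.0        # first violation mutes for two seconds
GROWTH_FACTOR = 4.0            # each further violation quadruples the mute
HALF_LIFE_SECONDS = 6 * 3600   # the penalty halves every six hours

# Translation table that deletes ASCII punctuation (curly quotes are not covered).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


@dataclass
class Penalty:
    """Per-chatter mute state; times are plain seconds on any monotonic clock."""
    next_mute: float = BASE_MUTE_SECONDS
    last_update: float = 0.0

    def decay(self, now):
        # Assumption: continuous exponential decay, floored at the base value.
        elapsed = max(0.0, now - self.last_update)
        decayed = self.next_mute * 0.5 ** (elapsed / HALF_LIFE_SECONDS)
        self.next_mute = max(BASE_MUTE_SECONDS, decayed)
        self.last_update = now

    def violate(self, now):
        """Return how long to mute for this violation, then escalate."""
        self.decay(now)
        mute = self.next_mute
        self.next_mute = mute * GROWTH_FACTOR
        return mute


def normalize(line, nicks=()):
    """Lowercase a line, drop punctuation, and drop words that are known nicks."""
    nickset = {n.lower().translate(_STRIP_PUNCT) for n in nicks}
    words = (w.translate(_STRIP_PUNCT) for w in line.lower().split())
    return " ".join(w for w in words if w and w not in nickset)
```

With these numbers, six back-to-back violations mute for 2, 8, 32, 128, 512, and 2048 seconds, which is where the "five or six tries" comes from: by the sixth you're out for over half an hour.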

The big problem we ran into, actually, was meta-discussion overwhelming the channel. Every new person wanted to speculate about the rules and their effect, and every violation was followed by a long postmortem. At first, we had a scoreboard showing who was the best at talking without violation, but this quickly turned into a competition, destroying actual chat. When we took down the scoreboard and banished meta-discussion of the channel to #meta-discussion, everything worked out nicely. (And, of course, for discussion of the concept of #meta-discussion people had to go to #meta-meta-discussion, and for chat about how silly that whole idea was, we created #meta-meta-meta-discussion …)

You’re welcome to come hang out with us. The moderator bot is running in #xkcd-signal on Foonetic (irc.foonetic.net or irc.xkcd.com). But again, it’s a social channel; take discussion of the concept to #meta-discussion.

If you’d like to run this bot in your own channel, zig has published an initial version of the code here:

http://irc.peeron.com/xkcd/ROBOT9000.html (Perl bot, SQL skeleton, Changelog)

1,377 replies on “ROBOT9000 and #xkcd-signal: Attacking Noise in Chat”

  1. Hey xkcd, please close comments in this post. You’ll get lots of insightful observations from Mr. Anonymous B. Tard that may not interest you.

  2. 4chan listened, and I for one think it’s great. They’re going to stress-test the hell out of the system, so we’ll know within a month if it’s viable.
    Carry on, Anonymous.

  3. BYAAAAAAAHAHHHHHHHH

    Shit, I can’t believe that this robot thing actually works.

  4. This brings me to my next point: this system becomes a target. It provides incredibly unoriginal people the opportunity to make the original seem unoriginal by spamming boards and channels with content that may come up due to general behavior of the “community”.

  5. This is the most twisted fuckin’ autistic shit I’ve seen in a long while, and I read 4chan threads where dorks jack each other off while blubbering in self-pity about how they can’t get a girl. What the fuck, your damn fool IRC channel — an IRC channel devoted to a webcomic that recycles the worst of repetitious horse-beating nerd culture — is so irretrievably damaged by a “hi all” that you have a little computer program to punt anyone who, horror of horrors, says something that was said 23 months ago?

    What kind of basement-bound poopsocking dweeb is so scared of normal human interaction that he writes a script — probably in fuckin’ Perl or Ruby on Rails or some such bull-fuckin’-shit — to slap someone on the wrist because they didn’t say something 100% original? Originality ain’t a thang. Student-style OMG SO RANDOM ‘humour’ like ‘ninja pirate monkey cheese!!! !! velociraptorz lol i luv dinosaur comix XKCD & jerkcity’ won’t show up in any automatic filter but it’s still BOOSHIT I wouldn’t want in any Goddamned channel of mine.

    Also, the guy who wrote this ‘blag’ (lollariously original neologism there, d00d) is totally contradicting himself. He says that the bot works and splinter channels do not. Then he says that the first thing that happened when he started the bot was that the channel exploded into a firestorm of metachannel navel gazing. Effective! And what did he do to solve the problem the bot created? He moved meta-discussion into a splinter channel, which worked. But I thought he said splinter channels don’t work…?

    In sum, this is a classic case of a geek making transparent excuses to half-ass a technical solution to a social problem so he can make people kow-tow to his latent Asperger’s. He’s not even honest about it, and all y’all wang-polishers who’re donating saliva in this thread are just beta dorks who get your rocks off watching a webcomic e-lebrity acting out in public. You dicksucks make me sick.

  6. enjoy your endless stream of unoriginal bullshit regurgitated over and over on all your forums and channels, then!

  7. enjoy your endless stream of unoriginal bullshit regurgitated over and over on all your forums and channels, then!

  8. Wait i have a question… why Nine-Thousand?? Why not 9001? i mean, at least 9001 is over 9000…. and then you could have a funny joke included in your awesome program! ^^;; just a suggestion….

  9. Afraid I tl;dr’d this much past about the 20th comment (i know, i know, hypocritical) … but you may not need to be quite so restrictive with it to get the effect you desire. Never, ever having something repeated is a nice experiment, but it’s not exactly a way to get much useful signal. Often “signal” can itself be something repeated amongst purely random “noise” (a song, for example …)

    And how can we effectively pass on knowledge to anyone who isn’t immediately present, without repeating ourselves, or repeating something said by someone who was around at a previous time but has now left, before some new people have turned up?

    Rather than going back two years, setting the filter at, say, an hour may be sufficient to kill off the mindless meme-ing, without driving people absolutely stir crazy. There’s every chance you may be trying to say something perfectly useful to somebody (e.g. asking what time you should meet up) and just happen to clash with a previous example of yourself or someone else asking the same thing. One too many of those in a row (getting unlucky) and you’re muted for several hours. Not good.

    Course, you can always go to another channel or use private chat, I suppose … so long as requesting this with the people you’re talking to doesn’t count as a repetition from the previous time it happened also.

    Setting the bot just to kick for certain annoying things might work as well… kind of like an auto-mod but more wide-ranging than the ones usually used to filter out viagra spam, or endless short-term repetition (e.g. once a second) of the same post. Like you could kickban someone for saying Desu more than once every 15 seconds or whatever, or any instance of an idiot meme that was past its sell-by date. But you’d have to take votes and get a certain representative sample of the community to agree to each entry, which itself could get out of hand. Tricky. Just like all human communication.

  10. I suppose it works the same way as the old phrase your mother taught you:
    “If you don’t have anything interesting to say, don’t say anything at all.”

  11. I don’t think Dunbar’s number is the limit of the size of a community. It’s not very likely that the people in the community know only people in the community, is it? 🙂

  12. Now all we have to do is wait for someone to write a mutating spam script which changes what is posted after every post.

    Wouldn’t be that hard,

    define $text “a”
    post $text
    define $text “$text + a”

    And Repeat. Mind I know nothing of coding. (HTML doesn’t count in my opinion)

    Also, on 4chan, Dunbar’s number is equal to 0.

  13. The beautiful thing about forcing originality and creativity is that it basically culls the herd down to those who actually physically are creative and original – it totally eliminates the factor of people parroting bullshit catchphrases and one-note jokes only other non-original one-note parrots find amusing.

    I’ve been considering a similar concept to this for the longest time – in fact, while I was still in high school I designed a rudimentary version of it in C++/Oracle to analyse all incoming TCP/IP data… worked well to minimise incoming bullshit sent by classmates, but the CPU cycles it stole were horrifying. This one however seems to be, as they say, “the duck’s nuts”, so kudos for pushing to develop such a tool. I’ll definitely be playing around with it in the near future, I think.

  17. it sounds like an interesting idea, although it could perhaps work better with a score keeping system.. Commonly said phrases lose you big score/ new phrases gain you some score. If you get into negative, you cannot speak but slowly regain score over time until you are back in the positive.

    This might mean people can actually afford to bring in new discussions on old topics without being muted on the way

  18. i cant seem to join foonetic. im also kind of an irc-newby who lives in australia. any suggestions?

  19. In a moment of insight I came up with the salacious counterpart to angular momentum girl!

    “The Nerds and the Bees”

    When the accrued potential energy between us manifests as kinetic energy we will dwarf the maximum energy output of the large hadron collider.

  20. I vote for an even more restrictive version, which would query google with each entry as an exact phrase.

  21. Hey, great idea!

    One thing I immediately thought of when I read this was the size of the log files. Also, the lookup time will increase over time, obviously.

    What you should do is hash the sentence, and store that instead. CPU is almost never a bottleneck versus hard drive read/write times.

  22. if you a ridah come over here another day another queer trying to get steered so suffer before you get tared off the scale and off the chain i spit the flows thats off the brain im kneedeep in the pain and im goin insane dont make me BRING the pain so SING my praise before i get my gs to slit you three ways from hearth and malaise

Comments are closed.