ROBOT9000 and #xkcd-signal: Attacking Noise in Chat

Edit 2: Oh God Oh God 4chan has Robot9000. soup /r9k/. Have fun with the bot and do one last barrel roll for me.

Edit: As expected, with the huge flood of new traffic after this post went up, the channel is full of new folks coming in and playing with the bot. This is unavoidable and expected for these first few days, and ROBOT9000 is actually controlling the noise pretty well. Still, #xkcd-signal is a social channel — if you just want to play games with the moderator/concept, please use #moderator-sandbox. Thanks!

#xkcd has about 250 chatters these days. Large communities suck. This problem is hard to solve, but we’ve come up with a fun attack on it: enforced originality (in a very narrow sense). My friend zigdon and I have put together an auto-moderation system in an experimental channel, #xkcd-signal, and it seems to work well, so we invite you all to take part.

When social communities grow past a certain point (Dunbar’s Number?), they start to suck. Be they sororities or IRC channels, there’s a point where they get big enough that nobody knows everybody anymore. The community becomes overwhelmed with noise from various small cliques and floods of obnoxious people and the signal-to-noise ratio eventually drops to near-zero — no signal, just noise. This has happened to every channel I’ve been on that started small and slowly got big.

There are a couple of standard ways to deal with this, and each one has problems. Here’s an outline of the major approaches (skip down if you just want to read about ROBOT9000):

  • Strict entry requirements: This is the secret club/sorority approach. You can vet every new person before they’re allowed to speak. This sucks. It reminds me of Feynman’s comment on resigning from the National Academy of Sciences: he said that he saw no point in belonging to an organization that spent most of its time deciding who to let in. The problems are apparent during sorority rush week on college campuses. Not only is the question of who does the vetting (and how) difficult, but the drama reaches horrifying levels as bitter counter-cliques rise up and do battle.
  • Moderators: This is the approach IRC channels and forums usually take. You designate a few ‘good’ people who can deal with noise as it happens, by muting, kicking, banning, or editing content as need be. There are a couple of problems here. The circle of moderators has to grow with the community. It eventually becomes fairly large, with complicated dynamics of its own, and the process of choosing moderators leads to sorority/NAS-esque drama and general obnoxiousness. I don’t like the elitism that inevitably develops, and prefer more egalitarian systems.
  • Running peer-moderation: When it’s possible, this is a good approach. It’s used to great effect on comment threads, with Slashdot pioneering the whole thing and sites like reddit stripping it down to an effective core. But it doesn’t work very well for live time-dependent things like IRC channels.
  • Splinter communities: This has happened on most IRC channels I’ve been on — small invite-only side channels sprout up with particular focuses. Often, the older core members of the community go off to create their own high-signal channel, which is generally kept quiet. But this is limited — it lacks the open mixing of the internet that often makes online communities work.

I was trying to decide what made a channel consistently enjoyable. A common factor in my favorite hangouts seemed to be a focus on original and unpredictable content on each line. It didn’t necessarily need to be useful, just interesting. I started trying to think of ways to encourage this.

And then I had an idea — what if you were only allowed to say sentences that had never been said before, ever? A bot with access to the full channel logs could kick you out when you repeated something that had already been said. There would be no “all your base are belong to us”, no “lol”, no “asl”, no “there are no girls on the internet”. No “I know rite”, no “hi everyone”, no “morning sucks.” Just thoughtful, full sentences.
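To make the idea concrete, here is a minimal Python sketch of that check. It is not zigdon’s code (the actual bot is written in Perl); the class name `OriginalityFilter` and its exact-match rule are assumptions made purely for illustration.

```python
class OriginalityFilter:
    """Remembers every line it has seen and rejects repeats."""

    def __init__(self, past_lines=()):
        # Seed the memory from existing channel logs.
        self._seen = {line.strip() for line in past_lines}

    def is_original(self, line):
        """Return True and remember the line if it is new, else False."""
        text = line.strip()
        if text in self._seen:
            return False
        self._seen.add(text)
        return True


moderator = OriginalityFilter(past_lines=["lol", "hi everyone"])
moderator.is_original("lol")  # False: already in the logs
moderator.is_original("I have never typed this sentence before")  # True
```

Keeping the history in a set means each lookup stays cheap however long the logs grow; a real deployment would presumably want the history in a database rather than in memory.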

There are a few obvious questions/objections, and I think each of them has been answered by experiment:

Q: Can’t you just tack a random set of letters on the end to ensure your line is unique (or misspell things, add in gibberish, etc)?

A: Of course. The moderator has plenty of holes if you’re acting in bad faith. But if you’re doing that, why are you in the channel at all? Folks who persist in doing this anyway earn (like any spammers) a prompt manual ban.

Q: Won’t it get harder and harder to chat as lines get “used up”?

A: You underestimate the number of possible sentences. We’ve been working off two years (2 million lines) of logs, and it’s not very hard at all. I expect the channel will be able to run for at least a decade before it becomes a problem, and probably long past that.
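For a rough sense of scale, here is a back-of-the-envelope calculation in Python. Only the logging rate (2 million lines over two years) comes from the numbers above; the vocabulary size, the sentence length, and the fraction of word sequences that make sense are invented assumptions, so treat the result as an illustration and not a measurement.

```python
VOCABULARY_SIZE = 10_000         # assumption: working vocabulary of the channel
SENTENCE_LENGTH = 8              # assumption: words in a typical full sentence
SENSIBLE_ONE_IN = 10**12         # assumption: 1 in a trillion word sequences makes sense
LINES_PER_YEAR = 2_000_000 // 2  # from the logs: 2 million lines over two years

# Every possible ordering of SENTENCE_LENGTH words from the vocabulary.
word_sequences = VOCABULARY_SIZE ** SENTENCE_LENGTH
# Keep only the (assumed) tiny fraction that reads as a real sentence.
sensible_sentences = word_sequences // SENSIBLE_ONE_IN
# Years needed to use them all up if every logged line were a new one.
years_to_exhaust = sensible_sentences // LINES_PER_YEAR

print(f"{word_sequences:.0e} word sequences, {sensible_sentences:.0e} sensible")
print(f"{years_to_exhaust:.0e} years at the current rate")
```

Short, common lines are a different story, since there are far fewer of them and they get used up first.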

Q: What about common parts of conversation, like “yeah” and the like?

A: Surprisingly, it doesn’t seem to be a huge problem. In some cases, they can be done without entirely, and in others, you’re just forced to elaborate a little bit on what you’re agreeing with and why.

I talked it over with zigdon, a Perl guru, and he coded it up. We called the project ROBOT9000 (the most generic, unoriginal name for a bot that we could think of). Then we started a sister channel to #xkcd and put the bot in it. #xkcd-signal has been running for the last couple weeks (using the last two years of #xkcd logs) with about 60 reasonably active chatters, and it’s working beautifully — good, solid chat between relative strangers, with very little noise. (We’ll see how it handles the influx of people as we announce the experiment to the wider net.)

In zig’s implementation, the moderator bot mutes (-v) chatters for a period after every violation. The mute time starts at two seconds and quadruples with each subsequent violation, so you have five or six tries to get the hang of it. Your mute-time decays by half every six hours (we’re still tweaking the parameters). When looking for matches, the bot ignores punctuation, case, and nicks.
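The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not zig’s Perl implementation: the names `normalize` and `MuteTracker` are hypothetical, and the sketch assumes that decay happens in whole six-hour steps and that a mute time which has decayed below two seconds starts over at two.

```python
import re

BASE_MUTE_SECONDS = 2               # first violation
GROWTH_FACTOR = 4                   # quadruples with each violation
DECAY_PERIOD_SECONDS = 6 * 60 * 60  # halves every six hours


def _strip(text):
    """Lowercase and drop everything except letters, digits, and whitespace."""
    return re.sub(r"[^\w\s]|_", "", text.lower())


def normalize(line, nicks=()):
    """Reduce a line to the form used for matching: no case, punctuation, or nicks."""
    nick_keys = {_strip(nick) for nick in nicks}
    words = [word for word in _strip(line).split() if word not in nick_keys]
    return " ".join(words)


class MuteTracker:
    """Tracks each chatter's current mute time in seconds."""

    def __init__(self):
        self._state = {}  # nick -> (mute_seconds, time of last violation)

    def current(self, nick, now):
        """Mute time after halving once per full decay period elapsed (assumed)."""
        mute, last = self._state.get(nick, (0, now))
        periods = int((now - last) // DECAY_PERIOD_SECONDS)
        return mute * 0.5 ** periods

    def violation(self, nick, now):
        """Record a violation at time `now` and return the new mute time."""
        decayed = self.current(nick, now)
        if decayed < BASE_MUTE_SECONDS:
            mute = BASE_MUTE_SECONDS
        else:
            mute = decayed * GROWTH_FACTOR
        self._state[nick] = (mute, now)
        return mute


tracker = MuteTracker()
mutes = [tracker.violation("newbie", now=0) for _ in range(6)]
# mutes == [2, 8, 32, 128, 512, 2048]
```

Six violations in a row already add up to a mute of about 34 minutes, which is where the “five or six tries” comes from.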

The big problem we ran into, actually, was meta-discussion overwhelming the channel. Every new person wanted to speculate about the rules and their effect, and every violation was followed by a long postmortem. At first, we had a scoreboard showing who was the best at talking without violation, but this quickly turned into a competition, destroying actual chat. When we took down the scoreboard and banished meta-discussion of the channel to #meta-discussion, everything worked out nicely. (And, of course, for discussion of the concept of #meta-discussion people had to go to #meta-meta-discussion, and for chat about how silly that whole idea was, we created #meta-meta-meta-discussion …)

You’re welcome to come hang out with us. The moderator bot is running in #xkcd-signal on Foonetic (irc.foonetic.net or irc.xkcd.com). But again, it’s a social channel; take discussion of the concept to #meta-discussion.

If you’d like to run this bot in your own channel, zig has published an initial version of the code here:

http://irc.peeron.com/xkcd/ROBOT9000.html (Perl bot, SQL skeleton, Changelog)

1,377 replies on “ROBOT9000 and #xkcd-signal: Attacking Noise in Chat”

  1. If you wanted to keep forums small with a large userbase you could split it into many fora and randomly assign each IP address to one of them. If someone is a mature enough poster they get promoted to the next one; immature enough and they get demoted. Say you could see any thread you have visited and it will stay with you for the rest of your fora going, but the only new threads you will see are the ones that are posted in your new forum.

    You can follow all links, and PMs work cross-fora.

  2. Hey, nice words. I dunno if a comment already passed about it, but you can create localized sister channels like #xkcd-fr #xkcd-es … this creates new communities where some grow and others stagnate.

    Now looking for the irc server.. 🙂
    mike

  3. I just wanted to throw in a comment that you guys are amazing. I’ll no longer have to deal with the dregs of internet society. You’ve saved perhaps the most vulnerable part of my community.

  4. While it’s a neat idea, I’d suggest what you’re trying to do is very much against the spirit of IRC: or, rather, that this is a problem very endemic to IRC and there’s simply no real way to combat it. The problem isn’t just signal-to-noise: it’s also volume-to-IRC-window-length (since the moment things scroll off screen they lose immense value) and readability/connectivity (since with 250 people, no matter what they’re saying to each other, you will struggle to pick out one specific conversation) which are both major problems that occur in any large channel regardless of how intelligent the discussion in there is.

    Ultimately, splinter factions are what channels were made for. Focus in one channel getting too big? Split it. Let people decide where they want to go. Sure, maybe have one room that’s the catch-all, but then people should expect that the signal ratio will be low there and in the meantime a notice or public directory on a site (since new users won’t know /list) to the effect of ‘if you’d like specific intelligent discussion on topic X, consider heading to #channelX…’.

    tl;dr but I have to admit that’s a neat bit of work. Though personally I’d just add random nouns to the badwords filter and leave it up to people’s imagination as to what all those ‘s are hiding.

  5. They’re changing sex at Buckingham palace,
    Christopher Robin is now called Alice.

  6. Concerning the current comic: Of course no supernatural powers exist, but there might be physical effects not yet known or understood blah blah blah… er, sorry. What I want to say is, I think it’s possible that, for example, telepathy is a real effect based on physical laws. There HAVE been scientific confirmations of telepathy (search for “Ganzfeld experiment”). The quality of these experiments and the results are controversial, though.

  7. Man, that bot is on STEROIDS. Or something. You shouldn’t make the ban time add up so fast! That’s just cruel. Also, there was an interesting idea on the chat, which goes: the more often something is said, the more time you get banned. So repeating something like “i still don’t see why such a state of affairs logically necessitates the nihilation of metafisics ” would ban you for ten seconds, and saying “lol” bans you for a few hours. ¿good idea or not?

  8. In the interest of ruining good concepts and playing devil’s advocate, I suggest a channel #xkcd-noise where you get kicked for being original and unpredictable. However, each new sentence is added to the log, so eventually every sentence will be usable. Eventually.

  9. It does bring an interesting atmosphere to conversations, I’ll give it that. Maybe time will tell if it brings much value to communication on the channel.

    I’m sure Edward Tufte would love making a visual representation of all of this.

  10. Had a thought while contemplating how my workplace might be able to use this. What if the bot read back the context from the log to an offender? Then if you asked a question that had already been asked it might provide you with an answer.

  11. @shMerker:
    The ubiquitous Purl bot in the semi-official Perl channel of freenode has a feature similar to that. You can teach it associations, and it auto-responds if anyone posts a question in the form of “what is x”.

    A typical way to use it would be:
    user: purl, orange is both a color and a fruit
    purl: ok user.
    anotheruser: what is orange?
    purl: It has been said that orange is both a color and a fruit.

    It also has a bunch of other features. I’m thinking it could be fairly straightforward to make a new plugin for the bot that implements the rules of the bot on here, possibly with exceptions provided for statements that trigger another handler, such that you could ask a “what is” style question without triggering a mute or kick.

  12. Interesting concept and I am sure you two have already thought of this, but a fresh pair of eyes could never hurt:
    In a cataloging sense, how long does this bot retain the used sentences? And how easily can one remove the entries it stores? Basically, what I am asking is how would the bot handle someone who might jump in and post common phrases. You said that the phrases excluded punctuation, so would common word groupings trigger the bot, or does it examine the entire post from front to end?
    [Person 1] The rocks are in the box, baby!
    [Person 2] Wow, ‘the rocks are in the box, baby’ is going to be the coolest new internet catch phrase ever!
    [Person 1] I know, that’s why the rocks are in the box!
    See where I’m going with it? I don’t know…maybe I’m just looking at it from the stance of spam management. You want to kill off certain phrases, but you can’t take the whole sentence because it’s too easy to manipulate the fluff. On the other hand, you can’t just take certain word groupings, because then a person could get in and post sentence fragments just for the purpose of tripping up the bot. Even if the person is banned for it, does this attack on the dictionary stay, or is the list of acceptable phrases managed on a daily basis…essentially just reverting one back to the position of being a Mod. Which would, inevitably, still have to expand with the size of the community. Granted it’s no longer 1/10…but is 1/100 that much better?

  13. Oooh. ROBOT9000 is my new second-favourite IRC bot, after qwok (who I think hit elvis-mode and is too fat to run anymore, sadly).

    ROBOT9000 is the un-swibble. Well, almost.

    (thanks for xkcd all these years, since it’s my first post, too)

  14. I think although the Dunbar’s Number comment on the size of a social population is correct, it is incomplete. Obviously if people only had one main social community (fraternities and such?) you might get to as much as half Dunbar’s number. It seems that most people’s real friends and such would take up at least 2/3 of the persons they can know. For me, nearly all, but I don’t hang out in the chat rooms.

  15. @Christopher:
    There seems to be some misconception as to what the purpose of this bot is. It isn’t designed (near as I can tell, I had no part in writing or deploying it, just extrapolating from the sidelines) to prevent spammers, or other general bad behavior. In fact it’s trivially easy to game the bot to get around the rules it uses. Ultimately the job of keeping griefers and general undesirables out of the channel falls on the moderators and the ban privileges they get for just that purpose. What this bot is attempting to do however is apply a restriction on the channel’s content to encourage the good members of the community to continue providing good content, and to provide a semi-automated feedback system to indicate when undesirable content is being provided (in this case defined as something someone else already said). In my first couple posts I outlined some ideas I thought might be improvements to the general idea, so you can go read those if you want, but otherwise the bot seems to do what it’s designed to do rather well.

  16. We modified ROBOT9000 for our “#bizzaro-chat” to be more forgiving by eliminating a single “*”. The first 2 hours we spent time, ‘sniping’ people by posting commonly said statements, user specific or otherwise, so when they would join later on they would be muted.

    The chat already is very engaging though the metadiscussion can be bad sometimes. But sometimes they pull off amazing ‘snipes’. Frankly you can make a game out of it by looking at the database for which users have accumulated how much time.

    This bot is an amazing achievement and at only 731 lines very compact. The thing is there is some discussion about how slow the bot would get as our log file after only a few hours was approaching a meg.

  17. for those of you who ever get devoiced from #xkcd-signal for a very long time (>30mins) come along to #signal-fail, where all the failures go….
    i control it, and will be happy to invite you. it runs on a completely different set of rules, based on the robot9000 rules:
    1. No new stuff, everything you say must have been said before on either #xkcd, #xkcd-signal, #bots, #uk or #signal-fail.
    2. All new stuff will get you devoiced for a random amount of time…

  18. It occurs to me that an effective, yet perhaps slightly more evil alteration to the mute policy could be implemented. Instead of calculating mute times based on the number of times a particular user reuses a phrase, the time could be multiplied based on how many times a particular phrase has been used. In this scenario, using an uncommon phrase that happens to have been said before will result in a slap on the wrist, yet using a phrase likely to be a known violation will incur a longer punishment. Pity the unfamiliar user who says hello and is immediately muted for seven years….

  20. The first few times I read any xkcd I thought it was by a woman and I was very excited. Then I found out it was just another dude writing a great internet comic and I was disappointed. Again with this post I immediately thought female, maybe because of the sorority references.

  21. cool, although I think you should wipe its memory after every year just in case some phrases get used again.

  22. Looking at this idea from the viewpoint of constructing Concept Maps I see it could greatly help prompt new ‘propositions’ even though one or two of the three terms are being reused.
    Are you encountering combinatorial explosions when processing these semantic nets with von Neumann machines?

    Jack Ring

  23. Ripoff. I have a MediaWiki bot called “ROBO-BOT 2000”.

    (But seriously: this is quite nifty. I’m always interested in new ways of keeping a high signal-to-noise ratio and generally building awesome online communities. Thanks for a happy good work fun time de luxe!)

  24. They’re not elitist, they’re just better than you. Plus which, they wouldn’t have to implement it if you guys weren’t savage fucktards in the channels.

  25. >When we took down the scoreboard and banished meta-discussion of the channel to #meta-discussion, everything worked out nicely. (And, of course, for discussion of the concept of #meta-discussion people had to go to #meta-meta-discussion, and for chat about how silly that whole idea was, we created #meta-meta-meta-discussion …)

    You should just create a channel called #xkcd-ascent-into-meta for all meta discussions beyond the first one. Including discussion of #ascent-into-meta and all its related discussions.

    Haha, this is a joke of course. I doubt you get much meta-meta discussion nowadays.

  26. Dear Robot 9000,
    I do have to say this is an interesting idea but I don’t believe it will work. Implementation of such a bot will provide some relief in the coming days or weeks but the larger communities will begin to feel pressure as the number of common but very useful phrases are taken. Honestly, with this system on a certain unnamed image board with a certain unnamed, hated moderator, I’ve been ‘muted’ many times in a row out of pure ignorance with what has been said. While at first the times are nothing to worry about, they add up quickly and can cause headaches for frequent posters. Sometimes, your best people contributing to a ‘community’ repeat things said before, especially by others.

    This brings me to my next point, this system becomes a target. It provides incredibly unoriginal people the opportunity to make the original seem unoriginal by spamming boards and channels with content that may come up due to general behavior of the ‘community’. This system is too outspoken. In my opinion, a quieter, more subtle system will attract less attention and far fewer trolls. If this system had a way to slowly remove phrases after a somewhat short period of time (I’d say between 2 weeks and 2 months personally), a way to have a certain threshold of unoriginal posts (to signal the end of a topic with Godwin’s law), and a white list of common phrases (with a maximum threshold for each user to prevent the ‘copypasta effect’), it would be vastly improved as even a visible or ‘loud’ system.

    And “I’m sorry Dave, I’m afraid I can’t do that”

    Your pal in wires and numbers,
    HAL_9000

  27. For those who are alarmed that useful urls might penalise people unjustly, remember, there are a myriad of ways to say “hey, check out http://www.xkcd.com – that webcomic is neato!” or the like.

    For the argument that ROBOT 9000 will destroy awesome memes, I’m sure that a time-delay can be added to when lines are added to the database. A day, maybe a few days, would probably be enough to nurture memes. (This has the added “benefit” of making important, time-sensitive urls safe for the duration of their utility without having to add explanatory text). Lines that are repeated a lot could be made safe in a #meme channel. I suspect that if you want to make a meme garden, this kind of implementation would nurture it rather than destroy it.

    Then again, I’m basing all of this off of sheer speculation – I have not played with the bot in IRC or in code.

  28. I spilled my grape juice all over the carpet while reading this, and you know what?

    I blame you.

    DO A BARREL ROLL IN HELL.

  29. A barrel roll is the (recommended|suggested|correct|indicated|pertinent|optimal|best) action (to (take|perform|engage in|attempt|use|utilize) )?(at (this time|(the )?present)|(for (your|this|the|our)( (presumed|tactical|current))?( (circumstances|situation|environment|engagement))?))[.]
