pubsubhubbub, rss cloud and real time web updates

*Disclaimer: I paraphrase a lot of things from articles I have linked to. Any mistaken assumptions and errors are strictly mine. For the more technically and visually inclined, follow the links. Also, this is a longish post.

Continuing to be the last person to comment on news after world+dog, let's talk about pubsubhubbub and the PushButton way of the web.

Anil Dash has a good series of articles (1,2,3) about real-time update technologies, which he calls PushButton. There are some really simple diagrams explaining the concept. It is actually a simple concept that has been floating around for ages, but it lacked the infrastructure and interest to move it forward.

Let's take an RSS feed. You download an RSS client or use an online one, subscribe to a list of feeds, and rely on the client to periodically poll all the feeds so you can read the updates. A lot of criticism of RSS has been directed at the effort wasted on this polling and periodic updating, which slows things down. Lots of polling by many clients is a waste of machine cycles: you potentially slow down other processes and burn a hole in the ozone. The other issue is freshness. Because your client only checks periodically, you lose out on immediate updates, and if you set the update interval low, you end up using even more machine cycles. This is pretty much the existing scenario for RSS subscriptions.
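To put rough numbers on the polling argument, here is a back-of-the-envelope sketch in Python. The figures (one client, 100 feeds, three updates per feed per day) are made up for illustration; the point is only that polling cost grows with the number of feeds divided by the interval, while pushed traffic grows with the number of actual updates.

```python
# Hypothetical figures for illustration: one client, 100 feeds,
# each feed updated 3 times a day.
MINUTES_PER_DAY = 24 * 60


def polls_per_day(feeds: int, interval_minutes: int) -> int:
    """Requests one client sends per day if it fetches every feed once per interval."""
    # Every feed is fetched on every tick, whether or not anything changed.
    return feeds * (MINUTES_PER_DAY // interval_minutes)


def push_deliveries_per_day(feeds: int, updates_per_feed: int) -> int:
    """Deliveries the same client receives per day if a hub pushes each real update once."""
    return feeds * updates_per_feed


for interval in (60, 15, 5):
    # Worst-case staleness equals the interval: a post published just after
    # a poll is not seen until the next poll.
    print(f"poll every {interval} min: {polls_per_day(100, interval)} requests/day, "
          f"up to {interval} min stale")
print(f"push via hub: {push_deliveries_per_day(100, 3)} deliveries/day")
```

With these made-up figures, polling every 60, 15 or 5 minutes costs 2400, 9600 or 28800 requests a day against 300 pushed deliveries, and the shorter intervals buy freshness only by multiplying requests.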

So what's the way out? Anil and a bunch of others are betting on something dubbed PushButton technology. The premise is this: your feed publisher tells your client that the feed can also be obtained via an intermediary, generally called a 'hub'. Your client then registers with the hub (or multiple hubs) to receive updates. Here's where things differ: instead of the client polling the publisher periodically, the publisher notifies the hub whenever an update is made, and the hub PUSHes the update out to the client. The interaction is almost instantaneous (especially for an online client; see a very, very, very crude experiment here). There are some intricacies to this protocol, which I will touch upon later. But the end result is almost real-time updates.
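The publisher/hub/client flow above can be sketched as a toy, in-memory hub. This is not pubsubhubbub or RSS cloud themselves (there is no HTTP and no verification, and `Hub`, `subscribe` and `publish` are hypothetical names); it only shows the shape of the idea: clients register once, and the publisher's update is pushed to them instead of being pulled.

```python
from collections import defaultdict
from typing import Callable

# A subscriber is modelled as a function that receives the new entry.
Callback = Callable[[str], None]


class Hub:
    """Toy in-memory hub: clients register a callback per topic (feed URL),
    the publisher pings the hub on every update, and the hub pushes it out."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # A client registers interest once instead of polling forever.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, entry: str) -> int:
        # Called by the publisher when a new entry is posted; returns how many
        # subscribers the entry was pushed to.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(entry)
        return len(callbacks)


hub = Hub()
inbox: list[str] = []
hub.subscribe("https://example.com/feed.xml", inbox.append)
hub.publish("https://example.com/feed.xml", "New post: PushButton web")
print(inbox)  # ['New post: PushButton web']
```

The client never asks "anything new?"; it just sits there until the hub calls it, which is the whole trick.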

So who is implementing the three pieces of the puzzle: the publisher that advertises a hub, the hub itself, and a client that understands hubs? And how? There are actually quite a few resources, and most of them make use of (wait for it) RSS/Atom. Another vindication of the truth that RSS is far from dead. Google is backing one of the major (and buzzworthy) approaches, called pubsubhubbub. It relies on a 'link' element whose 'rel' attribute is set to "hub" (an Atom element, which RSS feeds can carry via the Atom namespace). Google is trying to standardize the protocol between the three pieces, also called pubsubhubbub (guess why? :/). pubsubhubbub also attempts to close pitfalls such as a hub not being updated, in which case the client can call the publisher directly.
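Here is a rough sketch of the client side of pubsubhubbub in Python: find the hub advertised in a feed, build the subscription request, and answer the hub's verification check. The `hub.mode`, `hub.topic`, `hub.callback` and `hub.challenge` parameter names come from the pubsubhubbub spec as I understand it; the feed, the URLs and the helper function names are made up, and nothing is actually sent over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_hubs(feed_xml: str) -> list[str]:
    """Return the href of every Atom <link rel="hub"> in an Atom or RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href", "")
        for link in root.iter(f"{{{ATOM_NS}}}link")
        if link.get("rel") == "hub"
    ]


def subscribe_body(topic: str, callback: str) -> str:
    """Form-encoded body a subscriber would POST to the hub (hypothetical helper)."""
    return urlencode({"hub.mode": "subscribe", "hub.topic": topic, "hub.callback": callback})


def answer_verification(query: str, wanted_topics: set[str]) -> tuple[int, str]:
    """Handle the hub's verification GET: echo hub.challenge only for topics we asked for."""
    params = parse_qs(query)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode == "subscribe" and topic in wanted_topics and challenge:
        return 200, challenge
    return 404, ""


# Illustrative RSS 2.0 feed carrying the hub link via the Atom namespace.
FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example blog</title>
    <atom:link rel="hub" href="https://hub.example.com/"/>
    <atom:link rel="self" href="https://example.com/feed.xml"/>
  </channel>
</rss>"""

print(find_hubs(FEED))  # ['https://hub.example.com/']
print(subscribe_body("https://example.com/feed.xml", "https://reader.example.org/push"))
print(answer_verification(
    "hub.mode=subscribe&hub.topic=https://example.com/feed.xml&hub.challenge=abc123",
    {"https://example.com/feed.xml"},
))  # (200, 'abc123')
```

In the real protocol the subscriber POSTs that body to the hub, and the hub then makes a GET request to the callback URL; echoing the challenge back is how the client proves it actually asked for the subscription.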

Another approach is RSS cloud. It has slightly different terminology and dependencies (the hub is called a cloud, it uses the 'cloud' element of RSS, the cloud is hosted on Amazon's cloud rather than Google's, and so on) but offers the same real-time capabilities. A special mention goes to Dave Winer, an originator of this idea: he was one of the few who proposed it way back in 2001.
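For comparison, an RSS cloud client discovers its "hub" from the `<cloud>` element inside `<channel>`. The attribute names below (`domain`, `port`, `path`, `registerProcedure`, `protocol`) are the ones defined in the RSS 2.0 spec; the values and the `find_cloud` helper are illustrative only, and the sketch stops at printing where the client would register rather than making the actual XML-RPC call.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_cloud(rss_xml: str) -> Optional[dict[str, str]]:
    """Return the attributes of the RSS 2.0 <channel><cloud> element, or None if absent."""
    root = ET.fromstring(rss_xml)
    cloud = root.find("channel/cloud")
    return dict(cloud.attrib) if cloud is not None else None


# Illustrative feed; the domain and procedure name are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <cloud domain="cloud.example.com" port="80" path="/RPC2"
           registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc"/>
  </channel>
</rss>"""

cloud = find_cloud(FEED)
if cloud:
    # A real client would now call registerProcedure at this endpoint.
    print(f"register via {cloud['protocol']} at "
          f"http://{cloud['domain']}:{cloud['port']}{cloud['path']} "
          f"using {cloud['registerProcedure']}")
```

Same idea as the pubsubhubbub link, just a different element and a different registration call.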

So what is exciting here? Real-time updates, of course! Well, almost instantaneous in any case. Not only that, but also the advantage of avoiding "lame polling". And remember: RSS is just a syndication format. It does not have to be only for your Twitter updates (Twitter has its own protocol for instant updates, which it doesn't share, btw), your rants on why other people write bad posts, or all the important information about Bhajji slapping paparazzi. It could be used for (Facebook?) game moves, live blogging, cricket scores (aargh... rediff page reloads): any kind of real-time messaging. Decentralizing the message passing might actually be the answer to keeping Twitter-like services from being overwhelmed; remember, you can link to multiple hubs and potentially cycle through them (based on client location?). Plus the protocols are open and rely on open XML formats, which gets extra love. Google has gone the extra mile by ensuring that Google Reader understands hubs, and shared-item feeds now come with a link to the experimental hub. Feedburner is also almost ready, though this feed does not have a hub linked.
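The "link to multiple hubs and cycle through them" idea is my own speculation, not something either protocol prescribes. A minimal sketch of what a client might do, assuming a feed advertises several hubs: rotate through them to spread load, or try them in order and fall back when one is down. The hub URLs and the `try_subscribe` callback are hypothetical stand-ins for a real subscription attempt.

```python
from itertools import cycle
from typing import Callable, Iterator, Optional


def round_robin(hubs: list[str]) -> Iterator[str]:
    """Endlessly rotate through the advertised hubs (hypothetical load spreading)."""
    if not hubs:
        raise ValueError("feed advertises no hubs")
    return cycle(hubs)


def subscribe_with_fallback(hubs: list[str],
                            try_subscribe: Callable[[str], bool]) -> Optional[str]:
    """Try each hub in order; return the first one that accepts, or None if all fail."""
    for hub in hubs:
        if try_subscribe(hub):
            return hub
    return None


# Placeholder hub URLs.
HUBS = ["https://hub-a.example.com/", "https://hub-b.example.com/"]
rotation = round_robin(HUBS)
print([next(rotation) for _ in range(3)])
# Pretend hub-a is down, so this stand-in "subscription" only succeeds on hub-b.
print(subscribe_with_fallback(HUBS, lambda hub: "hub-b" in hub))
```

Picking hubs by client location would just mean ordering `HUBS` differently per client before handing it to either function.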

That is not to say it's all smooth sailing. Eric Smith correctly points out problems and notes that this is, after all, only a gigantic hack. There are also teething issues with setting up new hubs. And competing approaches tend to fragment early adopters. Another real problem is the lack of RSS adoption: with little understanding of the RSS format among the masses and little inclination to experiment, things might take some time to roll out. There is also, in my mind, very little appeal to real-time updates, since a person can follow only so many people at the same time. Whatever we piggyback on top of existing technologies, acceptance by the existing tech community will probably be very slow.

Plus my office is probably going to ban the hub when it becomes popular enough. So, excuse please.