Monday, May 5, 2008

True Data Portability

I've been thinking about data portability and the phantom "online integrated experience" for a long time now. I know it's possible, and it's within our reach. What will make data truly portable, and give us a truly integrated online experience, is the integration of countless web apps and online services that have no previous knowledge of each other. Sites that are agnostic of each other should be able to connect so that one can access a user's data on the other, make more sense of it or do something with it to make it more valuable, and update the user's data on the original site. It would be awesome if I could make a simple web service and integrate it with whichever calendaring site I use (Google Calendar, or another?), whichever to-do list site I use (Remember The Milk, Ta-da List?), and so on. You could use whichever sites you prefer, too, and I wouldn't even need to program any specific APIs into my application. This is possible. They have a name for that -- standards. Standards make things work in compliance with one another. It's the same principle that ushered in the industrial revolution.

We have standards for many things -- some of the most recent to become excited about are OpenID and OAuth. These are a form of standard called a specification. OpenID lets anyone choose or operate their own authentication: a unique and globally identifiable entity (your OpenID provider) vouches that you are who you say you are. (The weakness is that this should often be an entity trusted by both parties, but it ends up being an entity that is proven to be trusted only by the user.) OAuth defines a procedure by which any two sites can talk to each other, while the user waits, in order to share data owned by that user, with the user's permission. These are very good and necessary improvements. OpenID is nice because it gives me a way to identify myself in the same way to all the services I use; OAuth is nice because it allows one site to use and modify my data on another site.

But full integration is still a step ahead of us. Every site still has its own language. Each programmer has to make up his own language for his app or service to speak (it's called an API). An API (Application Programming Interface) is "a specification, that defines the means and language of communication with an application." Almost all web services have an API that uses the same means of communication: HTTP. But the language of communication is still different: each web developer ends up making up his own.
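
To make the problem concrete, here is a small Python sketch. The two sites ("site A" and "site B") and their payload shapes are invented for illustration; they don't correspond to any real service's API. Both answer over HTTP with the same kind of data, yet each needs its own hand-written adapter:

```python
import json

# Hypothetical responses from two to-do services describing the same task.
# Both would arrive over HTTP, but the field names and conventions differ.
SITE_A_RESPONSE = '{"task": {"title": "Buy milk", "done": false, "due": "2008-05-06"}}'
SITE_B_RESPONSE = '{"item_name": "Buy milk", "status": "open", "deadline": "2008-05-06"}'


def parse_site_a(body: str) -> dict:
    """Adapter written specifically for site A's vocabulary."""
    task = json.loads(body)["task"]
    return {"name": task["title"], "completed": task["done"], "due": task["due"]}


def parse_site_b(body: str) -> dict:
    """Adapter written specifically for site B's vocabulary."""
    item = json.loads(body)
    return {
        "name": item["item_name"],
        "completed": item["status"] != "open",
        "due": item["deadline"],
    }


# Same data in the end, but only after writing one adapter per site.
assert parse_site_a(SITE_A_RESPONSE) == parse_site_b(SITE_B_RESPONSE)
```

Every additional site means another adapter like these, which is exactly the work a shared vocabulary would remove.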

What we need is common terminology, common concepts that define an object (a package of data, a record, etc). REST, as it is commonly practiced over HTTP, shows an aspect of this, implying that any object should respond to (at least) four verbs: GET, POST, PUT, DELETE. In the same way, but going a step further, there are common concepts that we can all apply to common data objects:

  • Object: Package of custom data. Read, Update, perhaps Append.

  • Listing: several instances of Object in their abbreviated form. Read (with filtering options).

  • Collection: several instances of Object in their complete form. Read (with filtering options), Append.
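
As a rough illustration of the three concepts above, here is a minimal in-memory sketch in Python. The class and method names (read, update, append, select, summary_fields) are my own assumptions for the example, not part of any existing standard, and the "perhaps Append" on Object is left out:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Object:
    """A package of custom data: Read, Update."""
    id: int
    data: dict[str, Any] = field(default_factory=dict)

    def read(self) -> dict[str, Any]:
        # Complete form: the id plus every field.
        return {"id": self.id, **self.data}

    def update(self, **changes: Any) -> None:
        self.data.update(changes)


class Collection:
    """Several Objects in complete form: Read (with filtering), Append."""

    def __init__(self) -> None:
        self._objects: list[Object] = []

    def append(self, **data: Any) -> Object:
        obj = Object(id=len(self._objects) + 1, data=dict(data))
        self._objects.append(obj)
        return obj

    def select(self, **filters: Any) -> list[Object]:
        # Keep objects whose fields equal every given filter value.
        return [o for o in self._objects
                if all(o.data.get(k) == v for k, v in filters.items())]

    def read(self, **filters: Any) -> list[dict[str, Any]]:
        return [o.read() for o in self.select(**filters)]


class Listing:
    """Several Objects in abbreviated form: Read (with filtering) only."""

    def __init__(self, collection: Collection, summary_fields: tuple[str, ...]) -> None:
        self._collection = collection
        self._summary_fields = summary_fields

    def read(self, **filters: Any) -> list[dict[str, Any]]:
        return [
            {"id": o.id, **{k: o.data[k] for k in self._summary_fields if k in o.data}}
            for o in self._collection.select(**filters)
        ]


todos = Collection()
todos.append(title="Buy milk", done=False, notes="2% if they have it")
todos.append(title="File taxes", done=True, notes="")
titles = Listing(todos, summary_fields=("title",))
print(titles.read(done=False))  # [{'id': 1, 'title': 'Buy milk'}]
```

The point of the sketch is only that a client who knows these three shapes could talk to any site exposing them, without learning that site's private vocabulary.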

37signals has debuted a very important piece with ActiveResource -- calling conventions that dictate common terminology. Using "include" to mean you want an associated object embedded in the object you asked for is one example; using parameters with the same names as model properties in order to filter your results is another. What we need is more of this, but re-thought from the concepts at the foundation of data.
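
Here is a small Python sketch of those two conventions, written from scratch rather than taken from ActiveResource itself. The sample data, the ASSOCIATIONS table, and the function names build_query and handle_query are all hypothetical, and the comma-separated form of "include" is my own assumption:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical data held by some to-do service.
TASKS = [
    {"id": 1, "title": "Buy milk", "status": "open", "owner_id": 10},
    {"id": 2, "title": "File taxes", "status": "done", "owner_id": 11},
]
OWNERS = {10: {"id": 10, "name": "Ann"}, 11: {"id": 11, "name": "Bob"}}

# Association name -> (foreign-key property, lookup table).
ASSOCIATIONS = {"owner": ("owner_id", OWNERS)}


def build_query(include=(), **filters) -> str:
    """Client side: property names double as filter parameter names."""
    params = {name: str(value) for name, value in filters.items()}
    if include:
        params["include"] = ",".join(include)
    return urlencode(params)


def handle_query(query: str) -> list[dict]:
    """Server side: filter by property, then embed the requested associations."""
    params = {name: values[0] for name, values in parse_qs(query).items()}
    include = [name for name in params.pop("include", "").split(",") if name]
    results = []
    for task in TASKS:
        if all(str(task.get(name)) == value for name, value in params.items()):
            record = dict(task)
            for name in include:
                key, table = ASSOCIATIONS[name]
                record[name] = table[record[key]]  # embed the associated object
            results.append(record)
    return results


query = build_query(include=["owner"], status="open")
print(query)                # status=open&include=owner
print(handle_query(query))  # the open task, with its owner embedded
```

Because the client and the server agree on what "include" and property-named parameters mean, neither side needs to know anything else about the other's model.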

I've started a Google Group for this initiative. I can't say how far it will go, but I'd like to get some key people into it to discuss these things. It would be especially wonderful to have people who maintain some of the key implementations (like ActiveResource).
See the group: http://groups.google.com/group/cloudapi -- JOIN if you have ideas to share!
