Wednesday, May 28, 2008

ActiveResource vs SimpleMapper

I wrote SimpleMapper a while back as a replacement for ActiveResource, for two reasons:
1) I needed to work with more than one data source.
2) I needed to use OAuth.

The biggest need was for an ActiveResource with OAuth. I found that it's virtually impossible with the current structure of ActiveResource. (I did get it working, after adding a "callbacks" plugin to ActiveResource and rewriting ARes's core request method.)

So I sat down one morning to write SimpleMapper. The name is rather random -- I like DataMapper, and I wanted to stress that this was just a simple version of a DataMapper / ActiveRecord solution. It took an hour or so to write the ActiveResource parts from scratch, and I was already happier with my solution than with ARes. The simple structure is what makes it so powerful and extensible.

The structure: I started with a Base class that provides methods binding Connection Adapters to Format Adapters. For example, the Base.get(*args) method calls the Connection Adapter's get method with the arguments provided, then uses the Format Adapter to extract individual objects from the returned content. This way, the Base is completely independent of the type of Connection and Format being used. I started out by writing an HTTP adapter and an XML adapter, and later I wrote a JSON adapter. Since all the parsing code is in the adapter, it's extremely simple to write new format adapters. You could easily write a YAML adapter, or even a CSV or plaintext one. Or SQL: write a SQL connection adapter, if you wish (I haven't tackled that yet).
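
To make the adapter split concrete, here is a minimal sketch of the idea, not SimpleMapper's actual code. The class names MemoryConnection and JsonFormat, the parse method, and the example URL are all made up for illustration, and the connection adapter serves canned responses instead of talking HTTP so the sketch runs offline.

```ruby
require "json"

# Hypothetical connection adapter: returns canned bodies from a hash
# instead of making real HTTP requests.
class MemoryConnection
  def initialize(responses)
    @responses = responses
  end

  # Returns the raw body stored for the given URL.
  def get(url)
    @responses.fetch(url)
  end
end

# Hypothetical format adapter: turns a raw JSON body into an array of hashes.
class JsonFormat
  def parse(body)
    data = JSON.parse(body)
    data.is_a?(Array) ? data : [data]
  end
end

# Base knows nothing about HTTP or JSON; it only wires the two adapters together.
class Base
  def initialize(connection, format)
    @connection = connection
    @format = format
  end

  # Fetch raw content through the connection, then let the format extract objects.
  def get(*args)
    @format.parse(@connection.get(*args))
  end
end

connection = MemoryConnection.new(
  { "http://example.com/people" => '[{"name": "Ann"}, {"name": "Bob"}]' }
)
people = Base.new(connection, JsonFormat.new).get("http://example.com/people")
```

Swapping in an XML or YAML format means writing another class with a parse method; Base itself does not change.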

There's one more piece worth mentioning -- ActiveResource does NOT follow the REST way. How? ARes relies on IDs, not URLs. (See Building Web Services the REST Way, down in the section entitled "REST Web Services Characteristics," for authoritative information.) The REST way uses URLs, not IDs. IDs are not reliable. (In thinking about the server side, you should implement a kind of global unique identifier [guid] system for your URLs instead of IDs.)

So SimpleMapper relies only on URLs. The way this works: You call a finder URL with the options you need to find some object, a URL that returns either a collection or a single object, and go from there. Your API *should* include a URL in the object's properties as the identifier of the object, never an ID. Then, to update or delete that resource, SimpleMapper accesses it directly via the URL given in the resource's identifier property.
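
A rough sketch of what "the URL is the identifier" looks like in practice. This is an illustration under assumptions, not SimpleMapper's real interface: FakeServer, Resource, and the "url" property name are hypothetical, and the server is an in-memory hash keyed by URL.

```ruby
# Hypothetical in-memory store standing in for a remote API, keyed by URL.
class FakeServer
  def initialize(records)
    @records = records
  end

  def get(url)
    @records.fetch(url)
  end

  def put(url, attributes)
    @records[url] = attributes
  end

  def delete(url)
    @records.delete(url)
  end
end

class Resource
  attr_reader :attributes

  def initialize(server, attributes)
    @server = server
    @attributes = attributes
  end

  # The identifier is the URL carried in the object's own properties
  # (assumed here to live under the "url" key); no ID is ever used.
  def identifier
    @attributes.fetch("url")
  end

  # Updates go straight to the resource's own URL.
  def update(changes)
    @attributes = @attributes.merge(changes)
    @server.put(identifier, @attributes)
  end

  # Deletes go straight to the resource's own URL as well.
  def destroy
    @server.delete(identifier)
  end
end

url = "http://example.com/people/ann"
server = FakeServer.new({ url => { "url" => url, "name" => "Ann" } })
person = Resource.new(server, server.get(url))
person.update({ "name" => "Ann Lee" })
```

Nothing in Resource ever builds a URL from an ID; it only reads the one the server handed back.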

To me, SimpleMapper wins, hands down. If you want to run an ActiveResource-like API, go ahead and clone the HTTP adapter and change it to construct a URL from an ID rather than reading the object's identifier as a URL.

Feed me an API: He who does not work, shall not eat.

On the road to openness, there is one principle that should not be taken lightly: He who does not work, shall not eat. When you build an application, if you want to feed off someone else's data, the right thing to do is to build an API for your application first.
Take note, Facebook, Plaxo, Google, Microsoft, Yahoo, and every other proprietary system out there! Feed others before you just feed off all their hard work.

For a good background read, try The Value Chain for Information, by Elias Bizannes.

The root of the value chain for information is access to the information in the first place. If you have exclusive access to a piece of information, then you have a monopoly on the entire chain of value for that information. With the dawn of the web, a couple of guys named Larry Page and Sergey Brin (Google) jumped on this fact with a mission to "organize the world's information and make it universally accessible and useful." Good move. They actually took the most logical step. First, we have data. Second, we have to organize it. Third and beyond are numerous other ways to get use and value out of the data. The reason we like Google is that they [say they aim to] hold the monopoly only on the first step, the step very few of us would want to try to tackle in the first place -- and then they freely open the data to the rest of us so we can add value in our own innovative ways.

Probably each one of us, when building a web app, has thought about the monopoly of owning the data. But I implore you to study that value chain of data a little closer, and realize that you cannot cover every way to add value even to the data that you own. You must open it up to others to fill in gaps that you hadn't even thought of.

The ONLY way you can really get by these days is to open up your data. Remember, your app should do one thing and do it well. I've said this before. That one thing should cover a concrete portion of the value chain of information. And you should be providing the data in such a way that it is easy for others to add value to it in whatever way they choose.

Now, if your data is private user data, you'll have to use OAuth to open it up securely, and only with the user's permission. If all, or even some, of your data is public, then provide it publicly. It's up to you how you want to do it, but remember: Feed others before you feed yourself.

Monday, May 5, 2008

True Data Portability

I've been thinking about data portability and the phantom "online integrated experience" for a long time now. I know it's possible, and it's within our reach. It's the integration of countless web apps and online services that have no previous knowledge of each other that will make data truly portable and give us a truly integrated online experience. Sites that are agnostic of each other should be able to connect, so that one can access a user's data on the other, make more sense of it or do something to make it more valuable, and update the user's data on the original site. It would be awesome if I could make a simple web service and integrate it with whichever calendaring site I use (Google Calendar, other?), whichever to-do list site I use (Remember The Milk, Ta-da List?), and so on. You could use whichever other sites you use, too, and I wouldn't even need to program any specific APIs into my application. This is possible. They have a name for that -- standards. Standards make things work in compliance with one another. It's the same principle that ushered in the industrial revolution.

We have standards for many things -- some of the most recent to become excited about are OpenID and OAuth. These are a form of standards, called specifications. OpenID allows anyone to operate their own authentication: it lets a unique and globally identifiable entity (your OpenID provider) vouch that you are who you say you are. (The weakness is that this should usually be an entity trusted by both parties, but it ends up being an entity that is proven to be trusted only by the user.) OAuth defines a procedure by which any two sites can talk to each other with a user on hold in order to share data owned by that user, with the user's permission. These are very good and necessary improvements. OpenID is nice, providing a way I can identify myself in the same way to all the services I use; and OAuth is nice because it allows sites to use and modify my data on another site.

But full integration is still a step ahead of us. Every site still has its own language. Each programmer has to make up his own language for his app or service to speak (it's called an API). An API (Application Programming Interface) is "a specification, that defines the means and language of communication with an application." Almost all web services have an API responsive to the same means of communication: HTTP. But the language of communication is still different: each web developer ends up making up his own.

What we need is common terminology, common concepts that define an object (a package of data, a record, etc). REST shows an aspect of this, implying that any object should respond to (at least) four verbs: GET, POST, PUT, DELETE. In the same way, but going a step further, there are common concepts that we can all apply to common data objects:

  • Object: Package of custom data. Read, Update, perhaps Append.

  • Listing: several instances of Object in their abbreviated form. Read (with filtering options).

  • Collection: several instances of Object in their complete form. Read (with filtering options), Append.

37signals has debuted a very important piece with ActiveResource -- calling on convention to dictate common terminology. Using "include" to mean you want an associated object embedded in the object is one example; using parameters with the same names as model properties to filter your results is another. What we need is more of this, but rethought from the concepts at the foundation of data.
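
As a small illustration of those two conventions, here is a hedged sketch of a helper that builds a finder URL. The finder_url helper, the embed: keyword, the comma-separated encoding of "include", and the example URL and property names are all assumptions made for this sketch; none of them is taken from ActiveResource or any published spec.

```ruby
require "uri"

# Hypothetical helper: filters are passed as parameters named after model
# properties, and associated objects to embed are listed under "include".
def finder_url(base, filters = {}, embed: [])
  params = filters.dup
  # Assumption: multiple associations are joined with commas.
  params["include"] = embed.join(",") unless embed.empty?
  query = URI.encode_www_form(params)
  query.empty? ? base : "#{base}?#{query}"
end

people_url = finder_url(
  "http://example.com/people",
  { "city" => "Austin" },
  embed: ["address"]
)
```

If every API agreed on conventions like these, a client could build such URLs without reading each site's documentation first.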

I've started a Google group for this initiative. I can't say how far it will go, but I'd like to get some key people into it to discuss these things. People who maintain some of the key implementations (like ActiveResource) would be especially welcome.
See the group: http://groups.google.com/group/cloudapi -- JOIN if you have ideas to share!