Monday, May 5, 2008

True Data Portability

I've been thinking about data portability and the phantom "online integrated experience" for a long time now. I know it's possible, and it's within our reach. What will make data truly portable, and give us a truly integrated online experience, is the integration of countless web apps and online services with each other, with no previous knowledge of each other. Sites that are agnostic of each other should be able to connect, so that one can access a user's data on the other, make more sense of it or do something with it to make it more valuable, and update the user's data on the original site. It would be awesome if I could make a simple web service and integrate it with whichever calendaring site I use (Google Calendar, other?), whichever to-do list site I use (Remember The Milk, Ta-da List?), and so on. You could use whichever other sites you use, too, and I wouldn't even need to program any specific APIs into my application. This is possible. They have a name for that -- standards. Standards make things work in compliance with one another. It's the same principle that ushered in the industrial revolution.

We have standards for many things -- some of the most recent to become excited about are OpenID and OAuth. These are a form of standards, called specifications. OpenID allows anyone to operate their own authentication -- it allows a unique and globally identifiable entity (your OpenID provider) to vouch that you are who you say you are. (The fallacy is that it should often be an entity trusted by both parties, but it ends up being an entity that is proven to be trusted only by the user.) OAuth defines a procedure by which any two sites can talk to each other with a user on hold in order to share data owned by that user, with the user's permission. These are very good and necessary improvements. OpenID is nice, providing a way I can identify myself in the same way to all the services I use; and OAuth is nice because it allows sites to use and modify my data on another site.

But full integration is still a step ahead of us. Every site still has its own language. Each programmer has to make up his own language for his app or service to speak (it's called an API). An API (Application Programming Interface) is "a specification that defines the means and language of communication with an application." Almost all web services have an API responsive to the same means of communication: HTTP. But the language of communication is still different: each web developer ends up making up his own.

What we need is common terminology, common concepts that define an object (a package of data, a record, etc). REST shows an aspect of this, implying that any object should respond to (at least) four verbs, the HTTP methods GET, POST, PUT, and DELETE. In the same way, but going a step further, there are common concepts that we can all apply to common data objects:

  • Object: Package of custom data. Read, Update, perhaps Append.

  • Listing: several instances of Object in their abbreviated form. Read (with filtering options).

  • Collection: several instances of Object in their complete form. Read (with filtering options), Append.

37signals has debuted a very important piece with ActiveResource -- calling conventions that dictate common terminology. Using "include" to mean you want an associated object embedded in the object you requested is one example; using parameters with the same names as model properties in order to filter your results is another. What we need is more of this, but re-thinking it from the concepts at the foundation of data.
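To make the three concepts a little more concrete, here is a minimal in-memory sketch in Ruby. It is only an illustration under my own assumptions: the class name, the method names, and the sample data are all hypothetical, and none of it comes from ActiveResource or any existing API. An Object is just a hash of custom data here.

```ruby
# Hypothetical sketch of the Object / Listing / Collection concepts.
# None of these names come from ActiveResource or any real API.
class Collection
  def initialize(summary_keys)
    @summary_keys = summary_keys # properties shown in the abbreviated form
    @objects = []
  end

  # Append: add one Object (a plain hash of custom data).
  def append(object)
    @objects << object
    object
  end

  # Collection read: complete objects, filtered by property name.
  def read(filters = {})
    @objects.select { |obj| filters.all? { |key, value| obj[key] == value } }
  end

  # Listing read: same filtering, but each object in its abbreviated form.
  def listing(filters = {})
    read(filters).map { |obj| obj.select { |key, _| @summary_keys.include?(key) } }
  end
end

events = Collection.new(['id', 'title'])
events.append('id' => 1, 'title' => 'Lunch', 'owner' => 'dan', 'notes' => 'Bring maps')
events.append('id' => 2, 'title' => 'Review', 'owner' => 'amy', 'notes' => 'Room 4')

full  = events.read('owner' => 'dan')    # complete form
brief = events.listing('owner' => 'dan') # abbreviated form: id and title only
```

The filter keys are simply the property names, which is the same convention described above for filtering by model properties.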

I've started a Google group for this initiative. I can't say how far it will go, but I'd like to get some key people into it to discuss these things. It would be especially wonderful to have people who hold some of the key implementations (like ActiveResource).
See the group: http://groups.google.com/group/cloudapi -- JOIN if you have ideas to share!

Monday, April 28, 2008

Three Web Trends to keep your eye on

Trend #1: Do one thing and do it well.

In developing web apps, translate: "Replace feature-bloat with KISS and Integration." I learned this lesson in two or more areas of life separately and realized it's a global principle. The same principle was learned in the Industrial Revolution (throughout the 1800s) -- each worker can excel at making one simple part with a high level of quality and consistency, and the product can be put together with these many parts. Of course leaders of people know this same principle -- when each person has their tasks well-defined and knows how to accomplish them efficiently and excellently, there is little limit left as to what can be accomplished.

Why do I call this a trend? You've heard of the "KISS" concept -- Keep It Simple, Stupid. The more I use the internet (and I'm sure it's not just me), the more I find it's the simple services that are the most useful. The ones that simply do one thing for me, but they do it well, keep it succinct, stay out of the way and let me do my thing, not trying to do more (or make me do more) than I want. Gmail -- have you noticed that of all things it could be, it's remained ONLY an email service? All of the energy and effort has gone into making it easier to use, more intuitive to operate, and staying out of your way. Any extra tools (contact groups, etc) are kept in the toolbox, not in your hand.

This is also a psychological thing. People will come to your web app and think, what is this service? Notice: They don't think, "What can it do for me?", they think, "What need do I [already] have that this service can fill?" And then, "Is it worth the trouble?" This last question I don't think I've heard enough in a developer's evaluation of a web app. Some services are killer, the interface beautiful, the monetization simple and invisible, even the solution simple; but if it's not worth the trouble of using the site, the service is worthless. How can we navigate this psychological narrows? Make your app do one thing and do it well. Advertise the one thing it does. Feature-lacking? Open an API and connect with other sites. Add functionality by making separate and distinct web services yourself, and connecting the two. (If you own them both, of course you have advantages for integration, but do keep them separate and allow them to perform their distinct service to the world independently.)

Trend #2: Users are beginning to take Integration for Granted.

In practical terms: "Integration is Key. Do it."

In this wonderful world of online applications, this is something that started out as cool and somewhat helpful and has grown into something that is necessary to forward movement. The world of online applications must start to link arms in order to grow. Many of these simple applications (as following Trend #1) are so good on their own, but we all know they'd be even better together. Thus the word "Mashup" was birthed. Mashups combine two or more services into one to provide a more valuable service. That's great, but it's not the integration I'm talking about. I'm talking about in-app integration -- like access to your Gmail contacts from Google Calendar and Google Docs. There's a really useful tool called "Presdo" that was recently publicized and can export to several calendars. Real integration would mean any user can voluntarily "add" Presdo data to their calendar app -- no matter what calendaring app they use.

But there's a problem. Integration is hard. Yes, most of my work in programming has been focused on integration of systems, migrating or communicating data, etc. I still say integration is hard. Why? Because 1) everybody manages a different type of dataset (one manages a calendar, the other manages a to-do list), and 2) everybody organizes their data differently. It would be nice if everyone could structure their data the same way, follow global naming conventions, and provide the same API; but people also use their data differently, and some structures are more efficient for certain purposes. What I'm saying is, everyone's underlying data structures will always be different, and integrating systems on this level is hard; BUT the last idea, providing the same API -- or at least API language -- is not impossible. We already have standards such as XML and JSON that allow us to at least speak the same language. The same language for data. In the database world, we have SQL as a sort of standard -- a language devised to handle requests for specific data very elegantly. But in the web world we have no standard. Communicating app-to-app is a new thing, and this new "language" will soon become necessary. It becomes evident with the somewhat recent birthing of "DataPortability" that we are on this path -- the prerequisite step of making our data usable to others in a secure manner is being finalized, and the step of standardizing how we communicate about this data is the next. Everyone who creates an API has to make up their own language, or set of request URLs and query strings, to communicate what data is being requested. We need a standard.
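Here is a rough Ruby sketch of what a shared request "language" could buy us. The convention is one I made up for illustration (the path names the collection, and every filter is a query parameter named after the property it filters on), and the example.com hosts, collection names, and parameters are hypothetical. The point is only that one small function could build requests for any site that followed the same convention.

```ruby
require 'uri'

# Hypothetical convention, not an existing standard: the path names the
# collection, and each filter is a query parameter named after a property.
def collection_url(base_url, collection, filters = {})
  url = "#{base_url}/#{collection}"
  return url if filters.empty?

  # Sort the parameters so the same request always yields the same URL.
  query = URI.encode_www_form(filters.sort_by { |key, _| key.to_s })
  "#{url}?#{query}"
end

calendar = collection_url('https://calendar.example.com', 'events',
                          'owner' => 'dan', 'include' => 'attendees')
todos = collection_url('https://todo.example.org', 'tasks', 'done' => 'false')
```

Neither call needs to know anything site-specific beyond the base URL and the names of the data it wants.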

I'll post about this point further soon, and I think I'll entitle it, "He who will not work shall not eat."

Trend #3: Integration Bloat.

Wait a minute, you say. Isn't that contradictory to what you just ranted long and hard about? No, what I mean by "Integration Bloat" is what you get if you misuse the above principle. Users are beginning to take Integration for Granted. Think about this: What if Subway tried to convince you that they were better because they have mayonnaise? That would be ridiculous -- of course they have mayo. It's the same for web apps. We're coming to the point where I really shouldn't have to choose my favorite application based on who they connect with. I should be able to expect that they will integrate with other sites. Plaxo is an example of a failure here -- I don't know why they exist anymore (as in, what their focus is) because they put so much emphasis on all their integrations. If they showed me one good reason to use their site, only ONE THING they do better than anybody else, I'd be likely to use them. But their integration bloat has gotten proud of itself and starts to get in the way.

(These trends are all related.)

Just a note on the topic of trends in general: these trends are all related. One gives birth to another, and the existence of one causes others to exist as well. Think just a little further and you will see connections between these trends, their parents, and the postmodern way of thinking that we live in. The good lessons from our culture we can apply to programming. The bad lessons we can learn from and fix. And if you haven't noticed it already, the postmodern mind is probably the first whose way of thinking has been primarily influenced by the paths of technological development. Things like Wikipedia have both come from and contributed to the postmodernist's belief that the majority is right (and not because they choose the right thing but because they are the majority). The blog has both resulted from and fed back into the idea that we all have the right to express ourselves to the public at large.


Please feel free to post comments, longer comments, or link to a response post on your own blog.

Monday, March 31, 2008

HashMagic: a utility hash extensions gem

Rubyforge: http://rubyforge.org/projects/hash-magic/
Documentation: http://hash-magic.rubyforge.org/
Install:

gem install hash_magic

HashMagic, first of all, doesn't touch Hash except for two methods. Like the FormattedString gem, it just provides a way to get a "different kind" of the original object.

HashMagic provides two special kinds of hashes: OrderedHash and SlashedHash. OrderedHash is just what it says - a hash that keeps its order. You can specify a static order, if you want. SlashedHash provides a way to expand and collapse a multi-level hash structure into a flat single-level hash, with keys joined by a slash ('/') character.
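For a feel of the expand-and-collapse idea, here is a standalone sketch in plain Ruby. To be clear, this is not HashMagic's code or API: flatten_slashed and expand_slashed are hypothetical helper names, and the sketch assumes string keys that contain no slash themselves. (It also drops empty nested hashes when collapsing.)

```ruby
# NOT the HashMagic API: a plain-Ruby sketch of collapsing and expanding.

# Collapse a nested hash into a single level, joining keys with '/'.
def flatten_slashed(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), flat|
    path = prefix ? "#{prefix}/#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(flatten_slashed(value, path))
    else
      flat[path] = value
    end
  end
end

# Expand slash-joined keys back into a nested hash.
def expand_slashed(flat)
  flat.each_with_object({}) do |(path, value), nested|
    *parents, leaf = path.split('/')
    target = parents.inject(nested) { |level, key| level[key] ||= {} }
    target[leaf] = value
  end
end

nested = { 'name' => 'Dan',
           'emails' => { 'personal' => 'dan@example.com', 'work' => 'dan@example.org' } }
flat = flatten_slashed(nested)
# flat has the keys 'name', 'emails/personal' and 'emails/work'

# Weeding out a deep key is now a one-liner on the flat form.
trimmed = expand_slashed(flat.reject { |path, _| path == 'emails/work' })
```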

There are times when this becomes very handy -- like when I want to weed out certain keys at various levels in the hash structure, only if they're there, without having to make a very complex method to walk around the structure and not complain when things don't exist; or when I want a hash structure of ordered hashes -- yes, they play together nicely. You can order a SlashedHash, even setting the order of keys in deep levels of the structure.

This was necessary for generating xml for quickbooks qbxml -- the elements must be ordered correctly, in a multi-level structure. This was also necessary in the SimpleSync gem, for mapping attributes -- the top-level attribute 'email' in one source (quickbooks) maps to a second-level attribute on the other side, inside an association: "<emails><personal>...email...</personal></emails>". All I have to do to accomplish that is 'email' => 'emails/personal'.
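The 'email' => 'emails/personal' mapping can be pictured with another small plain-Ruby sketch. Again, set_path and map_attributes are hypothetical names of my own and not part of HashMagic or SimpleSync; the only thing taken from above is the idea that a mapping target may be a slash-separated path into a nested structure.

```ruby
# Hypothetical helpers, not part of HashMagic or SimpleSync.

# Write a value at a slash-separated path, creating nested hashes as needed.
def set_path(hash, path, value)
  *parents, leaf = path.split('/')
  target = parents.inject(hash) { |level, key| level[key] ||= {} }
  target[leaf] = value
  hash
end

# Translate attributes using a mapping whose targets may be slash paths.
# Attributes missing from the source are skipped rather than set to nil.
def map_attributes(attributes, mapping)
  mapping.each_with_object({}) do |(from, to), result|
    set_path(result, to, attributes[from]) if attributes.key?(from)
  end
end

mapping = { 'email' => 'emails/personal', 'name' => 'name' }
mapped = map_attributes({ 'email' => 'dan@example.com', 'name' => 'Dan' }, mapping)
```

The top-level 'email' value ends up one level down, under 'emails' and then 'personal', which is the shape the other side expects.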

You wanted to see code? Look at the HashMagic Specs!

Friday, March 28, 2008

SimpleSync: a synchronizer for ANYTHING

 
Step One: Set up a few models -- use SimpleMapper, DataMapper, ActiveRecord, ActiveResource, my Quickbooks gem, or any other ORM you can get to respond correctly to the few essential methods: obj = Model.get(finder_options={}); obj = Model.new(attributes={}); obj.save; obj.delete.
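If you don't have an ORM handy, a toy model that answers to those four methods might look like the sketch below. MemoryModel is a hypothetical class made up for illustration, not part of SimpleSync or SimpleMapper. The post doesn't say whether get returns one record or several, so returning an array of matches is my assumption, as is storing attributes in a hash keyed by strings.

```ruby
# Hypothetical in-memory model exposing get / new / save / delete.
# Assumption: get returns an array of every record matching the options.
class MemoryModel
  @records = {}
  @next_id = 1

  class << self
    attr_reader :records

    # Model.get(finder_options): records whose attributes match every option.
    def get(finder_options = {})
      records.values.select do |record|
        finder_options.all? { |key, value| record.attributes[key] == value }
      end
    end

    # Hand out sequential ids for newly saved records.
    def next_id
      id = @next_id
      @next_id += 1
      id
    end
  end

  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes.dup
  end

  # Assign an id on first save, then store the record under that id.
  def save
    @attributes['id'] ||= self.class.next_id
    self.class.records[@attributes['id']] = self
    true
  end

  def delete
    self.class.records.delete(@attributes['id'])
    true
  end
end

ada = MemoryModel.new('name' => 'Ada')
ada.save
found = MemoryModel.get('name' => 'Ada')
```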

Step Two: Follow these simple steps to synchronize your data sources:

1) Create a syncer object

@sync = SimpleSync::Syncer.new( <last_sync_time> )

2) Add some sources

@bug_source = @sync.add_source(<source_model>, <finder_options>, <common_id_attribute_name (example: id)>)
@juice_source = @sync.add_source(<source_model>, <finder_options>, <common_id_attribute_name (example: bug_id)>)

3) Define for each source how to grab updated records to sync

@bug_source.new_records {|start_time| <code that returns record objects> }
@bug_source.changed_records {|start_time| <code that returns record objects> }
@bug_source.deleted_records {|start_time| <code that returns record objects> }
@juice_source.new_records {|start_time| <code that returns record objects> }
@juice_source.changed_records {|start_time| <code that returns record objects> }
@juice_source.deleted_records {|start_time| <code that returns record objects> }

4) Map the attribute-conversions for all attributes you want to synchronize

@sync.mappings[@bug_source => @juice_source] = {
  'first_name' => 'FirstName',
  'last_name' => 'LastName',
  'company_name' => 'JuiceCompanyName',
  'birthdate' => 'Birthday',
}
# And usually just an inversion of the first hash, but sometimes not:
@sync.mappings[@juice_source => @bug_source] = @sync.mappings[@bug_source => @juice_source].invert

5) Sync!!

@sync.sync!

6) Wait for a while, then sync again!

@sync.sync!

The purpose of SimpleSync is to handle all of the logic of synchronizing and leave all the specifics of the data to your custom models and attribute mappings. The above code was taken from a working script and slightly modified just to better illustrate the purpose of each piece.
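About that "usually just an inversion, but sometimes not" comment in step 4: the plain-Ruby sketch below shows one case where Hash#invert is not enough. The convert helper and the sample attributes are hypothetical and only stand in for whatever SimpleSync does internally, which I'm not reproducing here. Hash#invert itself is standard Ruby, and when two keys share a value the later key wins.

```ruby
# Hypothetical helper: rename attribute keys according to a mapping.
def convert(attributes, mapping)
  mapping.each_with_object({}) do |(from, to), out|
    out[to] = attributes[from] if attributes.key?(from)
  end
end

forward  = { 'first_name' => 'FirstName', 'last_name' => 'LastName' }
backward = forward.invert # a clean one-to-one mapping inverts safely

juice = convert({ 'first_name' => 'Ada', 'last_name' => 'Lovelace' }, forward)
bug   = convert(juice, backward) # back where we started

# When two source attributes feed the same target, invert keeps only the
# later key, so the reverse mapping has to be written by hand.
lossy = { 'nickname' => 'FirstName', 'first_name' => 'FirstName' }
lossy_backward = lossy.invert
```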

Install the gem:
gem install simplesync

Now get this: I have a fully-functional synchronizer between Quickbooks Customers and my addressbook website (not yet public), using 3 of my gems: SimpleSync, SimpleMapper, and Quickbooks -- in a script of only 150 lines. Beat that, I dare you. :)