Searching across websites with OpenSearch

Providing search services that span a number of disparate websites is a challenging problem that in the past has been left to the big players such as Google. However, Amazon's OpenSearch RSS format is changing this reality by giving small development teams a means of deploying effective multiple-website search at low cost.

Background

Most organisations comprise a number of different interest groups (I like to think of them as factions), and when it comes to external and internal websites it proves far more efficient to let these groups build and maintain their own independent sites than to combine them under a single unified banner and management structure. The reasons for this are pragmatic rather than technical. In fact, from a purely technical perspective it is far easier to concentrate on building a single massive website, as this means one architecture, one management group and a homogenised user base.

In reality, the idea of a single website that rules them all is almost impossible to realise. Because the stakes are so high and the participants so diverse, making decisions is a slow and politically painful process. Such a minefield can be avoided if the management team is led by a strong-willed, 'my way or the highway' personality who can make clear decisions that allow the technical team to produce the best possible solution. Unfortunately, the chances of working with a management team whose strong leader can also juggle Web, technical and business needs are lower still, and without such skills the project may be in even more danger than one without such a personality.

Consequently, it is generally preferable to allow the individual groups to develop their own websites independently of each other, thereby spreading risk and distributing political friction. From a user perspective such a strategy is beneficial because it allows solutions to be tailored to the specific business needs and technical capability of the group rather than to the imagined needs of the 'average' user. Unfortunately, from a technical perspective this distributed architecture not only dilutes resources but also raises a number of questions around identity management and around areas where organisation resources should be unified, such as search. With good planning the problem of identity can be resolved through intelligent use of directory systems, for example eDirectory and its associated technologies. Searching multiple, perhaps dramatically different, websites is a different problem altogether.

Providing dynamic search without a Google-like infrastructure

The traditional approach to solving cross-party search is to use an independent search index such as a Google appliance or an in-house solution. The primary drawback of such an approach is that this independent system requires as much ongoing maintenance as the websites it is intended to service. Because the index is a cached copy, its search results can also be out of date or leave out the latest content. From a productivity and user satisfaction perspective this can be almost worse than having no search functionality at all. A more effective solution, one that provides up-to-date results without the need for an independent search system, is provided by Amazon's OpenSearch format and RSS aggregation, as illustrated by the diagram below:

 

 

Instead of providing a separate search architecture, OpenSearch is intended to be applied within the existing search mechanisms of the individual websites. Rather than presenting traditional HTML-formatted search results, an OpenSearch-enabled search engine returns results as a specially formatted RSS feed. RSS is a simple XML standard for sharing information about website content that is rapidly gaining widespread industry acceptance. Because RSS is formatted in a manner computers can read, it allows content on a website to be processed and acted upon automatically, without human intervention. The upside of this is that multiple OpenSearch RSS feeds from disparate organisation websites can be aggregated (retrieved and combined) by the computer and presented to the user as a single set of up-to-date and relevant search results. This standardised process removes the need for a dedicated, organisation-wide search index, as each of the websites in question can easily perform the search itself. The added benefit of such a model is that results can be tailored to the needs of the user group in question rather than being returned in the one-size-fits-all format an independent search index provides.
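To make the aggregation step concrete, here is a minimal Python sketch of how feeds from two sites might be combined. It is an illustration under stated assumptions rather than a reference implementation: the example.org addresses, the function names and the sample feeds are all hypothetical, the feeds are assumed to use the OpenSearch 1.1 namespace, and the simple round-robin merge is just one possible way of ordering combined results. A real aggregator would fetch each feed over HTTP instead of reading it from a string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Namespace of OpenSearch 1.1 response elements (older 1.0 feeds use a different URI).
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical feeds, inlined so that the sketch runs without network access.
HR_FEED = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>HR site search: leave</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item><title>Leave policy</title><link>http://hr.example.org/leave</link></item>
    <item><title>Leave form</title><link>http://hr.example.org/form</link></item>
  </channel>
</rss>"""

IT_FEED = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>IT site search: leave</title>
    <opensearch:totalResults>1</opensearch:totalResults>
    <item><title>Leave calendar</title><link>http://it.example.org/calendar</link></item>
  </channel>
</rss>"""


def build_search_url(template, terms):
    """Fill the {searchTerms} placeholder of an OpenSearch URL template."""
    return template.replace("{searchTerms}", quote_plus(terms))


def parse_feed(rss_text, source):
    """Return (reported total, list of result items) for one OpenSearch RSS feed."""
    channel = ET.fromstring(rss_text).find("channel")
    total = channel.findtext("{%s}totalResults" % OPENSEARCH_NS, default="0")
    items = [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]
    return int(total), items


def aggregate(feeds):
    """Interleave results from several feeds, skipping links already seen."""
    parsed = [parse_feed(text, source)[1] for source, text in feeds.items()]
    merged, seen = [], set()
    longest = max((len(items) for items in parsed), default=0)
    for rank in range(longest):
        # Take the result at this rank from each site in turn.
        for items in parsed:
            if rank < len(items) and items[rank]["link"] not in seen:
                seen.add(items[rank]["link"])
                merged.append(items[rank])
    return merged


# Usage: combine both sites' results into a single list for display.
results = aggregate({"hr": HR_FEED, "it": IT_FEED})
for result in results:
    print(result["source"], result["title"], result["link"])
```

Interleaving by rank keeps each site's own ordering intact and gives every site a place near the top of the combined list. Ranking by a shared relevance score would need information that a plain RSS item does not carry.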

Implementing OpenSearch aggregation

Given these benefits, it is fortunate that implementing OpenSearch-style searching is getting easier by the day. For example, the Drupal content management system now offers two open source modules, one providing OpenSearch search results and the other multi-site OpenSearch aggregation. Consequently, this functionality can be implemented in minutes, and at a fraction of the cost, rather than the days needed for a traditional independent search index.

Whilst Amazon's OpenSearch may not be the ultimate incarnation of the RSS-based search concept, it is mature and signals the way ahead for the technology. Given the rate of development in the field, the next few years should be very exciting when it comes to simple, syndicated search that is accessible to the masses. If the concept proves successful, it may herald the downfall of the global search index as the best way of finding things on the Internet.