Web Searching of CAD content

David — Wed, 17 Jan 2007 09:41:28 +0000

Recently Scott Sheppard from Autodesk blogged about Docupoint Discovery, an intranet/Internet search engine for AutoCAD files. It works by parsing binary AutoCAD files and indexing their textual and numerical content. Whilst it is not especially intelligent (i.e. it doesn't make spatial assumptions based on the actual models submitted), it does help Autodesk workgroup users find information faster. The upshot of the Docupoint Discovery system is that you don't actually need a copy of AutoCAD: it reads the binary files into the index, and if you need a quick preview it uses Autodesk's own DWF viewer technology to show it to you (now that is really helpful).

ArchiCAD Mac users can get similar functionality by harnessing OS X's Spotlight and the freely available ArchiCAD Spotlight plug-in. With this plug-in installed, OS X can index all your ArchiCAD files (alongside all the other relevant project data like PDF files). Then, with the next version of OS X (Leopard) or the open-source Weblight server, you can search your Spotlight index on the intranet/Internet via a web browser. It does not offer the DWF-based preview option of Docupoint Discovery, but for a zero-cost, minimum-configuration solution it is not too shabby.

Personally I can envision these services operating very successfully if a service-oriented architecture approach were taken and appropriate plug-ins were provided within CAD software. For example, it would be ideal if AutoCAD were capable of generating its own DWF preview and submitting this plus the indexable data over a secure connection to a remote, hosted search service. Such a capability would remove the need for dedicated search infrastructure within the architecture office and provide a level of universal searching across CAD formats that was previously achieved only with text data such as HTML, PDF and Office documents.

A diagram of a service-oriented approach to CAD search indexing

These search services could be operated by CAD vendors, by external parties (e.g. Flickr style) or using a traditional internally maintained server model. Search results would be exposed via a Google-style interface with 3D preview (in DWF or PDF, depending on what is submitted) or via RSS using the OpenSearch standard. Unlike conventional Internet search engines, content would be pushed to the search indexes rather than gathered by automated Web spiders. This system would give content owners more control over which data files are submitted to the index and how often this happens. From a technical perspective this is very similar to how successful, real-time Web search engines like Technorati behave today.
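The push model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Submission` record, the `PushIndex` class and their fields are hypothetical names invented for this example, and a real service would need authentication, persistence and a proper text analyser. The point it shows is that the index only ever holds what the owner chose to send (extracted text, a preview location and a contact), never the drawing itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Submission:
    """What a hypothetical CAD plug-in would push: metadata only, not the drawing."""
    file_id: str       # opaque identifier chosen by the file owner
    owner: str         # who to ask for access to the real file
    text: str          # textual/numerical content extracted from the drawing
    preview_url: str   # location of a DWF or PDF preview


class PushIndex:
    """Toy inverted index that only knows about content pushed to it."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of file_ids
        self._records = {}                 # file_id -> Submission

    def submit(self, sub):
        # Re-submitting replaces the old entry, so the owner controls freshness.
        self.remove(sub.file_id)
        self._records[sub.file_id] = sub
        for term in sub.text.lower().split():
            self._postings[term].add(sub.file_id)

    def remove(self, file_id):
        # Owners can withdraw a file from the index at any time.
        old = self._records.pop(file_id, None)
        if old is not None:
            for term in old.text.lower().split():
                self._postings[term].discard(file_id)

    def search(self, query):
        # Return submissions whose text contains every query term.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(set(self._postings.get(t, ())) for t in terms))
        return [self._records[i] for i in sorted(hits)]


index = PushIndex()
index.submit(Submission("a1", "alice@example.com",
                        "ground floor plan stair core", "previews/a1.dwf"))
index.submit(Submission("b2", "bob@example.com",
                        "roof plan drainage", "previews/b2.pdf"))

for hit in index.search("plan"):
    print(hit.file_id, hit.owner, hit.preview_url)
```

Because nothing is crawled, a file that is never submitted (or is later removed) simply cannot appear in results, which is the control over content the push approach is meant to give.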

Such an architecture would have three significant benefits. Firstly, users would feel comfortable using these search services as their valuable intellectual property is not being handled (or stored) by a third party. This is because all that would be exposed to the third party is the search index details and a relatively worthless (in IP terms) 3D preview. Secondly, industry adoption should in theory be relatively high given this level of data security, the low (if not free) operating costs and the relative ease with which such a service could be utilised. This widespread adoption would enable teams of people in different offices or companies to immediately gain the benefits of searchable CAD data without having to invest in expensive, internal infrastructure. Finally, by exposing search results in a standard format (such as OpenSearch), AEC professionals would be able to cross the vendor barriers currently enforced when it comes to managing files of different types. Whilst a Bentley user may not be able to open the Autodesk file, they would still gain important insight into what it was about (via the indexed metadata) and what it was like (via the 3D preview).

As these search services would not host the actual data file, responsibility for granting access would be in the hands of the file owner. In close working relationships the data file may be located on a shared network drive, whilst in remote situations (physically or professionally) access would be requested from the file owner via email or via some form of file transfer medium (instant messaging, FTP, etc).

Searching across websites with OpenSearch

David — Mon, 18 Dec 2006 01:14:36 +0000

Providing search services that span a number of disparate websites is a challenging problem that in the past has been left to the big boys such as Google. However, Amazon's OpenSearch RSS format is changing this reality and providing a means for small development teams to deploy effective multiple-website search at low cost.

Background

Most organisations comprise a number of different interest groups (I like to think of them as factions), and when it comes to external and internal websites it proves far more efficient to let these groups build and maintain their own independent sites rather than combine them under a single unified banner and management structure. The reasons for this are pragmatic rather than technical. In fact, from a purely technical perspective it is far easier to concentrate on building a single massive website, as this means one architecture, one management group and a homogenised user base.

In reality the idea of a single website that rules them all is almost impossible to realise. Because the stakes are so high and the participants so diverse, making decisions is a slow and politically painful process. Such a minefield can be avoided if the management team is led by a strong-willed, 'my way or the highway' personality who can make clear decisions that allow the technical team to produce the best possible solution. Unfortunately the chances of working with a strong leader who is also capable of juggling Web, technical and business needs are lower still, and a strong personality without such skills may put the project in even more danger than having no such leader at all.

Consequently it is generally preferable to allow the individual groups to develop their own websites independently of each other, thereby spreading risk and distributing political friction. From a user perspective such a strategy is beneficial because it allows solutions to be tailored to the specific business needs and technical capability of the group rather than satisfying the imagined needs of the 'average' user. Unfortunately, from a technical perspective this distributed architecture not only dilutes resources but also raises a number of questions around identity management and other areas where the organisation's resources should be unified, such as search. With good planning the problem of identity can be resolved through intelligent use of directory systems, for example eDirectory and its associated technologies. Searching multiple, perhaps dramatically different websites, however, is a different problem altogether.

Providing dynamic search without a Google-like infrastructure

The traditional approach to solving cross-party search is to use an independent search index such as a Google appliance or an in-house solution. The primary drawback of such an approach is that this independent system requires as much ongoing maintenance as the websites it is intended to service. From the perspective of the user, search results from a cached search index can also leave out the latest content or be out of date. From a productivity and user satisfaction perspective this can almost be worse than having no search functionality at all. A more effective solution that provides up-to-date results without the need for an independent search system is provided by Amazon's OpenSearch format and RSS aggregation as illustrated by the diagram below:

Instead of providing a separate search architecture, OpenSearch is intended to be applied within the existing search mechanisms of the individual websites. Rather than presenting traditional HTML-formatted search results, an OpenSearch-enabled search engine returns results as a specially formatted RSS feed. RSS is a simple XML standard for sharing information about website content that is rapidly gaining widespread industry acceptance. Because it is formatted in a manner computers can read, RSS allows content on a website to be processed and acted upon automatically without human intervention. The upside of this is that multiple OpenSearch RSS feeds from disparate organisation websites can be aggregated (retrieved and combined) by the computer and presented to the user as a single set of up-to-date and relevant search results. This standardised process negates the need for a dedicated, organisation-wide search index, as each of the websites in question can easily perform this task itself. The added benefit of such a model is that results can be tailored to the needs of the user group in question rather than being returned in the one-size-fits-all format an independent search index provides.
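To make the aggregation step concrete, here is a minimal Python sketch using only the standard library. It assumes each site returns an RSS 2.0 feed carrying the OpenSearch 1.1 `totalResults` element. The two sample feeds, the `example` URLs and the round-robin merge are assumptions made for illustration (OpenSearch itself does not say how results from different sites should be ordered), and a real aggregator would fetch the feeds over HTTP rather than read them from strings.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenSearch 1.1 response elements.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical responses from two independent site search engines.
FEED_A = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Site A</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item><title>Stair detail</title><link>http://a.example/1</link></item>
    <item><title>Roof plan</title><link>http://a.example/2</link></item>
  </channel>
</rss>"""

FEED_B = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Site B</title>
    <opensearch:totalResults>1</opensearch:totalResults>
    <item><title>Plan review notes</title><link>http://b.example/1</link></item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return (total_results, items) for one OpenSearch RSS response."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    total = channel.findtext(f"{{{OPENSEARCH_NS}}}totalResults", default="0")
    items = [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]
    return int(total), items


def aggregate(feeds):
    """Interleave results site by site and drop duplicate links."""
    parsed = [parse_feed(feed) for feed in feeds]
    total = sum(count for count, _ in parsed)
    longest = max((len(items) for _, items in parsed), default=0)
    merged, seen = [], set()
    for rank in range(longest):
        for _, items in parsed:
            if rank < len(items) and items[rank]["link"] not in seen:
                seen.add(items[rank]["link"])
                merged.append(items[rank])
    return total, merged


total, results = aggregate([FEED_A, FEED_B])
for r in results:
    print(f'{r["source"]}: {r["title"]} <{r["link"]}>')
```

Interleaving by rank is just one simple way to stop a single site dominating the combined list; since each site ranks its own results, any cross-site ordering is a design decision for the aggregator.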

Implementing OpenSearch aggregation

Given these benefits, it helps that implementing OpenSearch-style searching is getting easier by the day. For example, the Drupal content management system now offers two open-source modules that provide both OpenSearch search results and multi-site OpenSearch aggregation. Consequently this functionality can be implemented in minutes, and at a fraction of the cost, instead of the days needed for a traditional independent search index.

Whilst Amazon's OpenSearch may not be the ultimate incarnation of the RSS-based search concept, it is mature and signals the way ahead for the technology. Considering the rate of development in the field, the next few years will definitely be very exciting when it comes to simple, syndicated search that is accessible to the masses. If the concept proves successful it may herald the downfall of the global search index as the best way of finding things on the Internet.
