Web Searching of CAD content

Recently Scott Sheppard from Autodesk blogged about Docupoint Discovery, an intranet/Internet search engine for AutoCAD files. It works by parsing binary AutoCAD files and indexing their textual and numerical content. Whilst it is not super intelligent (i.e. it doesn't make spatial assumptions based on the actual models submitted), it does help Autodesk workgroup users find information faster. The upshot of the Docupoint Discovery system is that you don't actually need a copy of AutoCAD: it reads the binary files into the index, and if you need a quick preview it uses Autodesk's own DWF viewer technology to show it to you (now that is really helpful).
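To make the indexing idea concrete, here is a minimal sketch of an inverted index over text and numbers pulled from drawings. It is not how Docupoint Discovery actually works internally (that is not documented here); the file names, the extracted strings and the functions `tokenize`, `build_index` and `search` are all hypothetical, and the step of parsing the binary DWG file is assumed to have happened already.

```python
import re
from collections import defaultdict

# Words, part codes such as "fd30", and numbers such as "2.4".
TOKEN = re.compile(r"[a-z0-9]+(?:\.\d+)?")


def tokenize(text):
    """Lower-case the text and split it into word and number tokens."""
    return TOKEN.findall(text.lower())


def build_index(drawings):
    """Map each token to the set of drawing names that contain it.

    `drawings` is {file name: list of text strings}. In a real system the
    strings would come from a DWG parser, which is not shown here.
    """
    index = defaultdict(set)
    for name, strings in drawings.items():
        for text in strings:
            for token in tokenize(text):
                index[token].add(name)
    return index


def search(index, query):
    """Return the drawings that contain every token in the query, sorted."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical extracted content, for illustration only.
drawings = {
    "ground_floor.dwg": ["Fire door FD30", "Ceiling height 2.4 m"],
    "roof_plan.dwg": ["Roof pitch 30 degrees", "Fire stopping"],
}
index = build_index(drawings)
print(search(index, "fire"))
print(search(index, "2.4"))
```

Because numbers are indexed as tokens in their own right, a query for a dimension finds the drawing that carries it, which is the kind of "textual and numerical" lookup described above. Nothing here understands geometry.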

ArchiCAD users on the Mac can get a similar set of functionality by harnessing OS X's Spotlight and the freely available ArchiCAD Spotlight plug-in. With this plug-in installed, OS X can index all your ArchiCAD files (alongside all the other relevant project data, such as PDF files). Then, with the next version of OS X (Leopard) or the open-source Weblight server, you can search your Spotlight index on the intranet/Internet via a web browser. It does not offer the DWF-based preview option of Docupoint Discovery, but for a zero-cost, minimal-configuration solution it is not too shabby.

Personally, I can envision these services operating very successfully if a service-orientated architecture approach were taken and appropriate plug-ins were provided within CAD software. For example, it would be ideal if AutoCAD were capable of generating its own DWF preview and submitting this, plus the indexable data, over a secure connection to a remote, hosted search service. Such a capability would remove the need for dedicated search infrastructure within the architecture office and provide a level of universal searching across CAD formats that was previously achieved only with text data such as HTML, PDF and Office documents.
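As a rough sketch of what such a plug-in might send, the snippet below packages the indexable text and a preview into a JSON body and prepares an HTTPS POST. Everything specific is an assumption made for illustration: the endpoint URL, the field names, the bearer-token scheme and the `build_submission` / `build_request` helpers are hypothetical, and no existing AutoCAD or search-service API is implied. The request is built but deliberately not sent.

```python
import base64
import hashlib
import json
import urllib.request

# Hypothetical endpoint; no real search service is implied.
INDEX_URL = "https://search.example.com/api/index"


def build_submission(file_name, owner, text_items, preview_bytes,
                     preview_type="model/vnd.dwf"):
    """Package indexable text plus a preview. The drawing itself is not included."""
    return {
        "file_name": file_name,
        "owner": owner,
        "text_items": list(text_items),
        "preview": {
            "media_type": preview_type,
            # Hash lets the service detect an unchanged or corrupted preview.
            "sha256": hashlib.sha256(preview_bytes).hexdigest(),
            "data_base64": base64.b64encode(preview_bytes).decode("ascii"),
        },
    }


def build_request(submission, api_token):
    """Prepare (but do not send) an HTTPS POST carrying the submission as JSON."""
    body = json.dumps(submission).encode("utf-8")
    return urllib.request.Request(
        INDEX_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,  # assumed auth scheme
        },
        method="POST",
    )


# Placeholder bytes stand in for a real DWF preview.
preview = b"placeholder preview bytes"
submission = build_submission(
    "ground_floor.dwg",
    "owner@example.com",
    ["Fire door FD30", "Ceiling height 2.4 m"],
    preview,
)
request = build_request(submission, api_token="example-token")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted because the
# endpoint is hypothetical.
```

The point of the design is visible in the payload: it carries text and a preview, never the source drawing, so the hosted service has nothing to leak beyond what the owner chose to submit.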

A diagram of a service orientated approach to CAD search indexing

These search services could be operated by CAD vendors, by external parties (e.g. Flickr-style), or using a traditional internally maintained server model. Search results would be exposed via a Google-style interface with a 3D preview (in DWF or PDF, depending on what is submitted) or via RSS using the OpenSearch standard. Unlike conventional Internet search engines, content would be pushed to the search indexes rather than gathered by automated Web spiders. This system would give content owners more control over which data files are submitted to the index and how frequently this happens. From a technical perspective this is very similar to how successful, real-time Web search engines like Technorati behave today.
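For the RSS route, a result feed might look something like the output of the sketch below, which wraps a list of hits in RSS 2.0 and adds the OpenSearch 1.1 response elements (`totalResults`, `startIndex`, `itemsPerPage`). The result records, the example URLs and the `results_to_rss` helper are hypothetical, and for simplicity the sketch assumes all hits fit on one page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("opensearch", OPENSEARCH_NS)


def results_to_rss(query, results, start_index=1):
    """Render search hits as an RSS 2.0 feed with OpenSearch response elements."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "CAD search: " + query
    # Hypothetical search URL, for illustration only.
    ET.SubElement(channel, "link").text = (
        "https://search.example.com/?q=" + quote(query)
    )
    ET.SubElement(channel, "description").text = "Results for " + query

    # Simplification: every hit is returned on a single page.
    for tag, value in (
        ("totalResults", len(results)),
        ("startIndex", start_index),
        ("itemsPerPage", len(results)),
    ):
        ET.SubElement(channel, "{%s}%s" % (OPENSEARCH_NS, tag)).text = str(value)

    for hit in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit["title"]
        ET.SubElement(item, "link").text = hit["link"]  # points at the preview
        ET.SubElement(item, "description").text = hit["summary"]

    return ET.tostring(rss, encoding="unicode")


# Hypothetical hits, for illustration only.
results = [
    {
        "title": "ground_floor.dwg",
        "link": "https://search.example.com/preview/1",
        "summary": "Fire door FD30; ceiling height 2.4 m",
    },
    {
        "title": "roof_plan.dwg",
        "link": "https://search.example.com/preview/2",
        "summary": "Roof pitch 30 degrees; fire stopping",
    },
]
xml_text = results_to_rss("fire", results)
print(xml_text)
```

Each item links to the preview rather than to the drawing, so any feed reader or OpenSearch-aware client can list results without ever touching the original file.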

Such an architecture would have three significant benefits. Firstly, users would feel comfortable using these search services as their valuable intellectual property is not being handled (or stored) by a third party. This is because all that would be exposed to the third party is the search index details and a relatively worthless (in IP terms) 3D preview. Secondly, industry adoption should in theory be relatively high given this level of data security, the low (if not free) operating costs and the relative ease with which such a service could be utilised. This widespread adoption would enable teams of people in different offices or companies to immediately gain the benefits of searchable CAD data without having to invest in expensive, internal infrastructure. Finally, by exposing search results in a standard format (such as OpenSearch), AEC professionals would be able to cross the vendor barriers currently enforced when it comes to managing files of different types. Whilst a Bentley user may not be able to open the Autodesk file, they would still gain important insight into what it was about (via the indexed metadata) and what it was like (via the 3D preview).

As these search services would not host the actual data file, responsibility for granting access would be in the hands of the file owner. In close working relationships the data file may be located on a shared network drive, whilst in remote situations (physically or professionally) access would be requested from the file owner via email or via some form of file transfer medium (instant messaging, FTP, etc.).