Porting documentation to XML

Are you thinking of porting your documentation library to one of the XML-based documentation standards, DITA or DocBook?  Read on, this post is for you!

Who needs XML?

If you work for a start-up or an SME, then you should stick with Microsoft Word and read the rest of this article just for fun – because XML-based documentation is best suited for large organizations with huge documentation libraries.

XML, with its extensible tagging capabilities, is used by large writing shops because it maintains a clear separation between content and presentation.  This is valuable for two reasons:

  • It becomes a no-brainer to modify the look and feel of your documents.  So when the new marketing manager decides to change the logo, or gets excited about that cool 9-point font, it’s no problem: just tweak a couple of XSLT stylesheets, and publish!
  • R&D departments reduce delays and improve accuracy by letting the developers author the content as they work.  There is no need to learn Word styles or become an expert on chapter and section numbering; they need only focus on the content.

DITA or DocBook?

I’m going to keep this section short and leave this argument to the XML heavies.  There are two widely accepted standards today for XML-based documentation:

  • DITA (Darwin Information Typing Architecture).  Originally developed by IBM and now an OASIS standard, DITA is a highly structured data model for authoring and publishing documentation.
  • DocBook.  Older and less rigid than DITA, DocBook permits a wider range of document types to be authored in XML format.

Enough said, DITA and DocBook are really quite similar.  Here is a quick example from a DocBook article:


See the <emphasis> tag?  The content author just says, “Emphasize this word,” and leaves it to the documentation staff to decide what “emphasis” means: bold, italic, 16-point Arial, whatever.
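
To make the point concrete, here is a minimal Python sketch.  The DocBook fragment inside it is an illustrative stand-in, not the example from the original article, and the render_para helper is a hypothetical name.  The same content is published twice, and only the publishing code decides what “emphasis” looks like.

```python
import xml.etree.ElementTree as ET

# Illustrative DocBook fragment (an assumption, not the article's own example).
DOCBOOK = """<article>
  <title>Backup guide</title>
  <para>Always <emphasis>verify</emphasis> the backup before deleting files.</para>
</article>"""


def render_para(para, open_mark, close_mark):
    """Flatten a <para>, wrapping each <emphasis> child in the given markers."""
    parts = [para.text or ""]
    for child in para:
        text = "".join(child.itertext())
        if child.tag == "emphasis":
            text = open_mark + text + close_mark
        parts.append(text)
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


para = ET.fromstring(DOCBOOK).find("para")
as_html = render_para(para, "<em>", "</em>")  # one presentation choice
as_plain = render_para(para, "*", "*")        # another, same content
print(as_html)
print(as_plain)
```

The author's XML never changes.  In a real publishing chain this decision would typically live in the XSLT stylesheets rather than in a script like this one.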

Porting to XML – so what’s the problem?

So porting from Word or FrameMaker to XML shouldn’t be a problem, right?  Open Word’s “Save as…” dialog box, and there’s even an XML option!  Now, open the file you just created in a text editor.  Sure enough, it’s XML, but not in any format that you or I can use or understand.  Believe me, it wouldn’t be a stretch for Microsoft to turn Word into a front end for standard DITA or DocBook XML, but you can be sure that it will never happen.  That’s because we could then take the file and continue to edit it using any of the seven or eight available XML-standard editors, many of them free of charge.

Performing the Port

So how do we get there from here?  The process involves the use of readily available transformation tools, together with some good old grunt work.  Here’s how it’s done:

(1)    Pre-transformation clean up.  This step is intended to save hours of post-transformation editing by ensuring that the Word or FrameMaker source document is structured and styled properly.  For example, it’s crucial to check that section headings are actually marked as headings.

(2)    Transformation.  We can now use an off-the-shelf transformation tool to produce standard DITA or DocBook XML.  But we’re not finished yet…
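
The heading check in step (1) can be partly automated.  Below is a rough Python sketch, assuming the source is a .docx file whose main part is WordprocessingML and that real headings use style IDs starting with “Heading” (the default in English-language Word templates; yours may differ).  The suspect_headings function and its max_words threshold are hypothetical, and the heuristic is deliberately simple: it flags short paragraphs that are entirely bold but carry no heading style.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def suspect_headings(document_xml, max_words=8):
    """Return the text of short, fully bold paragraphs that lack a Heading style."""
    root = ET.fromstring(document_xml)
    suspects = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
        if style_id.startswith("Heading"):  # assumption: default style IDs
            continue
        # Only runs that actually carry text are considered.
        runs = [r for r in p.findall("w:r", NS) if r.find("w:t", NS) is not None]
        text = "".join(t.text or "" for r in runs for t in r.findall("w:t", NS))
        # Simplification: any <w:b> element counts as bold, even w:val="false".
        all_bold = bool(runs) and all(
            r.find("w:rPr/w:b", NS) is not None for r in runs
        )
        if all_bold and 0 < len(text.split()) <= max_words:
            suspects.append(text)
    return suspects


# Tiny hand-written sample standing in for word/document.xml.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Installation</w:t></w:r></w:p>
<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Known issues</w:t></w:r></w:p>
<w:p><w:r><w:t>This paragraph is ordinary body text.</w:t></w:r></w:p>
</w:body></w:document>"""

print(suspect_headings(SAMPLE))
```

For a real file, the XML can be read with zipfile.ZipFile(path).read("word/document.xml").  Anything the script flags still needs a human eye; it only narrows down where to look.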

(3)    Document completion.  Or, “The devil is in the details.”  Once the transformation has been checked, we can start with the features not supported by the transformation tool:

  • Re-establishing internal and external links
  • Applying conditional text
  • Handling tables and diagrams
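
For the first item in the list above, a quick script can at least tell you which internal links did not survive the transformation.  This is a minimal Python sketch, assuming DocBook output in which cross-references use the linkend attribute and targets carry an id (DocBook 4) or xml:id (DocBook 5) attribute.  The broken_links function and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def broken_links(docbook_xml):
    """Return the sorted linkend values that match no id in the document."""
    root = ET.fromstring(docbook_xml)
    ids = set()
    for el in root.iter():
        for attr in ("id", XML_ID):
            value = el.get(attr)
            if value:
                ids.add(value)
    missing = set()
    for el in root.iter():
        target = el.get("linkend")
        if target and target not in ids:
            missing.add(target)
    return sorted(missing)


# Hypothetical converter output: one good link, one dangling link.
SAMPLE = """<article>
  <section id="install"><title>Installation</title>
    <para>See <xref linkend="config"/> and <xref linkend="upgrade"/>.</para>
  </section>
  <section id="config"><title>Configuration</title>
    <para>Edit the settings file.</para>
  </section>
</article>"""

print(broken_links(SAMPLE))
```

Here the link to “upgrade” is reported because no element carries that id.  External links, conditional text, tables and diagrams still need to be checked by hand or with other tools.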

Call us to help

So that’s the whole picture!  Need to port your documentation library to XML?  Contact us for a free consultation.