Friday, January 11, 2008

Full XML Parser for Wikis

You may want a primer on parsing XML with Ada. That is an... okay... example, but better than what many Ada packages have on the web. The following is a more complete, but more complicated, example. In XML/Ada you supply an object to the parser that provides the rules for how to parse the document. It is pretty easy to work with and extend. This code was some of the earliest I wrote in this whole adventure. I waited until now to finalize it so I could figure out exactly how the client code would work with it.

This is the client to the XML/Ada library and makes use of the parser object posted below: (test_wiki_parser.adb)

This uses the Input_Sources.Large_File posted earlier. This entire program boils down to:
  • Tell parser what to do
  • Open file
  • Parse file
  • Close file
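
The four steps above can be sketched roughly as follows. This is only a minimal outline, not the test_wiki_parser.adb listing: it assumes the stock Sax.Readers.Reader (which ignores every event) in place of the Wiki_Reader parser, the stock Input_Sources.File in place of Input_Sources.Large_File, and a made-up file name.

```ada
with Input_Sources.File;
with Sax.Readers;

procedure Parse_Sketch is
   --  Assumption: stand-ins for the post's own types. The real client
   --  uses the parser type from Wiki_Reader and Input_Sources.Large_File.
   Parser : Sax.Readers.Reader;
   Input  : Input_Sources.File.File_Input;
begin
   --  Tell parser what to do (here: just switch validation off)
   Sax.Readers.Set_Feature
     (Parser, Sax.Readers.Validation_Feature, False);

   --  Open file ("dump.xml" is a hypothetical name)
   Input_Sources.File.Open ("dump.xml", Input);

   --  Parse file: the parser calls back into Parser for each SAX event
   Sax.Readers.Parse (Parser, Input);

   --  Close file
   Input_Sources.File.Close (Input);
end Parse_Sketch;
```

With the stock Reader nothing is printed; the interesting work happens once a derived parser type overrides the event callbacks.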

(wiki_reader.ads)

(wiki_reader.adb)

I tried passing the two processes (Document_Process & Collection_Process) at the declaration of Wiki_Parser, but evidently Ada does not like access types being passed there (and I ran into a compiler bug as well). I could possibly wrap them in a record and pass them in through that instead. I think the way I ended up doing it is not the preferred way, but it works.
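
To illustrate the general idea of handing the two processes to the parser after it has been declared, here is a small standalone sketch. It does not use XML/Ada and is not the Wiki_Reader code: Parser_Stub, the handler types, and the On_Document / On_Collection procedures are all hypothetical names, and only Document_Process and Collection_Process are taken from the post.

```ada
with Ada.Text_IO;

procedure Callback_Sketch is
   --  Hypothetical access-to-procedure types for the two processes.
   type Document_Handler   is access procedure (Title : String);
   type Collection_Handler is access procedure;

   --  Hypothetical stand-in for the parser object: it only stores
   --  the two callbacks.
   type Parser_Stub is record
      Document_Process   : Document_Handler;
      Collection_Process : Collection_Handler;
   end record;

   Parser : Parser_Stub;
   Count  : Natural := 0;  --  local state the nested procedures can see

   procedure On_Document (Title : String) is
   begin
      Count := Count + 1;
      Ada.Text_IO.Put_Line ("document: " & Title);
   end On_Document;

   procedure On_Collection is
   begin
      Ada.Text_IO.Put_Line ("documents seen:" & Natural'Image (Count));
   end On_Collection;
begin
   --  The callbacks are assigned after the declaration of Parser,
   --  not as part of it.
   Parser.Document_Process   := On_Document'Access;
   Parser.Collection_Process := On_Collection'Access;

   --  Simulate what a parser would do while reading a dump.
   Parser.Document_Process ("Ada");
   Parser.Document_Process ("XML");
   Parser.Collection_Process.all;
end Callback_Sketch;
```

Because the nested procedures are declared at the same level as the access types, taking 'Access of them is legal, and they can read and update Count directly. That is the closure-like behaviour: the callbacks carry their surrounding state with them without any extra parameters.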

Yay closures.

This combo fully parses the new 14 GB version of Wikipedia in 25 minutes, which works out to roughly 9 MB per second. This is not the exact version of the code that will be used in the final product: the Full_Document type will be declared elsewhere. Other than that, all the code above will be included.
