This is the client of the XML/Ada library; it makes use of the parser object posted below (test_wiki_parser.adb):
This uses the Input_Sources.Large_File posted earlier. This entire program boils down to:
- Tell parser what to do
- Open file
- Parse file
- Close file
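The four steps above can be sketched roughly as follows. This is only an outline, not the posted client: it uses the stock Input_Sources.File source and a plain Sax.Readers.Reader from XML/Ada as stand-ins for Input_Sources.Large_File and Wiki_Parser, and the file name "enwiki.xml" is a placeholder. A plain Reader ignores every SAX event, so real work would need a derived reader with overridden callbacks.

```ada
with Input_Sources.File;
with Sax.Readers;

procedure Parse_Sketch is
   --  Stand-ins (assumptions): the post uses Input_Sources.Large_File
   --  and its own Wiki_Parser type instead of these two.
   Input  : Input_Sources.File.File_Input;
   Parser : Sax.Readers.Reader;
begin
   --  Tell parser what to do (here: only switch validation off)
   Sax.Readers.Set_Feature
     (Parser, Sax.Readers.Validation_Feature, False);

   --  Open file ("enwiki.xml" is a hypothetical name)
   Input_Sources.File.Open ("enwiki.xml", Input);

   --  Parse file: SAX events are dispatched to the reader's callbacks
   Sax.Readers.Parse (Parser, Input);

   --  Close file
   Input_Sources.File.Close (Input);
end Parse_Sketch;
```

Building this requires XML/Ada to be installed and visible to the compiler.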
I tried passing the two processes (Document_Process and Collection_Process) when declaring Wiki_Parser, but Ada evidently does not like access types being passed there, and I also ran into a compiler bug. I could instead wrap them in a record and pass that. I don't think the current approach is the preferred one, but it works.
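For illustration, the record-wrapping idea might look something like the sketch below. Everything in it is an assumption: the callback signatures, the Handler_Set record, and this stripped-down Wiki_Parser type are invented for the example and are not the types from the posted code. Whether this form avoids the compiler problem mentioned above is not something the sketch can confirm.

```ada
with Ada.Text_IO;

procedure Handlers_Demo is

   --  Hypothetical callback signatures; the real ones are not shown here.
   type Document_Handler   is access procedure (Title : String);
   type Collection_Handler is access procedure (Count : Natural);

   --  Both callbacks bundled into one record.
   type Handler_Set is record
      Document_Process   : Document_Handler;
      Collection_Process : Collection_Handler;
   end record;

   --  Stand-in parser type: it receives the bundle through an access
   --  discriminant, so it is supplied at the point of declaration.
   type Wiki_Parser (Handlers : not null access constant Handler_Set) is
     limited null record;

   procedure Print_Document (Title : String) is
   begin
      Ada.Text_IO.Put_Line ("Document: " & Title);
   end Print_Document;

   procedure Print_Collection (Count : Natural) is
   begin
      Ada.Text_IO.Put_Line ("Documents seen:" & Natural'Image (Count));
   end Print_Collection;

   My_Handlers : aliased constant Handler_Set :=
     (Document_Process   => Print_Document'Access,
      Collection_Process => Print_Collection'Access);

   Parser : Wiki_Parser (Handlers => My_Handlers'Access);

begin
   --  Invoke each callback once through the parser's discriminant.
   Parser.Handlers.Document_Process ("Ada");
   Parser.Handlers.Collection_Process (1);
end Handlers_Demo;
```

The sketch relies on Ada 2005 features (the "not null access constant" discriminant), so it needs a compiler mode that supports them.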
This combination fully parses the new 14 GB version of Wikipedia in 25 minutes. It is not the exact version that will be used in the final product: the Full_Document type will be declared elsewhere. Apart from that, all the code above will be included.