Sometimes I think I try to make my solutions to problems more elegant than needed. I encountered another hurdle in my quest, which sent me on a short detour. I will wait to share the problem, but I've attached a demonstration of the solution.
A demonstration of how to find the power of two that is less than or equal to the supplied value.
(leq_pot.adb)
Not an exciting program, but reminds me of some of those groaner homeworks that I've had before.
Wednesday, December 26, 2007
Friday, December 21, 2007
Experimenting with XML
An unintended use of this blog is that I am using it as a central repository for notes on my projects. I have three unpublished entries that are notes for various projects. This post is actually about a week old and gradually became what you see now.
I plan on working with the English Wikipedia database in XML. At nearly 12 GB, 203 million lines, and millions of entries it is several orders of magnitude larger than any file I have attempted to process. Most tools just give up at dealing with it. I resorted to vi to save a slice of the document to another file, and less to view and copy the end of the file. I created a manageable 700 line sample of the database. It was good to look through the XML of Wikipedia to see the wiki tags in their unparsed form. I wasn't very happy with the formatting of the articles as I will have to do three types of parsing on the document. Once for the XML, once to strip out the words in certain tags and documents, and again for the words. Plans are to parse the tags and words in the same pass after they have been retrieved from the XML. I did grumble a bit at the Wiki tags being embedded in the document instead of the whole thing being pure XML. Made me think of this. Another hurdle I suppose.
The English Wikipedia is not the only wikipedia database available for download. This page has a list of databases in XML that are regularly generated. Wikis on anything from 50 cent to Zombies are available. Uncyclopedia is what I sometimes use to keep myself awake during class and it is a usable size at 390 MB. Small enough to fit in memory, large enough to take time to process. It takes around 1:30 to parse with my current parser that I hope to be publishing here soon.
I started reading Uncyclopedia a couple of hours ago. Started off with Lisp, Why stick things in an electrical outlet?, and 667:Neighbor_of_The_Beast. I guess no more coding tonight.
I plan on working with the English Wikipedia database in XML. At nearly 12 GB, 203 million lines, and millions of entries it is several orders of magnitude larger than any file I have attempted to process. Most tools just give up at dealing with it. I resorted to vi to save a slice of the document to another file, and less to view and copy the end of the file. I created a manageable 700 line sample of the database. It was good to look through the XML of Wikipedia to see the wiki tags in their unparsed form. I wasn't very happy with the formatting of the articles as I will have to do three types of parsing on the document. Once for the XML, once to strip out the words in certain tags and documents, and again for the words. Plans are to parse the tags and words in the same pass after they have been retrieved from the XML. I did grumble a bit at the Wiki tags being embedded in the document instead of the whole thing being pure XML. Made me think of this. Another hurdle I suppose.
The English Wikipedia is not the only wikipedia database available for download. This page has a list of databases in XML that are regularly generated. Wikis on anything from 50 cent to Zombies are available. Uncyclopedia is what I sometimes use to keep myself awake during class and it is a usable size at 390 MB. Small enough to fit in memory, large enough to take time to process. It takes around 1:30 to parse with my current parser that I hope to be publishing here soon.
I started reading Uncyclopedia a couple of hours ago. Started off with Lisp, Why stick things in an electrical outlet?, and 667:Neighbor_of_The_Beast. I guess no more coding tonight.
Tuesday, December 18, 2007
Fixing the heat
Yes, the weather has been cold here lately, but that is not what this post is about.
In the spring of 07 I was given an assignment to solve the heat equation. The heat equation works simply by finding the average of a point and and all the points around it, setting that new found average as the new value, and repeating. My original attempt at this involved a monitor, three copies of the array in memory, and a lockup that I could not trace down. I would have forgotten about this assignment, but earlier this semester I had an epiphany of why it didn't work. The epiphany was that I only had one point of synchronization in a loop. At certain times threads would be released from that point of synchronization then run through their loop, and come back to the point of synchronization. That point of synchronization was still open because other threads were still in it. Once this happened the threads became out of sync and the program would deadlock.
The new solution: (heat.adb)
This is not a robust solution, but merely an exercise in task synchronization. I did make two big goofs during development. I wasn't thinking at the time and placed the line "All_Under_Tolerance := True;" in the elsif instead of the if of First_Sync. This had the effect that All_Under_Tolerance was set by the last thread to exit that entry. This took a careful reading of the code to realize this error. The other error was that I had the two lines:
What did I learn from all this? I think the Synchronizer in this contains the first burst locks I have ever written. If a burst lock occurs in a loop a second burst lock seems necessary. It might be possible to call one burst lock with a requeue in it so that there are two burst locks in one call. That might be an exercise for later though.
In the spring of 07 I was given an assignment to solve the heat equation. The heat equation works simply by finding the average of a point and and all the points around it, setting that new found average as the new value, and repeating. My original attempt at this involved a monitor, three copies of the array in memory, and a lockup that I could not trace down. I would have forgotten about this assignment, but earlier this semester I had an epiphany of why it didn't work. The epiphany was that I only had one point of synchronization in a loop. At certain times threads would be released from that point of synchronization then run through their loop, and come back to the point of synchronization. That point of synchronization was still open because other threads were still in it. Once this happened the threads became out of sync and the program would deadlock.
The new solution: (heat.adb)
This is not a robust solution, but merely an exercise in task synchronization. I did make two big goofs during development. I wasn't thinking at the time and placed the line "All_Under_Tolerance := True;" in the elsif instead of the if of First_Sync. This had the effect that All_Under_Tolerance was set by the last thread to exit that entry. This took a careful reading of the code to realize this error. The other error was that I had the two lines:
Synchronizer.Second_Sync; exit Averaging_Loop when All_Under_Tolerance;reversed. This made it so that some tasks would read an unsynchronized value of All_Under_Tolerance. That goof caused some tasks exit their loops and some tasks continue on. That was easy to track down.
What did I learn from all this? I think the Synchronizer in this contains the first burst locks I have ever written. If a burst lock occurs in a loop a second burst lock seems necessary. It might be possible to call one burst lock with a requeue in it so that there are two burst locks in one call. That might be an exercise for later though.
Labels:
academic,
Ada programming,
concurrency,
Winter of Code
Monday, December 17, 2007
Begining the Winter of Code
If Google can have a summer of code why can't I have a Winter of Code?
Next semester I have a class that makes many people cringe at it's name. It is heavy in Ada and in concurrency. I would like to polish my skills in preparation for this class. I have the free time this winter break to do so, hence the winter of code.
I have two projects on my agenda at the moment. In the Spring of '07 I was given a project to solve the heat problem. My solution worked, but it had some occasional deadlock. Fixing this shouldn't be a big deal, but a nice warmer into heavier things. Also the same semester I created a rather small search engine. I would like to expand it and make it more robust.
A longer goal for this project is to improve my writing ability and possibly some Ada programming advocacy. Those two can only be solved by persistence and topics to write about. I do have my workout journal elsewhere on the internet and this may see the occasional crossover.
Next semester I have a class that makes many people cringe at it's name. It is heavy in Ada and in concurrency. I would like to polish my skills in preparation for this class. I have the free time this winter break to do so, hence the winter of code.
I have two projects on my agenda at the moment. In the Spring of '07 I was given a project to solve the heat problem. My solution worked, but it had some occasional deadlock. Fixing this shouldn't be a big deal, but a nice warmer into heavier things. Also the same semester I created a rather small search engine. I would like to expand it and make it more robust.
A longer goal for this project is to improve my writing ability and possibly some Ada programming advocacy. Those two can only be solved by persistence and topics to write about. I do have my workout journal elsewhere on the internet and this may see the occasional crossover.
Subscribe to:
Posts (Atom)