Text to speech was a valuable tool in a recent class. It made for great debugging information when no visual interaction was possible, and it was a helpful feature in the final application. The text-to-speech system we used there was easy to program, but it was dedicated hardware. I may be a hardware junkie, but I'm pretty sure I can get by with a software solution.
After a bit of searching around I found Festival. Festival is available as a package for Linux and seems to be pretty mature. I installed Festival and read through a tutorial on it. Some reading was also needed to understand the parameters in GNAT.OS_Lib, but after examining the code a bit it made sense.
First I needed to make a script to translate the way Ada's spawning passes parameters into the way Festival takes input.
(festival_helper.sh)
A simple shell script that needs to be saved and made executable. Running ./festival_helper.sh "your text goes here" will call Festival.
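The helper boils down to piping its argument into Festival. Here is a minimal sketch of the idea, written as a shell function for clarity (the original festival_helper.sh is a standalone script taking the text as "$1"; this assumes festival is on the PATH and that its --tts mode reads text from standard input):

```shell
# A minimal sketch of festival_helper.sh, assuming `festival` is on PATH.
# Festival's --tts mode reads the text to speak from standard input,
# so the helper just pipes its argument through.
speak() {
    echo "$1" | festival --tts
}
```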
The Ada part was a bit kludgy with how the parameters were passed, but nothing too bad.
(festival_demo.adb)
I'd like to think that is pretty straightforward...
I doubt this is the most efficient way; some magic with bash scripts could easily do better. Instead of re-spawning Festival with every call, another possible approach might be to use Festival's --server mode and send text over the network. I've dabbled with interfacing directly with the Festival C++ API, but my understanding of C++ compiler messages is limited. I had to do some shoehorning to get a working gcc on this system, so that might be a problem as well.
It works, it's simple, I'm happy.
Sunday, June 8, 2008
Saturday, January 12, 2008
Avoiding rendezvous to delay starting of tasks
A common problem I encounter is that something needs to be done before a task can start running. I've always solved that by putting a Start entry on tasks and calling that entry when it is safe for them to start. Calling entries works fine, but I'd like to think the following is a bit neater:
(declare_task_test.adb)
Instead of delaying when the tasks start, delay when the tasks are declared.
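A minimal sketch of the idea (my reconstruction, not the original declare_task_test.adb): tasks declared in an inner declare block are not activated until execution reaches that block, so any setup in the enclosing body finishes first.

```ada
--  Sketch: delay task activation by declaring the tasks in an inner
--  block instead of gating them behind a Start entry.
with Ada.Text_IO; use Ada.Text_IO;

procedure Declare_Task_Sketch is

   task type Worker;

   task body Worker is
   begin
      Put_Line ("Worker running");
   end Worker;

begin
   --  Do whatever setup must happen before any task runs.
   Put_Line ("Setup complete");

   declare
      --  Activation happens here, after the setup above.
      Workers : array (1 .. 2) of Worker;
   begin
      null;  --  The block waits here until both workers terminate.
   end;

   Put_Line ("All workers done");
end Declare_Task_Sketch;
```

A nice side effect is that the end of the block is a natural join point: execution cannot continue past it until the local tasks have terminated.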
Friday, January 11, 2008
Full XML Parser for Wikis
You may want a primer on parsing XML with Ada. That is an... okay... example, but better than what many Ada packages have on the web. The following is a more complete, but more complicated, example.
In XML/Ada you supply an object to the parser that provides the rules for how to parse the document. It is pretty easy to work with and extend. This code was some of the earliest I wrote in this whole adventure. I waited until now to finalize it so I could completely figure out how the client code would work with it.
This is the client to the XML/Ada library and makes use of the parser object posted below: (test_wiki_parser.adb)
This uses the Input_Sources.Large_File posted earlier. This entire program boils down to:
- Tell parser what to do
- Open file
- Parse file
- Close file
(wiki_reader.ads)
(wiki_reader.adb)
I tried passing the two procedures (Document_Process & Collection_Process) at the declaration of Wiki_Parser, but evidently Ada does not like access types being passed there (and I ran into a compiler bug). I could possibly wrap them in a record and pass them through that instead. I think this way is not the preferred way, but it works.
Yay closures.
This combo fully parses the new 14 GB version of Wikipedia in 25 minutes. This is not the exact version that will be used in the final product. The Full_Document type will be declared elsewhere. Other than that all the code above will be included.
Friday, January 4, 2008
The difference a review can make
One of my previous classes involved building a search engine for a document. The professor demonstrated different ways of indexing documents. One that really jumped out at me was ngramming. Ada's ability to do Enumeration_Type'Value on an equivalently sized string and return an Enumeration_Type made this really attractive to me. It seemed like a pretty straightforward idea that should be easy to implement.
It wasn't.
I wrote a script to define the ngram type of all combinations of 3 letters. The script output a file "aaa, aab, aac, .. zzx, zzy, zzz". There are 17576 combinations of 3 letters. Ada has 12 reserved words that are 3 letters long, so I had to code around them. My once-neat "abs" became "absa", and I had to add a case statement with 7 choices and at least one if-else in every choice of that case. Yipe. What was a very close representation of the data became a burden. The conversion function and the accompanying type declaration make for a nearly 90 KB file. Ugh.
Using a script to write the data type for me should have been a clue.
I've been looking over that code recently and came up with a much neater solution. Here is the declaration and test of what should be a nearly drop in replacement and is probably faster to boot:
(trigram_test.adb)
In comparison, the original enumeration type declaration took 704 lines. This new type might not be as representative, but I never used the representation in the old implementation anyway. This is by far easier to understand and maintain.
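In case it helps, here is a hedged sketch of what such a numeric trigram type could look like (my reconstruction, not the original trigram_test.adb): encode a three-letter trigram as a number in 0 .. 26**3 - 1 instead of declaring 17576 enumeration literals.

```ada
--  Sketch: a numeric trigram type with conversion functions, replacing
--  a 17576-literal enumeration.  Reserved words are no longer a problem
--  because "abs" is just string data here, not an identifier.
with Ada.Text_IO; use Ada.Text_IO;

procedure Trigram_Sketch is

   subtype Trigram is Natural range 0 .. 26 ** 3 - 1;

   function To_Trigram (S : String) return Trigram is
      Result : Natural := 0;
   begin
      for I in S'Range loop
         Result := Result * 26
           + (Character'Pos (S (I)) - Character'Pos ('a'));
      end loop;
      return Result;
   end To_Trigram;

   function To_String (T : Trigram) return String is
      Result : String (1 .. 3);
      N      : Natural := T;
   begin
      for I in reverse Result'Range loop
         Result (I) := Character'Val (Character'Pos ('a') + N mod 26);
         N := N / 26;
      end loop;
      return Result;
   end To_String;

begin
   Put_Line (Natural'Image (To_Trigram ("aaa")));  --  prints  0
   Put_Line (To_String (To_Trigram ("abs")));      --  prints  abs
end Trigram_Sketch;
```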
Tuesday, January 1, 2008
Plans gone awry
An unforeseen obstacle is that the file IO included with XML/Ada is not the most robust solution. The included file input source works by reading the entire xml file to a buffer at once. The largest file it can index is 2 GB. That is not going to work for the 12 GB file I have.
My first thought was to read large blocks with Sequential_IO for the majority of the file and then use progressively smaller calls to Direct_IO for the remainder. The size of the calls to Direct_IO would be determined by the leq_pot demonstration posted earlier. I decided against this approach because of the complicated buffer and file slicing. I would've ended up with a massive development headache and then needed to run more extensive tests. Best tests are tests you don't have to run.
Stream_IO is one of the more powerful IO packages in Ada. I had toyed with the idea of using Stream_IO to read the data and then using Unchecked_Conversion to change the data to the proper type. I'm not a fan of Unchecked_Conversion, and I would need to have multiple copies of the same data in memory. I considered that ugly and moved on. The second attempt was to make a buffered Stream_IO package. I wrote the package pretty quickly and started implementing it. After doing more investigation I discovered that the decoding procedure for Unicode characters requires that the buffer be internal to the Input_Source and not in the IO package. Back to the drawing board.
After visiting the drawing board this time I felt much better about my solution. Using Stream_IO, the buffering logic from the previous attempt, and a procedure borrowed from the UTF-8 Unicode encoding, I managed to make an object that reads a file using Stream_IO and converts Stream_Elements directly to UTF-8 characters. Additional encoding schemes shouldn't be hard to do, but I'll just stick with UTF-8 for now. This is what I came up with:
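As a rough illustration of just the buffering idea (this is not the real Input_Sources.Large_File; the filename and buffer size are placeholders, and the real object decodes UTF-8 instead of counting bytes):

```ada
--  Sketch: read a file in fixed-size blocks of Stream_Elements and
--  consume the elements one at a time.  The real package hands each
--  decoded UTF-8 character to the XML parser instead of counting.
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Buffered_Read_Sketch is
   File   : File_Type;
   Buffer : Stream_Element_Array (1 .. 4096);  --  16 MB in the real thing
   Last   : Stream_Element_Offset;
   Count  : Natural := 0;
begin
   Open (File, In_File, "test.xml");  --  placeholder filename
   while not End_Of_File (File) loop
      Read (File, Buffer, Last);
      for I in Buffer'First .. Last loop
         --  The UTF-8 decoding of Buffer (I) would happen here;
         --  this sketch just counts raw bytes.
         Count := Count + 1;
      end loop;
   end loop;
   Close (File);
   Ada.Text_IO.Put_Line ("Bytes read:" & Natural'Image (Count));
end Buffered_Read_Sketch;
```

Keeping the buffer inside the object is the key point: the Unicode decoding has to happen against that internal buffer, which is why the earlier standalone buffered-IO package didn't fit.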
(input_sources-large_file.ads)
(input_sources-large_file.adb)
The time required to parse a file using this input source is much more consistent. For the 390 MB Uncyclopedia, the file source included with XML/Ada would take anywhere from 1:24 to 2:01. What I wrote always takes between 1:12 and 1:14. Part of the speedup is probably due to the inlining of the UTF-8 conversion. The more consistent times are likely due to finer-grained IO. Using a 16 MB buffer, CPU usage on a 2.4 GHz Core 2 hovers between 90% and 100% of a single core, and 18 MB of RAM is consumed. A parser with high standards for what constitutes a document processed the 12 GB Wikipedia and counted 1074662 articles in 23 minutes.
Another input source to verify this against would be nice. I've verified this input source against the smaller Uncyclopedia. For all I know this source is faulty after the first 390 MB. Then again, 390 MB worth of data is probably a high enough standard.
Lessons taught:
- This is the first tagged type extension, outside of a book exercise, that I have done.
- Tagged types cannot have default discriminants.
- Don't expect to deal with large files easily.
- Simple is better.
What I'd like to continue with:
- Make the package handle more than UTF-8.
- Find a more optimal buffer size.
- Can another thread speed up execution or is it IO bound?
Wednesday, December 26, 2007
If it's not one thing, it's another
Sometimes I think I try to make my solutions to problems more elegant than needed. I encountered another hurdle in my quest, which sent me on a short detour. I will wait to share the problem, but I've attached a demonstration of the solution.
A demonstration of how to find the power of two that is less than or equal to the supplied value.
(leq_pot.adb)
Not an exciting program, but it reminds me of some of those groaner homeworks I've had before.
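As a hedged sketch of the idea (my reconstruction, not the attached leq_pot.adb), the power of two less than or equal to a value can be found by repeated doubling:

```ada
--  Sketch: find the largest power of two less than or equal to N.
--  The loop condition compares against N / 2 to avoid overflow.
with Ada.Text_IO; use Ada.Text_IO;

procedure Leq_Pot_Sketch is

   function Leq_Pot (N : Positive) return Positive is
      Result : Positive := 1;
   begin
      while Result <= N / 2 loop
         Result := Result * 2;
      end loop;
      return Result;
   end Leq_Pot;

begin
   Put_Line (Positive'Image (Leq_Pot (100)));  --  prints  64
   Put_Line (Positive'Image (Leq_Pot (64)));   --  prints  64
end Leq_Pot_Sketch;
```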
Tuesday, December 18, 2007
Fixing the heat
Yes, the weather has been cold here lately, but that is not what this post is about.
In the spring of '07 I was given an assignment to solve the heat equation. The heat equation works simply: find the average of a point and all the points around it, set that average as the point's new value, and repeat. My original attempt at this involved a monitor, three copies of the array in memory, and a lockup that I could not trace down. I would have forgotten about this assignment, but earlier this semester I had an epiphany about why it didn't work: I had only one point of synchronization in the loop. At certain times threads would be released from that point of synchronization, run through their loop, and come back to the point of synchronization, which was still open because other threads were still in it. Once this happened the threads were out of sync and the program would deadlock.
The new solution: (heat.adb)
This is not a robust solution, merely an exercise in task synchronization. I did make two big goofs during development. I wasn't thinking at the time and placed the line "All_Under_Tolerance := True;" in the elsif instead of the if of First_Sync. This had the effect that All_Under_Tolerance was set by the last thread to exit that entry; it took a careful reading of the code to spot that error. The other goof was that I had the two lines

Synchronizer.Second_Sync;
exit Averaging_Loop when All_Under_Tolerance;

reversed. This made it so some tasks would read an unsynchronized value of All_Under_Tolerance, causing some tasks to exit their loops while others continued on. That one was easy to track down.
What did I learn from all this? I think the Synchronizer here contains the first burst locks I have ever written. If a burst lock occurs in a loop, a second burst lock seems necessary. It might be possible to have one burst lock call with a requeue in it, so that there are two burst locks in one call. That might be an exercise for later, though.
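For reference, a minimal sketch of the double-barrier pattern described above (my reconstruction, not the original heat.adb). Each pass through the loop crosses two burst locks, so a fast task cannot re-enter the first barrier while a slow task is still leaving it:

```ada
--  Sketch: two burst locks in a protected object.  Each entry opens
--  when every task has queued, then stays open until its queue drains.
procedure Barrier_Sketch is

   Task_Count : constant := 3;

   protected Synchronizer is
      entry First_Sync;
      entry Second_Sync;
   private
      Releasing_First  : Boolean := False;
      Releasing_Second : Boolean := False;
   end Synchronizer;

   protected body Synchronizer is

      entry First_Sync
        when First_Sync'Count = Task_Count or Releasing_First is
      begin
         --  Stay open until the last queued task has left.
         Releasing_First := First_Sync'Count > 0;
      end First_Sync;

      entry Second_Sync
        when Second_Sync'Count = Task_Count or Releasing_Second is
      begin
         Releasing_Second := Second_Sync'Count > 0;
      end Second_Sync;

   end Synchronizer;

   task type Worker;

   task body Worker is
   begin
      for Step in 1 .. 2 loop
         --  compute this iteration's new values here
         Synchronizer.First_Sync;
         --  copy the new values over the old ones here
         Synchronizer.Second_Sync;
      end loop;
   end Worker;

   Workers : array (1 .. Task_Count) of Worker;

begin
   null;  --  The main procedure cannot finish until all Workers terminate.
end Barrier_Sketch;
```

The second barrier is what prevents the lapping problem: a task leaving First_Sync cannot queue on it again until it has passed Second_Sync, which will not open while anyone is still stuck in First_Sync.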
Labels:
academic,
Ada programming,
concurrency,
Winter of Code
Monday, December 17, 2007
Beginning the Winter of Code
If Google can have a summer of code why can't I have a Winter of Code?
Next semester I have a class that makes many people cringe at its name. It is heavy in Ada and in concurrency. I would like to polish my skills in preparation for this class. I have the free time this winter break to do so, hence the Winter of Code.
I have two projects on my agenda at the moment. In the spring of '07 I was given a project to solve the heat problem. My solution worked, but it would occasionally deadlock. Fixing this shouldn't be a big deal, but it's a nice warm-up before heavier things. That same semester I also created a rather small search engine. I would like to expand it and make it more robust.
A longer goal for this project is to improve my writing ability and possibly some Ada programming advocacy. Those two can only be solved by persistence and topics to write about. I do have my workout journal elsewhere on the internet and this may see the occasional crossover.