Victor Mote
Dave Utter, GFA missionary to the Marshall Islands, is in the process of publishing the Bible in Marshallese, using the modern Marshallese orthography. Victor Mote is attempting to assist in the typesetting for this project. Dave is the decision-maker on all issues; Vic's purpose is to relieve Dave from as much of the technical, management, and grunt work as is feasible, so that Dave can stay as focused as possible on the language issues. The purpose of this page is to document decisions, progress, and to-do items for the project:
Because it is sometimes easier to show than to explain, here is a link to the current version of our prototype, Matu (Matthew): (PDF, 534 KB, 93 pages). Note that the combining diacriticals are now aligned more or less properly. However, there are still numerous layout issues, especially in the relationship between side notes and footnotes.
Matthew was converted from InDesign. All other books are being converted from Word. Our prototype for the Word-to-XML conversion is Mark: (PDF, 345 KB, 55 pages).
Major challenges on this project include:
The following are issues for which there appears to be no current solution, based on the parameters in the Resolved Issues section:
Because true paragraphs and verses (or even chapters) can be staggered, this is the one place where our output may affect the design of our semantic XML document. If we use option 2 above, we will need to create specialized "fragment" elements to handle straddling issues. If we use option 1, we can get away with simply dropping a pilcrow character directly in the text, and otherwise ignoring "true" paragraphs.
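As a rough illustration of the pilcrow approach, the sketch below assumes that verse text is held as plain strings and that a pilcrow (¶, U+00B6) marks the point where a true paragraph begins. The function name and the sample data are hypothetical and are not part of the project's tooling. It shows that paragraph boundaries can be recovered from the running text without any element that straddles verses.

```python
PILCROW = "\u00b6"  # the pilcrow sign


def true_paragraphs(verses):
    """Group (verse_number, text) pairs into true paragraphs.

    Assumption: a pilcrow inside a verse's text marks the start of a new
    true paragraph. Returns a list of paragraphs, each one a list of
    (verse_number, text_fragment) pairs.
    """
    paragraphs = [[]]
    for number, text in verses:
        pieces = text.split(PILCROW)
        for index, piece in enumerate(pieces):
            if index > 0:
                # A pilcrow preceded this piece, so a new paragraph starts.
                paragraphs.append([])
            piece = piece.strip()
            if piece:
                paragraphs[-1].append((number, piece))
    # Drop empty paragraphs, such as the one left by a leading pilcrow.
    return [paragraph for paragraph in paragraphs if paragraph]


# Hypothetical sample: a paragraph break falls in the middle of verse 2.
sample = [
    (1, "First verse."),
    (2, "Starts here \u00b6 and a new paragraph begins mid-verse."),
    (3, "\u00b6 Third verse opens a paragraph."),
]
for paragraph in true_paragraphs(sample):
    print(paragraph)
```

Because the pilcrow is ordinary character data, verses and chapters remain the only structural elements and the staggering is handled entirely at output time.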
In lieu of giving Dave access to source code control tools, we are currently emailing files back and forth as needed. Because we both have copies of the files, and because we are not using shared source code control, it is important to keep track of who "owns" a given file at any given time. Only the party holding the token for a file should edit that file. Otherwise, we will end up overwriting each other's changes. When passing a token to the other party, each party should explicitly state in an email message that they are doing so. For example, when Vic is done making changes to Matu.xml and emails it back to Dave, he should also state in the email that he is passing the token to Dave.
After we get going, this shouldn't be a big issue, because, except for major DTD changes or other systemic changes (hopefully rare), Vic's involvement with the content files should end after the Word-to-XML conversion has been completed.
As stated in the Resolved Issues section, if this gets to be too much trouble, we can use CVS instead.
The current tokens are:
File | Token Currently Belongs To |
---|---|
Matu | Dave |
Mark | Dave |
All Others | Dave |
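The token rule can be sketched in a few lines of Python. This is only an illustration of the protocol described above, not a tool the project uses; the class and method names are hypothetical, and the starting owners simply mirror the table.

```python
class TokenLedger:
    """Track which party currently holds the edit token for each file."""

    def __init__(self, owners, default_owner):
        self._owners = dict(owners)
        self._default = default_owner  # covers the "All Others" row

    def owner(self, name):
        return self._owners.get(name, self._default)

    def may_edit(self, name, party):
        # Only the party holding the token should edit the file.
        return self.owner(name) == party

    def pass_token(self, name, giver, receiver):
        """Hand the token over and return the sentence to put in the email."""
        if self.owner(name) != giver:
            raise PermissionError(
                "%s does not hold the token for %s" % (giver, name)
            )
        self._owners[name] = receiver
        return "%s is passing the token for %s to %s." % (giver, name, receiver)


ledger = TokenLedger({"Matu": "Dave", "Mark": "Dave"}, default_owner="Dave")
print(ledger.pass_token("Matu", "Dave", "Vic"))
print(ledger.may_edit("Matu", "Dave"))
```

The point of the sketch is the check in `pass_token`: a token can only be given away by the party who holds it, which is the same discipline the explicit email statement enforces by hand.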
Fortunately, the Word files we wish to convert have very little formatting or markup content in them. Our main task then is to get the text dumped, character conversions done, and basic tagging. Here is the cookbook method of converting the Word files sent by Dave into XML:
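As a rough illustration of the three tasks named above (text dump, character conversion, basic tagging), the following Python sketch assumes the Word file has already been saved as plain text with one paragraph per line. The mapping table, the function names, and the `Para` element name are all assumptions made for the example; the real character table depends on the fonts used in the Word files, and the real element names come from the DTD.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical mapping from a legacy stand-in character to the modern
# spelling. The actual table must be built from the source files.
LEGACY_TO_UNICODE = {
    "\u00f1": "n\u0304",  # assumed: n-tilde standing in for n + combining macron
}


def convert_characters(text):
    """Apply the legacy mapping, then normalize to Unicode NFC."""
    for legacy, modern in LEGACY_TO_UNICODE.items():
        text = text.replace(legacy, modern)
    # NFC composes base + combining mark where a precomposed letter exists
    # and leaves other combining sequences as they are.
    return unicodedata.normalize("NFC", text)


def tag_paragraphs(text, element="Para"):
    """Wrap each non-empty line in an element, escaping &, < and >."""
    tagged = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped:
            tagged.append("<{0}>{1}</{0}>".format(element, escape(stripped)))
    return "\n".join(tagged)


dumped = "One & two\n\n  Three \n"  # stand-in for the text dumped from Word
print(tag_paragraphs(convert_characters(dumped)))
```

Running the character conversion before the tagging step keeps the mapping table free of XML escapes, and normalizing once at the end gives every file the same representation of the combining diacriticals.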
This is Dave's toolkit. Essential items are as follows:
Nice-to-have items:
This is Vic's toolkit. He is documenting it here for two reasons: 1) so that Dave can continue if, for any reason, Vic is unable to complete the project, and 2) to show the scope of what might be required for Dave to bring the project in-house if he decides to do that in the future.
Essential items include all of the Markup Toolkit, including nice-to-have items, plus:
This project is currently using the PortageBook DTD, developed by Vic for use by Portage Publications. We will change that if necessary, but for now it works fine.
Important: It is very possible that the DTD will need to be revised during the project. If you think a change is necessary, please contact Vic.
Much can probably be inferred by examining any existing file that has some markup in it. The following is a brief description of the elements that Dave is most likely to use in his markup work. The list is incomplete, but can be expanded later as needed:
Also used in other places in the document, most notably within any <Preface>, and within <Footnote> elements. When used in any context other than <Chapter>, the "id" attribute is generally not needed.
Some notes for markup: