In my last post, I discussed what would be necessary in Behold to properly sort dates. I also wrote a bit about the input and output of dates.
I started implementing date sorting and I was using the raw DATE value as input into Behold. I had to interpret it and fix it prior to setting up the sort ordering for it. I then realized that if I’m doing the interpretation and fixing, I might as well write the input procedure and add the error checking. I do need the dates to fit GEDCOM 5.5.1 standards, so it is most efficient to input and fix it once. That does add a bit of overhead to Behold’s input, maybe adding up to 20% extra time (prior to optimizing), but it seems like it’s necessary.
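To give a feel for why the cleaned-up date is needed before sorting, here is a rough sketch in Python. It is not Behold's actual code, and the names `sort_key`, `MONTH_NUM` and `QUALIFIER_RANK` are made up for illustration. It assumes the date has already been fixed into a simple GEDCOM form (`D MON YYYY`, `MON YYYY` or `YYYY`, optionally preceded by `BEF`, `ABT` or `AFT`), and it assumes one possible ordering: less precise dates sort ahead of more precise ones in the same year, and anything that cannot be read sorts last.

```python
# Map GEDCOM month abbreviations to 1..12.
MONTH_NUM = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}

# Assumed tie-breaker: "BEF x" sorts just ahead of "x", "AFT x" just behind.
QUALIFIER_RANK = {"BEF": -1, "ABT": 0, "AFT": 1}


def sort_key(date):
    """Return a tuple that orders cleaned GEDCOM dates chronologically."""
    parts = date.split()
    rank = 0
    if parts and parts[0] in QUALIFIER_RANK:
        rank = QUALIFIER_RANK[parts.pop(0)]
    try:
        # Read from the right: the year is required, month and day are optional.
        year = int(parts.pop())
        month = MONTH_NUM[parts.pop()] if parts else 0
        day = int(parts.pop()) if parts else 0
    except (IndexError, ValueError, KeyError):
        # Date phrases and anything else unreadable go to the end.
        return (1, 0, 0, 0, 0)
    return (0, year, month, day, rank)


dates = ["1879", "10 JAN 1879", "BEF 1879", "JAN 1879", "ABT 1825", "(Live Long)"]
print(sorted(dates, key=sort_key))
```

The point is that a key like this only works once the raw value has been interpreted, which is why the fixing has to come first.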
Then I started looking at what fixing will entail by looking at the GEDCOMs that are out there. I just about choked. Without looking hard I found GEDCOMs with these as DATE values (and there are no typos in what I’m showing):
- January 10, 1879
- January 19th, 1893
- January 1915C
- ? APR 1911
- ca 1825
- ca 1840 1847
- 0117-08-09
- 111/14/1897
- 8/23/?
- 14-Oct
- by 13 Mar 1452
- by 1859?
- By Banns 1831 Jan 1
- September 13 1895
- before 1887
- 1851 or 1856
- 4th February 1977
- Before (1951)
- 16 Aprl 1910
- 4Aug 2005
- 16XX
- 13 FÉV 1665
- Unknown
- As an infant
- Live Long
- 1 AUG
- cira 1896
Now this is a cat’s breakfast. It seems there are no holds barred when it comes to how programs output dates in GEDCOM.
What I’m going to do with Behold is try to interpret every date correctly. If Behold can figure out unambiguously what the date means, then it will convert it internally to a proper GEDCOM date format. If there is some doubt, then I’ll turn it into a date phrase by enclosing it in parentheses. In both cases, I’ll add a message to the log file so you know what’s been done.
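The "convert it if it's unambiguous, otherwise make it a date phrase, and log it either way" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Behold's implementation: the function name `normalize_date`, the handful of patterns, and the choice to treat "ca" as `ABT` and "before" as `BEF` are all hypothetical, and a real version would need many more rules (it does not even check that the day fits the month).

```python
import re

MONTHS = {
    "january": "JAN", "february": "FEB", "march": "MAR", "april": "APR",
    "may": "MAY", "june": "JUN", "july": "JUL", "august": "AUG",
    "september": "SEP", "october": "OCT", "november": "NOV", "december": "DEC",
}
# Accept the three-letter abbreviations as well.
MONTHS.update({abbr.lower(): abbr for abbr in list(MONTHS.values())})

# Assumed mapping of leading words to GEDCOM qualifiers.
PREFIXES = {"ca": "ABT", "circa": "ABT", "about": "ABT", "abt": "ABT",
            "before": "BEF", "bef": "BEF", "after": "AFT", "aft": "AFT"}

PATTERNS = [
    # 4th February 1977, 1 AUG 1900
    re.compile(r"(?P<d>\d{1,2})(?:st|nd|rd|th)? (?P<m>[A-Za-z]+) (?P<y>\d{3,4})", re.I),
    # January 10, 1879, September 13 1895
    re.compile(r"(?P<m>[A-Za-z]+) (?P<d>\d{1,2})(?:st|nd|rd|th)?,? (?P<y>\d{3,4})", re.I),
    # January 1915
    re.compile(r"(?P<m>[A-Za-z]+) (?P<y>\d{3,4})"),
    # 1825
    re.compile(r"(?P<y>\d{3,4})"),
]


def _exact(text):
    """Return 'D MON YYYY', 'MON YYYY' or 'YYYY', or None if unsure."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if not match:
            continue
        parts = match.groupdict()
        out = []
        if parts.get("d") is not None:
            day = int(parts["d"])
            if not 1 <= day <= 31:
                return None
            out.append(str(day))
        if parts.get("m") is not None:
            month = MONTHS.get(parts["m"].lower())
            if month is None:  # e.g. "Aprl": do not guess
                return None
            out.append(month)
        out.append(parts["y"])
        return " ".join(out)
    return None


def normalize_date(raw):
    """Return (gedcom_value, log_message); log_message is None if unchanged."""
    text = " ".join(raw.split())  # collapse stray whitespace
    first, _, rest = text.partition(" ")
    prefix = PREFIXES.get(first.lower())
    exact = _exact(rest if prefix else text)
    if exact is None:
        return f"({text})", f'Could not interpret "{raw}"; kept as a date phrase'
    value = f"{prefix} {exact}" if prefix else exact
    if value == raw:
        return value, None
    return value, f'Converted "{raw}" to "{value}"'


for sample in ["January 10, 1879", "4th February 1977", "ca 1825", "Live Long"]:
    print(normalize_date(sample))
```

With these rules, "January 10, 1879" becomes `10 JAN 1879` and "ca 1825" becomes `ABT 1825`, while "16 Aprl 1910", "1851 or 1856" and "Live Long" are left alone and wrapped as date phrases. Erring on the side of the date phrase means nothing the user typed is lost, even when the sketch is too cautious.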
This means that when Behold 1.5 comes out with GEDCOM export, Behold may in fact become one of the few programs that output a fully compliant GEDCOM 5.5.1 file. Not only that, but it will input just about all the data from any GEDCOMish file, and make it compliant.
I realize this is something very important and could become one of the main things that Behold is used for. You need to store your data in something that is an accepted standard. Doing so future-protects your data. You can be sure that for quite some time, many programs will be able to read at least some of a correctly formatted GEDCOM 5.5.1 file. And there will be a few programs, like Behold, that will be able to read all of it.
And if some future standard, say BetterGEDCOM, comes along, then I can write an output routine to output to that, and Behold can be your translation program.
Once Behold 1.5 comes along (and I’ll be working on this next after the current set of changes), you’ll see what compliant GEDCOM looks like.
I find this interesting. This bad input conversion to good output is really just going to be a sideline to what Behold’s true value will be. It’s Behold’s editing that will blow you away. I promise!