It seems that I’ve created a series of posts about dating. First there was Sort of a Date where I discussed the problems in sorting GEDCOM dates. Then there was How About a Date? where I reviewed the GEDCOM syntax for dates and how many programs break them.
While writing my FixDate function to correct those illegal dates, I realized that GEDCOM was asking that you enter only dates that actually exist, but there was nothing in the GEDCOM syntax itself that specifically defined what dates these were.
Well, genealogy programs should be smart, shouldn’t they? If they’re helping you to enter dates from long ago, wouldn’t you want your program to check your date for you and tell you right away if you’ve got something wrong? I would hope you wouldn’t be able to enter “36 Feb 1938” as a date – but to my shock, I found that in a Brother’s Keeper GEDCOM.
I was curious to see how bad the problem was, so I decided I’d try searching for a few non-existent dates on Google using filetype:ged in the search to only give GEDCOM files in the results.
My first thought was that programs might not check for leap years correctly, so I searched:
- 29 FEB 1897 – 1 result, MyHeritage
- 29 FEB 1903 – 2 results, Family Origins and PAF
- 29 FEB 1906 – 1 result, PAF
- 29 FEB 1909 – 1 result, Ancestry.com Family Trees
- 29 FEB 1910 – 1 result, PAF
- 29 FEB 1911 – 2 results, Family Origins and PAF
So … there appear to be at least a few programs that don’t check the leap year for you.
Then I tried:
- DATE 30 FEB – 24 results, Brother’s Keeper (3), PAF (11), Family Origins (3), RootsMagic, Legacy (4), BasGen and one stated to be from: AAAAAA (Eh what?)
Well what about:
- DATE 31 FEB – 14 results, plus Family Treasures, Heredis and Pro-Gen 2
- DATE 31 APR – 40 results, plus EFTree and Holger (which doesn’t exist)
- DATE 31 SEP – 32 results, plus GenoPro
- DATE 31 NOV – 37 results, plus Ancestral File and CFTree
- DATE 32 – 1 result, EasyTree that gave 32 Dec 1841
Granted, many of the GEDCOMs on the web are quite old and were produced by earlier versions of the above programs. But this is a basic check all genealogy software should do for you. If your program is mentioned above, then it is possible it still allows non-existent dates to be entered into it.
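To give an idea of how small such a check can be, here is a minimal Python sketch. It is only an illustration, not the code used in Behold or in any of the programs above. The names `date_exists`, `DAYS_IN_MONTH` and `SIMPLE_DATE` are made up for this example, it handles only the plain "day month year" form, and it assumes the Gregorian leap-year rule for every year, which is not right for dates recorded under the Julian calendar.

```python
import calendar
import re

# Days per month in a non-leap year, keyed by GEDCOM month abbreviation.
DAYS_IN_MONTH = {
    "JAN": 31, "FEB": 28, "MAR": 31, "APR": 30, "MAY": 31, "JUN": 30,
    "JUL": 31, "AUG": 31, "SEP": 30, "OCT": 31, "NOV": 30, "DEC": 31,
}

# Matches only the simple "day month year" form, e.g. "29 FEB 1903".
SIMPLE_DATE = re.compile(r"([0-9]{1,2}) ([A-Za-z]{3}) ([0-9]{1,4})")


def date_exists(text):
    """Return True if text is a 'DD MON YYYY' date that exists (Gregorian rules)."""
    match = SIMPLE_DATE.fullmatch(text.strip())
    if match is None:
        return False
    day = int(match.group(1))
    month = match.group(2).upper()
    year = int(match.group(3))
    if month not in DAYS_IN_MONTH:
        return False
    last_day = DAYS_IN_MONTH[month]
    # Assumption: the Gregorian leap-year rule applies to every year.
    if month == "FEB" and calendar.isleap(year):
        last_day = 29
    return 1 <= day <= last_day
```

With this sketch, `date_exists("29 FEB 1903")`, `date_exists("31 APR 1850")` and `date_exists("32 DEC 1841")` all return `False`, while `date_exists("29 FEB 1904")` returns `True`.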
I just tried entering 29 Feb 1943 into RootsMagic 4 not using its little popup calendar (since the calendar does not have that date on it). RootsMagic highlighted the date in light yellow in a passive way to signal a problem, but accepted it. So then I tried adding 43 freb 1943 (yes, with FEB spelled as “freb”) and it accepted that as well. I exported the file, and the GEDCOM line said:
2 DATE 43 freb 1943
That is not good. RootsMagic knows it’s a bad date. Exporting it that way is illegal in GEDCOM and will create problems when other programs read it. At a minimum, this date should have been converted to a date phrase for export by enclosing it in parentheses, which is valid, i.e.:
2 DATE (43 freb 1943)
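The export fallback described above could look something like the following Python sketch. This is a hypothetical illustration and not RootsMagic's or Behold's actual code. The function `export_date_line` and the stand-in validator `looks_like_date` are names invented here, and the validator only checks the rough shape of the date; a real program would plug in its own full date check.

```python
import re

MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def looks_like_date(text):
    """Stand-in validator (an assumption): day 1-31, a known month, a year."""
    match = re.fullmatch(r"([0-9]{1,2}) ([A-Za-z]+) ([0-9]{1,4})", text)
    if match is None:
        return False
    return 1 <= int(match.group(1)) <= 31 and match.group(2).upper() in MONTHS


def export_date_line(level, text, is_valid):
    """Build a GEDCOM DATE line, turning an invalid date into a date phrase."""
    if is_valid(text):
        value = text
    elif text.startswith("(") and text.endswith(")"):
        # Already a date phrase, so leave it alone.
        value = text
    else:
        # Wrap the user's original text in parentheses so nothing is lost.
        value = f"({text})"
    return f"{level} DATE {value}"
```

Here `export_date_line(2, "43 freb 1943", looks_like_date)` gives `2 DATE (43 freb 1943)`, while a date that passes the check is written out unchanged.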
Try entering some of these illegal dates in the program you use and see what your program does with them.
Still curious, I thought I’d try a more difficult date construct. There are the double dates caused by the change of calendars in 1583. This is something that’s tricky even for genealogists to get right. Basically the years from 1583 to 1751 could be written 1583/84 to 1751/52, but only from January 1st through March 24th since one calendar had January 1st as the start of the new year, and the other had March 25th as the start. From March 25th, the year is the same and a double date should not be used.
You’d think that if a program allowed you to enter double dates, it would at least help you by letting you know when you didn’t get one right.
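As a rough sketch of what such a check might involve, the Python below applies the rule as described above: the first year must fall between 1583 and 1751, the two digits after the slash must be the following year, and the date must not fall after March 24th. The name `double_date_ok` is invented for this example. It also makes two assumptions of its own: a year-only or month-only double date is accepted when it could fall in the January 1st to March 24th window, and it does not check that the day itself exists in the month.

```python
import re

# Optional day, optional month, then a year written as YYYY/YY.
DOUBLE_DATE = re.compile(
    r"(?:(?:([0-9]{1,2}) )?([A-Z]{3}) )?([0-9]{4})/([0-9]{2})"
)


def double_date_ok(text):
    """Return True if text is a double date that fits the rule described above."""
    match = DOUBLE_DATE.fullmatch(text.strip().upper())
    if match is None:
        return False
    day, month, year, suffix = match.groups()
    year = int(year)
    # Double dates only make sense for the years 1583 to 1751.
    if not 1583 <= year <= 1751:
        return False
    # The two digits after the slash must be the following year.
    if int(suffix) != (year + 1) % 100:
        return False
    # Assumption: a year-only double date is accepted.
    if month is None:
        return True
    if month in ("JAN", "FEB"):
        return True
    if month == "MAR":
        # Only March 1st to 24th qualifies; a bare "MAR" is accepted.
        return day is None or int(day) <= 24
    # Any other month falls after March 24th, so the year is unambiguous.
    return False
```

For instance, `double_date_ok("10 FEB 1719/20")` returns `True`, while `double_date_ok("JUN 1719/20")` and `double_date_ok("1577/79")` both return `False`.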
I tried searching for some illegal double dates and found these:
- DATE 1579/80 – 1 result, Family Origins
- DATE 1577/79 – 1 result, Brother’s Keeper
- DATE 1575/76 – 1 result, Family Origins
- JUN 1719/20 – 1 result, Brother’s Keeper (invalid)
- NOV 1719/20 – 1 result, Brother’s Keeper (invalid)
Hopefully these programs are not improperly using the slash as an “or”, because that is also illegal GEDCOM.
RootsMagic 4 in its help file says it does support double dates, and does put in checks for you. That’s good and you want that. Unfortunately, they may export their double dates as 1698/9 instead of 1698/99 and as 1699/1700 instead of 1699/00 as GEDCOM specifies. Because of that, programs inputting GEDCOMs from RootsMagic 4 may not be able to read those double dates correctly.
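Writing the year in the two-digit form mentioned above takes very little code. The following Python sketch is only an illustration; the function name `format_double_year` is made up for this example and is not taken from any of the programs discussed.

```python
def format_double_year(year):
    """Format a double-dated year as YYYY/YY, e.g. 1698 -> '1698/99'."""
    # Always write exactly two digits after the slash, so the year
    # following 1699 becomes "00" rather than "1700" or "0".
    return f"{year}/{(year + 1) % 100:02d}"
```

So `format_double_year(1698)` gives `1698/99` and `format_double_year(1699)` gives `1699/00`.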
It is not difficult to do basic checks that a date exists. I’m currently just about finished including checks in Behold and adding “illegal date” as a new **data error** message so that Behold will help you identify any known-to-be illegal dates that you might have in your data. This should be in the almost-ready Version 1.0.1 of Behold. And when editing is added in Version 2.0, Behold will let you know if you try to enter an illegal date and will give you the chance there and then to correct it.
But the sad thought I have about all this is: If genealogy software vendors can’t even do a simple date check in their programs, and can’t even output their date field to properly follow the GEDCOM standard for it, then that really worries me. These programs seem out to do their own things their own way, and don’t seem willing to cooperate in the way they do things with anyone else or follow the established standards. Unless this attitude changes, there will be no hope of improving data transfer between programs and initiatives such as BetterGEDCOM will be doomed to fail.
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Wed, 14 Dec 2011
Louis
I don’t mean this as a negative comment about the BetterGEDCOM project, of which I have previously been a member, but more as information for your blog readers: there is a high possibility that BetterGEDCOM may fail. This is due to the minimal participation of genealogy program developers/owners in BetterGEDCOM (perhaps that could read only Behold).
Brett
Joined: Sun, 9 Mar 2003
288 blog comments, 245 forum posts
Posted: Thu, 15 Dec 2011
Brett:
I don’t think the reason for failure will be lack of developer involvement. Tom Wetmore, who developed Lifelines, was involved in GenTech and is now developing DeadEnds, has been a major contributor. Michael Martineau of Family Pursuit has also been a contributor.
Then we’ve had involvement from AncestorSync who is in turn involved and has connections to many of the major vendors. Bruce Buzbee of RootsMagic has attended a few developer meetings. Gordon Clarke of FamilySearch has been helping with the organization of the BetterGEDCOM group. And Legacy has offered its citation templates for BetterGEDCOM’s SourceTemplates.org endeavor.
I don’t think it would be hard to get other developers involved if BetterGEDCOM started producing something tangible that others could evaluate. My current push is to get them to produce the Source/Citation definitions that have been talked about. BG’s got to produce something soon in order to stay relevant, and there would be no better time to do that than just prior to RootsTech.
I said BetterGEDCOM might be doomed to fail, not because developers aren’t involved, but because the developers would not be willing to compromise and adapt their programs to work with whatever standards are decided upon.
Louis
Joined: Thu, 15 Dec 2011
5 blog comments, 0 forum posts
Posted: Thu, 15 Dec 2011
I run the website Genealogie Online (http://www.genealogieonline.nl/en/) where genealogists can publish their genealogical data (GEDCOM) and images. For this to work, Genealogie Online has to “swallow” a lot of GEDCOMs from several programs (see http://www.genealogieonline.nl/en/stamboom-programma.php for a complete list).
After reading your blog post I decided to examine the 4638 GEDCOMs which genealogists uploaded. I searched for files containing 29 FEB ***1, 29 FEB ***3, 29 FEB ***5, 29 FEB ***7 or 29 FEB ***9.
PRO-GEN > 26
GensDataPro > 22
MyHeritage Family Tree Builder > 21
BROSKEEP > 7
Aldfaer > 1
gwb2ged > 1
OEDIPUS_II > 1
PhpGedView > 1
Reunion > 2
Stamboom > 1
So Dutch programmers have some work to do too…
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Thu, 15 Dec 2011
Louis:
What do you consider the likelihood of developers being willing to compromise and adapt their programs to work with any standards decided?
How does one address the many users who are happy with their program as is (not knowing GEDCOM export/import difficulties) and do not upgrade their software, so that there will still be incorrect GEDCOM files being created/distributed? Would each developer need to also adapt their programs to correctly import GEDCOM, as is your goal for Behold?
Brett
Joined: Sun, 9 Mar 2003
288 blog comments, 245 forum posts
Posted: Thu, 15 Dec 2011
Bob: Very interesting. I guess our samples really are representative of the current date state. - Louis
Joined: Sun, 9 Mar 2003
288 blog comments, 245 forum posts
Posted: Thu, 15 Dec 2011
Brett:
Developers will need a good reason to change before they would. A new standard will have to be not too difficult to implement. It will have to be able to still work without requiring modifications to their current database. And it will have to offer them something that is so important they can’t refuse, and that will be important to them if their competition is using it and/or their users demand it. Maybe some slick implementation of sources/citations, evidence/conclusions or internationalization might entice them.
As far as the incorrect GEDCOMs go, those users today are very lucky. If and when a new standard comes out, there will be at least a few programs made to convert GEDCOM to the new standard. Some of the programs will do extended GEDCOM better than others. And I know I’ll add that conversion ability into Behold as well. So they really don’t have to worry about it for quite a while yet. The fact that GEDCOM today is so pervasive is what provides this safety and the fact that it was able to get that way is what makes me say we are very lucky as genealogists to have had it.
Developers could and probably should, but don’t and won’t have to write their own conversion routine unless they have a good reason to. Ancestry.com didn’t write a conversion routine for their FTW Text, leaving many of their users in the exact situation you describe - but Ancestry didn’t care. And those FTW Text files now have Behold to view their data and soon will have Behold to convert their file back to standard GEDCOM. So the users are all safe for now.
Louis
Joined: Mon, 12 Jan 2009
36 blog comments, 59 forum posts
Posted: Thu, 15 Dec 2011
Louis:
I am looking forward to the ‘soon will have Behold to convert their file back to standard GEDCOM’.
By the way, what is your idea of standard GEDCOM, 5.5 or 5.5.1?
Thanks
Joined: Sun, 9 Mar 2003
288 blog comments, 245 forum posts
Posted: Thu, 15 Dec 2011
The official standard is GEDCOM 5.5. The de facto standard is GEDCOM 5.5.1.
There is in fact very little difference between the two. In 5.5.1, the BLOB tag was removed and 8 tags were added, including some important ones like EMAIL, WWW, LATI and LONG, and a few extra syntactical changes were added that improved 5.5. Many programs now export these extras.
A program that can read valid 5.5.1 can also read valid 5.5, except for the BLOB tag which I don’t believe is used any more by any current program, and I’m not sure if it can be displayed by any program (maybe GEDitCOM). I have never found a GEDCOM that actually uses BLOBs other than a GEDitCOM-generated file used for testing GEDCOM readers. So I’ve not bothered to implement interpretation of what the blob is supposed to be.
A program exporting to 5.5 would have to do something with those extra 5.5.1 tags and enhanced syntax, and that would compromise the user’s data. A program exporting to 5.5.1 would only lose the BLOB tag.
So overall, there are minor trade-offs between using GEDCOM 5.5 and GEDCOM 5.5.1.
Prior to implementing GEDCOM export, I’ll take a more detailed look at what some other programs do and then decide on what might be best for Behold to export to. I do want only one export version, though.