Conflicting Information in GEDCOM - Tue, 9 Nov 2021
An issue about GEDCOM has once again come to my attention.
The GEDCOM 5.5.1 standard says:
Conflicting event dates and places should be represented by placing them in separate event structures with appropriate source citations rather than by placing them under the same enclosing event.
I addressed this over 8 years ago in my article Multiple Events and Unions in GEDCOM, where I said this:
What this means is that if you have two conflicting sets of information for an event, such as a birth event, then there should be separate event structures for them, e.g.:
1 BIRT
2 DATE 1880
1 BIRT
2 DATE 1870
Presumably you’d have more information with each, including the full dates, the places, your sources and notes about each bit of evidence. Because of the GEDCOM rule, the first of the two would be considered the preferred, i.e. most credible date.
This is all fine and good for events like Birth and Death that, other than in extremely unusual circumstances (e.g. brought back from a coma, or science fiction), normally occur only once in any person’s life.
The trouble is that almost any other event can occur multiple times in a person’s life: adoption, naturalization, census, education, retirement. There have been people who have had multiple baptisms and even multiple burials.
This results in a problem. For events other than Birth and Death, if the events are represented like the 4-line GEDCOM example above, how do you tell if they are two different events of the same type, or if they are two sets of conflicting information about the same event?
The answer is, you can’t. GEDCOM does not explain how to tell the two cases apart.
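To make the ambiguity concrete, here is a minimal Python sketch of what a reading program actually sees. It assumes it is handed the sub-lines of a single record, and the function name `repeated_event_tags` is made up for this illustration. All it can report is that a tag occurs more than once; nothing in the data says whether that means two events or two versions of one event.

```python
from collections import defaultdict

def repeated_event_tags(record_lines):
    """Return {tag: [dates]} for level-1 tags that occur more than once.

    Assumes record_lines holds the lines of ONE record (hypothetical helper).
    """
    dates_by_tag = defaultdict(list)
    current_tag = None
    for raw in record_lines.strip().splitlines():
        parts = raw.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            current_tag = None
        elif level == 1:
            current_tag = tag
            dates_by_tag[tag].append(None)  # filled in if a DATE follows
        elif level == 2 and tag == "DATE" and current_tag is not None:
            dates_by_tag[current_tag][-1] = value
    return {t: d for t, d in dates_by_tag.items() if len(d) > 1}

sample = """
1 BAPM
2 DATE 1880
1 BAPM
2 DATE 1870
"""
print(repeated_event_tags(sample))  # {'BAPM': ['1880', '1870']}
```

Two baptisms, or one baptism with two conflicting dates? The output is identical either way, so any answer the program gives is a guess.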
A Standard Needs to be Standardized
You would want a standard like GEDCOM to be followed by all developers. You would hope that the standard is internally consistent in how similar objects are represented.
Here we have an inconsistency, where more than one occurrence of the same event type can represent either:
- Two different events, or
- Conflicting information for the same event.
This type of inconsistency should never happen in a standard. So what is the possible solution?
Conflicting events are not multiple events. They are different versions of the same event, based on different sources that give different information for the event. If we are talking about the same event, then all the information available for the event, conflicting or not, should be included in the event. For example, one idea for changing GEDCOM might be something like this:
1 BIRT
2 DATE 1880
3 SOUR @S1@
2 PLAC Wilmington, Delaware, USA
3 SOUR @S1@
2 ALTDATE 1870
3 SOUR @S2@
2 ALTPLAC New York City, New York, USA
3 SOUR @S2@
This adds new tags ALTDATE and ALTPLAC to indicate alternative information for the same event. Note that the source for each item of information is indicated.
Personally, I don’t like the idea of GEDCOM adding a million new tags, and listing each item of information individually forces the source information to be repeated unnecessarily. That would not be an easy thing for developers to manage.
It might be better, then, to include groups of alternative information, with each group denoted by a single new tag, maybe ALT, e.g.:
1 BIRT
2 DATE 1880
2 PLAC Wilmington, Delaware, USA
2 SOUR @S1@
2 ALT
3 DATE 1870
3 PLAC New York City, New York, USA
3 SOUR @S2@
2 ALT
3 DATE 1875
3 SOUR @S3@
Conflicting information for non-unique events, which currently have no mechanism at all, could then be represented the same way.
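As a rough illustration of how a reader might consume such a structure, here is a small Python sketch. Keep in mind that ALT is only the tag proposed in this article and is not part of any GEDCOM version; the helper names `parse_lines` and `versions` are likewise invented for the example. The sketch builds a nested tree from the level numbers and then returns the primary version of the event followed by each ALT group.

```python
def parse_lines(text):
    """Parse GEDCOM-style lines into nested nodes: tag, value, children."""
    root = {"tag": "ROOT", "value": "", "children": []}
    stack = [(-1, root)]  # (level, node) pairs from the root down
    for raw in text.strip().splitlines():
        parts = raw.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        node = {"tag": tag,
                "value": parts[2] if len(parts) > 2 else "",
                "children": []}
        # Pop back up until the top of the stack is this line's parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root

def versions(event):
    """Return the primary version first, then one dict per ALT group.

    ALT is the hypothetical tag from this article, not a real GEDCOM tag.
    """
    def fields(node):
        return {c["tag"]: c["value"]
                for c in node["children"] if c["tag"] != "ALT"}
    alternatives = [fields(c) for c in event["children"] if c["tag"] == "ALT"]
    return [fields(event)] + alternatives

sample = """
1 BIRT
2 DATE 1880
2 PLAC Wilmington, Delaware, USA
2 SOUR @S1@
2 ALT
3 DATE 1870
3 PLAC New York City, New York, USA
3 SOUR @S2@
2 ALT
3 DATE 1875
3 SOUR @S3@
"""
birth = parse_lines(sample)["children"][0]
for version in versions(birth):
    print(version)
```

Each version carries its own source once, and a repeated event tag is then free to mean only one thing: a separate event.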
I should mention that FamilySearch GEDCOM 7.0 does not address the issue of conflicting information and has the same problem as GEDCOM 5.5.1.
NO Is Wrong
While I’m at it, I should mention that one of the few changes FamilySearch GEDCOM 7.0 introduced is loosely related to the conflicting information issue.
FSG 7.0 introduced a NON_EVENT_STRUCTURE indicated by the tag “NO”, which they describe as follows:
Indicates that a specific type of event … did not happen within a given date period (or never happened if there is no DATE substructure).
with this example:
1 NO MARR
2 DATE TO 24 MAR 1880
Well gee thanks! That will break just about every genealogy developer’s code, will need to be handled for every possible event tag, and may require changes to the program’s database as well.
In this particular example, I don’t know why the date can’t be specified as:
1 MARR
2 DATE AFT 24 MAR 1880
And if you wanted to indicate that the couple didn’t marry, why not follow the model that GEDCOM 5.5.1 already had and allow it to be specified as:
1 MARR N
After all, GEDCOM already allows the following to indicate that a marriage happened but without additional information:
1 MARR Y
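To show where the extra work comes from, here is a minimal Python sketch of a reader that dispatches on the tag of each level-1 line. The function name `classify` and the returned labels are made up for this illustration. With NO, the event type moves out of the tag position and into the line value, so a reader keyed on the tag needs a special case for it.

```python
def classify(line):
    """Classify one level-1 line as an event or a non-event (hypothetical helper)."""
    parts = line.strip().split(" ", 2)
    tag = parts[1]
    value = parts[2] if len(parts) > 2 else ""
    if tag == "NO":
        # FSG 7.0 non-event: the event type is the line value, not the tag.
        return ("non-event", value)
    return ("event", tag)

print(classify("1 MARR Y"))   # ('event', 'MARR')
print(classify("1 NO MARR"))  # ('non-event', 'MARR')
```

A reader without the special case would see an unknown tag named NO and, depending on how it treats unknown tags, drop the line or file it away as unrecognized data.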
Two People Married More Than Once
The issue of conflicting information was brought back to my attention a few days ago by a discussion in a GEDCOMGeneral Google group. The question raised was whether two people who married, then separated, and then married a second time should be included as one FAM (Family) record or two.
e.g. as one FAM:
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1950
1 DIV
2 DATE 1960
1 MARR
2 DATE 1970
or as two FAMs:
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1950
1 DIV
2 DATE 1960
0 @F2@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1970
In the first case, children from both marriages will be together. In the second one, they would be split into the two families, even though they are full siblings.
Well, the issue gets more complicated. What do you do then with a child born between the divorce and the 2nd marriage?
This really should not be an issue at all. It is clear that there should be only one FAM representing all the relationships of two people and all the children they have. Multiple MARR tags should be allowed under a single FAM tag.
But, when they are, are they considered to be two marriage events, or conflicting information for one marriage event? GEDCOM is ambiguous.
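A program that wants to show all the children of a couple together first has to notice when that couple has been split across several FAM records. Here is a minimal Python sketch of that check. The function name `fams_by_couple` is invented for the example, and it assumes each FAM carries at most one HUSB and one WIFE line.

```python
from collections import defaultdict

def fams_by_couple(gedcom_text):
    """Map each (HUSB, WIFE) pair to the FAM ids that use it (hypothetical helper)."""
    fams = {}       # FAM id -> {"HUSB": ..., "WIFE": ...}
    current = None  # id of the FAM record being read, if any
    for raw in gedcom_text.strip().splitlines():
        parts = raw.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # Record lines look like "0 @F1@ FAM": the id sits in the tag slot.
            current = tag if value == "FAM" else None
            if current is not None:
                fams[current] = {}
        elif level == 1 and current is not None and tag in ("HUSB", "WIFE"):
            fams[current][tag] = value
    couples = defaultdict(list)
    for fam_id, who in fams.items():
        couples[(who.get("HUSB"), who.get("WIFE"))].append(fam_id)
    return dict(couples)

two_fams = """
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1950
1 DIV
2 DATE 1960
0 @F2@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1970
"""
print(fams_by_couple(two_fams))  # {('@I1@', '@I2@'): ['@F1@', '@F2@']}
```

Finding the split is the easy part. Deciding whether @F1@ and @F2@ are two unions or one union recorded twice runs straight back into the ambiguity described above.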
Remembering Past Articles
It’s not good enough for the people writing the standards to just think about an issue and imagine what might be best. Each issue should be studied in detail, and when it comes to GEDCOM, there has already been a lot of study and discussion of most issues. Years of BetterGEDCOM, FHISO, and independent thinking by many genealogy developers should not just be re-thought without referring back to the work that has already been done.
So let me refer you back to:
- My article from 2013: Multiple Events and Unions in GEDCOM
- Tamura Jones’ article from 2019: Married, Divorced, Married Again
In case you don’t want to bother reading the two articles, both conclude that because of the way GEDCOM was written, and because of the way developers implemented GEDCOM, a FAM needs to represent just one union. The MARR and DIV (or ANUL) events therefore represent the start and the end of the union, just like the BIRT and DEAT events represent the start and end of an INDI which represents one life. Multiple MARR and DIV events within one union represent conflicting information, just as multiple BIRT and DEAT events represent conflicting information within an INDI.
All other tags, e.g. CENS, RESI, OCCU and EDUC, represent events that can occur multiple times, so there’s no way to represent conflicting information for those events. This is an inconsistency in GEDCOM that should be fixed.
Until this change to conflicting information is made, the FAM must remain as one union from MARR to end of MARR. But once multiple events are no longer used to represent conflicting information, the FAM concept can be changed to represent the more logical concept of all the relationships between two people and the children they have together.
So the GEDCOM-standards writers really need to change how conflicting information is handled so that the FAM concept can be repaired.