Is Updating the GEDCOM Standard Necessary? - Sun, 22 Oct 2023
The GEDCOM Standard was first developed almost 40 years ago as a way to store genealogical data and transfer it between programs. It appeared at about the same time as the first genealogy software programs.
The early programs established a basic structure for genealogical data, and the standard reflected it. The standard was updated many times, mostly to ease its implementation and to allow additional types of data to be transferred, but the basic record structure has never really changed.
The standard in common use now is GEDCOM 5.5.1. It was drafted in 1999 and finalized without changes in 2019, so in 24 years the standard hasn’t changed. Similarly, by about 24 years ago, genealogy software had matured to the point where its data structures were set and rarely needed to change. Having the GEDCOM standard to base those data structures on had a lot to do with that.
What Is a Standard and What Is a Good Standard?
A Standard is an agreed-upon document that provides rules or instructions for its intended audience to follow in order to meet the document’s specific purpose.
GEDCOM’s intended audience is mostly genealogy software developers.
GEDCOM’s specific purpose is to facilitate the transfer of genealogical data between programs.
Most people would agree that a standard is a good standard if:
- It has been adopted and is used by most of its intended audience.
- It is understandable and contains most of what is needed to serve its purpose.
- It is relatively stable from version to version without requiring major changes.
So is GEDCOM 5.5.1 a good standard?
- Almost all genealogy software developers today know about the GEDCOM standard and the vast majority use it as a way to share their software’s genealogical data with others or to get its data from other software. - 1 / 1
- All the rules are there. Genealogy software has been successfully sharing data with other software for 24 years using GEDCOM 5.5.1. - 2 / 2
- GEDCOM 5.5.1 hasn’t changed at all in 24 years. - 3 / 3
Giving it 3 out of 3, I can’t see why GEDCOM 5.5.1 should not be considered a “good” standard.
Data Doesn’t Transfer
GEDCOM is supposed to facilitate data transfer between programs.
If you are using genealogy program XXXX and decide you want to switch to genealogy program YYYY, then you need to transfer your data. So you export your data from XXXX to a GEDCOM 5.5.1 file and you import it from that file into program YYYY. You will likely find that a lot of your data did not transfer.
For the past 15 years, we’ve seen initiatives such as BetterGEDCOM, FHISO, and GEDCOM 7.0 try to improve GEDCOM 5.5.1 so that much more of the data would transfer. The assumption behind them was that something about GEDCOM 5.5.1 was preventing the data from transferring.
I believe this thinking is wrong.
The work I have done has led me to conclude that, of the data that doesn’t transfer:
- 5% fails because GEDCOM 5.5.1 cannot handle that specific type of data.
- 35% fails because the receiving system did not implement the functionality that needs or uses that data, and thus has no data structure or database table to store it in.
- 60% fails because the developer did not use the correct GEDCOM 5.5.1 method, or used their own custom tags to do the transfer.
If only 5% of the data not transferring is due to GEDCOM, then the standard is not the problem.
If 35% is due to the receiving system not needing or accepting the data, then no improvements to the standard could fix that.
If 60% is due to developers not making the effort to correctly implement GEDCOM, then more education about the standard is needed.
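To illustrate the custom-tag part of that 60%, here is a minimal Python sketch a developer could use to audit how much of an export relies on user-defined tags. It relies on the GEDCOM 5.5.1 rule that user-defined tags begin with an underscore. The function name `find_custom_tags` and the sample tags `_MILT` and `_PRIM` are illustrative assumptions, and the simple line pattern ignores details such as CONC/CONT continuation lines and character encoding.

```python
import re
from collections import Counter

# A GEDCOM 5.5.1 line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def find_custom_tags(lines):
    """Count user-defined tags (those starting with '_') in GEDCOM lines.

    Hypothetical helper for illustration; not part of any GEDCOM library.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        tag = m.group(3)
        if tag.startswith("_"):
            counts[tag] += 1
    return counts

# Illustrative export fragment mixing standard and user-defined tags.
sample = """0 @I1@ INDI
1 NAME John /Smith/
1 _MILT Served in the army
2 DATE 1917
1 BIRT
2 DATE 1 JAN 1890
2 _PRIM Y
0 TRLR""".splitlines()

print(find_custom_tags(sample))
```

Run against this fragment, the sketch should report `_MILT` and `_PRIM` once each. Every tag it lists is data that a receiving program may silently drop unless it happens to recognize that particular vendor’s tag.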
What Is Not Needed
There is nothing inherently wrong with GEDCOM 5.5.1. What is not needed is a significant revision to it. What I am referring to, of course, is FamilySearch’s release of GEDCOM 7.0 two years ago.
GEDCOM 7.0 is written differently from GEDCOM 5.5.1. It no longer uses the GEDCOM form but a hierarchical container format. The standard Backus-Naur Form (BNF) notation for defining the syntax has been replaced by “A Metasyntax for Structure Organization”. Changing the representation of the standard is akin to writing it in a different language, and it makes adoption by 5.5.1 users unnecessarily difficult. Programmers do not want something to change just for the sake of change. They want a standard where every change is simple, understandable, and meets a need. If it ain’t broke, don’t fix it.
The selling point of a new standard is better data transfer. But if the goal is to reduce the 5% of data that does not transfer because of the standard, the gains look like slim pickings. Adding new data structures is admirable if the majority of developers need them. But will enabling negative assertions, rich-text notes, and “better” multimedia handling be useful if 35% of the systems will not need or accept that data and 60% of them will not follow the rules in using it?
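For readers who have not worked with it, here is a minimal Python sketch of what reading the 5.5.1 line format involves: each line carries a level number, an optional cross-reference ID, a tag, and an optional value, and the record hierarchy comes entirely from the level numbers. The `Node` class and `parse_gedcom` function are hypothetical names for illustration, and the sketch deliberately skips CONC/CONT joining, encoding detection, and most validation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# A GEDCOM 5.5.1 line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

@dataclass
class Node:
    level: int
    tag: str
    xref: Optional[str] = None
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def parse_gedcom(lines):
    """Return the level-0 records; each line becomes a child of the
    nearest preceding line whose level is exactly one less."""
    records = []
    stack = []  # stack[i] holds the most recent node at level i
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        level = int(m.group(1))
        node = Node(level, m.group(3), m.group(2), m.group(4))
        if level == 0:
            records.append(node)
            stack = [node]
        else:
            # A level may only go one deeper than its parent.
            if level > len(stack):
                raise ValueError(f"level jumps too far: {line!r}")
            del stack[level:]  # drop deeper nodes that are now closed
            stack[-1].children.append(node)
            stack.append(node)
    return records

sample = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1890
2 PLAC Boston
0 TRLR""".splitlines()

records = parse_gedcom(sample)
print(records[0].tag, records[0].xref, [c.tag for c in records[0].children])
```

The point of the sketch is how little machinery the 5.5.1 form needs: a regular line pattern and a stack indexed by level are enough to rebuild the record tree.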
After more than two years, very few genealogy developers have implemented GEDCOM 7.0. Fewer still have implemented the new features that 7.0 added.
There can be many different reasons for this, from technical to practical to the simple idea that they’d rather wait for everyone else to implement it before they spend their time and resources in doing it themselves.
What Is Needed
If you want more of your data to transfer between programs, you won’t get it by creating a new standard for that 5%, and you won’t be able to improve on the 35% that your destination program has not implemented.
The best you can do is to reduce the 60% of the data that is written incorrectly, read incorrectly, or written as custom tags that the receiving system cannot understand. For that we need better resources that help developers implement the GEDCOM 5.5.1 standard as correctly as possible.
A couple of such resources are available right now:
- The GEDCOM 5.5.1 Annotated Edition
- The GEDCOM 5.5.5 Specification
Both are available at: https://www.gedcom.org/gedcom.html
These specs were created in 2019 by Tamura Jones with the input of 9 genealogy software developers, myself included.
The GEDCOM 5.5.1 Annotated Edition takes the knowledge and experience of these experts and adds it as notes to the original 5.5.1 standard. The notes explain anything that is unclear and suggest how to implement GEDCOM correctly.
The GEDCOM 5.5.5 Specification effectively updates the 5.5.1 standard with the notes from the 5.5.1 Annotated Edition and marks items that are no longer of practical use and should be deprecated from the 5.5.1 standard. For this reason, the 5.5.5 Specification should be followed when writing a GEDCOM file, since it is backward compatible with 5.5.1 except for some necessary corrections of mistakes in 5.5.1 and the relaxation of some length restrictions.
Further Reading
- Is Perfect GEDCOM Reading and Writing Necessary? (16 Oct 2023)
- Reading GEDCOM – A Guide for Programmers (12 Oct 2023)
- Conflicting Information in GEDCOM (9 Nov 2021)
- Can GEDCOM 7.0 Succeed? (15 Jun 2021)
- GEDCOM 7.0, Official (7 Jun 2021)
- GEDCOM 7.0 (19 Feb 2021)
- The GEDCOM 5.5.5 Initiative and Making It Work (6 Oct 2019)
- Advancing the GEDCOM Standard (26 May 2018)
- Complete Genealogy Data Transfer (8 Jun 2015)
- Is GEDCOM Good For Sources? (7 May 2015)
- The Future of Genealogy – 6 Predictions (7 Apr 2015)
- Standardizing Sources and Citation Templates (27 Aug 2014)
- A Recipe for GEDCOM (3 Aug 2013)
- Nine Necessities in a GEDCOM Replacement (5 Jun 2013)
- Build a BetterGEDCOM or learn GEDCOMBetter? (5 Jan 2011)
- BetterGEDCOM (7 Dec 2010)
Conclusions
Is Updating the GEDCOM Standard Necessary? I would say no. If anything, a few minor additions to 5.5.1 would be useful, but nothing major.
Moving to GEDCOM 7.0 could be dangerous as it might make data less likely to transfer correctly. Developers do not want to spend time changing their programs to implement features not needed by their own programs.
Available resources that better explain how to implement GEDCOM, such as the 5.5.1 Annotated Edition and the 5.5.5 Specification, can help developers make their GEDCOM output more compatible with other programs.
Any future work on the GEDCOM standard should strongly discourage the use of user-defined (i.e. custom) tags, or even better, make them illegal.