The GEDCOM X team has just put up a survey. Here are the responses I sent to them.
GEDCOM X: Request For Feedback: March 23, 2012
We’d love to hear your thoughts on some of the things we’re working through right now at GEDCOM X. Just three questions.
What are the five most significant deficiencies in GEDCOM 5.5 that you’d like to see addressed in GEDCOM X?
The most significant deficiency:
Not keeping “conclusions” separate from the raw source details. Source details (i.e. the factual parts of the SOURCE_CITATION) must be their own record (or a subrecord of the source)
The second most significant deficiency:
Not having a Place record. Events should be allowed to be associated to places. Place hierarchies should go lower, down to the house, or even a room in a house. A ship should be allowed to be a place (even though it moves around, certain events happen on a ship)
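To make the idea concrete, here is a minimal sketch of what a place record with an open-ended hierarchy might look like as a data structure. The `Place` and `Event` classes, their fields, and the sample names are all hypothetical illustrations of mine, not anything taken from GEDCOM 5.5 or a GEDCOM X draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    """Hypothetical place record; each place points to its containing place."""
    name: str
    kind: str                      # free-form level, e.g. "city", "house", "room", "ship"
    parent: Optional["Place"] = None

    def full_name(self) -> str:
        """Walk up the hierarchy and join names from most to least specific."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return ", ".join(parts)


@dataclass
class Event:
    """Hypothetical event that refers to a place record instead of free text."""
    kind: str
    date: str
    place: Place


# Made-up sample data: the hierarchy goes down to a room in a house.
canada = Place("Canada", "country")
manitoba = Place("Manitoba", "province", canada)
winnipeg = Place("Winnipeg", "city", manitoba)
house = Place("123 Example Street", "house", winnipeg)
room = Place("Front bedroom", "room", house)

# A ship is just another place; it does not need a fixed parent.
ship = Place("SS Example", "ship")

birth = Event("birth", "1 JAN 1900", room)
burial_at_sea = Event("burial", "5 MAY 1912", ship)
```

Because the level is just a label on each record, nothing in this sketch limits how deep the hierarchy goes.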
The third most significant deficiency:
Not enough attention was drawn to the powerful ASSO/RELA (Association/Relationship) tags, that are underused and can be made very powerful.
The fourth most significant deficiency:
Lack of good and simple documentation that will help developers use the standard correctly.
The fifth most significant deficiency:
Lack of good sample files for developers to use as a guide and to test their programs with.
What kind of file format do you prefer?
There are currently two active proposals for the new GEDCOM X file format: a text-editor-friendly MIME multipart (see http://www.gedcomx.org/File-Format.html) or a binary, indexed, zip-based bundle (see http://www.gedcomx.org/File-Format—Alternate-Proposal.html). Which do you prefer and why?
- The MIME-based file (see http://www.gedcomx.org/File-Format.html)
- The zip-based file (see http://www.gedcomx.org/File-Format—Alternate-Proposal.html)
Why?
Objects should not be embedded within the raw data. Early GEDCOM versions tried this, found it was a mistake, and subsequently took it out. You have to parse through these huge objects to get to the data. You may get files terabytes in size or larger.
A separate compressed or non-compressed file containing all the objects is better. Even if the file is huge, it need not all be read; it can be kept on disk, with just the necessary parts accessed quickly when needed via the index and memory-mapping.
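As a rough sketch of the index-plus-memory-mapping idea, the snippet below stores objects back to back in one uncompressed file and keeps an index of offsets and lengths. The bundle layout, the function names `write_bundle` and `read_object`, and the sample bytes are my own assumptions for illustration; this is not the format from either GEDCOM X proposal.

```python
import mmap
import os
import tempfile


def write_bundle(path, objects):
    """Write objects back to back and return an index {name: (offset, length)}."""
    index = {}
    offset = 0
    with open(path, "wb") as f:
        for name, data in objects.items():
            f.write(data)
            index[name] = (offset, len(data))
            offset += len(data)
    return index


def read_object(path, index, name):
    """Memory-map the bundle and copy out only the bytes of one object."""
    offset, length = index[name]
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing touches only the pages that hold this object.
            return mm[offset:offset + length]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects.bin")
    index = write_bundle(path, {
        "photo1": b"fake image bytes",   # stand-ins for real media objects
        "scan2": b"census page",
    })
    result = read_object(path, index, "scan2")
```

The genealogical data itself only needs to carry the object name; the program never has to parse through the media to reach the next record.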
Do you have any other feedback?
Current GEDCOM has a lot of great ideas that currently work and work well. Try not to lose these.
Also, try to use the philosophy of “don’t change something unless it absolutely has to be”. Minimizing the logical changes between GEDCOM and GEDCOM X will better allow developers to adopt the new standard.
When choosing between a simple solution and a complex solution, choose the simplest one when it is 99% adequate. Remember, we developers have to program that stuff.
Produce a version of your XML using the GEDCOM syntax. It should be a simple mechanical translation. This will not only validate that the XML works, but will also show exactly where and how current GEDCOM differs from GEDCOM X. In addition, it will allow developers to convert into GEDCOM X in two easy steps: (1) to GEDCOM X in GEDCOM syntax (2) to GEDCOM X in XML. If this is not done for them, the developers will have to do it themselves anyway to see how they must modify their data structures or import/export routines. Each will end up doing it slightly differently and this will introduce errors in input/output of GEDCOM X between programs. So it is better if you do it.
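To show how mechanical such a translation could be, here is a small sketch that turns nested XML into GEDCOM-style "level TAG value" lines. The XML element names and the mapping rules (tags upper-cased, attributes written as child lines) are assumptions I made up for the example; they are not the actual GEDCOM X schema or an official mapping.

```python
import xml.etree.ElementTree as ET


def to_gedcom_lines(elem, level=0):
    """Translate an XML element tree into GEDCOM-style lines.

    Assumed rules: the tag becomes an upper-case GEDCOM tag, element text
    becomes the line value, attributes become child lines, and nesting
    depth becomes the level number.
    """
    text = (elem.text or "").strip()
    line = f"{level} {elem.tag.upper()}"
    if text:
        line += f" {text}"
    lines = [line]
    for key, value in elem.attrib.items():
        lines.append(f"{level + 1} {key.upper()} {value}")
    for child in elem:
        lines.extend(to_gedcom_lines(child, level + 1))
    return lines


# Hypothetical XML, invented for this example only.
xml_text = """
<person id="I1">
  <name>John Smith</name>
  <birth>
    <date>1 JAN 1900</date>
    <place>Winnipeg</place>
  </birth>
</person>
"""

lines = to_gedcom_lines(ET.fromstring(xml_text))
print("\n".join(lines))
```

If the standard's authors published one agreed mapping like this, every developer would be comparing the same GEDCOM-syntax output against current GEDCOM.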
The number one goal should be to ensure that the data will transfer properly between programs. Attempts to include all possible data will result in complexity that will prevent the goal. A proper balance must be achieved. Anything that forces developers to do something unnatural will cause them to object and look for a way around it.
Allowing custom tags always results in other programs being unable to properly interpret the meaning of the tag. Programs are starting to use them more and more to export their data that GEDCOM doesn’t handle. This must stop, and some method of being open and extensible without allowing abuse must be implemented. Maybe constructs like the TYPE tag, which allow a user-defined data item rather than user-defined entities, are the way to go.
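A small sketch of the difference: in GEDCOM 5.5, user-defined tags begin with an underscore, whereas a generic EVEN with a TYPE subordinate keeps the structure standard and puts the user-defined part in the value. The `_MILT` tag and the sample lines are hypothetical, and the simplified parser assumes lines with no cross-reference ids.

```python
def split_line(line):
    """Split a simple GEDCOM line into (level, tag, value).

    Simplifying assumption: no @XREF@ id between the level and the tag.
    """
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    tag = parts[1]
    value = parts[2] if len(parts) > 2 else ""
    return level, tag, value


def custom_tags(lines):
    """Return the user-defined tags (leading underscore) found in the lines."""
    return [tag for _, tag, _ in map(split_line, lines) if tag.startswith("_")]


# A user-defined entity: another program cannot know what _MILT means.
sample_custom = [
    "1 _MILT Served in the army",
    "2 DATE 1917",
]

# The same fact as a standard event with a user-defined TYPE value:
# a reader still knows it is an event with a date, even if the type is new.
sample_typed = [
    "1 EVEN",
    "2 TYPE Military service",
    "2 DATE 1917",
]

found_custom = custom_tags(sample_custom)
found_typed = custom_tags(sample_typed)
```

With the second form, a receiving program that has never seen "Military service" can still store and display the event correctly.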