
Louis Kessler’s Behold Blog

Double Match Triangulator 4.0 - Tue, 23 Feb 2021

Yesterday, I released a new version of Double Match Triangulator. This took longer than I hoped, but it’s finally out.

In late November, GEDmatch changed the format of their segment match file and also made a change to their one-to-one report, so DMT needed to be updated to handle those.

Also, in May 2019, MyHeritage changed their segment match files to include a unique ID for each person. DMT now uses the name plus part of that ID so that, as with GEDmatch, two people with the same name are differentiated.

But the biggest changes are under the hood. I reviewed most of DMT’s internals and geared it to give you as much of the information, and as many of the assumptions, as you can draw from your data.

DMT’s interface now looks like this:

The only change to the interface is the addition of a Male (1 X) / Female (2 X) selector. The selection is reflected in the number of X chromosomes DMT includes in the results. DMT now knows that for males, ancestral paths on the mother’s side are the only ones allowed on the X. And it understands that no ancestral path on the X can go through two F (fathers) in a row.
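The two X-chromosome rules can be sketched in a few lines. This is a minimal illustration, not DMT's actual code: the function name, the "male"/"female" values, and the convention of writing a path as a string of F and M letters read outward from the tester (so "MF" is the mother's father) are all assumptions made for the example.

```python
def is_valid_x_path(path: str, sex: str) -> bool:
    """Return True if X-chromosome DNA could reach the tester along `path`.

    `path` is read from the tester outward, one letter per generation:
    "MF" means the mother's father. `sex` is "male" (1 X) or "female" (2 X).
    Both conventions are assumptions for this sketch.
    """
    if not path or set(path) - {"F", "M"}:
        raise ValueError("path must be a non-empty string of F and M")
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")

    # Rule 1: a male gets his only X from his mother.
    if sex == "male" and path[0] == "F":
        return False

    # Rule 2: a father never passes an X to a son, so no path on the X
    # can go through two fathers in a row.
    return "FF" not in path


print(is_valid_x_path("MF", "male"))     # mother's father
print(is_valid_x_path("FM", "male"))     # father's side, not allowed for males
print(is_valid_x_path("FM", "female"))   # father's mother
print(is_valid_x_path("MFF", "female"))  # two fathers in a row
```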

I also found and fixed a bug that was causing the Map page to take 10 times longer to generate than it needed to.


DMT, Painting and Clustering

Double Match Triangulator works differently than all the other autosomal DNA tools. By comparing all the segment matches of two or more people, DMT determines every single triangulation between the people whose files you have.
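For readers new to the term, here is a rough sketch of what a triangulation is in general: three people who all match each other on a common stretch of the same chromosome. It is not DMT's algorithm, which works across whole segment match files. The `Segment` type and function names are hypothetical, and real tools also apply minimum-length thresholds that are left out here.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """A matching segment between two people (hypothetical structure)."""
    chromosome: str
    start: int  # start position, in base pairs
    end: int    # end position, in base pairs


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region shared by two segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulated_region(ab: Segment, ac: Segment, bc: Segment) -> Optional[Segment]:
    """Region where A-B, A-C and B-C all match, or None if there is none."""
    first = overlap(ab, ac)
    return overlap(first, bc) if first else None


# A matches B, A matches C, and B matches C on overlapping parts of chromosome 1.
region = triangulated_region(
    Segment("1", 10_000_000, 50_000_000),
    Segment("1", 30_000_000, 80_000_000),
    Segment("1", 20_000_000, 60_000_000),
)
print(region)
```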

The addition of user-entered Most Recent Common Ancestors (MRCAs) in DMT version 3 allowed DMT to take the next step and paint ancestral paths onto segments. This is exactly what you do manually with DNA Painter, except that with DNA Painter you are only adding single matches. DMT uses triangulations and makes use of the fact that segments that don’t triangulate are likely on the opposite parent’s side from those that do triangulate.

DMT also calculates all possible inferred matches, where Person B matches Person C but Person A does not. Basically, these refute the ancestral line towards the more distant MRCA of Persons B and C.

Put those together, and you can get most of your genome painted fairly easily. DMT will create a file for you that you can input into DNA Painter. For example, if you know 11 MRCAs and have their segment match files, this is what the results might look like when uploaded to DNA Painter:

Other autosomal analysis tools that do clustering have become available in the past few years. DMT does clustering as well. It does so by using the most common ancestral path among a person’s segments as the cluster for that person.

With those 11 MRCAs in the above example, DMT places people into the following clusters:

[Image: DMT cluster assignments for the 11-MRCA example]

In this example, almost half the people got assigned to a cluster on either the father’s (F) or mother’s (M) side. For people whose relationship to you is unknown, this will be a great clue as to which ancestral line you should look at first.
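The clustering rule described above, taking the most common ancestral path among a person's segments, might look something like this. It is a sketch under assumptions: the function name is made up, paths are plain strings such as "FM", unpainted segments are represented as None or an empty string, and "most common" is taken to mean a simple count of segments. DMT may well weigh segments differently.

```python
from collections import Counter
from typing import Iterable, Optional


def assign_cluster(segment_paths: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most common ancestral path among a person's segments.

    Segments with no painted path (None or "") are ignored. If nothing
    is painted, the person gets no cluster.
    """
    painted = [path for path in segment_paths if path]
    if not painted:
        return None
    # most_common(1) returns a one-item list of (path, count) pairs;
    # on a tie, the path seen first wins.
    return Counter(painted).most_common(1)[0][0]


print(assign_cluster(["FM", "FM", "M", None]))
print(assign_cluster([None, ""]))
```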


DMT is Available at:

The new version 4 of DMT can be found on the DMT website:  www.doublematchtriangulator.com

For those of you who have already purchased DMT ($40 USD), it is, as it always will be, a free upgrade. Simply download and install it.

For those who haven’t, please feel free to try the program. The download is fully functional but only shows you results for chromosome 1. That should be enough to give you a good feeling for what it can do.

GEDCOM 7.0 - Fri, 19 Feb 2021

**UPDATE** June 8: 
FamilySearch has released the official version of GEDCOM 7.0.
See my blog post: GEDCOM 7.0, Official.

It appears that a GEDCOM Version 7.0 Release Candidate will be announced at RootsTech Connect on February 25.

This will likely take place in the session by Gordon Clarke titled:
“GEDCOM is Alive and getting Smarter” (see Feb 24, 2021, below).

The home for GEDCOM 7.0 appears to be:  https://gedcom.io

The current Release Candidate 7.0.0-rc1 appears to be available:

As a web page:  https://gedcom.io/specifications/GEDCOM7rc.html

As a PDF:  https://gedcom.io/specifications/gedcom7-rc.pdf


   

I’m going to use this blog post to keep track of anyone writing about GEDCOM 7.0. Please let me know of any new articles you find and I’ll post them here:

Feb 19, 2021:

Feb 20, 2021:

Feb 21, 2021:

Feb 22, 2021:

Feb 23, 2021:

Feb 24, 2021:

Feb 25, 2021:

  • Gordon Clarke’s two presentations are inexplicably no longer available from RootsTech Connect.  The YouTube videos at the links above (Feb 24) are unavailable as well.
  • The gedcom.io site also became inaccessible.
  • The three blog posts on James Tanner’s Genealogy’s Star personal blog have been removed. (see above, Feb 19, 20, 23)
  • The GEDCOM Standard page at familysearch.org has removed the “What can we expect in the future?” and “What is GEDZip?” paragraphs that referenced GEDCOM 7.0 and GEDZip. (See Feb 19)

Feb 27, 2021:

  • Markus Henn tweeted:  “Answer from @RootsTechConf staff: @FamilySearch has determined to not publish information regarding #GEDCOM standards at this time. This includes some content intended for #RootsTechConnect 2021. We apologize for any inconvenience. Thank you for your patience and support.”

Feb 28, 2021:

GEDCOM Should NOT Allow Extensions - Mon, 18 Jan 2021

The GEDCOM standard for transferring genealogical data has been in use basically unchanged for over 20 years now. Just about every genealogy software program can export (some of) its family data to a GEDCOM file, and can import (some of) the family data in a GEDCOM file into its database.

The issue is the “(some of)” qualifier that I put in.

We want our programs to export all their family data so that a user can transfer that data to another program or website. For the most part, the basic name-birth-marriage-death-date-place information transfers reliably. It’s everything else (facts, events, sources, repositories and even notes) that often doesn’t make the crossing.

The blame is usually put solely on GEDCOM, accusing it of being unable to represent the data.

I disagree. I put just 10% of the blame on GEDCOM, and 90% of the blame on the programmers of genealogy software who have, for whatever reason, decided not to use some of the GEDCOM tags and constructs but rather use their own inventions instead.


Why Data Doesn’t Transfer

Several obvious reasons:

  1. The exporting program doesn’t export some of its data. You can’t import what’s not there.
  2. The exporting program sometimes exports its own custom GEDCOM tag or construct rather than use what’s in GEDCOM. An importing program can’t import what it doesn’t understand.
  3. The exporting program exports some of GEDCOM incorrectly. Hard to import anything that isn’t correctly exported.
  4. The importing program doesn’t import everything. Usually it won’t import what it doesn’t export.
  5. The importing program doesn’t recognize certain standard GEDCOM tags and constructs when it uses its own custom tags and constructs in their place in its own export. For that data, it can only re-import what it exported itself.
  6. The importing program imports some of GEDCOM incorrectly. It may lose some data as a result.
  7. GEDCOM does not have a construct for storing a certain type of data, so it can’t be transferred. Many people think this is a worse problem than it is. There’s not much family data that GEDCOM cannot transfer.
  8. GEDCOM allows developers to use their own custom tags or extensions, so the developers do use their own. Other programs will not understand anything a developer does that’s not in the standard unless they do custom programming specifically to handle that developer’s custom tags and extensions. Allowing this was a mistake.


What is the Problem?

The number one problem is that developers, for whatever reason, are not taking the time to ensure that they understand the GEDCOM standard and to export their data the way GEDCOM tells them to.

Too often, they are jumping to the conclusion that there is no way to export their data to GEDCOM, so they take what they think is the easy way out, and they invent their own tags and constructs for their data.

What’s the harm in that, they think. After all, their program will export their data, and their program will be able to import it again. Do they really care if another program can?  (They should, but I won’t get into that in this article.)


An Example

I recently had an online conversation with a very experienced genealogy software developer who was wondering how strict a genealogy program should be with respect to GEDCOM support.

He gave this example of how he wanted to export information extracted from a marriage licence and add it as part of the MARL (marriage license) tag in GEDCOM.  

[Image: the developer’s GEDCOM example, a MARL event using the custom GROO, BRID and RECR tags]

The MARL tag is valid. GROO, BRID and RECR are not. Source information is being included in an MARL fact under the GROO and BRID tags, when it should be in GEDCOM’s SOURCE_CITATION structure instead.

Other than the program creating this, no genealogy program will be able to read and load this data as intended into its database.
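To show why an importing program stumbles here, this is a small sketch of a reader that only knows standard tags. The sample lines are invented for illustration and are not the developer's actual file. `KNOWN_TAGS` is a deliberately small subset of the standard's tags, and the function name is hypothetical.

```python
# A small subset of standard GEDCOM tags, enough for this illustration.
KNOWN_TAGS = {
    "HEAD", "TRLR", "INDI", "FAM", "NAME", "SEX", "BIRT", "DEAT",
    "MARR", "MARL", "DATE", "PLAC", "AGE", "HUSB", "WIFE", "CHIL",
    "SOUR", "PAGE", "DATA", "TEXT", "NOTE", "CONT", "CONC",
}

# Invented sample lines: a MARL event carrying non-standard subordinate tags.
SAMPLE = [
    "0 @F1@ FAM",
    "1 MARL",
    "2 DATE 12 JUN 1901",
    "2 GROO (groom details)",
    "2 BRID (bride details)",
    "2 RECR (other details)",
]


def nonstandard_tags(lines):
    """Return tags not in KNOWN_TAGS, in the order first seen."""
    found = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # not a usable GEDCOM line
        # A GEDCOM line is: level, optional @XREF@ identifier, tag, optional value.
        has_xref = parts[1].startswith("@") and len(parts) > 2
        tag = parts[2] if has_xref else parts[1]
        if tag not in KNOWN_TAGS and tag not in found:
            found.append(tag)
    return found


print(nonstandard_tags(SAMPLE))
```

A program built around the standard tags has nowhere to put whatever falls into that list, so it either drops the data or dumps it somewhere generic.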

So how should this case be handled?  This was my answer:

Converting your MARL event to valid GEDCOM (adding illegal indentation for clarity) would give this:

[Image: the MARL event rewritten as valid GEDCOM]

The birth places and ages could also be documented, but they shouldn’t be done under the marriage license event. They should be under the individual’s birth event:

[Image: the birth places and ages recorded under the individuals’ birth events]

What GEDCOM is saying regarding Evidence and Conclusions is this: Evidence should be in the DATA portion of the SOURCE_CITATION. Conclusions are the Events/Facts that you enter.

The TEXT information can be included as it is in the document and needn’t be pigeonholed into real or imaginary tags like GROO or RECR.
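As a rough illustration of the evidence-versus-conclusion split, the sketch below writes a MARL event whose evidence sits as free text under DATA and TEXT inside a source citation, following the SOURCE_CITATION layout in GEDCOM 5.5.1. The function name, the date, the names and the source ID are all invented for the example.

```python
def marl_with_citation(date, source_id, text_lines):
    """Build GEDCOM lines for a MARL event with evidence in a source citation.

    The event and its date are the conclusion. The transcribed document
    text is the evidence, stored under SOUR / DATA / TEXT, with CONT
    carrying each additional line of the transcription.
    """
    if not text_lines:
        raise ValueError("at least one line of source text is required")
    first, *rest = text_lines
    lines = [
        "1 MARL",
        f"2 DATE {date}",
        f"2 SOUR @{source_id}@",
        "3 DATA",
        f"4 TEXT {first}",
    ]
    lines.extend(f"5 CONT {line}" for line in rest)
    return lines


# Hypothetical usage with invented people and dates.
for line in marl_with_citation(
    "12 JUN 1901",
    "S1",
    ["Groom: John Smith, age 25", "Bride: Mary Jones, age 22"],
):
    print(line)
```

Nothing in the transcription needs a tag of its own, so any program that reads source citations can carry it along.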


Conclusion

As I see it, two very bad things happen when developers do not follow GEDCOM as intended:

1. They will export GEDCOM that other programs will not understand.

2. They will not bother to implement some GEDCOM constructs that they are not using, so their program will not be able to import and properly interpret those valid GEDCOM constructs from other programs.

People think GEDCOM is the main reason why data doesn’t completely transfer between programs. False. It is the inconsistent implementation of GEDCOM for both import and export that is the primary cause of data loss.

Future enhancements to GEDCOM should require that only GEDCOM tags and constructs be used. No developer tags or constructs should be allowed.

Requiring compliance with no exceptions is the only hope we will ever have for all our genealogy data to one day be able to transfer correctly from program to program.


Further Reading

From 2015: Complete Genealogy Data Transfer
From 2015: Is GEDCOM Good For Sources?
From 2013: Nine Necessities in a GEDCOM Replacement