Login to participate
  
Register   Lost ID/password?

Louis Kessler’s Behold Blog

Using DMT, Part 1: My 23andMe Data - Thu, 17 Oct 2019

I am going to show you how I am using Double Match Triangulator, and some of the information it provides, at least for me.

My own DNA match data is difficult to analyze. I come from a very endogamous population on all my sides that originates in Romania and Ukraine. The endogamy gives me many more matches than most people, but because my origins are from an area that have scarce records prior to 1850, I can only trace my tree about 5 generations. Therefore the vast majority of my matches are with people I may never be able to figure out the connection to. But there should be some that I can, and that is my goal, to find how I’m related to any of my 5th cousins and closer who DNA tested.

I’ve tested at Family Tree DNA, 23andMe, Ancestry, MyHeritage, and I’ve uploaded my DNA to GEDmatch. I have also tested my uncle (father’s brother) at Family Tree DNA and uploaded his DNA to MyHeritage and GEDmatch.

Ancestry does not provide segment match data, so the only way to compare segments with an Ancestry tester is if they’ve uploaded to GEDmatch.



Where to Start

With DMT, the best place to start is with the company where you have the most DNA relatives whose relationship you know. I know relationships with:

  • 14 people at Ancestry, but they don’t give you segment match data
  • 9 people at 23andMe.
  • 3 people at GEDmatch.
  • 2 people at MyHeritage.
  • 2 people at Family Tree DNA

So I’ll start first with the 9 people at 23andMe. The somewhat odd thing about those 9 people are that they are all related on my father’s side, meaning I won’t be able to do much on my mother’s side.

We’ll see what this provides.



Getting Segment Match Data from 23andMe

At 23andMe, you can only download the segment match data of people you administer. That means you have to ask your matches if they would download and send you their match data if you want to use it.

But there is an alternative. If you subscribe to DNAGedcom, you can get the segment match files of any of the people you match to. I have a section of the DMT help file that describes how to get 23andMe match data.

I used DNAGedcom to download my own segment matches, as well as the segment matches of the 9 relatives I know relationships with at 23andMe, plus 7 other people I’m interested in that I don’t know how I’m related to.



Entering my Known Relatives’ MRCAs in my People File.

First I load my own 23andMe match file as Person A in DMT. Then I run that file alone to create my People file. DMT tells me I have 8067 single segment matches. DMT excludes those that are less than 7 cM and produces a People file for me with the 892 people who I share 4244 segments of at least 7 cM. It is sorted by total matching cM, highest to lowest so that I can see my closest relatives first.

I go down the list and find the 9 relatives I know and enter our Most Recent Common Ancestors (MRCAs). Here’s the first few people in my list whose names I altered to keep them private:

image

Naa is the daughter of my first cousin, i.e. my 1C1R. She is my closest match at 23andMe and we share 541 cM on 22 segments that are 7 cM or greater. Our MRCA from my point of view is my Father’s paRents, so I enter FR as our MRCA.

Daaaaa Paaaaa is my father’s first cousin sharing 261 cM. So he is also my 1C1R but since he is my Father’s Father’s paRent’s daughter’s son, he gets an MRCA of FFR.

Similarly, Baaaa Raaaaaa and Raaa Raaaaaa are brothers who are both 2nd cousins sharing 153 and 143 cM with me. Their MRCA is FFR

The other two people I marked FFR are my 2C1R sharing 152 cM and 94 cM.

I also have 3 people related on my father’s mother’s side. The two FMFR’s shown above are 3rd cousins on my father’s mother’s father’s side sharing 90 cM and 84 cM. Also on line 96 (not shown above) is a 3rd cousin once removed sharing 58 cM who I’ve given an MRCA of FMFMR.



Single Matching

I save the People file with the 9 MRCAs entered, and I’ll run that file alone again. This time, it uses the MRCA’s and paints any matching segments at least 15 cM to the MRCA ancestral path.

This is exactly what you do when you use DNA Painter. You are painting the segment matches whose ancestral paths you know onto their places on the chromosome.

The reason why single segment matches must be 15 cM to paint is because shorter single segment matches might be a random match by chance. That can be true even if it is a close relative you are painting. It’s better to be safe than sorry and paint just the segments you are fairly certain are valid. Beware of small segments.

DMT tells me it is able to paint 15.3% of my paternal segments to at least the grandparent level using the 9 people. My closest match, my first cousin once removed doesn’t help directly, because some of her segment matches may be on my father’s father’s side, and some may be on my father’s mother’s side, and we can’t tell which on their own. But her matching segments may overlap with one of the other relative’s matches, and hers can then be used to extend that match for that grandparent. DMT does this work for you.

With this data, DMT cannot paint any of my maternal segments. I’d need to know MRCAs of some of my maternal relatives at the 2nd cousin level or further to make maternal painting possible.

DMT produces a _dnapainter file that I can upload to DNA Painter. This is what it looks like in DNA Painter showing the 15.3% painted on my paternal chromosomes:

image

My father’s father’s side (in blue), could only be painted to the FF level because I don’t have any MRCAs beyond FFR.

My father’s mother’s side (in green), was paintable to the FMFM level because I had a 3C3R with an MRCA of FMFMR.  But there’s less painted on the FM side than the FR side because my FM relatives share less DNA with me.



Double Matching and Triangulating

Now let’s use the power of DMT to compare my matches to the matches of my 9 known relatives and the 7 unknown relatives, combine the results, and produce triangulation groups and finally produce some input for DNA Painter.

The great thing about triangulating is that by ensuring 3 segments all match each other (Person A with Person C, Person B with Person C, and Person A with Person B), it considerably reduces the likelihood of a by chance match, maybe down to segment matches as small as 7 cM, which is the default value for the smallest segment DMT will include.

This allows DNA to paint 46.1% of the paternal DNA, about 3 times what was possible with just single matching. In DNA Painter, this looks like:

image



Clusters

DMT also clusters the people I match to according to the ancestral paths that the majority of their segments were assigned to.

DMT clustered the 892 people that I match to as follows:

image

There were 83 U people who DMT could not figure out if they were on the F or M side. There were 75 F people who DMT could not figure out if there were on the FF or FM side. And there were 19 X people who didn’t double match and were only a small (under 15 cM) match in my segment match file, so these were excluded from the analysis.

Overall 63% of the people I match to were clustered to my father’s side, 24% to my mothers side. Here’s how my closest matches (see first diagram above) look after they were clustered:

image

So it looks like quite a few of my closest matches whose MRCA I don’t know might be on my FF side. That tells me where I should start my search for them.

There are also 4 matches that might be on my M mother’s side that I can try to identify.

All but two of my top matches have at least one segment that triangulates with me and at least one of the 16 people whose match files I ran against. All the details about every match and triangulation are included in the map files that DMT produces, so there’s plenty of information I can look through if I ever get the time.

Next post:  GEDmatch.

DMT 3.1 Released - Wed, 16 Oct 2019

I’m working on a series of articles to show how I am using the new 3.0 version of Double Match Triangulator to analyze my own segment match data.

As much as I’d like you to believe that I’ve developed DMT for the good of genetic genealogists everywhere, I humbly admit that I actually developed it so that I could analyze my own DNA to help me figure out how some of my DNA matches might be related.

Of course, as I started working on my articles, first looking at my 23andMe matches, I found some problems in my new 3.0.1 version, and a few places I could make enhancements.

If you downloaded version 3.0 that was released on Oct 1, or 3.0.1 on Oct 3, please upgrade to 3.1 whenever you can.

image

If you are a member of the Genetic Genealogy Tips and Tricks group on Facebook, or the DNA Painter User Group on Facebook, my free trial key I gave there is still valid and will let you run the full version of DMT until the end of October. Just look for my post on Oct 2 on either group for the key.

Some of the enhancements in Version 3.1 include:

  • The grandparent extension algorithm:  I found a few extra extensions where they should not be. So I completely changed the algorithm to one that was clearer and easier for me to verify that it is working properly. The final results are similar to the old algorithm, but they’re significant enough to make a noticeable effect on the grandparent assignments.
  • Small improvements were made to the determination of triangulation boundaries.
  • Internally, I increased the minimum overlap DMT uses from 1.0 Mbp to 1.5 Mbp. This was to prevent some incorrect overlaps between two segments when there was a bit of random matching at the overlapping ends of the segment.

And there were a few bug fixes:

  • If you run 32-bit Windows, then the DMT installer installs the 32-bit version of DMT rather than the 64-bit version. The 32-bit version had a major bug that crashes it when writing the People file. Nobody complained to me about this, so I guess most of you out there are running 64-bit Windows. Maybe one day in the not-too-distant future, I’ll only need to distribute just the 64-bit version of DMT.
  • In the Combine All Files run, a few matches were not being assigned an AC Consensus when they should have been. Also a few assignments of AC No Parents was made when there was a parent.
  • 23andMe FIA match files downloaded using DNAGedcom were not being input correctly.

Now back to see what DMT can do for me.

The GEDCOM 5.5.5 Initiative and Making It Work - Sun, 6 Oct 2019

It’s been 35 years since GEDCOM 1.0 was released to the genealogical software development community in 1984.

It’s been 20 years since GEDCOM 5.5.1, the last official specification by FamilySearch was released on October 2, 1999.

Five days ago, October 2, 2019, the gedcom.org website was renewed containing the newly-released GEDCOM 5.5.5 specifications.

GEDCOM was originally an acronym for GEnealogical Data COMmunication. It has been the standard that genealogical software has been using for the past 35 years. It specifies how genealogical data should be exported to a text file so the data can be preserved in a vendor-neutral form (separate from their proprietary databases) in a format that other programs will be able to import.

For 15 years, between 1984 and 1999, the GEDCOM standard was developed and made available by the Church of Jesus Christ of the Latter Day Saints (LDS). They had a team in place that had discussions with many genealogical software vendors. They prepared the standard by taking all the ideas and figuring out how computer programs of the day could and should transfer data between programs.

They were very successful. There have been hundreds of genealogy programs developed since then, and just about every single one of them adopted the standard and can import GEDCOM and export GEDCOM. I don’t know of too many standards that gained nearly 100% acceptance. To me, that is a very successful adoption of a standard in any field.

Why was GEDCOM successful by the LDS? My take on that is because of the way they operated.

1. They had a team in place to develop the standard.
2. They sent drafts to developers and solicited feedback.
3. The team evaluated all the feedback and suggested possible implementations, and evaluated conflicting ideas,
4. And most importantly: one person acted as editor and made the decisions. I believe that may have been Bill Harten, who has been called the “Father of GEDCOM”.  


What Happened in 1999?

In 1999, the LDS decided it no longer was going to continue the development of GEDCOM and it disbanded the team. The last GEDCOM version the LDS issued was 5.5.1, which was still labeled beta, but was definitely the de facto standard because the LDS itself used it and not the 5.5 version, in their very own  Personal Ancestral File (PAF) software.

The standard was very good, but not perfect. Each software developer used GEDCOM basics for important items like names, linkages between children and families, families and parents, birth, marriage, death dates and places.

GEDCOM had more than that. Way more. It had lots of different events and facts. It had links to multimedia. It had sources, repositories and source references. Technically it had everything you needed to properly source and reference your genealogical material. The funny thing was that it was ahead of its time.

Genealogical programs back then and genealogists in particular did not have the understanding of the need to document our sources. We were all name collectors back then. C’mon admit it. We were. We only learned years later our folly and the importance of sources.

So in the next 10 years after 1999, software developers starting adding source documentation to their programs. And in doing so, most of them ignored or only loosely used the sourcing standards that GEDCOM included. They did so because no one else was exporting sources with GEDCOM. What happened is that each of them developed their own unique sourcing structure in their programs and often didn’t export that to GEDCOM, or if they did, they only used some of GEDCOM and developed their own custom tags for the rest, which none of the other programs would be able to understand.

This continued to other program features as well. For example, adding witness information or making a place a top level record was not something the last version of GEDCOM had, so developers, and even groups of developers came out with extensions to GEDCOM (such as Gedcom 5.5EL).

The result:  Source data and a lot of other data did not transfer between programs, even though almost all of them were exporting to and importing from what should have been the very same GEDCOM.


Two Attempted Fixes

About 2010, a BetterGEDCOM grassroots initiative was started by Pat Richley-Erickson (aka Dear Myrtle) and a number of others. For 10 years, too much data, and especially source data, was not transferring between programs.

At the time I wrote in a blog post about BetterGEDCOM:

“The discussion is overwhelming. To be honest, I don’t see how it is going to come together. There are a lot of very smart people there and several expert programmers and genealogists who seem to be having a great time just enjoying the act of discussing and dissecting all the parts. … I really hope they come back to earth and accomplish something. I’ve suggested that they concentrate on developing a formal document.”

There was a very large amount of excellent discussion about where GEDCOM should go and what was wrong with it and what should be fixed. But nobody could agree on anything.

The reason in my opinion why this didn’t work out:  Because the discussion took over, similar to a Facebook group where everyone had their own opinion and no one would compromise.

The one thing missing was an editor. Someone who would take all the various ideas and make the decision as to the way to go.

After a few years, the BetterGEDCOM group realized it wasn’t getting anywhere. Their solution was to create a formal organization with by-laws and a Board of Directors to spearhead the new initiative. It was called FHISO and they created the website at fhiso.org. They obtained the support of many software vendors and genealogical organizations.

On March 22, 2013, FHISO initiated their standards development process with an Open Call for Papers. Scores of papers were submitted, two by myself.

Then discussion started and continued and continued and continued. There was little agreement on anything. They had excellent technical people involved, but the nature of the discussion was often too technical for even me to understand and got involved in externalities far beyond what GEDCOM needed and spent months on items that were more academic in nature than practical.

FHISO had hoped to come out with an Extended Legacy Format (ELF) which would update the 5.5.1 standard. They were working on a Serialization Format (which GEDCOM already effectively had) and a Data Model. It’s the data model that is what is wanted and is most important. But to date, after over 6 years, all they have is a document defining Date, Age and Time Microformats.

Horrors, that document is 52 pages long and is a magnificent piece of work, if you wanted to submit a PhD thesis. Extrapolating that out to the rest of GEDCOM, we’d likely be looking at a 20 year development time for a 50,000 page document which no programmer would be able use as a standard.


We Need Something Practical – GEDCOM 5.5.5

Genealogy technology expert Tamura Jones on his Modern Software Experience website has for years been an in-depth critical reviewer of genealogical software and the use of GEDCOM. I have greatly benefited from several of his reviews of my Behold software which sparked me to make improvements.

I had the pleasure of meeting Tamura when I was invited to speak at the Gaenovium one day technology conference in his home town of Leiden Netherlands in 2014. I then gave the talk “Reading wrong GEDCOM right”. Over the years Tamura and I have had many great discussions about GEDCOM, not always agreeing on everything.

In May of last year (2018), on his own initiative, Tamura created an annotated version of the GEDCOM 5.5.1 specification that was very needed. It arose from his many articles about GEDCOM issues, with solutions and best practices. Prior to release, he had a draft of the document reviewed by myself and six other technical reviewers, and incorporated all of our ideas that made sense with respect to the document. Many of the comments included thoughts about turning the annotated edition into a new version of GEDCOM.

So I was surprised and more than pleased when a few months ago, Tamura emailed me and wanted reviewers for what would be a release of GEDCOM 5.5.5.

image

The goal was to fix what was not right in 5.5.1, remove no longer needed or used constructs, ensure there was just one way to do anything, fix examples and clear up everything that was unclear. This version 5.5.5 was not to have anything new. Only obsolete, deprecated, duplicate, unnecessary and failed stuff was to be taken out.

The 5.5.5 result was published on October 2, exactly 20 years after 5.5.1 was released. You can find GEDCOM 5.5.5 on the gedcom.org site.

For a description of what was done, see the Press Release.


Why GEDCOM 5.5.5 Works and What’s Coming

GEDCOM 5.5.5 is based on GEDCOM 5.5.1. It included the thoughts and ideas of a number of genealogy software developers and experts, just like the original GEDCOM did. It is practical as it is a repair of what has aged in 5.5.1.

The reason why this document has come out was because it was done the way the LDS originally did it. A draft was sent to experts. Each sent back comments, opinions, criticisms and suggestions back to the editor, who then compared the ideas and decided what would work and what wouldn’t.

I know I spent dozens of hours reviewing the several drafts that were sent to me. I sent back what was surely a couple hundred comments on every piece of the document. I didn’t agree with the document on all items. In the end, some of my arguments were valid and the item was changed. But some were not and I appreciate those decisions, because other reviewers may feel different than me and some final decision must be made. I’m glad Tamura is the one willing to make the decision as I very much respect his judgement. I also greatly thank him for the thousands of hours I know he must have put into this.

What made this happen?

1. There was a standard that had already been annotated to start with.
2. Drafts were sent to developers and feedback was received.
3. Tamura reviewed all the feedback and considered all possible implementations, and evaluated conflicting ideas,
4. And most importantly: Tamura acted as editor and made the decisions. He is not selling software himself, so is unbiased in that respect.

I will support and promote this initiative. I will be adding GEDCOM 5.5.5 support into future versions of Behold. I encourage other genealogical software developers to do so as well.

This is only a first step. There are enhancements to GEDCOM that many vendors want. These ideas need to be compared to each other, and decisions need to be made as to how they would best be implemented in a GEDCOM 5.7 or 6.0.  A similar structure with one respected editor vetting the ideas is likely the best way to make this happen. And as long as the vendors are willing to offer their ideas and compromise on a solution, Tamura will be willing to act as editor to resolve the disputes. I would expect this would give us the best chance of ending up with a new GEDCOM standard that is clear, easy for genealogy software programmers to implement, enable all our genealogical information to transfer between programs, and be something that genealogy software developers would agree to use.