
Louis Kessler’s Behold Blog

EAST Part 2 - Double Match Triangulation - Tue, 14 Jun 2016

In Part 1, I gave you a flavour of the mass-triangulation that I am doing, which I called EAST: Extreme Autosomal Segment Triangulation. Triangulation is a technique to determine what parts of your DNA come from what ancestors. That will help you determine how your matches are related to you.

What is new about this EAST technique is that it uses segment matches of two people (your own and a relative) rather than just one (your own).

 

Single Match Triangulation (SMT)

To make this clear, let me first describe the standard way people currently triangulate with FamilyTreeDNA data. I’m going with the same example I used in Part 1, using my uncle Harry as person A, and my 3rd cousin Joel as person B. They are 2nd cousins once removed, and their most recent common ancestors are my great-great-grandparents Hirsch Focsaner and his wife Dwora.

To triangulate, I first need the segments that match between Harry and Joel. I go to Harry’s FamilyTreeDNA account, select Chromosome Browser, and pick Joel to match to. It gives this diagram:

Harry and Joel match on the orange segments.

Now we need to find a relative of both Harry and Joel to be the third person of the triangulation. We go back to the FamilyTreeDNA matches page and next to Joel’s name, we click on the 4th cute little symbol below his name to “Run Common Matches”.

image

That brings up a second menu and we select “In Common With”. Then, if you are lucky like me, you’ll be presented with 232 pages of matches containing the 2,318 people who match to both Harry and Joel.

Now write down the names of the top 4 and go back to the Chromosome browser and add them along with Joel. Now you’ll see:

image

These are still Harry’s chromosomes. Joel’s matches with Harry are shown in orange. We want a third person who matches with Harry and Joel. These four people only have one instance, in chromosome 1, where one of the others matches one of Joel’s segments. You can see it in chromosome 1 as the green line that is under the orange line. Joel’s match with Harry (the orange line) is 10.64 cM. The green match is with someone named Daniel and it is 18.76 cM. So we have a triangulation. Harry matches Joel where Joel matches Daniel and Daniel matches Harry.

The chromosome match setting was for a minimum of 5+ cM. You could go down to 1+ cM and you’ll find a lot more matches. But there’s a problem with this. Because of the way DNA analysis companies determine matches (that half-identical thing), there is a very good chance that small matches are not Identical by Descent, and you don’t want that. You need them to reflect a true relationship.

So you’ll have to stick to those 5+ cM matches to be safe.

But in the above we did find that one triangulation we can use. That third person has a segment in common with Harry and Joel. This indicates that the third person has a common ancestor with Harry and Joel. It could be Hirsch and Dwora or it could be an ancestor of Hirsch and Dwora.
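The overlap check behind that one triangulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the start and end positions are invented (the post only gives the cM sizes), and the function name `overlap` is hypothetical.

```python
def overlap(seg_a, seg_b):
    """Return the (start, end) region shared by two segments, or None.

    Each segment is a (chromosome, start, end) tuple on Harry's chromosomes.
    """
    chrom_a, start_a, end_a = seg_a
    chrom_b, start_b, end_b = seg_b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (start, end) if start < end else None


# Invented positions, for illustration only.
harry_joel = (1, 170_000_000, 183_000_000)    # the orange line
harry_daniel = (1, 165_000_000, 190_000_000)  # the green line

shared = overlap(harry_joel, harry_daniel)
print(shared)
```

A non-empty result only says that Joel’s and Daniel’s segments overlap on Harry’s chromosome. The third side of the triangle, Joel matching Daniel, comes from the “In Common With” list.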

So now I invite you to continue to do this for the other 2,314 common matches of Harry and Joel. You’ll tire quickly!

Doing this allows you to create Triangulation Groups, building them up person by person. Triangulation Groups put likely-related people together. The analysis of triangulation groups is complicated and has been written up elsewhere. Jim Bartlett describes it very well on his wonderful Segmentology blog, but I’m not going to get into it, because this only uses segment matches of a single person. I’m going to be doing it differently using the segment matches of two people.

 

Double Match Triangulation (DMT)

Just to let you know, the terms Single Match and Double Match Triangulation (SMT and DMT) as well as EAST (Extreme Autosomal Segment Triangulation) are my own. I invented them so that I can talk about them. As far as I can tell, no one else has extended regular triangulation this way. The closest thing I’ve seen so far is Roberta Estes’ article Just One Cousin, which used chromosome matches between three people. But I want to go from three to extreme. So let’s get into it.

The reason why SMT is referred to as “Single” match is that only the segment matches of one person are used. Only Harry’s matches in the example above are used. Although we found the people who matched to Joel, we did not use Joel’s segment matches.

To do the Double Match Triangulation, I emailed Joel and he sent me his match list. Please see Part 1 where I describe what this file is and how to get it. I merge my uncle’s chromosome match list with Joel’s match list and I put it into Excel and add some fancy coloured mapping of the chromosomes.

Doing this for the same chromosome 1 region used in the above SMT example gives the following (which is the same picture I showed in Part 1):

image

The line in yellow is the chromosome 1 match of Harry with Joel. The green area with X’s on the yellow line is their match segment. Remember that second picture of FamilyTreeDNA’s chromosome browser from above? Look again at Chromosome 1:

image

The short orange line is my line in yellow. The longer green line is the line exactly 6 lines below my line in yellow, belonging to Daniel. The part of that line shown in green with X’s is Daniel’s match with both Harry and Joel. The two parts on either end shown in red with a’s are Daniel’s match with Harry (where Joel doesn’t match). On other segments you can see lines in blue with b’s. Those are places where the third party matches Joel but not Harry.

What’s great about this Extreme triangulation technique is that:

  1. It picks out everybody who has matching segments to you AND to a selected second person. That gives all three connections needed of the triangulation triangle for everyone in a block with one of those yellow lines. This really increases the odds of the three of you being Identical By Descent (IBD). Jim Bartlett says he’s fairly confident that triangulation works down to 5 cM. Jim also says “shared segments below 5 cM are uncharted territory for triangulation.” And he was talking about Single Match Triangulation. New research about Double Match Triangulation by Michael Maglio indicates that a false positive is statistically improbable, indicating the match is IBD (or maybe IBP – identical by population, which is still IBD, but too many generations back to be of much use). So Double Match Triangulation can be used even for small segments.
  2. You get to see, not just all the third party segments matching to you, but also the third party matching to your second person that don’t match to you. This is additional information you don’t get from normal SMT triangulation that I’ll soon show is very useful.
  3. You only have 1/16 of your great-great-grandfather’s segments. But your 3rd cousin has another 1/16. With DMT, you’ve doubled the segments you can match with.
  4. I suspect all three connections may not be necessary. You and your cousin will only match on 1/16 of each other’s segments. So if you find what looks like a big Triangulation Block of known cousins, and you match to them, and your cousin matches to them, that may be good enough. I’ll have to test this, and if it works, it will make this technique another order of magnitude more powerful in classifying your matches.
  5. Huge time savings for analysis. One EAST is a Triangulation with every single one of your matches at once. And that’s just using one selected known relative as the second person. You can use others as well. You don’t even have to use known relatives. EAST should show you if the second person is significant within your matches.
  6. Lots more that I haven’t even worked out yet.

What we haven’t done yet is to use the EAST data to analyze and classify the segments of your matched people, to put them into Triangulation Groups and identify common ancestors and where everyone fits in. That will be in the next post of this series.

 

… One last thing:

Triple and Multiple Match Triangulation (TMT and MMT)

I want to define these now, because I see it is possible. Get the segment matches of 3 or more relatives and put them all in the same file together. Process them the same way as described in Part 1.

I don’t know if, this early in the study of what EAST can do, getting into this complication is worthwhile. It will be visually hard to interpret because instead of having 3 colours (green where both match, red where only A matches, blue where only B matches), with triple match you’ll need 7 colours and quadruple match would need 15 colours.

It might be better to do a DMT three times (each of the three in a TMT paired three times) as each DMT would be easier to interpret than the one TMT.
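The colour counts follow from a simple rule: a third party’s position can match any non-empty combination of the n reference people, which gives 2^n - 1 possibilities. A minimal Python sketch of that arithmetic, and of the pairings you would get by splitting a TMT into separate DMTs (the function names and the labels A, B, C are hypothetical):

```python
from itertools import combinations


def colours_needed(n_people):
    """Count the non-empty subsets of n reference people: 2**n - 1."""
    return 2 ** n_people - 1


def dmt_pairs(people):
    """List every pair of reference people, one DMT per pair."""
    return list(combinations(people, 2))


for n in (2, 3, 4):
    print(n, "people:", colours_needed(n), "colours")

print(dmt_pairs(["A", "B", "C"]))
```

For three people this gives three pairings (A with B, A with C, B with C), each needing only the 3 colours of an ordinary DMT.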

But I’m getting way ahead of myself. Classifying segments will be next.

Follow-up June 20:  Yesterday, A Triangulation Intervention was posted by Blaine Bettinger on his blog, explaining what is correct triangulation for autosomal analysis. He says:

The only way to perform true triangulation is to have segment data and a way to confirm that an overlapping segment is actually shared by two or more genetic matches.

He says the only true triangulation tool available is the Tier 1 Triangulation tool at GEDmatch. And he says:

It is very important to note that tools like KWorks, JWorks, and ADSA at DNAGedcom, and Matching Segment Search at GEDmatch, while incredibly powerful and valuable tools, do NOT perform triangulation.

I wanted to mention this, because it’s important to understand that the tools and techniques I am developing here with EAST and DMT are all true triangulation techniques. They work with the matching segments of two people and triangulate them, not just with one or two “third people”, but with all the third people at once.

p.s. I’m building a utility program to do this EAST with DMT automatically. I expect I’ll be able to get it to classify your matches for you into true triangulation groups. It will also create comma delimited files you can import into a spreadsheet to visualize your three-way matches like I do in my Excel examples above. When the program is ready, I plan to make it available as freeware.


Extreme Autosomal Segment Triangulation (EAST) - Part 1 - Sun, 12 Jun 2016

You’ve heard of extreme sports. Well this, I think, is the genetic genealogist’s equivalent.

In DNA or Bust, I told about getting my 93-year-old uncle’s DNA analyzed. Our endogamous Ashkenazi heritage laid 7,017 matches upon us. That was one month ago.

Since then, FamilyTreeDNA have updated their Family Finder algorithm. It was a big thing. People’s matches changed. With that change, my uncle lost 91 matches (which I had downloaded so I still have), and gained 565 matches. He still is getting a few new matches a day and is currently up to 7,611 matches. That’s quite a few.

And of those 7,611 matches, there is exactly and only 1 person in that list who I knew beforehand was a cousin. Joel is my 3rd cousin and is a second cousin once removed to my uncle, who I’ll refer to as Harry.

My common ancestor with Joel is Hirsch Focsaner and his wife Dwora Naftulovitz. They are our great-great-grandparents. Joel is listed on Harry’s Family Tree DNA list as a 2nd to 3rd cousin (so they got that right) with 134.8 cM shared and a 27.8 cM largest block.

What I want to do is: (1) Identify all the autosomal DNA segments that came from either Hirsch or Dwora, (2) Identify any people among my uncle’s 7,611 that are related to Hirsch or Dwora or to Hirsch or Dwora’s ancestors, and (3) find closer cousins in that list of matches that are descendants of Hirsch and Dwora that are related to me but not my cousin Joel, and those that are related to Joel but not to me.

And, I want to do this with all my matches together. Think it possible? Read on.

This post will talk about what I did for (1), to identify all the autosomal DNA segments that came from either Hirsch or Dwora.

Triangulation for autosomal DNA (i.e. for Family Finder tests) is a technique that uses segment information of related people to determine what parts of their DNA come from their common ancestor.

You might think you can just go to your Chromosome Browser at Family Tree DNA, and download your matching segments with your cousin, and those would be your common ancestor’s regions on your DNA.

Below is my uncle Harry’s match with Joel as shown in the FamilyTreeDNA Chromosome Browser, down to the minimum 1 cM threshold. As I said above, the matches total 134.8 cM, and you can see on chromosome 10 the largest block of 27.8 cM.

image_thumb2 

That would be fine and good if it were that simple. But the way companies test DNA, they use what are called Half-Identical Regions (HIR), which don’t guarantee Identical By Descent (IBD) matches when the matching segment size is small (e.g. under 7 cM). What is needed is to use 3 related people (a 3-way match) where person A (my uncle Harry) matches person B (my cousin Joel), and person B (Joel) matches person C (someone else who is also a descendant of our common ancestor) and person C matches person A.

You can download your chromosome segment data into a spreadsheet from the Chromosome Browser at FamilyTreeDNA by pressing the “Download All Matches to Excel (CSV Format)” button at the top right of the page (see the arrow below):

image_thumb6

If you have a lot of matches like my uncle, it will be a big file and could take a minute before it responds, so be patient. My uncle’s file has 172,299 lines in it and is 11,820 KB (i.e. 11 MB) in size. There are on average 22.6 matches per person and the average segment match size was 3.6 cM. 3.6 cM is usually considered too small to work with because there is a good chance that those segments are not IBD. But there’s magic that happens when you triangulate with a third person, and even the small blocks become very useful.
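Those averages are easy to reproduce once the file is loaded. The sketch below is a minimal Python illustration, not the spreadsheet work described in this post. The rows are invented sample data, and the tuple layout (match name, chromosome, start, end, cM) is an assumption standing in for the real CSV columns.

```python
from collections import Counter

# Invented sample rows: (match name, chromosome, start, end, cM).
# The real download has one such row per matching segment.
rows = [
    ("Person One", 1, 1_000_000, 3_000_000, 2.5),
    ("Person One", 4, 10_000_000, 16_000_000, 6.0),
    ("Person Two", 2, 5_000_000, 7_000_000, 2.0),
]


def summarize(rows):
    """Return (average segments per person, average segment size in cM)."""
    segments_per_person = Counter(name for name, *_ in rows)
    total_cm = sum(cm for *_, cm in rows)
    return len(rows) / len(segments_per_person), total_cm / len(rows)


avg_segments, avg_cm = summarize(rows)
print(f"{avg_segments:.1f} segments per person, {avg_cm:.1f} cM average")

# The same division using the figures quoted in this post:
# 172,299 lines spread over 7,611 matches.
print(round(172_299 / 7_611, 1))
```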

When loaded in Excel, the downloaded chromosome match file looks like this:

image_thumb9

My uncle’s name is in the first column, followed by the person he matches to and each of the matches. The matching people are listed in alphabetical order by their full name, so all the first names are together which is somewhat awkward, but still makes it easy to find who you want.

So the above shows that my uncle matches to A BO… at 9 locations totalling 27.89 cM, and he matches A Fre… at 31 locations (only 26 shown) totalling 102.42 cM.

I can scroll down in this file and find my cousin Joel, so my uncle’s matches with him are in the file as well. This file therefore contains the A to B (Harry to Joel) matches and the A to C (Harry to anyone else) matches. But that is not enough. I need the B to C (Joel to other people) matches to complete the triangulation.

So what I did was email my cousin and ask him if he would download all his chromosome matches and send them to me. He, of course, is as interested as I am in getting something out of his DNA test, so he gladly did so.

My cousin had 73,644 lines in his file, which is less than half of what was in my uncle’s file. That might be because my uncle is one generational level higher on our tree than Joel and I are.

So I located my uncle’s chromosome matches to Joel in my uncle’s file:

image_thumb13

Similarly, I found Joel’s matches to my uncle in his file. It was good news. They were identical.
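That comparison can also be done mechanically. A minimal Python sketch, assuming each row is a tuple of (name, match name, chromosome, start, end, cM). The positions are invented and only the 10.64 cM and 27.8 cM sizes come from these posts. The helper names are hypothetical.

```python
def segment_key(row):
    """Keep only the fields that should agree in both files."""
    _name, _match_name, chrom, start, end, cm = row
    return (chrom, start, end, cm)


def same_segments(rows_a, rows_b):
    """True when both files list exactly the same segments, in any order."""
    return sorted(map(segment_key, rows_a)) == sorted(map(segment_key, rows_b))


# Harry's rows for Joel, and Joel's rows for Harry (invented positions).
harry_file = [
    ("Harry", "Joel", 1, 170_000_000, 183_000_000, 10.64),
    ("Harry", "Joel", 10, 50_000_000, 80_000_000, 27.8),
]
joel_file = [
    ("Joel", "Harry", 10, 50_000_000, 80_000_000, 27.8),
    ("Joel", "Harry", 1, 170_000_000, 183_000_000, 10.64),
]

print(same_segments(harry_file, joel_file))
```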

I combined our two files and sorted the result by person, chromosome, start and end. This merged Harry’s and Joel’s chromosome matches together for each person. This file had almost 246,000 lines in it (172,299 + 73,644). Column 1 identified whether that segment was from Harry’s or Joel’s file.
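The combine-and-sort step looks roughly like this in Python. It is a sketch under assumptions, not the Excel procedure used here: rows are (match name, chromosome, start, end, cM) tuples with invented values, chromosomes are plain integers (an X chromosome label would need its own sort rule), and `merge_match_files` is a hypothetical name.

```python
def merge_match_files(files):
    """Combine several match lists into one sorted list.

    files maps a source label (whose file the row came from) to a list of
    (match_name, chromosome, start, end, cm) rows. The label becomes
    column 1 of every merged row.
    """
    merged = []
    for source, rows in files.items():
        for match_name, chrom, start, end, cm in rows:
            merged.append((source, match_name, chrom, start, end, cm))
    # Sort by person, then chromosome, then start, then end.
    merged.sort(key=lambda row: (row[1], row[2], row[3], row[4]))
    return merged


# Invented sample data, positions in millions of base pairs.
harry = [("Daniel", 1, 165, 190, 18.76), ("Alice", 2, 10, 20, 3.0)]
joel = [("Daniel", 1, 168, 185, 12.0), ("Alice", 1, 5, 9, 2.0)]

merged = merge_match_files({"Harry": harry, "Joel": joel})
for row in merged:
    print(row)
```

After sorting, each third party’s segments from both files sit next to each other, which is what makes the overlaps easy to spot.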

This made it possible to find all segments for every person where both Harry and Joel matched or at least overlapped.

image

There were 2,041 people who had 4,582 segments that both Harry and Joel matched to. These are two-thirds triangulated, because Harry (A) matches to the third party (C) and Joel (B) matches to the third party (C).
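Finding those two-thirds triangulated segments amounts to an interval-overlap test per person. A minimal Python sketch, assuming merged rows of (source, match name, chromosome, start, end) where source is "A" for Harry’s file and "B" for Joel’s. The data is invented and `double_matches` is a hypothetical name.

```python
from collections import defaultdict


def double_matches(merged):
    """Return {person: [(chrom, start, end), ...]} for regions where a
    segment from A's file overlaps a segment from B's file."""
    by_person = defaultdict(lambda: {"A": [], "B": []})
    for source, name, chrom, start, end in merged:
        by_person[name][source].append((chrom, start, end))

    result = {}
    for name, segs in by_person.items():
        shared = []
        for chrom_a, start_a, end_a in segs["A"]:
            for chrom_b, start_b, end_b in segs["B"]:
                if chrom_a != chrom_b:
                    continue
                start, end = max(start_a, start_b), min(end_a, end_b)
                if start < end:
                    shared.append((chrom_a, start, end))
        if shared:
            result[name] = shared
    return result


# Invented sample data, positions in millions of base pairs.
merged = [
    ("A", "Daniel", 1, 165, 190),
    ("B", "Daniel", 1, 168, 185),
    ("A", "Alice", 2, 10, 20),
    ("B", "Alice", 1, 5, 9),
]

found = double_matches(merged)
people = len(found)
segments = sum(len(regions) for regions in found.values())
print(people, "people,", segments, "segments")
```

Counting the keys and the regions is how totals like the 2,041 people and 4,582 segments above would be tallied.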

In order to do the third triangulation of Harry (A) to Joel (B), I simply added the 26 match lines (above) and left them in yellow so I could see them.

Then I made this nifty diagram in Excel, where the columns represent the segment location (in millions):

image

The green X’s are where both (A) matches (C) and (B) matches (C). The yellow line indicates where (A) and (B), my uncle and my cousin, match. It guarantees that all the other matches in the same range, from 170 to 183 for all those people highlighted at the left in green, are descendants of my uncle and cousin’s common ancestors and/or their ancestors.

Once again, the green X’s are where both my uncle matches the third party and my cousin matches the third party. The red a’s are where only my uncle matches the third party. The blue b’s are where only my cousin matches the third party. You can see many cases where the a’s and b’s are added on either side of the green X’s.
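One row of such a map can be generated with a short loop. This Python sketch is an illustration only: the positions are invented, each column is one million base pairs, a dot marks a column where neither person matches, and `map_row` is a hypothetical name.

```python
def map_row(a_segments, b_segments, first_mb, last_mb):
    """Build one text row for a single third party on one chromosome.

    a_segments and b_segments are lists of (start_mb, end_mb) ranges where
    person A and person B match the third party. Each character is one
    column: X = both match, a = only A, b = only B, . = neither.
    """
    def covered(segments, mb):
        return any(start <= mb <= end for start, end in segments)

    cells = []
    for mb in range(first_mb, last_mb + 1):
        in_a = covered(a_segments, mb)
        in_b = covered(b_segments, mb)
        if in_a and in_b:
            cells.append("X")
        elif in_a:
            cells.append("a")
        elif in_b:
            cells.append("b")
        else:
            cells.append(".")
    return "".join(cells)


# Invented example: the uncle's match extends past the cousin's on both ends.
row = map_row([(168, 186)], [(170, 183)], 165, 190)
print(row)
```

Stacking one such row per third party, sorted by position, gives a text version of the picture above.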

There are a few reasons for this that I’m still trying to sort out. Any help from genetic experts would be appreciated. But it is my understanding that:

  1. Every descendant only gets partial segments from their ancestors. The parts may be so different that two descendants don’t even match on the segment.
  2. There are actually two common ancestors, not one. My thinking is that descendants get half of Hirsch’s and half of Dwora’s, but I’m not sure if that’s the real way the DNA tests will differentiate because of the use of both genomes in the matching.

Now not all of the (A) (C) matches and (B) (C) matches have a corresponding (A) (B) match. For example:

image

There is no yellow bar going through here. So there is no absolute confirmation that these people are related to the common ancestor with the third part of the triangle. The segment sizes here are small (only a few cM) and my uncle and cousin just may not have been passed down the same segment parts in this area of the chromosome.

But that likely doesn’t matter. If those other 20 people have been shown to be descendants from the other triangulations with the yellow lines, then the 20 verified descendants of Hirsch and Dwora all matching together will confirm another segment from them or their ancestors. Yes, there is a chance that there is another ancestor on a different side contributing. But like triangulation, extra matches decrease greatly the odds of a false-positive. If there is another ancestor they come from, all or almost all of those 20 would have to descend from that other branch as well.

And that’s the first big take-away. When we mass-triangulate like this, we not only can get the segments that your ancestors passed down to you, but we can also reasonably accurately identify the segments they passed down to their other descendants, but not to you.

And endogamy doesn’t play into this either. It doesn’t matter if you’re a descendant of the same person in 1 way or 200 ways. You will still be getting parts of that person’s DNA.

This opens up a whole new realm for analysis.

In my next post on this, I’m going to organize all the 2,041 match people into triangulation groups and see what else we can learn about them.


The Future of Genetic Genealogy - Mon, 6 Jun 2016

At the #OGS2016Toronto Conference yesterday (Sunday June 5), one of the last sessions was a panel discussion with the topic: the Future of Genetic Genealogy. On the panel were four distinguished genealogists who know their DNA:

Maurice Gleeson from London, England. DNA and Family Tree Research
CeCe Moore from Southern California. Your Genetic Genealogist
David Pike from St John’s, Newfoundland. Pike Utilities for Autosomal Files
Judy Russell, from New Jersey. The Legal Genealogist

The moderator Elizabeth Kaegi, an expert on DNA herself, did a superb job keeping the panelists to a strict schedule so that all topics could be covered. Even CeCe, who admitted that she always runs overtime, finished just on time. Elizabeth has collaborated with James Thomson in her DNA work.

IMG_1885

I was more than pleased when I was offered an opportunity to sponsor this session. Genetic genealogy is an important part of the future of genealogical research, and I was very interested in the direction this was going. I have been working on including some very useful DNA info into Behold. I haven’t really blogged about that yet, but I will soon.

So as a sponsor, I was introduced by James Thomson. Then I got to introduce Elizabeth Kaegi. Then Elizabeth got to introduce the four panelists. Thankfully all these introductions were short and left plenty of time for the session.

image 
CeCe Moore, Maurice Gleeson, Elizabeth Kaegi, Judy Russell, David Pike

There must have been 200 people in the room. Lara Diamond posted an excellent summary of the session and what each of the panelists thought.

Here’s a few tweets I sent out live during the panel session:


I thank Judy Russell for insisting back on the Unlock the Past Cruise we were on a few months ago together, that I test my 93 year old uncle before it’s too late.

I am excited about the future of genetic genealogy. You should be too.