
Louis Kessler’s Behold Blog

Triangulation and Missing a-b Segments - Tue, 30 Aug 2016

First, to reassure you: I am back at work on finishing Behold Version 1.3.

But I do have to put up this post before I forget about it. Two days ago, I announced my free Double Match Triangulator program on the International Society of Genetic Genealogy (ISOGG) Facebook page. It is a closed group, so I doubt that announcement is publicly accessible, but this is what I said:

[Figure: my announcement post in the ISOGG Facebook group]

The ISOGG Facebook group has 11,138 members, and within 24 hours there were 132 reactions and 34 comments. A great response. Many genetic genealogists downloaded the program and I got a lot of feedback.

But the program presents the data in such a new way, with single matches, double matches and triangulations, that it needs some explanation.

So there are two concepts I brought up there that I need to explain.

1. Triangulation does NOT guarantee Identical by Descent

Full triangulation is when Person a matches Person c, Person b matches Person c, AND Person a matches Person b, all on the SAME segment. When that happens, the segment is said to be triangulated. Any other people who also double match on that segment (meaning a matches c and b matches c there) will triangulate too, and together they form what is called a Triangulation Group (TG).
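To make the definition concrete, here is a minimal Python sketch of the "all on the SAME segment" test. It assumes each pairwise match is stored as a hypothetical `Segment(chromosome, start, end)` in base-pair positions; real match files carry more columns, and real tools may also apply size thresholds that this sketch ignores.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """A shared DNA segment between two people (hypothetical representation)."""
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def common_region(*segments: Segment) -> Optional[Segment]:
    """Return the region shared by all segments, or None if they don't all overlap."""
    if len({s.chromosome for s in segments}) != 1:
        return None  # segments on different chromosomes never overlap
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    if start >= end:
        return None  # no position is covered by every segment
    return Segment(segments[0].chromosome, start, end)


def triangulates(ac: Segment, bc: Segment, ab: Segment) -> Optional[Segment]:
    """Full triangulation: a-c, b-c and a-b matches all cover one common region."""
    return common_region(ac, bc, ab)


ac = Segment("7", 10_000_000, 25_000_000)
bc = Segment("7", 12_000_000, 30_000_000)
ab = Segment("7", 5_000_000, 20_000_000)
print(triangulates(ac, bc, ab))
```

Here the three matches overlap from 12,000,000 to 20,000,000 on chromosome 7, so that is the triangulated region. Note that this only tests overlap; as the rest of this post explains, a triangulated region is not automatically IBD.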

Some of the thinking out there was that triangulation guarantees the segment is Identical By Descent (IBD), meaning all three people got that segment from the same ancestor, like this:

[Figure: Full triangulation]

Everyone has pairs of chromosomes: one chromosome of each pair comes from your father and is made from his pair, and the other comes from your mother and is made from hers. In the diagrams I’ve included, we’ll talk about a segment on a particular chromosome. H1 is one half, from one of the ancestor’s parents, and H2 is the other half, from the ancestor’s other parent. The ancestor passes down either H1 or H2 to each child. The child’s other half comes from their other parent, and we can ignore that for now.

In the case shown above, segment H1 was passed down to all three persons. This is Identical By Descent. The three people triangulate on this segment as they all have the same H1 segment. Here, triangulation identifies IBD.

But this is not always the case.

[Figure: Full triangulation with a chance match]

It is possible for 3 people to triangulate when one person has a chance match to the other two.

Let’s say Person a and Person b share half H1 on a segment that was passed down to them from a common ancestor. That segment is Identical By Descent. Person c could still match Person a’s H1 segment and Person b’s H1 segment purely by chance (Identical By State, or IBS), which is quite possible for small segments. This is still a true triangulation, but Person c’s match is not IBD.

Kapoweee! IBD for small segment triangulations is blown out of the water.

But it’s not all that bad. Jim Bartlett is fairly confident that triangulation works down to 5 cM. And although some of the smaller segments will be IBS (by chance), others will still turn out to be IBD.

The reason a threshold as small as 5 cM can be used is that if Person a and Person b share an IBD segment, Person c can match by chance, but would need to match by chance ONLY to their H1 segment. Person c cannot crisscross between the H1 and H2 segments, because Person a’s and Person b’s H2 segments are different.

So I wouldn’t throw them all away. Other information such as multiple people matching and coincident crossover points and single matches adjacent to the triangulated regions may help to identify which small segments are likely IBD – but that is future research.

In the Double Match Triangulator output, multiple people triangulating together makes a Triangulation Group and the relevant parts of the segments are shown with green X’s. Each row represents one person (Person c) who triangulates with Persons a and b. Some of these will be IBD, some will be IBS (by chance). The pink a’s and blue b’s show single matches adjacent to the double match area.

[Figure: A Triangulation Group as mapped by DMT]

 

2. Missing a-b Double Match Segments are Useful

Whoa! What the heck are Missing a-b Double Match segments? Well, they are double match segments (a matches c and b matches c) where a and b don’t match on the segment, so they don’t triangulate.
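The difference between a triangulated double match and a missing a-b double match can be sketched in a few lines of Python. This is a hypothetical illustration, not DMT’s code: intervals are assumed to be `(start, end)` positions on one chromosome, and any overlap with an a-b segment counts as triangulated. A region only partly covered by an a-b match would really split into a triangulated part and a missing a-b part, which this sketch does not do.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome, in base pairs


def overlap(x: Interval, y: Interval) -> Optional[Interval]:
    """Return the overlapping part of two intervals, or None."""
    start, end = max(x[0], y[0]), min(x[1], y[1])
    return (start, end) if start < end else None


def classify_double_match(ac: Interval, bc: Interval, ab_segments: List[Interval]) -> str:
    """Classify the region where the a-c and b-c matches coincide.

    'no double match' - the a-c and b-c segments do not overlap
    'triangulated'    - a and b also match somewhere in that region
    'missing a-b'     - a matches c and b matches c there, but a and b do not match
    """
    double = overlap(ac, bc)
    if double is None:
        return "no double match"
    if any(overlap(double, ab) for ab in ab_segments):
        return "triangulated"
    return "missing a-b"


print(classify_double_match((10, 25), (12, 30), [(5, 20)]))   # a-b overlaps the region
print(classify_double_match((10, 25), (12, 30), [(40, 50)]))  # a-b match is elsewhere
```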

This is actually an entirely new concept in autosomal DNA segment analysis. Until I released the Double Match Triangulator program, no tool produced missing a-b information, so it’s something new that can possibly help you identify relationships.

I scratched my head for quite a while wondering how the heck a can match c and b can match c, with both of those being Identical By Descent matches, without a matching b. It just didn’t seem possible.

I finally figured it out. I came up with two illustrative cases. Maybe there are more that I haven’t thought of, but these two will do for now.

Let’s go back to our Full Triangulation diagram and change it up a bit:

[Figure: No a-b Match, but Person a, b and c are IBD to a common ancestor]

So let’s say Person a gets segment H1 from this ancestor and Person b gets the other half, H2, from this ancestor. One child of the ancestor gets H1 and another child gets H2. Somewhere down the line, two descendants of these children form a couple and have a child. The diagram above shows the couple as 2nd cousins, but they could have any relationship, even siblings (but that’s not nice).

The couple’s child (in this case Person c) will get a segment from Great-GChild 1 (GGChild 1), which will be either the H1 segment or GGChild 1’s other half (inherited from one of the other parents on the way up). Similarly, the couple’s child will get a segment from Great-GChild 2 (GGChild 2), which will be either segment H2 or GGChild 2’s other half. There is a 1/2 chance Person c will get H1 and a 1/2 chance they’ll get H2, making a 1/2 × 1/2 = 1/4 chance that Person c gets both H1 and H2. Let’s assume Person c does get both.
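The 1/4 figure comes from counting equally likely outcomes: each parent of Person c independently passes down one of their two halves. A small Python enumeration makes this explicit. It treats the segment as passed down whole (no crossover inside it), and the labels `other1` and `other2` are placeholders for each GGChild’s other half.

```python
from fractions import Fraction
from itertools import product

# What each parent of Person c can pass down on this segment, each with probability 1/2.
from_ggchild1 = ["H1", "other1"]
from_ggchild2 = ["H2", "other2"]

outcomes = list(product(from_ggchild1, from_ggchild2))  # 4 equally likely combinations
favorable = [o for o in outcomes if o == ("H1", "H2")]  # Person c gets both halves
p_both = Fraction(len(favorable), len(outcomes))
print(p_both)  # 1/4
```

The same count applies to the H1 and S1 case with a pair of ancestors.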

Now Person a’s H1 segment matches to Person c’s H1 segment and it is IBD. Person b’s H2 segment matches to Person c’s H2 segment and it also is IBD. But Person a’s H1 segment does not match to Person b’s H2 segment.

Yet all three match on the same segment, albeit via the two different halves of the ancestor.

Got it? These missing a-b double match segments can provide useful information.

Double matching gets some of the same protection from chance matches that triangulation does. If Person a’s H1 match with Person c is on the same segment as Person b’s H2 match with Person c, the likelihood that both are matching by chance to their other halves is very small. That is especially true since the people selected as Person a and Person b are usually known beforehand to be related. Therefore double matched segments can likely be mostly trusted down to about the same 5 cM threshold as triangulated segments.

For example, I have Chromosome Browser Results files for a few people who obviously have no relationship to my uncle. One person shares no triangulated segments and only 6 missing a-b segments with 4 people. Those 6 segments are only 1.84, 3.1, 3, 2.94, 4.18 and 3.43 cM. A second person shares only 8 missing a-b segments with 5 people, of 1.81, 2.87, 1.93, 2.39, 2.09, 3, 3.52 and 3.43 cM. The largest of these known by-chance matches is 4.18 cM.
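A quick Python check of the segment sizes listed above confirms that every one of these known by-chance missing a-b segments falls below the 5 cM working threshold. The list names are just labels for the two unrelated people.

```python
# Sizes (cM) of missing a-b segments shared with two people known to be unrelated,
# copied from the figures above.
person1 = [1.84, 3.1, 3, 2.94, 4.18, 3.43]
person2 = [1.81, 2.87, 1.93, 2.39, 2.09, 3, 3.52, 3.43]

threshold_cm = 5.0  # working threshold discussed above
all_sizes = person1 + person2

print(max(all_sizes))                                 # 4.18
print(all(size < threshold_cm for size in all_sizes))  # True
```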

Once again, other information such as multiple people matching and coincident crossover points and extended single match regions on either side of the double match region may help to identify which small segments are likely IBD – and we must leave this also to future research.

Here’s a second possibility that is very interesting:

[Figure: No a-b Match, but Person a, b and c are IBD to a pair of ancestors]

In this case, Person a gets segment H1 from this ancestor and Person b gets segment S1 from the Ancestor’s Spouse. But of course, the Ancestor is also an ancestor of Person b; Person b just didn’t get the H1 segment. And the Spouse is also an ancestor of Person a; Person a just didn’t get the S1 segment.

So let’s pass the H1 and S1 segments down to two Great-Grandchildren who have a child together (Person c). Using the same probability logic as earlier, Person c has a 1/4 chance of getting segment H1 from one parent and segment S1 from the other.

Now Person a’s H1 segment matches to Person c’s H1 segment and it is IBD to the Ancestor. Person b’s S1 segment matches to Person c’s S1 segment and it also is IBD, but to the Ancestor’s spouse. Once again there is no a-b match because Person a’s H1 segment does not match to Person b’s S1 segment.

This type of missing a-b double match again is important and usable information that should not be thrown away. Small segment caveat. Multiple matches. Coincident crossover points. Extended single match region. Further research needed.

In the Double Match Triangulator output, below is what a Missing a-b Double Match looks like. The Double Match region is shown with green X’s. Each row represents one person (Person c) who double matches with Persons a and b. Some of these will be IBD, some will be IBS (by chance). The pink a’s and blue b’s show single matches adjacent to the double match area:

[Figure: A Missing a-b Double Match as mapped by DMT]

Hopefully this blog post will give you some insight into what DMT is displaying for you, and show that all of the information presented may be valuable. And maybe one day, together, you and I can figure out how to interpret it all and use it to help us tell how we are all related.

Please let me know if you get any Eurekas about DMT and how to use it.

----

Note: Be sure to read my follow-up post: Revisiting Missing A-B Matches

Double Match Triangulator (DMT) 1.0.1 - Sun, 21 Aug 2016

I’ve now released my new freeware program, which provides a new view to help people analyze their autosomal DNA matches from FamilyTreeDNA. It is called Double Match Triangulator. I actually released version 1.0 a couple of days ago, but I’ve since fixed a bug, and version 1.0.1 is available at www.beholdgenealogy.com/dmt.

I’ve already blogged a few times in the last few months about it, so I’ll keep this one short and sweet. What the program does differently is that it combines all the matches of two different people. Any matches that also coincide with the two matches are, by definition, triangulated.

The program creates an Excel file that includes all matching segment boundaries, along with a Map that lets you look for patterns visually.

When I first thought of developing this, I thought it would help identify common ancestors and allow me to sort out all my uncle’s 8,000 matches. Well, it’s not quite that simple.

Triangulation does not guarantee Identical By Descent (IBD) matches that indicate a common ancestor who passed down that segment of DNA. Small segments under 7 cM still have a good possibility of randomly matching by chance, even if they are triangulated.

I’ll need some experienced genetic genealogists to take this tool and figure out what can and cannot be determined from it. If straightforward analysis methods are developed using it, I could program those in and let the program do some of the analysis for you as well.

I’m still optimistic that the Double Match method of looking at autosomal results will lead to identifying family relationships. The segment boundaries (crossovers) are created by one ancestor and passed along several generations until they get wiped out by another parent’s segments, and I’m betting that those crossovers might produce a trail that can be followed to connect all the family members together.

With regard to the random matches, ways of separating those from true segments may be forthcoming. In a way, triangulation is akin to phasing, in that it helps identify which side of the family each third person’s match belongs to.

And advanced use of multiple DMT files, with 60 different people used in all combinations as Person a and Person b, may reveal more than we can now imagine. There is a wealth of information here, and a lot of potential.
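To give a sense of scale, the number of Person a / Person b pairings among 60 people can be counted with Python’s standard library. This assumes that swapping Person a and Person b gives the same information, so each unordered pair is counted once; if order mattered, the count would double to 3,540. The kit labels are hypothetical.

```python
from itertools import combinations
from math import comb

people = [f"kit{i}" for i in range(1, 61)]  # 60 hypothetical kit labels

pairs = list(combinations(people, 2))  # each unordered (Person a, Person b) pair once
print(len(pairs))   # 1770
print(comb(60, 2))  # 1770, the same count from the formula n(n-1)/2
```

So a full analysis along these lines would mean running and comparing well over a thousand DMT files.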

So if you’ve done some autosomal DNA match analysis and you have access to at least two people’s autosomal DNA results at FamilyTreeDNA (and if you use Windows and Excel), feel free to download DMT and try it out. If you come up with great ways of using it, please let me know.

It took a few months and a couple hundred hours of my time to develop DMT. I am making it available free because I want people to use it.

I was halfway through Version 1.3 of Behold when I got distracted by the need to explore the DMT concept. But I needed to explore autosomal DNA, and I know a lot more now than I did before. Sorry for the brief interlude. We’ll now get back to our regularly scheduled Behold development.

Writing Freeware (Double Match Triangulator) - Sun, 17 Jul 2016

Most people might think releasing a freeware program is easy. Just write it and make it available. Right?

Well, there’s a bit more to it than that.

When I came up with the idea for Double Match Triangulation of autosomal DNA using the chromosome match files produced by FamilyTreeDNA, I knew I’d need a program to sort all that data out. When I went online to see what was available and found nothing like it, I knew I’d have to create it and make it available so that others could use it too.

I first figured out what was needed by doing the matching with Excel. I loaded two chromosome match files into Excel, merged them together, and developed equations to determine segment overlaps. I then used conditional formatting to color the cells to make interpretation easier.

Once that template was set up, it wasn’t too much work to build a program with an engine that would read in two chromosome match files, compare them the same way I was doing in the Excel spreadsheet, and output the results to a csv (comma delimited) file so that Excel could read it in and display it all nicely.

 

So at that point, just a few little things to do:

1. Blog about the technique.

2. Get a few sample files from people so I can test it.

3. Test it, and find problems with the input files and handle them.

4. Learn from the results, and figure out more that can be done.

5. Decide what will be in the first cut of the program.

 

Basically the program is done… Except it’s not.

6. Mock up a user interface to allow selection of files.

[Figure: mock-up of the program’s user interface]

7. Include Open File dialogs to select the files

8. Include Open Folder dialogs to select the folders. Wait, there aren’t any Open Folder dialogs available in the Visual Controls Library. I have to research my options, see what I did in GEDCOM File Finder, and decide how to implement this.

9. Save past files and directories to the Registry so that they can appear in the recently used list. (You’d hate me and I’d hate myself if I didn’t do this.)

10. Add error checking of file names and input files.

11. Figure out what to put in the status box and log files to track what was done and what wasn’t and any errors encountered.

12. Realize it’s easy to export to csv, but a pain to manually format it once you load the csv file into Excel. So I look for a way to automate the loading of the Excel file directly.

13. Try to make sense of the Office Developer Documentation and find the commands needed amongst the millions of articles.

14. Spend a week implementing the automation, and once it is working, realize it takes 10 times longer than creating the csv file.

15. Puzzle about ways to improve this slowness while in the shower, on my bike and at 3 in the morning.

16. Try various things, and find that creating a temporary csv file and then automating its import into Excel is 5 times faster than direct Excel automation.

17. Rewrite everything so that multiple files can be matched at once.

18. Make sure it all looks nice, still works, and does what’s needed.
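The recently used list from step 9 is a small data structure in its own right: the newest entry moves to the front, duplicates are dropped, and the list is capped. A minimal Python sketch, assuming a hypothetical cap of 8 entries and case-insensitive Windows paths, might look like this; saving the list to the Registry is left out.

```python
from typing import List


def add_recent(recent: List[str], path: str, limit: int = 8) -> List[str]:
    """Move `path` to the front of a most-recently-used list, capped at `limit`.

    Windows paths are case-insensitive, so older entries that differ only in
    case are treated as duplicates and removed.
    """
    updated = [path] + [p for p in recent if p.casefold() != path.casefold()]
    return updated[:limit]


recent: List[str] = []
for f in ["a.csv", "b.csv", "a.csv"]:
    recent = add_recent(recent, f)
print(recent)  # ['a.csv', 'b.csv']
```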

 

All done now? Yup. Except there’s still this left to do:

19. An installation script for it.

20. Webpage for it so there’s someplace to download it from.

21. Some documentation would be nice.

22. Blog posts, announcements

 

Yay! Finally done… but forever followed by:

23. Support, bug fixes, response to questions, enhancements

 

So that’s how a freeware program is made. And the timeframe is after work, in the evenings and on weekends when not on errands, when your family lets you be alone, and when you’re not too tired to think.

Hopefully the Double Match Triangulator program will be available in the next week or two for anyone to try out.