
Louis Kessler’s Behold Blog

Inferred Segment Matches - Thu, 7 Mar 2019

When we match our DNA to other people to find common ancestors, we are comparing segments of DNA that match the other people. That’s only logical, isn’t it?

Well, interestingly enough, there’s a technique that will help you determine which ancestors your DNA comes from by using non-matches. Actually, you are using matches of people you are closely related to, and finding common relatives who they match to, but you don’t.

Jonny Perl, the author of DNA Painter, recently wrote an article about this technique titled Painting your DNA with inferred matches. I believe he is the person who named it “Inferred Matching”. (Please correct me Jonny if this is not the case.)

Jonny gave examples showing how he used:

  1. His dad’s matches with a 2nd cousin once removed that he did not match
  2. His dad’s half-brother’s matches that his father matches but he does not match
  3. His mother’s paternal cousin, and his second cousin.
  4. Siblings

The basic idea behind Inferred Matching is that you know you got your DNA from either your father or your mother. And each (small enough) segment you got from a parent came from either that parent’s father or that parent’s mother, i.e. your grandfather or grandmother on that side. What you do is find another close relative, who I’ll call Person B, who matches a third person who I’ll call Person C. If Person B matches Person C on a segment, but you (Person A) do not match Person C on that segment, then you couldn’t have got your segment from the same line. If you did, it would have matched.

So Inferred Matching basically tells you the ancestral line your segment did not come from.

image

Looking at the diagram above, I show an example where I’m assuming your grandmother’s father (GM’s father) is the ancestral source of a segment. He passes it down through your grandmother, through your uncle/aunt to your 1st cousin (Person B). He also passes it down to your more distant cousin (Person C).  If he passed the same segment down to you as well, then you and your two cousins would all have the same segment, your segments would all match each other and you therefore triangulate. The triangulation is a clue that all three of you may have been passed down that segment from a common ancestor.

But what if your two cousins match each other, but you don’t match? You know you couldn’t have got the segment from your GM’s father. So who could have given you the segment? Answer: the segment you got from your parent could have instead been passed down from your GF’s father, your  GF’s mother or your GM’s mother.

So you usually can’t directly tell which line you came from with Inferred Matching. In the above example, you still don’t even know if the line is from your grandfather or grandmother’s side. But it does tell you the one line that you don’t come from.

Alone, you can’t do too much with it. But combined with other information, you can. If you find another cousin, who matches someone else on your grandmother’s side that you also match, but not on that segment, then you have a second refutation. If that refutation is, say, on your grandmother’s mother’s side, then all of a sudden you have refuted both your grandmother’s parents, and your segment should be on your grandfather’s side. Then if through yet another pair of cousins, you infer that the segment cannot be on your grandfather’s father’s side, all that remains is your grandfather’s mother’s side, and that could very likely be the ancestral path for your segment of interest.
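The elimination described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how any tool implements it; the line labels and the function name `remaining_lines` are made up for the example.

```python
# Hypothetical labels for the four great-grandparent lines on one parent's side.
LINES = {"GF's father", "GF's mother", "GM's father", "GM's mother"}

def remaining_lines(refuted):
    """Return the lines still possible after removing every refuted line."""
    return LINES - set(refuted)

# One inferred match refutes one line and leaves three candidates.
after_one = remaining_lines(["GM's father"])
# A second refutation on the grandmother's side leaves only the grandfather's side.
after_two = remaining_lines(["GM's father", "GM's mother"])
# A third refutation leaves a single likely ancestral line.
after_three = remaining_lines(["GM's father", "GM's mother", "GF's father"])

print(sorted(after_one))
print(sorted(after_two))
print(sorted(after_three))
```

Each refutation only shrinks the set of candidates. It is the combination of several refutations that narrows the segment down to one line.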



Who Can Be Used for Inferred Matching?

Persons B and C can be anyone related to you who shares a Most Recent Common Ancestor (MRCA) with you. You must match Person B and Person C somewhere, but it’s the segments where you don’t match one or both of them that can be used for inferred matching. The ancestral path through the MRCA that is closest to you is the one that you can refute, because you cannot continue to follow up that path to the further MRCA. If you did, then you would be matching on that segment.

Using a parent or a parent’s descendant as a Person B is wonderful. With a parent, sibling, nephew or niece, you are now dealing with only two possible segments that you can receive rather than four. Because of that, Inferred Matching on a segment where your parent or half-relative has a match that you don’t have will always tell you that if your segment is not through your parent’s father, then it must be through your parent’s mother (and vice versa). You will need to know the MRCA of Person C so you can determine which grandparent the non-match will be on. Jonny’s article gives excellent examples of this.



Caveat

Of course nothing’s ever perfect. If your Person B or Person C is related to you more than one way, e.g. through both of your grandparents, then you could get incorrect results. But this should be a somewhat rarer case. Normally, Inferred Matching works and works pretty well.



Visual Phasing

Inferred Matching has been used before Jonny’s paper. The technique of Visual Phasing takes the matches of 3 or more siblings and compares them. In doing so, the segments of each sibling’s DNA that came from each grandparent can be determined. Visual Phasing has been around for a few years. Part of the technique involves refuting a grandparent on a segment, which is effectively Inferred Matching, but I’ve never seen any posts about Visual Phasing referring to the term “Inferred Matching”.



Inferred Matching and Double Match Triangulation

Doing Inferred Matching manually is laborious. For any segment, you need to find all the segment matches that your known relatives have with each other that you don’t match to. Then you must logically work out what ancestral paths back to the MRCAs are possible and see if you can eliminate some paths from possibility and thus infer the paths that are possible.

Inferred Matching works well with the ideas behind double match triangulation.

Double matching involves finding all the segment matches of Person A with Person C and compares them to all the segment matches of Person B with Person C. Those that overlap (along with A’s segment matching B’s) are triangulations.

Inferred Matching uses the complementary information available in the data used for double matching. Inferred Matching uses the segment matches of Person B with Person C where Person A is not matching Person B and/or Person C on that segment.
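To make the split between triangulated and inferred regions concrete, here is a minimal Python sketch. It assumes a single segment per pair of people on one chromosome, given as (start, end) positions in Mbp. The function names and the numbers are invented for illustration and are not taken from Double Match Triangulator.

```python
def overlap(seg1, seg2):
    """Return the overlapping part of two (start, end) segments, or None."""
    if seg1 is None or seg2 is None:
        return None
    start, end = max(seg1[0], seg2[0]), min(seg1[1], seg2[1])
    return (start, end) if start < end else None

def split_double_match(ac, bc, ab):
    """Split the B-C match into a triangulated part and inferred parts.

    ac, bc, ab are the A-C, B-C and A-B segment matches (ac or ab may be None).
    The triangulated part is where all three overlap. Whatever is left of the
    B-C match is where A is missing a match with B and/or C.
    """
    triangulated = overlap(overlap(ac, bc), ab)
    if triangulated is None:
        return None, [bc]
    inferred = []
    if bc[0] < triangulated[0]:
        inferred.append((bc[0], triangulated[0]))
    if triangulated[1] < bc[1]:
        inferred.append((triangulated[1], bc[1]))
    return triangulated, inferred

# A matches C on 10-25, B matches C on 10-30, A matches B on 5-40.
tri, inferred = split_double_match(ac=(10, 25), bc=(10, 30), ab=(5, 40))
print(tri, inferred)
```

In this made-up example the three people triangulate from 10 to 25, and the stretch from 25 to 30 is a B-C match that Person A does not share, so it is a candidate for Inferred Matching.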

I’ve been working on implementing Chromosome Mapping into what will be Version 3.0 of Double Match Triangulator. I’m also incorporating Inferred Matching into that. In Double Match Triangulator, an inferred match will be telling you what ancestral paths cannot occur, and will look like this:

image

The green sections are triangulations that Person A and Person B have with several C Persons. In the example triangulation group, the MRCAs of the C People who triangulate are not known. The ancestral path (MM = mother’s mother) is only known from Person B’s MRCA.

An inferred match is shown on the first line and states that Person A doesn’t have the B-C match and the ancestral path cannot be MMFF. So only MMFM, MMMF and MMMM are possible. If additional Inferred Matches are found for that segment that rule out more of the possible paths, then Double Match Triangulator may be able to extend the ancestral path of the triangulations to a longer path when it becomes the only possibility. This can provide extra information that wouldn’t have been available without the Inferred Matching.
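The path logic can be sketched as follows. This is my own simplified illustration in Python and not the actual DMT code. It assumes paths are strings of F and M, that a refuted path also refutes everything above it, and that the path can be extended as far as the remaining candidates agree. The function names are hypothetical.

```python
from itertools import product
from os.path import commonprefix

def candidate_paths(known, depth):
    """List every F/M path of the given depth that starts with the known path."""
    extra = depth - len(known)
    return [known + "".join(p) for p in product("FM", repeat=extra)]

def extend_path(known, depth, ruled_out):
    """Drop refuted paths, then return the survivors and the prefix they share."""
    remaining = [
        p for p in candidate_paths(known, depth)
        if not any(p.startswith(r) for r in ruled_out)
    ]
    # commonprefix compares character by character, so it works on plain strings.
    return remaining, commonprefix(remaining)

# One inferred match rules out MMFF: three paths remain, the path stays MM.
print(extend_path("MM", 4, ["MMFF"]))
# A second one rules out MMFM: the path can now be extended to MMM.
print(extend_path("MM", 4, ["MMFF", "MMFM"]))
```

With only MMFF ruled out the shared prefix is still MM, which is why a single inferred match does not extend the path on its own.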



Bonus: Inferred Matching on Triangulating Segments

Look at the 3rd line in the above diagram. This is a triangulation, but to the right there are 5 grey B’s. That is a section of the double match that Person A no longer matches. Person A stops matching at the last green T. But Person B continues matching Person C for 5 more Mbp (megabase pairs).

Inferred Matching can be applied to those 5 B’s. Person C has an ancestral path of “MM”, meaning that this segment can no longer be from the MM ancestral path. What we have found is a crossover at the end of that triangulation group belonging to Person A. These additional Inferred Matches are also being identified and will be displayed and used for ancestral path determination in the upcoming version 3.0 of Double Match Triangulator.

Of course we have to be careful not to use too small segments. There can always be some random matching at the beginning and end of any match, so we must make sure that the B-C matching preceding or following a triangulation is significant.
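As a rough sketch of that check in Python: keep only the leftover B-C stretches that are at least some minimum length. The threshold `min_mbp` is a placeholder I have made up for the example, and what counts as significant is a judgment call.

```python
def significant(pieces, min_mbp):
    """Keep only the (start, end) pieces that are at least min_mbp long."""
    return [(start, end) for start, end in pieces if end - start >= min_mbp]

# Two leftover B-C stretches around a triangulation: 2 Mbp before, 5 Mbp after.
leftovers = [(8, 10), (25, 30)]
kept = significant(leftovers, min_mbp=5)
print(kept)
```

Here the 2 Mbp stretch is discarded as possible random matching, and only the 5 Mbp stretch would be used for an inferred match.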



Double Match Triangulator 3.0

I’ve been making good progress and I will release DMT 3.0 as soon as it is ready. There have been so many great advances in DNA analysis over the past six months with clustering and new tools and especially new features at Ancestry DNA and MyHeritage DNA announced at RootsTech that I’ve been following. All of these have redirected my thinking as to what’s needed. I’ve established that the tool that is now needed is one that will help people do Chromosome Mapping by applying and automating the rules for them so they don’t have to do it themselves. The results will then be made available to you so that you can input them into DNA Painter and other tools.

I’m very excited as to what I have programmed so far. Most of what I’ve talked about above is completed in my development version. This post was mainly to document some of my thoughts about Inferred Matching, but is also meant to be a teaser as to what’s coming in DMT 3.0.

Stay tuned.



A Second Type of Inferral

It’s amazing as you work through the details of something and try to implement it programmatically that you suddenly realize something. I shake my head sometimes as to how the mind works, but it somehow connects all the dots together all by itself and suddenly this idea pops into your head.

The type of inferral that Jonny Perl wrote about and that I was writing about up to now is an inferral you can make because a close relative matches to someone on a segment, but you don’t.

What about the other way around? It works too. You can infer in a similar manner from a segment match that you have, but a close relative doesn’t.

The simple case of this is when you match someone on a segment, but one of your parents doesn’t. I like to call this “Parental Filtering”. Almost all the time, that will mean that either you match through your other parent, or the segment is false.

There is the borderline case where your parent falls under the match limit but you don’t. But in that case, you’ll still want to eliminate that segment from your analysis because you can’t say for sure that it is a segment going through that parent.

People do this parental filtering all the time, especially when they only have one parent tested. But you can also use siblings (as in Visual Phasing) to infer grandparent lines that you can’t have. And similarly you can use other segments that you have that some close relatives on those lines don’t have to infer more lines that you can’t have. And once you have all lines covered (e.g. both parents or all four grandparents), then you can start to classify segments as likely to be false.
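The parental case can be written as a small Python sketch. This is my illustration only, and the function name `possible_sides` is hypothetical. It takes whether each parent matches the same person on the segment, with None meaning that parent has not been tested.

```python
def possible_sides(father_matches=None, mother_matches=None):
    """Return the sides a segment match of yours could still come from.

    Each argument is True, False, or None (parent not tested).
    An empty set means both tested parents fail to match, so the
    segment is likely false. If a parent is untested, the side that
    remains is not confirmed: the segment could still be false.
    """
    sides = set()
    if father_matches is not False:
        sides.add("paternal")
    if mother_matches is not False:
        sides.add("maternal")
    return sides

print(possible_sides(father_matches=False))                        # only father tested
print(possible_sides(father_matches=True, mother_matches=False))   # both tested
print(possible_sides(father_matches=False, mother_matches=False))  # neither matches
```

The same pattern extends to grandparent lines: start with every line as possible and remove a line whenever a tested relative on that line lacks the segment.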

I am now working to incorporate this second type of inferred matching into DMT. We’ll soon see how well these two methods of eliminating possible lines work to help identify the ancestral path that the segments of your DNA came from.




Followup (3 hours following my post): Blaine Bettinger wrote on the Facebook Genetic Genealogy Tips & Techniques group that there are several names for this process. Blaine says he uses “Indirect Mapping”.




Revision: Mar 10:  Nearly complete rewrite. On Facebook, Jonny Perl and Stevlana Hensman pointed out a major oversight I originally made in my article. I had thought that the Inferred match always resulted in knowing the ancestral line that your segment came from. That is only the case for parents and descendants of your parents (siblings, nephews/nieces, etc.). For anyone else, all it does is tell you the one ancestral line that your segment did not come from. That is still very useful information, however, and needs to be automated in DMT so that people can make use of it.

These concepts are brand new and are still being discovered by the genetic genealogy community. They are not simple. I am still learning myself and my head still spins every time I try to map how DNA is shared. I appreciate all feedback as peer review is the best way to confirm, correct and improve methodologies.




Followup: Mar 14: I’ve confirmed that you can infer the grandparent when the inferred match is made through your parent or a descendant of your parent (i.e. siblings, nieces/nephews, etc.). The reason is that your parent gets one chromosome of each pair from each grandparent that only comes from two of your great-grandparents on that parent’s side. If you do not match to one, you must match to the other.

This does not work for uncles/aunts, 1st cousins, or other relations, because they need not have got the same grandparent segment that your father did. So for them there are four possible great-grandparent segments to choose from. You can eliminate one, but without further eliminations, that still leaves two on one grandparent’s side and one on the other.




Followup:  Mar 15: I added the section at the end: “A Second Type of Inferral”



Followup: Oct 6, 2020: Blaine Bettinger gave a webinar on FamilyTreeWebinars about this technique. He now prefers using the term: “Deductive Mapping”.

MyHeritage New AutoClustering Feature is now Live - Fri, 1 Mar 2019

Another new feature announced during #RootsTech is the MyHeritage DNA integration of Evert-Jan Blom’s AutoClustering method. MyHeritage has become the first major DNA service to offer clustering.

MyHeritage’s blog post Introducing AutoClusters for DNA Matches was posted yesterday and describes their new service.

MyHeritage in their post says:

“This new tool was developed in collaboration with Evert-Jan Blom of GeneticAffairs.com, based on technology that he created, further enhanced by the MyHeritage team. Our enhancements include better clustering of endogamous populations (people who lived in isolated communities with a high rate of intermarriages, such as Ashkenazi Jews and Acadians), and automatic threshold selection for optimal clustering so that users need not experiment with any parameters.”

I looked at several autoclustering methods a month ago in my Comparing Genetic Clusters post. I included Evert-Jan’s Genetic Affairs program at the time. Those methods at the time all used Ancestry matches. Now I’m interested in seeing what AutoClustering does with my MyHeritage matches, especially in the light of my endogamy.

So let’s try it.

On my DNA Tools page, I selected AutoClusters.

image

The illustration shows a clustering example diagram.

The “Generate clusters for” and “Kit:” dropdowns allow me to select from my 3 possible kits:

  1. The MyHeritage DNA test that I took.
  2. The FTDNA test that I took that I uploaded to MyHeritage
  3. The FTDNA test that my uncle took that I uploaded to MyHeritage.

I pressed the Generate button for each of the 3 kits. After pressing the button, up pops the following box:

image

I did this yesterday about noon hour, 5 hours or so after this service went live. I saw on Facebook that some people were receiving their results in an hour or two. But as more people found out about this and submitted their requests, the queue started to grow. I did not get my results until the next morning, when I found the three results in my inbox. The emails were from 1:30 a.m., so they took over 13 hours to get generated and sent to me. I expect that the waiting time will come down considerably once the initial excitement period subsides.

What You Get

You get a zip file (mine are about 80 KB each) which expands to three files:

  1. An HTML (browser) file that displays your cluster chart with the amazing bit of animation that Evert-Jan developed to organize the clusters in front of your very eyes. Just hit refresh (F5) to display this hypnotizing effect over and over.
  2. A CSV (comma delimited file) that contains all the data in columns that can be loaded into Excel for analysis.
  3. A ReadMe.pdf file that gives you information about the analysis done for you.

The HTML and CSV files are given the name:

Louis Kessler Auto Clusters – kk-kkkkkk – March 01 2019.sss

where kk-kkkkkk is the kit number and sss is .html or .csv.

The ReadMe.pdf file always has that name. So if you don’t rename it, one will overwrite the other. They are identical except that they contain information about your clustering run, so you should rename it to associate it with the other two files. My info from the three ReadMe files, along with my match statistics, tells me the following:

image

The clustering algorithm in all cases excluded my match with my uncle.

My test gave me 9,315 matches, of which 119 are between 80 cM and 350 cM and the clustering algorithm excluded 19 singletons and grouped the other 100 into 26 clusters.

My transfer from FTDNA was similar. My uncle’s transfer from FTDNA had more close matches than I had. That’s the advantage of testing someone a generation back. So the clustering algorithm used a smaller range, 85 cM to 350 cM, to only include 100 people.

My Test versus My Transfer

These are my clusters from my MyHeritage DNA test:

image

These are my clusters from my transfer from FTDNA:

image

They look almost identical and that is good. There are two fewer clusters from the transfer file, but you can barely tell.  And despite my endogamy, there are not a lot of grey squares representing matches outside of clusters.

When I compare individual people and the groups they are in, I note that the groups are numbered differently in the two reports, but I can align the groups and most of the people match. There were 100 people in the MyHeritage clusters, and 94 in the FTDNA transfer clusters. MyHeritage has 9 people that FTDNA doesn’t have, and FTDNA has 3 people that MyHeritage doesn’t have. Of the remaining 91 matching people, 11 of them disagree as to which group they are in.  So that leaves 80 people who are put in the same group from both clusterings. Pretty good.
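For anyone who wants to do the same comparison from the two CSV files, the counting can be sketched in Python with set operations. This is a toy illustration with invented people and cluster numbers. It assumes the cluster numbers have already been aligned between the two reports, which I did by hand.

```python
def compare_clusterings(first, second):
    """Compare two {person: cluster} mappings whose cluster labels are aligned.

    Returns counts of: people only in first, people only in second,
    people in both, and people in both whose cluster differs.
    """
    only_first = first.keys() - second.keys()
    only_second = second.keys() - first.keys()
    shared = first.keys() & second.keys()
    disagree = {p for p in shared if first[p] != second[p]}
    return len(only_first), len(only_second), len(shared), len(disagree)

# Invented toy data, not my real matches.
myheritage = {"p1": 1, "p2": 1, "p3": 2, "p4": 3}
ftdna = {"p2": 1, "p3": 3, "p4": 3, "p5": 2}
print(compare_clusterings(myheritage, ftdna))

# Consistency check on the real counts quoted above.
assert 100 - 9 == 94 - 3 == 91
assert 91 - 11 == 80
```

The two assertions at the end just confirm that the numbers in the paragraph above add up: 91 people appear in both clusterings and 80 of them land in the same group.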

Determining Common Ancestors

Unlike my Ancestry DNA matches, where I know my relationship to about 10 of my matches, at MyHeritage other than my uncle, I don’t know how I’m related to any of my 9,314 matches or to my uncle’s 10,834 matches. So at MyHeritage, I cannot use known tested relatives to determine common ancestors for some of the clusters.



Comparing My Clusters with my Uncle’s Clusters

At MyHeritage, I can look at the Shared DNA Matches between myself and my uncle. My uncle is my father’s full brother, and I share 1,994 cM on 52 segments with him. So our shared matches should mostly be on my paternal side. The matches I have that I don’t share with my uncle should mostly be on my maternal side. This is one comparison that I cannot do at Ancestry, since I only tested my uncle at FTDNA and Ancestry does not accept uploads of raw data from other companies.

The Shared DNA Match list with my uncle shows only 3,114 Matches. To my surprise, that’s only 33% of the 9,314 matches I have.  Since my uncle represents my full paternal side, you’d expect that it would be 50%. I’m guessing that either more people on my maternal side tested at MyHeritage than my paternal side, or maybe endogamy is allowing me to match some people by combining my maternal and paternal totals – and my uncle simply doesn’t meet the criteria to match them. By comparison, at FTDNA, my uncle and I match 10,182 people (57%) in common out of my 17,881 matches and my uncle’s 18,680 matches.
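The percentages above are simple ratios of shared matches to my own total matches. A quick Python check, where the helper name `percent` is just for the example:

```python
def percent(part, whole):
    """Return part as a whole-number percentage of whole."""
    return round(100 * part / whole)

myheritage_shared = percent(3114, 9314)   # shared with my uncle at MyHeritage
ftdna_shared = percent(10182, 17881)      # shared with my uncle at FTDNA
print(myheritage_shared, ftdna_shared)
```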

When I go through my Shared DNA Matches that I have with my uncle, I find just 6 matches among the 100 people in my clusters, and they are in 5 different clusters. Not only that, two of those are in my uncle’s excluded singletons, so that leaves just 4 people in common between my clusters and my uncle’s clusters.

The low number of people in common prevents me from combining my uncle’s clusters with mine to try to identify whether my clusters are on my paternal or maternal side. I’m very surprised that this happens, but it is likely because my uncle’s top 100 matches have little overlap with my top 100 matches.

So I won’t be able to directly compare my uncle’s clusters to mine by person as I had hoped. Nonetheless, let’s go forward anyway and look at my uncle’s clusters:

image

This also does not look much different than my clustering, but the people that make them up are different. We only have 4 of the people of the 100 shown here in common between us.

Clustering is potentially very useful if you know your relationship to some of the matches. Unfortunately for me at MyHeritage, I’ll have to wait until I determine my relationships with some of my DNA matches before I’ll be able to make full use of MyHeritage’s new clustering information.

DNA meets Trees at AncestryDNA and MyHeritage DNA - Wed, 27 Feb 2019

Today’s the first day of #RootsTech. This is the day that many of the genealogy companies announce new features on their site.

So it is not ironic at all that today, both AncestryDNA and MyHeritage DNA announced a new feature that matches up your tree with the trees of your DNA matches and shows you the results.

DNA matching is a tool to assist your genealogy research. But up to now, you’ve had to do most of the tree inspection on your own. Finally, we’ve got not one, but two new automated systems to save lots of time.

Ancestry DNA

Ancestry’s official announcement may come tomorrow (Feb 28) at RootsTech in Crista Cowan’s talk at 1:30 MST titled “What You Don’t Know about Ancestry”. This will be live streamed tomorrow, so if you read this post in time, you can listen to Crista live.

To get the new feature, you currently have to go into your Ancestry account and from the menu, select “Extras” and under that “Ancestry Lab”. Then on the Ancestry Lab page, you should enable their Beta features.

image

After you opt in, you then can go to your DNA Matches page, and in the “Filtered by” drop down, you’ll see “Common ancestors”.  Select that, and hopefully you’ll get a few matches. I got the 4 you see below.

image

If you click on one of the people’s names, you get the comparison page. There is now a Common Ancestors box. For people who are not in the above list, you’ll get this box:

image

But for people who are in this list, the box will contain something quite exciting:

image

The people shown are the common ancestors of myself and my match. Even more exciting is if you now click on the “View relationship” link for either ancestor, you get:

image

And if you expand those dropdowns that are hiding two generations, it gives:

image

This match, as well as my other three, are all correct matches. I previously knew my connection to two of these people. The other two were people correctly placed into my tree that I did not know had tested.

This is really great! Finally, the companies are doing something intelligent to match you with your DNA match via your combined trees. Bravo!

I posted a survey on the Facebook group: Genetic Genealogy Tips, to find out how many matches people were getting. Although I only got 4 matches, almost half the people reported between 100 and 999 matches! That’s a lot of connections many people will now be able to make that previously required laborious manual tree inspection.

MyHeritage DNA

MyHeritage at almost exactly the same time implemented almost exactly the same feature. They have already announced their matching system which they call: The Theory of Family Relativity.

To access your matches, go to your MyHeritage DNA Matches page, select “Filters” and then select: “Has Theory of Family Relativity”.

image

Unfortunately for me, what I get is no results, so I won’t be able to give you a personal illustration of what it looks like.

image

But I would expect there would likely be a match tree similar to what AncestryDNA gives. This is the illustration from their announcement:

image

I posted the same survey on Genetic Genealogy Tips asking how many MyHeritage DNA matches everyone had. People generally had fewer matches than at Ancestry DNA, but about 65% did have matches and 10% marked that they had between 20 and 99 with a few having more than 100.




Update: Mar 2, 2019:  It seems I took a shortcut to get to the Ancestry Connected Trees. The “Common Ancestors” feature apparently existed before. But I didn’t have any so I didn’t know about it.

The new Ancestry feature is actually called ThruLines and you can get to it from Your DNA Results Summary page:

 image

Clicking through on the “Explore ThruLines” button takes you to a page showing all your direct ancestors:

image

Any ancestors through which the ThruLines algorithm finds a potential DNA relative will be marked with a “Potential Ancestor” indicator.  Three of my ancestors have this:

image

Clicking on their tiles will then take you to the same 4 connections that I found by clicking on “Common Ancestors” as I first described in the post. Herz Tzvi and Dwora both take me to the same two DNA relatives (since they were husband/wife) and Manascu takes me to the other two DNA relatives. It is titled “ThruLines” rather than Common Ancestors and can show the connection to more than one DNA relative at a time. But essentially it is the same information, just accessed by ancestor rather than by DNA match.

image