False Small Segment Matches at GEDmatch - Sun, 2 Jul 2017
I just happened to come across a post by Patricia Greber which showed the back of her T-shirt that she made to wears at genealogy conferences. Her shirt shows her family names and GEDmatch kit number and asks: “Are we related?”
Patricia obviously wants other genealogists to see this, so I checked with her and she said it was okay to display the picture:
I expect that we are not related. As I am effectively 100% Ashkenazi Jewish, I don’t expect any commonality with Patricia at all, at least as far as Identical by Descent (IBD) segments go.
Therefore, Patricia would be an excellent test to see how many false segments two non-related people may have. By false, I mean matches by chance that are not IBD.
So I went to GEDmatch, and I did a one-to-one autosomal comparison of my kit with Patricia’s kit. Using the default matching values of a minimum of 500 SNPs and minimum segment cM of 7.0, sure enough, we had no segments matching.
But what if I lower the settings to as low as GEDmatch allows? You can specify down to 25 SNPs and 1 cM, and that also requires that you specify 25 as the minimum mismatch-bunching limit (whatever that is).
Doing so results in segment matches. And not just a few segment matches, but LOTS of segment matches. What I got was 788 matching segments! Below are the first 10 and the last 10 matches. I left out the 768 matches between them to spare you the monotony.
The largest segment match was only 5.9 cM. That may be a surprise because most people say you can expect randomly matching segments up to 15 cM.
But what is alarming is that the total cM of those 788 matching segment is 1,269.1 cM. Just blindly using that number, one might conclude that we are first cousins. That would only be true if these matches are mostly IBD, which they are not. So never assume a relationship based solely on number of matches or total matching cM since the total can include random matches.
Segments with more cM are less likely to randomly match just as segments with more SNPs are less likely to randomly match. If we plot those 788 matches by SNP across cM, we get this interesting graph:
You can see 2 matches between 5 and 6 cM and 6 matches between 4 and 5 cM. Those 8 matches total 37.0 cM. Most testing companies have a threshold of a minimum total cM and/or largest cM before they’d consider two people to match each other. They’ve run comparisons like this on thousands of related and unrelated people to determine what their matching thresholds should be. FamilyTreeDNA will consider two people a match if there is one segment 9 cM or more, or if one segment is 7.69 cM and there are 20 shared cM. So Patricia and I should not be said to be a DNA match at FamilyTreeDNA.
Since both of our kits at GEDmatch start with “T” which indicates we tested at FamilyTreeDNA, I can go there and see that in fact, Patricia does not show up in my match list of 11,386 people.
The point here is that unrelated people do match and match a lot simply because small segments can match randomly. So remember this rule:
A lot of small matches does not mean two people are related.
In Double Match Triangulator, I recommend people retain the small matches down to 1 cM. Many people question me on this and ask that I add a threshold in DMT to hide all those small matches. But here we are dealing with something a bit different. We are dealing with double matches between two or more people who are known to be related. This changes the game somewhat. This greatly increases the likelihood that small matches triangulate and are IBD. Many still will be random, but you should not throw them away because you’ll be throwing away the baby with the bathwater. My article about Non-Matches by cM showed that more than 20% of small matches (of people considered to be related by FamilyTreeDNA) will also match a parent and are therefore have a better possibly of being IBD and should not simply be thrown away. Their starting and ending boundary locations could be useful to help identify ancestral segments and there’s nothing stopping the segment from having been a large IBD segment that just happened to get truncated on its way down to your cousin. In other words:
When two people are related, a small match does not mean it’s not IBD.
This was an interesting exercise, and is a warning that you should not include small segments unless you’ve determined in advance that two people are related and you’ve used some technique (phasing, parental filtering, triangulation or double matching) to eliminate many of the false segments.