
Louis Kessler’s Behold Blog

False Small Segment Matches at GEDmatch - Sun, 2 Jul 2017

I just happened to come across a post by Patricia Greber which showed the back of a T-shirt that she made to wear at genealogy conferences. Her shirt shows her family names and GEDmatch kit number and asks: “Are we related?”

Patricia obviously wants other genealogists to see this, so I checked with her and she said it was okay to display the picture:

I expect that we are not related. As I am effectively 100% Ashkenazi Jewish, I don’t expect any commonality with Patricia at all, at least as far as Identical by Descent (IBD) segments go.

Therefore, Patricia would be an excellent test to see how many false segments two non-related people may have. By false, I mean matches by chance that are not IBD.

So I went to GEDmatch, and I did a one-to-one autosomal comparison of my kit with Patricia’s kit. Using the default matching values of a minimum of 500 SNPs and minimum segment cM of 7.0, sure enough, we had no segments matching.

But what if I lower the settings to as low as GEDmatch allows? You can specify down to 25 SNPs and 1 cM, and that also requires that you specify 25 as the minimum mismatch-bunching limit (whatever that is).

Doing so results in segment matches. And not just a few segment matches, but LOTS of segment matches. What I got was 788 matching segments! Below are the first 10 and the last 10 matches. I left out the 768 matches between them to spare you the monotony.

image

The largest segment match was only 5.9 cM. That may be a surprise because most people say you can expect randomly matching segments up to 15 cM.

But what is alarming is that the total cM of those 788 matching segments is 1,269.1 cM. Just blindly using that number, one might conclude that we are first cousins. That would only be true if these matches are mostly IBD, which they are not. So never assume a relationship based solely on the number of matches or the total matching cM, since the total can include random matches.

Segments with more cM are less likely to randomly match just as segments with more SNPs are less likely to randomly match. If we plot those 788 matches by SNP across cM, we get this interesting graph:

image

You can see 2 matches between 5 and 6 cM and 6 matches between 4 and 5 cM.  Those 8 matches total 37.0 cM.  Most testing companies have a threshold of a minimum total cM and/or largest cM before they’d consider two people to match each other. They’ve run comparisons like this on thousands of related and unrelated people to determine what their matching thresholds should be. FamilyTreeDNA will consider two people a match if there is one segment of 9 cM or more, or if one segment is at least 7.69 cM and there are at least 20 shared cM in total. So Patricia and I should not be said to be a DNA match at FamilyTreeDNA.
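For readers who like to see a rule written out, here is a minimal Python sketch of the two-part threshold just described. It is only an illustration and not FamilyTreeDNA's actual code: the function name is hypothetical, and it assumes that "shared cM" means the simple sum of all reported segments.

```python
from typing import Iterable


def is_ftdna_style_match(segment_cms: Iterable[float]) -> bool:
    """Apply the two-part threshold described above to segment sizes in cM.

    Assumption: "shared cM" is the plain sum of every segment passed in.
    """
    segments = list(segment_cms)
    if not segments:
        return False
    largest = max(segments)
    total = sum(segments)
    # Rule 1: a single segment of 9 cM or more is enough on its own.
    if largest >= 9.0:
        return True
    # Rule 2: a segment of at least 7.69 cM plus at least 20 cM shared in total.
    return largest >= 7.69 and total >= 20.0


# Made-up segment sizes: many small segments with a large total,
# but no single segment reaching 7.69 cM.
many_small_segments = [5.9, 5.2, 4.8, 4.1] + [1.5] * 700
print(is_ftdna_style_match(many_small_segments))  # False
```

The point of the sketch is that the total alone never decides the outcome. However large the sum of small segments gets, the rule still requires one reasonably long segment.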

Since both of our kits at GEDmatch start with “T” which indicates we tested at FamilyTreeDNA, I can go there and see that in fact, Patricia does not show up in my match list of 11,386 people.

The point here is that unrelated people do match and match a lot simply because small segments can match randomly. So remember this rule:

A lot of small matches does not mean two people are related.

In Double Match Triangulator, I recommend people retain the small matches down to 1 cM. Many people question me on this and ask that I add a threshold in DMT to hide all those small matches. But here we are dealing with something a bit different. We are dealing with double matches between two or more people who are known to be related. This changes the game somewhat. This greatly increases the likelihood that small matches triangulate and are IBD. Many still will be random, but you should not throw them away because you’ll be throwing away the baby with the bathwater. My article about Non-Matches by cM showed that more than 20% of small matches (of people considered to be related by FamilyTreeDNA) will also match a parent and therefore have a better possibility of being IBD and should not simply be thrown away. Their starting and ending boundary locations could be useful to help identify ancestral segments and there’s nothing stopping the segment from having been a large IBD segment that just happened to get truncated on its way down to your cousin. In other words:

When two people are related, a small match does not mean it’s not IBD.

This was an interesting exercise, and is a warning that you should not include small segments unless you’ve determined in advance that two people are related and you’ve used some technique (phasing, parental filtering, triangulation or double matching) to eliminate many of the false segments.
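To make the idea of parental filtering a little more concrete, here is a rough Python sketch. It is a simplification under stated assumptions and is not how Double Match Triangulator or any testing company works: the `Segment` type and both function names are hypothetical, and the only test applied is whether a segment you share with a match overlaps a segment that one of your parents shares with the same match. An overlap makes IBD more plausible but does not prove it.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share at least one position on the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def parental_filter(child_segments: List[Segment],
                    parent_segments: List[Segment]) -> List[Segment]:
    """Keep only the child's segments that overlap a segment the parent
    shares with the same match."""
    return [s for s in child_segments
            if any(overlaps(s, p) for p in parent_segments)]


# Made-up positions, for illustration only.
child = [Segment("1", 1_000, 2_000), Segment("2", 500, 900)]
parent = [Segment("1", 1_500, 3_000)]
print(parental_filter(child, parent))
```

In this made-up example only the chromosome 1 segment survives, because the parent has no segment with the match on chromosome 2.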

Back From Vacation - Tue, 20 Jun 2017

I had a wonderful two week vacation with my family at Disney World in Florida. I don’t know if I’d call it relaxing, because it was non-stop go-go-go. Here’s one of the many amusing items from the Haunted Mansion for my genealogy friends:

image

So it’s back to try again to finish version 1.3 of Behold and to try to do it before I head back to Florida in a month’s time for the IAJGS conference at the Walt Disney World Swan Resort. If you’re planning to go, be sure to sign up for the 90-minute Computer Workshop I’ll be giving on Double Match Triangulator:

Monday, July 24, 8:15 AM - 9:45 AM
Session Code: 2314
Session Title: Computer Workshop: Using the Double Match Triangulator for Autosomal DNA Analysis
Speaker(s): Louis Kessler
Room: Swan 8
Type of Session: Computer Workshop

Until then, I’ll also be a bit distracted. While away on vacation, Sorin Goldenberg on the Jewish Genealogy in Romanian Moldavia Facebook group located over 40 original Romanian birth/marriage/death records that extend my father’s mother’s family back another 2 generations. As genealogists, you can understand my excitement. It’s like a rainbow appearing with a new discovery at its end, just as we saw one evening in Animal Kingdom:

image

I will be analyzing this material in detail prior to the Conference. I’m sure a couple of blog posts will result.

Getting Carried Away - Sun, 21 May 2017

I’ve noticed it’s been almost 2 months since my last blog post, and that’s too long. I kept delaying my posts with the hope and expectation that my next one would be announcing the release of Behold 1.3. 

However, the changes to Behold have been taking longer than I hoped. With Spring bringing beautiful weather and other Spring duties, there is less time during the day for programming than in the Winter. Programmers can sometimes turn into depressing people who hope for miserable weather and rainy days so they can get more work done.

Also, I might have been getting a little “carried away” with what I’m trying to get into this version of Behold. I do want Behold 1.3 to finish off everything I need/want in the Everything Report prior to adding GEDCOM export and then Behold’s own database and editing.

Below are the things I’m trying to sneak in:

Highlighted Birth/Maiden Names

I wanted birth/maiden names to be highlighted somehow.  And I wanted that highlighting everywhere. I decided on bolding the birth/maiden name.

This was trickier than it sounds because the person’s name is a hyperlink to that person in the report. Breaking up the styling of the name breaks the hyperlink into three parts. I had to figure a way to break the styling but leave a single hyperlink. It’s different in the Everything Report, in the Treeview, in the HTML export and in the RTF export.

image

image

This required a change to the Index of Names. Previously I was using bold text to show the earliest people in each line (those without parents attached). I needed another representation for this and decided on the asterisk (*) before the name. And then, while doing that, why not add the person’s birthplace to make it easier to identify people:

image

 

Section Header Information

I want the section headers to give some information about the numbers of people included as well as information about the amount of pedigree collapse.

image

 

Fact/Event Selection and Filtering

Behold has always allowed selection of which Tags you want displayed. On the Tags page of the Organize window, there was a box you could select or deselect if you wanted a certain tag included or excluded from the Report. Unfortunately, this never worked perfectly because tags could occur at different levels, i.e. within other tags, and this mechanism did not work for the Place Details or Source Details section. In other words, you couldn’t just get a listing of all your sources for, say, Census facts.

To fix this situation, there will now be checkboxes only beside the tags which are at Level 1 in INDI (individual) or FAM (family) records. Those will now be counted on the Tags page in their own “Facts” column.

image

This now allows you to display only the facts you want. For example, you can select just BIRT, MARR and DEAT tags if you want to just show the vital statistics for everyone and see only your vital statistics sources in the Source Details section. You could select just CENS for just the Census facts. You can select BURI to effectively give you a burial list in the Place Details section.
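The selection idea can be sketched in a few lines of Python. This is not Behold's code (Behold's internals are not shown here); it is a simplified illustration that assumes plain GEDCOM lines of the form "level [xref] tag [value]". The function names are hypothetical. Counting looks only at Level 1 tags inside INDI and FAM records, and filtering drops an unselected Level 1 tag together with everything nested beneath it.

```python
from collections import Counter


def split_line(line):
    """Return (level, tag) for one GEDCOM line."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    # Record lines look like "0 @I1@ INDI": the xref id comes before the tag.
    if level == 0 and len(parts) > 2 and parts[1].startswith("@"):
        return level, parts[2].split(" ")[0]
    return level, parts[1]


def count_facts(lines):
    """Count Level 1 tags that occur inside INDI or FAM records."""
    counts = Counter()
    record = None
    for line in lines:
        if not line.strip():
            continue
        level, tag = split_line(line)
        if level == 0:
            record = tag
        elif level == 1 and record in ("INDI", "FAM"):
            counts[tag] += 1
    return counts


def filter_facts(lines, selected):
    """Drop unselected Level 1 tags in INDI/FAM records, with their sub-lines."""
    kept, record, skipping = [], None, False
    for line in lines:
        if not line.strip():
            continue
        level, tag = split_line(line)
        if level == 0:
            record, skipping = tag, False
        elif level == 1:
            skipping = record in ("INDI", "FAM") and tag not in selected
        if not skipping:
            kept.append(line)
    return kept


# A tiny made-up GEDCOM fragment.
SAMPLE = """0 HEAD
1 SOUR EXAMPLE
0 @I1@ INDI
1 NAME Jane /Doe/
1 BIRT
2 DATE 1 JAN 1900
1 CENS
2 DATE 1911
0 @F1@ FAM
1 MARR
2 DATE 1920
0 TRLR""".splitlines()

print(count_facts(SAMPLE))
print("\n".join(filter_facts(SAMPLE, {"NAME", "BIRT", "MARR"})))
```

With NAME, BIRT and MARR selected, the CENS fact and its date disappear from the output while the header and the other facts are left alone.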

To make selection easier, at the right of the Tags page, I’ve added Def (Default) and None checkboxes. By checking “None”, you can uncheck everything and just add the few facts you want to show. By checking “Def”, you can show all the most important facts again and check or uncheck any others as desired.

These can then be saved into a Behold file with the “merge into” button and retrieved again with the “Merge from” button. So you can set up Behold files for “Vital Stats”, “Census Only”, and “Burials” and quickly switch between them.

image

 

DNA Features

I want/need some DNA features that I don’t see readily available in other programs. Behold is going to tell you all the ways each person is related to your starting people, their probability of sharing autosomal DNA, their expected shared autosomal DNA if they share, the same for the X chromosome and whether they share Y-DNA or mt-DNA. For all furthest-back ancestors of the starting people, their Y-candidates or mt-candidates would be listed. Those are the people alive today you can test to get that ancestor’s line. And inversely, for every living person, all the furthest-back ancestors who they would be Y or mt-candidates for would be listed. I don’t have a final mockup of this yet, but I’m thinking of something like this for every person:

image
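As an illustration of what a Y-candidate or mt-candidate is, here is a small Python sketch. It is not Behold's implementation: the `Person` structure and the function names are assumptions made for the example. A Y-candidate for an ancestor is taken to be a living male who reaches that ancestor by following fathers only, and an mt-candidate is a living person of either sex who reaches the ancestor by following mothers only. The sketch also assumes every referenced parent is present in the dictionary and that the tree contains no loops.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Person:
    sex: str                      # "M" or "F"
    father: Optional[str] = None  # id of the father, if known
    mother: Optional[str] = None  # id of the mother, if known
    living: bool = False


def direct_line(people: Dict[str, Person], person_id: str,
                parent_field: str) -> Iterator[str]:
    """Yield the ancestors reached by following only fathers or only mothers."""
    current = getattr(people[person_id], parent_field)
    while current is not None:
        yield current
        current = getattr(people[current], parent_field)


def y_candidates(people: Dict[str, Person], ancestor_id: str) -> List[str]:
    """Living males descended from the ancestor through an all-male line."""
    return sorted(pid for pid, p in people.items()
                  if p.living and p.sex == "M"
                  and ancestor_id in direct_line(people, pid, "father"))


def mt_candidates(people: Dict[str, Person], ancestor_id: str) -> List[str]:
    """Living people descended from the ancestor through an all-female line."""
    return sorted(pid for pid, p in people.items()
                  if p.living
                  and ancestor_id in direct_line(people, pid, "mother"))


# A made-up three-generation family.
people = {
    "grandfather": Person("M"),
    "grandmother": Person("F"),
    "father": Person("M", father="grandfather", mother="grandmother"),
    "aunt": Person("F", father="grandfather", mother="grandmother", living=True),
    "me": Person("M", father="father", living=True),
    "cousin": Person("F", mother="aunt", living=True),
}
print(y_candidates(people, "grandfather"))   # ['me']
print(mt_candidates(people, "grandmother"))  # ['aunt', 'cousin']
```

The inverse listing mentioned above (for each living person, the ancestors they could be tested for) falls out of the same `direct_line` walk, taken from the living person upward.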

 

Cheat Sheet

Well, that’s what I call it. It’s something I use in my research all the time. My first scan of any family information (e.g., an archive, book index, online site) would be to look for matches from these two alphabetically ordered lists:

  1. All ancestral surnames and the furthest-back ancestor of each one.
  2. All ancestral birth places and the furthest-back ancestors of each one.

They will be optionally shown just after the Table of Contents. I’m still finalizing how they’ll look and what they’ll contain.

 

All of this is almost ready. Lots of little details to finish, but I thought it important to post my progress here and now and not let you think I’ve vanished from the face of Behold development.