Does WATO work well with Endogamous populations? - Sun, 25 Oct 2020
I’ve been quiet lately because I’ve been enjoying doing some research with my wife’s cousin Terry Lasky on one branch of their common families. Terry has got several dozen of his relatives on that side of the family to do DNA tests.
One aspect of what we are doing led to Jennifer Mendelsohn suggesting to me that we try WATO – the What Are the Odds tool built by Leah LaPerle Larkin and Jonny Perl.
I was concerned that the endogamy in our matches might add too much to the shared cM of two people. I was also worried that the shared cM values Family Tree DNA reports, which are higher than Ancestry DNA’s numbers, would cause additional problems.
If WATO would not work for our known relationships, then we should not use it for our unknown relationships, meaning a test is required first.
Family Tree DNA data for a Known Relationship
So the first step is to test WATO on a relationship that includes endogamy, using a person who has just one known pair of common ancestors with the other people. That way there are no other close multiple relationships that we know of, only the distant endogamy.
I took one of our starting ancestors, Moshe and Wife 3, who had three children. We have 14 DNA testers descended from those children who are 2C, 2C1R and 3C to each other. I took the 14th and made him the hypothesis and I created this with the WATO tool:
So I created 11 hypotheses. 1, 2 and 3 are descendants of a child of Grace. 4, 5 and 6 are descendants of a child of Grace who is a half-sibling of Grace’s other children. 7, 8 and 9 are descendants of a full sibling of Grace, and 10 and 11 are descendants of a half sibling of Grace.
Each row of hypotheses is a half generation further away than the previous one. And interestingly enough, the possible hypotheses marked in green move up a generation to compensate for this difference.
WATO gives you the calculated probabilities of each hypothesis:
So this is saying that Hypothesis 5, that this person is a child of a half-sibling of Grace’s other children, is the most likely and is 52 times more likely than Hypothesis 2. Three others are possible and the rest are not statistically possible.
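To make the "times more likely" idea concrete, here is a minimal Python sketch. It assumes that a hypothesis is scored by multiplying together the probability of each match's implied relationship, and that two hypotheses are compared by the ratio of those products. That is my simplified reading, not code taken from WATO, and the function names and probabilities below are hypothetical.

```python
from math import prod


def combined_probability(per_match_probs):
    """Multiply the per-match probabilities for one hypothesis.

    Assumption: matches are treated as independent, so the probabilities
    simply multiply. A single zero makes the whole hypothesis impossible.
    """
    return prod(per_match_probs)


def times_more_likely(probs_a, probs_b):
    """Return how many times more likely hypothesis A is than hypothesis B."""
    score_b = combined_probability(probs_b)
    if score_b == 0:
        raise ValueError("hypothesis B is impossible, so the ratio is undefined")
    return combined_probability(probs_a) / score_b


# Hypothetical per-match probabilities for two competing hypotheses.
hypothesis_a = [0.50, 0.40, 0.30]
hypothesis_b = [0.25, 0.20, 0.30]
print(f"A is about {times_more_likely(hypothesis_a, hypothesis_b):.1f} times more likely than B")
```

Under this assumption, a hypothesis is ruled out as soon as one match has a shared cM value that is impossible for the relationship it implies, which is one way to read "not statistically possible".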
I love the detailed score calculation that Leah and Jonny put together. It gives you everything you’d ever want to know about each relationship in each hypothesis. And you can see how the probabilities were arrived at:
Now can you guess which Hypothesis is the correct one? (spoiler below)
Family Tree DNA data stripping out small < 7 cM Matches
I had thought that WATO was based on the numbers from Blaine Bettinger’s Shared cM project. As I was calculating and writing the above, Jonny Perl responded to one of my posts on Facebook and said:
“The probabilities are actually separate from the shared cM project. In WATO v1 they’re from Ancestry’s white paper on matching and in v2 they are extrapolated from the probabilities AncestryDNA displays in the popup when you click on the cM amount.”
So I asked Jonny if it might be better to use Ancestry shared cM with WATO than to use Family Tree DNA data with it. He said yes, and pointed me to his Individual Match Filter tool (IMF) to strip Family Tree DNA matches back to a certain threshold (default is 7 cM).
Well Terry had done most of this work already for me and had many of the Family Tree DNA shared cM values already stripped back to only include 7 cM or larger values. I’m sure Terry would have liked to have known about Jonny’s tool as it would have saved him a lot of time.
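The idea of stripping a match back to a threshold can be sketched in a few lines of Python. This is only an illustration of the concept, not how Jonny’s IMF tool is implemented. The function name and the segment lengths are made up, and I am assuming a segment of exactly 7 cM is kept.

```python
def filtered_total_cm(segments_cm, threshold_cm=7.0):
    """Sum only the segments at or above the threshold (in cM)."""
    return sum(cm for cm in segments_cm if cm >= threshold_cm)


# Hypothetical segment lengths (cM) for one match; not real data.
segments = [42.5, 18.0, 9.3, 6.1, 4.8, 3.2, 2.7]

unfiltered = sum(segments)
filtered = filtered_total_cm(segments)
print(f"unfiltered: {unfiltered:.1f} cM, filtered: {filtered:.1f} cM")
```

The small segments dropped here are the ones most likely to come from distant endogamy rather than from the recent common ancestors.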
I plotted Terry’s filtered numbers versus the non-filtered and got this relationship:
Notice this is a pretty strong relationship, and you can see that the trend line gives a pretty good estimate of what the filtered Family Tree DNA shared cM should be. The equation is basically saying that subtracting 50 cM from your unfiltered value will give you a decent filtered value. It should work okay for values greater than 100 cM, but obviously won’t be as good for smaller values.
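As a rough sketch of that rule of thumb (not the fitted trend line itself, whose exact coefficients appear only in the chart), here is the subtract-50 approximation in Python. The function name, the flat 50 cM offset and the 100 cM cutoff are assumptions taken from the description above.

```python
def estimate_filtered_cm(unfiltered_cm, offset_cm=50.0, min_reliable_cm=100.0):
    """Approximate a filtered FTDNA total from an unfiltered one.

    Returns (estimate, reliable). The estimate never goes below zero, and
    reliable is False when the input is at or below min_reliable_cm.
    """
    estimate = max(unfiltered_cm - offset_cm, 0.0)
    reliable = unfiltered_cm > min_reliable_cm
    return estimate, reliable


# Hypothetical unfiltered totals, chosen only to show both cases.
for total in (320.0, 150.0, 60.0):
    estimate, reliable = estimate_filtered_cm(total)
    note = "" if reliable else " (below 100 cM, treat with caution)"
    print(f"{total:.0f} cM unfiltered -> about {estimate:.0f} cM filtered{note}")
```

Filtering the actual segments is still the better route. An estimate like this is only a stopgap for when segment data is not at hand.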
Now I’ll use the filtered Family Tree DNA values in WATO instead of the unfiltered and we’ll see what happens:
This gives 5 feasible hypotheses, with Hypothesis 2 coming out on top, being 8 times more likely than Hypothesis 5.
Ancestry DNA data for the same Known Relationship
Jonny’s comment also prompted me to try our Ancestry DNA matches. 11 of our 14 people above had originally tested at Ancestry DNA and those tests were later uploaded to FTDNA, so we still have 10 people we can compare with our 11th.
Putting in the Ancestry DNA shared cM values, we get this:
The Ancestry cM values we put in were actually not too different from the filtered FTDNA values. In fact, the biggest difference between them was 35 cM. The conclusion is the same, with Hypothesis 2 being ahead of Hypothesis 5, but only being about 2 times more likely.
The Answer and Some Observations
The correct hypothesis is Hypothesis 2.
So it does seem that WATO is doing a good job and picked the correct Hypothesis with both the filtered FTDNA data and the Ancestry data.
Even though there are a few possible valid hypotheses, adding the known generational level of the tester and/or their age will help to invalidate some and make one more likely.
I was worried that the endogamy would be a factor, but it seems not to be. Only the unfiltered FTDNA data did not pick the correct answer as its number one hypothesis, and that is due to the many extra small segments (about 50 cM worth) included in those numbers. As a result, it preferred the hypothesis which was a half generation higher.
So this tells me that you needn’t worry about endogamy when using WATO. Just be sure to use either filtered FTDNA data (eliminating matches less than 7 cM) or use Ancestry DNA shared cM.