
Louis Kessler’s Behold Blog

My DNAweekly Interview by Ditsa Keren - Mon, 9 Nov 2020

Two weeks ago, I was interviewed on Zoom by Ditsa Keren for an article that was published on DNAweekly today.

image

DNAweekly publishes an interesting blog with a wide range of articles about consumer DNA tests, extending into their use by genealogists. They look for third-party software that might be of interest to DNA testers, and they found me and asked if I would be willing to be interviewed.

On their Blog page, they show some of their recent posts:

image


The website’s primary focus is comparing, reviewing and rating DNA tests, and they include some family tree sites in their reviews. I currently count 58 different services in their review list.

They classify companies into these categories:  Ancestry, Family Tree, Health & Wellness, Diet and Nutrition, STD, Pets.  They give each company a rating from 1 to 10, provide for each a User Score of 1 to 5 stars, and then link each to a complete review of that product.

The product reviews are quite detailed and seem to be done very objectively. The company obviously makes money from affiliate links when you click through and purchase a product, but in my opinion that does not seem to bias their reviews. Some reviews offer coupons towards the end. Finally, at the bottom of each review, users can write their own review of the product and give it a star rating. The author of each review is shown with a brief biography.

All in all, a very nice review site for DNA, family tree and health testing services.

Ancestry’s Timber Algorithm is Better Than You Think - Thu, 29 Oct 2020

Ancestry has recently made changes to its display of the amount of DNA you match with someone. The amount is shown in cM (centimorgans). Most DNA testers using their DNA for genealogy purposes know what cM are and what they represent.

image

Your DNA match list shows the Shared DNA you have with each of your matches.

The change Ancestry made that I’d like to talk about is the addition of “Unweighted shared DNA”. When you click on the “Shared DNA” link, you’ll be shown information containing this unweighted segment value:

image

Here you’ll see a “Shared DNA” value of 91 cM and an “Unweighted shared DNA” value, also of 91 cM. When the shared DNA value is 90 cM or more, the unweighted value is always the same.

But when the shared DNA value is less than 90 cM, then the unweighted value can be more, and usually is.  The unweighted value can be as high as 89 cM.

image

Ancestry uses what they call their Timber algorithm to filter out pieces of DNA that it figures should not be considered when deciding if two people are related.

A lot of people, including myself, have been critical of Timber, believing it removes segments that it shouldn’t, and they were very happy to see the new information showing the pre-Timber amount. You can’t easily get this amount for all your matches, though. You have to click through each match one by one to get that match’s unweighted value. You cannot see them all on your DNA Matches page like you can the post-Timber values.



Comparing Average Shared Values

The research work I’m currently doing on one branch of my wife’s family with her cousin Terry Lasky includes some lines where we do not know if the ancestors are brothers, half-brothers or first cousins. We have DNA-tested descendants of both ancestors that we can compare. Those who are 3 generations down would be 3rd cousins if the ancestors are brothers, half 3rd cousins if they are half-brothers, and 4th cousins if the ancestors are 1st cousins.

All of our family lines include endogamy. Terry and I have been worried about the effect of endogamy on our shared cM values, and about the effect that the Ancestry Timber algorithm would have on those values.

Terry has 32 DNA testers from this branch who tested at Ancestry. Among them, he had 138 pairs where he knew for sure how they were related and did not know of any second way they might be related, other than through endogamy.

Parent/child are 1 generation apart. At Ancestry DNA, parent/child pairs match with 3476 cM. Siblings are two generations apart (up to the parent, down to the other child). Their average match at Ancestry DNA should be 3/4 of a parent/child match, or 2607 cM. An uncle/aunt and nephew/niece are 3 generations apart, and in theory their average match at Ancestry DNA should be half of a parent/child match, or 1738 cM. From there on, every extra generation halves the cM matching. What we are doing is counting meioses, the number of times the DNA recombines on its way from one person to the other. Meiosis 6, for example, can be 2nd cousins, 1st cousins twice removed, half 1st cousins once removed, great-great-great-grandparent/grandchild and many other relationships. But they all should have the same theoretical average cM at Ancestry DNA, and that should be 217 cM.
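The halving rule above fits in a few lines of code. This is a minimal sketch, assuming the convention described here: parent/child = 3476 cM, siblings = 3/4 of that, and each further meiosis level halves the average. `theoretical_cm` and `HYPOTHESES` are illustrative names, not anything from Ancestry. The hypothesis mapping also assumes, as the meiosis 6 example does for half 1st cousins once removed, that a half relationship counts as one extra level.

```python
PARENT_CHILD_CM = 3476.0  # AncestryDNA parent/child figure quoted above


def theoretical_cm(meiosis: int) -> float:
    """Theoretical average shared cM at AncestryDNA for a meiosis level."""
    if meiosis < 1:
        raise ValueError("meiosis level must be 1 or more")
    if meiosis == 1:
        return PARENT_CHILD_CM          # parent/child
    if meiosis == 2:
        return PARENT_CHILD_CM * 0.75   # full siblings
    # Level 3 (uncle/niece) is half of parent/child; each extra level halves again.
    return PARENT_CHILD_CM / 2 ** (meiosis - 2)


# Hypothetical mapping for descendants 3 generations below the two ancestors,
# counting a half relationship as one extra level.
HYPOTHESES = {
    "Brothers (3rd cousins)": 8,
    "Half-brothers (half 3rd cousins)": 9,
    "1st cousins (4th cousins)": 10,
}

if __name__ == "__main__":
    for level in range(1, 11):
        print(f"meiosis {level:2d}: {theoretical_cm(level):7.1f} cM")
    for label, level in HYPOTHESES.items():
        print(f"{label}: about {theoretical_cm(level):.0f} cM")
```

Running it prints the averages for meiosis 1 to 10, then about 54, 27 and 14 cM as the expected averages under the three ancestral hypotheses.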

So what I did was average Terry’s known pairs by meiosis level and compare those averages to what the theoretical average cM should be at Ancestry. The result was this table:

image

This very much surprised me when I first saw it. I had thought that Terry’s Ancestry numbers would be considerably higher than the theoretical averages due to endogamy. But Terry’s pairs averaged only 5 cM higher than the theoretical values. That is extremely close.
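The averaging step can be sketched as follows. This is a minimal sketch, assuming each known pair is recorded as a (meiosis level, shared cM) tuple. The sample pairs are made up for illustration and are not Terry's data, and `summarize_by_meiosis` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

PARENT_CHILD_CM = 3476.0


def theoretical_cm(meiosis):
    """Parent/child 3476 cM, siblings 3/4 of that, then halve per extra level."""
    if meiosis == 1:
        return PARENT_CHILD_CM
    if meiosis == 2:
        return PARENT_CHILD_CM * 0.75
    return PARENT_CHILD_CM / 2 ** (meiosis - 2)


def summarize_by_meiosis(pairs):
    """Group (meiosis, shared_cm) pairs and compare each group's average to theory.

    Returns {meiosis: (number_of_pairs, average_cm, average_minus_theoretical)}.
    """
    groups = defaultdict(list)
    for meiosis, shared_cm in pairs:
        groups[meiosis].append(shared_cm)
    summary = {}
    for meiosis in sorted(groups):
        avg = mean(groups[meiosis])
        summary[meiosis] = (len(groups[meiosis]), avg, avg - theoretical_cm(meiosis))
    return summary


if __name__ == "__main__":
    # Made-up pairs for illustration only, not Terry's data.
    sample = [(6, 230.0), (6, 210.0), (7, 120.0), (7, 95.0)]
    for level, (n, avg, diff) in summarize_by_meiosis(sample).items():
        print(f"meiosis {level}: n={n} average={avg:.1f} cM difference={diff:+.1f} cM")
```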

I scratched my head wondering why. These are the post-Timber values, which had some segments removed by Timber. I decided to separate the Timber-affected numbers from the unaffected ones, so I divided the above table into pairs sharing >= 90 cM and pairs sharing < 90 cM.

image

Again I was surprised. For the >= 90 cM pairs, meiosis levels 7 and 8 have average differences of +29 and +76 cM. For the < 90 cM pairs, they have average differences of -70 and -26 cM.
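The split can be sketched the same way. This is a minimal sketch, assuming the 90 cM display cutoff described earlier (at or above it, the unweighted value equals the shared value) and pairs stored as (meiosis level, post-Timber shared cM) tuples. `split_differences` is a hypothetical helper, and the sample values are invented.

```python
from statistics import mean

PARENT_CHILD_CM = 3476.0
TIMBER_CUTOFF_CM = 90.0  # at or above this, unweighted equals shared (per the post)


def theoretical_cm(meiosis):
    """Parent/child 3476 cM, siblings 3/4 of that, then halve per extra level."""
    if meiosis == 1:
        return PARENT_CHILD_CM
    if meiosis == 2:
        return PARENT_CHILD_CM * 0.75
    return PARENT_CHILD_CM / 2 ** (meiosis - 2)


def split_differences(pairs, meiosis, cutoff=TIMBER_CUTOFF_CM):
    """Average (shared - theoretical) cM for one meiosis level, computed
    separately for pairs at/above the cutoff and pairs below it.

    Returns (above, below); a side with no pairs is None.
    """
    expected = theoretical_cm(meiosis)
    above = [cm - expected for m, cm in pairs if m == meiosis and cm >= cutoff]
    below = [cm - expected for m, cm in pairs if m == meiosis and cm < cutoff]
    return (mean(above) if above else None, mean(below) if below else None)


if __name__ == "__main__":
    sample = [(7, 140.0), (7, 120.0), (7, 60.0), (7, 40.0)]  # made-up values
    print(split_differences(sample, 7))
```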

It seems Ancestry optimized their 90 cM cutoff for Timber to get the averages in the meiosis levels to be close to the theoretical. What this seems to show is that it is not a good idea to separate out the two or to try to correct for their Timber algorithm.  Their numbers with Timber seem to be best.

Just to check, I averaged out the Ancestry unweighted values for Terry’s pairs:

image

Meiosis 8 unweighted is okay, but meiosis 7 has an average difference of -51. Compare that to an average difference of 7 in the original values with Timber. So I wouldn’t want to use these unweighted values. Using Ancestry’s values with Timber seems best.

It seems that the Ancestry genetic scientists knew what they were doing with Timber. They seem to have optimized it so that each meiosis level will average out very close to its theoretical value.



Blaine’s Shared cM Version 4.0

Well that was really good to know. Now I wanted to know how much Blaine Bettinger’s Shared cM Project v4 varied from the Ancestry theoretical averages. Surely Blaine’s would be different. His numbers were based on submissions of people who got cM values not just from Ancestry, but also from 23andMe, Family Tree DNA, GEDmatch, MyHeritage and others. Not all companies report exactly the same way. Family Tree DNA includes small segments down to 1 cM and will usually report higher shared cMs for the same two people. 

So here was a second surprise:

image

Blaine’s values are actually very close to the Ancestry theoretical values for the closer relationships. Even meiosis 6 to 9 isn’t that far away. I attribute the slightly larger differences for the more distant relationships to some reported pairs being related an additional way, which adds to the shared amount. It isn’t much, just 12 to 21 cM.

Nonetheless, Blaine’s numbers match up well with the Ancestry theoretical values, and that’s good to know.



Conclusion

Ancestry did Timber for a reason. It seems to me that they may have calibrated Timber so that the average cM for a given relationship would be the same as the theoretical average. Even if they didn’t do that calibration on purpose, it sure worked out well.

My recommendation is to use the Timber-based numbers, especially when comparing to Blaine’s shared cM project.

Don’t worry about the new unweighted Shared DNA values, and stop complaining so much about Timber.

Using WATO for Unknown Ancestral Relationships - Mon, 26 Oct 2020

Big update Oct 27: There is a much easier way to do this than in my post below. Leah Larkin informed me that I can do all 3 scenarios at once like this:

image

So all three hypotheses can indeed be included at once.

And the results with WATO Version 2 come out as:

image

This shows that Hypothesis 1 (Brother) is 37 times more likely than Hypothesis 2 (Half-Brother), which in turn is 2481 times more likely than Hypothesis 3 (1st Cousin).
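Chained ratios multiply, so Brother comes out 37 × 2481 = 91,797 times more likely than 1st Cousin. The sketch below turns such a chain into shares that sum to 1. It assumes every hypothesis started out equally plausible; the post does not say how WATO weights hypotheses, and `relative_probabilities` is a hypothetical helper, not part of WATO.

```python
def relative_probabilities(ratios):
    """Turn chained 'times more likely' ratios into shares that sum to 1.

    ratios[i] is how many times more likely hypothesis i is than hypothesis i + 1.
    Assumes every hypothesis started out equally plausible.
    """
    weights = [1.0]  # weight of the last (least likely) hypothesis
    for ratio in reversed(ratios):
        weights.insert(0, weights[0] * ratio)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    labels = ["Brother", "Half-Brother", "1st Cousin"]
    for label, share in zip(labels, relative_probabilities([37, 2481])):
        print(f"{label}: {share:.6f}")
```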

Much simpler! Many thanks to Leah and Andrew Millard on the WATO Facebook group for letting me see the light. 

I’ll leave my post below to show my original thinking.



Original Post:

In yesterday’s post, I wanted to see if the What Are The Odds (WATO) tool at the DNA Painter site would work for endogamy, and I came out satisfied that it does, for either Ancestry DNA numbers or Family Tree DNA numbers, with the < 7 cM matches removed from the latter.

WATO is designed to help when you have a DNA match with someone and don’t know for sure how that person is related to you. You build your tree in the WATO tool and add the positions where you think your match might be. You set those positions to be Hypotheses.

Well, I’ve got a slightly different problem. We’ve got a bunch of DNA matches and I know where they fit in the tree. What I don’t know is how the people at the top of the tree are related.

Let me start with the tree that I used as an example yesterday:

image

So these are all the relevant descendants of Moshe. The DNA testers are shown shaded. Hypothesis 1 is a known tester whom we simply used as a hypothesis.

Now there happens to have been a man named Gedalia who has the same last name as Moshe and came from the same town in Ukraine. We know of a few of Gedalia’s descendants who DNA tested and they are matches to the descendants of Moshe. What we don’t know and want to figure out is the relationship between Moshe and Gedalia. Could they be brothers? Half-brothers? First cousins?


Are Moshe and Gedalia Brothers?

So what I’ll do is expand the tree. I’ll add Gedalia to the tree as a brother to Moshe. I’ll add his descendants and mark the one we will use in this example as the Hypothesis. Now I’ll enter the cM shared between this descendant of Gedalia and each of the testers under Moshe. I’ll use filtered Family Tree DNA numbers since those worked best yesterday:

image

This gives us a score of zero, saying this is not possible.

So let’s take a look at the score calculation:

image

It’s saying that Rob is way too high at 263 cM to be a 3rd cousin.

But wait a minute! That is saying that Rob is related more closely than 3rd cousin to our Hypothesis person, whom we’ll call Hyp. We know from the diagram above that, through Moshe and Gedalia, he cannot be closer than 3rd cousins. Since Rob’s cousin Sha and 1C1R Ala don’t have the same problem, they are okay. That must mean that Rob’s mother is related to Hyp, adding extra cMs to Rob and his sibling And. In fact, And is higher than all the rest at 145 cM, but not high enough to make being a 3rd cousin to Hyp an impossibility.

Since Rob and And are related another way to Hyp, what I’ll do is remove their shared DNA amounts from being included in the WATO calculations and run it again:

image

That’s better and now the Hypothesis shows up as possible. Here’s the score calculation:

image

It’s the same as the above for the listed people, except that the Combined odds ratio is now 1.00.


Are Moshe and Gedalia Half-Brothers?

Let’s now do the same thing and just change Moshe and Gedalia to be half-brothers. WATO lets us do this and indicates they are halves with the coloured dotted lines to the left of their boxes:

image

All of the scores have changed, but this scenario is still a possibility:

image


Are Moshe and Gedalia First Cousins?

Well, let’s delete Gedalia’s side and add him back in as a first cousin:

image

Once again, this is said to be possible. Here are the scores:

image


So Which Is More Likely? Brother? Half? Cousin?

WATO has a wonderful mechanism for comparing different hypotheses. When you include more than one hypothesis in a scenario, it tells you which one is most likely and how many times more likely it is than the next. (See yesterday’s post for an example.)

But here, I have three different trees each with only one Hypothesis. WATO won’t compare them for you.

Well I think I see what WATO is doing.  I may be wrong, but it looks like it is multiplying the probabilities together and comparing the results between the scenarios. So I can easily do that myself in a spreadsheet:

image

I have highlighted the most likely scenario for each match. Half-Brother wins this comparison with 7, versus 1st Cousin with 3 and Brother with just 2.

The line at the bottom contains the product of the 9 values above it. The highest value is for Half-Brother, which is 9 times larger than the value for 1st Cousin, meaning Half-Brother is 9 times more likely. 1st Cousin is 3 times more likely than Brother. And Brother is 25 times less likely than Half-Brother.
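The spreadsheet comparison can be sketched in a few lines. This is a minimal sketch that assumes, as guessed above, that WATO's comparison amounts to multiplying per-match probabilities, which treats each match as independent evidence. The probabilities are invented for illustration and are not the values from the table. The `excluded` parameter mirrors dropping Rob and And, who are related to Hyp another way, and `tally_winners` mirrors the per-match highlighting. All function names are hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical per-match probabilities, one dict per scenario (illustrative only).
# match_d stands in for a match known to be related another way.
PROBABILITIES = {
    "Brother":      {"match_a": 0.02, "match_b": 0.05, "match_c": 0.01, "match_d": 0.0},
    "Half-Brother": {"match_a": 0.04, "match_b": 0.06, "match_c": 0.02, "match_d": 0.001},
    "1st Cousin":   {"match_a": 0.03, "match_b": 0.01, "match_c": 0.03, "match_d": 0.0},
}


def scenario_products(probabilities, excluded=()):
    """Multiply each scenario's per-match probabilities, skipping excluded matches."""
    return {
        scenario: prod(p for match, p in per_match.items() if match not in excluded)
        for scenario, per_match in probabilities.items()
    }


def times_less_likely(products):
    """How many times less likely each scenario is than the best one."""
    best = max(products.values())
    return {s: (best / v if v > 0 else float("inf")) for s, v in products.items()}


def tally_winners(probabilities, excluded=()):
    """Count how many matches each scenario 'wins' (gives the highest probability)."""
    matches = next(iter(probabilities.values())).keys()
    wins = Counter()
    for match in matches:
        if match in excluded:
            continue
        wins[max(probabilities, key=lambda s: probabilities[s][match])] += 1
    return wins


if __name__ == "__main__":
    products = scenario_products(PROBABILITIES, excluded={"match_d"})
    print(products)
    print(times_less_likely(products))
    print(tally_winners(PROBABILITIES, excluded={"match_d"}))
```

Without the exclusion, the zero probability for `match_d` makes the whole Brother product zero, which is the same effect Rob's 263 cM had on the first run.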

So there you have it. We haven’t proved anything, but at least we now know that all scenarios are possible and that half-brother is most likely.


Hint, Hint, Leah and Jonny

WATO is a wonderful tool to help you hypothesize where your DNA matches fit into your tree. That was what it was designed for.

But wouldn’t it be nice if WATO could also help you test different ancestral scenarios, as I have just done? Well, it can, if you follow the above procedure and do the comparison yourself.

WATO-Ancestors could be set up to make this easier by remembering the results of each of your scenarios and then comparing them for you, so that you don’t have to.




Update (80 minutes later): I didn’t realize when I was doing the analysis that I was using Version 1 of WATO. Version 2 includes new probability numbers taken from an update to Ancestry’s paper. See Leah’s article: Improving the Odds. The main improvement is that it now has much more detail for small matches.

You can switch from Version 1 to 2 very easily, so I did and I recalculated. Here’s the revised table:

image

To tell the truth, it really changed the results. Now the conclusion is that Brother is the most likely relationship and that scenario is 37 times more likely than Half-Brother.

So make sure you use Version 2 of WATO to get the best probabilities.




Additional Idea: If you have more than one tester on the other side of the tree, you can calculate all the match values for each scenario for each of them, and then simply multiply together (or take the geometric mean of) their “Product” lines.

For example, in the above table, if I had a second person that gave Product numbers of 0.0000385 for Brother, 0.0000655 for Half-Brother and 0.0000073 for 1st Cousin, then

GMean(Brother) = (0.0000033505 * 0.0000385) ^ (1/2) = 0.0000114
GMean(Half-Brother) = (0.0000000901 * 0.0000655) ^ (1/2) = 0.0000024
GMean(1st Cousin) = (0.0000000 * 0.0000073) ^ (1/2) = 0.0000000

If you don’t know what a geometric mean is, you can simply multiply the Product values together, which ranks the scenarios the same way. Be careful with a simple average, though, because it can rank the scenarios differently.
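The GMean calculation above can be checked with a short script. This is a minimal sketch using the Product values for the first tester from the table and the hypothetical second tester described above (the 1st Cousin value is entered as 0.0, as displayed). The helper names are hypothetical, and a simple average is printed alongside for comparison.

```python
from math import prod

# Product-line values per scenario: first tester from the table, second tester
# the hypothetical one described in the post.
PRODUCTS = {
    "Brother":      [0.0000033505, 0.0000385],
    "Half-Brother": [0.0000000901, 0.0000655],
    "1st Cousin":   [0.0, 0.0000073],
}


def geometric_mean(values):
    """n-th root of the product of n values."""
    values = list(values)
    return prod(values) ** (1 / len(values))


def arithmetic_mean(values):
    """Plain average of the values."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    for scenario, values in PRODUCTS.items():
        print(f"{scenario:13s} gmean={geometric_mean(values):.7f} "
              f"mean={arithmetic_mean(values):.7f}")
    print("best by geometric mean:",
          max(PRODUCTS, key=lambda s: geometric_mean(PRODUCTS[s])))
    print("best by simple average:",
          max(PRODUCTS, key=lambda s: arithmetic_mean(PRODUCTS[s])))
```

With these numbers the geometric mean picks Brother, matching the GMean results above, while the simple average picks Half-Brother.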