
Louis Kessler’s Behold Blog

Does WATO work well with Endogamous populations? - Sun, 25 Oct 2020

I’ve been quiet lately because I’ve been enjoying doing some research with my wife’s cousin Terry Lasky on one branch of their common families. Terry has got several dozen of his relatives on that side of the family to do DNA tests.

One aspect of what we are doing led to Jennifer Mendelsohn suggesting to me that we try WATO – the What Are the Odds tool built by Leah LaPerle Larkin and Jonny Perl.

I was concerned that the endogamy in our matches might add too much to the shared cM of two people. I was also worried that the shared cM values Family Tree DNA gives, which are higher than Ancestry DNA’s numbers, would cause additional problems.

If WATO would not work for our known relationships, then we should not use it for our unknown relationships, meaning a test is required first.


Family Tree DNA data for a Known Relationship

So the first step is to test WATO on a relationship which includes endogamy, for a person that has just one known pair of common ancestors with the other people. That way there are no other close multiple relationships that we know of other than the distant endogamy.

I took one of our starting ancestors, Moshe and Wife 3, who had three children. We have 14 DNA testers who between the children are 2C, 2C1R and 3C to each other. I took the 14th and made him the hypothesis and I created this with the WATO tool:

image: WATO Tree for Endogamy

So I created 11 hypotheses. 1, 2 and 3 are descendants of a child of Grace. 4, 5 and 6 are descendants of a child of Grace who is a half-sibling of Grace’s other children. 7, 8 and 9 are descendants of a full sibling of Grace, and 10 and 11 are descendants of a half sibling of Grace.

Each line of hypotheses is a half generation further away than the previous one. And interestingly enough, the possible hypotheses marked in green move up a generation to compensate for this difference.

WATO gives you the calculated probabilities of each hypothesis:

image

So this is saying that Hypothesis 5, that this person is a child of a half-sibling of Grace’s other children, is the most likely and is 52 times more likely than Hypothesis 2. Three others are possible and the rest are not statistically possible.

I love the detailed score calculation that Leah and Jonny put together. It gives you everything you’d ever want to know about each relationship in each hypothesis. And you can see how the probabilities were arrived at:

image
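To make "52 times more likely" concrete, here is a minimal sketch of how relative odds between hypotheses can be computed. It assumes the general approach of multiplying the per-match probabilities within each hypothesis and then scaling every product by the smallest non-zero one. This is not WATO's actual code, the function name is made up, and the probabilities in the example are invented for illustration; they are not the values from the tree above.

```python
from math import prod


def hypothesis_scores(match_probs):
    """Return relative odds for each hypothesis.

    match_probs maps a hypothesis label to a list of probabilities, one per
    DNA match, each being the chance of seeing that match's shared cM under
    the relationship the hypothesis implies. (Hypothetical input format.)
    """
    # Combined probability: every match has to fit the hypothesis at once.
    combined = {h: prod(ps) for h, ps in match_probs.items()}

    # A single impossible match (probability 0) rules the hypothesis out.
    possible = [p for p in combined.values() if p > 0]
    if not possible:
        return {h: 0.0 for h in combined}

    # Scale so the least likely surviving hypothesis scores 1.
    baseline = min(possible)
    return {h: p / baseline for h, p in combined.items()}


# Invented probabilities, for illustration only.
example = {
    "H2": [0.5, 0.5, 0.25],
    "H5": [0.5, 0.5, 0.5],
    "H9": [0.5, 0.0, 0.5],  # one match is impossible under this hypothesis
}
scores = hypothesis_scores(example)
print(scores)  # {'H2': 1.0, 'H5': 2.0, 'H9': 0.0}
```

In this made-up example H5 comes out twice as likely as H2, and H9 is eliminated because one of its matches cannot occur under that relationship.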

Now can you guess which Hypothesis is the correct one?  (spoiler below)


Family Tree DNA data stripping out small < 7 cM Matches

I had thought that WATO was based on the numbers from Blaine Bettinger’s Shared cM project. As I was calculating and writing the above, Jonny Perl responded to one of my posts on Facebook and said:

“The probabilities are actually separate from the shared cM project. In WATO v1 they’re from Ancestry’s white paper on matching and in v2 they are extrapolated from the probabilities AncestryDNA displays in the popup when you click on the cM amount.”

So I asked Jonny if it might be better to use Ancestry shared cM with WATO than to use Family Tree DNA data with it.  He said yes, and pointed me to his Individual Match Filter tool (IMF) to strip Family Tree DNA  matches back to a certain threshold (default is 7 cM).

Well Terry had done most of this work already for me and had many of the Family Tree DNA shared cM values already stripped back to only include 7 cM or larger values. I’m sure Terry would have liked to have known about Jonny’s tool as it would have saved him a lot of time.

I plotted Terry’s filtered numbers versus the non-filtered and got this relationship:

image

Notice this is a pretty strong relationship, and you can see that the trend line gives a pretty good estimate of what the filtered Family Tree DNA shared cM should be. The equation is basically saying that subtracting 50 cM from your unfiltered value will give you a decent filtered value. It should work okay for values greater than 100 cM, but obviously won’t be as good for smaller values.
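As a rough sketch, the rule of thumb from the trend line can be written as a small helper. The function name and its parameters are made up for illustration; the 50 cM offset and the 100 cM lower bound are simply the approximate figures quoted above, not a fitted model.

```python
def estimate_filtered_cm(unfiltered_cm, offset_cm=50.0, reliable_above_cm=100.0):
    """Estimate the filtered (7 cM and up) FTDNA total from the unfiltered one.

    Returns (estimate, reliable). The estimate subtracts the approximate
    50 cM offset and never goes below zero. The reliable flag is False when
    the input is at or under 100 cM, where the rule of thumb is expected to
    be less accurate.
    """
    estimate = max(unfiltered_cm - offset_cm, 0.0)
    reliable = unfiltered_cm > reliable_above_cm
    return estimate, reliable


print(estimate_filtered_cm(250))  # (200.0, True)
print(estimate_filtered_cm(80))   # (30.0, False)
```

When segment data is available, filtering the actual segments is still the better route; this estimate is only a shortcut for when all you have is the unfiltered total.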

Now I’ll use the filtered Family Tree DNA values in WATO instead of the unfiltered and we’ll see what happens:

WATO Tree for Endogamy (1)

This gives 5 feasible hypotheses, with Hypothesis 2 coming out on top, being 8 times more likely than Hypothesis 5.

image


Ancestry DNA data for the same Known Relationship

Jonny’s comment also prompted me to try our Ancestry DNA matches. 11 of our 14 people above had originally tested at Ancestry DNA and those tests were later uploaded to FTDNA, so we still have 10 people we can compare with our 11th.

Putting in the Ancestry DNA shared cM values, we get this:

image

The Ancestry cM values we put in were actually not too different from the filtered FTDNA values. In fact, the biggest difference between them was 35 cM. The conclusion is the same, with Hypothesis 2 being ahead of Hypothesis 5, but only being about 2 times more likely.

image


The Answer and Some Observations

The correct hypothesis is Hypothesis 2.

So it does seem that WATO is doing a good job and picked the correct Hypothesis with both the filtered FTDNA data and the Ancestry data.

Even though there are a few possible valid hypotheses, adding the known generational level of the tester and/or their age will help to invalidate some and make one more likely.

I was worried that the endogamy would be a factor, but it seems not to be. Only the unfiltered FTDNA did not pick the correct answer on its number one hypothesis, and that is due to the many extra segments (about 50 cM worth) included in those numbers. As a result, it preferred to pick the hypothesis which was a half generation higher.

So this tells me that you needn’t worry about endogamy when using WATO. Just be sure to use either filtered FTDNA data (eliminating matches less than 7 cM) or use Ancestry DNA shared cM.

DNA Short Snappy Opinions - Sat, 22 Aug 2020

Lots has been happening on the DNA analysis front in the past few months. Lots of very divergent opinions on a whole bunch of issues.

Here are my opinions. You are free to agree or disagree, but these are mine.


AncestryDNA

  • Ancestry has had performance issues. Couldn’t they have been more honest and said that performance is the reason for their cease and desist orders to the 3rd party screen scrapers who have been providing useful utilities?
  • I just hate the endless scrolling screens. Bring back paging, please.
  • The 6 and 7 total cM matches that Ancestry will be deleting definitely include people who have a higher probability of being related, but not because of the small DNA match which is likely false and too distant a match to ever track.
  • The 6 and 7 total cM matches are also being deleted because of their performance issues.   
  • I in no way trust Ancestry’s Timber algorithm, especially with the longest segment length being labeled as pre-Timber to explain why it’s longer than the post-Timber total cM. Now none of their numbers make sense.
  • Longest segment length is not as helpful if you have to look at it one by one. Why didn’t they show it in the match list and let us sort by it?
  • Let us download our match list, please.
  • Thinking Ancestry will ever give us a chromosome browser is a pipe dream.


23andMe

  • I love that they show your ethnicity on a chromosome map. This is, in my opinion, a feature that DNA testers very much underutilize.
  • Their Family Tree generated from just your DNA matches is a fantastic innovation.
  • A month ago, some people were able to add any of their DNA matches to that family tree. They’ve never announced this and it still hasn’t rolled out to me yet. What’s the problem here? Release it, please!
  • If my matches don’t opt in, I don’t want to know that. Please give me 2000 matches rather than 1361 matches that I can see and 639 that I can’t.


FamilyTreeDNA

  • Lots of innovation that they don’t get enough credit for, e.g. their assignment of Paternal / Maternal / Both to your matches based on the Family Tree you build.
  • Keeps your DNA for a looooooong time! Will be useful for future tests that don’t exist now on your relatives who passed away.
  • Best Y-DNA and mtDNA analysis for those who can make use of it.
  • Take advantage of their Projects if you can!
  • Nobody should see segment matches down to 1 cM, or have them included in your match totals. Pick a more reasonable cutoff, please.


MyHeritage DNA

  • I hate, hate, hate, did I say hate, imputation and splicing.
  • As a result of the aforementioned, I believe MyHeritage has the most inaccurate matching and ethnicities of the major services.
  • Showing triangulations on their chromosome browser is their best advanced feature that no one else has. 
  • I love that you are working with 3rd parties, and include features that others won’t such as AutoClusters.
  • How about some features to connect your DNA matches to your tree, like Ancestry and 23andMe and Family Tree DNA have?


Living DNA

  • They’ve missed out on a golden opportunity. They had the whole European market available.
  • Three years ago they launched and promised shared matches and a chromosome browser, which they’ve still not implemented.
  • Your ethnicity in no way works for me unless you add a Jewish category.


GEDmatch

  • I feel so sorry for GEDmatch’s recent troubles. They are trying so hard.
  • Great tools. Love the new Find Common Ancestors from DNA Matches tool that compares your GEDCOM with the GEDCOM files of your matches. Would love it more if I had any results from it.
  • They let you analyze anyone’s DNA, but don’t let you download your own tool-manipulated raw data.  Doesn’t that seem backwards?
  • Over 100 cold cases have been solved using DNA to identify the suspect. I loved CeCe Moore’s Genetic Detective series. I can’t figure why more people won’t opt-in their DNA for police use.


ToTheLetter DNA and KeepSake DNA

  • C’mon guys. We all want the stamps and envelopes our ancestors licked analyzed. This sounded so promising a couple of years ago. What’s taking so long?


Whole Genome Sequencing (WGS)

  • Sorry, but today’s WGS technology will never improve relative matching the way some people think it will. Current chip-based testing already does as good a job as you can do when you’re dealing with unphased data.
  • Today’s WGS short read technology is too short. Today’s WGS long read technology is too inaccurate.
  • The breakthrough will come once accurate long reads can sequence and phase the entire genome with a single de novo assembly (no reference required) for $100.
  • PacBio is leading the way with their unbiased accurate long read SMRT technology that is not subject to repeat errors. It just needs to be about 100 times longer and remain accurate and we’re there. Optimistically: 5 years for the technology and 10 years for the price to come down.

Proof or Hint? - Sun, 19 Jul 2020

Have you heard the big hubbub going on in genetic genealogy circles?  Ancestry will be dropping your 6 and 7 cM matches from your match list.

image

In my case, I have 192,306 DNA matches at Ancestry. Of those, 54,498 matches are below 8 cM, meaning Ancestry will drop over 28% of the people on my match list.
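A quick check of that percentage, using only the two counts given above (the variable names are just for illustration):

```python
total_matches = 192_306  # all matches on the Ancestry list
below_8_cm = 54_498      # the 6 and 7 cM matches that will be dropped

share_dropped = below_8_cm / total_matches
remaining = total_matches - below_8_cm

print(f"{share_dropped:.1%} dropped, {remaining:,} matches remain")
# 28.3% dropped, 137,808 matches remain
```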


The Proof Corner

Many of the DNA experts understand that a 6 or 7 cM segment is small and is rarely useful for proof of anything. That is totally true. As Blaine Bettinger states, small segments are “poison”. They are often false matches. When they are not, those segments are usually too many generations back to be used as “proof” of the connection.

I am not talking about Y-DNA or mtDNA here. Those have provable qualities in them. I’m talking about autosomal matching, you know, the DNA where the amount of DNA you share with a cousin reduces with each generation and you can be a 3rd cousin with someone and not share anything.

The only reasonable way to use autosomal segment matches as a “proof” is to use the techniques Jim Bartlett developed for Walking an Ancestor Back. This technique uses combinations of MRCAs on the same ancestral line, e.g. a 2C, a 3C, a 5C and a 7C all matching on the same segment who are on the same line. Jim has been able to do this successfully only because he has an extensive family tree and has rigorously mapped all his matches into triangulation groups over his whole genome. This is something that 99.9% of us will never attempt.

Note that Jim only includes matches that triangulate that are at least 7 cM. He is also aware that small segments may be false even when triangulated, so he excludes them.

But too often, people find through a DNA match a new 7th cousin, and find a family tree connection to them, and then claim that the DNA match proves the connection. This is so untrue on so many fronts.

Or people find two relatives who have a segment match that starts and/or ends at the same position as another DNA match. They then use this as proof of their connection to Charlemagne. Now doesn’t that sound ridiculous?


The Hint Corner

So why the worry of eliminating these mostly false, poison matches that can’t prove anything from your Ancestry DNA match list? It’s because they are hints.

As genealogists, we are using our DNA matches to find possible relatives that have common ancestors with us. We do that to extend our tree outwards and up. Any person who may have researched a part of our tree and have information about our relatives and ancestors that we don’t have is a very welcome find. (Hopefully they’ll respond to our email!)

So of my 192,306 matches, the closest 1% are the best candidates for me to research and see if I can connect them.

What about the other 99%? Surely, some of them might turn up to be a closer cousin than expected, or be along a line that I have researched deeper.

Obviously, none of us can spend the rest of our lives researching 190,000 matches one by one. So what do we do? We filter them to get interesting candidates, via:

1. A match who shares a common ancestor.

2. A match whose name matches a surname in our tree.

3. A surname in a match’s tree that matches one of ours.

4. A birth location in a match’s tree that is a place our ancestors were from, or where our relatives now live.

image

5. Shared matches who match with some of our DNA matches who we already have in our tree.

6. ThruLines, which compares the trees of our DNA matches for us and gives us possible family connections that we can investigate.

Using any of these 6 methods (and other similar methods) is a way to take an unmanageable list of 192,000 people and select a subset for us to look at. Our hope (we don’t know this for sure) is that this will include more people who we’ll be able to connect to our family, and exclude the ones who are less likely.
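The surname and place filters (methods 2 to 4) can be sketched as a simple pass over a match list. This is only an illustration of the idea, not how Ancestry implements its filters: the Match record, its fields, the hint_reasons function and every name and place in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical record for one DNA match and what its tree shows."""
    name: str
    shared_cm: float
    tree_surnames: set = field(default_factory=set)
    tree_places: set = field(default_factory=set)


def _norm(values):
    # Compare case-insensitively and ignore stray spaces.
    return {v.strip().lower() for v in values}


def hint_reasons(match, my_surnames, my_places):
    """List the reasons, if any, that make this match worth a closer look."""
    surnames, places = _norm(my_surnames), _norm(my_places)
    reasons = []
    name_parts = match.name.split()
    if name_parts and name_parts[-1].lower() in surnames:
        reasons.append("match surname")   # method 2
    if _norm(match.tree_surnames) & surnames:
        reasons.append("tree surname")    # method 3
    if _norm(match.tree_places) & places:
        reasons.append("tree place")      # method 4
    return reasons


# Made-up matches, for illustration only.
matches = [
    Match("Pat Example", 6.2),
    Match("Sam Other", 7.0, {"Sample"}, {"Springfield"}),
    Match("Lee Nobody", 6.0),
]
my_surnames = {"Example", "Sample"}
my_places = {"Springfield"}

candidates = {}
for m in matches:
    reasons = hint_reasons(m, my_surnames, my_places)
    if reasons:
        candidates[m.name] = reasons

print(candidates)
# {'Pat Example': ['match surname'], 'Sam Other': ['tree surname', 'tree place']}
```

Note that shared_cm plays no part in the filter. That is the point of a hint: the match is selected for what the tree says, not for the size of the DNA match.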

So what most people are lamenting is not the loss of 28% of their DNA matches, but a loss of 28% of the hints they might be able to use.


Recommendation

If you want, there are ways to save some of the 6 and 7 cM matches that Ancestry will soon be eliminating. I won’t describe them here since many others already have. See Randy Seaver’s summary.

But please, don’t spend the next few weeks robotically marking the tens of thousands of small matches so that you don’t lose them. Yes, maybe one of them will turn out to be a hint one day. But you’ve got all your other matches to work with as well. You won’t run out of things to do, I guarantee it.





Addendum:  July 29, 2020:

If small DNA matches of 6 or 7 cM at Ancestry DNA cannot be used to prove a connection, because they are either false matches, or are too many generations back to confirm their ancestral path, then why can they be used as hints?

Answer: Simply because if you take a random selection of, say, 20,000 DNA testers at Ancestry, some of them will be relatives of yours. They may not actually share DNA with you, since 3rd cousins and further need not, but they could be people whose family tree connects to yours.

Basically, Ancestry DNA is giving you hints by simply giving you a large random selection of DNA testers. Their filtering tools (surname, place) may narrow those down to possible relatives, who don’t necessarily share any actual DNA with you.    
   
But these hints are better than just random hints. They will likely be people who share more ethnicity with you than a random DNA tester at Ancestry would.

For example, Ancestry has me at 100% European Jewish. If I compare myself with my first 6 cM match at Ancestry, I get this:

image

This 6 cM match of mine also has 100% European Jewish ethnicity.

To see if this was generally the case, I took my closest 20 matches, and my first 20 matches at 40 cM, at 20 cM, 15 cM, 10 cM, 9 cM, 8 cM, 7 cM and 6 cM. I marked down what percentage of European Jewish they had. Then I sorted each group of 20 highest to lowest. I get this:

image

Of the 180 matches I checked, 179 had some European Jewish ancestry. Over half of the matches also had 100% European Jewish ethnicity, and many of the rest had 50% or more.

There is a much greater chance that I might find a connection to someone with European Jewish ancestry than someone without any, so these are good hints. Using ancestral surname and place filtering tools, I might find that some of these people are relatives and they can help me extend my family tree.

Does that mean that we share DNA? Not necessarily. The matches, especially the small ones, may be false matches.

Or we may actually share DNA, but the segments we share may not be coming from the common ancestor we found. They may be from another more distant line that we’ll never find, or they may be (especially in my case) general background noise from distant ancestors due to endogamy. We don’t know and cannot tell.

Nonetheless, these matches are hints that might connect you to a relative.