Louis Kessler’s Behold Blog

23andMe’s Family Tree Beta - Mon, 23 Sep 2019

Someone on Facebook reported a new feature at 23andMe and I couldn’t wait to try it. This 23andMe beta “auto-builds”  your family tree from your DNA connections with other customers.

image

They aren’t the first to try something like this. Ancestry DNA has ThruLines, which uses your tree and your DNA match’s tree to try to show you how you connect. MyHeritage DNA does the same with their Theory of Family Relativity (TOFR) using your tree and your match’s tree at MyHeritage.

I had good success at Ancestry, which gave me 6 ThruLines: 3 joined me correctly to relatives whose connection I already knew, and 3 correctly connected me to relatives who were new to me. I then contacted the latter 3, we shared information, and I was able to add them and their immediate relatives to my tree.

I have had no success yet at MyHeritage DNA even though I have my main tree there. I’ve never had a single TOFR there for either me or my uncle. My closest match (other than my uncle) is 141 cM. My uncle’s closest is 177 cM. Those should be close enough to figure out the connection. But none of the names of my matches at MyHeritage give me any clues, and I haven’t been able to figure out any connection with any of them, even using some of their extensive trees.

23andMe does not have trees to work with like Ancestry and MyHeritage. Actually, I shouldn’t say that. Not too long ago, in another beta, 23andMe allowed uploading your FamilySearch tree to 23andMe. See Kitty Cooper’s blog post for details about it. They never said what they were going to do with that data, but I wanted to be ready if they did do something. All it tells me right now is this:

image

If you include your FamilySearch Tree in your profile, then anyone else who has done the same will show a FamilySearch icon next to their name. You can also filter for those who have done so. I don’t think many people know about this beta feature yet, because my filter says I have no matches who have done so.

image

But I digress.  Let’s go check out the new Family Tree beta at 23andMe. I’m somewhat excited because I have a dozen relatives who I know my connection to who have tested at 23andMe. I’ve been working with them over the past 2 weeks getting my 23andMe matches to work in my (almost-ready) version 3.0 of my Double Match Triangulator program. And the odd thing about all my known relatives at 23andMe is that they are all on my father’s side!  I’d love to be able to connect to a few people at 23andMe on my mother’s side. Maybe this Family Tree beta will help. Let’s see.

So I go over to my 23andMe “Your Family Tree Beta” page. It takes it a few minutes to build my tree and update my predicted relationships. Once it does, out pops this wonderful diagram for me.  (Click on the image to enlarge it).

image

I’m shown in the middle (my Behold logo sun), with my parents, grandparents and great-grandparents above. And 23andMe has then drawn down the expected paths to 13 of my DNA matches.

This is sort of like ThruLines and TOFR, but instead of showing just the individual connection with each relative, 23andMe is showing all of them on just one diagram. I like it!!

The 13 DNA matches they show on the diagram include 5 of the 10 matches I have that I know my relationship with (arrows point to them), and 8 who I don’t know my relationship with. Maybe this will help me figure out the other 8.

My 3 closest 23andMe matches are included, who I have numbered 1, 2 and 3. Number 3 is on my mother’s side, but I don’t know what the connection is. None of the other 10 are on my first page of matches (top 25).

The number 1 with a green arrow is my 1st cousin once removed. She is the granddaughter of my father’s brother. So that means the entire right side of the tree should be my father’s side and the left should be my mother’s side.

The ancestors are all shown with question marks. I can now try labelling the people I know because of my connection with my first cousin.  When I click on the question mark that should be my father, I get the following dialog box:

image

The “More actions” brings up a box to add a relative, but that action and likely others that are coming are not available yet.

When you click on “Add Information” you get:

image

I click on “I understand” and “Next” and I get:

image

I click “Yes” and “Next” and it lets me enter information about this person:

image

Now I’m thinking at this point that they have my FamilySearch info. Maybe in a future version, they can allow me to connect this person to that FamilySearch tree, and not only could they transfer the info, but they should be able to automatically include the spouse as well.

But for now, I simply enter my father’s information and press “Save”. I did not attempt to add a photo.

When I clicked “Deceased”, it added Place of death and Date of death. But it has a bug because the Place of Death example cannot be edited. But what the heck. This is a beta. Expect a few bugs.

It gives a nice confirmation box and then on the chart changes the orange circle with the question mark to the green circle with the “TK” (my father’s initials):

image

I also go and fill in my mother, and my father’s brother and his daughter who connect to my 1st cousin once removed.

Next step: Those 4 red arrows on the right point to four cousins on my father’s father’s father’s side. I can fill in two more sets of ancestors:

image

Unfortunately, the 4 DNA matches at the right were up one generation from where they should have been.

image

They should be under AM and RB, not under RB’s parents. This is something you can’t tell from DNA, but maybe 23andMe could use the ages of the DNA testers to estimate the correct generation level the matches should be at.

This is basically what 23andMe’s Family Tree beta seems to do in this, their first release. It does help visualize and place where DNA relatives might be in the tree. For example, the two unidentified cousins shown above emanate from my great-grandmother’s parents. So like clustering does, it tells me where to look in my family tree for my connection to them.

Conclusion:

This new Family Tree at 23andMe has potential. They seem to be picking specific people that would represent various parts of your tree, so it is almost an anti-clustering technique, i.e. finding the people who are most different.

There is a lot of potential here. I look forward to seeing other people’s comments and what enhancements 23andMe makes to it in the future, like making use of the FamilySearch relatives from their other beta. Being able to click through each DNA relative to their profile would be a useful addition. And using the ages of the testers would help to get the generational level right.

Our desire as genealogists is that DNA should help us extend our family tree. It’s nice to see these new tools from 23andMe as they show that the company is interested in helping genealogists.

Now off I go to see if I can figure out how the other 7 people might be connected.

The Life and Death of a DNA Segment - Mon, 19 Aug 2019

There’s a bad rumor going around that segment matches, especially for small segments, can be very old. I’ve heard expectations that the segment might come from a common ancestor 20 generations back or even 30, 40 or more. And that’s said to happen even if you have a fairly large 15 cM segment.

Part of this is due to the incorrect thinking that a segment of your DNA has been around forever and has been passed down from some ancient ancestor to you and to just about everyone else. Since there is only a 1/2 chance that each generation gets the segment from the right parent, the argument is that this is offset by having more than 2 children per generation, which keeps the segment alive all the way down to two 30th or 40th generation descendants who then happen to share it. That also assumes there is no intervening ancestor along some other path who is more recent than that 30th generation one. For endogamy, the argument is that the segment has proliferated through the population and most people happen to have it. Although in that case, I find it hard to believe that there is not a line to a different common ancestor who is fewer than 30 generations back.

The fallacy here is that all our DNA segments are ancient. They are not. In fact, many of them are quite recent, only a few generations old.

Let’s take a look at, say, a 15 cM segment that you got from your father. One of three things happened:

1. You got the whole segment from your father’s father’s chromosome,

2. You got the whole segment from your father’s mother’s chromosome, or

3. A recombination occurred somewhere along the 15 cM segment and you got part of it from your father’s father and part from your father’s mother.

It is case number 3 that is interesting. In this case, that 15 cM segment is no longer the same as your father’s father’s segment, nor is it the same as your father’s mother’s segment. It is a new segment that has been born in you and you are the first ancestor to have that segment and maybe you’ll pass it down to many of your descendants. And no one else will have that segment that you have, unless some random miracle as rare as a lottery winning happens.

Also, neither your father’s father’s segment at this location nor your father’s mother’s segment is passed down to you intact. Maybe they’ll be passed to a sibling of yours or maybe they won’t. But both of your grandparents’ segments have died along your line.

So what actually happens is that any segment of your DNA has its birth in one of your ancestors. That ancestor may pass it down to zero or more descendants, and if it is passed down, each descendant may or may not continue to pass it down. The segment eventually dies. A recombination on the segment can’t be avoided forever.

Now what is the probability of a new 15 cM segment being “born” in you? Well, that’s what cM represents and there will be about a 15% chance that any particular 15 cM segment of your DNA was formed from a recombination in your parent, and that you have a brand new segment. For most purposes, using the cM as a percentage is close enough. But for more accuracy, I’ll use the actual probability from the equation P(recomb) = 1 – exp(-cM/100) which gives 13.9%. (See my Update Jan 26, 2020 about this equation)

Well guess what? The probability that any particular 15 cM segment is born in any of your ancestors is also 13.9%. The chance that the segment was not born, but was passed down is therefore 86%. We can use that fact to now calculate the probability that this segment was passed down any number of generations to some descendant:

image

What this says is that if you have a 15 cM segment, then there is about a 50% chance that it was created in one of the last 5 generations, a 75% chance that it was created in one of the last 9 generations, and 95% chance that it was created in one of the last 20 generations. The average age of segments that size is 7.2 generations (1 / 13.9%). This is very simple mathematics/statistics.

If you match with another person on the same segment, then they have the same probabilities. The chance both of you got this segment from more than 20 generations back would be only 5% x 5% = 0.25%.
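These figures can be checked with a few lines of Python. This is a minimal sketch, assuming (as above) that every generation independently has the same chance of a recombination landing on the segment. The function names are illustrative only.

```python
import math


def p_recomb(cm):
    """Chance of at least one crossover on a segment of `cm` centimorgans
    in one generation, using P(recomb) = 1 - exp(-cM/100)."""
    return 1.0 - math.exp(-cm / 100.0)


def p_born_within(cm, generations):
    """Chance the segment was created within the last `generations`
    generations, treating each generation as independent."""
    return 1.0 - (1.0 - p_recomb(cm)) ** generations


def mean_age(cm):
    """Average segment age in generations: 1 / P(recomb)."""
    return 1.0 / p_recomb(cm)


cm = 15
print(f"P(recomb) for {cm} cM: {p_recomb(cm):.1%}")
for n in (5, 9, 20):
    print(f"created within the last {n} generations: {p_born_within(cm, n):.1%}")
print(f"average age: {mean_age(cm):.1f} generations")

# Chance that both you and your match carry it from more than 20 generations back
p_older_than_20 = 1.0 - p_born_within(cm, 20)
print(f"both older than 20 generations: {p_older_than_20 ** 2:.2%}")
```

Changing `cm` to another segment size gives the corresponding figures for that size.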


Revisiting Speed and Balding Once Again

I’m still frustrated that Speed and Balding’s simulation results are being used without question to estimate segment age for human DNA segment matches.

About two years ago, I used two different sets of calculations, one my own in Revisiting Speed and Balding, and one based on work by Bob Jenkins in Another Estimate of Speed and Balding Figure 2B. In both cases, I found segment age estimates that were somewhat less than Speed and Balding.

Let’s see how my Segment Life estimates compare. Picking a few different segment sizes and calculating their values gives:

image

And then let’s plot these in a stacked chart:

image

Look at the gray area at the top left. That’s the probability of segments of the given segment size being 20 or more generations old. The green bar is the divider at 10 generations. You likely have a good chance to identify how you’re related to segment matches that are under the green bar, indicating that most segments over 15 cM should be identifiable and that even very small segments might be identifiable.
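The same calculation can be tabulated for several segment sizes. The sketch below is only an approximation of the chart: the segment sizes are picked for illustration and are not necessarily the ones in my table, and where exactly the 10 and 20 generation boundaries fall (10 or fewer, 11 to 20, more than 20) is an assumption.

```python
import math


def p_born_within(cm, generations):
    """Chance a segment of `cm` centimorgans was created within the last
    `generations` generations, using P(recomb) = 1 - exp(-cM/100)."""
    survive_one_generation = math.exp(-cm / 100.0)
    return 1.0 - survive_one_generation ** generations


def age_buckets(cm):
    """Split the probability into three age ranges:
    10 generations or fewer, 11 to 20, and more than 20."""
    within_10 = p_born_within(cm, 10)
    within_20 = p_born_within(cm, 20)
    return within_10, within_20 - within_10, 1.0 - within_20


sizes_cm = [5, 7, 10, 15, 20, 30]  # illustrative sizes only
print(f"{'cM':>4} {'<=10 gen':>9} {'11-20 gen':>10} {'>20 gen':>8}")
for cm in sizes_cm:
    young, middle, old = age_buckets(cm)
    print(f"{cm:>4} {young:>9.1%} {middle:>10.1%} {old:>8.1%}")
```

The last column corresponds to the gray area of the chart, and the first column to the part under the green bar.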

Compare this to Speed and Balding:

Speed and Balding give a much larger chance of older segments than does my segment life methodology, or than do either of the two analyses in my earlier blog posts.


Conclusion

Segments aren’t passed down from ancient times. They are created and die all the time due to recombination events and they may not be as old as you are led to believe. Some of your smaller matching segments, e.g., between 5 and 15 cM, have (by my segment life and other earlier calculations) a 40% to 70% chance of originating less than 10 generations ago. This means you might be able to determine how you’re related to your match.

By using triangulation techniques (such as Double Match Triangulator), you can determine triangulations of segments in the 5 to 15 cM range which will eliminate most by-chance matches. You can then put your segment matches into Triangulation Groups, to help find the common ancestor of the group and connect your DNA matches to your tree.




Update Jan 26, 2020:  After discussion with Celia Baitinger on the Facebook Genetic Genealogy Tips and Techniques group, we realized that the Wikipedia equation P(recomb) = (1 – exp(-2 * cM / 100)) / 2 may only be for recombinations that involve an odd number of crossovers. For genetic genealogy, we are interested in all crossover events. As a result, the correct analysis should be this:

Assuming a Poisson distribution for crossovers (which is what is usually assumed), then the P(zero recombs) when the mean is cM/100 is: exp(-cM/100), and therefore:

P(recomb) = 1 - exp(-cM/100)

I have updated the figures in the above article to reflect this correction. No changes were significant enough to affect any of my observations or conclusion.
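To see how much the two equations differ, here is a small Python sketch that evaluates both. The first is the Wikipedia equation (Haldane’s map function, which counts only an odd number of crossovers) and the second is the Poisson "one or more crossovers" equation used in this article. The function names are illustrative only.

```python
import math


def p_odd_crossovers(cm):
    """Haldane's map function: chance of an odd number of crossovers."""
    return (1.0 - math.exp(-2.0 * cm / 100.0)) / 2.0


def p_any_crossover(cm):
    """Chance of one or more crossovers, Poisson with mean cM/100."""
    return 1.0 - math.exp(-cm / 100.0)


for cm in (5, 15, 50, 100):
    print(
        f"{cm:>4} cM  odd only: {p_odd_crossovers(cm):.1%}"
        f"  any: {p_any_crossover(cm):.1%}"
    )
```

For small segments the two values are close, which is consistent with the correction not changing any of the conclusions. The gap widens as the segment gets longer.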

50 Years, Travelling Salesman, Python, 6 Hours - Wed, 7 Aug 2019

This is my first blog post in over 2 months. The reason is that I have been working very hard trying to finish Version 3 of Double Match Triangulator. Everything I’ve been doing with it is experimental, and there’s no model to follow. So it’s tough to get it just right. I had already started the documentation of the new version when I diverted to get some sample data from some people who had done Visual Phasing (VP) with 3 or more siblings, because I was thinking that this version of DMT should be able to use segment matches to get most of the same grandparent assignments that VP does. I’ve made progress but am still not finished with that.

But this morning, I was sparked programmatically by an annual event that happens where I live in Winnipeg. Folklorama is a two-week festival that celebrates the multiculturalism in our city.

image

“Pavilions” are set up in various venues (arenas, churches, community centres) to showcase a particular country/culture. Each pavilion has a stage performance, cultural displays, and serves authentic ethnic food and drink.

This is the 50th year of Folklorama, so I remember it from when I was a kid. The 40 pavilions were something that I always wanted to do a bike tour of, as they were spread all over our city. Being interested in mathematics, I was curious about a way to optimize my route and use the shortest possible route to bike to all of the pavilions.

But 50 years ago was well before we had personal computers or the internet. And route traversal problems, especially this one which was known as the Travelling Salesman problem, were computationally difficult to solve back then, even on the mainframe computers at the time.

This year’s version of Folklorama got me thinking: Maybe the problem is easily solvable today. I took a look online and was very surprised by what I found. There is a Google Developers site that I didn’t know about.

image

And at that site, they had all sorts of OR-Tools.  OR stands for Operations Research which is the name of the field that deals with analytical methods to make better decisions. The Traveling Salesman problem is in that field and has its own page at Google Developers:

image

Not only that, but they explain the algorithms and present the programs in four different programming languages:  Python, C++, Java and C#.

Now, I’m a Delphi developer, and I use Delphi for development of Behold and Double Match Triangulator. I’ve never used any of the four programming languages given. But I’ve been looking for a quick, easy-to-program language to use for smaller tasks such as analysis of raw data files from DNA tests, or even analysis of the huge 100 GB BAM files from my Whole Genome Sequencing test.

Over the last year or so, I had been looking with interest at the language Python (which is not named after the snake but is named after Monty Python’s Flying Circus). Python has been moving up in popularity because it is a new, fast, interpreted, concise, powerful, extensible and free language that can do just about anything and even do a Hello World in just one line. It sort of reminds me of APL (but without the Greek letters), which was my favorite programming language when I was in University.

Well what better time to try Python than now to see if I can run that Travelling Salesman problem.

So this morning I installed the Windows version of Python on my computer. It normally runs from a command prompt, but it comes with a development environment called IDLE that makes it easier to use.

It didn’t take me too long to go through the first few topics of the Tutorial and learn the basics of the language.  I threw in the Traveling Salesman code and sample data from the Google Developers site, and I got an error. The Python ortools package was missing. It took me about an hour to figure out how to use the Python PIP (package manager) to add ortools. Once I did, the code ran like a charm.
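For anyone hitting the same missing-package error, here is a minimal sketch of how to check from within Python whether a package can be found before running the sample. The helper name `is_installed` is made up for illustration. The pip command in the comment is the standard way to invoke pip and is run from the command prompt, not from inside Python.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("ortools"):
    print("ortools is available")
else:
    # Run this from a command prompt (not inside the Python interpreter):
    #   python -m pip install ortools
    print("ortools is missing; try: python -m pip install ortools")
```

Calling pip as `python -m pip` should make sure the package goes into the same Python installation that IDLE is using.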

Fantastic. Now, can I use it for my own purpose? First, I had the map of all the Pavilion locations:

image

There were 22 pavilions in week 1, of which 4 were at our Convention Centre downtown, so in effect there were 19 locations, plus my home where I would start and end from, so 20 in total.

Now how to find the distances between each pavilion?  Well, that’s a fairly simple and fun thing to do. You can do it on Google Maps by selecting the start and end address. When I chose the bicycle icon, it showed me possible routes and the amount of time it would take to bike them.

For instance, to go from the Celtic Ireland Pavilion to the Egyptian Pavilion, Google Maps suggested 3 possible bike routes taking 44 minutes, 53 minutes or 47 minutes. I would choose the quickest one, so I’d take the 44 minute route.

image

Now it was just a matter of using Google Maps to find the time between each of the 20 locations. That’s 20 x 19 / 2 = 190 combinations!  Google Maps does have a Google Distance Matrix API to do it programmatically, but I figured doing this manually once would take less time than figuring out the API. And besides, I liked seeing the routes that Google Maps was picking for me. Google Maps did remember the last entries, so I only had to enter the street number to change the starting or ending location. It wouldn’t take that long.
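The pair count and the data entry can be sketched in Python as follows. This is only an illustration: it assumes the biking time is the same in both directions, so each pair is entered once and mirrored. The function name `build_matrix` and the two "Home" times are hypothetical. Only the 44 minute Celtic Ireland to Egyptian time comes from the example above.

```python
from itertools import combinations


def build_matrix(names, pair_minutes):
    """Build a full symmetric time matrix from one entry per unordered pair."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), minutes in pair_minutes.items():
        i, j = index[a], index[b]
        matrix[i][j] = minutes  # a -> b
        matrix[j][i] = minutes  # b -> a, assumed to take the same time
    return matrix


# Number of unordered pairs among 20 locations: 20 x 19 / 2
pair_count = len(list(combinations(range(20), 2)))
print(pair_count)

names = ["Home", "Celtic Ireland", "Egyptian"]
pair_minutes = {
    ("Home", "Celtic Ireland"): 25,      # hypothetical value
    ("Home", "Egyptian"): 30,            # hypothetical value
    ("Celtic Ireland", "Egyptian"): 44,  # quickest route from the example above
}
matrix = build_matrix(names, pair_minutes)
for row in matrix:
    print(row)
```

Entering each pair once keeps the typing down to the 190 values. If one-way streets made some routes noticeably different in each direction, the mirrored entry would need to be replaced by two separate entries.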

At 1 p.m. was the Legacy Family Tree webinar that I was registered for: “Case Studies in Gray: Identifying Shared Ancestries Through DNA and Genealogy” by Nicka Smith.

image

It was a fantastic webinar. Nicka is a great speaker.

And while I had the webinar on my right monitor, I was Google mapping my 190 combinations on my left monitor and entering them into my Python data set:

image

I finished my data entry just about when the webinar ended at 2:30 pm CST.

Next, I ran the program with my own data, and literally in the blink of an eye, the program spewed out the optimal bike route:

image
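The program I actually ran was the OR-Tools sample from Google Developers. As a rough stand-in that needs no third-party package, here is a sketch of a different technique, the Held-Karp dynamic programming method, which finds an exact shortest round trip for a small number of locations. The function name and the 5 by 5 matrix of minutes are hypothetical and are not my Folklorama data.

```python
from itertools import combinations


def held_karp(matrix):
    """Exact shortest round trip that starts and ends at location 0.

    Returns (total_cost, route), where route is a list of location indices.
    """
    n = len(matrix)
    if n == 1:
        return 0, [0, 0]
    # best[(mask, last)] = (cost of the cheapest path from 0 through every
    # location in mask, ending at last; the location visited before last)
    best = {}
    for k in range(1, n):
        best[(1 << k, k)] = (matrix[0][k], 0)
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << k for k in subset)
            for last in subset:
                prev_mask = mask & ~(1 << last)
                best[(mask, last)] = min(
                    (best[(prev_mask, prev)][0] + matrix[prev][last], prev)
                    for prev in subset
                    if prev != last
                )
    full = (1 << n) - 2  # every location except the start
    cost, last = min(
        (best[(full, k)][0] + matrix[k][0], k) for k in range(1, n)
    )
    # Walk the stored predecessors backwards to recover the route.
    path, mask = [], full
    while last != 0:
        path.append(last)
        prev = best[(mask, last)][1]
        mask &= ~(1 << last)
        last = prev
    return cost, [0] + path[::-1] + [0]


# Hypothetical symmetric biking times in minutes; location 0 is home.
matrix = [
    [0, 12, 20, 31, 18],
    [12, 0, 9, 25, 22],
    [20, 9, 0, 14, 27],
    [31, 25, 14, 0, 16],
    [18, 22, 27, 16, 0],
]
cost, route = held_karp(matrix)
print(cost, route)
```

The work in this method roughly doubles with each added location, so it suits small examples. A routing library such as OR-Tools is the more practical choice as the number of stops grows.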

After 50 years of wanting to one day do this, it took only 6 hours to install and use a new language for the first time, enter 190 routes onto Google Maps, load the data, find my answer, and enjoy a wonderful webinar.

So tomorrow, it will be back to working on version 3 of DMT in the morning, followed by what should be a very pleasant 4 hour (247 minute) afternoon bike ride to all 23 week 1 Folklorama pavilions along the optimal route.

image

And maybe next week, I’ll do the same for the week 2 pavilions.