Louis Kessler’s Behold Blog

Probability of No X Segments Matching - Sun, 25 Dec 2016

Okay. Let’s do for the X chromosome what we did for autosomal in the last post.

I’ll assume you already know the unique pattern of how the X chromosome gets passed down: males get their one X from their mother, and females get one of their Xs from their mother and the other from their father. The mother’s X is a mix from both of her parents, and since the X chromosome (according to FamilyTreeDNA) is 196 cM, it recombines with an average of about 1.96 crossovers, which I will round to 2. The father’s X is passed intact, without recombining, and only to his daughters.

So a son only gets one X chromosome from his mother which will have on average 2 crossovers. A daughter gets one from her mother with 2 crossovers and one from her father with zero crossovers.

This is interesting. That means there is a 50% chance of 2 crossovers if it is a son, which leaves a 25% chance of 2 crossovers and a 25% chance of zero crossovers if it is a daughter. That works out to a 75% chance of 2 and a 25% chance of zero, giving an expected value of 1.5 crossovers per generation.

And that seems to make sense, since if you go up the female line via mother-mother-mother-mother…, you’ll get 2 crossovers each generation. If you go up the line with the most males possible, which is father-mother-father-mother…, you’ll get zero, 2, zero, 2,… crossovers, which average 1 crossover each generation. So 1.5 seems like it could very well be the average over all lines.
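As a quick check of that arithmetic, here is a minimal Python sketch. The function name and the "M"/"F" encoding of a line are my own illustrative choices, and it uses the rounded figure of 2 crossovers for an X passed on by a mother.

```python
# Rounded average crossovers on an X passed down by a mother (from the text).
MOTHER_CROSSOVERS = 2
# A father passes his single X to a daughter without recombination.
FATHER_CROSSOVERS = 0

# Chance that a given X came through a mother:
# 50% (the child is a son) + 25% (the child is a daughter, maternal X).
p_from_mother = 0.50 + 0.25
p_from_father = 0.25

expected_per_generation = (p_from_mother * MOTHER_CROSSOVERS
                           + p_from_father * FATHER_CROSSOVERS)


def average_crossovers(line):
    """Average crossovers per generation along a line of 'M' (mother)
    and 'F' (father) steps."""
    total = sum(MOTHER_CROSSOVERS if step == "M" else FATHER_CROSSOVERS
                for step in line)
    return total / len(line)


print(expected_per_generation)      # 1.5
print(average_crossovers("MMMM"))   # 2.0
print(average_crossovers("FMFM"))   # 1.0
```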

For autosomal, we started with the 23 chromosome pairs and increased them by 34 segments each generation since both pairs total about 3400 cM. Here for the X chromosome, we’ll start with 1 and increase by 1.5 segments per generation. It’s okay if we use fractional segments here because we’re dealing with averages.

For autosomal, we doubled the number of ancestors each generation. The X chromosome grows not by doubling, but via a Fibonacci sequence. As a lover of mathematics, I must say it’s nice to get good old Fibonacci into DNA. A Fibonacci sequence starts with 1 and 1 and then the next number is always the sum of the previous two, so it’s 1, 1, 2, 3, 5, 8, 13, 21,… A male starts with one X chromosome parent, whereas a female starts with two, so they are offset with one another and an overall average can be taken.

Now let’s put the generational levels together:

image

There you see the segments growing 1.5 per generation, the male and female Fibonacci sequences and their average that represents the expected number of ancestors.

The “P(NoMat)” column is the probability of no segments matching a specific ancestor given that there are N ancestors and S segments and is calculated as:

(1 – 1 / N) ** S
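The table columns can be reproduced with a short Python sketch. This is only my reading of the description above: I am assuming generation 1 means the parents, that segments are 1 + 1.5 per generation, and that the ancestor count is the plain average of the male and female Fibonacci counts. The function names are illustrative.

```python
def x_ancestors(generation, sex):
    """Number of X-contributing ancestors `generation` steps back.

    A male has 1 such parent (his mother) and a female has 2; after
    that each count grows as a Fibonacci sequence.
    """
    previous, current = (1, 1) if sex == "male" else (1, 2)
    for _ in range(generation - 1):
        previous, current = current, previous + current
    return current


def x_row(generation):
    """Return (segments, average ancestors, P(NoMat)) for one generation."""
    segments = 1 + 1.5 * generation
    ancestors = (x_ancestors(generation, "male")
                 + x_ancestors(generation, "female")) / 2
    p_no_match = (1 - 1 / ancestors) ** segments
    return segments, ancestors, p_no_match


for g in range(1, 14):
    s, n, p = x_row(g)
    print(f"{g:2d}  {s:5.1f}  {n:6.1f}  {p:7.2%}")
```

For generation 13 this gives 20.5 segments and 493.5 ancestors, which agrees with the figures quoted from the table.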

Finally, we can work out the expected number of ancestors that match on the X chromosome by multiplying the number of Ancestors by the probability of matching (which is 1 – the probability of not matching). For higher generations, this number is the same as the number of segments, because it is very unlikely that such a distant ancestor will contribute more than one segment each.

N * (1 – P(NoMat))

What this table says is that after 13 generations of X chromosomes, you will have on average 20.5 segments. 95.93% of the 493.5 possible X ancestors will not contribute, meaning the 20.5 segments come from 20.1 ancestors, so there is still a chance one or two of them may contribute more than one segment.
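Those generation-13 numbers can be checked directly. The sketch below takes the 493.5 ancestors and 20.5 segments from the text as inputs and multiplies the number of ancestors by the probability of matching, which is 1 minus the probability of not matching. The function name is just illustrative.

```python
def expected_contributors(ancestors, segments):
    """Return (expected number of contributing ancestors, P(NoMat))."""
    p_no_match = (1 - 1 / ancestors) ** segments
    return ancestors * (1 - p_no_match), p_no_match


contributors, p_no_match = expected_contributors(493.5, 20.5)
print(f"{p_no_match:.2%}")     # 95.93%
print(f"{contributors:.1f}")   # 20.1
```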

Comparing the probabilities of not matching with autosomal is interesting:

image

With autosomal, it takes 9 generations before there’s less than a 50% chance that an ancestor will pass you a segment. For the X chromosome, it only takes 6 generations to get below a 50% chance. And there’s even a small chance that you won’t inherit an X-segment after 1 generation. This could happen if the X chromosome from the mother’s side has no crossovers and comes from just one of her parents. See the section: The X Doesn’t Recombine as Expected.
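Here is a rough Python sketch of that comparison. It assumes the two simple models described in these posts (autosomal: 23 + 34 segments per generation with ancestors doubling; X: 1 + 1.5 segments per generation with the averaged Fibonacci ancestor count) and looks for the first generation where the chance of no match passes 50%. All names are my own.

```python
def p_no_match(ancestors, segments):
    return (1 - 1 / ancestors) ** segments


def fib(n):
    """n-th Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def autosomal(generation):
    """(ancestors, segments) under the autosomal model."""
    return 2 ** generation, 23 + 34 * generation


def x_chromosome(generation):
    """(ancestors, segments) under the X model, averaging male and female."""
    ancestors = (fib(generation + 1) + fib(generation + 2)) / 2
    return ancestors, 1 + 1.5 * generation


def first_generation_above_half(model):
    """First generation where P(NoMat) exceeds 50%."""
    generation = 1
    while p_no_match(*model(generation)) <= 0.5:
        generation += 1
    return generation


print(first_generation_above_half(autosomal))      # 9
print(first_generation_above_half(x_chromosome))   # 6
```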

Back to statistics: The Poisson distribution can approximate the number of crossovers per generation. Assuming we are talking about the mother’s X chromosome, which has an average of 2 crossovers, a Poisson distribution with a mean of 2 can give a reasonable estimate of the chance of each number of crossovers in one generation on the X chromosome:

image
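For anyone who wants to regenerate those values, a Poisson probability is easy to compute. This is a minimal sketch using the standard Poisson formula with the rounded mean of 2; the function name is illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


for k in range(7):
    print(f"{k} crossovers: {poisson_pmf(k, 2):.1%}")
```

The k = 0 case, about 13.5% under this approximation, is the no-crossover situation in which the mother’s X comes from just one of her parents.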

One thing left to do. Like we did for autosomal in my last blog post, we also want to determine the average segment length of a match. So we get this:

image

Comparing average segment length of an autosomal match with that of an X chromosome match (above) gives:

image

This shows that autosomal matching segments at any generation are on average a bit longer than X chromosome matching segments.

So now I have everything I need to program this into Behold. Behold will be working with the actual ancestors, will know whether each one is male or female, and will take this into account. This will enable Behold to give more accurate information than what I’ve shown above, which are just averages. Also, Behold will correctly add the probabilities and compute the expected lengths when there’s pedigree collapse and one ancestor is an ancestor on multiple sides. This should be really useful information that I don’t believe is available anywhere else.

My calculations and assumptions above and in my previous post are, as far as I can tell, correct for the averages. I would love to get these two posts peer-reviewed by some genetic genealogists and/or genetic researchers. With encouragement, I could turn these posts into a submission for a publication like the Journal of Genetic Genealogy. I’d be happy to have any problems pointed out and will make any clarifications or corrections that are necessary.

Probability of No Autosomal Segments Matching - Mon, 19 Dec 2016

Back to Behold, but still DNA.

I am adding some DNA features to Behold that I know I need and are not in any genealogy programs currently out there.

Basically, I want to know the expected (i.e. mean) amount of autosomal, X, Y and mt DNA that each person will share with the main person (or people) selected for the family. This is a centimorgan (cM) amount. It is straightforward to figure out, since the expected autosomal amount gets halved every generation, Y and mt only get passed through the male and female lines respectively, and X, although slightly more complicated, is manageable with females getting all their father’s and half their mother’s and males getting half their mother’s.

In addition to that, I want to know the probability of no segments matching. This is important, because if you have a 5th cousin, and you know that there’s, say, a 50% chance that they will not match at all, then you should only expect that half of the 5th cousins who have DNA tested will match you somewhere. And fewer than half of them will show up as matches with your DNA testing company, because the companies have minimum match criteria before they claim two people match, and they need to do that to prevent too many false positive random matches.

I took a look to see if I could find the theoretical probabilities that I needed. I found at the ISOGG page on Cousin Statistics two tables:

I found it very interesting that these two tables give the same information but with slightly different numbers. For instance, 4th cousins are 9 generations (DNA-wise) apart, sharing on average (1/2)^9 = 1/512 of their DNA. And a person and their 7xgreat grandparent also share 1/512 of their DNA. But the 1st table gives 30.70% for 4th cousins, and the second gives 37.43% for 7xgreat grandparents. I would have thought these two numbers should be the same, and I can’t check the original article these were derived from because I’m not a PubMed author and don’t know any PubMed authors who can invite me.

Nonetheless, the numbers in these tables are reasonably close to each other. So now I just need a method to calculate them for any degree of generational distance. I love when I get to do something statistical, which was part of my education and my work. Not too often have I had to use my statistics education for genealogy, so here’s my chance.

Let’s go to Jim Bartlett’s blog post: Crossovers by Generation. Take some time to read it and learn something like I did. I’m going to reproduce Jim’s Table 3:

05D Figure 3

The important columns are the ones marked “Segments” and “Number of Ancestors”. Because there are on average 34 crossovers per generation, the number of segments grows linearly, 34 per generation. But the number of ancestors is growing exponentially, doubling every generation. After 9 generations, there are more ancestors than segments. By generation 13, there are 8192 ancestors, but only 465 segments. That means at most only 465 out of those 8192 ancestors will match, and that’s if none of them match on more than one segment. That already tells you that at least (8192 – 465) / 8192 = 94.32% of your 13th generational relatives will not match you.
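The linear-versus-exponential comparison is easy to check with a few lines of Python. This sketch assumes 23 starting segments plus 34 per generation, which gives the 465 segments quoted for generation 13; the function names are my own.

```python
def segments(generation):
    """Expected autosomal segments: 23 to start, plus 34 per generation."""
    return 23 + 34 * generation


def ancestors(generation):
    """Ancestors double every generation."""
    return 2 ** generation


# First generation with more ancestors than segments.
first = next(g for g in range(1, 30) if ancestors(g) > segments(g))
print(first)                   # 9

n, s = ancestors(13), segments(13)
print(n, s)                    # 8192 465
print(f"{(n - s) / n:.2%}")    # 94.32%
```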

Now let’s use some statistics. The statistical probability of no segments matching given that there are N ancestors and S segments is: 

(1 – 1 / N) ** S

What that says is that for generation 13, there is a 8191/8192 chance of a person not matching in one segment, and the non-match has to be in all 465 segments. Calculate this out and it comes to 94.48%.
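A minimal Python sketch of this calculation for a range of generations follows. It assumes the same segment and ancestor counts as above (23 + 34 per generation, and 2 to the power of the generation), and the function name is illustrative.

```python
def p_no_match(generation):
    """Probability that a specific ancestor contributes no segment."""
    ancestors = 2 ** generation
    segments = 23 + 34 * generation
    return (1 - 1 / ancestors) ** segments


for g in range(1, 14):
    print(f"{g:2d}  {p_no_match(g):8.2%}")
```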

Let’s do that for a bunch of generational levels and compare that to Table 1 and Table 2:

image

Hmmm. Not too bad. In fact the statistical calculation comes very close to the Table 1 numbers. So close that when I plot the three sets of values, you see a small difference only with the Table 2 numbers, while the other two are right on top of each other.

image

Excellent. So now I have validated that these numbers are close enough and that I can therefore use them.

One last thing left to do. The mean amount of autosomal DNA passed down is always halved each generation. On average, that means with a 13 generational difference, the expected DNA shared is 1 / 8192 = 0.01% which would work out to just 1 cM. That’s an awfully small match to be detected.

But that average includes all the ancestors who don’t match at all. We know that 94.48% or 7740 of the 8192 do not match. Better is to show the expected DNA matching when the two people do match. This would then be just 1 / 450 = 0.22% which would work out to an expected average match of 15 cM for the 450 people 13 generations apart that do match.
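The same step can be sketched in Python. Note the total of 6800 cM is my assumption, chosen because it reproduces the 15 cM figure; the post does not state which total it uses. The unrounded count of matching ancestors comes out near 452, where the text rounds to 450.

```python
TOTAL_CM = 6800  # ASSUMPTION: total autosomal cM; not stated in the post


def match_stats(generation):
    """Return (P(match), expected matching ancestors, average cM if matched)."""
    ancestors = 2 ** generation
    segments = 23 + 34 * generation
    p_match = 1 - (1 - 1 / ancestors) ** segments
    matching_ancestors = ancestors * p_match
    avg_cm_if_match = TOTAL_CM / matching_ancestors
    return p_match, matching_ancestors, avg_cm_if_match


p, m, cm = match_stats(13)
print(f"{p:.2%}  {m:.0f}  {cm:.1f} cM")   # 5.52%  452  15.0 cM
```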

Let’s try this for the whole range of generations:

image

So now I have what I need. Behold is going to show:

  1. The probability of having a DNA match (e.g. 5.52%)
  2. The average match length if they do match (e.g. 15.0 cM)

Let me of course add 100 caveats. These are approximate values. The actual percentages may vary. Matching cM may vary greatly, etc., etc., blah, blah.

Running DMT Against Non-Matches - Tue, 13 Dec 2016

I happened to come across a post by Robert Davis on the Ulster Co NY Y-DNA group at FamilyTreeDNA.

Robert said:

Double Match Triangulator …would be of use in finding links (common matches) between two individuals that are not themselves matches. and hence the ICW tool of FF is of no use.

That was an excellent observation. And I don’t talk much about non-matching people in my writeup on my DMT page or in its help file (although I do include one non-matching person in the sample files). Using DMT on non-matching people who are possible relatives is something that you will want to do.

Why is this so? Well, it’s simply a matter of probabilities. Once you get down into 4th, 5th and 6th cousins, there is a good chance that your cousin will not reach the threshold needed to make it into your match list. Either their longest match in common does not meet FamilyTreeDNA’s threshold, or the total cM length of the common matches does not meet the threshold.

However, you may find that this person’s sibling or parent does match. Therefore you know they are related, but FamilyTreeDNA gives you no tools to check that.

So DMT to the rescue.

When you compare your Chromosome Browser Results file to someone who does not match you, you will not get any Full Triangulations. You will only get Double Matches with a Missing a-b segment. That’s okay. Go ahead and analyze those. They won’t be the same segment passed down from a common ancestor, but they could very well be two different segments from a common ancestral line. See Triangulation and Missing a-b Segments.

In fact, you may have hundreds or thousands of people who match you. Every one of those is a candidate to be Person b in your DMT runs, and you should see if they are willing to download and let you use their CBR file. But their siblings, parents, and cousins on the related side are also candidates as Person b. If you can, ask for any of the CBR files that they administer. Of course, tell them that you’ll keep their information private and not give it or disclose it to anyone, and tell them that you’ll let them know what you find.

I have received 63 Chromosome Browser Download files from possible DNA-relatives of my uncle. 37 of them show up in my uncle’s match list. Of the other 26, all but 3 have significant Double Matches with my uncle.

image

Take a look at the above table, which is my People file produced when I use DMT to run all 63 people against my uncle. Column A has my uncle. The yellow names at the top are some of the 26 people who don’t match my uncle and for whom I have a CBR file. The people in Column B are the 37 people my uncle matches.

The values shown do not have any Triangulations. Those would show up in green and the numbers would be preceded by “T” instead of “D”.

But you can see some very significant Double Matches, such as the 24.95 cM matches at the top left between Harry and Erika, Harry and Steve, and Harry and Mark. You’ll also see some very useful X-Chromosome matches that are shown in red. When the numbers are the same, it is very likely they’re referring to the same segment, but you’ll have to check the Map page to be sure.

Notice Andrew’s column. He is one of the 3 that don’t Double Match anyone and can be presumed to be a non-relative. Negative information like that can also be valuable.

So I wanted to point this out and write my thoughts down before I forget. Using DMT for non-matches is yet another way that DMT can prove to be useful.