Louis Kessler’s Behold Blog

Can Visual Phasing be Programmed? - Fri, 20 May 2022

Visual Phasing is a technique to assign DNA segments of 3 siblings to their four grandparents. I don’t happen to have 3 siblings who all tested their DNA, so up to now, I’ve never had a personal use for the technique.

A few months ago for a DNA study we are doing, my wife’s 3rd cousin Terry was kind enough to provide me with GEDmatch diagrams of himself and his two siblings on each chromosome. This allowed me to visually phase Terry’s grandparents to his and his siblings’ chromosomes.


Doing Visual Phasing on Terry with His Siblings

I will not try to teach you everything about Visual Phasing. It is quite involved and there are many good explanations of it (e.g. Blaine Bettinger 2016). But I will point out anything that is relevant to the problem at hand.

This is what the GEDmatch diagrams look like for Chromosome 1:

image

So there are 3 comparisons, one for each pair of siblings.

The yellow areas are matches between the two siblings known as HIR (Half Identical Regions). That’s where they match on one parent but not the other.

The green areas are matches between the two siblings known as FIR (Fully Identical Regions). That’s where they match on both parents.

The red areas are where they don’t match on either parent.

This particular set of comparisons, unlike some Visual Phasing cases, is very simple to solve. It has very clear region boundaries.

The trick is to find recombination points. These should be where two of the three comparisons change their match status, i.e. color. I’ve added vertical lines to the diagram to show the recombination points.

Under each line is the Mbp (Mega base pair) position of the recombination along with the first letter of the “owner” of the recombination, i.e. the sibling who recombined at that point on either his/her father’s or mother’s chromosome. The owner is the person listed in both of the pairs having match status changes.
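
The owner rule can be written down in a few lines. This is only a minimal sketch of that one rule, not a working Visual Phasing program. The function name and the sibling names "Brother" and "Sister" are placeholders I made up for illustration, and it assumes the hard part (deciding which pairs changed color at a given position) has already been done by eye.

```python
from collections import Counter

def recombination_owner(changed_pairs):
    """Return the sibling listed in both pairs whose match status changed.

    changed_pairs: list of 2-tuples of sibling names, one tuple per
    comparison that changes color at this position (hypothetical input).
    Returns None when the rule does not give a single clear owner.
    """
    # The rule only applies when exactly two comparisons change.
    if len(changed_pairs) != 2:
        return None
    counts = Counter(name for pair in changed_pairs for name in pair)
    owners = [name for name, n in counts.items() if n == 2]
    # Exactly one sibling should appear in both changed pairs.
    return owners[0] if len(owners) == 1 else None

# Terry-Brother and Terry-Sister both change color, so Terry owns it.
print(recombination_owner([("Terry", "Brother"), ("Terry", "Sister")]))
```

Returning None for anything other than two changed pairs is a deliberate choice here: those are the unclear cases that a person would have to look at.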

Using Visual Phasing rules, I can now assign two grandparents: f1 and m1 to one of Terry’s segments in the middle of the chromosome and extend it to Terry’s recombination points. Then I can use the pair matches and logic to see what the segments for Terry’s siblings need to be:  f1, f2, m1 or m2. I extend those and repeat. In this case, I am lucky: I can completely fill out both chromosomes for all 3 siblings, which isn’t always possible. This is what I got:

image


Filling Out the Parent Maps

The next step was to determine which grandparent each of f1, f2, m1 and m2 represents.

For this we need 2nd cousins who will have a set of great grandparents as their common ancestor with the siblings. A second cousin is connected through just one grandparent, so their matches should be able to be used to determine who each grandparent is.

Terry has five 2nd cousins tested on his father’s father’s side (ff) and two 2nd cousins tested on his mother’s father’s side (mf). He doesn’t have any tested on either his father’s mother’s side or his mother’s mother’s side, but that doesn’t matter.

These are the matches each of the siblings have with these 2nd cousins that are 15 cM or more:

image

In total, the 3 siblings match on 10 segments with their 2nd cousins.

The matches with the cousins that are on the siblings’ father’s father’s side are all denoted as f1 in the earlier map, so f1 = father’s father and f2 = father’s mother.

The matches with the cousins that are on the siblings’ mother’s father’s side are all denoted as m1 in the earlier map, so m1 = mother’s father and m2 = mother’s mother.
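
The deduction in the last two paragraphs can be sketched in code. This is an illustration under assumptions, not what any existing tool does: the function name is made up, and the observation list in the example is invented rather than taken from Terry’s actual match table. Each observation pairs the known side of a 2nd cousin (ff, fm, mf or mm) with the label (f1, f2, m1 or m2) that the matching segment carries in the map.

```python
def resolve_labels(observations):
    """Map the labels f1/f2/m1/m2 to grandparents ff/fm/mf/mm.

    observations: iterable of (known_side, label) tuples, e.g. ("ff", "f1").
    Raises ValueError if the observations contradict each other.
    """
    partner_side = {"ff": "fm", "fm": "ff", "mf": "mm", "mm": "mf"}
    partner_label = {"f1": "f2", "f2": "f1", "m1": "m2", "m2": "m1"}
    mapping = {}
    for side, label in observations:
        # A paternal cousin must land on an f label, a maternal one on an m label.
        if side[0] != label[0]:
            raise ValueError(f"{side} cousin matched on {label}")
        # Fixing one label also fixes its partner on the same parent.
        implied = ((label, side), (partner_label[label], partner_side[side]))
        for lab, sd in implied:
            if mapping.setdefault(lab, sd) != sd:
                raise ValueError(f"conflicting evidence for {lab}")
    return mapping

# Invented example: two ff cousins on f1 segments, one mf cousin on m1.
print(resolve_labels([("ff", "f1"), ("ff", "f1"), ("mf", "m1")]))
```

Note that one tested side per parent is enough to resolve both labels for that parent, which is why having no cousins on the fm or mm sides doesn’t matter.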

So we can now go back and color the chromosomes:

  • ff (father’s father) in blue,
  • fm (father’s mother) in green
  • mf (mother’s father) in pink
  • mm (mother’s mother) in yellow

image

Ta Da!! We’ve done it.

Is this accurate? Well I would say it would have to be. It uses recombination logic that is checked against 2nd cousin segments and is consistent with itself. The GEDmatch diagrams, which show if each SNP position matches zero, one or both of the other sibling’s alleles, seem to be quite definitive.


Boundaries Are Not Exact

GEDmatch provides addresses of where these segments start and end:

image

If you compare the end positions that should coincide with the start positions, you’ll get:

image

And they’re not the same between companies either. The HIR matches shown above at Family Tree DNA are:

image

That’s as much as a 10.9 cM difference between what GEDmatch and Family Tree DNA’s matching algorithms produce as matches.

My example is a relatively well-behaved example. Sometimes the HIR and FIR boundaries that GEDmatch produces don’t all visually correspond to the yellow, green and red regions it displays. That’s because GEDmatch’s matching algorithm sometimes adds a new boundary where you don’t see one, or excludes a boundary where you think one should be.

For example, GEDmatch does not list the FIR between 85 and 89 that is clearly a green section in the diagram. Leaving out the FIR would prevent proper analysis of this chromosome. The HIRs starting and ending at 85 and 89 would then have to be considered to be a single recombination and the ensuing analysis would be done wrong.


So Can Visual Phasing Be Programmed?

I have really wanted to find a way to do this via a program. There is a lot of manual work to get the final grandparent map. And it’s not a foolproof procedure.

The Visual Phasing Working Group on Facebook hosts the Visual Phasing Spreadsheet by Steven Fox in its Files section. Just a few months ago, Steven uploaded Version 2.6 of his spreadsheet along with an updated user guide.

image

Basically, the spreadsheet allows you to do what I did above, and gives you assistance along the way. But you still have to visually select the boundaries between the yellow, green and red regions and assign the grandparents.

Steven himself in his user guide states:

Things I would love to be able to do but can’t think of a way to do them…… yet.
    • Obtain the location and identify the owner of the recombination points automatically.
    • Find a programmatic way to phase automatically!

If the GEDmatch segment boundary points were exact and always corresponded to the visual color boundaries, then maybe they could be used. But unfortunately they are not and don’t.

Determining the boundaries visually from the yellow, green and red regions is a perfect problem for a human, who sees patterns easily and can round off boundaries. Computers don’t do that nearly as well as humans do.

And then there’s the resolving of unclear circumstances, e.g. where sections are partly yellow and green, or apparent recombination points that only seem to change one pair rather than the two. And what always messes up any Visual Phasing is when two of the siblings have a recombination point very near each other. What if the parents of the siblings are related? The list goes on.

I have thought about possible ideas for programming Visual Phasing, but as of yet, I still have not come up with a decent way to program the whole thing. Yes of course it is possible, but it’s not worth anything unless it could get at least as good results as a person can.

Visual Phasing has been around since the mid 2010s. If there had been an easy way to program this accurately, then one of the many very smart 3rd party DNA tool developers would have done so by now.

Double Match Triangulator, Version 5.0 - Wed, 18 May 2022

I’ve released version 5.0 of Double Match Triangulator (DMT), a tool to help you analyze your autosomal segment matches and help you determine how you might be related to your DNA matches.


Six Months of Analysis

Over the winter and spring, I was working hard on a project with Terry, my wife’s 3rd cousin, to see if we could confirm via DNA that their great-great-grandparents and 4 others were in fact siblings. Terry had got 44 people who were descendants of the 6 potential siblings to DNA test at Family Tree DNA. He was able to supply me with Segment Match Files for 26 people.

My goal was to see if I could use DMT to provide us with useful segment information that could confirm/deny the sibling relationships. I was hoping that maybe I could use DMT to help build a chromosome map of each tester’s ancestors up to the potential sibling, and then see if that tells us anything.

I was not able to get quite that far, and I’ll hopefully blog more about this once we get some reportable results from this study.

But what did happen is that I made a number of useful changes to Double Match Triangulator that helped me, and now I’m making these changes available in my release of Version 5.0 of DMT.


Triangulations on One Parent

Up until now, DMT was based on the idea that a segment match with a relative generally matches only on one parent.

For example, a 2nd cousin with whom your common ancestor is your father’s mother’s parents should have all the matches on your father’s chromosome.

A 1st cousin with whom your common ancestor is your maternal grandparents should have all their matches on your mother’s chromosome.

Your nephew or niece or child will have each segment matching either on your father’s side or on your mother’s side.

The major exception to the rule is siblings. They will have on average 1/3 of their matches on your father’s side, 1/3 on your mother’s side, and 1/3 on both.
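
Here is where the thirds come from, as a small arithmetic sketch. It assumes the textbook expectation that, at any position, two full siblings received the same copy from a given parent half the time, independently for each parent. The function name is made up, and real siblings will vary around these averages.

```python
from fractions import Fraction

# Assumed chance that two full siblings inherited the same
# grandparental copy from one particular parent at a given position.
SAME_COPY = Fraction(1, 2)

def sibling_match_shares(p=SAME_COPY):
    """Split the matching part of the genome into father only, mother only, both."""
    father_only = p * (1 - p)   # half identical, paternal side
    mother_only = (1 - p) * p   # half identical, maternal side
    both = p * p                # fully identical
    matching = father_only + mother_only + both  # excludes the no-match quarter
    return {
        "father only": father_only / matching,
        "mother only": mother_only / matching,
        "both": both / matching,
    }

for side, share in sibling_match_shares().items():
    print(side, share)
```

So a quarter of the genome is expected not to match at all, and the remaining three quarters split evenly three ways.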

There are other exceptions (false matches, being related more than one way, endogamy, etc.) but these are less common and the thinking was that segment matches would be more often correct than incorrect. And using the best consensus should provide the correct answer more often.

What DMT does is determine all the triangulations that Persons A and B have with other people. For those triangulations where the MRCA (Most Recent Common Ancestor) of Person A with Person C was known, the ancestral path was “painted” to the triangulated segment.  Once that was done, the consensus ancestral path was determined for triangulations at each Mbp (Mega base pair).

Of note here is that there was only one ancestral path for all triangulations at any Mbp. All other triangulations and non-triangulations for people without MRCAs were then assigned the same ancestral path. Every person could only match either their father or their mother on one segment, but not both.

Siblings were assigned the consensus ancestral path over their matching segment. It would not include both parents when it double matched, but it wouldn’t be wrong either.

This has always seemed to work fairly well.


Triangulations on Two Parents

For our study, we had a number of siblings involved, including Terry and his brother and sister. I did want to be able to make more use of the sibling data with DMT. So I experimented and came out with a new idea.

Instead of one consensus ancestral path from the triangulations over a segment, how about determining two:  One for any triangulations on the father’s side, and one for any triangulations on the mother’s side.

If all triangulations are on just one parent’s side, then we’d have exactly what we had before. But if both sides had some, then we’d have new information to use.

So now we have four possibilities:

  1. Triangulations only on father’s side.
    Other triangulations can be mapped to the father and the ancestral path.
    Other non-triangulations can be mapped to the mother.
  2. Triangulations only on mother’s side.
    Other triangulations can be mapped to the mother and the ancestral path.
    Other non-triangulations can be mapped to the father.
  3. Triangulations on both father’s side and mother’s side.
    Other triangulations and non-triangulations have unknown side.
  4. No triangulations.
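
The four cases above can be sketched as a small decision table. This is a simplified illustration of the logic only, not DMT’s actual code. The function names and the True/False inputs are assumptions made for the example.

```python
def classify_position(has_father_triangulation, has_mother_triangulation):
    """Return the case number (1 to 4) from the list above."""
    if has_father_triangulation and not has_mother_triangulation:
        return 1
    if has_mother_triangulation and not has_father_triangulation:
        return 2
    if has_father_triangulation and has_mother_triangulation:
        return 3
    return 4

def infer_parent(case, triangulates):
    """Guess the parent for a match whose side is not known on its own.

    triangulates: True if this match triangulates at the position.
    Returns "father", "mother", or None when the side stays unknown.
    """
    if case == 1:
        return "father" if triangulates else "mother"
    if case == 2:
        return "mother" if triangulates else "father"
    return None  # cases 3 and 4 give no answer by themselves

# Known triangulations only on the father's side (case 1):
case = classify_position(True, False)
print(case, infer_parent(case, True), infer_parent(case, False))
```

Cases 3 and 4 return None on purpose, since in those cases the parent has to come from somewhere else, such as a known MRCA.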

The nice thing here is that cases 1 and 2 happen a lot more often than case 3. So you can quite often map all the other segment matches whose parent cannot be determined on their own to the correct parent depending on whether that segment triangulates or not.

And in case 4, you will occasionally have a single match on its own with someone whose MRCA tells you what parent it’s on. One single match, or even two on opposite parents is fine. But there shouldn’t be more than one on the same parent because if there were, then they should be triangulating. That’s how the logic of all this works out.

The long and the short of this all is that I made this major change to DMT. I have done extensive testing on it. Overall it makes better use of sibling information and seems to do a slightly better job of assigning ancestral paths than it did before.

https://www.doublematchtriangulator.com/img/dmt-main-window.png


Other Changes

There are a lot of other changes I snuck into DMT while doing this. Check the DMT history page for a full list of all changes.

If you have an older version of DMT on your computer, you can run it and click the “Check online for new version” link and follow the instructions for a quick and easy upgrade. If you don’t have DMT, you can download and try it at:  www.doublematchtriangulator.com

By the way, Jonny Perl, developer of DNA Painter, will be the keynote speaker on Friday Aug 19 for the SCGS Jamboree 2022. Jonny will be talking about “The Need for Third Party DNA Tools” and will include DMT in his talk.

I Don’t Understand Why … - Sat, 7 May 2022

A few questions that plague me.


… Why Some Genealogists Keep Their Main Tree Private

Isn’t the purpose of genealogy to preserve and share the history and stories of your ancestors and their families? So what then is the purpose in a genealogist designating their main tree as private and preventing others from seeing it?

I’m not talking about privatizing living individuals, or family relationships that the family has not yet come to grips with. I’m not talking about private research trees that contain speculative information. I’m talking about a genealogist’s main tree where they’ve recorded the majority of their research.

Are they afraid someone will copy their precious life’s work?


… Why Some Genealogists Hate Others Copying Their Work

Yes, you’ll see profiles on other websites that are obviously taken from your own. But if you’re an experienced genealogist who has thoroughly researched and sourced your information, isn’t that what you want? You’re sure your information is as close to correct as possible. If that’s the truth and other people extend their trees into yours, wouldn’t you rather they add your correct information to their tree than someone else’s incorrect information?

I would sooner get “hints” from MyHeritage or Ancestry that are from copied versions of my thought-to-be correct information than from copied versions of other people’s incorrect information.

You’re not going to stop people from copying information from other trees. So why not make your information available and let it become the standard that is copied, rather than someone else’s?


… Why Some Genealogists Protect Their Sources

Here, I’m talking about the case where someone says “I’ve spent thousands of dollars on research and thousands of hours of my time to do my research” and it seems that they don’t want to share that hard work with others.

Basically that is to force those people to do the research for themselves. All that’s doing is getting others to spend their time needlessly repeating what you’ve done when they could be furthering the work on that ancestor’s family instead. That further work might actually help the person who was not sharing!

There may be some circumstances where the source document itself needs to be protected. For example, I have a researcher who has paid to acquire records himself and charged me for the specific records that are relevant to me. I am not sure but currently don’t believe I have the right to share the source documents. However I still can refer to the birth record of so-and-so on yyyy-mm-dd in ppppppp as a source that I used. See the question I just asked at Genealogy StackExchange:  Do I have the right to share these records?


… Why Some Genealogists Hate One World Trees

Okay. The main answer to this one is easy. It’s because people come along and change your data. I won’t go into the details because you know what I’m talking about.

That is why it is good for us to have a master version of our family tree that only we can edit, either on our computer, and/or on an online family tree site that we allow read-only access to.

Adding our family trees to One World Trees is another way to preserve our genealogies. We can add people who are not there, and attempt to correct information that is there with sources to explain our corrections. And we can find potential family that we did not know about, and follow up on them.

I’d be very happy and it would make sense if there were just a single unique One World Tree. But there are half a dozen of them, and they are all different with different information. Updating one when you get new data is not so bad, but updating 6 is a bit of a pain.

Yes, your data may get changed, but the potential benefits to you outweigh that one negative. Expect it. It will happen.

So I say don’t get hung up on it. Some aspects of One World Trees are self-correcting over time. If you’ve been sharing your well documented and sourced information for a while, it will eventually start to have its influence.