
Louis Kessler's Behold Blog

Revisiting Missing A-B Matches - 1 day, 23 hrs ago

I’ve been working away the past couple of months to finish off Version 2 of Double Match Triangulator. There’s lots of major improvements, and I’m hoping to find ways to get it to do some of the analysis for you as well.

What Double Match Triangulator does differently from other DNA matching tools is that it uses not just the segment matches that one person has, but the segment matches of two people together. Let’s call you Person A, and the person you’re matching with Person B. Double Match Triangulator compares the segment matches Person A has with other people to the segment matches Person B has with other people. Any that overlap are called a Double Match.

For example, Person A matches Person C on Chromosome 1 from position 72017 to position 7238701. Person B matches Person C on Chromosome 1 from position 3528942 to position 12148559. They Double Match between positions 3528942 and 7238701.
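Finding a Double Match is just an interval overlap: take the later of the two start positions and the earlier of the two end positions. Here is a minimal Python sketch of that arithmetic, assuming each match is stored as a chromosome plus start and end positions in base pairs. The `Segment` type and `double_match` function are hypothetical names for illustration, not how Double Match Triangulator itself is written.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """A segment match on one chromosome, in base-pair positions."""
    chromosome: str
    start: int
    end: int


def double_match(a_c: Segment, b_c: Segment) -> Optional[Segment]:
    """Return the stretch where the A-C and B-C matches overlap, or None.

    Hypothetical helper: segments that merely touch at a single end
    position are treated as not overlapping.
    """
    if a_c.chromosome != b_c.chromosome:
        return None
    start = max(a_c.start, b_c.start)  # later of the two starts
    end = min(a_c.end, b_c.end)        # earlier of the two ends
    if start >= end:
        return None
    return Segment(a_c.chromosome, start, end)


# The example from the text
a_c = Segment("1", 72017, 7238701)      # Person A matches Person C
b_c = Segment("1", 3528942, 12148559)   # Person B matches Person C
print(double_match(a_c, b_c))
# Segment(chromosome='1', start=3528942, end=7238701)
```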

The magical thing about this Double Match is that now we can just look in Person A’s matches to see what segment matches he has with Person B (or in Person B’s matches, since both will show the same matches with each other). If Person A matches Person B over some part of the Double Match segment with Person C, we have a triangulation: Person A matches Person C, Person B matches Person C, and Person A matches Person B – all over the same segment.

If Person A and Person B don’t match on that segment, we have what I call a Missing A-B match. We still have the Double Match of Person A with Person C and Person B with Person C.

You would think a Double Match that does not triangulate should just be considered a random match. I can match someone, and you can match the same person, but we don’t have to be related. I can be a cousin of Person C on their dad’s side and you can be a cousin on their mom’s side.

But when using Double Match Triangulator, we choose the B people to be people whom we, as Person A, match. With Person B being a DNA relative, the Double Matching of Double Match Triangulator is more powerful than the matches you see lining up on the same segment in a Chromosome Browser. In a Chromosome Browser, you have no indication of whether the people lining up with each other are related to each other or not. But in Double Match triangulation, you are picking B people who are related to you. It is like using a Chromosome Browser that includes only people who are guaranteed to be related to each other. That’s one of the things that makes Double Matching so useful. The other, of course, is that Double Matching will find in one fell swoop EVERY TRIANGULATION that two people have between them.
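To show how Double Matching can find every triangulation at once, here is a rough Python sketch of the idea: index Person B’s matches by the person matched, overlap each of Person A’s matches with Person B’s matches to the same Person C, and then check each resulting Double Match against the segments where A and B match each other. All the people and positions below are made up, and the sketch ignores any minimum-size thresholds a real tool may apply. It illustrates the concept; it is not DMT’s actual code, and `Match` and `double_match_report` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    """One segment match with a named person, in base-pair positions."""
    person: str
    chromosome: str
    start: int
    end: int


def double_match_report(a_matches: List[Match], b_matches: List[Match],
                        a_b_matches: List[Match]) -> Dict[str, List[str]]:
    """Sort every Double Match into triangulations and Missing A-B matches."""
    # Index B's matches by the Person C they are with
    b_by_person = defaultdict(list)
    for m in b_matches:
        b_by_person[m.person].append(m)

    report: Dict[str, List[str]] = {"triangulation": [], "Missing A-B": []}
    for a_c in a_matches:
        for b_c in b_by_person.get(a_c.person, []):
            if a_c.chromosome != b_c.chromosome:
                continue
            start, end = max(a_c.start, b_c.start), min(a_c.end, b_c.end)
            if start >= end:
                continue  # same Person C, but the segments don't overlap
            # Does A also match B over some part of this Double Match?
            triangulates = any(
                ab.chromosome == a_c.chromosome
                and max(start, ab.start) < min(end, ab.end)
                for ab in a_b_matches)
            label = "triangulation" if triangulates else "Missing A-B"
            report[label].append(f"{a_c.person} chr{a_c.chromosome}:{start}-{end}")
    return report


# Hypothetical data: C1 triangulates, C2 is a Missing A-B match,
# C3 and C4 are each matched by only one of A and B.
a_matches = [Match("C1", "1", 72017, 7238701), Match("C2", "5", 1000000, 20000000),
             Match("C3", "7", 500000, 3000000)]
b_matches = [Match("C1", "1", 3528942, 12148559), Match("C2", "5", 15000000, 30000000),
             Match("C4", "7", 500000, 3000000)]
a_b_matches = [Match("B", "1", 5000000, 9000000)]  # where A and B match each other

print(double_match_report(a_matches, b_matches, a_b_matches))
```

Because Person B is chosen from people Person A already matches somewhere, any Double Match that finds no overlapping A-B segment lands in the Missing A-B list.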

The important thing about a triangulation is that all IBD (Identical By Descent) segments that are passed down from a common ancestor must triangulate. By looking only at triangulating segments, you are eliminating many segments that are false (guaranteed not to be IBD) and increasing your chance of finding IBD segments.

If we didn’t match Person B, then none of the Double Matches would triangulate, and we would be no better off than using a Chromosome Browser and verifying the Person B-C matches one by one.

Back to the Missing A-B match

Unless you Double Match, you won’t find any Missing A-B matches. When looking in a Chromosome Browser as Person A, you’ll find Person B and Person C overlapping. You’ll then find out from either Person B or Person C whether they match each other on that segment, and if they do, they triangulate. If they don’t, then you have a Missing B-C match, but that is not a missing match with you, Person A.

By Double Matching, all the B-C matches are verified in advance. If Person A also matches Person B, then you have a triangulation. If not, you have a Missing A-B match.

A Missing A-B match is therefore one where Person A matches Person C and Person B matches Person C, but Person A does not match Person B on the same segment. The important caveat is that you also know Person A and Person B are related: they match and triangulate on other segments, just not this one.

Missing A-B matches are not triangulations. Therefore they cannot be IBD. So what possible use then can they have?

It took me a while to figure this out. In fact I was going to take the display of Missing A-B matches out of Version 2 of DMT and I even took all the code out of the program that displayed them. But after doing some more work, I realized they were important, and put that all back in.

What got me thinking was the number of Missing A-B matches. There were often as many Missing A-B matches as triangulations, if not more.

When I last puzzled over this, about six months ago in my Triangulation and Missing a-b Segments article, I came to the conclusion that Missing A-B matches could point to a common ancestor. But the way I had figured it, this required both of Person C’s parents to be descended from the same common ancestor or ancestor pair, in one of two possible configurations.

But I never really thought that either of these two scenarios would be common enough to produce so many Missing A-B matches. Maybe in an endogamous population. But in my test runs, even non-endogamous populations are loaded with Missing A-B matches. So what gives?

I finally figured out another case. Here are some results from a run of DMT using close relatives:

[Image: results from a DMT run using close relatives]

Here we have a daughter as Person A, double matched with her uncle as Person B, who is her mother’s brother. They match each other between positions 72017 and 7238701, as shown on the Base AB line.

On the next line, the daughter of course triangulates with her mother and her uncle. The daughter half-matches her mother everywhere, and since it was her mother who passed that segment down to her, her mother must also match her uncle there. Therefore they triangulate over that segment. That segment came from the common ancestor, the grandparent who passed it down to the uncle, the mother and the daughter.

On the third line, we pick the brother of the daughter as Person B. The daughter we already know matches the uncle on that segment. And her brother does as well. They Double Match. But the brother does not match his sister on that segment. What happened in this case is that mother passed down the other grandparent’s segment to the son. Mother still matches the son, but the son and daughter have different grandparent segments from their mother. The brother matches his uncle on the other grandparent’s segment. This is a Missing A-B match. The diagram looks like this:
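A toy model makes this mechanism concrete. The sketch below looks at a single position on the mother’s side only. It assumes the mother carries two different grandparent segments there, labelled H1 and S1, that the uncle carries both of them, and that each child receives one of the mother’s two copies with equal chance. It ignores the father’s side, which in real data could still make the siblings match. The labels, constants and helper are illustrative only.

```python
from itertools import product

# Toy single-position model of the maternal side only (hypothetical labels).
# The mother carries two different grandparent segments here, H1 and S1,
# and the uncle (Person C) is assumed to carry both of them too.
MOTHER = ("H1", "S1")
UNCLE = {"H1", "S1"}


def shares_segment(x: set, y: set) -> bool:
    """Two people match here if they carry at least one segment in common."""
    return bool(x & y)


missing_ab = 0
# Each child independently receives one of the mother's two copies,
# so these four combinations are equally likely.
for daughter_seg, son_seg in product(MOTHER, repeat=2):
    daughter, son = {daughter_seg}, {son_seg}
    double_match = shares_segment(daughter, UNCLE) and shares_segment(son, UNCLE)
    if not double_match:
        outcome = "no Double Match"
    elif shares_segment(daughter, son):
        outcome = "triangulation"
    else:
        outcome = "Missing A-B"
        missing_ab += 1
    print(f"daughter={daughter_seg} son={son_seg}: {outcome}")

print(f"{missing_ab} of 4 combinations give a Missing A-B match")
```

Under these simplifying assumptions, whenever the two children receive different maternal segments, their Double Match through the uncle shows up as a Missing A-B match.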

[Image: diagram of the daughter, brother and uncle case]

This can happen at more distant relationships as well. For example, with first cousins, we can have Cousin 1 as Person B, who triangulates with the daughter and the uncle (Person C), and Cousin 2 as Person B, who still double matches but doesn’t triangulate because of a Missing A-B match, as shown in the 4th and 5th lines of the spreadsheet above.

The diagram illustrating this cousin situation is:

[Image: diagram of the first cousin case]

Person A, the daughter, matches her uncle on H1 and Cousin 1 on H1. Her brother and Cousin 2 both match her uncle on S1. The daughter matches Cousin 1 on H1 but does not match her brother or Cousin 2.

So here we have a situation where a Missing A-B match occurs that is not caused by Person C’s parents being related and descended from a common ancestor, as I had surmised in my previous article. This new situation is likely much more common and is probably the reason why there are so many Missing A-B matches.

In fact, it can happen with relationships more distant than 1st cousins. Here’s an example with a 2nd cousin and a 3rd cousin as Person B, resulting in a Missing A-B match with the daughter as Person A and the uncle as Person C.

[Image: 2nd cousin and 3rd cousin example]

And just so you don’t think that Person B or Person C necessarily has to be a close relative of Person A, they can be close relatives of each other instead. For example, below, the Son is Person A, the 3rd Cousin is Person B, and the Uncle of the 3rd Cousin is Person C:

[Image: Son, 3rd Cousin and Uncle of the 3rd Cousin example]

This does require that whoever is Person C receives both of the common segments, one from each parent, so that one matches Person A and the other matches Person B.
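Why Person C needs both segments follows from the definitions: if Person A and Person B both matched Person C through the same segment, they would share that segment and so match each other. A small sketch of that logic, using hypothetical segment labels (F1 and F2 stand for the siblings’ different paternal segments, and U2 for some other segment the uncle might carry); `is_missing_ab` is an illustrative helper, not part of DMT:

```python
def is_missing_ab(a: set, b: set, c: set) -> bool:
    """True if A and B each share a segment with C here but none with each other."""
    return bool(a & c) and bool(b & c) and not (a & b)


# Hypothetical segment labels at one position
daughter = {"H1", "F1"}   # maternal H1, paternal F1
son = {"S1", "F2"}        # maternal S1, paternal F2

uncle_with_both = {"H1", "S1"}   # one segment from each of his parents
uncle_with_one = {"H1", "U2"}    # carries H1 but not S1

print(is_missing_ab(daughter, son, uncle_with_both))  # True
print(is_missing_ab(daughter, son, uncle_with_one))   # False: the son doesn't match him
```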

The bottom line is that a Missing A-B match indicates that Person C, who matches both Person A and Person B, could have a common ancestor with Person A and Person B. The common ancestor would have passed a segment down to Person C and also to either Person A or Person B.

What’s important to understand from all this is that, although they are not IBD to Person A, Missing A-B matches may still indicate the possibility of a common ancestor with Person C, and they should not be ignored.

Chess and Artificial Intelligence: The Future Changed Today - Wed, 6 Dec 2017

The shocking and unbelievable news hit my inbox today, and I just couldn’t be more amazed by what has just happened and what the future effects will be.

Today it was announced that DeepMind, a company formed in 2010 and purchased by Google in 2014 to investigate the possibilities of using neural networks for machine learning, has created a program that, given just the rules of a game, can play against itself, learn, and reach champion level.

Everyone remembers Kasparov being defeated by the chess program Deep Blue in 1997. Go is a tougher game for computers. In March 2016, DeepMind’s program AlphaGo defeated the world champion Lee Sedol.

But just a year and a half later, in October 2017, the algorithm achieved superhuman performance in the game of Go using only 8 hours of training.

The next month, the algorithm achieved superhuman performance in the complex Japanese board game of shogi with 2 hours of training, and then it trounced Stockfish, one of the top computer chess programs in the world after just 4 hours of training.

AlphaZero played 100 games of chess against Stockfish. As white, it scored 25 wins, 25 draws and no losses. As black, it scored 3 wins, 47 draws and no losses. In doing so, AlphaZero played classic human openings with no opening knowledge programmed into it. It had no endgame database. It had no heuristics. It learned everything itself.

Long, long ago, when I was a student at the University of Manitoba, I dabbled in a hobby: programming a computer to play chess. I reached a point where my program, Brute Force, was one of the best in the world. I went to Seattle, Washington in 1977 for the 8th North American Computer Chess Championship, and followed that up in 1978 in Washington, D.C. with the 9th NACCC. (If you’re interested, see my writeup on my chess program, Brute Force.)

The program was called Brute Force because I concentrated on doing the minimum possible to evaluate positions, and simply let the program iterate as many moves as possible to determine the best move. I had the full use of the University of Manitoba’s IBM 370/168 mainframe, which was likely about as powerful as your smartphone is today. Smartphones today can play better chess than the big computers did back then in the Computer Chess Championships of the ’70s.

Soon after, my programming interests switched to genealogy, but I remained interested in and followed the advances in computer chess and artificial intelligence in general. The best computer chess engines today, Houdini, Komodo and Stockfish, play at an Elo rating of 3400. The best chess player in the world today is Magnus Carlsen of Norway, rated at about 2840. The chance of Magnus defeating one of these programs today is the same as the chance of a player rated 2280, say the 6000th best chess player in the world, beating Magnus.
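The comparison works because the standard Elo model depends only on the rating difference: a player rated R facing an opponent rated R_opp has an expected score (wins plus half of draws) of 1 / (1 + 10^((R_opp - R) / 400)). Both pairings here are 560 points apart (3400 - 2840 and 2840 - 2280), so they give the same expected score, roughly 0.038 points per game. A quick check in Python, using the ratings quoted above; `elo_expected_score` is just an illustrative helper name:

```python
def elo_expected_score(rating: float, opponent: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


magnus_vs_engine = elo_expected_score(2840, 3400)
player_vs_magnus = elo_expected_score(2280, 2840)
print(f"{magnus_vs_engine:.3f} {player_vs_magnus:.3f}")  # 0.038 0.038
```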

Stockfish and these other programs have been worked on continuously for a long time. They encompass 20 years of improvements in computer speed, better chess-specific algorithms, and inclusion and refinement of chess ideas since Deep Blue of 1997.

Yet, after only four hours of training, this program, AlphaZero, defeated Stockfish handily.

But that’s not what is most amazing.

What is most amazing is that AlphaZero trained using a general game-playing learning algorithm that had zero chess-specific knowledge, other than the rules of how the pieces could move and what was a win, loss and draw.

See this Chess24 article by Colin McGourty for the chess viewpoint on this, along with a link to the paper written yesterday (Dec 5) on this breakthrough:

 

What Has Just Happened?

DeepMind has developed a general method where a machine can learn how to gain world-level expertise in any task once the rules are set out.

This will change everything. Think more than just board games. Think bigger.

Not long ago, I saw this tweet with a video of a robot:

Imagine now what would happen if we could program the “rules” of walking, and gravity and motion into it, and run it through this learning algorithm. You would likely end up with a robot that could run, jump, swim, whatever at hundreds of miles per hour without disturbing anything in its path. (Notice the text of the tweet: “we dead”)

Let’s go further: Computer voice recognition. It’s terrible right now. It needs training and still makes lots of mistakes. This algorithm will make mincemeat of that problem.

Language translation: So easy now.

Computer vision, hearing, tasting, smelling. Now doable.

Artificial intelligence … Now it’s very scary. We had better do this right.

That’s enough. Time to breathe again and go back to genealogy programming.

A Year of “Retirement” - Fri, 24 Nov 2017

One year ago today, I celebrated my 60th birthday and on the same day I retired from my job of 41 years. Since then, it’s been quite a year. I wonder how I ever had time for a day job.

I recently got back from my 4th genealogy conference of the year. In February, I was at RootsTech in Salt Lake City; in July, at IAJGS at Disney World, Florida; in October, at the Great Canadian Genealogical Summit in Halifax; and I just returned from FTDNA’s Genetic Genealogy Conference in Houston.

Add to that two vacations: a Western Caribbean cruise with my wife following RootsTech in February, and a two-week family trip to Disney World, Florida in June with my wife and older daughter, while my younger daughter was working at Disney World for the summer.

The year started off with a “bang” after I tore my peroneal tendon away from my right ankle while playing squash. I had the operation to reattach it on December 30. I spent the next 5 weeks on crutches and the following 5 weeks in a walking boot, which I wore to RootsTech and on my cruise.

That marked the end of my 30 years of playing squash. I quit football at 30, running at 40, tennis at 50, and squash at 60. If I can last at biking until 70, swimming until 80 and walking until 90, I’ll have done pretty well.

I’ve been wearing the FitBit my coworkers bought me for my retirement for a full year now. I average 6 hours of sleep a night, not counting about 1 hour awake a night. That’s about 14% of my time in bed, which I was surprised to learn is less than normal – according to FitBit, the average man my age is awake 15-31% of the night. I also learned that 10,000 steps a day is a really good goal and is not always easy to achieve. At about 1,000 steps per 10 minutes of walking, that’s 100 minutes of walking a day. I learned that a day at a genealogy conference is in excess of 10,000 steps, and a day at Disney World is more than 20,000 steps. I have a few “FitBit Friends”, but it’s Carole Steers from England, whom I met at RootsTech, who blows me away in steps every day. I don’t know how she does it.

On the programming front, I came out with a minor version of Behold in January. I’ve made progress towards what will be the last release before I add editing, and I’m close but haven’t yet had the time to finish it off.

One of the reasons for Behold’s slowdown is Double Match Triangulator. My highlight of the year was having the program win 3rd place in the RootsTech Innovator Showdown. In March I created a website for DMT and started selling it. I’ve released 5 updates to DMT in the past year and I’m currently working on a major update to be released very soon.

DNA analysis is so interesting, and there is so much you can do with the data that nobody’s even thought about yet. I’ve done lots of analysis, written many blog posts, given talks, and tried everything to confirm my genealogical relationship with even a few of my DNA relatives (still unsuccessful, but it will eventually happen). I also enjoyed my first Twitter #genchatDNA a week ago. One of my tweets summarizes everything:

I’ve also become a bit more active in some of the genealogy DNA Facebook groups, which have become very popular, with tens of thousands of participants, including many of the experts.

And let me thank the several dozen Facebook friends who wished me a happy birthday on my Facebook timeline today, and also my Australian friends who used their time-zone advantage to start the party early.

Quite a year. Still lots more to look forward to in the years ahead.