
Louis Kessler's Behold Blog

Another Estimate of Speed and Balding Figure 2B - 5 days, 20 hrs ago

Ten days ago, I produced an article: Revisiting Speed and Balding, where I tried to duplicate the results of their Figure 2B. I posted a link to the article on the ISOGG Facebook group, and received a lot of comments, mostly from Andrew Millard and Debbie Kennett. Debbie also provided quite a few comments directly on my post and contacted one of the authors, Doug Speed, who then also commented on my post. He indicated he’s confident his simulation results were accurately tabulated and suggested “the differences come from us asking slightly different questions”. I’m trying to answer the question:

For a match at a specific segment length, what is the probability that the segment comes from a particular generation?

That’s what I thought Speed and Balding were answering as well, so I’m unclear as to what the difference might be.
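One simplified way to write that question down is as a Bayes-style weighting. This is in the spirit of combining region-length probabilities with numbers of cousins, not necessarily how Speed and Balding set up their simulation. The notation is assumed here for illustration: $N_g$ is the expected number of cousins whose common ancestor is $G = g$ generations back, and $P(L \in b \mid G = g)$ is the probability that a shared segment from generation $g$ falls in length bin $b$.

```latex
P(G = g \mid L \in b) \;=\; \frac{N_g \, P(L \in b \mid G = g)}{\sum_{h} N_h \, P(L \in b \mid G = h)}
```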

I thought one possible difference might be that I’m taking this from the perspective of the matches in your match list at Family Tree DNA. Debbie Kennett rightly pointed out that the inclusion of only DNA matches would only affect very small segments under 9 cM, since at least one segment of 9 cM or more (or 7.69 cM plus a minimum 20 cM total) is required before the person is considered a match. So that is not a difference in the question that would have affected the 10 to 40 Mb range, where my statistical numbers differ significantly from their simulation numbers.
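The match-list rule quoted above can be sketched as a small Python check. This is only an illustration of the thresholds as described in the paragraph (one segment of at least 9 cM, or a longest segment of at least 7.69 cM together with at least 20 cM in total); the function name `is_listed_match` and the way the two conditions are combined are assumptions, not FTDNA’s actual matching code.

```python
def is_listed_match(segments_cm, min_single=9.0, alt_longest=7.69, alt_total=20.0):
    """Return True if a set of shared segments (lengths in cM) would qualify as a
    match under the thresholds described in the text. Hypothetical helper: the
    defaults paraphrase the rule as quoted, not FTDNA's published algorithm."""
    if not segments_cm:
        return False
    longest = max(segments_cm)
    total = sum(segments_cm)
    # Rule 1: a single segment at or above the main threshold is enough.
    if longest >= min_single:
        return True
    # Rule 2: a somewhat shorter longest segment, backed by enough total cM.
    return longest >= alt_longest and total >= alt_total


# A distant relative sharing only one 6 cM segment would not be listed.
print(is_listed_match([6.0]))
```

This is why segments well under 9 cM from very distant relatives mostly never reach a match list: a single short segment fails both conditions.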

Andrew Millard rightly pointed out that I was one generation off in my expected number of cousins, but that wouldn’t change my results much. He also didn’t think that my figure titled “Addition of Inverse IBD Region Length Distributions” was close to Speed and Balding’s Figure 2B, but the fact of the matter was that, of all the reasonable methodologies I could think of trying, that was the closest I could get to Speed and Balding’s result.

So I do not agree with Speed and Balding’s Figure 2B. It would be nice to find similar calculations done by someone else to compare against.

In one of Debbie Kennett’s comments during our discussions, Debbie provided a link to an article that gave some data that looked like it could be used to do a third estimation of what Speed and Balding’s Figure 2B might be. The article is by Bob Jenkins and is titled: How many genetic ancestors do you have?

Genetic ancestors don’t help us that much, but Jenkins goes on to estimate the number of cousins by generation by segment length. He gives one table for females and one for males. They are fairly similar, but the male table has a few inconsistencies that the female table doesn’t, so I’ll just use the female table. Bob Jenkins’s table looks like this:

Bob Jenkins table

And it goes all the way to 100 generations. Let’s interpret this. Pick 4th cousins.

That line says it’s generation 9, but this is counting every step up and down. Translating that to Speed and Balding’s value of G would make it G=5.

Then we see “6:5”. That means 5 cousins would have 1 / (2**6) of the DNA of the ancestor, which is 0.015625, which multiplied by 6800 total cM gives 106 cM, or multiplied by 5334 total Mb gives 83 Mb.

Then we see “7:45”. That means 45 cousins would have 1 / (2**7) of the DNA of the ancestor which is 53 cM or 42 Mb.

Etc.
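The arithmetic in the last two paragraphs can be sketched in Python. This is a minimal illustration assuming the 6800 cM and 5334 Mb genome totals used above and reading each “k:n” entry as “n cousins sharing 1/2**k of the ancestor’s DNA”; the helper names `parse_entry` and `segment_length` are hypothetical.

```python
TOTAL_CM = 6800.0  # total genome length in cM, as used in the text
TOTAL_MB = 5334.0  # total genome length in Mb, as used in the text


def parse_entry(entry):
    """Split a Jenkins-style 'k:n' table entry into integers (k, n)."""
    k, n = entry.split(":")
    return int(k), int(n)


def segment_length(k):
    """Return (cM, Mb) for a segment that is 1/2**k of the ancestor's DNA."""
    fraction = 1 / 2**k
    return fraction * TOTAL_CM, fraction * TOTAL_MB


for entry in ["6:5", "7:45"]:
    k, n = parse_entry(entry)
    cm, mb = segment_length(k)
    print(f"{entry}: {n} cousins at about {cm:.0f} cM / {mb:.0f} Mb")
# prints:
# 6:5: 5 cousins at about 106 cM / 83 Mb
# 7:45: 45 cousins at about 53 cM / 42 Mb
```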

So we now put this all into a spreadsheet:

Number of detectable cousins (Jenkins)

and we divide by the column total to get the likelihoods:

Fraction of detectable cousins (Jenkins)
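The “divide by the column total” step can be sketched as a small Python function. The counts below are made-up placeholders for illustration only, not Jenkins’s numbers; each outer key is a segment length (one spreadsheet column) and each inner key is a generation G. The function name `column_fractions` is hypothetical.

```python
def column_fractions(counts):
    """Turn cousin counts into likelihoods by dividing each column
    (one segment length) by its own total, so each column sums to 1."""
    fractions = {}
    for length, by_gen in counts.items():
        total = sum(by_gen.values())
        # Guard against an empty column to avoid dividing by zero.
        fractions[length] = {g: n / total for g, n in by_gen.items()} if total else {}
    return fractions


# Hypothetical counts for illustration only (NOT Jenkins's data):
# outer key = segment length in Mb, inner key = generation G.
example = {
    42: {3: 10, 4: 30, 5: 60},
    21: {4: 20, 5: 80, 6: 100},
}
print(column_fractions(example))
```

Normalizing per column (rather than per row) is what lets each segment length be read as “given a segment this long, how likely is each generation”.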

Plotting this in the Speed and Balding manner gives:

Fraction of detectable cousins (Jenkins)

Bob Jenkins does not use the same region lengths as Speed and Balding (SB). Jenkins uses region lengths that double from one row to the next, so we have to be careful in our comparison. Let’s compare Jenkins’s 3 Mb with SB’s 2-4 Mb, Jenkins’s 5 Mb with SB’s 5-9 Mb, 10 with 10-19, 21 with 20-29, and 42 with 40-49.
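The pairing above can be checked with a short Python sketch that computes Jenkins’s lengths as 5334 Mb / 2**k and drops each into the matching SB bin. It assumes each SB bin “low-high” covers lengths from low up to (but not including) high + 1 Mb; that boundary handling and the names `jenkins_length_mb` and `sb_bin` are illustrative assumptions.

```python
TOTAL_MB = 5334.0  # total genome length in Mb, as used in the text

# SB length bins (Mb) used in the comparison, assumed to span [low, high + 1).
SB_BINS = [(2, 4), (5, 9), (10, 19), (20, 29), (40, 49)]


def jenkins_length_mb(k):
    """Segment length in Mb for Jenkins's 1/2**k share of the genome."""
    return TOTAL_MB / 2**k


def sb_bin(length_mb):
    """Return the SB bin containing length_mb, or None if no bin fits."""
    for low, high in SB_BINS:
        if low <= length_mb < high + 1:
            return (low, high)
    return None


for k in range(6, 12):
    mb = jenkins_length_mb(k)
    print(f"k={k}: {mb:.1f} Mb -> SB bin {sb_bin(mb)}")
```

The 83 Mb row (k=6) has no counterpart among the bins used here, so it returns None.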

Here is Jenkins above lined up for comparison:

Jenkins lined up

Below are my final calculations from my article 10 days ago, which were calculated using Speed and Balding’s values for the probability of region length and the ISOGG table for number of cousins, which I then extended until it reached the world population:


Note that Jenkins has no visible G>20 (very light blue) at 10 Mb and 21 Mb, which agrees with my results.

Compare this with Speed and Balding for the equivalent segments. Look at how much G>20 (the grey shade) there is between 5 and 20 Mb. That is the part that I cannot believe is reasonable, and Jenkins’s numbers do not support it either.

Speed and Balding lined up

For the smaller regions, under 10 Mb, Jenkins does include a significant number of G>20 segments (very light blue). But those would not show up in your match list, because a match 20 or more generations back will almost always match on just one segment, and if that segment is under 9 cM, the person won’t be considered a match at Family Tree DNA and won’t show up in your match list. My results don’t show any G>20 at any segment length. Neither Speed and Balding nor Jenkins apply this match-list filter, and both of them show many G>20 segments smaller than 10 Mb.

The bottom line is that I don’t believe that Speed and Balding’s Figure 2B is appropriate to apply to the segment lengths of the matches in your match list. There is something, as yet undetermined, that they don’t take into account.

Conclusion: Almost all your matching segments with any of your matches at any segment length will be within 20 generations. Small segments under your DNA company’s minimum match limit (e.g. 9 cM at FTDNA) will also be within 20 generations because people with segments that small from more than 20 generations back will not be in your match list.

The ISOGG Facebook group is a closed group, but if you have been given access to it, the comments there about this article are a worthwhile read.

FTDNA’s 13th IGG Conference Lab Tour - Mon, 13 Nov 2017

#FTDNA2017 – What a great conference! So many people I could talk to at a technical level and so many that I learned from. I averaged less than 5 hours of sleep per night just because the day was so full, every morning was an early wakeup, and I had to watch Saturday Night Live. I left Winnipeg with 5 cm (2 inches) of snow on the ground and –10 C (14 F) temperatures, and Houston was 20 to 25 C (68 F to 77 F) but I wasn’t outside more than 5 minutes in those 4 days.

The morning after the Conference, I was signed up for the 9 a.m. Lab Tour at Family Tree DNA. Judy Russell and I shared a Lyft ride from the hotel to the FTDNA headquarters. That was my first ever ride of that type, and I was impressed by the service, the price, and how you pay online (and tip online) and don’t have to mess with either cash or credit with the driver. But I can still say I’ve never taken an Uber, which we don’t have in Winnipeg. (My taxi driver on the way home from the airport told me both Uber and Lyft will be starting in Winnipeg in February.)

The hour-long lab tour was something special. First of all, how often is your tour guide the President of the company? Bennett Greenspan got a dozen of us to put on white lab coats, and those who had open-toed shoes had to put on disposable socks, which were supplied.

I’m a software guy, and hardware baffles me. This is ALL hardware. We started with a robotic sorter that placed several dozen samples into units that hold (I can’t remember any numbers, so I’ll estimate everything) about 60 each. These get placed in cases. Since everyone gives 2 samples, one goes to the current test and the other goes into 25-year storage.

We then saw the very large storage unit that is designed to store 2 million samples long term. This unit is not in service yet. It is the one talked about yesterday at the Conference, which they had to crane up to the 8th floor, removing a window to get it in.

We then were shown various DNA decoders, ranging from the newest, a very slick-looking device about the size of a washing machine that cost a million dollars (Bennett said that particular cheque was hard to write), to two of the oldest, which were of the type used to decode the first genome at a cost of billions of dollars. It took 100 of these devices, and they cost $250,000 each. Bennett keeps them around because, although they are much slower, they do everything and are the gold standard against which the newer machines are measured and checked.

We were also given a glass chip to look at, hold and inspect. It resembles, and is about the size of, the glass slides you’d use with a microscope. It contained 24 small rectangles. Each of those holds the results of one sample and is infused with the 700,000 SNP values from that sample.

There were way more steps than this, and I am amazed that this can even be done. Family Tree DNA is proud of their lab and its certifications, and is continuously working to improve the process with automation and to increase throughput while maintaining quality control. Very impressive!

Photos were not allowed during the tour. To give you a feeling of the whole thing, the best I can do is provide a YouTube video by Family Tree DNA which shows and describes some of their lab equipment. It was made in 2015 so does not include the latest equipment, but gives you a good idea as to what it looks like, and the description is much better than my recollection of what Bennett said.

If you’re ever in Houston, see if Family Tree DNA is giving lab tours and sign up for one. After you’ve done that, then you can go see NASA (which I didn’t get to go to).

Below are my posts about each day of the Conference:

Jennifer Zinck posted a set of extensive notes from the Conference:

Judy Russell wrote a nice article about Michael Hammer’s talk:

Maurice Gleeson has his 2 talks from the Conference on YouTube:

—-

Followup: After telling my daughter who has a genetics degree about my lab tour, she told me she learned about the whole process in her molecular biology course. And she pointed me to the following video which I simply must pass on:

The PCR Song, from 2008. (Polymerase Chain Reaction)

    FTDNA’s 13th International Genetic Genealogy Conf Day 3 - Sun, 12 Nov 2017

    #FTDNA2017 - First up, another breakfast sponsored by FTDNA. This was followed at 8 a.m. by an ISOGG (International Society of Genetic Genealogy) chapter meeting, my first. It was led by Katherine Bodger, the Director and co-founder of the society. ISOGG was founded in 2005 after the first Family Tree DNA Conference. The ISOGG wiki is a vast resource of DNA information related to genetic genealogy to which Debbie Kennett adds most of the content, but it is open to anyone for editing once they are approved for an account. Leah Larkin is the editor of the JOGG (Journal of Genetic Genealogy). Derrell Oakley Teat is retiring from being the ISOGG European FTDNA Coordinator and received an award.

    At 9 a.m., Matt Dexter presented his own story: “Finding His Father – An Adoptee’s DNA Experience”. Matt knew almost nothing about his parents. It took him 7 years. In 2009, he met his mother. Through extensive DNA testing and learning how to do it, he finally discovered and met his father in 2016 and found several siblings. It was quite a story.

    Matt Dexter

    At 10:15 a.m., Judy Russell, the Legal Genealogist, who claims she is a genealogist who just happens to be a lawyer, and not the other way around, presented: “After the Courthouse Burns: Rekindling Family History through DNA”. Judy explained how she solved a genealogical puzzle with DNA in a period when the genealogy records had been lost and no longer exist.

    Max with Judy Russell

    At 11:15 am, Michael Davila, the director of Product Development for Family Tree DNA, gave the 2017 Product Update. He said: “A user interface is like a joke. If you have to explain it, it’s not that good.” He talked about some of the challenges the company faced, and went over some of the current projects. Caleb Davis followed Michael and gave information about new things with Big Y.

    Bennett with Michael Davila

    By now, it was very apparent that Family Tree DNA has refined their Conference over 13 years so that all the little details were just right. Of all the conferences and talks I’ve ever been to, this is the first one that left pads of paper and pens on the tables at each seat so that questions could be written for the speaker. At the end of the talk, FTDNA staff would collect the questions and give them to the speaker, who would read them and give answers quickly and efficiently. That is so much better than people randomly jumping up, where no one can hear them, and spouting off something not as well thought out as written words would be. Staff were at the back of each room throughout the conference, listening, learning, assisting and enjoying. It was such a pleasure.

    At the buffet lunch, I sat with Judy Russell. This is the 4th conference we’ve been at together in 20 months. I always thank Judy for imploring me to get my 94-year-old uncle DNA tested last year and getting me into this DNA thing, which has since sucked up all the free time I hoped to have after I retired.

    There was a set of breakout sessions at 1:15 pm. I chose to hear Roberta Estes once again, who talked about “Autosomal DNA through the Generations”. Roberta showed a 4-generation chromosome browser chart with her mother, herself, her son and her granddaughters. Roberta’s mother died 5 years before Family Finder became available, but a sample was at FTDNA and they were still able to use it when the Family Finder test came out. One thing surprised me. Roberta asked the audience how many people had half-siblings. I couldn’t believe that a third of the people put up their hands. Roberta calls half-siblings: “God’s gift to genealogists.”

    Roberta Estes

    At 2:30, Elliot Greenspan, the head of IT at FTDNA, gave the Year in Review, and then the plans for Future Development. The latter includes new hardware, new technology, faster updates, and new hires.

    Elliot Greenspan

    Brent Manning, the Lab Manager, then gave us a very interesting update about the lab. FTDNA launched in 2000. In 2006 they opened their Genomics Research Center. In 2008, Hurricane Ike destroyed the lab and the building had to be reconstructed. They rebuilt it as an impenetrable fortress, which came through Hurricane Harvey unscathed this year. Their lab is now 9,000 square feet, taking up the entire 8th floor of their building, and 50 people now work in the lab. They have been installing all sorts of new equipment and robots. The Automated Sample Sorter (to remain acronymless) was so large that they needed to remove the 8th floor window and rent a 70 ton crane to get it up there. This all means expanded capacity, throughput and ability to store millions of samples, all the while eliminating every manual step that can be automated. Brent summarized by saying “the future is freaking awesome, completely mindblowing” and it can’t get much better than that.

    Brent Manning

    Max and Bennett closed the conference with a Q&A period, using all the questions handed in over the past two days that hadn’t been answered yet. They covered anything and everything. My notes: The Illumina chip switch will happen in 5 to 6 months, but Illumina made some concessions to FTDNA, and modifications will make for better backwards compatibility and better matching. All the new equipment and efficiencies will likely soon bring the time to get results down to as little as 2 to 4 weeks. A full genome takes the same time to run as 800 Big Y tests, so full genomes are not in FTDNA’s current plans. Will FTDNA/Gene-by-Gene ever go public? “Never!!”, said Bennett.

    Max and Bennett answering Questions