
Louis Kessler’s Behold Blog

My Top Programming and Genealogy Q & As - Mon, 27 Dec 2021

As I was preparing GenSoftReviews for the annual User Choice Awards that I announce on January 1st or 2nd, I realized that I hadn’t checked the site’s links in a number of years. So I started up my link checker Xenu’s Link Sleuth and let it rip. I was amazed at the number of links I needed to fix. It took me a full day to work through them.

A lot were links to sites that had changed from the non-secure http prefix to the secure https prefix. Website owners have to make this change themselves. Doing so is a bit of a hassle and involves obtaining an SSL (Secure Sockets Layer) certificate. It was back in 2014 that Google first recommended that sites switch to https, and I finally got around to doing it for my sites in May 2020. It was good to see that quite a number of genealogy software sites have been switching over.

I also found about 25 genealogy programs whose websites were no longer available, so I marked those programs as “unsupported” and changed their link to the most recent copy of their site that was captured by the Internet Archive.

Along the way, I noticed there was a broken link in my GenSoftReviews FAQ to a question about JavaScript that I had asked on Stack Overflow.


Stack Overflow, the Q & A Site for Programmers

Stack Overflow was created in 2008. It quickly became the go-to site for programmers to get their questions answered and to find solutions to their problems. I joined Stack Overflow in October 2008 and have made great use of it to help with my Delphi development of Behold and Double Match Triangulator. There you’ll find over 129 questions that I asked and over 177 answers I provided to other people’s questions.

To encourage participation, users collect points when someone upvotes one of their questions or answers or selects their answer as the best, and they lose points when they get downvotes. The net upvotes allow the best answers to be shown first. They also allow me to rank my top questions and answers without bias.


The Big Purge

After a few years of operation, Stack Overflow decided to accept only programming questions that are tightly focused on a specific problem. Questions of a broader nature, or those inviting answers that are inherently a matter of opinion, were not wanted. Many of my early Qs and As, including some of my best, were of the broad type, and many of those were deleted from public view. I realized this when I saw the broken link to my JavaScript question.

This is what you now see for one of these deleted questions:

[Image: how a deleted Stack Overflow question now appears]

On Stack Overflow, the more points you collect, the more capabilities you have on the site. I have participated enough to earn the points required to view deleted questions and answers. They don’t say how long these deleted posts will remain available, so I thought it best to save a copy of them while they are still there. Then I can fix the link in my GenSoftReviews FAQ to point to my saved version of the JavaScript question.


My Top Stack Overflow Questions and Answers

The deleted Q&As are among my most popular and interesting, primarily because they are of a general nature. Here are my top 3 questions:

  1. What’s with those Do-Not-Use JavaScript People? This had 154 upvotes, was viewed 36,000 times, and had 33 answers, but was deleted. This was the broken link question that I referred to in my GenSoftReviews FAQ page as mentioned above. 
  2. What ever happened to APL? This had 95 upvotes, was viewed 22,000 times, and had 31 answers. It was also deleted. APL was my favorite programming language at university. Very terse! See my top answer below.
  3. How Do I Choose Between the Various Ways to do Threading in Delphi? This has 65 upvotes, 8,000 views and 6 answers. It is my top question that was not deleted. It is specific to Delphi, the language I use. Threading is getting the computer to execute different sets of code at the same time.

My favorite question is likely this one:

  • A Good and SIMPLE Measure of Randomness had 45 upvotes, was viewed 24,000 times and had 15 answers. I provided my own answer to my question (which is not only allowed, but often encouraged), and it ranked 2nd of the 15 answers with 16 votes. I accepted my own answer.

My top answers were to these questions:

  1. “Strangest Language Feature” received 973 votes. There were 365 answers and my answer had the 8th most votes with 325. I cited APL’s ability to write any program in just one line, and linked to this now-archived amazing article and this video. This question has so far escaped the purge, but it has been locked for being “off-topic” and could be deleted in the future.
  2. Delphi Profiling Tools received 23 votes. There were 10 answers. My answer received 25 votes and was the accepted answer. Profiling tools help you optimize a program to make it faster, sort of a specialty of mine since my days of chess programming. This question is still there as well, but is marked closed as it “does not meet Stack Overflow guidelines” and may be deleted in the future.
  3. how many delphi users over the world? received 42 votes. There were 6 answers. My answer received 22 votes and was the accepted answer. This was a fun question to try to answer. Again, this answer might be deleted.

I do understand why the Stack Overflow people are locking, closing, and deleting many of the most popular questions. The most popular ones tend to be general: they don’t have a precise answer, and often the answer is just an opinion. They did attract attention, but not the sort of attention the site wanted.


Stack Exchange

Stack Overflow turned out to be an overwhelming success. It has over 22 million questions asked with over 33 million answers. It gets 9.5 million visits a day and has 5,000 new questions asked each day. 16 million users have participated.

The company soon allowed the Q&A platform they developed to be used for questions on other subjects. They created the Stack Exchange website to maintain a list of all the topics covered. There are currently 177 different sites, with topics ranging from Law to Hinduism to Robotics to Bicycles to Parenting to Chess.


I’m sure anyone will find many topics of interest. My Stack Exchange profile lists the 31 topics that I have participated in.

Just for fun, these are my top question and answer on these other sites:

  • My top question was Image of Webpage on Tile in Windows 10 Mobile? on the Windows Phone topic site. I found that site very useful when I still had my Windows phone. The question had 12 upvotes, was viewed 1000 times and had 2 answers.
  • My top answer was to the question Are negative numbers singular or plural? The question had 46 upvotes. My answer had the most upvotes (17) but was not the accepted answer, which had 13 upvotes. This was on the Meta Stack Exchange site. Every topic site has its own Meta site for discussing issues about the site itself. This question asked whether the site was displaying negative numbers properly.


Genealogy and Family History Stack Exchange

Stack Overflow worked so well for me that I was sure Stack Exchange would be a great place for Q&A about genealogy. So in 2012, I helped promote the proposal for a Genealogy Q&A site at Stack Exchange. After 58 days, the proposal had attracted enough interest, and the Stack Exchange people created a live site for Genealogy and Family History questions and answers: https://genealogy.stackexchange.com

Over the past 9 years, the site has had 8,900 registered users ask 3,500 questions and give 5,700 answers. I have been a regular participant, supplying many answers (281) as well as a few questions (26).

My top genealogy questions have been:

  1. What is the Russian town in this census? (18 upvotes) – For a long time, this was one of my most perplexing genealogy mysteries, which I solved 6 months ago when I found a death record giving the answer.
  2. How do I write the year with a double date? (15 upvotes) – I got some really good answers on this one.
  3. Extract Facts from an Army Portrait (14 upvotes) – I still really would like help with this if anyone can contribute.

My top genealogy answers have been to these questions:

  1. How should a trip be recorded in my family tree software? My answer has 20 upvotes and was the accepted answer.    
  2. Why haven’t more programs adopted Gramps XML free format? My answer has 20 upvotes and was the accepted answer.
  3. What are the key points for a beginning genealogist to consider? My answer has 19 votes and was the accepted answer.


Do you have your own Questions or Answers?

As you can see, there is a wide range of questions and answers on all sorts of topics on the Stack Exchange sites.

If you are one of my genealogy friends, I invite you to join me and thousands of others at Genealogy and Family History Stack Exchange.

If you are a programmer, join the world at Stack Overflow.

And don’t forget to check out your other interests at Stack Exchange.

5 Years Ago Today, I Retired - Wed, 24 Nov 2021

Exactly 5 years ago today, which coincided with my 60th birthday, several hundred co-workers, friends and family met in the large conference room at the Manitoba Hydro head office in downtown Winnipeg on my last day of work to help see me off into “retirement”.

I had a wonderful 41 years with the company and enjoyed being able to make use of all my computer, maths and statistics skills in all sorts of wide-ranging and exciting projects.

Today, it is 5 years later, so I’m sure you can guess what age I became today.

Now it’s time to look back and see what I have accomplished so far in my first 5 years of retirement.


DNA Testing and Analysis

These past 5 years I’d call my DNA testing years. I did my first DNA test with Family Tree DNA in Nov 2016 and subsequently tested myself at MyHeritage, 23andMe, Ancestry and LivingDNA. I then took two WGS (Whole Genome Sequencing) tests from Dante Labs, a short-read test and a long-read test.

I took all these tests primarily to learn about them and gain expertise in the genetic genealogy field. I have written many articles about DNA analysis in my blog, and have given talks about DNA at several genealogy conferences. I also wrote the program Double Match Triangulator to help analyze your DNA matches. I’m currently working on modifications to DMT that will be released as Version 5.


Year 1 Started with a Bang!

I had been an avid squash player 3 times a week for 30 years. We had a great group who would get together and whoever came out would play each other at a pretty competitive level. My expectation was to continue this into my retirement years as long as I could … which turned out to be less than a month. During one game, I displaced my peroneal tendon and a week and a half later had an operation to repair it.

So that started off my year 1, 2017, which included 4 genealogy conferences: RootsTech in Salt Lake City, the IAJGS at Disney World in Florida, the Great Canadian Genealogical Summit in Halifax and FTDNA’s Genetic Genealogy Conference in Houston. I took 3rd place at the RootsTech 2017 Innovator Showdown for my program Double Match Triangulator, and I did it wearing a walking boot from my operation.


Genealogy Conferences and Vacations

So there were the 4 conferences in 2017. I attended and spoke at the Kelowna District Genealogy Conference in 2018. My first-ever online presentation was in January 2020 at the Family History Fanatics DNA eConference and I have made several others since.

Vacation-wise, my wife and I took a western Caribbean cruise in Feb 2017 just after RootsTech (and I was still in my walking boot). My wife and I and my older daughter went to Walt Disney World in June 2017 where my younger daughter was working for the summer. I went there a month later for the IAJGS conference and saw my younger daughter again while there. My wife and I took an eastern Caribbean cruise in Feb 2019. Then in July 2019, my wife and I went to Niagara Falls for a 40th anniversary reunion of my Europe Trip people from 1979. And in Feb 2020, my wife and I took a southern Caribbean cruise with my best friend Carl and his wife who now live near Vancouver. We got back just as Covid was taking over.

I was scheduled to attend the 3rd MyHeritage Live conference in Tel Aviv, Israel in the fall of 2020, but that was kiboshed like everything else by Covid. It would have been my first trip to Israel and my wife and I were looking forward to making a vacation in Israel around it. That will have to wait now until some future time.

With the onset of Covid in March 2020, so many conferences, webinars and discussions started coming online. Zoom became part of our vocabulary. Between Legacy Family Tree Webinars, Dear Myrtle, Geneablogger, Family History Fanatics, WikiTree and all the other content providers, you could spend 24 hours a day in front of your computer being totally immersed in genealogy without getting a smidgen of your own genealogy done.


GEDCOM

I’ve always been interested in GEDCOM, the standard for transferring genealogy data, because my program Behold uses it. I kept up with what FHISO was doing to advance the standard. In 2018, I was one of a number of people who contributed to Tamura Jones’ GEDCOM 5.5.1 Annotated Edition, and then a year later to Tamura’s GEDCOM 5.5.5. I followed along as FamilySearch released GEDCOM 7.0 earlier this year.


Genealogy

My greatest accomplishment over the past 5 years has to be the advances in my own genealogy research. This summer I wrote the article: So How’s My Genealogy Going? In it, I described how I selected MyHeritage as my platform of choice and now use it and their Family Tree Builder program to store the working copy of my family tree data.

I’ve been involved in some major projects that include working with my wife’s cousin Terry Lasky, helping with his DNA project. In that project, I’m still trying to solve how the early Zaslavsky families from Tetiev were connected by using my Double Match Triangulator program. My Romanian Fossaner side has been aided by working with my genealogist cousins Joel Koenig and Phil Rodd who have been doing a lot of research and have found new living cousins in Australia who we have Zoomed with.

But my major breakthroughs have come in the past 4 years with researchers finding me records of my ancestors in Romania and Ukraine that have taken 5 of the 9 lines I’m researching back another 3 generations to the 1800s. New records are being photographed and digitized all the time and the next few years should continue to be exciting on this front.

In order to better understand the records I’m getting, I just finished a fantastic 10 week online Salt Lake Institute of Genealogy (SLIG) course on Researching Russian Records, and now I can read, pronounce and transliterate Russian print and (with a bit more difficulty) Russian handwriting, which is essential to know for anyone researching their Russian Empire roots.

I was honored in July to be selected as a genealogy guest on the WikiTree Challenge. About 20 WikiTreers all worked together on my tree to make it better and find new sources for me.


The Next 5 Years

So much to do, so little time. The first step is to try to complete my Zaslavsky project and release Version 5 of Double Match Triangulator. Then I really want to get back to working on Behold, which has sort of been in limbo the past 5 years. Now that I no longer intend to make it my primary genealogy editor (that being MyHeritage), I’ve got ideas to take it in a new and interesting direction.

My genealogy will still take front and center stage. If any new records come up, or I get contacted by relatives or possible family, I will attend to that first. And those boxes of my and my wife’s family material still need to be gone through, digitized and recorded.

I also had two little munchkins sprout up in the past 5 years (my sister’s grandchildren) who will be so fun to follow and be a part of during their next 5 years.

Other than that, I’ll try to keep in shape (walking, swimming and bike riding – did over 1000 km of the latter last year), plus lots of family stuff, house maintenance, errands, appointments, shows and sports on TV, socializing on Facebook and Twitter, keeping up my website, my GenSoftReviews, Behold and DMT sites, and blogging and Zooming from time to time. Hopefully this Covid thing will end soon and we’ll be able to see people and travel again.

I’ll report back 5 years from now and let you know how it went.

Your DNA Raw Data May Have Changed - Sat, 20 Nov 2021

I downloaded my raw data from my 23andMe DNA test and, to my surprise, it was different from my earlier downloads.

I would have thought your raw data from a company wouldn’t change. I took one DNA test there, so my results should be determined once, and that’s what should be represented in my raw data. I don’t care if the format of the file changed, but I do care if the data represented in that file changed.

So let’s see what might have happened here.


Three 23andMe Raw Data Downloads

I took my 23andMe test in Nov 2017.  I’ve downloaded my raw data 3 times since then.

I talked about my original download in my August 2018 article Comparing Raw Data from 5 DNA Testing Companies. My April 2020 article Determining the Accuracy of DNA Tests used my 2nd download. I noticed a difference between the counts from my 23andMe file and those in the earlier article. I said at the time “I’m not sure why there’s a difference,” and I assumed I must have made some mistake with the numbers. But what I see now is that the files were slightly different.

Let’s compare the counts in the 3 files I now have:

[Image: counts of SNP values by group in my three 23andMe raw data downloads]

Above are the counts of the SNP values in groups: the autosomal homozygous (same-valued) SNPs, the autosomal heterozygous (different-valued) SNPs, the autosomal insertions and deletions, the Y and mt chromosome SNPs, which have just one value, and the no-calls (values that could not be determined with sufficient accuracy, shown as a double dash “--”).
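For anyone who wants to produce counts like these from their own file, here is a minimal Python sketch. It assumes a 23andMe-style layout (lines starting with “#” are comments; each data line has tab-separated rsid, chromosome, position and genotype columns), that indels are written with the letters I and D, and that no-calls appear as dashes. The function names are hypothetical, and X chromosome SNPs go into an “other” group because they are not among the groups described above.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Chromosome labels treated as autosomal (1 to 22).
AUTOSOMES = {str(n) for n in range(1, 23)}


def read_raw_data(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (rsid, chromosome, position, genotype) tuples.

    Assumes a 23andMe-style file: '#' comment lines, tab-separated columns.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chrom, pos, genotype = line.rstrip("\r\n").split("\t")[:4]
        yield rsid, chrom, pos, genotype


def is_no_call(genotype: str) -> bool:
    """A no-call is written as dashes, e.g. '--' (or '-' on Y/mt)."""
    return set(genotype) <= {"-"}


def classify(chrom: str, genotype: str) -> str:
    """Assign one SNP to one of the groups counted in the table."""
    if is_no_call(genotype):
        return "no-call"
    if chrom in ("Y", "MT"):
        return "Y/mt"            # single-valued calls
    if chrom not in AUTOSOMES:
        return "other"           # e.g. X, not one of the groups above
    if "I" in genotype or "D" in genotype:
        return "autosomal indel"  # assumed I/D coding for insertions/deletions
    if len(genotype) == 2 and genotype[0] == genotype[1]:
        return "autosomal homozygous"
    return "autosomal heterozygous"


def count_groups(lines: Iterable[str]) -> Counter:
    """Count how many SNPs fall into each group."""
    return Counter(classify(chrom, gt) for _, chrom, _, gt in read_raw_data(lines))
```

Usage would be something like opening the downloaded text file (the file name is up to you) and passing the open file object to `count_groups`, since iterating over a file yields its lines.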

Comparing the counts between the first two downloads, we see they are fairly close, with the total number increasing by 10 and the largest difference being only 14, in the G group.

But you can see my most recent download had a significant change, with the total number of SNPs decreasing by almost 6500. About 4500 of those were from a reduction in the number of no-calls, but the other 2000 were because of fewer actual values. Surprisingly, the number of autosomal heterozygous SNPs went up by 19.


So What’s Changed?

Since I created my “All 6” Combined Raw Data File in my May 2019 article Creating a Raw Data File from a WGS BAM file, and that was based on my Sep 2018 download from 23andMe, I’ll compare that download to the one I just did.

[Image: comparison of my Sep 2018 and latest 23andMe raw data downloads]

So there were 6673 SNPs in my old file that are not in my new file. 4610 of those were no-calls in my old file, so those don’t matter. And 52 were deletions or insertions that other companies don’t even report. But that still leaves 2010 SNPs that previously had values and no longer do. Of those, 1706 are autosomal SNPs that are important for DNA matching.

There were 191 new SNPs in my new file that were not in my old file. And 136 of those are autosomal SNPs that are important for DNA matching.

And there were 16 SNPs that changed values in my new file. Fortunately, none of the changes were important. There were 6 no-calls changed to values and 9 values that were changed to no-calls or to insertions or deletions.
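The removed, added and changed counts above come from treating each download as a set of SNP IDs. Here is a minimal sketch of that comparison, assuming each file has already been loaded into a Python dictionary mapping rsid to genotype string. The function names are hypothetical, and “changed” compares the genotype strings literally, so it assumes both files write the two alleles in the same order.

```python
from typing import Dict, List, Tuple


def is_no_call(genotype: str) -> bool:
    """No-calls are written as dashes, e.g. '--'."""
    return set(genotype) <= {"-"}


def diff_raw_data(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two downloads keyed by rsid.

    Returns (removed, added, changed) lists of rsids:
      removed: in the old file only
      added:   in the new file only
      changed: in both files but with a different genotype string
    """
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = sorted(r for r in old.keys() & new.keys() if old[r] != new[r])
    return removed, added, changed


def removed_with_values(old: Dict[str, str], removed: List[str]) -> List[str]:
    """Removed SNPs that actually had a call (not a no-call) in the old file."""
    return [r for r in removed if not is_no_call(old[r])]
```

Splitting the removed SNPs this way separates the ones that don’t matter (old no-calls) from the ones that previously carried a value.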


Did We Lose Useful Information?

The question is whether we lost useful information in those 2010 deleted SNPs that previously had values, or gained useful information from those 182 new SNPs.

To find out, I have to go back to my April 2020 article Determining the Accuracy of DNA Tests, specifically the section about the Accuracy of Standard Microarray DNA Tests. If I follow the same procedure and compare these deleted and added SNPs to the values from the 4 BAM files that agree with each other, I can get an approximate accuracy estimate for them.

Of the 1706 autosomal non-indel SNPs that were in the old 23andMe file, 1518 had identical values in the 4 BAM files. Of those, only 1164 match the deleted value. That’s 354 mismatches, an error rate of 23.3%, which isn’t good at all. So what we lost were SNPs with a high error rate.

Of the 136 new autosomal SNPs, 129 had identical values in the 4 BAM files. Of those, 128 matched the new 23andMe value. Just one, which was AG in the 23andMe file and AA in the BAM files, didn’t match. That’s an error rate of 1 / 129 = 0.8%, which is okay. So what was added was useful.

The values deleted had a high error rate. The values added had a low error rate. I don’t know the reason why 23andMe made these changes or what they did to make the changes, but the net result was a slight overall improvement to the accuracy of their raw data file.
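Here is a minimal sketch of that accuracy check. It assumes each call set (the 23andMe values being tested and each of the 4 BAM-derived call sets) has been loaded into a dictionary mapping rsid to genotype, that the caller has already filtered to autosomal non-indel SNPs, and that genotypes are compared as unordered allele pairs. The function names are hypothetical; this illustrates the procedure and is not the code used for the numbers above.

```python
from typing import Dict, List, Optional, Tuple


def is_no_call(genotype: str) -> bool:
    return set(genotype) <= {"-"}


def normalize(genotype: str) -> str:
    """Treat a genotype as an unordered pair, so 'GA' equals 'AG'."""
    return "".join(sorted(genotype))


def consensus(calls: List[Optional[str]]) -> Optional[str]:
    """Return the shared genotype if every BAM-derived call exists and agrees."""
    if any(c is None or is_no_call(c) for c in calls):
        return None
    distinct = {normalize(c) for c in calls}
    return distinct.pop() if len(distinct) == 1 else None


def error_rate(
    test: Dict[str, str], bams: List[Dict[str, str]]
) -> Tuple[int, int, float]:
    """Compare microarray calls to SNPs where all BAM-derived calls agree.

    Returns (compared, mismatched, mismatched / compared).
    """
    compared = mismatched = 0
    for rsid, genotype in test.items():
        if is_no_call(genotype):
            continue  # nothing to check
        agreed = consensus([bam.get(rsid) for bam in bams])
        if agreed is None:
            continue  # BAM files missing or disagreeing: leave this SNP out
        compared += 1
        if normalize(genotype) != agreed:
            mismatched += 1
    return compared, mismatched, (mismatched / compared if compared else 0.0)
```

For the deleted SNPs, `test` would hold their old-file values; with 1518 comparable SNPs and 1164 matches, this gives 354 mismatches, or about 23.3%.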


Did the Raw Data File of Other Companies Change as Well?

Let’s see.

Family Tree DNA: My first build 37 raw data file download was from Aug 2018. My download today is identical to it.

Ancestry DNA: My first raw data download was Mar 2018. Compared to my download today, 56 SNPs from the early file have been deleted. All the deleted SNPs had values and none were no-calls. No SNPs were added and no SNPs were changed. So this is a change, but a minor one.

LivingDNA: My first raw data download was Aug 2018. My download today is identical to it.

MyHeritage DNA: My first raw data download was March 2017. But I was surprised to find that my download today is very different, much more so than the 23andMe data, so it needs its own analysis, which follows.


MyHeritage DNA Raw Data Changes

Here is my MyHeritage DNA comparison table:

[Image: MyHeritage DNA raw data counts, original versus new download]

There are over 110,000 fewer SNPs in the new dataset from MyHeritage. Most of the reduction appears to be among the heterozygous SNPs, which were halved in number.

I hadn’t heard that any change was made to MyHeritage DNA’s raw data files, so I didn’t expect this. I also see they added a few indels to their file, similar to the way 23andMe does, and have cut down the number of no-calls.

But just take a look at these changes:

[Image: SNPs retained, changed, and added between my original and new MyHeritage downloads]

Of the 720,816 SNPs they had in the original file, they only retained 214,353 of them, changed 3436 of them, and added 391,635 new SNPs that weren’t there before.

This is a major change! The original MyHeritage raw data I got in 2018 is nothing like the new one I now get. About 65% of the SNPs in the new file are different.
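The 65% figure can be checked from the counts above. This small calculation assumes the retained and changed counts are separate, non-overlapping groups, an interpretation that agrees with the drop of over 110,000 SNPs between the two files.

```python
# Counts read from the MyHeritage comparison above.
old_total = 720_816   # SNPs in the original file
retained = 214_353    # kept with the same value
changed = 3_436       # kept, but with a different value
added = 391_635       # new SNPs not in the original file

# Assumes "retained" and "changed" do not overlap.
new_total = retained + changed + added
removed = old_total - retained - changed
different_share = (changed + added) / new_total  # new or changed SNPs

print(f"new file: {new_total:,} SNPs ({old_total - new_total:,} fewer)")
print(f"removed from original: {removed:,}")
print(f"new or changed share of new file: {different_share:.0%}")
```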

Hopefully they improved their accuracy. Let’s see.

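When I wrote my Determining the Accuracy of DNA Tests article in Apr 2020, I produced the following table:

[Image: estimated raw data accuracy of the 5 testing companies, from Determining the Accuracy of DNA Tests]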

My numbers showed that MyHeritage had the best accuracy of the 5 companies, just 1 error out of every 603 SNPs.

When I do the same comparison of my new MyHeritage DNA raw data, I get this:

My new data has 574,057 autosomal values on Chr 1 to 22 that are not no-calls. Of those, 534,179 have agreeing values in my 4 BAM results that I can compare them to, and 529,707 of those match the MyHeritage value.

That means 4472 out of 534,179 are incorrect, or 0.8%. So 1 out of 119 values in my new MyHeritage raw data file is incorrect. Its error rate has increased by a factor of 5.
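The arithmetic behind these figures, written out so it can be rechecked. The only inputs are the counts above and the earlier rate of 1 error in 603 SNPs.

```python
compared = 534_179   # new MyHeritage values with agreeing BAM values
matched = 529_707    # of those, values that match the BAM consensus
old_one_in = 603     # previous MyHeritage rate: 1 error per 603 SNPs

errors = compared - matched
rate = errors / compared
one_in = compared / errors      # "1 out of N values is incorrect"
factor = old_one_in / one_in    # how much worse the new file is

print(f"{errors:,} errors, {rate:.1%}, 1 in {one_in:.0f}, {factor:.1f}x worse")
```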

With this change, MyHeritage went from being the most accurate of the 5 companies, to being the least accurate.

I don’t know exactly when or why MyHeritage made changes to what it puts in your raw data download file, but whatever they did (maybe their imputation and splicing) decreased the quality of its raw data considerably.

I cannot say for sure what that does to MyHeritage’s matching accuracy. That will depend on their matching algorithm and whether they allow for the possibility that 1 out of 100 SNPs may have an incorrect value, rather than the 1 in 600 they had before. If they did compensate for this and loosened their requirement to, say, 1 mismatch every 50 SNPs, then you will have more false segments than you did before.
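To illustrate why the mismatch tolerance matters, here is a toy Python sketch of half-identical segment finding with a mismatch allowance. It is hypothetical and is not MyHeritage’s actual algorithm; it counts SNPs rather than centimorgans, and the `min_gap` parameter simply stands in for a rule like “allow 1 mismatch every 50 SNPs”.

```python
from typing import List, Tuple


def is_no_call(genotype: str) -> bool:
    return set(genotype) <= {"-"}


def half_identical(a: str, b: str) -> bool:
    """Two genotypes are compatible if they share at least one allele.

    No-calls are treated as compatible so they never break a segment.
    """
    if is_no_call(a) or is_no_call(b):
        return True
    return bool(set(a) & set(b))


def find_segments(
    g1: List[str], g2: List[str], min_gap: int, min_snps: int
) -> List[Tuple[int, int]]:
    """Return half-identical runs as (start, end) index pairs, end exclusive.

    A mismatch is forgiven as a likely genotyping error only if at least
    `min_gap` SNPs have passed since the segment started or since the last
    forgiven mismatch. Runs shorter than `min_snps` are dropped.
    """
    segments: List[Tuple[int, int]] = []
    start = last_forgiven = 0
    n = min(len(g1), len(g2))
    for i in range(n):
        if half_identical(g1[i], g2[i]):
            continue
        if i - last_forgiven >= min_gap:
            last_forgiven = i        # tolerate this mismatch
            continue
        if i - start >= min_snps:    # mismatches too close together: close run
            segments.append((start, i))
        start = last_forgiven = i + 1
    if n - start >= min_snps:
        segments.append((start, n))
    return segments
```

With a single wrong call in the middle of a 100-SNP run, a tolerance of 1 mismatch per 50 SNPs bridges the error and keeps the whole run, while a stricter tolerance cuts the run at the error. The same looseness that rescues true segments also lets chance runs between unrelated people join into longer, false ones.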