
Louis Kessler’s Behold Blog

MyHeritage’s Record Matches and Discovery Settings - Sun, 23 Mar 2025

One of the best reasons to have your family tree on one of the major online sites (MyHeritage, Ancestry or FamilySearch) is their record matching systems. They will go out and find records for the individuals in your tree from among the billions of records in their collections and make them available to you.

I’m not saying you shouldn’t go and specifically seek out records that you need. Not all records are online, and many records are either not indexed or transcribed incorrectly, so no automated system will find them for you. But what you get from record matches are all the straightforward records that you’d never be able to find for yourself in your lifetime. These record matches will also include some record collections that you never would have thought to look at. Automated record matching is such an innovation that I would go so far as to say it has revolutionized genealogy.


Record Matches on MyHeritage

I last used a desktop program to record my genealogy in 1994. The program I was using at the time was Reunion for Windows. I had at that point connected 1361 people to my family. Over the next 20 years, I worked on my program Behold, expecting it to become the program I would use to record my family.

But in 2017, the new trend of online record collections and record matching systems became an irresistible draw for me. I opened an account at MyHeritage, initially loaded my tree using a GEDCOM file, and started working on my tree there. Over the next six years, I was fortunate to obtain hundreds of Romanian and Russian records about my relatives in the 1800s and greatly expanded my ancestry. Meanwhile, the MyHeritage record matches were so valuable in helping me work down from my ancestors that I discovered 3rd, 4th and 5th cousins I would never have found otherwise.

Currently, I have 8 trees at MyHeritage. My main tree is for my and my wife’s families as well as a place-to-place study and now contains 15,353 people. I have 7 other trees for in-laws and friends that have another 3,339 people. Here’s a bit of the history that I kept about the growth of my family trees at MyHeritage and the number of Record Matches that I’ve confirmed and added to my tree.

image

With MyHeritage, I try to keep up with all the record match suggestions it supplies. Every few weeks, I might get several hundred to several thousand new record matches to look through and process by either confirming or rejecting each one.

The source to person ratio is a metric that I remember Randy Seaver blogging about. In 2019 Randy reported a ratio of 1.94 from his RootsMagic program. I wonder how much he has raised his ratio since then. It’s important to try to ensure your tree has supporting records, and the ratio is a good way to check that you’re going in the right direction. I attribute the growth of my ratio primarily to the growth of new record collections at MyHeritage.


MyHeritage’s Discovery Settings

When you look at your record matches at MyHeritage, there is a small “cog” on the page that you may never have noticed.

image

If you click on the “cog”, it brings up this window:

image

I don’t know when MyHeritage added this feature. I didn’t notice that the “cog” was there until sometime last year. The Introduction to Record Matches article by MyHeritage describes the “Confidence Scores” in more detail. The Discovery Settings allow you to set the minimum level of confidence of the record matches that you are given. You can also select only structured records, or only text records if you want.

The confidence level has 11 settings, from 0 (lowest confidence) to 5 (highest confidence) in increments of 0.5. The default is 0.5, which includes almost all the Record Matches that are found. Increasing the confidence level will reduce the number of Record Matches you are presented with.

I tested most of the confidence levels on my own matches. Below are the numbers of record matches that MyHeritage gives me at each confidence setting. MH sometimes refers to a “Collection” as a “Source”.

image

MyHeritage currently has 7466 different collections. Only 485 of those give me record matches at confidence setting 0 or higher. Confidence setting 5 only gives me record matches from 107 collections.

These are the MyHeritage collections that give me the most record matches at confidence level 0:

image

Note that MyHeritage classifies Geni, FamilySearch and Filae matches as structured record matches. These are actually family tree sites, and genealogists know that other people’s family trees (especially unsourced ones) are not to be trusted until verified with records.

MyHeritage also finds matches with its own members’ family trees. It keeps those separate as what it calls “Smart Matches”. I have 23,678 smart matches with 5,124 trees at MyHeritage. Those are great for clues and really help put families together. MyHeritage is correct not to call those “records”. But IMHO MyHeritage should change Geni, FamilySearch and Filae matches to also be smart matches.


Confirming and Rejecting Record Matches

When I learned about the Discovery Settings last year, I changed my confidence setting from 0.5 to 0 and learned that I had 12,000 more matches to review. I wondered if anything with zero confidence was worth spending the time on.

So I spent a few weeks reviewing 6054 of these matches. I confirmed only 2558 of them (42%) and rejected the rest (58%).

Technically, a confirmation of a match should mean that this record does indeed pertain to the person that the match says it pertains to. A rejection should mean that it is not the correct person. What I do is also reject records for the correct person if they are duplicates of another record. I only keep one. For example, the same newspaper article might be printed on consecutive days or in different newspapers.

The one person in my tree who has the most record matches is the astronaut Judith Resnik who was killed in the 1986 explosion of the Space Shuttle Challenger. Needless to say, there were many articles written and the initial articles were reprinted in newspapers everywhere. I have 1208 record matches for Judith and only confirmed 30 of them (2%), rejecting the rest (98%).

So I was wondering if the confidence levels that MyHeritage assigns to records correspond somewhat to the percentage that I have rejected. I downloaded some statistics from my matches and came up with this table:

image

What this says is that MyHeritage does quite well in providing relevant structured record matches, requiring rejection less than 20% of the time with a confidence setting of 0.5 or more. The Free Text records, however, are much less reliable, with matches at confidence level 0 almost always being rejected.


Conclusion

There’s nothing like MyHeritage record matches to provide you with the facts and sources you need to expand your family tree. I enjoy spending the hours necessary to review new matches whenever they appear for me. It definitely is worth your time doing this!

If you haven’t already, check out the MyHeritage Discovery Settings. You may want to increase the confidence level or only work with structured records if you have too many pending matches. Or you may want to try decreasing the confidence level to 0 and see all the suggestions that MyHeritage has to offer. The choice is yours.

Using AI Bots for Programming - Sun, 16 Mar 2025

I’m working hard on Version 2 of Behold, and I needed to write some code to strip out HTML from a string. This would be so Behold could display the text embedded in the HTML correctly without formatting.

Since I’m a Windows and Office 365 aficionado, I get a version of Microsoft’s AI bot named Copilot built into my internet browser Edge and my Office products. So that’s the one most convenient for me to use. And I have had good luck in the past asking Copilot programming questions to help me with small tasks in my programming language Delphi.

But this time, this was a task I really didn’t want to do myself. I saw the solution as scanning through the string character by character and parsing out what was HTML to remove it. Not too difficult, but a bit of a pain to code. So I thought I’d let Copilot try.


Query Number 1

image

uses
  System.RegularExpressions;

function RemoveHTML(const Input: string): string;
var
  RegEx: TRegEx;
begin
  // Create a regular expression to match HTML tags
  RegEx := TRegEx.Create('<[^>]+>');
  // Replace all HTML tags with an empty string
  Result := RegEx.Replace(Input, '');
end;

// Example usage:
var
  Original, Cleaned: string;
begin
  Original := '<html><body>Hello, <b>World</b>!</body></html>';
  Cleaned := RemoveHTML(Original);
  WriteLn(Cleaned);  // Output will be "Hello, World!"
end.

image

Well, that was a surprise to me. A very simple two-line function. It uses a mathematical notation called a “regular expression” to find the HTML bits. In all my programming days, I’ve never come to understand them well. I find that once you write one of those regular expressions, such as the “<[^>]+>” that Copilot created, it is almost impossible to reverse engineer what it does and how it works.

I was a great fan of the APL programming language, and it had the same feature of being impossible to figure out after the fact.

Nonetheless, Copilot created some code for me that I could use even if I didn’t fully understand how it worked.
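The pattern becomes less cryptic once it is split into its three parts. Here is a minimal sketch of how “<[^>]+>” reads piece by piece. The names TagPatternDemo, StripTags and ListTags are hypothetical and are not part of Copilot's answer.

```delphi
program TagPatternDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // The tag pattern, read left to right:
  //   <       a literal opening angle bracket
  //   [^>]+   one or more characters that are not '>'
  //   >       a literal closing angle bracket
  TagPattern = '<[^>]+>';

// Hypothetical helper: removes everything the pattern matches
function StripTags(const Input: string): string;
begin
  Result := TRegEx.Replace(Input, TagPattern, '');
end;

// Hypothetical helper: prints each piece of text the pattern matches
procedure ListTags(const Input: string);
var
  Match: TMatch;
begin
  for Match in TRegEx.Matches(Input, TagPattern) do
    WriteLn(Match.Value);
end;

begin
  // Prints <p>, <b>, </b> and </p>, one per line
  ListTags('<p>Hello, <b>World</b>!</p>');

  // Prints: Hello, World!
  WriteLn(StripTags('<p>Hello, <b>World</b>!</p>'));
end.
```

Keep in mind that this kind of pattern is a simplification. It treats anything between “<” and the next “>” as a tag, so a “>” inside an attribute value would cut a tag short, and character entities such as “&amp;amp;” are left as they are.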


Query Number 2

I wanted a slight modification to that. HTML can include URLs in its HREF values, and that was information I didn’t want thrown out, so I then asked Copilot:

image

uses
  System.RegularExpressions;

function ReplaceHrefTags(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
begin
  // Regular expression to match href tags and extract href value
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>');
  // Replace href tags with just their href values
  Result := RegExHref.Replace(Input, '$1');

  // Regular expression to match all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  // Remove other HTML tags
  Result := RegExTags.Replace(Result, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Visit <a href="http://example.com">this link</a> for more information.</body></html>';
  Processed := ReplaceHrefTags(Original);
  WriteLn(Processed); // Output will be: "Visit http://example.com for more information."
end.

image

Excellent!  Just one more thing.


Query Number 3

image

uses
  System.RegularExpressions;

function ProcessHTML(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
begin
  // Regular expression to match <a> tags with href attributes and capture both href value and inner text
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>');
  // Replace <a> tags with "href value (inner text)"
  Result := RegExHref.Replace(Input, '$1 ($2)');

  // Regular expression to match all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  // Remove remaining HTML tags
  Result := RegExTags.Replace(Result, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Visit <a href="http://example.com">this link</a> for more information.</body></html>';
  Processed := ProcessHTML(Original);
  WriteLn(Processed); // Output will be: "Visit http://example.com (this link) for more information."
end.

image

I am rather impressed. I ended up with the code that I wanted after only 3 queries to Copilot. And it provided me with the code, an example, and an explanation of how it worked.

I know it would have taken me considerably more than 5 minutes to write that code myself without the use of regular expressions.

Thank you Copilot! Maybe you can’t completely translate my Russian documents for me yet, but it looks like you could be very useful to me for my programming.
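The “$1” and “$2” in the replacement text refer to the two parenthesized groups in the anchor pattern. Here is a minimal sketch of what each part of that pattern captures. The names CaptureGroupDemo and DescribeFirstLink are hypothetical, and the sketch only looks at the first link it finds.

```delphi
program CaptureGroupDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // The anchor pattern, read left to right:
  //   <a\s+            the start of an <a> tag followed by whitespace
  //   [^>]*            any attributes that come before href
  //   href="([^"]+)"   group 1: the URL between the double quotes
  //   [^>]*>           the rest of the opening tag
  //   (.*?)            group 2: the link text, taking as little as possible
  //   </a>             the closing tag
  AnchorPattern = '<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>';

// Hypothetical helper: returns "URL (link text)" for the first link found,
// or an empty string when the input contains no link
function DescribeFirstLink(const Input: string): string;
var
  Match: TMatch;
begin
  Match := TRegEx.Match(Input, AnchorPattern);
  if Match.Success then
    Result := Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')'
  else
    Result := '';
end;

begin
  // Prints: http://example.com (this link)
  WriteLn(DescribeFirstLink('Visit <a href="http://example.com">this link</a> now.'));
end.
```

The “?” after “.*” matters when a string holds more than one link. Without it, group 2 would run on to the last “</a>” in the string instead of stopping at the first.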


Just One More Thing

While testing this code on some of my data, I found one more revision was needed:

image

uses
  System.RegularExpressions;

function ProcessHTML(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
begin
  // Regular expression to match <a> tags with href attributes and capture both href value and inner text
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>');
 
  // Replace <a> tags based on the condition
  Result := RegExHref.Replace(Input,
    function(const Match: TMatch): string
    begin
      if Match.Groups[1].Value = Match.Groups[2].Value then
        Result := Match.Groups[1].Value  // Include only href value
      else
        Result := Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')';  // Include href value and text
    end
  );

  // Regular expression to match and remove all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  Result := RegExTags.Replace(Result, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Check out <a href="http://example.com">http://example.com</a> and <a href="http://another.com">this link</a>.</body></html>';
  Processed := ProcessHTML(Original);
  WriteLn(Processed); // Output will be: "Check out http://example.com and http://another.com (this link)."
end.

image

But this time there was a problem. The code would not compile.

Would Copilot be able to fix it?

image

uses
  System.SysUtils,  // needed for StringReplace and rfReplaceAll
  System.RegularExpressions;

function ReplaceHrefTags(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
  Matches: TMatchCollection;
  Match: TMatch;
  Processed: string;
begin
  Processed := Input;

  // Regular expression to match <a> tags with href attributes and capture both href value and inner text
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>');
  Matches := RegExHref.Matches(Processed);

  // Process each <a> tag match
  for Match in Matches do
  begin
    if Match.Groups[1].Value = Match.Groups[2].Value then
      Processed := StringReplace(Processed, Match.Value, Match.Groups[1].Value, [rfReplaceAll])
    else
      Processed := StringReplace(Processed, Match.Value, Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')', [rfReplaceAll]);
  end;

  // Regular expression to match all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  // Remove all remaining HTML tags
  Result := RegExTags.Replace(Processed, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Check out <a href="http://example.com">http://example.com</a> and <a href="http://another.com">this link</a>.</body></html>';
  Processed := ReplaceHrefTags(Original);
  WriteLn(Processed); // Output will be: "Check out http://example.com and http://another.com (this link)."
end.

image

And that worked!

Very well done Copilot.
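A likely reason the version with the inline function would not compile (an assumption, since the compiler message is not shown here) is that the evaluator form of Replace expects a TMatchEvaluator, which System.RegularExpressions declares as a method pointer (“of object”). An anonymous function is a different kind of type and cannot be passed in its place. Here is a sketch of the same logic using a method of a small helper class instead. The names EvaluatorDemo, TLinkFormatter, FormatLink and this ProcessHTML are hypothetical.

```delphi
program EvaluatorDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

type
  // Hypothetical helper class: FormatLink has the TMatchEvaluator signature
  TLinkFormatter = class
  public
    function FormatLink(const Match: TMatch): string;
  end;

function TLinkFormatter.FormatLink(const Match: TMatch): string;
begin
  if Match.Groups[1].Value = Match.Groups[2].Value then
    Result := Match.Groups[1].Value  // link text repeats the URL: keep one copy
  else
    Result := Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')';
end;

function ProcessHTML(const Input: string): string;
var
  Formatter: TLinkFormatter;
begin
  Formatter := TLinkFormatter.Create;
  try
    // Let FormatLink build the replacement text for each <a> tag
    Result := TRegEx.Replace(Input,
      '<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', Formatter.FormatLink);
    // Remove all remaining HTML tags
    Result := TRegEx.Replace(Result, '<[^>]+>', '');
  finally
    Formatter.Free;
  end;
end;

begin
  // Prints: Check out http://example.com and http://another.com (this link).
  WriteLn(ProcessHTML('Check out <a href="http://example.com">http://example.com</a> and <a href="http://another.com">this link</a>.'));
end.
```

Compared with replacing each match through StringReplace, the evaluator form makes a single pass over the input and rewrites each link where it was found.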

RootsTech 2025 - Sat, 8 Mar 2025

Over the past 3 days, RootsTech took place in Salt Lake City. RootsTech is the largest annual genealogy conference in the world, with tens of thousands of in-person attendees and several million online attendees.

I went to RootsTech in person three times, in 2012, 2014 and 2017.

In 2018, RootsTech started live streaming some of their talks. In 2021, due to Covid, the conference was completely online. They resumed the in-person conference a couple of years ago, but with millions of viewers online during the Covid years, they smartly decided to continue streaming many of the talks.

If you went to the RootsTech website this year between March 6 and 8, you would have seen the list of 418 sessions. Of those, 254 were in person and 244 were available online. The two numbers overlap because 80 of the in-person sessions were also streamed live online (254 + 244 - 80 = 418). With so many sessions available online, it made for quite a selection of talks to choose from.

image


Attending Online

Of course, by not attending in person, you will not be able to spend time with the friends you made at previous conferences or make new ones. But if you couldn’t go, then the online experience is the next best thing.

There are a lot of sessions broadcast live, including the keynote sessions.

When attending online, you can browse through the list of sessions and decide which ones you want to “attend”. You can watch the live sessions when they are broadcast, or watch them any time afterwards. If you watch them live, you can join the live chat with other viewers. Most sessions also have a syllabus that you can download. 


Choosing Your Sessions

It’s really up to you to choose the sessions that interest you the most or have speakers you enjoy listening to. Most of the sessions are an hour each, and the live ones end with about 10 minutes of Q&A from the live audience.

Over the past few years and this year as well, I’ve picked about a dozen of the sessions of most interest to me. This included some of the keynotes as well as some of my favorite speakers such as Adina Newman (DNA and Endogamy), Steve Little (Artificial Intelligence) and CeCe Moore (Ethical Standards and Investigative Genealogy). Watching about 5 hours on Thursday, 5 on Friday and 2 on Saturday was enough to exhaust me as if I was there in person.

I very much enjoyed and was most impressed and surprised by CeCe Moore’s presentation:

image

She wonderfully explains her involvement over the past 11 years and gives a behind-the-scenes description of the PBS genealogy show Finding Your Roots hosted by Dr. Henry Louis Gates Jr. But then to my surprise, she tells of how they turned the tables and did research with Dr. Gates as the subject. It included a sneak preview of the episode that is to be broadcast on April 8 where CeCe sits in the presenter’s chair. I can’t wait to see the full episode.


Session Archives

Don’t feel you can only watch the RootsTech sessions during the few days when the RootsTech Conference is taking place each year. RootsTech keeps an online archive of most of the sessions back to 2019 and these can be watched any time. There are more than 1500 sessions and 3000 other recordings in their catalog to watch.

image