
Louis Kessler’s Behold Blog

Connection Count Relationship Notation (CCRN) - Wed, 9 Apr 2025

As I work hard to finish what will be in the soon-to-be-released Behold 1.999, I found there was another notation I needed to devise.

Those of you who have worked with WikiTree will be aware of their introduction of connection counts. It was a couple of years ago when the imagineers at WikiTree came up with the concept of a “Connection Count at 7 Degrees” (CC7) to be an indicator of how well-connected your family is. I wrote an article about this last year:  What’s Your CC7?

WikiTree promotes challenges to increase your CC7, which is the number of people you connect with in 7 steps or fewer.


Defining a Connection

A connection as defined by WikiTree is the shortest number of steps between two people, where a step is one of:

  • A parent (father or mother)
  • A child (son or daughter)
  • A sibling (brother or sister)
  • A spouse (husband or wife)

So your grandfather’s sister would be 3 steps: 

  • parent – father – sister

Your brother-in-law would be 2 steps:

  • sister – husband, or
  • wife – brother

since the term brother-in-law can be either.

The Connection Count (CC) is the lowest number of steps it will take you to go from one person to another. I find it remarkable that WikiTree can do all that calculation and tell me that the lowest number of connections I have with Kevin Bacon is 20 and with Queen Elizabeth is 17. And then it provides the steps:

There are a few important things to note about the connections:

  1. The shortest connection need not be as a relative. e.g. If your 3rd cousin married your sister, then the CC is 2, through your sister, rather than the 7 steps for 3rd cousins.
  2. The sibling step is effectively a shortcut that takes 1 step, instead of going parent – child (2 steps) to get to a full or half sibling. The BGRN notation (see below) always counts siblings as 2 steps, but for connection counts a sibling is just 1 step.
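A minimal sketch of how a lowest step count could be found, assuming the tree is held as a simple graph in which every parent, child, sibling or spouse link is one step. This is not WikiTree's or Behold's implementation; the names TGraph, AddStep and ConnectionCount and the sample people are hypothetical, and a plain breadth-first search is used for illustration.

```delphi
program ConnectionCountSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical structure: each person's name maps to the people one step away.
  TGraph = TObjectDictionary<string, TList<string>>;

// Record a one-step link (parent, child, sibling or spouse) in both directions.
procedure AddStep(Graph: TGraph; const A, B: string);

  procedure AddOneWay(const FromName, ToName: string);
  var
    Links: TList<string>;
  begin
    if not Graph.TryGetValue(FromName, Links) then
    begin
      Links := TList<string>.Create;
      Graph.Add(FromName, Links);
    end;
    Links.Add(ToName);
  end;

begin
  AddOneWay(A, B);
  AddOneWay(B, A);
end;

// Breadth-first search: the fewest steps from Start to Target, or -1 if unconnected.
function ConnectionCount(Graph: TGraph; const Start, Target: string): Integer;
var
  Steps: TDictionary<string, Integer>;
  Queue: TQueue<string>;
  Current, Neighbor: string;
  Links: TList<string>;
begin
  Result := -1;
  Steps := TDictionary<string, Integer>.Create;
  Queue := TQueue<string>.Create;
  try
    Steps.Add(Start, 0);
    Queue.Enqueue(Start);
    while Queue.Count > 0 do
    begin
      Current := Queue.Dequeue;
      if Current = Target then
        Exit(Steps[Current]);
      if Graph.TryGetValue(Current, Links) then
        for Neighbor in Links do
          if not Steps.ContainsKey(Neighbor) then
          begin
            Steps.Add(Neighbor, Steps[Current] + 1);
            Queue.Enqueue(Neighbor);
          end;
    end;
  finally
    Queue.Free;
    Steps.Free;
  end;
end;

var
  Tree: TGraph;
begin
  Tree := TGraph.Create([doOwnsValues]);
  try
    AddStep(Tree, 'Me', 'Father');               // parent
    AddStep(Tree, 'Father', 'Grandfather');      // parent
    AddStep(Tree, 'Grandfather', 'Great-aunt');  // sibling counts as one step
    AddStep(Tree, 'Me', 'Sister');               // sibling
    AddStep(Tree, 'Sister', 'Brother-in-law');   // spouse
    WriteLn(ConnectionCount(Tree, 'Me', 'Great-aunt'));      // 3
    WriteLn(ConnectionCount(Tree, 'Me', 'Brother-in-law'));  // 2
  finally
    Tree.Free;
  end;
end.
```

Because a breadth-first search visits people in order of distance, the first time it reaches the target is also the shortest route, whether that route runs through blood relatives or through a marriage.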

The whole idea of working on improving your CC7 is that it gets you to research some of the close families of those related to you through marriage. These people would be included in what they call FAN (Friends, Associates, Neighbors) research.


Adding Connection Counts to Behold

I don’t know of any genealogy program other than WikiTree that gives you connection counts. I think it is a very interesting number, and people with GEDCOM files might want to know about them. So I have decided to add them into this upcoming release of Behold.

I will describe my implementation of CCs in a future blog post.


Connection Count Relationship Notation (CCRN)

But what is important right now for this article is to define a notation that can concisely describe a specific connection.

The diagram earlier on this page is an older diagram that WikiTree uses. In effect, it tells me that the connection is: sister – son – wife – mother – father – mother – father – father – father – father – father – father – brother – son – son – son – son – daughter – daughter – son.

To me, that’s a bit verbose. Something with one letter per connection would be preferable. A 20-step connection would then be exactly 20 characters long.

We need letters for:   Parent, Father, Mother, Child, Son, Daughter, Sibling, Brother, Sister, Spouse, Husband, and Wife.

Parent, Child, Sibling and Spouse are needed in case we don’t know the sex of that particular person in the line.

We can’t use just the first letters since Son, Sibling, Sister and Spouse all start with S.

Also, there is a similar notation, Behold’s Genetic Relationship Notation (BGRN) that I came up with in 2018 that I am using in Behold to define relationships. I would like the two to be compatible.

So this is what I’ve decided on:

  • F = father
  • M = mother
  • P = parent of unknown sex
  • S = son
  • D = daughter
  • C = child of unknown sex
  • h = husband
  • w = wife
  • k = spouse of unknown sex

These are already in BGRN. Note that BGRN uses uppercase letters for biological relations and lowercase for non-biological (in this case marital) relations. This makes it easy to see if a relationship is biological or not.

So now I just need to add brother, sister and sibling, as follows:

  • B = brother
  • Z = sister
  • G = sibling of unknown sex

“S” is already used for son, and BGRN uses T for twin and R for pair of parents. The notation should also avoid vowels, or words (possibly bad ones) might inadvertently be formed. So Z and G will have to do, unless someone has a better idea. Z isn’t that bad, because you can think: “Zister”.

Using this notation, the CCRN for my connection with Kevin Bacon is:

ZSwMFMFFFFFFBSSSSDDS

and it’s fairly easy to interpret that to be:  My sister’s son’s wife’s mother’s father’s mother’s father’s father’s father’s father’s father’s father’s brother’s son’s son’s son’s son’s daughter’s daughter’s son is Kevin Bacon.

This is exactly 20 characters long meaning the connection count is 20, and since it contains a lowercase letter, I know the connection is not biological.
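A minimal sketch of how such a string could be spelled out, assuming only the twelve letters defined so far. StepName and DescribeCCRN are hypothetical names used for illustration, not routines from Behold.

```delphi
program CcrnDecodeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: translate one CCRN letter into its relationship word.
function StepName(Code: Char): string;
begin
  case Code of
    'F': Result := 'father';
    'M': Result := 'mother';
    'P': Result := 'parent';
    'S': Result := 'son';
    'D': Result := 'daughter';
    'C': Result := 'child';
    'B': Result := 'brother';
    'Z': Result := 'sister';
    'G': Result := 'sibling';
    'h': Result := 'husband';
    'w': Result := 'wife';
    'k': Result := 'spouse';
  else
    raise EArgumentException.CreateFmt('Unknown CCRN letter "%s"', [Code]);
  end;
end;

// Hypothetical helper: spell out a whole CCRN string as a possessive chain.
function DescribeCCRN(const CCRN: string): string;
var
  Code: Char;
begin
  Result := '';
  for Code in CCRN do
    if Result = '' then
      Result := 'My ' + StepName(Code)
    else
      Result := Result + '''s ' + StepName(Code);
end;

const
  KevinBacon = 'ZSwMFMFFFFFFBSSSSDDS';
begin
  WriteLn(DescribeCCRN(KevinBacon));
  // One letter per step, so the string length is the connection count.
  WriteLn('Connection count: ', Length(KevinBacon));  // 20
end.
```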


Followup

I realized the next day, when I started coding the CC into Behold, that I needed to distinguish biological from non-biological connections as well, e.g. for adoptions, as I do in my BGRN notation.

So add these to the list of possible codes:

  • f = father, non-biological but legal
  • m = mother, non-biological but legal
  • p = parent of unknown sex, non-biological but legal
  • s = son, non-biological but legal
  • d = daughter, non-biological but legal
  • c = child of unknown sex, non-biological but legal
  • b = brother, non-biological but legal
  • z = sister, non-biological but legal
  • g = sibling of unknown sex, non-biological but legal

I originally was going to use “L” for sibling, until I realized that the lowercase “l” is difficult to recognize as an “L” and is easy to overlook. Of course we would prefer “S” for sibling, but “S” is taken for son, so “G” and “g” will have to do.
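With the lowercase codes added, letter case alone says whether a step is biological. A minimal sketch of that check follows; NonBiologicalSteps and IsBiological are hypothetical names, and the sample strings are made up for illustration.

```delphi
program CcrnBiologicalSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: count the steps written in lowercase, i.e. the
// marital or legal (non-biological) steps in a CCRN string.
function NonBiologicalSteps(const CCRN: string): Integer;
var
  Code: Char;
begin
  Result := 0;
  for Code in CCRN do
    if CharInSet(Code, ['a'..'z']) then
      Inc(Result);
end;

// Hypothetical helper: a connection is biological only if no step is lowercase.
function IsBiological(const CCRN: string): Boolean;
begin
  Result := (CCRN <> '') and (NonBiologicalSteps(CCRN) = 0);
end;

begin
  WriteLn(BoolToStr(IsBiological('FFBS'), True));                  // True: all uppercase
  WriteLn(BoolToStr(IsBiological('ZSwMFMFFFFFFBSSSSDDS'), True));  // False: contains w
  WriteLn(NonBiologicalSteps('fZh'));  // 2: adoptive father, his sister, her husband
end.
```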

More soon.

MyHeritage’s Record Matches and Discovery Settings - Sun, 23 Mar 2025

One of the best reasons to have your family tree on one of the major online sites (MyHeritage, Ancestry or FamilySearch) is their record matching systems. They will go out and find records for the individuals in your tree from among the billions of records in their collections and make them available to you.

I’m not saying you shouldn’t go and specifically seek out records that you need. Not all records are online, and there are a lot of records that are not indexed, or are transcribed incorrectly, so no automated system will find them for you. But what you get from record matches are all the straightforward records that you’d never be able to find for yourself in your lifetime. These record matches will also include some record collections that you never would have thought to look at. Automated record matching is such an innovation that I would go so far as to say it has revolutionized genealogy.


Record Matches on MyHeritage

I last used a desktop program to record my genealogy in 1994. The program I was using at the time was Reunion for Windows. I had at that point connected 1361 people to my family. Over the next 20 years, I worked on my program Behold, expecting it to become the program I would use to record my family.

But in 2017, the new trend of online record collections and record matching systems became an irresistible draw for me. I opened an account at MyHeritage, initially loaded my tree using a GEDCOM file, and started working on my tree there. Over the next six years, I was fortunate to obtain hundreds of Romanian and Russian records about my relatives in the 1800s and greatly expanded my ancestry. Meanwhile, the MyHeritage record matches were so valuable in helping me work down from my ancestors and discover 3rd, 4th and 5th cousins that I would never have found otherwise.

Currently, I have 8 trees at MyHeritage. My main tree is for my and my wife’s families as well as a place-to-place study and now contains 15,353 people. I have 7 other trees for in-laws and friends that have another 3,339 people. Here’s a bit of the history that I kept about the growth of my family trees at MyHeritage and the number of Record Matches that I’ve confirmed and added to my tree.

image

With MyHeritage, I try to keep up with all the record match suggestions it supplies. Every few weeks, I might get several hundred to several thousand new record matches to look through and process by either confirming or rejecting each one.

The source-to-person ratio is a metric that I remember Randy Seaver blogging about. In 2019 Randy reported a ratio of 1.94 from his RootsMagic program. I wonder how much he has raised his ratio since then. It’s important to try to ensure your tree has supporting records, and the ratio is a good way to check that you’re going in the right direction. I attribute the growth of my ratio primarily to the growth of new record collections at MyHeritage.


MyHeritage’s Discovery Settings

When you look at your record matches at MyHeritage, there is a small “cog” on the page that you may never have noticed.

image

If you click on the “cog”, it brings up this window:

image

I don’t know when MyHeritage added this feature. I didn’t notice that the “cog” was there until sometime last year. The Introduction to Record Matches article by MyHeritage describes the “Confidence Scores” in more detail. The Discovery Settings allow you to set the minimum level of confidence of the record matches that you are given. You can also select only structured records, or only text records if you want.

The confidence level has 11 settings, from 0 (lowest confidence) in increments of 0.5 up to 5 (highest confidence). The default is 0.5, which includes almost all the record matches that are found. Increasing the confidence level will reduce the number of record matches you are presented with.

I tested most of the confidence levels on my own matches. Below are the numbers of record matches that MyHeritage gives me at each confidence setting. MH sometimes refers to a “Collection” as a “Source”.

image

MyHeritage currently has 7466 different collections. Only 485 of those give me record matches at confidence setting 0 or higher.  Confidence setting 5 only gives me record matches from 107 collections.

These are the MyHeritage collections that give me the most record matches at confidence level 0:

image

Note that MyHeritage classifies Geni, FamilySearch and Filae matches as structured record matches. These are actually family tree sites, and genealogists know that other people’s family trees (especially unsourced ones) are not to be trusted until verified with records.

MyHeritage also finds matches with its own members’ family trees. It keeps those separate, in what it calls “Smart Matches”. I have 23,678 smart matches with 5,124 trees at MyHeritage. Those are great for clues and really help put families together. MyHeritage is correct not to call those “records”. But IMHO MyHeritage should change Geni, FamilySearch and Filae matches to also be smart matches.


Confirming and Rejecting Record Matches

When I learned about the Discovery Settings last year, I changed my confidence setting from 0.5 to 0 and learned that I had 12,000 more matches to review. I wondered if anything with zero confidence was worth spending the time on.

So I spent a few weeks reviewing 6054 of these matches. I confirmed only 2558 of them (42%) and rejected the rest (58%).

Technically, a confirmation of a match should mean that this record does indeed pertain to the person that the match says it pertains to. A rejection should mean that it is not the correct person. What I do is also reject records for the correct person if they are duplicates of another record. I only keep one. For example, the same newspaper article might be printed on consecutive days or in different newspapers.

The one person in my tree who has the most record matches is the astronaut Judith Resnik who was killed in the 1986 explosion of the Space Shuttle Challenger. Needless to say, there were many articles written and the initial articles were reprinted in newspapers everywhere. I have 1208 record matches for Judith and only confirmed 30 of them (2%), rejecting the rest (98%).

So I was wondering if the confidence levels that MyHeritage assigns to records correspond somewhat to the percentage that I have rejected. I downloaded some statistics from my matches and came up with this table:

image

What this says is that MyHeritage does quite well in providing relevant structured record matches, requiring rejection less than 20% of the time with a confidence setting of 0.5 or more. The Free Text records however are much less reliable, with confidence level 0 almost always being rejected.


Conclusion

There’s nothing like MyHeritage record matches to provide you with the facts and sources you need to expand your family tree. I enjoy spending the hours necessary to review new matches whenever they appear for me. It definitely is worth your time doing this!

If you haven’t already, check out the MyHeritage Discovery Settings. You may want to increase the confidence level or only work with structured records if you have too many pending matches. Or you may want to try decreasing the confidence level to 0 and see all the suggestions that MyHeritage has to offer. The choice is yours.

Using AI Bots for Programming - Sun, 16 Mar 2025

I’m working hard on Version 2 of Behold, and I needed to write some code to strip out HTML from a string. This would be so Behold could display the text embedded in the HTML correctly without formatting.

Since I’m a Windows and Office 365 aficionado, I get a version of Microsoft’s AI bot named Copilot built into my internet browser Edge and my Office products. So that’s the one most convenient for me to use. And I have had good luck in the past asking Copilot programming questions to help me with small tasks in my programming language, Delphi.

But this time, it was a task I really didn’t want to do myself. I saw the solution as scanning through the string character by character and parsing out what was HTML to remove it. Not too difficult, but a bit of a pain to code. So I thought I’d let Copilot try.


Query Number 1

image

uses
  System.RegularExpressions;

function RemoveHTML(const Input: string): string;
var
  RegEx: TRegEx;
begin
  // Create a regular expression to match HTML tags
  RegEx := TRegEx.Create('<[^>]+>');
  // Replace all HTML tags with an empty string
  Result := RegEx.Replace(Input, '');
end;

// Example usage:
var
  Original, Cleaned: string;
begin
  Original := '<html><body>Hello, <b>World</b>!</body></html>';
  Cleaned := RemoveHTML(Original);
  WriteLn(Cleaned);  // Output will be "Hello, World!"
end.

image

Well, that was a surprise to me: a very simple two-line function. It uses a mathematical notation called a “regular expression” to find the HTML bits. In all my programming days, I’ve never come to understand them well, because I find that once you write one of those regular expressions, such as the “<[^>]+>” that Copilot created, it is almost impossible to reverse engineer what it does and how it works.

I was a great fan of the APL programming language, and it had the same feature of being impossible to figure out after the fact.

Nonetheless, Copilot created me some code that I could use even if I didn’t fully understand how it worked.
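For readers in the same position, the pattern can be unpacked piece by piece. The following is a small sketch, not part of Copilot's answer; the program name, the TagPattern constant and the sample strings are made up for illustration.

```delphi
program TagPatternSketch;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // <      a literal opening angle bracket
  // [^>]+  one or more characters that are NOT a closing angle bracket
  // >      the literal closing angle bracket that ends the tag
  TagPattern = '<[^>]+>';

begin
  // Every tag is removed; the text between the tags is untouched.
  WriteLn(TRegEx.Replace('<p>Hello, <b>World</b>!</p>', TagPattern, ''));  // Hello, World!
  // Character entities are not tags, so they pass through unchanged.
  WriteLn(TRegEx.Replace('<i>Tom &amp; Jerry</i>', TagPattern, ''));       // Tom &amp; Jerry
end.
```

The second line shows a limit of this approach: the pattern only removes tags, so entities such as &amp; would still need to be decoded separately before the text is displayed.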


Query Number 2

I wanted a slight modification to that. HTML can include URLs in its HREF values, and that was information I didn’t want thrown out, so I then asked Copilot:

image

uses
  System.RegularExpressions;

function ReplaceHrefTags(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
begin
  // Regular expression to match href tags and extract href value
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>');
  // Replace href tags with just their href values
  Result := RegExHref.Replace(Input, '$1');

  // Regular expression to match all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  // Remove other HTML tags
  Result := RegExTags.Replace(Result, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Visit <a href="http://example.com">this link</a> for more information.</body></html>';
  Processed := ReplaceHrefTags(Original);
  WriteLn(Processed); // Output will be: "Visit http://example.com for more information."
end.

image

Excellent!  Just one more thing.


Query Number 3

image

uses
  System.RegularExpressions;

function ProcessHTML(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
begin
  // Regular expression to match <a> tags with href attributes and capture both href value and inner text
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>');
  // Replace <a> tags with "href value (inner text)"
  Result := RegExHref.Replace(Input, '$1 ($2)');

  // Regular expression to match all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  // Remove remaining HTML tags
  Result := RegExTags.Replace(Result, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Visit <a href="http://example.com">this link</a> for more information.</body></html>';
  Processed := ProcessHTML(Original);
  WriteLn(Processed); // Output will be: "Visit http://example.com (this link) for more information."
end.

image

I am rather impressed. I ended up with the code that I wanted after only 3 queries to Copilot. And it provided me with the code, an example, and an explanation of how it worked.
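The link pattern is harder to read than the tag pattern, so here is a sketch that annotates each piece and prints what the two capture groups hold. The program name, the HrefPattern constant and the sample sentence are illustrative only.

```delphi
program HrefPatternSketch;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // <a\s+           an opening "a" tag followed by at least one whitespace character
  // [^>]*           any other attributes that come before href
  // href="([^"]+)"  group 1: the URL between the double quotes
  // [^>]*>          any remaining attributes, then the end of the opening tag
  // (.*?)           group 2: the link text, matched lazily so it stops at the first </a>
  // </a>            the closing tag
  HrefPattern = '<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>';
  Sample = 'Visit <a href="http://example.com">this link</a> today.';

var
  Match: TMatch;
begin
  Match := TRegEx.Match(Sample, HrefPattern);
  if Match.Success then
  begin
    WriteLn('Group 1: ', Match.Groups[1].Value);  // http://example.com
    WriteLn('Group 2: ', Match.Groups[2].Value);  // this link
  end;
  // $1 and $2 in the replacement text stand for the two captured groups.
  WriteLn(TRegEx.Replace(Sample, HrefPattern, '$1 ($2)'));
end.
```

Note that the pattern only recognizes href values in double quotes; links written with single quotes or no quotes would fall through to the general tag removal and lose their URL.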

I know it would have taken me considerably more than 5 minutes to write that code myself without the use of regular expressions.

Thank you Copilot! Maybe you can’t completely translate my Russian documents for me yet, but it looks like you could be very useful to me for my programming.


Just One More Thing

While testing this code on some of my data, I found one more revision was needed:

image

uses
  System.RegularExpressions;

function ProcessHTML(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
begin
  // Regular expression to match <a> tags with href attributes and capture both href value and inner text
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>');

  // Replace <a> tags based on the condition
  Result := RegExHref.Replace(Input,
    function(const Match: TMatch): string
    begin
      if Match.Groups[1].Value = Match.Groups[2].Value then
        Result := Match.Groups[1].Value  // Include only href value
      else
        Result := Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')';  // Include href value and text
    end
  );

  // Regular expression to match and remove all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  Result := RegExTags.Replace(Result, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Check out <a href="http://example.com">http://example.com</a> and <a href="http://another.com">this link</a>.</body></html>';
  Processed := ProcessHTML(Original);
  WriteLn(Processed); // Output will be: "Check out http://example.com and http://another.com (this link)."
end.

image

But this time there was a problem. The code would not compile.

Would Copilot be able to fix it?

image

uses
  System.SysUtils,  // StringReplace and rfReplaceAll
  System.RegularExpressions;

function ReplaceHrefTags(const Input: string): string;
var
  RegExHref, RegExTags: TRegEx;
  Matches: TMatchCollection;
  Match: TMatch;
  Processed: string;
begin
  Processed := Input;

  // Regular expression to match <a> tags with href attributes and capture both href value and inner text
  RegExHref := TRegEx.Create('<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>');
  Matches := RegExHref.Matches(Processed);

  // Process each <a> tag match
  for Match in Matches do
  begin
    if Match.Groups[1].Value = Match.Groups[2].Value then
      Processed := StringReplace(Processed, Match.Value, Match.Groups[1].Value, [rfReplaceAll])
    else
      Processed := StringReplace(Processed, Match.Value, Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')', [rfReplaceAll]);
  end;

  // Regular expression to match all other HTML tags
  RegExTags := TRegEx.Create('<[^>]+>');
  // Remove all remaining HTML tags
  Result := RegExTags.Replace(Processed, '');
end;

// Example usage:
var
  Original, Processed: string;
begin
  Original := '<html><body>Check out <a href="http://example.com">http://example.com</a> and <a href="http://another.com">this link</a>.</body></html>';
  Processed := ReplaceHrefTags(Original);
  WriteLn(Processed); // Output will be: "Check out http://example.com and http://another.com (this link)."
end.

image

And that worked!

Very well done Copilot.
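The post does not quote the compiler error for the version that failed. One plausible cause, stated here as an assumption, is that TRegEx.Replace expects its callback (TMatchEvaluator) to be a method of an object rather than an anonymous function. Under that assumption, the callback idea can be kept by moving it into a small class. THrefEvaluator is a hypothetical helper written for this sketch; it is not Copilot's answer.

```delphi
program EvaluatorSketch;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

type
  // Hypothetical helper class: it exists only so that the callback is a method.
  THrefEvaluator = class
  public
    function Evaluate(const Match: TMatch): string;
  end;

function THrefEvaluator.Evaluate(const Match: TMatch): string;
begin
  if Match.Groups[1].Value = Match.Groups[2].Value then
    Result := Match.Groups[1].Value  // link text repeats the URL, keep it once
  else
    Result := Match.Groups[1].Value + ' (' + Match.Groups[2].Value + ')';
end;

function ProcessHTML(const Input: string): string;
var
  Evaluator: THrefEvaluator;
begin
  Evaluator := THrefEvaluator.Create;
  try
    // The evaluator is called once per matched <a> tag.
    Result := TRegEx.Replace(Input, '<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
      Evaluator.Evaluate);
  finally
    Evaluator.Free;
  end;
  // Remove all remaining HTML tags.
  Result := TRegEx.Replace(Result, '<[^>]+>', '');
end;

begin
  WriteLn(ProcessHTML('Check out <a href="http://example.com">http://example.com</a>' +
    ' and <a href="http://another.com">this link</a>.'));
end.
```

Compared with the StringReplace loop, this makes a single pass over the input and replaces each link exactly where it was matched.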