
Louis Kessler’s Behold Blog

Finally, Interesting Possibilities to Sync Your Data - Fri, 17 May 2019

Although I don’t use Family Tree Maker (FTM) myself, I am very interested in its capabilities and syncing abilities. FTM and RootsMagic are the only two programs that Ancestry has allowed to use the API that gives access to the Ancestry.com online family trees. Therefore they are the only two programs that can directly download data from, upload data to, and sync between your family tree files on your computer and your trees up at Ancestry.


RootsMagic

RootsMagic currently has its TreeShare function to share data between what you have in RootsMagic on your computer and what you have on Ancestry. It will compare the two for you and show you what’s different, but it will not sync them for you. You’ll have to do that manually in RootsMagic, one person at a time, using the differences.

image

That is likely because RootsMagic doesn’t know which data is the data you’ve most recently updated and wants you to verify any changes either way. That is a good idea, but if you are only making changes on RootsMagic, you’ll want everything uploaded and synced to Ancestry. If you are only making changes on Ancestry, you’ll want everything downloaded and synced to RootsMagic.

With regards to FamilySearch, RootsMagic does a very similar thing. So basically, you can match your RootsMagic records to FamilySearch and sync them one at a time, and then do the same with Ancestry. But you can’t do it all at once, or sync Ancestry and FamilySearch with each other.

With regards to MyHeritage, RootsMagic only incorporates their hints, and not their actual tree data.


Family Tree Maker

Family Tree Maker takes the sync with Ancestry a bit further than RootsMagic, offering full sync capabilities up and down.

image

For FamilySearch, FTM up to now only incorporates their hints and allows merging of FamilySearch data into your FTM data, again one person at a time. But Family Tree Maker has just announced their latest upgrade, and it includes some new FamilySearch functionality.

The upcoming feature that looks very interesting, and that I’ll want to try, is their “download a branch from the FamilySearch Family Tree”. This seems to be the ability to bring in new people, many at a time, from FamilySearch into your tree.


Family Tree Builder

MyHeritage’s free Family Tree Builder download already has full syncing with MyHeritage’s online family trees.

image

They do not have any integration with their own Geni one-world tree, which is too bad.

But in March, MyHeritage announced a new FamilySearch Tree Sync (beta), which allows FamilySearch users to synchronize their family trees with MyHeritage. Unfortunately, I was not allowed to join the beta and test it out, as currently only members of the Church of Jesus Christ of Latter-day Saints are allowed. Hopefully they’ll remove that restriction in the future, or at least once the beta is completed.


Slowly … Too Slowly

So you can see that progress is being made. We have three different software programs and three different online sites that are slowly adding some syncing capabilities. Unfortunately, they are not doing it the same way, and working with your data on the six offline and online platforms is different under each system.

The very promising AncestorSync program was one of the entrants in the RootsTech 2012 Developer Challenge, along with Behold. I thought AncestorSync should have won the competition. Dovy Paukstys, the mastermind behind the program, had great ideas for it. It was going to be the program that would sync all your data with whatever desktop program you used and all your online data at Ancestry, FamilySearch, MyHeritage, Geni and wherever else. And it would do it with very simple functionality. Wow.

This was the AncestorSync website front page in 2013 retrieved from archive.org.
image

They had made quite a bit of progress. Here is what they were supporting by 2013 (checkmarks) and what they were planning to implement (triangles):

image

Be sure to read Tamura Jones’ article from 2012 about AncestorSync Connect, which detailed a lot of the things that AncestorSync was trying to do.

Then read Tamura’s 2017 article that tells what happened to AncestorSync and describes the short-lived attempt of Heirlooms Origins to create what they called the Universal Genealogy Transfer Tool.


So What’s Needed?

I know what I want to see. I want my genealogy software on my computer to be able to download the information from the online sites or other programs, show the information side by side, and allow me to select what I want in my data and what information from the other trees I want to ignore. Then it should be able to upload my data the way I want it back to the online sites, overwriting the data there with my (understood to be) correct data. Then I can periodically re-download the online data to get new information that was added online, remembering the data from online that I wanted to ignore, and do this “select what I want” step again.

I would think it might look something like this:

image

where the items from each source (Ancestry, MyHeritage, FamilySearch and other trees or GEDCOMs that you load in) would be a different color until you accept them into your tree or mark them to ignore in the future.

By having all your data from all the various trees together, you’ll easily be able to see what is the same, what conflicts, and what new sources are brought in to look at, and you can make decisions based on all the sources you have as to what is correct and what is not.

Hmm. That above example looks remarkably similar to Behold’s report.

I think we’ll get there. Not right away, but eventually the genealogical world will realize how fragmented our data has become, and will ultimately decide that they need to see all their data from all sites together.

Determining VCF Accuracy - Mon, 13 May 2019

In my last post, I was able to create a raw data file from the Whole Genome Sequencing (WGS) BAM file using the WGS Extract program. It seemed to work quite well.

But in the post before that, WGS – The Raw VCF file and the gVCF file, I was trying to see if I could create a raw data file from the Variant Call Format (VCF) file. I ended that post with a procedure that I thought could generate a raw data file, which was:

  1. Make a list of all the SNPs you want raw data for.
  2. Initially assign them all the human genome reference values. Note: none of the VCF files give all of these to you, so you need to set this up initially. Wilhelm HO has a good set of them included with his DNA Kit Studio.
  3. The positions of variants in your gVCF file should be marked as no-calls. Many of these variants are false, but we don’t want them to break a match.
  4. The positions of variants in your filtered VCF should be marked as having that variant. This will overwrite most of the no-calls marked in step 3 with reliable, filtered values.

When I wrote that, I had thought that the gVCF file contained more variants than the Raw VCF file. During my analysis since then, I found out that is not true. The Raw VCF contains all the unfiltered variants: everything that might be considered a variant is in the Raw VCF file. The gVCF includes the same SNP variants that are in the Raw VCF file, but also includes all the insertions/deletions as well as about 10% of the non-variant positions. It is the non-variant positions that make the gVCF such a large file.

So right away, the Raw VCF file can be used instead of the gVCF file in step 3 of the above proposed procedure and will give the same results. That is a good thing, since the Raw VCF file is much smaller than the gVCF file, so it will be faster to process. Also, the Raw VCF file and the filtered VCF file include the same fields; my gVCF included different fields and would need to be processed differently from the other two.

(An aside: I also found out that the gVCF supplied to me by Dante did not have enough information in it to determine what the variant is. It gives the REF and ALT field values, but does not include the AC field. The AC field gives the count of the ALT allele, either 1 or 2.

  • If REF=A, ALT=C, AC=1, then the variant value is AC.
  • If REF=A, ALT=C, AC=2, then the variant value is CC.
  • If REF=A, ALT=C, and AC is not given, then the variant value could be either AC or CC.

For me to make any use of my gVCF file, not just for this purpose but for any purpose, I would have to go back and ask Dante to recreate it for me and include the AC field in the variant records. End aside.)
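
To make the AC logic concrete, here is a small sketch of my own in Python (the function name is mine, not from any tool):

  def genotype_from_ac(ref, alt, ac=None):
      """Work out the diploid genotype of a biallelic SNP from a VCF record's
      REF and ALT alleles plus the AC (alternate allele count) value."""
      if ac == 1:
          return ref + alt    # heterozygous: one reference allele, one alternate
      if ac == 2:
          return alt + alt    # homozygous alternate: two copies of the alternate
      return None             # AC missing: could be either REF+ALT or ALT+ALT

  # The three cases from the bullets above:
  print(genotype_from_ac("A", "C", 1))   # AC
  print(genotype_from_ac("A", "C", 2))   # CC
  print(genotype_from_ac("A", "C"))      # None (ambiguous)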


Estimating Type 1 and Type 2 Errors

We now need to see if the above procedure using the Raw VCF file in step 3 and the filtered VCF file in step 4 will be accurate enough to use.

We are dealing with two types of errors.

Type 1: False Positive: The SNP is not a variant, but the VCF file specifies that it is a variant.

Type 2: False Negative:  The SNP is a variant, but the VCF file specifies that it is not a variant.

Both are errors that we want to minimize, since either error will give us an incorrect value.

To determine the Type 1 and Type 2 error rates, I used the 959,368 SNPs that the WGS Extract program produced for me from my BAM file. That program uses a well-developed and respected genomics library of analysis functions called samtools, so the values it extracted from my WGS via my BAM file are as good as they can get. It is essential that I have values that are as correct as possible for this analysis, so I removed 2,305 values that might be wrong because some of my chip test results disagreed with them. I also removed 477 values that WGS Extract included but that were at insertion or deletion positions.

From the remaining values, I could only use positions where I could determine the reference value. This included 458,894 variant positions, which always state the reference value, as well as the 10% or so of non-variant reference values that I could determine from my gVCF file. That amounted to 42,552 non-variants.

Assuming these variant and non-variant positions all have accurate values from the WGS extract, we can now compute the two types of errors for my filtered VCF file and for my Raw VCF file.

image
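
For anyone who wants to see how such a tally can be computed, here is a rough sketch of mine in Python. It assumes the trusted genotypes (e.g. from WGS Extract), the reference alleles, and the set of positions each VCF calls as variants have already been loaded; all the names are my own:

  def vcf_error_rates(trusted_genotypes, reference_alleles, vcf_variant_positions):
      """trusted_genotypes: {position: genotype} of believed-correct calls
      reference_alleles: {position: reference allele} for those same positions
      vcf_variant_positions: set of positions the VCF reports as variants
      Returns (type1_rate, type2_rate)."""
      type1 = type2 = variants = non_variants = 0
      for pos, genotype in trusted_genotypes.items():
          is_variant = genotype != reference_alleles[pos] * 2   # any non-reference allele present
          in_vcf = pos in vcf_variant_positions
          if is_variant:
              variants += 1
              if not in_vcf:
                  type2 += 1   # Type 2 (false negative): a true variant the VCF leaves out
          else:
              non_variants += 1
              if in_vcf:
                  type1 += 1   # Type 1 (false positive): the VCF claims a variant that isn't one
      return type1 / max(non_variants, 1), type2 / max(variants, 1)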

In creating a VCF, the filtering is designed to eliminate as many Type 1 errors as possible, so that the variants you are given are almost surely true variants. The Raw VCF only had 0.13% Type 1 errors, and the filtering reduced this to a very small 0.08%.

Type 1 and Type 2 errors work against each other. Doing anything to decrease the number of Type 1 errors will increase the number of Type 2 errors, and vice versa.

The Raw VCF file turns out to have only 0.06% Type 2 errors, quite an acceptable percentage. But the filtering increases this to a whopping 0.76%.

This value of 0.76% represents the number of true variants that are left out of the filtered VCF file. This is what causes the problem with using the filtered VCF file to produce a raw data file. When the SNPs that are not in the filtered VCF file are replaced by reference values, they will be wrong. These extra errors are enough to cause some matching segments to no longer match, and a comparison of a person’s raw DNA with his raw DNA generated from a filtered VCF file will not match well enough.

If instead the Raw VCF file is used, the Type 2 errors are considerably reduced, and the Type 1 errors are only slightly increased, staying well under worrisome levels.

Since there are approximately the same number of variants as non-variants among our SNPs, the two error rates can be averaged to give you an idea of the percentage of SNPs expected to have an erroneous value. Using the Raw VCF instead of the filtered VCF reduces the overall error rate from 0.42% to 0.09%, a 79% reduction in errors.

This could be reduced a tiny bit more. If the Raw VCF non-variants are all marked as no-calls, and then the Filtered VCF non-variants are replaced by the reference values, then 20 of the 55 Type 1 Errors in my example above, instead of being wrong, will be marked as no-calls. No-calls are not really correct, but they aren’t wrong either. For the sake of reducing the average error rate from 0.09% to 0.07%, it’s likely not worth the extra effort of processing both VCF files.


Conclusion

Taking all of the above into account, my final suggested procedure to create a raw data file from a VCF file uses only the Raw VCF file and not the filtered VCF file, as follows:

  1. Make a list of all the SNPs you want raw data for.
  2. Initially assign them all the human genome reference values. Note: none of the VCF files give all of these to you, so you need to set this up initially. Wilhelm HO has a good set of them included with his DNA Kit Studio.
  3. Mark the positions of the variants in your Raw VCF with the value of that variant. These will overwrite the reference values assigned in step 2.

Voila! So from a Raw VCF file, use this procedure. Do not use a filtered VCF file.
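
For anyone who wants to script those three steps, here is a minimal sketch in Python. The file names, the layout of the reference-value list, and the assumption that the Raw VCF carries an AC value in its INFO column (as mine does) are all my own, so adjust to your files:

  # Sketch of the 3-step procedure, under assumed (hypothetical) file layouts:
  #   snp_reference.txt - tab-separated: rsid, chromosome, position, reference genotype
  #   raw.vcf           - the unfiltered ("Raw") VCF from the WGS provider

  # Steps 1 and 2: the list of wanted SNPs, each initialized to its reference genotype
  raw_data = {}                             # (chrom, pos) -> [rsid, genotype]
  with open("snp_reference.txt") as f:
      for line in f:
          rsid, chrom, pos, ref_gt = line.rstrip("\n").split("\t")
          raw_data[(chrom, pos)] = [rsid, ref_gt]

  # Step 3: overwrite with the variant genotypes found in the Raw VCF
  with open("raw.vcf") as f:
      for line in f:
          if line.startswith("#"):
              continue                      # skip VCF header lines
          fields = line.rstrip("\n").split("\t")
          chrom, pos, ref, alt, info = fields[0], fields[1], fields[3], fields[4], fields[7]
          key = (chrom, pos)
          if key not in raw_data or len(ref) != 1 or len(alt) != 1:
              continue                      # keep to wanted SNPs; skip indels and multi-allelic records
          tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
          if tags.get("AC") == "1":
              raw_data[key][1] = ref + alt  # heterozygous
          elif tags.get("AC") == "2":
              raw_data[key][1] = alt + alt  # homozygous alternate
          else:
              raw_data[key][1] = "--"       # ambiguous record: mark it as a no-call

  # Write the result in a 23andMe-style layout: rsid, chromosome, position, genotype
  with open("raw_data_from_vcf.txt", "w") as out:
      for (chrom, pos), (rsid, genotype) in raw_data.items():
          out.write(f"{rsid}\t{chrom}\t{pos}\t{genotype}\n")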

If you have a BAM file, use WGS Extract from yesterday’s post.




Update: May 14: Ann Turner pointed out to me (in relation to my “Aside” above) that in addition to the AC (allele count) field, the GT (genotype) field could supply the information to correctly identify what the variant is. Unfortunately, the gVCF file Dante supplied me with has missing values for that field.
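
For reference, when the GT field is populated it appears in the sample column as something like 0/1 (heterozygous) or 1/1 (homozygous alternate), and for a biallelic SNP it can be interpreted along these lines (a small sketch of mine):

  def genotype_from_gt(ref, alt, gt):
      """Interpret a VCF GT value such as '0/1' or '1|1' for a biallelic SNP.
      Returns a two-letter genotype, or None when the value is missing ('./.')."""
      alleles = gt.replace("|", "/").split("/")
      if "." in alleles:
          return None                      # missing genotype, as in my Dante gVCF
      lookup = {"0": ref, "1": alt}
      return "".join(lookup[a] for a in alleles)

  print(genotype_from_gt("A", "C", "0/1"))   # AC
  print(genotype_from_gt("A", "C", "1/1"))   # CC
  print(genotype_from_gt("A", "C", "./."))   # None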

I’ve looked at all the other fields in my gVCF file. Entries that leave out the BaseQRankSum and ClippingRankSum fields often indicate a homozygous variant, but I’ve found several thousand SNPs among the variants that are exceptions, which is too many to use this as a "rule".

Wilhelm HO is working on implementing the sort of procedure I suggest in his DNA Kit Studio. It likely will be included when he releases Version 2.4. His tool will then be able to produce a raw data file from a VCF file, and will also extract an mtDNA file for you that you can upload to James Lick’s site for mtDNA haplogroup analysis.

Creating a Raw Data File from a WGS BAM file - Sun, 12 May 2019

In my last post, I was wondering if I could create a raw data file from my Whole Genome Sequencing (WGS) results that could be uploaded to GEDmatch or to a DNA testing company. I was trying to use one of the Variant Call Format (VCF) files. Those only include the positions where you vary from the human reference, so logically you would think that all the locations not listed must have human reference values. But that was giving less than adequate results.

While I was exploring that, a beta of a program called WGS Extract was announced. It works in Windows, and you can get it here.

image

This is not a program for the fainthearted. The download is over 2 GB because it includes the reference genome in hg19 (Build 37) and hg38 (Build 38) formats. It also includes a Windows version of samtools, which it runs in the background, as well as the full Python language.

I was so overwhelmed by everything it brought that I had to ask the author how to run the program. I was embarrassed to find out that all I had to do was run the “start.bat” file in the main directory of the download, which opens a command window that automatically starts the program for you, bringing up the screen I show above.

WGS Extract has a few interesting functions, but let me talk here about the one labeled “Autosomes and X chromosome” with the button “Generate file in 23andmeV3 format”. I selected my BAM (Binary Sequence Alignment Map) file, a 110 GB file I received by mail on a 500 GB hard drive (with some other files) from Dante. I pressed the Generate file button, and presto, 1 hour and 4 minutes later, a raw data file in 23andMe v3 format was generated, as well as a zipped (compressed) version of the same file.

This was perfect for me. I had already tested at 5 companies, and had downloads of FTDNA, MyHeritage, Ancestry, Living DNA and 23andMe v5 raw data files. I had previously combined these 5 files into what I call my All 5 file.

The file WGS Extract produced had 959,368 SNPs in it. That’s a higher number of SNPs than most chips produce, and since it was based on the 23andMe v3 chip, I knew there should be quite a few SNPs in it that hadn’t been tested by my other 5 companies.

You know me. I did some analysis:

image

The overlap (i.e. SNPs in common) varied from a high of 693,729 with my MyHeritage test, to a low of 183,165 with Living DNA. These are excellent overlap numbers – a bit of everything.

Each test had a number of no-calls, so I compared all the other values with what WGS Extract gave me, and there was 98.1% agreement. That’s about a 2% disagreement that lies either in the chip tests or in the WGS test, but from this I cannot tell whether it’s the chips or the WGS that have the incorrect values. In each case, though, one of them does.

When I compare this file to my All 5 file, which has 1,389,750 SNPs in it, I see that there are an extra 211,747 SNPs in my WGS file. That means I’ll be able to create a new file, an All 6 file, that will have 1,601,497 SNPs in it.

More SNPs don’t mean more matches. In fact they usually mean fewer matches, but better matches. The matches that are more likely to be false are the ones that get excluded.

In addition to including the new SNPs, I also wanted to compare the 747,621 SNPs that the WGS file has in common with my All 5 file and update them where needed. As noted in the above table, I had 2,305 SNPs whose values disagreed, so I changed them to no-calls. A no-call is the same as an unknown value, and for matching purposes is always considered to be a match. Having more no-calls will make you more “matchy”, and like having less overlap, you’ll have more false matches. The new SNPs added included another 905 no-calls. But then, of the 20,329 no-calls I had in my All 5 file, the WGS test had values for 9,993 of them. (I sketch these merging rules in code a little further below.)

So my number of no-calls became:

20,329 + 2,305 + 905 - 9,993 = 13,546, a reduction of 6,783.

I started with 20,329 no-calls in 1,389,750 SNPs (1.5%),
and reduced that to 13,546 no-calls in 1,601,497 SNPs (0.8%).
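
Here is the sketch of those merging rules I mentioned above, in Python. It treats each kit as a dictionary keyed by (chromosome, position), uses “--” for a no-call the way 23andMe-style files do, and ignores strand flips and allele ordering; the function name is mine:

  def merge_kits(base, new):
      """Merge two raw-data dicts of (chromosome, position) -> genotype.
      Agreement keeps the value, a real value fills in a no-call or a new position,
      and a disagreement between the two kits becomes a no-call ("--")."""
      merged = dict(base)
      for key, genotype in new.items():
          existing = merged.get(key)
          if existing is None or existing == "--":
              merged[key] = genotype        # new SNP, or fills in a previous no-call
          elif genotype != "--" and genotype != existing:
              merged[key] = "--"            # conflicting values: mark as a no-call
          # otherwise the kits agree, or the new value is a no-call; keep what we have
      return merged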

A few days ago, I was wondering how much work it would take to get raw data for the SNPs needed for genealogical purposes out of my WGS test. A few days later, with this great program, it turns out to be no work at all. (It probably was a lot of work for the author, though.)

I have uploaded both the 23andMe v3 file and my new All 6 file to GEDmatch to see how each does at matching. I’ve marked both as research. But I expect that once the matching process is completed, I’ll make my All 6 file my main file and relegate my All 5 file back to research mode.

Here are the stats at GEDmatch for those who know what these are:

WGS Extract SNPs: original 959,368; usable 888,234; slimmed 617,355
All 5 SNPs: original 1,389,750; usable 1,128,146; slimmed 813,196
All 6 SNPs: original 1,601,497; usable 1,332,260; slimmed 951,871