Louis Kessler’s Behold Blog

New MyHeritage DNA Filtering System - Thu, 28 Jun 2018

Today, MyHeritage announced a new feature – a brand new filtering system for DNA matches. They describe it here on the MyHeritage Blog. They say they are rolling out the feature gradually, so you may not see it yet.

I can see the new features and I thought I’d give it a run-through.

image

They now show me having 1 close family (my uncle), 148 extended family (I don’t know how I’m related to any of them) which go up to what they say are 2nd to 5th cousins with a minimum of 53.3 cM, and then 5,711 distant relatives which are said to be at 3rd to 5th cousins and further (and I again don’t know how I’m related to any of them). The first listed distant cousin shares 77.7 cM. The 5,711th shares only 1 segment that is 12 cM.

They give me locations where my DNA matches live:

image

Other than Israel in 3rd spot, all the other countries you see above are listed in almost the same order as the picture shown on MyHeritage’s blog post. That seems to indicate to me that it may be more the distribution of MyHeritage’s test takers, and my matches follow that (with Israel thrown in). I have to go way down the list to get to Ukraine (5) and Romania (1), which is where my ancestors actually come from.

My ethnicities are the same as previously. Ashkenazi 83.8%, North Africa 5.8%, South Europe (Iberia) 4.5%, East Europe 3.8%, Middle East 1.1% and Eskimo/Inuit 1.0% – yes, I’ve still got that Inuit in me it appears. It’s cold in Winnipeg in the winter, brrr.  I only consider the Ashkenazi and East Europe correct as I should be close to 100% Ashkenazi. 23andMe has me at 99.2% and Ancestry DNA’s latest update put me at 98%.

But what’s new is they give the ethnicities of my matches, with my ethnicity percentages to the left for comparison:

image

Why I have 671 matches with North and West European people and 735 matches with Irish, Scottish and Welsh people is beyond me. But I only have 5,860 matches in total and the matches shown there total way more than that, so some of my matches must count as more than one ethnicity. The 5,476 Ashkenazi out of my 5,860 does give 93.4% and that’s not too bad.

So what’s their new filtering system? Well, it is based on these charts. You can click on any item, and it will take you to a page where it will show you only the people who correspond to the item. So you can click on the first chart to get either your close family, extended family, or distant relatives. You can click on a line in the second chart to get your matches who live in a certain country. And you can click on a line in the third chart to get your matches who have a certain ethnicity.

I’ve got 5.8% North African ethnicity according to MyHeritage. If you go way down the ethnicity match list, you’ll get to the 10 people with North African Ethnicity that I match to. I click on that and bring up the list and see that they would all be classified as distant relatives as the closest is only said to be a 3rd to 5th cousin. When I review the 2nd of the 10 matches, it is a person with a Jewish name living in Netherlands, and all his ancestral surnames are Jewish, so I wonder how he became a North African. When I compare myself with him, I can see the ethnicities and I see this:

image

Well, MyHeritage says he’s got 13.1% North African ethnicity, so I guess that’s enough for him to be a North African match to me. He is a match for me for Ashkenazi and for Yemenite Jewish, but not for Middle Eastern which is 6.1%. So MyHeritage must have used something like 10% as the minimum ethnicity a match must have to be considered an ethnic match. 
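
Here is a minimal Python sketch of how a threshold like that could decide which ethnicity lists a match appears in. It is only an illustration of the guess above, not MyHeritage's actual code: the function name and the 10% default are assumptions, and the Ashkenazi and Yemenite Jewish percentages are placeholders (only the 13.1% and 6.1% figures come from the comparison above).

```python
def ethnic_match_labels(percentages, minimum=10.0):
    """Return the ethnicities a match would be listed under (assumed rule)."""
    return sorted(name for name, pct in percentages.items() if pct >= minimum)


match = {
    "North African": 13.1,     # from the comparison above
    "Middle Eastern": 6.1,     # from the comparison above
    "Ashkenazi Jewish": 50.0,  # placeholder value, not from the post
    "Yemenite Jewish": 12.0,   # placeholder value, not from the post
}

print(ethnic_match_labels(match))
# ['Ashkenazi Jewish', 'North African', 'Yemenite Jewish']
```

Under a rule like this, one match is counted once under every ethnicity that reaches the threshold, which would explain why the per-ethnicity counts add up to more than the total number of matches.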

And what about my 1% Eskimo/Inuit?  Nope. Nobody that matches me has at least 10% Eskimo/Inuit in them.

MyHeritage has also added these 3 filters (relationships, locations, ethnicities) to their DNA Matches page.

In my case, these 3 filters don’t help me much, but they may do better for you.

The filter on their DNA Matches page that is potentially the most useful to me is their “Has shared surname” filter which is under the “All tree details” dropdown:

image

Using this filter cuts down my list to just 46 matches. I manually checked all of them and cannot find a surname/place connection.

The “Has Smart Matches” filter could be useful for me one day. I do have my DNA and my uncle’s DNA connected to my tree at MyHeritage, but it has not found any DNA SmartMatches for me yet.

A filter that MyHeritage does not have but would be appreciated would be an ancestral birthplace filter, matching the towns my ancestors were born in with the towns the ancestors of my matches were born in. Surnames were only adopted by the Jewish people in Eastern Europe in the early 1800’s so connecting with 5th cousins or more by surname is usually not possible for us. But connecting by birth town, when our DNA has a match, might have more promise.
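
To make the idea concrete, a birthplace filter could be as simple as intersecting two sets of town names. The Python sketch below is hypothetical: the function names and the town names are invented for illustration, and it only normalises case and spacing, whereas real place matching would also have to cope with spelling variants and renamed towns.

```python
def normalise(place):
    """Lower-case a place name and collapse repeated whitespace."""
    return " ".join(place.lower().split())


def shared_birthplaces(my_places, match_places):
    """Return the towns that appear in both ancestor birthplace lists."""
    mine = {normalise(p) for p in my_places}
    theirs = {normalise(p) for p in match_places}
    return sorted(mine & theirs)


# Invented example data, not real ancestral towns.
my_towns = ["Town A, Ukraine", "Town B, Romania"]
match_towns = ["town a,  ukraine", "Town C, Poland"]

print(shared_birthplaces(my_towns, match_towns))
# ['town a, ukraine']
```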

Followup: June 29.  Only a few hours after I published my post, I received a very nice unexpected email from Gilad Japhet, Founder & CEO of MyHeritage, with his comments along with some clarifications and explanations for me.

He explained that my speculation that the countries of my matches reflect the distribution of MyHeritage DNA test takers is incorrect. He said that about half of MyHeritage’s test takers are living in European countries. It is more likely that the example on their blog post was of a Jewish person which is why my countries almost match theirs. And he is correct in saying that I have few matches from Ukraine and Romania because very few Jews still live there, with most of the descendants now being in the USA, Israel, etc., like me in Canada.

Gilad pointed out that my speculation that 10% is used as a minimum ethnicity match and that people will be listed under multiple ethnicities is stated right in the MyHeritage blog post. Whoops. In the future I’ll try to read announcements more carefully so that there would be no need for me to speculate (bad on me!)

He did admit that my 1% Inuit might be a false positive.

Thank you Gilad for those clarifications. Please continue to innovate. Everybody benefits.

Another attempt with Genome Mate Pro - Sat, 9 Jun 2018

Genome Mate Pro (GMP) is a program written by Rebecca Walker and is designed to organize all your DNA matches and match information. This is not a simple program to use, but it supposedly does a lot for you.

image

I have DNA tested now at FamilyTreeDNA (autosomal, BigY-500, mt-full), 23andMe, MyHeritage DNA and Ancestry DNA. I am awaiting my test kit from Living DNA. I have also tested my uncle at FamilyTreeDNA (autosomal, Y-111, mt) and transferred his raw data to MyHeritage DNA. I have also transferred both my uncle’s and my raw data to GEDmatch, and my 23andMe raw data to GEDmatch Genesis.

I currently keep all my matches organized in spreadsheets. This works okay, and I can do lots of analysis with it, sorting and summarizing and adding notes. But this makes it difficult to then periodically add new matches to the spreadsheet.

GMP is supposed to be an alternative to this. I’m interested in seeing what organizational help it could give, and what information it gives for in-common-with groupings, triangulations, and chromosome mapping.

I’ve tried GMP twice before, but it is quite complex to set up and use. I ran into problems both times and at some point aborted each attempt. Now I’m a programmer and even I was having problems, so don’t worry, it’s not just you.

Leah Larkin on her The DNA Geek blog has started a multi-part Tutorial on Getting Started with Genome Mate Pro.  I’m going to try to go through all Leah’s steps and will let you know here in this blog post how it goes.

Before starting, I’m going to uninstall the earlier version 2016r09a of GMP that I’ve got on my Windows computer. Well the nice thing is that the program uninstalled in about a tenth of a second. It did not remove the data files in the Genome Mate Pro directory in my Documents folder, so I did that myself.

Then let’s get started. Click on each title to see Leah’s blog post with the steps.

 

Part 1 – Install the Program

I downloaded the GMP User Guide V2018-05-28.pdf file. It’s 13.6 MB and 300 pages long.  I feel for Rebecca Walker. Just writing the user guide must have taken her a year.

I downloaded the Windows GMP 64 Bit Setup.zip. It’s an 11.3 MB download. Leah has a Mac so for Windows people, she instructs to go to page 11 of the User Guide and follow those instructions, which I did.  The install was a standard Windows install and was clean and quick. The creation of the empty GMPDatabase.sqlite took about 5 seconds to create a 0 byte database.

The successful install message tells you that the program is complicated:
image

 

Part 2 — Set Up a Profile for a GEDmatch User

I created profiles for myself and my uncle. I added our GEDmatch keys.

So far so good.

 

Part 3 – Activate GEDmatch Import Templates

Before each import, Leah recommends backing up, so if you mess up, you can simply go to the previous version. She doesn’t explain how but the File menu allows you to Backup or Restore.  I tried a backup and it took GMP about 5 seconds to backup to a small 160 KB file, which would be simply the profile information that I added in Part 2. 

image

Next, we click on the “Import Data” menu item. Leah explains that nothing will happen since it is a brand new GMP database and no templates are “on”. Yikes! Things like that make a program confusing to use. But I won’t harp on these user interface items any more. I want to load some data.

I activate the GEDmatch Tier 1 and GEDmatch Regular data templates and Leah doesn’t describe the “next notice” which brings up the following window which we are to click OK to:

image

Hmm. It says to use the “Chrome Browser”. That’s likely important and a pain to people who use Edge, IE or Firefox. I’m an Edge user myself whenever I can be but I have the other browsers installed for when I need them.

 

Part 4 – App Settings

Leah explains how to set the settings. I will leave them all default for now. I notice that under the criteria for Data Imports, the “Add Possible False Positives” was not checked in my settings as Leah showed it. I tried clicking on the “Reset Defaults” button and that setting became checked.

I did notice one option:  “Use In-Memory Journaling for Performance”. When I hover over it, in the status bar line, it shows: “Journal database entries in-memory to improve performance. Backup before using!”.
image

One of my problems I had with GMP when I tried it before was that the performance was so bad on my files. So I am checking this option. When I do so it gives this message window:

image

I clicked Yes which is the default (you can tell because Yes is highlighted). This was followed by another window:

image

And I clicked Yes again.

Leah changed her Min cM for Chromosome Browser Display to 15 cM, but I’m going to leave mine at the 7 cM default for now, so that I’ll be able to see all the segments that GMP imports.

 

Part 5: First GEDmatch Imports

To avoid possible problems, I started GEDmatch using Google Chrome rather than Microsoft Edge which I usually use. It took about 3 minutes to load my list of 2,000 matching relatives:
image 

Inspecting the log file, which is named GMP_Import_logfile.csv and loads into Excel when you click “Show Log File”, I can see the GEDmatch IDs of the 7 people with blank names that GEDmatch did not add. It’s too bad those aren’t added, maybe by using the GEDmatch ID as the name. They have valid data and an email address that I can contact if any of them have worthwhile matches to follow up on. So I saved the log file so that I’ll have those IDs in case I want to manually load them later (if Leah shows how).

I’m not sure why the log report above shows “Updated  1” since Leah’s example doesn’t show it and the log file does not say anything about any updated records.

I opened the Relative List for GEDmatch as Leah suggested. It doesn’t seem to show anywhere the number of people in the list which would be useful so that I could check that 1,993 people have been loaded.

The “Side” filter has a few interesting selections:

image

So that told me what I can set the Sides to, i.e.: Maternal, Paternal, Both, Unknown (which they’re all defaulted to), and then Maternal GrandFather/Mother and Paternal GrandFather/Mother. I added the Side (P or M) for the 3 people who I know my relationship to and added a MRCA note indicating who our MRCA’s are using a notation I am planning to use in DMT. 

Other notes:  The sort by Name seems to only sort by first name because the names are as given in GEDmatch which is first name, last name. It would be nice if it could attempt to pick out surnames for you, which would allow sorting by the surname.
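
As a rough illustration of what picking out surnames could look like, here is a short Python sketch that sorts "first name last name" strings by their last word. The function name and the sample names are made up, and the last-word rule is only a heuristic: it will misfile multi-word surnames and the aliases that many GEDmatch users enter instead of real names.

```python
def surname_key(full_name):
    """Sort key that uses the last word of the name as the surname (heuristic)."""
    parts = full_name.split()
    surname = parts[-1] if parts else ""
    # Fall back to the full name so people sharing a surname sort consistently.
    return (surname.casefold(), full_name.casefold())


# Invented names for illustration only.
names = ["Zelda Abrams", "adam Cohen", "Miriam Berg"]

print(sorted(names, key=surname_key))
# ['Zelda Abrams', 'Miriam Berg', 'adam Cohen']
```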

There seems to be one dummy line at the bottom of the match list:
image

So now for the GEDmatch Tier 1 Matching Segment Search. I follow Leah’s steps. I found out one thing: if you don’t select the correct import profile, the import will still run and not tell you that it’s the wrong one. You’ll get a pop-up box at the end that looks like a normal completion box. If you don’t read it carefully, you may not realize that nothing got processed until you look for the data that was loaded and find that it isn’t there. It would be much better if GMP issued a warning that this might not be the right profile for the input data if it detects few correct lines.

But after that mistake, I ran the right profile and my segment matches loaded:
image

The 2nd phase of the import matching segment search gives a status line saying: “Triangulations Processed xxx of xxxx.”  I wonder what GMP means here by “Triangulations” because it is impossible to compute triangulations from the segment match data that was just copied to GMP. Those are my matches to other people. There is no information in there to say that any other two people match each other on the same segment. Is GMP misusing the word triangulation? I’ll ask Leah this on her blog post.
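
The point can be shown with a small Python sketch. It assumes a segment is just a (chromosome, start, end) tuple with made-up positions, and the function names are mine, not GMP's or GEDmatch's. Two of my matches overlapping on my own chromosome is only a possible triangulation; it takes the third comparison, between the two matches themselves, to confirm it.

```python
def overlap(seg1, seg2):
    """Return the shared (chromosome, start, end) of two segments, or None."""
    chrom1, start1, end1 = seg1
    chrom2, start2, end2 = seg2
    if chrom1 != chrom2:
        return None
    start, end = max(start1, start2), min(end1, end2)
    return (chrom1, start, end) if start < end else None


def triangulates(me_b, me_c, b_c):
    """True only if all three pairwise segments share a common region."""
    shared = overlap(me_b, me_c)
    return shared is not None and overlap(shared, b_c) is not None


# Made-up positions for illustration.
me_b = (1, 100, 500)   # my match with person B
me_c = (1, 300, 700)   # my match with person C
b_c = (1, 250, 600)    # B's match with C, which my own match list cannot supply

print(overlap(me_b, me_c))           # (1, 300, 500)
print(triangulates(me_b, me_c, b_c))  # True
```

A real tool would also apply a minimum segment size in cM, which this sketch ignores.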

So now I’ve got segment match data into GMP. I went back and did the same for my uncle’s profile.

Then I tried to set the Side and MRCA notes for my uncle. However, I found GMP uses the same MRCA note for myself and my uncle. My method of denoting the MRCA was not by the ancestor’s name, but it was by the path to the ancestor (some people might want to use ahnentafel number). And the path (or number) is different for every profile person. But GMP does not allow different MRCA notes for different profile persons, which, in my opinion, it should.
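
To see why the note has to differ per profile, here is a minimal Python sketch that turns a path into an ahnentafel number. The path notation is an assumption for this example only (a string of F for father and M for mother, read from the profile person upward), and the function name is made up. The numbering rule itself is the standard one: a person's father is 2n and mother is 2n+1.

```python
def ahnentafel_from_path(path):
    """Convert a path such as 'FM' (father's mother) to an ahnentafel number."""
    number = 1  # the profile person
    for step in path.upper():
        if step == "F":
            number = 2 * number        # father
        elif step == "M":
            number = 2 * number + 1    # mother
        else:
            raise ValueError(f"unexpected step {step!r} in path {path!r}")
    return number


# The same ancestor seen from two profiles: my father's father is
# my paternal uncle's father.
print(ahnentafel_from_path("FF"))  # 4 from my profile
print(ahnentafel_from_path("F"))   # 2 from my uncle's profile
```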

I backed up the database. It is now 5.2 MB.

 

Part 6 – The Chromosome Browser

In this episode, Leah gets us to explore the Chromosomes Tab. The first thing I notice is that I looked for the Source selection which for the Relative List is on the left:
image

but for the Chromosome Browser it is on the right:
image

Switching sides is not good UI and makes things more difficult. GMP should make these consistent. Similarly the Search box is on the right in the Relative List and on the left in the Chromosome Browser.

And the Chromosome Browser allows you to select All Sources. But the Relative List does not. Why not? I do understand that the same person may use different names at the different testing companies, and the different companies don’t provide the same fields of information, but it still would be good to be able to compare the relatives you match to across companies.

And if you click on the Relative List tab (or any other tab), it does not highlight and there is no “Relative List” title in the screen view. There is no way to tell that it is selected if you’re not familiar with the program. It’s little things like this that GMP needs to fix because these are what make a program confusing and hard for new users to learn and understand.

But I must say there are some nice things about GMP and its Chromosome Browser. It’s nice being able to search surnames and show only the segments that match people with those surnames. When you do that, it does the search for all profiles and all chromosomes. You can click on the person’s name and bring up their details and when you go back it highlights all that person’s segments with a different color. But I’m getting ahead of Leah, so back to her instructions.

Step 6:  Assign a side to some of your segment matches. Well, I can assign my uncle’s matches for now to P (paternal), although there may be a few that are maternal due to endogamy which I can find and fix later. I have about 60 matches with my uncle (again, there is no count in a status line which would be nice) and I don’t see any way, and Leah doesn’t tell me how, to change them all to P at once. If I entered P and pressed return, I would hope to go to the next entry for editing so I could do it quickly, but the arrow keys and tab don’t advance. I can’t find a way to edit without clicking with the mouse (F2 would be nice). So it’s mouse select, then keyboard “P”, enter, and back to the mouse, for 60 entries. Hopefully there’s an easier way that either Leah or the help manual will tell us about.

I went through all of Leah’s steps. There really is some amazing functionality in Genome Mate Pro and I’m looking forward to seeing what else it can do.

Rebecca Walker is obviously a talented programmer to have put this all into a program that uses an SQLite database on either a Mac or a PC. There are thousands of hours of her work embedded into GMP. Yet there is so much she could do to improve the user interface and make the program much easier and more obvious to use.

 

Part 7 – More GEDmatch Imports

I didn’t have any parents or brothers or sisters tested, so I did a one-to-one compare with my uncle and GMP simply updated the already loaded matches with my uncle. It took about 30 seconds to update this, which seemed like a very long time for a simple file of 61 lines.

Leah said in her writeup: “You might also use it if you want to see quickly whether two of your matches also match one another (that is, whether they triangulate with you), in which case you’d import a one-to-one between their two kit numbers.”

I left a question to Leah on her blog post asking what do we do with that in GMP. We can’t add them to our own profile since they aren’t our own matches, or can we?

Then I followed the X Chromosome procedure. Leah said I might have fewer than 10 X matches greater than 10 cM since I’m male. I have 434.  I selected the first 50 (you have to select the check box of each individually) and did a 2-D Chromosome Browse. I imported those into GMP. It took about 3 minutes and added 150 new segment matches including 45 new X matches. I then did the same (just the first 50 out of his 785) for my Uncle. Theoretically, I would want to do all the matches and maybe I will later but this is good enough for now to see how it works.

 

Part 8 – The Relative Detail Tab

As Leah says, this part is more of an introductory tour than an exercise. It would have been nice if we could have populated this a bit before looking at it. None-the-less, the Surnames in Common and Possible Connections look like they may provide useful information. Hopefully ancestral places will also be compared here, but I’ll have to wait until we load more than GEDmatch data to see.

 

Part 9 – Import a GEDCOM

While waiting for Leah to produce Part 9, I thought I’d load 2 other relatives, the only people other than my uncle who I know how I am related to: a third cousin on my father’s mother’s side, and a second cousin once removed on my mother’s side. I loaded their relative lists and segment match lists as in Parts 5 and 7. That now gives me 4 people I can work with for this lesson.

I backed up my GMP database (now 10 MB). Then in GMP, I clicked on the Ancestor’s tab and then on the “Load Gedcom (sic)” button. Up popped this box:

 image

Well that’s just plain wrong. First of all GEDCOM 5.5.1 is the de facto standard, and UTF-8 characters are illegal in GEDCOM 5.5. Also, it really bothers me to see GEDCOM written as “Gedcom” which is not correct. But let’s go on.
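
A GEDCOM file declares both its version and its character set in the header, so a program can check them rather than assume. The Python sketch below is my own illustration, not anything GMP does: the function name is invented and the sample header is made up. It relies only on the standard header layout, where the version sits at HEAD.GEDC.VERS and the character set at HEAD.CHAR.

```python
def gedcom_header_info(lines):
    """Return (GEDCOM version, character set) read from the header lines."""
    version = charset = None
    in_gedc = False
    for raw in lines:
        # Drop a possible byte-order mark, then split into level, tag, value.
        parts = raw.lstrip("\ufeff").strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "0" and tag != "HEAD":
            break  # first record after the header, stop reading
        if level == "1":
            in_gedc = tag == "GEDC"
            if tag == "CHAR":
                charset = value
        elif level == "2" and in_gedc and tag == "VERS":
            version = value
    return version, charset


# Made-up header for illustration.
sample = [
    "0 HEAD",
    "1 SOUR EXAMPLE",
    "2 VERS 1.0",
    "1 GEDC",
    "2 VERS 5.5.1",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "0 TRLR",
]

print(gedcom_header_info(sample))  # ('5.5.1', 'UTF-8')
```

Note that the `2 VERS 1.0` line under `SOUR` is the version of the exporting program, not of GEDCOM, which is why the sketch only accepts `VERS` directly under `GEDC`.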

I tried a GEDCOM 5.5.1 file that I use for testing and it did take it. I then matched myself, my uncle, and my paternal cousin and my maternal cousin to the correct person in the GEDCOM and got GMP to load the ancestors for each.

Now for each person, all ancestors are shown. Here for example are mine, excluding myself and my parents who are numbers 1, 2 and 3:

image

This is actually rather neat. The Ahnentafel number is on the left. Selecting Maternal, Paternal or X List at the bottom shows only those ancestors.
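
The ahnentafel numbers themselves are enough to work out such lists. The Python sketch below shows one way to do it; it is my illustration and not a description of how GMP builds its lists, and the function names are made up. It uses the standard numbering (father 2n, mother 2n+1, so even numbers are men and odd numbers above 1 are women) and the fact that a man gets his X chromosome only from his mother while a woman gets one from each parent.

```python
def side(number):
    """Return 'P' or 'M' for an ancestor's ahnentafel number (2 or higher)."""
    if number < 2:
        raise ValueError("the profile person has no side")
    while number > 3:
        number //= 2  # step down to the child until we reach a parent
    return "P" if number == 2 else "M"


def x_ancestors(generations, profile_is_male=True):
    """List ahnentafel numbers of ancestors who could have passed down X DNA."""
    current, result = [1], []
    for _ in range(generations):
        next_generation = []
        for n in current:
            is_male = profile_is_male if n == 1 else n % 2 == 0
            if not is_male:
                next_generation.append(2 * n)   # a woman's father
            next_generation.append(2 * n + 1)   # anyone's mother
        result.extend(next_generation)
        current = next_generation
    return result


print(side(5), side(13))  # P M
print(x_ancestors(3))     # [3, 6, 7, 13, 14, 15]
```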

Setting the profile to my uncle or my cousins, I see just their ancestors, and similarly can select just their maternal, paternal or X chromosome ancestors.

Now to try mapping segments. This looks similar to what DNA Painter does. Leah instructs us to go to the Chromosome tab, find a segment matching with my uncle and click on that segment, which opens the Relative Detail tab for my uncle and shows all the segments he has in common with me and also with my third cousin who is on my paternal side. You have to follow Leah’s instructions carefully since it is nowhere near obvious what to do: select one segment that has the correct profile name (my uncle), mark it paternal, set the male and female MRCA and save the segment. Then with a right click over the segment, several options come up allowing this to be copied to some or all of the other segments.

Leah gives an example about specifying the side (paternal or maternal), but it’s not at all easy to figure it out from the program. It seems like it must be from the point of view of the person who is specified in the “profile” column. It is very easy to get lost trying to set this up here. Even if you are trying to do this just from your own point of view, it’s hard to keep the views straight.

Ultimately, it does give me the segment map from any of the profile’s point of view. Here is mine, with my 3 relatives mapped:

image

 

Part 10 - Import GEDmatch Triangulation

Today Leah gets us to use the GEDmatch Tier 1 Triangulation report. I ran this for myself, my uncle and my 2 cousins (each run taking 15 minutes to a half hour) and imported them into GMP. This was interesting. I didn’t have to specify the profile since GMP must have been assigning each one to the three people in the triangulation if they were already loaded into GMP. There were not a lot of triangulations updated when I ran my uncle and my 2 cousins. Most were added. That indicates to me that GEDmatch’s Triangulation report gives you the triangulations it finds up to some point, but does not give you all of them.

Leah explains how you can quickly assign maternal or paternal sides if you’ve tested both your parents by showing possible triangulations and then selecting “Mark shown DNA segments” from the right-click menu. I don’t have either of my parents tested. But I was able to do this with my uncle and cousins. That does not give complete coverage as parents would, but it marked a good number of segments M or P for me.

The context sensitive right-click options in GMP are quite overwhelming. There is likely a way to do almost everything, but figuring out when to use what and how to do a specific task is a challenge.

 

Part 11 and Beyond

Leah has announced what the next parts will be: 

  • Part 11 — Using the DNAGedcom Client to Import FTDNA Data
  • Part 12 — Merging Duplicates
  • Part 13 — The Relative List Tab
  • Part 14 — A Few Tips
  • Part 15 — Using the DNAGedcom Client to Import 23andMe Data
  • Part 16 — Importing Ancestry Composition from 23andMe
  • Part 17 — Using the DNAGedcom Client to Import AncestryDNA Data
  • Part 18 — Assign Segments to an Ancestor
  • Part 19 — Creating Custom Templates for MyHeritage (and Others)
  • Part 20 — Customize Genome Mate Pro
  • Part 21 — Updating Your GMP Version

I’ll continue to add to this post as Leah adds new articles. If you’re interested in seeing how this works out for me, bookmark this page and check back. I’m pretty sure my endogamous match data will give GMP a good workout.

Advancing the GEDCOM Standard - Sat, 26 May 2018

For over 30 years, genealogy software has relied on the GEDCOM standard to transfer data between programs. There have been advances in the past years to allow programs to directly sync to online family trees, but that is nowhere near perfected and very few programs have the capability. So the fallback is still GEDCOM.

The big complaint about GEDCOM is that the data doesn’t all transfer between programs, especially source information. I’ve always maintained that GEDCOM has almost everything needed for all the information to transfer. The problem, I feel, is mostly on the developers’ side. In many cases, GEDCOM has been implemented improperly or incompletely. Often, constructs from older versions of GEDCOM are used which many programs don’t support, or their own user-defined tags are included which other programs do not understand.
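
User-defined tags are at least easy to spot, because GEDCOM requires them to begin with an underscore. As a hedged illustration, this Python sketch counts them in a file's lines so you can see how much of an export another program may not understand. The function name is invented and the sample lines, including the `_UID` and `_PRIM` tags, are only examples.

```python
from collections import Counter


def user_defined_tags(lines):
    """Count the tags that start with an underscore in GEDCOM lines."""
    counts = Counter()
    for raw in lines:
        parts = raw.strip().split()
        if len(parts) < 2:
            continue
        # Records look like "0 @I1@ INDI": skip the cross-reference id.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts


# Made-up lines for illustration.
sample_lines = [
    "0 @I1@ INDI",
    "1 NAME Jane /Doe/",
    "1 _UID 123",
    "1 BIRT",
    "2 _PRIM Y",
    "1 _UID 456",
    "0 TRLR",
]

print(dict(user_defined_tags(sample_lines)))  # {'_UID': 2, '_PRIM': 1}
```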

The last official version 5.5.1 was released in 1999. That’s almost 20 years ago. Surely all developers should have had enough time to change their GEDCOM input and output to match that standard by now, shouldn’t they? If so, wouldn’t most of our data transfers be fixed?

Well no, they haven’t changed. They didn’t know what they were doing wrong and didn’t know how to fix it. They have mostly let their programs do with GEDCOM what they always have done.

What developers need is a stiff kick in the pants. Maybe something new to promote improvement. They need an annotated version of GEDCOM that explains the nuances to them and a set of best practices that they could follow.

Genealogy technology expert Tamura Jones has done just that. Tamura has spent the past number of years reviewing various parts of GEDCOM through articles on his blog www.tamurajones.net. He has critically reviewed dozens of genealogy programs with respect to their functionality and GEDCOM adherence. Personally, I have very much appreciated his detailed reviews of my software Behold, which have allowed me to address my program’s problems and fix and improve them wherever I can. Other developers would benefit greatly by similarly listening to Tamura’s advice.

Tamura Jones has created the annotated version of GEDCOM that is so much needed. He released it only a few days ago. It is called The FamilySearch GEDCOM 5.5.1 Specification Annotated Edition (TFG551SAE). It is now available to all developers and anyone else interested.

image    image

The document incorporates the 101 original pages of the GEDCOM 5.5.1 specification and adds annotations that expand the document to 194 pages. The annotations explain and correct the original text. In doing so, obsolete features are clearly denoted, and deprecated features are marked along with the reason for deprecating them. Best practices are spread throughout the document with links to external articles that provide even more detail. If all developers followed these best practices, which are not obvious from the original GEDCOM text alone, then GEDCOM transfer between programs would improve immensely.

Tamura solicited the help of seven technical reviewers, myself included. All of us have had a great deal of experience with GEDCOM. The others were:

- Tim Forsythe, creator of Gigatrees and VGedX (GEDCOM validator)
- Diedrich Hesmer, creator of Our Family Book and GEDCOM Service Programs
- Andrew Hoyle, creator of Chronoplex My Family Tree and GEDCOM Validator
- Stanley Mitchell, creator of ezGED Viewer
- Nigel Munro Parker, creator of the GED-inline GEDCOM validator
- Keith Riggle, blogger & genealogy software reviewer

I had the great pleasure of spending a few days with Tamura in October 2014 in the beautiful city of Leiden, Netherlands, where he lives, for the Gaenovium Conference that he put on. I immensely enjoyed the opportunity to talk to him in person.

So what does this document bring to the table in terms of improving genealogy data transfer standards? The FHISO organization has been working for years on coming out with a new standard. They have been working mostly on concepts and vocabulary and haven’t had the time or manpower available to do much more. But one of their goals is to create an Extended Legacy Format (ELF) that is “fully compatible with current uses of GEDCOM 5.5.1”.

In my opinion, Tamura Jones has done the first part of the job for them. He has put the rules in place and laid out the foundation for the ELF that FHISO wants. It is my hope that FHISO takes this opportunity to incorporate the cooperative effort of 8 GEDCOM experts and strongly suggest to all genealogy software developers that they take heed of this document and review their products and make improvements to them as recommended. This alone would considerably improve data transfer between programs, even without a new standard.