I tried opening a GEDCOM file with umlauts in Behold 1.0.5.1. Instead of the umlauts, boxes were displayed. The file was UTF-8 with BOM. When I converted it to UTF-8 without BOM, everything was fine. Unicode (UTF-16) also worked fine.
Yes, I can replicate the problem. Thank you for pointing it out.
I did some debugging and I think I've found the mistake I made.
It was a bit dumb on my part. This was my checking code:
else if GedcomCharsetUsed = 'UTF-8' then begin
  CharConvert := ConvertUTF8; // runs even when the BOM is already UTF-8
  if (BOMDisplay = '') then
    LogIt('', '~MCHAR2')
  else if (BOMDisplay <> 'EF BB BF' { UTF-8 } ) then
    LogIt('', '~MCHAR3');
end
It should have been:
else if GedcomCharsetUsed = 'UTF-8' then begin
  // A UTF-8 BOM means the text is already correct, so skip the conversion
  if (BOMDisplay <> 'EF BB BF' { UTF-8 } ) then begin
    CharConvert := ConvertUTF8;
    if (BOMDisplay = '') then
      LogIt('', '~MCHAR2')
    else
      LogIt('', '~MCHAR3');
  end;
end
So I was checking if the BOM was the same as the CHAR specified in the GEDCOM and setting up the correct message, but I was converting the string in all cases. When the BOM matches the character set, the string is already correct and shouldn't be converted.
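Behold itself is written in Delphi and its reader isn't shown here, so the following is only a rough Python sketch of the logic described above, not Behold's actual code. It assumes that a UTF-8 BOM makes the reader decode the file up front, that without a BOM each byte arrives as one character, and that `ConvertUTF8` behaves like the hypothetical `convert_utf8` below. Under those assumptions it shows why converting an already-decoded string turns umlauts into U+FFFD replacement characters, which many fonts draw as boxes.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the bytes EF BB BF checked in the Delphi code


def convert_utf8(text: str) -> str:
    """Hypothetical stand-in for ConvertUTF8: treat each character as one
    byte (Latin-1) and decode those bytes as UTF-8."""
    return text.encode("latin-1", errors="replace").decode("utf-8", errors="replace")


def read_text(raw: bytes) -> tuple[str, bool]:
    """Assumed reader behaviour: with a UTF-8 BOM the reader decodes the
    text itself; without one, each byte becomes one character."""
    if raw.startswith(UTF8_BOM):
        return raw[len(UTF8_BOM):].decode("utf-8"), True
    return raw.decode("latin-1"), False


def load_buggy(raw: bytes, charset: str) -> str:
    text, _ = read_text(raw)
    # Old logic: convert whenever CHAR is UTF-8, whether or not there is a BOM.
    return convert_utf8(text) if charset == "UTF-8" else text


def load_fixed(raw: bytes, charset: str) -> str:
    text, had_utf8_bom = read_text(raw)
    # Fixed logic: convert only when no UTF-8 BOM has already done the job.
    if charset == "UTF-8" and not had_utf8_bom:
        return convert_utf8(text)
    return text


if __name__ == "__main__":
    body = "1 NAME Jürgen /Müller/".encode("utf-8")
    print(load_buggy(body, "UTF-8"))             # correct: no BOM, converted once
    print(load_buggy(UTF8_BOM + body, "UTF-8"))  # umlauts become U+FFFD
    print(load_fixed(UTF8_BOM + body, "UTF-8"))  # correct again
```

The point of the sketch is the order of decisions: look at the BOM first, and only run the UTF-8 conversion when the BOM has not already settled the encoding.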
I am surprised nobody else reported this. I guess not too many people use UTF-8 GEDCOMs.
Let me know if this is an important fix for you. If so, I can produce a point update. Otherwise, the fix will be in the next full version.
Posted: Thu, 5 Sep 2013
Hi Louis,
I tried opening a GEDCOM file with umlauts in Behold 1.0.5.1. Instead of the umlauts, boxes were displayed. The file was UTF-8 with BOM. When I converted it to UTF-8 without BOM, everything was fine. Unicode (UTF-16) also worked fine.
Can you replicate this?
Klemens
Posted: Thu, 5 Sep 2013
Klemens,
Yes, I can replicate the problem. Thank you for pointing it out.
I did some debugging and I think I've found the mistake I made.
It was a bit dumb on my part. This was my checking code:
It should have been:
So I was checking if the BOM was the same as the CHAR specified in the GEDCOM and setting up the correct message, but I was converting the string in all cases. When the BOM matches the character set, the string is already correct and shouldn't be converted.
I am surprised nobody else reported this. I guess not too many people use UTF-8 GEDCOMs.
Let me know if this is an important fix for you. If so, I can produce a point update. Otherwise, the fix will be in the next full version.
Louis
Posted: Fri, 6 Sep 2013
Louis,
Thanks for checking.
I can export Unicode GEDCOM files from my program, so there is no need to hurry with a new version on my account.
Klemens