I tried loading a 317 MB GEDCOM file (go to http://www.prpletr.com/Gedcoms.htm and download the Good, Engle, Hanks Family Gedcom). It ran out of memory during the ANSI to Unicode conversion. So I dissected that routine and did the ANSI to Unicode conversion 4 MB at a time. With that change, the file was read in and converted completely, and it took only 11 seconds.
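The chunked conversion could look something like the following Python sketch. It is not Behold's Delphi routine. The function name `iter_decoded_chunks` is hypothetical, and "ANSI" is assumed to mean Windows code page 1252. Each 4 MB block of raw bytes is decoded and handed to the caller before the next block is read, so the whole raw file and a whole intermediate buffer never have to exist at the same time.

```python
import codecs

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB per read, matching the chunk size described above


def iter_decoded_chunks(path, encoding="cp1252", chunk_size=CHUNK_SIZE, errors="strict"):
    """Yield the decoded text of an ANSI file one chunk at a time.

    Assumption: "ANSI" means code page 1252. Pass errors="replace" if the
    file may contain bytes that cp1252 leaves undefined.
    """
    # An incremental decoder carries incomplete multi-byte sequences over
    # from one chunk to the next (not needed for cp1252, but harmless).
    decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            text = decoder.decode(chunk)
            if text:
                yield text
    # Flush anything the decoder is still holding at end of file.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Because cp1252 is a single-byte encoding, a chunk boundary can never split a character. The incremental decoder only starts to matter if the same loop is reused for a multi-byte encoding such as UTF-8.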
But it then ran out of memory while loading the text into my data structures. Well, 320 MB of ANSI turns into 640 MB of Unicode. With a 2 GB address space, I should still be able to load 640 MB of character strings into my structures.
I’m still not sure why, but Delphi 2009 uses over 6 times the size of the original ANSI file to store its character strings. Doubling is expected, because Delphi 2009’s Unicode strings use 2 bytes per character, but six times seems excessive. I’ve posted a question about this on StackOverflow and asked a few experts as well.
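One plausible contributor, offered as an assumption rather than a diagnosis, is per-string overhead: every heap-allocated string carries a header, a terminator, and allocator rounding, and GEDCOM lines are short. The Python sketch below estimates that cost. The function names are hypothetical, and every size in it is an assumed default loosely modeled on a 32-bit reference-counted string, not a measured Delphi 2009 figure.

```python
def string_cost(chars, bytes_per_char=2, header=12, terminator=2,
                block_header=4, granularity=16, pointer=4):
    """Estimated bytes used to hold one string of `chars` characters.

    All defaults are assumptions: 2-byte characters, a 12-byte string
    header, a 2-byte null terminator, a 4-byte allocator block header,
    16-byte allocation granularity, and a 4-byte pointer to the string.
    """
    raw = block_header + header + chars * bytes_per_char + terminator
    allocated = -(-raw // granularity) * granularity  # round up to the allocator granularity
    return allocated + pointer  # plus the reference held by the owning structure


def overhead_ratio(chars):
    """Bytes used per byte of the original 1-byte-per-character ANSI text."""
    return string_cost(chars) / chars
```

Under these assumptions a 10-character value costs 52 bytes, about 5 times its ANSI size, while a 1,000-character string comes in just over 2 times. A file made of many short lines would therefore land well above the 2 times that the character width alone predicts.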
Barring an answer that cuts this overhead down, I’ve reached the physical limit of what Behold can handle in memory with 32-bit processing. I am a bit disappointed, since GenViewer loads this huge file in 11 seconds and uses only about 340 MB of memory doing so. I was hoping I could get Behold to do the same.
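A rough version of the arithmetic, treating MB loosely as 2^20 bytes, shows why a 6 times overhead hits the wall while plain doubling would not:

```latex
\begin{aligned}
2 \times 317\ \text{MB} &\approx 634\ \text{MB} && \text{(expected: 2 bytes per character)}\\
6 \times 317\ \text{MB} &\approx 1902\ \text{MB} && \text{(observed overhead)}\\
2^{31}\ \text{bytes} &= 2{,}147{,}483{,}648\ \text{bytes} = 2048\ \text{MB} && \text{(32-bit address space)}
\end{aligned}
```

About 1.9 GB of strings leaves little of the 2 GB for the program itself, the file buffer, and everything else that shares the address space.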
I’m still going to try memory-mapping the file, so that I might not have to read the whole file into memory at all. This may turn out to be too slow, especially since Delphi will have to convert the data to Unicode every time it accesses it, instead of just once. But it’s worth a shot.
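As a rough illustration of the memory-mapping idea, here is a Python sketch rather than Delphi. The class name `MappedGedcom` is hypothetical, cp1252 is assumed as the ANSI code page, and the file is assumed to be non-empty (an empty file cannot be mapped). Only an index of line start offsets (8 bytes per line) stays in memory; each line is decoded from the mapped bytes whenever it is requested, which is exactly where the repeated conversion cost would show up.

```python
import mmap
from array import array


class MappedGedcom:
    """Read-only view of a GEDCOM file through a memory map.

    Only the byte offset of each line start is kept in memory; the text
    of a line is decoded from ANSI (assumed cp1252) on every request.
    """

    def __init__(self, path, encoding="cp1252"):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._encoding = encoding
        self._starts = array("q", [0])  # byte offset where each line begins
        pos = self._map.find(b"\n")
        while pos != -1:
            self._starts.append(pos + 1)
            pos = self._map.find(b"\n", pos + 1)
        if self._starts[-1] == len(self._map):
            self._starts.pop()  # file ends with a newline: no extra empty line

    def __len__(self):
        return len(self._starts)

    def line(self, i):
        start = self._starts[i]
        end = self._starts[i + 1] if i + 1 < len(self._starts) else len(self._map)
        # The decode runs on every access, not once at load time.
        return self._map[start:end].decode(self._encoding).rstrip("\r\n")

    def close(self):
        self._map.close()
        self._file.close()
```

The trade-off is the one described above: memory use drops to the offset index plus whatever pages the operating system keeps mapped, but every access pays for a fresh conversion.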
Even so, the physical limit of 2 to the power of 31 bytes (2 GB) is closing in. Memory mapping might get me a bit further, but that limit will be the next one in the way, and getting past it will have to wait until Delphi’s 64-bit compiler is ready.
p.s. If you want some amusement, try that huge GEDCOM file in your favorite genealogy software and see if it can handle it. Or try loading it into a text editor. Or even just try downloading it!