Louis Kessler’s Behold Blog

Translating a Book at No Cost with Passable Results - Mon, 11 Apr 2022

Among the family materials I’ve been going through was a photocopied Memorial Book for the town of Mezhiritch in Ukraine. My mother received this pile of paper from her cousin many years ago. The Memorial Book was printed in Israel in 1955 and has 442 pages, which I have on about 250 sheets of paper.

This document is written almost entirely in Hebrew. I can read and pronounce Hebrew, but my vocabulary is so small that I cannot speak or understand it.


This particular book is listed in the JewishGen Yizkor Book collection, but only its Table of Contents has been translated.

So, obviously, I need to get the document translated. I wouldn’t mind paying a reasonable fee to a service that could do a good job of it, but I wondered if I could do it myself without too much work on my part.


Translating the Hardcopy Directly

I had heard over the past year that Google Translate on your phone does a pretty good job of translating words in a picture. You can use it on the road to translate street signs, menus, and other items when you are in another country, although you do need an Internet connection to do so.

I decided to pick a page from my book to try it out. The Hebrew is small and, as is typical for Hebrew literature, does not include the nikkud (vowel points). Each page has two columns and sections with headings. There are photos with captions under them. And there are the names of people, which is what all genealogists are looking for. Here’s a typical page:

[Image: a typical page from the Memorial Book]

I place the document in my Shotbox, which gives lots of light and lets me rest my phone directly on top so that there is no camera shake.

[Image: the page in the Shotbox]

Then, in Google Translate on my phone, I tap the little camera icon.

[Image: the camera icon in Google Translate]

This immediately takes me to Google Lens, which translates the text in place on the image it sees. I then tap “Share”, which lets me save the result, and I send it to my cloud service, OneDrive.

How good is the translation? Well, let’s see. This is what Google Lens saved to OneDrive for me:

[Image: File_20220411-132008, the translated page as saved by Google Lens]

Here’s a close-up of the text in the top right paragraph on the page:

[Image: close-up of the translated top right paragraph]

And here is the caption below the picture:

[Image: close-up of the translated caption]

It’s interesting how this looks like one of those ransom notes a criminal would make from newspaper clippings, with different-sized fonts mixed together, but the fact that it puts the translation in the correct place on the page is very appealing. It also shows you what it missed or couldn’t translate. You get the general drift of what is being said, but when you read it, it doesn’t really make much sense, especially the caption below the picture. To me, this translation is a fail.


Scanning and Optical Character Recognition (OCR)

Is that the best I can do?

To answer that question, I’ll have to compare this translation to other translations. There are a number of things I can try.

First, I’ll scan the 250 pages onto my computer with my new Epson ES-580W sheet-fed scanner. I got it a few weeks ago when, after 5 years of good use, my Epson DS-860 started putting small colored squares on my scans. Epson support deemed it a chip problem that required either repair or replacement. I’m not one to repair technology that is 5 years old.

My new scanner comes with Epson Scan Smart software, which can do OCR (Optical Character Recognition) in 29 different languages, including Hebrew, and save the scan as a searchable PDF file.

When a searchable PDF is viewed, many viewers let you select the text and copy it. For the top right paragraph and the caption below the picture on my test page, copying and pasting the text gives me this:

כנוכר נמצאה העיירה על ארמת סטצקי, וכל תושביה, כיהורים כגויים, שלמו לו ,,טשינשיי, הייגו רמי חכירה. גביית דמי החכירה הופקדה כנהוג בירי יהודים, שמזה מאות בשנים נורעו כחוכרים ומוכסגים וכנושא לפרקים שלמ,נt ונכבדים בדברי ימי היהורים, שבהם השתקפו יחסיו הטובים, ולעתים קרובות יותר הרעים והאכזריים, של ,,הפריץ" כלפ~ נתיגיו היהוריים בכלל וכלפי ה,,מושקה"ס שלו בפרט. חוכרי המס האחרונים היו ח י י ם " ב ר ו ך ~ ח י י מ ע נ י ס " , שהיה גובה מסים וממלא שרותי קשר ביו הארמיו ובין התושבים היהוריים. אחריו תפש את מקומו מ ש ה ב ר ו ב ש ט י י ו, שטיפל בגביית המס, ואילו בקנית היבולימ ומכירתם עסקו אפ ר י ם ז י נ מ ר ו ב ן ‘ צ י ו ן ש ט ר ג ש י ס . אחרי מותו הפתאומי של מ ש ה ב ר ו ב ש ט י י י ו מילאה את מקומו אשתו פאגי, שאף היא תלתה ונפלה למשכב כעבור זמן קצר

תעשיית ארגזים (קופרני1( . הנגרים בתמונה (מימיו) ׃ נתו שטרגשיס, משה פול~;יק, בריר פרומו

This Hebrew rendition is obviously not perfect. Parts of several lines include words that have spaces between their letters. These appear to be people’s names, which the book prints with extra space between the letters. But let’s see how this does.
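One way to tidy this up before pasting the text into a translator is to join the letter-spaced runs back into single words. Here is a minimal Python sketch of that idea, not a tested part of the procedure in this post. The pattern and the function name are hypothetical; it assumes the OCR text is in a Python string and treats any run of three or more single Hebrew letters separated by single spaces as one word.

```python
import re

# Hebrew letters, including the final forms, run from U+05D0 (alef) to U+05EA (tav).
HEBREW = "\u05D0-\u05EA"

# Three or more single Hebrew letters separated by single spaces,
# with no other Hebrew letter touching the run on either side.
LETTER_SPACED = re.compile(rf"(?<![{HEBREW}])(?:[{HEBREW}] ){{2,}}[{HEBREW}](?![{HEBREW}])")


def collapse_letter_spacing(text: str) -> str:
    """Rejoin letter-spaced runs such as 'ח י י ם' into 'חיים'."""
    return LETTER_SPACED.sub(lambda m: m.group(0).replace(" ", ""), text)


print(collapse_letter_spacing("היו ח י י ם שהיה"))
```

For a name like מ ש ה ב ר ו ב ש ט י י ן, this produces the single word משהברובשטיין. Because the OCR keeps only single spaces, the gap between the first and last name is lost, so the result still needs a human eye. It also won’t rejoin runs that the OCR broke with stray marks such as ~ or ‘.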

If I simply paste the Hebrew text above into Google Translate, it gives:

As a foreigner, the town was located on the Statsky estate, and all its inhabitants, as priests as Gentiles, paid him, Chinshey, Higo Rami Lease. The collection of the rent was entrusted to the firing of Jews, who for centuries had been shaken as tenants and mercenaries and as a subject for whole chapters, and honored in the annals of the Jews, which reflected the good, and often more evil and cruel, attitude of the "perpetrator" His s in particular. The last taxpayers were Haim "Baruch ~ Haim Menis", who was a tax collector and a liaison for bio-armies and the Jewish residents. After him, he took his place from the Rabbi, who took care of the collection of the tax, while the purchase of the hibulim and their sale were done by Ephraim. . After the sudden death of Moshe Bar-Shti Tiyu, he was replaced by his wife Faggi, who also hung up and fell asleep shortly afterwards.

Crate industry (Copernicus 1) . The Carpenters in the Picture (from his time) ׃ Natu Stergashis, Moshe Pol ~; Yak, Barir Promo

This is a bit better than the translation from Google Translate on my phone. The caption now gives what must be the names of the people in the picture.

Maybe the problem is that the Hebrew text is so small that 200 dpi is not enough resolution for the OCR to recognize the Hebrew letters correctly. If I increase the scanner’s resolution setting to 600 dpi, the Hebrew comes out as this:

כנזכר נמצאה העיירה על אדמת סטצקי, וכל תושביה, כיהודים כגויים, שלמו לו ~,טשינשיי~ היינו דמי חכירה. גביית דמי החכירה הופקדה כנהוג בידי יהודים, שמזה מאות בשגים נודעו כחוכרים ומוכסנים וכנושא לפרקים שלמים ונכבדים בדברי ימי היהודים, שבהם השתקפו יחסיו הטובים, ולעתים קרובות יותר הרעים והאכזריים, של ,,הפריץיי כלפי נתיניו היהודיים בכלל וכלפי ה,,מושקה"ס שלו בפרט. חוכרי המס האחרונים היו ח י י ם ~ ב ר ו ך i ~ח י י מ ע נ י ס יי , שהיה גובה מסים וממלא שרותי קשר בין הארמון ובין התושבים היהודיים. אחריו תפש את מקומו מ ש ה ב ר ו ב ש ט י ‘י ן , שטיפל בגביית המס, ואילו בקנית היבולים ומכירתם עסקו אפ ר י ם ז י נ מ ן ו ב ן ~ צ י ו ן ש ט ר נ ש י ס . אחרי מותו הפתאומי של מ ש ה ב ר ו ב ש ט י י י ן מילאה את מקומו אשתו ~אני, שאף היא חלתה ונפלה למשכב כעבור זמן קצר

תעשיית ארגזים .. (קופרטן) . הנגרים בתמונה (מינ~ן) ׃ נתן שטרנשיס, משה פולישוק, ברוך פרומן

It is a bit different, but some words still have spaces between their letters. The Google translation for this comes out as:

As mentioned, the town was found on Statzky land, and all its inhabitants, as Jews as Gentiles, paid him ~, Chinshi ~ we were rent. The collection of the rent was entrusted to Jews, who for hundreds of years had been known as tenants and tax collectors and as a subject for entire and distinguished chapters in the annals of the Jews, which reflected the good, and often more bad and cruel, relations of "Fritzi" to his Jewish subjects in general The last taxpayers were Chaim ~ Baruch i ~ Chaim Menasi, who was a tax collector and a liaison between the palace and the Jewish residents. Tin, who handled the collection of the tax, while buying and selling the crops, Ephraim Zinman and Ben-Zion Shtran Shis were engaged. After the sudden death of Moshe Bar In his place, his wife ~ I, who also fell ill and fell asleep shortly afterwards, took his place.

Crate industry .. (Copertan). The Carpenters in the Picture (Min ~ n) ׃ Natan Sternshis, Moshe Polishuk, Baruch Froman
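The improvement from 200 to 600 dpi makes sense when you count pixels: a letter’s height in pixels is its height in inches times the dpi setting, so tripling the dpi gives the OCR three times as many pixels in each direction, or nine times as many per letter. The short Python sketch below works this out for an assumed letter height of 2 mm, a round number chosen for illustration rather than a measurement of the book’s type.

```python
MM_PER_INCH = 25.4


def pixels_tall(letter_height_mm: float, dpi: int) -> float:
    """Number of scanner pixels spanning a letter of the given physical height."""
    return letter_height_mm / MM_PER_INCH * dpi


# Assumed letter height for illustration only; not measured from the book.
ASSUMED_LETTER_MM = 2.0

for dpi in (200, 600):
    print(f"{dpi} dpi: {pixels_tall(ASSUMED_LETTER_MM, dpi):.1f} pixels tall")
```

With that assumption it prints 15.7 pixels of letter height at 200 dpi and 47.2 at 600 dpi. The difference shows up in the OCR above: at 200 dpi, ד was repeatedly read as ר, so אדמת (land) came out as ארמת and כיהודים (as Jews) as כיהורים, while the 600 dpi scan got both words right.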

As another comparison, I’ll also paste the 600 dpi Hebrew into Microsoft Bing Translator, which gives this:

As mentioned, the town was found on Statzky land, and all its residents, as Jews as gentiles, paid him #, Tashinshei, we were lease fees. The collection of leasing fees was entrusted as is customary by Jews, who for centuries in the past were known as leaseholders and coverers, and as the subject of entire and distinguished chapters in the history of the Jews, in which his good, and more often bad and cruel, relations of the Fritzei were reflected towards his Jewish subjects in general and towards his musk, in particular. The last tax leaseholders were H.Y.I. ~ B.R. and I~Y.Y. by N.Y. Yi, who would collect taxes and fill liaison services between the palace and the Jewish residents. He subsequently took his place from S.H. B. and B. T. Y.Y.N., who handled the tax collection, while in buying the crops and selling them, they dealt with Af R.I.M. N.N. and in N~ Zhi and N. Y. Y.N. After the sudden death of M. H. B. R. and B. T. Y.Y. Y.N., his wife filled in for him , I, who also fell ill and fell ill a short time later.

Crate industry. (Cooperten) . The carpenters pictured (Minn.) ׃ Nathan Sternshis, Moshe Polishuk, Baruch Froman

The translation is somewhat similar. Bing’s translator couldn’t interpret the letter-spaced words and turned them into strings of initials. But the names in the caption match Google’s, with Natan/Nathan being the only discrepancy.

How about another respected free translator, Yandex? Here’s what it gives:

As mentioned, the town was found on Stetsky land, and all its inhabitants, as Jews as Gentiles,paid him, Tchinshay, we were leasehold fees. The collection of leasehold fees was entrusted as custom by Jews, who for hundreds of years became known as leaseholders and hoarders, and as the subject of whole and distinguished chapters in the words of the Jews, in which the good, and more often the bad and cruel, relations of,the Pritzi towards his Jewish subjects in general and to his, and his,Moschka, in particular. He was a tax collector and a liaison between the palace and Jewish residents. After that , he took his place as a tax collector, while buying and selling the crops dealt with the business of buying and selling them . After the sudden death of M. H. B., his wife, I, also fell ill and fell ill a short time later.

Crate industry .. (Kupartan). The Carpenters in the picture: Nathan Sternshis, Moshe Polishuk, Baruch Froman

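Quite similar, but with some significant differences: Yandex dropped or abbreviated most of the letter-spaced names in the main paragraph, although it did get all three names in the caption.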

There are meta-search engines that combine results from several other search engines. Wouldn’t it be nice if there were a meta-translation tool that did the same?
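As a rough illustration of what a meta-translation tool might do, here is a minimal Python sketch that takes the three caption translations of the 600 dpi text, pasted in by hand, and keeps only the names that at least two translators agree on. The function names are hypothetical, and the sketch assumes the names follow the last colon (ASCII “:” or the Hebrew “׃” the translators left in place) and are separated by commas. That holds for these captions but not for running text.

```python
import re
from collections import Counter

# The name list is assumed to start after the last ":" or Hebrew sof pasuq (U+05C3).
NAME_LIST_START = re.compile(r"[:\u05C3]")


def caption_names(caption: str) -> list[str]:
    """Return the comma-separated names that follow the last colon."""
    tail = NAME_LIST_START.split(caption)[-1]
    return [name.strip() for name in tail.split(",") if name.strip()]


def names_by_vote(captions: dict[str, str], min_votes: int = 2) -> dict[str, int]:
    """Keep names that at least `min_votes` translators produced with identical spelling."""
    votes = Counter()
    for caption in captions.values():
        votes.update(set(caption_names(caption)))  # one vote per translator
    return {name: n for name, n in votes.items() if n >= min_votes}


# Caption translations of the 600 dpi OCR text, copied from the outputs above.
captions = {
    "Google": "Crate industry .. (Copertan). The Carpenters in the Picture (Min ~ n) \u05C3 Natan Sternshis, Moshe Polishuk, Baruch Froman",
    "Bing": "Crate industry. (Cooperten) . The carpenters pictured (Minn.) \u05C3 Nathan Sternshis, Moshe Polishuk, Baruch Froman",
    "Yandex": "Crate industry .. (Kupartan). The Carpenters in the picture: Nathan Sternshis, Moshe Polishuk, Baruch Froman",
}

for name, n in sorted(names_by_vote(captions).items()):
    print(f"{name}: {n} of {len(captions)} translators")
```

With these captions it reports Baruch Froman and Moshe Polishuk from all three translators and Nathan Sternshis from two, setting aside Google’s single vote for Natan. A real tool would need fuzzy matching (Python’s difflib, for example) to treat spellings like Natan and Nathan as the same name.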


Other Options

Google Translate can also translate documents. However, when I upload my PDF file to it, it gives me this message:

[Image: Google Translate’s message about the PDF file]

I tried a few third-party programs for translating PDF files, with limited success, so I won’t go into the details. One of them wanted $980 to translate my 250-page PDF file. No thanks.

Back to Google Translate on My Phone

I really did like the way my phone’s version of Google Translate positions the translation correctly on the page. But my first attempt above was not good enough.

So what would happen if, instead of using my camera to photograph the page, I used a screen capture of the digitized page? I can set up my PDF reader to display the page as I scanned it and use a screen capture tool (I use Snagit) to capture an image of it. Then I save the capture as a jpeg file and download it to my phone’s pictures. Once it’s on my phone, I can again use Google Translate’s camera option. It gives this:

[Images: Google Translate results for the screen capture]

One last idea: how about scanning directly to a 600 dpi image instead of to a PDF file? Then I can download the image file to my phone’s pictures and use Google Translate’s camera option on my phone. Here’s what that gives:

[Images: Google Translate results for the 600 dpi scanned image]


Conclusion

Google Translate on a phone, through its camera icon, does its own optical character recognition on the picture. It adds the translations right onto the image, which I think is a very nice feature.

Automated translation is still a work in progress. Neither the translation algorithms nor the OCR are perfect yet, and if the OCR gets some characters wrong, the translation cannot be fully correct either.

But perfection need not be the goal here. If we can get a feeling for what is being said, and the names of people are mostly correct, then we can find the important parts of the document that are relevant to our research.

Any text we find with crucial information that we need to understand fully, we can have translated more accurately by a person. There are lots of genealogy groups on Facebook and elsewhere where helpful people are willing to translate small sections of text for you.

If all you have is a PDF file, this is the procedure I settled on to translate my book:

  1. Open the PDF and screen capture each page to jpeg.
  2. Copy the screen captures to OneDrive and download to my phone.
  3. Use Google Translate with each page on my phone.
  4. Copy the translated page images back to OneDrive.
  5. Create a searchable PDF file from the translated page images. You will need a PDF editor for this. I use PDF-XChange Editor.
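Keeping 250 page images in order through several round trips between computer, OneDrive and phone is the fiddliest part of a procedure like this. The file Google Lens saved earlier, File_20220411-132008, appears to be named with the date and time, so if the pages are translated in page order, sorting by that timestamp recovers the page order. Here is a minimal Python sketch of a renaming helper, assuming every translated image in one folder follows that File_YYYYMMDD-HHMMSS pattern; the function name is hypothetical. It renames the images to page_001.jpg, page_002.jpg, and so on, so that any tool that sorts by file name sees them in page order.

```python
import re
from pathlib import Path

# Assumed naming pattern, based on the one saved file shown earlier
# (File_20220411-132008): "File_" + YYYYMMDD + "-" + HHMMSS.
TIMESTAMPED = re.compile(r"^File_(\d{8})-(\d{6})")


def renumber_pages(folder: Path) -> list[Path]:
    """Rename timestamped images to page_001.jpg, page_002.jpg, ... in time order.

    Files that don't match the pattern are left alone. Nothing is renamed
    if any target page_NNN file already exists.
    """
    stamped = []
    for path in folder.iterdir():
        match = TIMESTAMPED.match(path.name)
        if path.is_file() and match:
            stamped.append((match.group(1) + match.group(2), path))
    stamped.sort()  # YYYYMMDDHHMMSS strings sort chronologically

    plan = [
        (path, path.with_name(f"page_{number:03d}{path.suffix.lower()}"))
        for number, (_, path) in enumerate(stamped, start=1)
    ]
    clashes = [target for _, target in plan if target.exists()]
    if clashes:
        raise FileExistsError(f"would overwrite {clashes[0]}")

    for path, target in plan:
        path.rename(target)
    return [target for _, target in plan]
```

Running it once on the folder of translated images, before step 5, would leave page_001.jpg through page_250.jpg for a 250-page book. The main assumption to check is that the pages really were translated in page order: a page redone later would get a later timestamp and end up at the back.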

This might sound like an involved procedure, but every step is simple, just repetitive for each page. Once I had the procedure down, it took only about 2 hours to do all 250 pages (about half a minute per page) over 5 sessions. I took a break every 50 pages and used it to read through the translations and see what information was relevant to me.

If you have the book as a physical hardcopy, you may want to scan it twice: once to make a searchable PDF in its original language, and a second time to make 600 dpi jpeg images, one image per page. With the jpeg images, you can then follow steps 2 through 5 above.

Final result: One original PDF in searchable Hebrew, and one passably translated PDF in searchable English.

If anyone has any suggestions as to how I can further improve the translation quality and/or the procedure I use, I’m all ears.

5 Comments

1. Steve Little (digitalarchivist)
Joined: Wed, 10 Nov 2021
5 blog comments, 0 forum posts
Posted: Tue, 12 Apr 2022

Very cool, the whole process. If you’ve got an affiliate link for the Shotbox, I’d be glad to use it.

2. Louis Kessler (lkessler)
Joined: Sun, 9 Mar 2003
288 blog comments, 245 forum posts
Posted: Wed, 13 Apr 2022

Steve: Thanks, but I don’t have an affiliate link for Shotbox. If you want to purchase a Shotbox, you might still be able to take advantage of their RootsTech special: https://rootstech.shotbox.me/

3. Steve Little (digitalarchivist)
Joined: Wed, 10 Nov 2021
5 blog comments, 0 forum posts
Posted: Thu, 14 Apr 2022

Excellent–thanks, Louis! I just placed my order with the special link. Best wishes, Steve Little

4. Elaine Martzen (elaine martzen)
Joined: Tue, 19 Apr 2022
1 blog comment, 0 forum posts
Posted: Tue, 19 Apr 2022

Loved reading the process, thanks for writing it out! Do you think Google or Yandex is generally better for Hebrew translations? Someday I’ll be able to read well in Hebrew, but it will probably be 10 years or so! I’ll use some of your ideas for an English language city directory I’d like to get into searchable form.

5. Louis Kessler (lkessler)
Joined: Sun, 9 Mar 2003
288 blog comments, 245 forum posts
Posted: Wed, 20 Apr 2022

Elaine: I think in my case, the text was so small that the Hebrew OCR had trouble recognizing the characters. That’s why scanning at 600 dpi helped. I suspect that if the text had been larger, the translations would have been much better. It was also difficult for Google, Bing and Yandex to distinguish names that should simply be transliterated from words that should be translated. Which translator was best? I didn’t compare them that way, so I can’t tell you whether Yandex is better or worse. I simply went with the Google app because of the way it presents its results in place on the document.

 

The Following 2 Sites Have Linked Here

  1. This week’s crème de la crème — April 16, 2022, Genealogy à la carte, Gail Dever : Sat, 16 Apr 2022
    Translating a Book at No Cost with Passable Results by Louis Kessler on Behold Genealogy.

  2. Fridays Family History Finds Apr 15, 2022 - Empty Branches on the Family Tree - Linda Stufflebea : Sun, 18 Dec 2022
    Translating a Book at No Cost with Passable Results by Louis Kessler on Behold Genealogy
