2024/05/13
Monday, May 13, 2024 (#134) Woozle's Journal
Harena needed to snag the text from some images, and I thought: Shirley, by now there must be a usable GUI OCR package in Ubuntu.
So I searched apt for "OCR", and... after wading through a bunch of stuff, found a thing called YAGF which is apparently a front-end for either of two CLI apps called Cuneiform and Tesseract (with the latter being the default, and something I had apparently installed earlier).
After loading up a sample image to scan, the following sequence of following events followed, following my loading up of a sample image to scan:
- It claimed I hadn't installed the English language data files for Tesseract, which I had.
- I tried re-installing them, then closing and reopening YAGF; no joy.
- After much searching the web for help, I noticed that the Settings dialog box asks for the location of the Tesseract data files, which had been set to the root folder.
- It seems somewhat unhelpful to complain about not being able to find files when the location for those files obviously hasn't been set.
- I looked in apt to see where the files might be, and found them in /usr/share/tesseract-ocr/5/tessdata.
- After this, YAGF would do what appeared to be an attempt to parse the image, but it lasted less than a second and produced no output.
- I wasn't sure if YAGF would definitely show me the output, so I did a "File → Save All Text". The resulting file had zero bytes.
- I tried modifying the path, in case YAGF was expecting an enclosing or enclosed folder (but wasn't upset to the point of giving me the error again) -- no go.
- Then I thought of switching to Cuneiform, in case that works better -- which (after installing it) it does.
- Running the scan with Cuneiform without installing Cuneiform doesn't give an error message. This is also unhelpful.
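One way to separate "Tesseract is broken" from "YAGF is broken" is to bypass the front-end and call Tesseract directly. The following is only a rough sketch, not something tried in the session above: it assumes the tesseract binary is on the PATH, that the English data file is named eng.traineddata, and that the data folder is the one found above. The function names (has_language, build_command, ocr) are made up for illustration.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Folder found via apt above; an assumption on any other machine.
TESSDATA_DIR = Path("/usr/share/tesseract-ocr/5/tessdata")


def has_language(tessdata_dir, lang="eng"):
    """Return True if <lang>.traineddata exists in the given folder."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def build_command(image, tessdata_dir, lang="eng"):
    """Build a tesseract command line that prints recognized text to stdout."""
    return [
        "tesseract", str(image), "stdout",
        "-l", lang,
        "--tessdata-dir", str(tessdata_dir),
    ]


def ocr(image, tessdata_dir=TESSDATA_DIR, lang="eng"):
    """Run Tesseract on one image, failing loudly instead of silently."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on the PATH")
    if not has_language(tessdata_dir, lang):
        raise FileNotFoundError(f"no {lang}.traineddata in {tessdata_dir}")
    result = subprocess.run(
        build_command(image, tessdata_dir, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    print(ocr(sys.argv[1]))
```

Saved under a hypothetical name such as ocr_check.py, it would be run as "python3 ocr_check.py sample.png". If this prints text while YAGF still produces an empty file, the problem is more likely in the front-end than in Tesseract or its data files.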