Since I started blogging, I have had the pleasure of corresponding with various
people involved in the early days of the hobby. While I am not at liberty to
disclose many of my conversations, something came up that my readers
can help with.
One of the people I correspond with owns the rights to several older products
and would like to know what the best course is for getting them scanned and
the text OCRed, so they can be released again as PDFs.
Appreciate anything you folks can come up with.
5 comments:
"While I am not at liberty to disclose many of my conversations..."
Best blog post of the morn, I laughed for like 5 minutes,
thank you.
I have a couple of scanners (one very inexpensive, and one very reasonably priced), both of which came with scanning and OCR software. However, I've found that the best results come from using those pieces of software in conjunction with Photoshop (mostly for adjusting the brightness/contrast of the original scans, but also for making color tweaks to cover images if necessary). Don't get me wrong, the basic software (sans Photoshop) will "get 'er done." However, the better direction (if possible) would be to work with someone who could OCR the text, scan the images, and create a "legacy" edition of the original (matching the typesetting, not just OCR text set in default Arial or Times), and then create a clickable PDF in Acrobat (Acrobat Reader alone will not do it; you'll need a regular version).
Depending on who/what the projects are (and the amount of work that might be involved), I might be interested in lending my assistance as an experienced graphics person to help do a really great legacy edition (possibly for nothing more than an interior "cross-credit" for New Big Dragon on the title page.)
A lot of this depends on what they are doing with it and how much they want to spend. If it's going to be released for free, then simply using commercial OCR software and tweaking the result is fine. For a full-featured commercial product, they need to OCR the text, edit it, scan the pictures, and have somebody use Photoshop to end up with the best version of that art. Then they need to take the whole shebang into a modern desktop publishing program and recreate it as a new document. Modern programs will produce PDFs with links and bookmarks.
As the above commenters make clear, there are a number of ways this could be done. What's the purpose? To simply release it free of charge, or try to "republish" it for some income?
If the intention is to simply release the materials so the fans can have them, I would suggest scanning all the pages at the highest possible resolution and fidelity, and then releasing that to the community (it would be a very large file, so perhaps via a sharing site or torrent). Let the fans clean it up, run OCR, Photoshop the art, etc. They'll probably do a very good job, and then a neat, clean and "best results" version could be linked from the author's website once complete.
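To put a rough number on "very large file," here is a minimal sketch of the arithmetic for uncompressed scans. The page size (8.5 x 11 inches), the resolutions, the 24-bit color depth, and the 128-page count are all assumptions picked for illustration, not details of the actual products.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def book_size_gib(pages, dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed size in GiB of a whole book (page size is an assumption)."""
    per_page = raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel)
    return per_page * pages / 2**30


if __name__ == "__main__":
    # Hypothetical 128-page book scanned in 24-bit color.
    for dpi in (300, 600):
        per_page_mib = raw_scan_bytes(8.5, 11.0, dpi, 24) / 2**20
        total_gib = book_size_gib(128, dpi, 24)
        print(f"{dpi} dpi: {per_page_mib:.1f} MiB/page, {total_gib:.2f} GiB total")
```

Under those assumptions a single color page is about 25 MB at 300 dpi and about 101 MB at 600 dpi before compression, since doubling the resolution quadruples the pixel count. Real TIFF or PNG files will be smaller once compressed, but the scaling is the same.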
Like the other commenters, I wonder what the ultimate goal is.
If you just want to get an electronic copy of a previous work, the best option is probably using a commercial OCR product to get a fairly accurate recognition on a basic scan of the pages. The recognition can be tweaked a bit by the person doing the scans and the whole pushed out as a PDF. This is what was done to create the original WOTC D&D PDFs years ago.
Another option is to make two scans of the page - one for OCR and one for the visual presentation. That way you can use a lower resolution to get good OCR (recognition does not significantly improve once you get beyond a particular resolution) and another to preserve the visual presentation of the page. The two can then be layered into a final document.
The other option is to take low-resolution scans of the text and high-resolution scans of any artwork. The text can then be recognized, edited, and repointed to conform (or not) to the original work. The art can be cleaned up, positioned, and laid out to also conform (or not) to the original. This is essentially what WOTC did with the current reprints of the 1E AD&D books.
Ultimately, the level of accuracy in recognition will be about the same, regardless of the technique used. You will still need someone (preferably multiple people) to check the whole document and ensure that the text is accurate. The only real separator is the art and whether you want to preserve an historical layout and appearance or update it and make it a "clean" presentation.
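One way to make the checking step less painful, sketched here with Python's standard difflib module: compare two independent transcriptions of the same page (two OCR passes, or two proofreaders' corrected copies) and only look closely at the words where they disagree. The function name and the sample sentences are hypothetical, and this is a rough aid rather than a substitute for a human read-through, since both copies can share the same error.

```python
import difflib


def disagreements(text_a, text_b):
    """Return (words_from_a, words_from_b) pairs where two transcriptions differ."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join the differing word runs so each side reads as a phrase.
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Made-up sample lines standing in for two passes over the same page.
    pass_one = "The fighter rolls ld20 to hit the orc"
    pass_two = "The fighter rolls 1d20 to hit the orc"
    for left, right in disagreements(pass_one, pass_two):
        print(f"check: {left!r} vs {right!r}")
```

Comparing word by word rather than character by character keeps the report short: each flagged pair points a proofreader at one spot on the original page.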
I've done scanning and electronic archiving on some documents over the years. I'd be happy to answer any other questions or give you some suggestions for methods and equipment if you like. Drop me a line at cats [dot] teacher [at] gmail [dot] com if you want some help.