dev-resources.site


How to improve OCR accuracy? | my 5-year experience

Published at 7/12/2024

my experience with OCR technologies

I created my first image-to-text app on October 6, 2018, more than five years ago. Since then, I have been learning, experimenting, rewriting, and iterating on OCR technology.

I built all of these apps to extract text from images and photos:

IMG2TXT: Image To Text OCR (opensource)
IMG2TXT OCR App (discontinued)
IMG2TXT Hindi / Indian OCR App (will be discontinued once the first app supports Hindi/Indian languages)
IMG2TXT : Persian OCR App (will be discontinued once the first app supports Persian)

After all these years, my takeaway is simple: "What is measured, improves." This quote comes from Peter F. Drucker's book "The Effective Executive".

ideas to improve OCR accuracy

Improving the accuracy of OCR-extracted text is not a small task. The obvious answers to "how do I extract text from images more accurately?" are:

improve the "traineddata" models
use autocorrect, for example to correct "boxmg" into "boxing"
use high-DPI photos
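The autocorrect idea can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical `OCR_CONFUSIONS` table of common misreadings (such as "in" being read as "m", which turns "boxing" into "boxmg") and a lowercase vocabulary set; it only tries undoing a single confusion per word.

```python
# Minimal sketch: dictionary-based OCR autocorrect.
# OCR_CONFUSIONS is a hypothetical, hand-picked table mapping what the OCR
# engine read to what the printed text may actually have been.
OCR_CONFUSIONS = {
    "m": ["in", "rn"],  # "boxing" misread as "boxmg"
    "rn": ["m"],
    "0": ["o"],
    "1": ["l", "i"],
    "5": ["s"],
    "vv": ["w"],
}


def candidates(word):
    """Yield spellings obtained by undoing one OCR confusion in `word`."""
    for wrong, rights in OCR_CONFUSIONS.items():
        start = word.find(wrong)
        while start != -1:
            for right in rights:
                yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def autocorrect(word, vocabulary):
    """Return the first dictionary word reachable by one fix, else `word` unchanged."""
    if word in vocabulary:
        return word
    for candidate in candidates(word):
        if candidate in vocabulary:
            return candidate
    return word


vocabulary = {"boxing", "computer", "hello"}
print(autocorrect("boxmg", vocabulary))
print(autocorrect("cornputer", vocabulary))
print(autocorrect("he1lo", vocabulary))
```

A real corrector would also rank candidates by word frequency or sentence context; this sketch simply returns the first dictionary hit.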

I used to focus on all of these points, and more, such as:

pre-processing images/photos with:
- a black-and-white filter
- binarization with an adaptive threshold
artificially increasing the image resolution to around 300 DPI
using the best Tesseract OCR models despite their large size
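Two of these pre-processing steps can be illustrated without any image library. The sketch below assumes a grayscale image stored as a list of rows of 0-255 values (a simplification for readability). `upscale_factor` computes how much to enlarge a photo so that a page of known physical width reaches roughly 300 DPI, and `adaptive_threshold` binarizes each pixel against the mean of its local neighborhood minus a constant `c`, which copes better with uneven lighting than a single global threshold. The function names and the `block_size`/`c` defaults are illustrative choices, not tuned values.

```python
# Minimal sketch of two pre-processing steps on a grayscale image stored as
# a list of rows of 0-255 ints (hypothetical representation for illustration).

TARGET_DPI = 300


def upscale_factor(width_px, page_width_inches, target_dpi=TARGET_DPI):
    """Scale factor needed so the page's pixel width reaches target_dpi."""
    current_dpi = width_px / page_width_inches
    return max(1.0, target_dpi / current_dpi)  # never shrink the image


def adaptive_threshold(gray, block_size=15, c=10):
    """Binarize each pixel against the mean of its block_size x block_size
    neighborhood minus c: 255 (white) if brighter, else 0 (black text)."""
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    # Summed-area table so each neighborhood mean costs O(1).
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    result = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        out_row = []
        for x in range(width):
            # Window is clipped at the image border.
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            out_row.append(255 if gray[y][x] > mean - c else 0)
        result.append(out_row)
    return result


# A photo 1200 px wide of an 8-inch-wide page is 150 DPI, so scale by 2.0.
print(upscale_factor(1200, 8))
```

A dim but uniform background (say, every pixel at 90) stays white here, whereas a fixed global threshold of 128 would turn it black. In a real app, these steps map onto library calls such as OpenCV's `cv2.resize` and `cv2.adaptiveThreshold`.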

These ideas improved performance and text accuracy to a certain extent. Don't get me wrong: these tips and tricks made my apps run fast enough, with good-enough accuracy. But I see more accurate apps. For example, Google ML Kit achieves almost 99% accuracy when extracting text from clear images.

how to measure OCR accuracy improvement/progress?

My current measurements are not good enough. To follow the "What is measured, improves" principle, I need a set of photos of paper documents to measure my apps' accuracy against: a sample that represents real-world use cases. Then I can refactor and improve text extraction accuracy against this sample, so that people see the improvements in their daily task of turning a paper page into a digital document.
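One common way to put a number on OCR accuracy is the character error rate (CER): the Levenshtein edit distance between the OCR output and a hand-typed ground-truth transcription, divided by the length of the ground truth. Accuracy can then be reported as 1 - CER, and the same calculation over words gives a word error rate (WER). The sketch below assumes a non-empty ground-truth transcription exists for each sample photo; the function names are illustrative.

```python
def levenshtein(a, b):
    """Minimum insertions, deletions and substitutions turning sequence a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete item_a
                current[j - 1] + 1,                     # insert item_b
                previous[j - 1] + (item_a != item_b),   # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, ground_truth):
    """Edits needed to fix the OCR output, per ground-truth character."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


def word_error_rate(ocr_text, ground_truth):
    """Same idea at word level; levenshtein works on any sequence."""
    reference_words = ground_truth.split()
    if not reference_words:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text.split(), reference_words) / len(reference_words)


cer = character_error_rate("I love boxmg", "I love boxing")
print(f"accuracy: {1 - cer:.1%}")  # accuracy: 84.6%
```

Running this over every photo in the sample after each change turns "the app feels more accurate" into a number that can be compared between versions.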

specifications of the image sample

I need to collect this image sample with real-world use cases in mind, so I need the following images:

a photo of an old book page laid perfectly flat on an even surface
a photo of an old book page that is warped because the book is open
a photo of a modern book with a clean white background
a photo of a modern book with images or illustrations between paragraphs
a photo of an article written in Arabic with some words in English
a photo of an old, yellowed book page with a cursive font
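To see which of these scenarios needs the most work, results can be grouped by category. The sketch below assumes each sample photo is tagged with one scenario and already has an edit-distance count against its ground truth; the `Sample` type, the category labels, and the numbers are hypothetical. It reports a micro-averaged CER per category (total edits divided by total ground-truth characters), so a long page weighs more than a short caption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One benchmark photo: its scenario and how badly OCR did on it."""
    path: str
    category: str         # hypothetical labels, e.g. "old-book-flat"
    edits: int            # edit distance between OCR output and ground truth
    reference_chars: int  # length of the ground-truth transcription


def cer_by_category(samples):
    """Micro-averaged character error rate for each category."""
    edits = defaultdict(int)
    chars = defaultdict(int)
    for sample in samples:
        edits[sample.category] += sample.edits
        chars[sample.category] += sample.reference_chars
    return {cat: edits[cat] / chars[cat] for cat in edits if chars[cat]}


# Made-up numbers, for illustration only.
samples = [
    Sample("flat_01.jpg", "old-book-flat", 12, 1200),
    Sample("flat_02.jpg", "old-book-flat", 8, 800),
    Sample("warped_01.jpg", "old-book-warped", 90, 1000),
]
# Worst category first, so it is obvious where to focus next.
report = sorted(cer_by_category(samples).items(), key=lambda kv: kv[1], reverse=True)
for category, cer in report:
    print(f"{category}: {cer:.1%} CER")
```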

This is the initial specification for the collected photos. If you have a specific use case, send me some sample photos on Twitter (X) or LinkedIn.

I hope you enjoyed reading this post as much as I enjoyed writing it. If you know someone who could benefit from this information, send them a link to this post. If you want to be notified about new posts, follow me on YouTube and GitHub.
