
AI-driven OCR Revolutionizes Intelligent Layout Analysis with 24+ Labels

Published at 9/25/2024
Categories: ai, ocr, documentprocessing, idp
Author: derek-compdf

With the rapid development of technology and ever-changing business needs, automating repetitive tasks has become key to improving efficiency in modern enterprises and a cornerstone of digital transformation. Robotic Process Automation (RPA) is an effective technology for meeting this challenge, and more and more companies are adopting it to modernize their internal workflows.

Customer Background and Challenges

A technology company specializing in office software development planned to build an RPA product and an intelligent Q&A product to help enterprises automate workflows and business processes, meeting their need for efficient, cost-effective, and compliant operations while improving the customer experience.

However, while developing the RPA and AI Q&A products, the company ran into trouble processing unstructured documents: manually labeling massive numbers of documents was inefficient and error-prone, which drove up costs and slowed development. After learning that ComIDP's intelligent document processing solution had once helped a data provider process over 3 million unstructured documents in 5 days, they asked ComIDP for automated data labeling for intelligent layout recognition and data parsing.

ComIDP customized layout recognition parameters for them and upgraded its OCR technology with AI models, using over 24 labels to restore each document's layout and logic and keep the layout complete and consistent. The company deployed ComIDP's intelligent document solution in a clustered environment to develop its RPA and intelligent Q&A products, which significantly shortened the development cycle, reduced costs, and let the products reach the market quickly.

Customer Pain Points

Because unstructured documents have complex content and inconsistent formats, parsing them and extracting their data is extremely challenging. Layout recognition is a major difficulty: each page contains numerous elements, layouts and styles vary, and the logical relationships between pieces of content differ. Noise, skew, and perspective distortion make recognition harder still. This calls for highly adaptable, intelligent parsing technology. Lacking such technology, this enterprise had to rely on manual processing, which was inefficient and inaccurate and directly hurt the effectiveness of its RPA and Q&A systems.

Manual Data Labeling

Previously, this enterprise labeled unstructured data manually for document layout recognition, which was time-consuming and error-prone. When different people labeled the same dataset, the results varied, producing inconsistent data quality. This increased the cost and time of subsequent data verification, complicated development, and extended project timelines.
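One standard way to put a number on this kind of inconsistency is Cohen's kappa, which measures how often two annotators agree beyond what chance alone would produce. The sketch below is illustrative only: the article does not say that the customer or ComIDP measured agreement this way, and the sample labels are invented from the label names used in the requirements section.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


# Invented sample: two people labeling the same four page regions.
annotator_1 = ["title", "paragraph", "paragraph", "table-block"]
annotator_2 = ["title", "paragraph", "list-block", "table-block"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.667
```

A kappa of 1.0 means perfect agreement and 0 means no better than chance, so low values on a shared sample are a quick signal that labeling guidelines are being applied inconsistently.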

Massive Document Input

This company processes hundreds of thousands of files daily, which requires efficient servers with high-load processing capacity. Its traditional server architecture could not handle data inputs at this scale, so the system ran slowly.

In-House Development Challenges

In a competitive market, in-house development can deliver tailored solutions, but it is costly and time-consuming. Long development cycles make it hard for a company to respond quickly to market changes, so it risks missing market opportunities.

Customer Requirements

This company described its product's application scenarios to the ComIDP team and set out specific requirements for intelligent data labeling in layout analysis, aiming to improve data parsing while automating the preparation of its AI data.

Types of AI Data Labeling

They needed to annotate titles, paragraphs, code blocks, tables, formulas, lists, and non-text content within documents to preserve the completeness of unstructured documents. Separating natural paragraphs and segmenting the layout were particularly important.

Label types, sub-types, and notes:

title
Sub-type: title
Note: All levels of titles on the page need to be labeled.

paragraph
Sub-type: paragraph
Note: Text fragments consisting of plain text are categorized as paragraphs. To make data easier to search and locate, large sections containing multiple semantically independent paragraphs should be split, typically at natural paragraph breaks and punctuation. Each text fragment produced by this segmentation is called a paragraph.

block
Note: A block is the output of layout segmentation. Information of the same type that forms a visually connected region is a block.
1. An image on the same row is a block, a table is a block, and a large section of text within the same column is a block.
2. A block must consist of information of a single type. Mixed areas containing different types cannot form one block and must be split; for example, a mixed area with an image and a table cannot be a block.
Sub-types:
block: unknown-category, non-text block
code-block: code block
img-block: mixed text and image block
table-block: table block
sci-block: scientific formula block
list-block: list block of text, such as a table of contents or a text list
list-block note: A list block must contain more than two lines (three or more), and its lines must average no more than two rows of text each; otherwise the region is labeled as a paragraph.
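The list-block rule in the table can be read as a simple decision procedure. The sketch below assumes each candidate region has already been split into lines (list items) and that the number of visual rows each line occupies is known; the function name and input shape are hypothetical, and "three or more lines" is taken from the rule's parenthetical.

```python
from statistics import mean


def classify_text_region(rows_per_line: list[int]) -> str:
    """Apply the list-block rule to one candidate text region.

    rows_per_line[i] is the number of visual rows taken up by the i-th line
    (list item) of the region. The region is a list-block only if it has at
    least three lines and those lines average no more than two rows each.
    """
    if len(rows_per_line) >= 3 and mean(rows_per_line) <= 2:
        return "list-block"
    return "paragraph"


print(classify_text_region([1, 1, 2, 1]))  # list-block
print(classify_text_region([3, 4, 1]))     # paragraph: lines are too long on average
```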



Beyond these fundamental needs, each labeling type had specific restrictions: titles must be labeled as standalone elements, paragraphs and blocks must not overlap, a block must not span multiple columns, and a block must not contain mixed data types.
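The non-overlap restriction is the easiest of these to check automatically. Here is a minimal sketch, assuming each labeled region is stored as an axis-aligned bounding box (x0, y0, x1, y1); the `Region` type and the function names are hypothetical, and the other restrictions (standalone titles, single-column and single-type blocks) are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    label: str  # e.g. "paragraph" or "table-block"
    x0: float   # left
    y0: float   # top
    x1: float   # right
    y1: float   # bottom


BLOCK_LABELS = {"block", "code-block", "img-block", "table-block", "sci-block", "list-block"}


def overlaps(a: Region, b: Region) -> bool:
    """True if the two boxes share an area; boxes that only touch do not overlap."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def paragraph_block_conflicts(regions: list[Region]) -> list[tuple[Region, Region]]:
    """Return every (paragraph, block) pair that violates the non-overlap rule."""
    paragraphs = [r for r in regions if r.label == "paragraph"]
    blocks = [r for r in regions if r.label in BLOCK_LABELS]
    return [(p, b) for p in paragraphs for b in blocks if overlaps(p, b)]


page = [
    Region("paragraph", 0, 0, 100, 50),
    Region("table-block", 50, 40, 150, 100),  # intrudes into the paragraph
]
for paragraph, block in paragraph_block_conflicts(page):
    print(f"{block.label} overlaps a paragraph")
```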

Output Labeling

After parsing the unstructured documents, the company required output files in JSON format, restricted to the following labels: title, paragraph, block, code-block, img-block, table-block, sci-block, and list-block. This output supports subsequent key information extraction and semantic analysis, improving the accuracy of the RPA and Q&A systems.
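A minimal sketch of producing such output while enforcing the restricted label set. The field names (`label`, `bbox`, `text`) and the top-level `elements` key are assumptions for illustration; the article does not publish ComIDP's actual JSON schema.

```python
import json

# Output labels permitted by the customer's specification.
ALLOWED_LABELS = (
    "title", "paragraph", "block", "code-block",
    "img-block", "table-block", "sci-block", "list-block",
)


def to_output_json(elements: list[dict]) -> str:
    """Serialize parsed page elements, rejecting any label outside the allowed set.

    Each element is assumed (hypothetically) to carry "label", "bbox", and "text".
    """
    for i, element in enumerate(elements):
        if element["label"] not in ALLOWED_LABELS:
            raise ValueError(f"element {i} has disallowed label {element['label']!r}")
    # ensure_ascii=False keeps Chinese text readable in the output file.
    return json.dumps({"elements": elements}, ensure_ascii=False, indent=2)


elements = [  # sample data
    {"label": "title", "bbox": [72, 40, 540, 70], "text": "Quarterly Report"},
    {"label": "paragraph", "bbox": [72, 90, 540, 160], "text": "Revenue grew..."},
]
print(to_output_json(elements))
```

Validating labels at serialization time means a model that emits an unexpected class fails loudly instead of silently passing an unknown label to the downstream extraction step.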

ComIDP's R&D team customized the layout recognition parameters to the customer's needs. Through continuous updates and iterations, accuracy exceeded 95%, and the solution was delivered to the customer for acceptance.

ComIDP Solution

The ComIDP team held in-depth conversations with the enterprise's R&D team to understand its specific needs and business goals, ensuring a customized, practical solution. From data collection and AI model training to model optimization and test reports, we provide professional, flexible, and efficient services to our customers.

Layout Analysis Model Training

Our R&D team collected different types of samples, such as financial reports, papers, newspapers, and books, labeled them manually, and trained a layout analysis AI model applicable to various industries. The model identifies and classifies page elements such as titles, paragraphs, tables, and images using 24 predefined labels, with recognition accuracy above 95%.

Based on the enterprise's specific labeling needs, we further optimized the AI model. With refined labeling types and rules, we achieved precise automated labeling of complex document content. For instance, we designed and tuned dedicated recognition algorithms for code blocks and formula blocks in technical documents so that this distinctive content is accurately extracted and distinguished. ComIDP's AI analyzes both the geometric and the logical layout of a document, restoring 99% of the layout and reading-order structure and thereby keeping the layout complete and consistent. As the client requested, labeled results are output in a standardized JSON format to facilitate secondary processing and data analysis.
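To make the distinction between geometric layout (where elements sit on the page) and logical layout (the order in which they are read) concrete, here is a deliberately naive reading-order heuristic for a two-column page. It is not ComIDP's algorithm, which the article does not describe; the `Box` type and function names are hypothetical, and full-width elements such as a title spanning both columns are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward, as in image coordinates)
    x1: float  # right edge
    y1: float  # bottom edge


def _top_then_left(box: Box) -> tuple[float, float]:
    return (box.y0, box.x0)


def reading_order(boxes: list[Box], page_width: float) -> list[Box]:
    """Left column top to bottom, then right column top to bottom.

    Each box is assigned to a column by its horizontal center.
    """
    middle = page_width / 2
    left = [b for b in boxes if (b.x0 + b.x1) / 2 < middle]
    right = [b for b in boxes if (b.x0 + b.x1) / 2 >= middle]
    return sorted(left, key=_top_then_left) + sorted(right, key=_top_then_left)


# Detected boxes arrive in arbitrary order; the logical order is A, C, B.
boxes = [Box("B", 110, 10, 190, 30), Box("C", 10, 50, 90, 70), Box("A", 10, 10, 90, 30)]
print([b.text for b in reading_order(boxes, page_width=200)])  # ['A', 'C', 'B']
```

Sorting purely by vertical position would read across columns (A, B, C), which is exactly the kind of reading-logic error that layout restoration has to avoid.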

Test Reports Verify Effectiveness

Functional Testing

After training the AI model, we ran multiple rounds of rigorous testing to validate its performance, using client-provided examples as validation sets to measure accuracy, and produced a functional testing report. The report details how our AI OCR model automatically processes various document types, covering different formats, sizes, and languages as well as elements such as stamps, charts, formulas, and flowcharts. These results served as key acceptance criteria for the model.
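Measuring accuracy on a validation set comes down to comparing predicted labels with ground-truth labels. The sketch below assumes predicted and ground-truth regions have already been matched one-to-one (a separate step not shown); it reports overall accuracy plus, for each ground-truth label, the fraction recognized correctly (per-label recall). The function name is hypothetical, and this is not necessarily the metric behind the article's 95% figure.

```python
from collections import defaultdict


def label_accuracy(predicted: list[str], expected: list[str]) -> tuple[float, dict[str, float]]:
    """Overall accuracy and per-label recall of predicted labels against ground truth."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need equal-length, non-empty label lists")
    correct: defaultdict[str, int] = defaultdict(int)
    total: defaultdict[str, int] = defaultdict(int)
    for pred, gold in zip(predicted, expected):
        total[gold] += 1
        correct[gold] += pred == gold  # True counts as 1
    overall = sum(correct.values()) / len(expected)
    per_label = {label: correct[label] / total[label] for label in total}
    return overall, per_label


overall, per_label = label_accuracy(
    ["title", "paragraph", "table-block", "paragraph"],
    ["title", "paragraph", "table-block", "list-block"],
)
print(overall)    # 0.75
print(per_label)  # list-block recall is 0.0: the one list was mislabeled as a paragraph
```

Reporting per-label figures alongside the overall number shows which element types (for example formulas or code blocks) still need work, which a single aggregate accuracy can hide.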

Format: PNG, JPG, JPEG, BMP
Size: 100 KB to 30 MB
Languages: Simplified Chinese, English, mixed Chinese and English
Types: tables, complex layouts, stamps, handwritten text, exams, formulas, flowcharts, skewed text, scanned documents, and photographed books and PPTs
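Since these are the ranges the functional test covered, a pipeline might screen incoming files against them before submission. A minimal sketch; the function name is hypothetical, and it assumes binary units (1 KB = 1024 bytes), which the report does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}
MIN_BYTES = 100 * 1024        # 100 KB, assuming 1 KB = 1024 bytes
MAX_BYTES = 30 * 1024 * 1024  # 30 MB, assuming 1 MB = 1024 * 1024 bytes


def check_input(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file is within the tested ranges."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format {suffix or '(none)'}")
    if not MIN_BYTES <= size_bytes <= MAX_BYTES:
        problems.append(f"size {size_bytes} bytes outside 100 KB to 30 MB")
    return problems


print(check_input("scan.JPG", 2 * 1024 * 1024))  # []
print(check_input("report.pdf", 500_000))        # ['unsupported format .pdf']
```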

From the test report, we selected the final results of ComIDP processing documents that contain formulas. Both the text and the formulas were recognized accurately, and the customer was very satisfied with the results.

Stress Testing

Because this enterprise receives over a hundred thousand documents daily, we performed comprehensive stress testing to make sure the system could withstand massive document input. We tested PDF to Word (Grid Layout) conversion, with and without OCR, in both synchronous and asynchronous modes. The stress test report showed that ComIDP maintained stability, accuracy, and fast responses under high load, demonstrating strong performance and reliability for high-load tasks.

Synchronous testing
Test scenario: 200 users converting files simultaneously.
Test results: All 200 users completed their conversions successfully. Success rate and accuracy reached 100%, with no error responses. 99% of responses returned in under 1 second.

Asynchronous testing
Test scenario: 200 users converting files simultaneously, sustained for over 10 minutes.
Test results: All 200 users completed their conversions successfully. Success rate and accuracy reached 100%, with no error responses. 99% of responses returned in under 1 second.
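The shape of these tests (N concurrent users, success rate, 99th-percentile latency) can be sketched with Python's asyncio. The conversion call below is a hypothetical stub that only sleeps for a simulated latency; a real test would call the conversion service instead, and this is not the tooling ComIDP used.

```python
import asyncio
import math
import random
import time

NUM_USERS = 200  # concurrency level from the stress-test scenario


async def fake_convert(user_id: int) -> bool:
    """Hypothetical stand-in for one PDF-to-Word conversion request."""
    await asyncio.sleep(random.uniform(0.05, 0.3))  # simulated service latency
    return True


async def timed_request(user_id: int) -> tuple[bool, float]:
    """Run one request and return (succeeded, latency in seconds)."""
    start = time.perf_counter()
    try:
        ok = await fake_convert(user_id)
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


async def run_load_test(num_users: int = NUM_USERS) -> dict:
    """Fire num_users requests concurrently and summarize the results."""
    results = await asyncio.gather(*(timed_request(i) for i in range(num_users)))
    latencies = [latency for _, latency in results]
    successes = sum(ok for ok, _ in results)
    return {
        "users": num_users,
        "success_rate": successes / num_users,
        "p99_seconds": percentile(latencies, 99),
    }


if __name__ == "__main__":
    print(asyncio.run(run_load_test()))
```

"99% of responses under 1 second" corresponds to the `p99_seconds` value staying below 1.0; the asynchronous scenario would repeat such rounds for the full test duration rather than running a single burst.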

GPU vs. CPU Speed Testing

Additionally, we deployed GPU acceleration to speed up document processing. Comparing GPU and CPU efficiency on the same tasks produced a detailed OCR GPU vs. CPU speed comparison report.

Below is the time ComIDP spent processing 100 image samples on GPU versus CPU. Testing indicated that in a dual-GPU, dual-container environment, ComIDP processes up to 20,000 images per minute on average. GPU processing was 100 times faster than CPU, a significant advantage for large-scale document processing that substantially reduces processing time and boosts efficiency. For the customer's actual applications and processing demands, we provided a customized cluster deployment solution to ensure ComIDP runs efficiently in real-world use.
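The throughput and speedup figures translate into simple arithmetic. A minimal sketch, assuming one image per document (the article does not state how many images a document has) and that the 20,000 images per minute rate is sustained; the function names are hypothetical.

```python
def minutes_to_process(num_images: int, images_per_minute: float) -> float:
    """Wall-clock minutes to process a batch at a constant throughput."""
    return num_images / images_per_minute


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run is than the CPU run on the same task."""
    return cpu_seconds / gpu_seconds


PEAK_IMAGES_PER_MINUTE = 20_000  # dual-GPU, dual-container figure from the article
DAILY_DOCUMENTS = 100_000        # "over a hundred thousand" daily inputs, one image each (assumption)

print(minutes_to_process(DAILY_DOCUMENTS, PEAK_IMAGES_PER_MINUTE))  # 5.0
print(round(PEAK_IMAGES_PER_MINUTE / 60, 1))  # 333.3 images per second
```

A 100x speedup means `speedup(cpu_seconds, gpu_seconds)` returns 100, so a batch that keeps a CPU busy for 100 seconds would finish in about 1 second on the GPU setup.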
