For 2 column style TOC that is scan/ocr -- suggestion for best tool to extract TOC text? #1

stillhope · 2023-01-10T21:35:16Z

Hello, can you suggest the best tool to extract the TOC text from a 2-column TOC? The PDF is scanned and OCR'd.

The problem with OCR space is that it does not read the text column by column (first column, then second column). Instead it reads left to right across the whole page, so the text ends up in the wrong order.

For example, this is the extract result from OCR space (Chapter Six is in column 2 of the TOC, but the tool has read it on line 1):

Contents
Number Chapter Six: Units..............:.......48
Length, mass, capacity
Chapter One: Types Of and time.... ....
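
One way to work around the reading-order problem described above is to reorder the OCR words by column before joining them into lines. The following is only a minimal sketch: it assumes the OCR tool can export each word with the x/y position of its bounding box, and the `(text, x, y)` tuple shape, the function name `reorder_two_columns`, and the "split at the middle of the page" rule are all illustrative assumptions, not part of any particular tool.

```python
from typing import List, Tuple

# Hypothetical word record: (text, x_left, y_top) in page coordinates.
Word = Tuple[str, float, float]


def reorder_two_columns(words: List[Word], page_width: float,
                        line_tol: float = 5.0) -> List[str]:
    """Return text lines with the whole left column first, then the right."""
    split_x = page_width / 2.0  # assumption: the column gutter is mid-page
    left = [w for w in words if w[1] < split_x]
    right = [w for w in words if w[1] >= split_x]

    lines: List[str] = []
    for column in (left, right):
        rows = []  # each row is [y_of_first_word, [(x, text), ...]]
        for text, x, y in sorted(column, key=lambda w: w[2]):
            # Words whose y is within line_tol of the row start share a line.
            if rows and abs(y - rows[-1][0]) <= line_tol:
                rows[-1][1].append((x, text))
            else:
                rows.append([y, [(x, text)]])
        for _, row in rows:
            # Within a line, order the words left to right.
            lines.append(" ".join(text for _, text in sorted(row)))
    return lines


# Made-up coordinates that mimic the example: "Chapter Six" sits in column 2
# at the same height as "Number" in column 1.
sample = [
    ("Number", 10, 20), ("Chapter", 310, 20), ("Six:", 360, 20),
    ("Units", 400, 21), ("Chapter", 10, 40), ("One:", 60, 41),
]
for line in reorder_two_columns(sample, page_width=600):
    print(line)
```

If the gutter is not in the middle of the page, `split_x` would need to be measured from the scan instead.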

The problem with Tabular is that I could not find any 2-column TOC template. I tried to create my own template as a new user, and it did a very average job (e.g. it did not recognise the end of a sentence and kept the leading ..... before the page number). I also could not find any auto scripts for the Sublime Text editor to handle the typical TOC text-editing issues.

Nuntber,
Chapter One: Types of,
number ........................................... 2,
Squares and square roots .................,2
Cubes and cube roots .......................,2
Multiples .......................................,4
Prime factorisation ..........................,6
Chapter Two: Using numbers .....1 0,
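
The dot leaders and stray commas in output like the above can probably be cleaned with a short script rather than by hand. This is a rough sketch, not a tested tool: the regular expression and the function name `parse_toc_line` are assumptions based only on the sample lines shown here. It does not fix OCR misreadings such as "Nuntber", and it does not join titles that wrap across two lines.

```python
import re
from typing import Optional, Tuple

# A run of two or more dots, plus any stray spaces, commas or colons around it.
LEADER = re.compile(r"\s*\.{2,}[\s.,:]*")


def parse_toc_line(line: str) -> Tuple[str, Optional[int]]:
    """Split one raw TOC line into (title, page); page is None if absent."""
    line = line.strip().rstrip(",").strip()  # drop the trailing comma
    parts = LEADER.split(line, maxsplit=1)
    title = parts[0].strip()
    page = None
    if len(parts) == 2:
        # OCR sometimes splits a page number, e.g. "1 0" instead of "10".
        digits = re.sub(r"\s+", "", parts[1])
        if digits.isdigit():
            page = int(digits)
    return title, page


raw_lines = [
    "number ........................................... 2,",
    "Squares and square roots .................,2",
    "Chapter Two: Using numbers .....1 0,",
    "Nuntber,",
]
for raw in raw_lines:
    print(parse_toc_line(raw))
```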

Tabular is better than OCR space in that the text is in the correct order, but it still takes a lot of manipulation in Sublime Text to get the TOC text file into the layout required to auto-create TOC bookmarks in the PDF (i.e. using one of the apps pdftk or jpdfbookmarks).
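
For the pdftk route, the required layout is the bookmark record format that `pdftk dump_data` prints and `pdftk update_info` reads back (`BookmarkBegin`, `BookmarkTitle`, `BookmarkLevel`, `BookmarkPageNumber`). A sketch of generating it from cleaned entries follows. The `(title, page, level)` tuple shape, the `page_offset` parameter, and the idea that chapters are level 1 and sections level 2 are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical entry shape: (title, printed_page_number, nesting_level).
Entry = Tuple[str, int, int]


def to_pdftk_bookmarks(entries: Iterable[Entry], page_offset: int = 0) -> str:
    """Render entries as pdftk bookmark records.

    page_offset is added to every page number, for books whose printed
    page 1 is not the first page of the PDF file.
    """
    out = []
    for title, page, level in entries:
        out.append("BookmarkBegin")
        out.append(f"BookmarkTitle: {title}")
        out.append(f"BookmarkLevel: {level}")
        out.append(f"BookmarkPageNumber: {page + page_offset}")
    return "\n".join(out) + "\n"


entries = [
    ("Chapter One: Types of number", 2, 1),
    ("Squares and square roots", 2, 2),
    ("Multiples", 4, 2),
]
with open("bookmarks.txt", "w", encoding="utf-8") as fh:
    fh.write(to_pdftk_bookmarks(entries))
```

The resulting file could then be applied with something like `pdftk in.pdf update_info bookmarks.txt output out.pdf`, where the file names are placeholders.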

Tabular currently has no way to ask questions or get help: on GitHub, its Issues tab is not showing.

stillhope changed the title ~~For 2 column style TOC that is scan/ocr -- suggestion for best tool?~~ For 2 column style TOC that is scan/ocr -- suggestion for best tool to extract TOC text? Jan 10, 2023
