Hello, can you suggest the best tool to extract the TOC text from a two-column TOC? (The PDF is scanned and OCR'd.)
The problem with OCR.space is that it does not read the text column by column (first column, then second column). Instead it reads straight across from left to right, so the text ends up in the wrong place.
For example, this is the OCR.space output (Chapter Six is in column 2 of the TOC, but the tool has read it onto line 1):
Contents
Number Chapter Six: Units..............:.......48
Length, mass, capacity
Chapter One: Types Of and time.... ....
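Many OCR engines can report word positions as well as text (Tesseract, for example, has TSV and hOCR output), and with positions the two columns can be separated before the words are joined into lines. A minimal sketch, assuming each word arrives as a hypothetical `(x, y, text)` tuple in pixels and that a single vertical split point `split_x` between the columns is known; `read_two_columns` and `line_tol` are illustrative names, not part of OCR.space or any other tool.

```python
from typing import List, Tuple

# Hypothetical word format: (x_left, y_top, text). Real OCR engines expose
# bounding boxes in their own formats, so this would need adapting.
Word = Tuple[int, int, str]

def read_two_columns(words: List[Word], split_x: int, line_tol: int = 10) -> List[str]:
    """Return text lines for the left column top-to-bottom, then the right column."""
    left = [w for w in words if w[0] < split_x]
    right = [w for w in words if w[0] >= split_x]
    lines: List[str] = []
    for column in (left, right):
        # Order words by vertical position (then horizontal) within the column.
        column = sorted(column, key=lambda w: (w[1], w[0]))
        current: List[Word] = []
        for w in column:
            # A word clearly below the current line's first word starts a new line.
            if current and abs(w[1] - current[0][1]) > line_tol:
                lines.append(" ".join(t for _, _, t in sorted(current)))
                current = []
            current.append(w)
        if current:
            # Sorting the tuples orders the words on a line from left to right.
            lines.append(" ".join(t for _, _, t in sorted(current)))
    return lines
```

With words at `(10, 130, "Chapter")`, `(90, 131, "One")` in the left column and `(300, 130, "Chapter")`, `(380, 129, "Six")` in the right, `read_two_columns(words, split_x=250)` keeps "Chapter One" and "Chapter Six" on separate lines instead of merging them. A fixed `split_x` is the simplest choice; a page whose column gap moves would need the gap detected per page.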
The problem with Tabula is that I could not find any template for a two-column TOC. As a new user I tried to create my own template, and it did a very average job (e.g. it did not recognise where an entry ends, and it kept the leader dots (.....) before the page number). I could not find any scripts for Sublime Text to handle these typical TOC clean-up issues either. For example, the Tabula output is:
Nuntber,
Chapter One: Types of,
number ........................................... 2,
Squares and square roots .................,2
Cubes and cube roots .......................,2
Multiples .......................................,4
Prime factorisation ..........................,6
Chapter Two: Using numbers .....1 0,
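The clean-up steps described above (joining a title that wrapped onto two lines, dropping the leader dots and the CSV commas, and reading the trailing page number) can be scripted instead of done by hand in Sublime Text. A rough sketch, assuming an entry ends at each line that finishes with a page number; `clean_toc` and `ENTRY_RE` are hypothetical names, and treating spaced digits such as `1 0` as one page number is a guess about this OCR output.

```python
import re
from typing import List, Tuple

# Title, then leader dots / spaces / stray punctuation, then a page number at
# the end of the line. The page may be split by OCR into spaced digits ("1 0").
ENTRY_RE = re.compile(r"^(?P<title>.*?)[\s.:,]*(?P<page>\d(?: ?\d)*)$")

def clean_toc(lines: List[str]) -> List[Tuple[str, int]]:
    """Turn raw TOC lines into (title, page) pairs.

    A line with no trailing page number is treated as the first part of a
    wrapped title and joined to the following line.
    """
    entries: List[Tuple[str, int]] = []
    pending = ""
    for raw in lines:
        # Drop surrounding spaces and the trailing comma left by the CSV export.
        line = raw.strip().rstrip(",").strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            # No page number yet: keep this fragment and wait for the rest.
            pending = f"{pending} {line}".strip()
            continue
        title = f"{pending} {match.group('title')}".strip()
        # Remove spaces OCR inserted between digits, e.g. "1 0" -> 10.
        page = int(match.group("page").replace(" ", ""))
        entries.append((title, page))
        pending = ""
    return entries
```

Fed the Tabula lines from `Chapter One: Types of,` onward, this gives `("Chapter One: Types of number", 2)`, `("Squares and square roots", 2)`, and so on down to `("Chapter Two: Using numbers", 10)`. A heading with no page number, such as `Nuntber`, would be glued onto the next entry, so lines like that would still need removing by hand or an extra rule. The spaced-digit rule can also misfire on a title that ends in a number with no leader dots before the page.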
Tabula is better than OCR.space in that the text comes out in the correct order, but it still takes a lot of manual editing in Sublime Text to get the TOC text file into the layout required to auto-create PDF bookmarks (i.e. with pdftk or jpdfbookmarks).
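For the pdftk route, cleaned entries can be written in the bookmark record format that `pdftk dump_data` produces and `update_info` reads back (`BookmarkBegin`, `BookmarkTitle`, `BookmarkLevel`, `BookmarkPageNumber`). A minimal sketch, assuming entries are `(title, level, printed_page)` tuples; the levels (for instance 1 for "Chapter" lines and 2 for the rest) and the hypothetical `page_offset` between printed and physical page numbers have to be decided for each book.

```python
from typing import Iterable, Tuple

def to_pdftk_bookmarks(entries: Iterable[Tuple[str, int, int]], page_offset: int = 0) -> str:
    """Render (title, level, printed_page) entries as pdftk bookmark records.

    page_offset is added to each printed page number to get the physical PDF
    page, e.g. when page 1 of the book is page 9 of the scan (offset 8).
    """
    records = []
    for title, level, page in entries:
        records.append(
            "BookmarkBegin\n"
            f"BookmarkTitle: {title}\n"
            f"BookmarkLevel: {level}\n"
            f"BookmarkPageNumber: {page + page_offset}"
        )
    # One record after another, with a final newline at the end of the file.
    return "\n".join(records) + "\n"
```

Save the returned text to a file (for example `bookmarks.txt`) and apply it with `pdftk scan.pdf update_info bookmarks.txt output scan-bookmarked.pdf`; the file names are placeholders.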
Tabula currently offers no way to ask questions or get help: on GitHub, the Issues tab is not showing.
stillhope changed the title from "For 2 column style TOC that is scan/ocr -- suggestion for best tool?" to "For 2 column style TOC that is scan/ocr -- suggestion for best tool to extract TOC text?" on Jan 10, 2023.