Merge pull request #44 from kotaro-kinoshita/docs/english-document
add english document
kotaro-kinoshita authored Nov 26, 2024
2 parents 641cba7 + c2ad6fe commit cdedbaf
Showing 11 changed files with 574 additions and 281 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -1,3 +1,5 @@
Japanese version | [English](README_EN.md)

<img src="static/logo/horizontal.png" width="800px">

![Python](https://img.shields.io/badge/Python-3.9|3.10|3.11|3.12-F9DC3E.svg?logo=python&logoColor=&style=flat)
@@ -38,7 +40,7 @@ For the results exported in Markdown, refer to the repository's [s

## 📣 Release Notes

- Released YomiToku vX.X.X on December XX, 2024.
- Released the YomiToku v0.5.0 beta on November 26, 2024.

## 💡 Installation

Expand Down Expand Up @@ -86,4 +88,4 @@ yomitoku --help
Non-commercial personal use and research use are free of charge.
For commercial use, a separate commercial license is available; please contact the developers.

YomiToku © 2024 by MLism Inc. is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
YomiToku © 2024 by Kotaro Kinoshita is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
93 changes: 93 additions & 0 deletions README_EN.md
@@ -0,0 +1,93 @@
[日本語版](README.md) | English

<img src="static/logo/horizontal.png" width="800px">

![Python](https://img.shields.io/badge/Python-3.9|3.10|3.11|3.12-F9DC3E.svg?logo=python&logoColor=&style=flat)
![Pytorch](https://img.shields.io/badge/Pytorch-2.5-EE4C2C.svg?logo=Pytorch&style=flat)
![CUDA](https://img.shields.io/badge/CUDA->=11.8-76B900.svg?logo=NVIDIA&style=flat)
![OS](https://img.shields.io/badge/OS-Linux|Mac|Win-1793D1.svg?&style=flat)
[![Document](https://img.shields.io/badge/docs-live-brightgreen)](https://kotaro-kinoshita.github.io/yomitoku-dev/)

## 🌟 Introduction

YomiToku is a Document AI engine specialized in Japanese document image analysis. It provides full OCR (optical character recognition) and layout analysis capabilities, enabling the recognition, extraction, and conversion of text and diagrams from images.

- 🤖 Equipped with four AI models trained on Japanese datasets: text detection, text recognition, layout analysis, and table structure recognition. All models are independently trained and optimized for Japanese documents, delivering high-precision inference.
- 🇯🇵 Each model is specifically trained for Japanese document images, supporting the recognition of over 7,000 Japanese characters, including vertical text and other layout structures unique to Japanese documents. (It also supports English documents.)
- 📈 By leveraging layout analysis, table structure parsing, and reading order estimation, it extracts information while preserving the semantic structure of the document layout.
- 📄 Supports a variety of output formats, including HTML, Markdown, JSON, and CSV. It also allows for the extraction of diagrams and images contained within the documents.
- ⚡ Operates efficiently in GPU environments, enabling fast document transcription and analysis. It requires less than 8GB of VRAM, eliminating the need for high-end GPUs.

## 🖼️ Demo

Verification results for various types of images are also included in [gallery.md](gallery.md).

| Input | Results of OCR |
| :--------------------------------------------------------: | :-----------------------------------------------------: |
| <img src="static/in/demo.jpg" width="400px"> | <img src="static/out/in_demo_p1_ocr.jpg" width="400px"> |
| Results of Layout Analysis | Results of HTML Export |
| <img src="static/out/in_demo_p1_layout.jpg" width="400px"> | <img src="static/out/demo_html.png" width="400px"> |


For the results exported in Markdown, please refer to [static/out/in_demo_p1.md](static/out/in_demo_p1.md) in the repository.

- `Red Frame`: Positions of figures and images
- `Green Frame`: Overall table region
- `Pink Frame`: Table cell structure (text within the cells represents [row number, column number] (rowspan x colspan))
- `Blue Frame`: Paragraph and text group regions
- `Red Arrow`: Results of reading order estimation

Image source: created by processing content from the “2024 (Reiwa 6) White Paper on Information and Communications, Chapter 3, Section 2: Technologies Advancing with AI Evolution” (https://www.soumu.go.jp/johotsusintokei/whitepaper/ja/r06/pdf/n1410000.pdf), Ministry of Internal Affairs and Communications.

## 📣 Release

- Released YomiToku vX.X.X on December XX, 2024.

## 💡 Installation

```bash
pip install yomitoku
```


- Please install the version of PyTorch that matches your CUDA version. By default, a version compatible with CUDA 12.4 or higher will be installed.
- PyTorch versions 2.5 and above are supported. As a result, CUDA version 11.8 or higher is required. If this is not feasible, please use the Dockerfile provided in the repository.
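The CUDA-to-wheel-index mapping described above follows a simple naming pattern. The sketch below is an illustrative helper (not part of the yomitoku package) that builds the PyTorch index URL for a given CUDA version; only the cu124 index is confirmed by this repository's own pyproject.toml, so treat other tags as assumptions:

```python
def torch_index_url(cuda_version: str) -> str:
    # Build the PyTorch wheel index URL for a CUDA version, e.g. "12.4" -> cu124.
    # Illustrative helper; the cu124 index appears in this repo's pyproject.toml.
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

# Install a matching PyTorch first, then yomitoku, e.g.:
#   pip install torch --index-url https://download.pytorch.org/whl/cu124
#   pip install yomitoku
print(torch_index_url("12.4"))
```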

## 🚀 Usage

```bash
yomitoku ${path_data} -f md -o results -v --figure
```

- `${path_data}`: Specify the path to a directory containing images to be analyzed or directly provide the path to an image file. If a directory is specified, images in its subdirectories will also be processed.
- `-f`, `--format`: Specify the output file format. Supported formats are json, csv, html, and md.
- `-o`, `--outdir`: Specify the name of the output directory. If it does not exist, it will be created.
- `-v`, `--vis`: If specified, outputs visualized images of the analysis results.
- `-d`, `--device`: Specify the device for running the model. If a GPU is unavailable, inference will be executed on the CPU. (Default: cuda)
- `--ignore_line_break`: Ignores line breaks in the image and concatenates sentences within a paragraph. (Default: respects line breaks as they appear in the image.)
- `--figure_letter`: Exports characters contained within detected figures and tables to the output file.
- `--figure`: Exports detected figures and images to the output file (supported only for html and markdown).
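To drive the CLI from Python scripts, the flags above can be assembled programmatically. This is a minimal sketch using only the options documented here; `build_yomitoku_cmd` is a hypothetical helper for illustration, not part of the yomitoku package:

```python
import shlex

def build_yomitoku_cmd(path_data, fmt="md", outdir="results", device="cuda",
                       vis=False, figure=False, ignore_line_break=False):
    # Assemble a yomitoku invocation from the documented flags
    # (hypothetical helper; execute the result with subprocess.run(cmd)).
    cmd = ["yomitoku", path_data, "-f", fmt, "-o", outdir, "-d", device]
    if vis:
        cmd.append("-v")
    if figure:
        cmd.append("--figure")
    if ignore_line_break:
        cmd.append("--ignore_line_break")
    return cmd

# Mirrors the example invocation above (with the default device made explicit):
print(shlex.join(build_yomitoku_cmd("input/demo.jpg", vis=True, figure=True)))
```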


For other options, please refer to the help documentation.
```bash
yomitoku --help
```

**NOTE**
- It is recommended to run on a GPU. The system is not optimized for inference on CPUs, which may result in significantly longer processing times.
- Only printed text recognition is supported. While it may occasionally read handwritten text, official support is not provided.
- YomiToku is optimized for document OCR and is not designed for scene OCR (e.g., text printed on non-paper surfaces like signs).
- The resolution of input images is critical for improving the accuracy of AI-OCR recognition. Low-resolution images may lead to reduced recognition accuracy. It is recommended to use images with a minimum short side resolution of 720px for inference.
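The 720 px short-side recommendation can be checked programmatically before inference. A small sketch (an illustrative helper, not part of yomitoku):

```python
def upscale_factor(width: int, height: int, min_short_side: int = 720) -> float:
    # Factor by which to upscale an image so its short side reaches the
    # recommended 720 px; 1.0 means the image already meets the recommendation.
    short = min(width, height)
    return max(1.0, min_short_side / short)

print(upscale_factor(640, 480))    # short side 480 px -> upscale by 1.5x
print(upscale_factor(1920, 1080))  # short side already >= 720 px -> 1.0
```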

## 📝 Documents

For more details, please refer to the [documentation](https://kotaro-kinoshita.github.io/yomitoku-dev/).

## LICENSE

The source code stored in this repository and the model weight files related to this project on Hugging Face Hub are licensed under CC BY-NC-SA 4.0.
You are free to use them for non-commercial personal use or research purposes.
For commercial use, a separate commercial license is available. Please contact the developers for more information.

YomiToku © 2024 by Kotaro Kinoshita is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
21 changes: 21 additions & 0 deletions docs/index.en.md
@@ -0,0 +1,21 @@
## 🌟 Introduction

YomiToku is a Document AI engine specialized in Japanese document image analysis. It provides full OCR (optical character recognition) and layout analysis capabilities, enabling the recognition, extraction, and conversion of text and diagrams from images.

- 🤖 Equipped with four AI models trained on Japanese datasets: text detection, text recognition, layout analysis, and table structure recognition. All models are independently trained and optimized for Japanese documents, delivering high-precision inference.
- 🇯🇵 Each model is specifically trained for Japanese document images, supporting the recognition of over 7,000 Japanese characters, including vertical text and other layout structures unique to Japanese documents. (It also supports English documents.)
- 📈 By leveraging layout analysis, table structure parsing, and reading order estimation, it extracts information while preserving the semantic structure of the document layout.
- 📄 Supports a variety of output formats, including HTML, Markdown, JSON, and CSV. It also allows for the extraction of diagrams and images contained within the documents.
- ⚡ Operates efficiently in GPU environments, enabling fast document transcription and analysis. It requires less than 8GB of VRAM, eliminating the need for high-end GPUs.

## 🙋 FAQ

### Q. Is it possible to use YomiToku in an environment without internet access?
A. Yes, it is possible.
YomiToku connects to the Hugging Face Hub to download model files automatically on the first run, so internet access is required at that time. However, you can download the files manually in advance, allowing YomiToku to operate in an offline environment. For details, see “Using YomiToku in an Offline Environment” under Usage.

### Q. Is commercial use allowed?
A. This package is licensed under CC BY-NC-SA 4.0. It is available free of charge for personal and research purposes. For commercial use, a paid commercial license is required. Please contact the developers for further details.

### Q. Can handwritten text be recognized?
A. Only printed text recognition is supported. While handwritten text may occasionally be recognized, it is not officially supported.
14 changes: 0 additions & 14 deletions docs/index.md → docs/index.ja.md
@@ -8,20 +8,6 @@ YomiToku is an AI document image analysis engine specialized in Japanese (Document
- 📄 Supports a variety of output formats: results can be converted to HTML, Markdown, JSON, or CSV. Figures and images contained in documents can also be extracted.
- ⚡ Runs fast in GPU environments, enabling efficient document transcription and analysis. It operates within 8GB of VRAM, so no high-end GPU is required.

## 🚀 Key Features

- General-purpose AI-OCR for Japanese document images
- Automatic detection of figures, tables, and paragraphs by a layout analysis AI
- Recognition of table row and column structure by a table structure analysis AI

## 🔥 Roadmap

YomiToku is still under active development, with the following feature extensions planned:

- Handwritten text recognition
- Detection and recognition of mathematical formulas, with conversion to LaTeX format
- Finer-grained element classification (titles, headings, figure and table captions) through extended layout analysis

## 🙋 FAQ

### Q. Can YomiToku run in an environment without internet access?
47 changes: 47 additions & 0 deletions docs/installation.en.md
@@ -0,0 +1,47 @@
# Installation


This package requires Python 3.9 or later and PyTorch 2.5 or later for execution. PyTorch must be installed according to your CUDA version. A GPU with more than 8GB of VRAM is recommended. While it can run on a CPU, please note that the processing is not currently optimized for CPUs, which may result in longer execution times.

## From PyPI

```bash
pip install yomitoku
```

## Using uv

This repository uses the package management tool [uv](https://docs.astral.sh/uv/). After installing uv, clone the repository and run the following command:

```bash
uv sync
```

When using uv, you need to modify the following part of the pyproject.toml file to match your CUDA version. By default, PyTorch compatible with CUDA 12.4 will be downloaded.

```toml
[[tool.uv.index]]
name = "pytorch-cuda124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true
```
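For example, on a machine limited to CUDA 11.8 (the minimum the badges advertise), the entry might be swapped as follows — a sketch assuming the cu118 index follows the same naming pattern as the cu124 one shown above:

```toml
[[tool.uv.index]]
name = "pytorch-cuda118"   # hypothetical name, mirroring the cu124 entry above
url = "https://download.pytorch.org/whl/cu118"
explicit = true
```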


## Using Docker

A Dockerfile is provided in the root of the repository, which you are welcome to use.

```bash
docker build -t yomitoku .
```

=== "GPU"

```bash
docker run -it --gpus all -v $(pwd):/workspace --name yomitoku yomitoku /bin/bash
```

=== "CPU"

```bash
docker run -it -v $(pwd):/workspace --name yomitoku yomitoku /bin/bash
```
13 changes: 11 additions & 2 deletions docs/installation.md → docs/installation.ja.md
@@ -8,14 +8,23 @@
pip install yomitoku
```

## Installation with UV
## Installation with uv

This repository uses the package management tool [UV](https://docs.astral.sh/uv/). After installing UV, clone the repository and run the following command:
This repository uses the package management tool [uv](https://docs.astral.sh/uv/). After installing uv, clone the repository and run the following command:

```bash
uv sync
```

When using uv, you need to modify the following part of `pyproject.toml` to match your CUDA version. By default, a PyTorch build compatible with CUDA 12.4 is downloaded.

```toml
[[tool.uv.index]]
name = "pytorch-cuda124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true
```

## Running in Docker

A Dockerfile is provided at the root of the repository; feel free to use it.
