CHANGELOG.md

v2.8.3 - 2024-12-03

Fix

  • Improve handling of disallowed formats (#429) (34c7c79)

v2.8.2 - 2024-12-03

Fix

  • ParserError EOF inside string (#470) (#472) (c90c41c)
  • PermissionError when using tesseract_ocr_cli_model (#496) (d3f84b2)

Documentation

Performance

  • Prevent temp file leftovers, reuse core type (#487) (051789d)

v2.8.1 - 2024-11-29

Fix

Documentation

v2.8.0 - 2024-11-27

Feature

  • ocr: Added support for RapidOCR engine (#415) (85b2999)

Fix

  • Use correct image index in word backend (#442) (767563b)
  • Update tests and examples for docling-core 2.5.1 (#449) (29807a2)

v2.7.1 - 2024-11-26

Fix

Documentation

  • Add DocETL, Kotaemon, spaCy integrations; minor docs improvements (#408) (7a45b92)

v2.7.0 - 2024-11-20

Feature

  • Add support for ocrmac OCR engine on macOS (#276) (6efa96c)

Fix

v2.6.0 - 2024-11-19

Feature

  • Added support for exporting DocItem to an image when page image is available (#379) (3f91e7d)
  • Expose ocr-lang in CLI (#375) (ed785ea)
  • Added Excel backend (#334) (926dfd2)
  • Extracting picture data for raster images found in PPTX (#349) (7a97d71)

Fix

  • Fixing images in the input Word files (#330) (8533039)
  • Reduce logging while keeping an option for more verbose output (#323) (8b437ad)

Documentation

v2.5.2 - 2024-11-13

Fix

v2.5.1 - 2024-11-12

Fix

  • Handling of single-cell tables in DOCX backend (#314) (fb8ba86)

Documentation

v2.5.0 - 2024-11-12

Feature

  • OCR: Introduce the OcrOptions.force_full_page_ocr parameter, which forces full-page OCR scanning (#290) (c6b3763)

Fix

  • Configure env prefix for docling settings (#315) (5d4a10b)
  • Added handling of grouped elements in PPTX backend (#307) (81c8243)
  • Allow MPS usage for EasyOCR (#286) (97f214e)

Documentation

v2.4.2 - 2024-11-08

Fix

  • EasyOcrModel: Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr (#282) (0eb065e)

v2.4.1 - 2024-11-08

Fix

  • tesserocr: Raise Exception if tesserocr has not loaded any languages (#279) (704d792)
  • Dockerfile example copy command (#234) (90836db)

Documentation

v2.4.0 - 2024-11-04

Feature

  • PDF backend, table mode as options and artifacts path (#203) (40ad987)

Documentation

v2.3.1 - 2024-10-30

Fix

  • Simplify torch dependencies and update pinned docling deps (#190) (eb679cc)
  • Allow explicit initialization of the pipeline (#189) (904d24d)

v2.3.0 - 2024-10-30

Feature

  • Add pipeline timings and toggle visualization, establish debug settings (#183) (2a2c65b)

Fix

  • Fix duplicate title and heading; add e2e tests for HTML and DOCX (#186) (f542460)

v2.2.1 - 2024-10-28

Fix

  • Fix header levels for DOCX & HTML (#184) (b9f5c74)
  • Handling of long sequences of unescaped underscore characters in Markdown (#173) (94d0729)
  • HTML backend, fixes for Lists and nested texts (#180) (7d19418)
  • MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) (88c1673)

Documentation

v2.2.0 - 2024-10-23

Feature

  • Update to docling-parse v2 without history (#170) (4116819)
  • Support AsciiDoc and Markdown input format (#168) (3023f18)

Fix

  • Set valid=false for invalid backends (#171) (3496b48)

v2.1.0 - 2024-10-18

Feature

  • Add coverage_threshold to skip OCR for small images (#161) (b346faf)

Fix

Documentation

v2.0.0 - 2024-10-16

Feature

Breaking

Documentation

v1.20.0 - 2024-10-11

Feature

  • New experimental docling-parse v2 backend (#131) (5e4944f)

v1.19.1 - 2024-10-11

Fix

  • Remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) (dae2a3b)

Documentation

  • Simplify LlamaIndex example using Docling extension (#135) (5f1bd9e)

v1.19.0 - 2024-10-08

Feature

  • Add options for choosing OCR engines (#118) (f96ea86)

v1.18.0 - 2024-10-03

Feature

v1.17.0 - 2024-10-03

Feature

v1.16.1 - 2024-09-27

Fix

Documentation

v1.16.0 - 2024-09-27

Feature

  • Support TableFormer model choice (#90) (d6df76f)

v1.15.0 - 2024-09-24

Feature

v1.14.0 - 2024-09-24

Feature

Fix

  • Fix OCR setting for pypdfium, minor refactor (#102) (d96b96c)

Documentation

v1.13.1 - 2024-09-23

Fix

  • Updated the render_as_doctags with the new arguments from docling-core (#93) (4794ce4)

v1.13.0 - 2024-09-18

Feature

Fix

  • Bumped the glm version and adjusted the tests (#83) (442443a)

Documentation

  • Updated Docling logo.png with transparent background (#88) (0da7519)

v1.12.2 - 2024-09-17

Fix

  • tests: Adjust the test data to match the new version of LayoutPredictor (#82) (fa9699f)

v1.12.1 - 2024-09-16

Fix

  • CLI compatibility with Python 3.10 and 3.11 (#79) (2870fdc)

v1.12.0 - 2024-09-13

Feature

Documentation

  • Showcase RAG with LlamaIndex and LangChain (#71) (53569a1)

v1.11.0 - 2024-09-10

Feature

v1.10.0 - 2024-09-10

Feature

  • Linux arm64 support and reducing dependencies (#69) (27a7a15)

v1.9.0 - 2024-09-03

Feature

  • Export document pages as multimodal output (#54) (1de2e4f)

Documentation

v1.8.5 - 2024-08-30

Fix

v1.8.4 - 2024-08-30

Fix

Documentation

  • Add instructions for CPU-only installation (#56) (a8a60d5)

v1.8.3 - 2024-08-28

Fix

  • Table cells overlap and model warnings (#53) (f49ee82)

v1.8.2 - 2024-08-27

Fix

Documentation

v1.8.1 - 2024-08-26

Fix

v1.8.0 - 2024-08-23

Feature

  • Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47) (a294b7e)

v1.7.1 - 2024-08-23

Fix

  • Raise a clearer exception when a page fails to parse (#46) (8808463)
  • Upgrade docling-parse to 1.1.1, safety checks for failed parse on pages (#45) (7e84533)

v1.7.0 - 2024-08-22

Feature

  • Upgrade docling-parse PDF backend and interface to use page-by-page parsing (#44) (a8c6b29)

v1.6.3 - 2024-08-22

Fix

  • Usage of BytesIO with docling-parse (#43) (fac5745)

v1.6.2 - 2024-08-22

Fix

  • Remove [ocr] extra to fix wheel install (#42) (6995268)

v1.6.1 - 2024-08-21

Fix

v1.6.0 - 2024-08-20

Feature

  • Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38) (e94d317)

v1.5.0 - 2024-08-20

Feature

  • Allow computing page images on-demand with scale and cache them (#36) (78347bf)

Documentation

v1.4.0 - 2024-08-14

Feature

  • Update parser with BytesIO interface and set as new default backend (#32) (90dd676)

Fix

v1.3.0 - 2024-08-12

Feature

  • Output page images and extracted bbox (#31) (63d80ed)

v1.2.1 - 2024-08-07

Fix

Documentation

v1.2.0 - 2024-08-07

Feature

v1.1.2 - 2024-07-31

Fix

  • Set page number using 1-based indexing (#22) (d2d9543)

v1.1.1 - 2024-07-30

Fix

  • Correct text extraction for table cells (#21) (f4bf3d2)

v1.1.0 - 2024-07-26

Feature

  • Add simplified single-doc conversion (#20) (d603137)

v1.0.2 - 2024-07-24

Fix

  • Add easyocr to main deps for valid extra (#19) (54b3dda)

v1.0.1 - 2024-07-24

Fix

v1.0.0 - 2024-07-18

Feature

Breaking

v0.4.0 - 2024-07-17

Feature

  • Optimize table extraction quality, add configuration options (#11) (e9526bb)

v0.3.1 - 2024-07-17

Fix

Documentation

  • Reflect supported Python versions, add badges (#10) (2baa35c)

v0.3.0 - 2024-07-17

Feature

  • Enable Python 3.12 support by updating glm (#8) (fb72688)

Documentation

  • Add setup with PyPI to README (#7) (2803222)

v0.2.0 - 2024-07-16

Feature