Home

This wiki contains more detail on various aspects of the public API and the PDF document format.

Features

Extracts the position and size of letters from any PDF document. This enables access to the text and words in a PDF document.
Allows the user to retrieve images from the PDF document.
Allows the user to read PDF annotations, PDF forms, embedded documents and hyperlinks from a PDF.
Provides access to metadata in the document.
Exposes the internal structure of the PDF document.
Creates PDF documents containing text and path operations.
Reads content from encrypted files when the password is provided.
Document Layout Analysis - PdfPig also includes tools for document layout analysis, such as the Recursive XY Cut, Document Spectrum and Nearest Neighbour algorithms, among others. It also supports exporting page contents to the ALTO, PageXML and hOCR formats. See Document Layout Analysis.
Tables are not directly supported, but you can use Tabula Sharp or Camelot Sharp. As of 2023, Tabula Sharp is the most complete port.

For some use cases, PdfPig provides an alternative to commercial libraries such as SpirePDF and to copyleft libraries such as iText 7 (AGPL).

The library does not support converting HTML or other document formats to PDF. For HTML to PDF, wkhtmltopdf is a good-quality solution. The library also does not currently support generating images from PDF pages. If you need this functionality, check whether docnet meets your requirements.

Getting Started

PdfPig aims to provide two main areas of functionality:

Extracting PDF content.
Creating PDFs.

The simplest usage of the library for extracting content involves opening a document and extracting the position and text of all words across all pages:

using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

using (PdfDocument document = PdfDocument.Open(@"C:\path\to\file.pdf"))
{
	foreach (Page page in document.GetPages())
	{
		// Each Word exposes its text and its bounding box on the page.
		IEnumerable<Word> words = page.GetWords();
	}
}

Pages can also be accessed individually with an index starting at 1. You can also access the positions and sizes of the individual letters on a page:

using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

using (PdfDocument document = PdfDocument.Open(@"C:\path\to\file.pdf"))
{
	Page page = document.GetPage(1);
	IReadOnlyList<Letter> letters = page.Letters;
}
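
The two snippets above only fetch the words and letters. As a rough sketch of what can be read from them, the following prints the text and bounding box of each word, then the value, glyph rectangle, point size and font name of each letter. So that it runs without a file on disk, it assumes a stand-in input: a one-page PDF built in memory with the same builder calls as the document creation example on this page. The variable names and the output format are illustrative only, and the top-level statements assume a C# 9 or later project that references the PdfPig package.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Fonts.Standard14Fonts;
using UglyToad.PdfPig.Writer;

// Stand-in input (an assumption for this sketch): a one-page PDF built in memory.
// For a real file, use PdfDocument.Open(@"C:\path\to\file.pdf") instead.
PdfDocumentBuilder builder = new PdfDocumentBuilder();
PdfPageBuilder pageBuilder = builder.AddPage(PageSize.A4);
PdfDocumentBuilder.AddedFont font = builder.AddStandard14Font(Standard14Font.Helvetica);
pageBuilder.AddText("Hello World!", 12, new PdfPoint(25, 520), font);
byte[] sampleBytes = builder.Build();

using (PdfDocument document = PdfDocument.Open(sampleBytes))
{
    // Page numbers start at 1.
    Page page = document.GetPage(1);

    foreach (Word word in page.GetWords())
    {
        // The bounding box is expressed in PDF page units.
        PdfRectangle box = word.BoundingBox;
        Console.WriteLine(
            $"Word '{word.Text}': left={box.Left:F1}, bottom={box.Bottom:F1}, " +
            $"width={box.Width:F1}, height={box.Height:F1}");
    }

    foreach (Letter letter in page.Letters)
    {
        PdfRectangle glyph = letter.GlyphRectangle;
        Console.WriteLine(
            $"Letter '{letter.Value}': left={glyph.Left:F1}, bottom={glyph.Bottom:F1}, " +
            $"size={letter.PointSize:F1}pt, font={letter.FontName}");
    }
}
```

Coordinates are reported in the page's own coordinate system, so the vertical axis typically increases upwards from the bottom of the page rather than downwards from the top.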

To create a document, use the document builder with one of the Standard 14 fonts, which are included in the PDF specification:

using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Fonts.Standard14Fonts;
using UglyToad.PdfPig.Writer;

PdfDocumentBuilder builder = new PdfDocumentBuilder();
PdfPageBuilder page = builder.AddPage(PageSize.A4);
PdfDocumentBuilder.AddedFont font = builder.AddStandard14Font(Standard14Font.Helvetica);
page.AddText("Hello World!", 12, new PdfPoint(25, 520), font);
byte[] b = builder.Build();

The resulting bytes are a valid PDF document and can be saved to the file system, served from a web server, etc.
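
As a minimal sketch of the "saved to the file system" case, the following writes the built bytes to disk and reopens the file to check that it parses. The output location (a file named pdfpig-hello.pdf in the system temp directory) is a hypothetical choice for illustration, and the top-level statements assume a C# 9 or later project that references the PdfPig package.

```csharp
using System;
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Fonts.Standard14Fonts;
using UglyToad.PdfPig.Writer;

// Build the same one-page document as in the example above.
PdfDocumentBuilder builder = new PdfDocumentBuilder();
PdfPageBuilder page = builder.AddPage(PageSize.A4);
PdfDocumentBuilder.AddedFont font = builder.AddStandard14Font(Standard14Font.Helvetica);
page.AddText("Hello World!", 12, new PdfPoint(25, 520), font);
byte[] bytes = builder.Build();

// Hypothetical output location, chosen only for this sketch.
string outputPath = Path.Combine(Path.GetTempPath(), "pdfpig-hello.pdf");
File.WriteAllBytes(outputPath, bytes);

// Reopen the saved file to confirm that it parses and contains the written text.
using (PdfDocument document = PdfDocument.Open(outputPath))
{
    Console.WriteLine($"Pages: {document.NumberOfPages}");
    Console.WriteLine($"Text on page 1: {document.GetPage(1).Text}");
}
```

In a web application the same byte array could be returned as the response body with the content type application/pdf instead of being written to disk.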

You can also use the document builder to visualise what PdfPig has read from a document, for example by outlining the bounding box of every word on a copy of the page:

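using UglyToad.PdfPig;
using UglyToad.PdfPig.Writer;

// `pdf` (the path or bytes of the source PDF) and `pageNumber` (1-based) are supplied by the caller.
using (var document = PdfDocument.Open(pdf))
{
    var builder = new PdfDocumentBuilder();
    // Copy the source page into the new document so the rectangles are drawn on top of it.
    var pageBuilder = builder.AddPage(document, pageNumber);
    pageBuilder.SetStrokeColor(255, 0, 0); // red outlines
    var page = document.GetPage(pageNumber);
    foreach (var word in page.GetWords())
    {
        var box = word.BoundingBox;
        // Depending on the PdfPig version, DrawRectangle takes decimal or double sizes; drop the casts if yours takes double.
        pageBuilder.DrawRectangle(box.BottomLeft, (decimal)box.Width, (decimal)box.Height);
    }

    // ToImage() and Display() are not part of PdfPig; they are presumably helpers from an interactive notebook environment.
    // Outside such an environment, call builder.Build() to get the annotated PDF as a byte array.
    builder.ToImage().Display();
}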

This gist walks through some basic beginner examples: https://gist.github.com/cordasfilip/c6d2510b358323dc2f71c843460cbcdf

Release Notes

Release notes and downloadable packages can be found on the releases page: https://github.com/UglyToad/PdfPig/releases
