
PdfDocumentBuilder creates broken file when copying pages from specific source PDFs #936

Open · cremor opened this issue Nov 12, 2024 · 2 comments
Labels: bug, document-editing (Related to creating or editing/modifying documents)

Comments

@cremor

cremor commented Nov 12, 2024

I have an application that uses PdfPig to merge multiple PDF files, using PdfDocumentBuilder.AddPage to copy the pages.
Users of the application have reported a case where the created (merged) PDF is invalid and contains broken/garbled text. When I open the created PDF in Acrobat Reader, I even get an error message saying that the page contains errors.

Sample input files:

  • Input.pdf
    According to its metadata this file was created with OpenOffice.
  • Input with Comments.pdf
    This is based on the same file as the first, but was edited with Wondershare PDFelement to add some comments as annotations.

Sample code:

using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Writer;

string inputFile = @"C:\Data\Input.pdf";
string outputFile = @"C:\Data\Output.pdf";

using var targetStream = File.Open(outputFile, FileMode.Create, FileAccess.Write);
using var outputDocument = new PdfDocumentBuilder(targetStream);
using var inputDocument = PdfDocument.Open(inputFile);

// Copy every page of the source document into the builder (page numbers are 1-based).
for (int i = 1; i <= inputDocument.NumberOfPages; i++)
{
    outputDocument.AddPage(inputDocument, i);
}

I've tested the following versions of PdfPig, all are affected:

  • 0.1.8
  • 0.1.9
  • 0.1.10-alpha-20241103-132ad

Input:
[image]

Output when shown in Acrobat Reader:
[image]

Output when shown in Microsoft Edge:
[image]

@BobLd BobLd self-assigned this Nov 12, 2024
@BobLd BobLd added bug document-editing Related to creating or editing/modifying documents labels Nov 12, 2024
@cremor (Author)

cremor commented Nov 18, 2024

This might be related to an embedded font. End users sometimes get an error message like "The embedded font "EVPYXN+NotoSerifCJKjp-Regular-Identity-H" could not be loaded..." from Acrobat Reader. However, that error isn't shown every time, and I haven't seen it myself yet.

Also, if either of the two input PDFs is re-saved with Acrobat Reader, the problem no longer occurs.

@BobLd (Collaborator)

BobLd commented Nov 18, 2024

@cremor thanks for the added context. I'll try to have a look soon but any help here would be really appreciated 😄
