No images detected for PDF with a clear image #940

Open
mikethea1 opened this issue Nov 20, 2024 · 2 comments
Comments

@mikethea1

Here is the file of interest:
repro_p1.pdf

Clearly this file has an image, yet PdfPig does not find it:

using var doc = PdfDocument.Open(path);
var page = doc.GetPage(1);
Console.WriteLine(page.NumberOfImages); // 0

I dug into this a bit, and I believe the reason is that the image is not painted directly: a "cs" operator (SetNonStrokeColorSpace) selects the Pattern color space, and the pattern under /Resources/Pattern is what ultimately references the image.
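To make that lookup chain concrete, here is a minimal sketch of walking from a page's resources through a pattern's own resources to an image XObject. It is a toy model only: PDF dictionaries are represented as plain `Dictionary<string, object>` values, and the names `PatternImageFinder`, `FindImages`, `P0`, and `Im0` are hypothetical. None of this is PdfPig's API.

```csharp
using System;
using System.Collections.Generic;

// Toy model, not PdfPig's API: nested dictionaries stand in for PDF objects.
static class PatternImageFinder
{
    // Returns the resource path of every image XObject reachable from
    // "resources", including those nested inside pattern resources.
    public static List<string> FindImages(Dictionary<string, object> resources, string prefix = "")
    {
        var found = new List<string>();

        // Images painted directly with the Do operator live under /XObject.
        if (resources.TryGetValue("XObject", out var x) && x is Dictionary<string, object> xobjects)
        {
            foreach (var entry in xobjects)
            {
                if (entry.Value is Dictionary<string, object> xobject
                    && xobject.TryGetValue("Subtype", out var subtype)
                    && "Image".Equals(subtype))
                {
                    found.Add(prefix + "/XObject/" + entry.Key);
                }
            }
        }

        // A tiling pattern carries its own /Resources, so recurse into it.
        if (resources.TryGetValue("Pattern", out var p) && p is Dictionary<string, object> patterns)
        {
            foreach (var entry in patterns)
            {
                if (entry.Value is Dictionary<string, object> pattern
                    && pattern.TryGetValue("Resources", out var r)
                    && r is Dictionary<string, object> patternResources)
                {
                    found.AddRange(FindImages(patternResources, prefix + "/Pattern/" + entry.Key));
                }
            }
        }

        return found;
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical page: no direct XObjects, one pattern wrapping one image.
        var image = new Dictionary<string, object> { ["Subtype"] = "Image" };
        var pattern = new Dictionary<string, object>
        {
            ["PatternType"] = 1,
            ["Resources"] = new Dictionary<string, object>
            {
                ["XObject"] = new Dictionary<string, object> { ["Im0"] = image },
            },
        };
        var pageResources = new Dictionary<string, object>
        {
            ["Pattern"] = new Dictionary<string, object> { ["P0"] = pattern },
        };

        foreach (var path in PatternImageFinder.FindImages(pageResources))
        {
            Console.WriteLine(path); // prints "/Pattern/P0/XObject/Im0"
        }
    }
}
```

A scan that only looks at the page-level /XObject entry would report zero images for this model, which matches the behavior described above.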

It would be cool if PdfPig could detect images referenced in this way!

@BobLd
Collaborator

BobLd commented Nov 20, 2024

@mikethea1 thx for sharing the sample document, I'll look into that shortly

@BobLd
Collaborator

BobLd commented Nov 25, 2024

@mikethea1 I had a look and you are correct, the image is inside the Pattern color. I think this is a bit of a corner case and it might not be beneficial to the library to include those.

You can still extract them relatively easily though by implementing your own ContentStreamProcessor. I'll try to create a sample to show you how to achieve that. If there's no real performance cost, we might include that in the core library.
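Until such a sample exists, the following is only a rough sketch of the bookkeeping a custom processor would need. It is an assumption-laden stand-in, not PdfPig's ContentStreamProcessor: the class `PatternUsageTracker`, its `Process` method, and the whitespace tokenizer are all hypothetical. It ignores q/Q graphics state save and restore, and it ignores color spaces that are named through /Resources/ColorSpace rather than written as /Pattern directly.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for a content stream processor; not PdfPig's API.
// It tracks the non-stroking color space and records pattern names set via scn.
sealed class PatternUsageTracker
{
    private string nonStrokeColorSpace = "DeviceGray"; // initial color space in PDF

    public List<string> PatternsUsed { get; } = new List<string>();

    // "operands" holds the tokens that preceded the operator "op".
    public void Process(string op, IReadOnlyList<string> operands)
    {
        switch (op)
        {
            case "cs" when operands.Count == 1:
                nonStrokeColorSpace = operands[0].TrimStart('/');
                break;
            case "scn" when nonStrokeColorSpace == "Pattern" && operands.Count > 0:
                // The pattern name is the last operand; uncolored patterns
                // place color components before it.
                var last = operands[operands.Count - 1];
                if (last.StartsWith("/", StringComparison.Ordinal))
                {
                    PatternsUsed.Add(last.TrimStart('/'));
                }
                break;
        }
    }
}

static class Program
{
    static void Main()
    {
        // Simplified content stream: fill a rectangle with pattern P0.
        const string content = "/Pattern cs /P0 scn 0 0 100 100 re f";

        var tracker = new PatternUsageTracker();
        var operands = new List<string>();

        foreach (var token in content.Split(' '))
        {
            bool isOperand = token.StartsWith("/", StringComparison.Ordinal)
                || double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out _);

            if (isOperand)
            {
                operands.Add(token);
            }
            else
            {
                tracker.Process(token, operands);
                operands.Clear();
            }
        }

        Console.WriteLine(string.Join(", ", tracker.PatternsUsed)); // prints "P0"
    }
}
```

Each recorded pattern name could then be resolved against /Resources/Pattern to reach the images nested inside it.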
