No images detected for PDF with a clear image #940

mikethea1 · 2024-11-20T15:04:37Z

Here is the file of interest:
repro_p1.pdf

This file clearly contains an image, yet PdfPig does not find it:

using System;
using UglyToad.PdfPig;
var path = "repro_p1.pdf"; // the attached sample file
using var doc = PdfDocument.Open(path);
var page = doc.GetPage(1);
Console.WriteLine(page.NumberOfImages); // prints 0

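I dug into this a bit, and I believe the reason is that the image is not drawn directly but through the fill color: a "cs" operator (SetNonStrokeColorSpace) switches to the Pattern color space, the fill color then points to an entry in /Resources/Pattern, and that pattern ultimately references the image.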
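
To make that chain concrete, here is a minimal, self-contained C# sketch that walks a simplified in-memory model of the page resources and collects the images reachable through /Resources/Pattern. The model is an assumption for illustration only: each PDF dictionary is a plain `Dictionary<string, object>` keyed by names such as "/Pattern", with indirect objects already resolved. It is not PdfPig's object model, and `PatternImageFinder` / `FindPatternImages` are hypothetical names. For every pattern it looks in the pattern's own /Resources/XObject for entries with /Subtype /Image, and recurses into form XObjects because a pattern may draw the image through an intermediate form.

```csharp
using System.Collections.Generic;

// HYPOTHETICAL simplified model (not PdfPig's API): a PDF dictionary is a
// Dictionary<string, object> keyed by names like "/Pattern"; nested dictionaries
// stand in for already-resolved indirect objects.
public static class PatternImageFinder
{
    // Returns paths like "P0/Im1" for every image XObject reachable via /Resources/Pattern.
    public static List<string> FindPatternImages(Dictionary<string, object> pageResources)
    {
        var found = new List<string>();
        if (pageResources.TryGetValue("/Pattern", out var p) && p is Dictionary<string, object> patterns)
        {
            foreach (var (patternName, patternObj) in patterns)
            {
                // A tiling pattern carries its own /Resources dictionary for its content stream.
                if (patternObj is Dictionary<string, object> pattern
                    && pattern.TryGetValue("/Resources", out var r)
                    && r is Dictionary<string, object> patternResources)
                {
                    CollectImages(patternResources, patternName.TrimStart('/'), found);
                }
            }
        }
        return found;
    }

    // Looks in /XObject for images and recurses into form XObjects that have their own /Resources.
    private static void CollectImages(Dictionary<string, object> resources, string prefix, List<string> found)
    {
        if (!resources.TryGetValue("/XObject", out var x) || x is not Dictionary<string, object> xobjects)
            return;

        foreach (var (name, obj) in xobjects)
        {
            if (obj is not Dictionary<string, object> xobject || !xobject.TryGetValue("/Subtype", out var subtype))
                continue;

            var path = prefix + "/" + name.TrimStart('/');
            if (subtype is "/Image")
            {
                found.Add(path);
            }
            else if (subtype is "/Form"
                     && xobject.TryGetValue("/Resources", out var fr)
                     && fr is Dictionary<string, object> formResources)
            {
                CollectImages(formResources, path, found);
            }
        }
    }
}
```

The recursion into forms is the main design choice here: stopping at the pattern's first /XObject level would miss images nested one form deeper.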

It would be cool if PdfPig could detect images referenced in this way!


BobLd · 2024-11-20T21:00:00Z

@mikethea1 thx for sharing the sample document, I'll look into that shortly

BobLd · 2024-11-25T18:27:08Z

@mikethea1 I had a look and you are correct, the image is inside the Pattern color. I think this is a bit of a corner case and it might not be beneficial to the library to include those.

You can still extract them relatively easily, though, by implementing your own ContentStreamProcessor. I'll try to create a sample to show you how to achieve that. If there's no real performance cost, we might include that in the core library.
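
The state such a processor would need to track can be sketched without relying on PdfPig's actual ContentStreamProcessor members, which are not reproduced here. The minimal C# sketch below is an assumption-laden illustration: it works on already-tokenized operations (a hypothetical `Op` type holding an operator and its operands), remembers the current non-stroking color space and pattern across `q`/`Q`, and records a pattern name whenever a fill operator paints with it. It deliberately does not resolve named color spaces from /Resources/ColorSpace (for example uncolored patterns declared as [/Pattern /DeviceRGB]) and ignores the stroking operators CS/SCN.

```csharp
using System.Collections.Generic;

// HYPOTHETICAL tokenized operation: operator name plus operands (names kept as "/P0"-style strings).
public sealed class Op
{
    public string Operator { get; }
    public object[] Operands { get; }
    public Op(string op, params object[] operands) { Operator = op; Operands = operands; }
}

public static class PatternFillTracker
{
    // Path-painting operators that use the non-stroking (fill) color.
    private static readonly HashSet<string> FillOperators =
        new HashSet<string> { "f", "F", "f*", "B", "B*", "b", "b*" };

    // Returns the names of patterns that are actually used to fill a path.
    public static List<string> PatternsUsedForFill(IEnumerable<Op> ops)
    {
        var used = new List<string>();
        string colorSpace = "/DeviceGray"; // initial non-stroking color space
        string currentPattern = null;
        var saved = new Stack<(string, string)>();

        foreach (var op in ops)
        {
            switch (op.Operator)
            {
                case "q": saved.Push((colorSpace, currentPattern)); break;
                case "Q":
                    if (saved.Count > 0) (colorSpace, currentPattern) = saved.Pop();
                    break;
                case "cs": // SetNonStrokeColorSpace, e.g. "/Pattern cs" (named spaces are NOT resolved here)
                    colorSpace = (string)op.Operands[0];
                    currentPattern = null;
                    break;
                case "g": colorSpace = "/DeviceGray"; currentPattern = null; break;
                case "rg": colorSpace = "/DeviceRGB"; currentPattern = null; break;
                case "k": colorSpace = "/DeviceCMYK"; currentPattern = null; break;
                case "scn": // in the Pattern color space, the last operand names an entry in /Resources/Pattern
                    if (colorSpace == "/Pattern" && op.Operands.Length > 0
                        && op.Operands[op.Operands.Length - 1] is string name)
                        currentPattern = name;
                    break;
                default:
                    if (FillOperators.Contains(op.Operator) && colorSpace == "/Pattern"
                        && currentPattern != null && !used.Contains(currentPattern))
                        used.Add(currentPattern);
                    break;
            }
        }
        return used;
    }
}
```

Reporting only patterns that are actually used by a fill avoids counting resources that are declared but never painted.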
