Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 62 #63

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Fix issue 62 #63

wants to merge 1 commit into from

Conversation

hderms
Copy link

@hderms hderms commented Nov 8, 2012

This fixes an issue I opened on the main repo found at #62

The feature proposed adds a new option to extracting images via Docsplit.extract_images. If you pass an additional argument in the form :and_return => :images, you receive an array of the paths of extracted images as a return value, instead of the path to the intermediate PDF, in the case of powerpoints.
Example:

Docsplit.extract_images('/tmp/some_ppt.ppt', :size => '1000x', :format => [:png, :jpg])

With a return value of:

["/tmp/some_ppt.png", "/tmp/some_ppt.jpg"]

The default return-value behavior of this method is preserved when that option is not specified, which is to return an array of PDFs that are returned by ensure_pdfs.

The justification for this feature is that I feel like users would benefit from getting the paths to the images created immediately upon calling the extract_images method. Having them determine these paths themselves seems out of spirit with the nature of this gem.

previous = size if @rolling
end
end
case @return_value
when :images

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this return type should be called "image_paths" instead of "images"?

@hderms
Copy link
Author

hderms commented Feb 22, 2013

Fixed inaccurate text in body of pull request. Just to be clear I modified the return value behavior for every case I could find that was appropriate, causing functions to return an array of file paths to the extracted data rather than the intermediate PDF, as it was previously.

@KurtPreston
Copy link

Any idea when this might get merged in? This is a great feature.

@@ -29,18 +29,23 @@ def initialize
def extract(pdfs, opts)
extract_options opts
FileUtils.mkdir_p @output unless File.exists?(@output)
pdfs = pdfs.is_a?(Array) ? pdfs : [pdfs]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do :

pdfs = Array(pdfs)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input. I was misinformed about the nature of Array()

@antoinelyset
Copy link

👍

@dmayer
Copy link

dmayer commented Aug 16, 2013

Is there anything one can do to help getting this merged? It is rather messy to write code around extract_images to determine the generated file names based on "reverse engineering" the naming scheme and doing string manupulations. Thanks!

@bridgway
Copy link

+1 @dmayer's commet. I also think it would useful to have a similar feature for extract_pages as well so one can easily determine the generate file names when a PDF is split into individual pages. Cheers

@sandstrom
Copy link

ping @knowtheory @jashkenas

@antoinelyset
Copy link

If you're interested I did a ruby gem for this :

https://github.com/antoinelyset/poleica

@sandstrom
Copy link

I agree with @bridgway, would be useful on extract_pages too.

@knowtheory, what are your thoughts on this?

@steverob
Copy link

steverob commented Apr 5, 2016

Would love to see this merged.

Cleaning up code

Flatten the return value

Make it not add the path to the return value if an exception-worthy
event occurred. Instead, merely raise that exception

Make text_extractor also return paths to processed files

Make function extract_images always return array of image paths

Refine specs

Fix tests

Add nil check

Refactor tests to better isolate functionality

remove debugger

remove logger

Add printf debugging

Sanity checking

Printfs

Remove puts

Remove annoying line

Cleanup

Fix unnecessary usage of ternary operation to 'wrap' an Array and
replaced with Array() as it is more idiomatic

revert to original
@hderms hderms force-pushed the fix_issue_62 branch 2 times, most recently from aea6533 to 9789dd5 Compare April 5, 2016 19:08
@hderms
Copy link
Author

hderms commented Apr 5, 2016

@steverob I just squashed the old commits and will rebase against master in preparation for reconsideration by the maintainer

@steverob
Copy link

steverob commented Apr 5, 2016

Thank you! :)

Regards
Steve Robinson

On 06-Apr-2016, at 12:38 AM, Dermot Haughey [email protected] wrote:

@steverob I just squashed the old commits and will rebase against master in preparation for reconsideration by the maintainer


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants