Docsplit::TextExtractor#extract_text should return the path of the output text file? #139

nruth · 2016-01-25T23:26:36Z

related to #42

After extracting the text from a PDF or Doc file I need to do something with it. I understand not loading the string into Ruby (it could be huge), but it would be helpful to get the output file path as a return value. Otherwise we have to use a different output dir per document, or try to reconstruct the path from other information, which feels wrong.

Currently Docsplit::TextExtractor#extract_text returns the source file paths. For transparent doc(x) file conversion it returns the intermediate tempfile PDF.
E.g. when I map over an array with a PDF and a doc in my project's tmp dir, I get back:

[
"/var/folders/_j/q3pr8b3s1vj85mhqvyb06gr40000gn/T/docsplit/sample.docx20160125-29577-go3upi.pdf",
"/Users/nruth/dev/monitor/tmp/AISB08.pdf20160125-29577-1svhpfo.pdf"
]

Instead I'd like to be given the path of the output text files, so I can open them.

Would this be a good PR, or is there a deliberate reason to return these other file paths that could be documented?

harssh · 2016-03-18T15:20:54Z

👍 Are we going ahead with this, or is it already implemented?

nruth · 2016-03-20T16:31:15Z

I didn't make a PR. I worked around the problem by putting the document into its own temporary subdirectory then using ls. I do think it's something that can be fixed, as it's just a forgot-to-think-about-the-return-value problem. But the PR backlog is growing.
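
For anyone wanting the same workaround, here is a minimal sketch of the idea, not code from Docsplit itself. The helper name `extract_text_paths` is hypothetical, and it assumes the extraction step writes `.txt` files into whatever directory it is handed (for example via Docsplit's `output:` option). It uses a directory listing rather than shelling out to ls.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: run an extraction step in a fresh, empty temporary
# directory and return the paths of the .txt files that appeared in it.
#
# The block receives the output directory and is expected to write its
# text files there. The caller is responsible for deleting the directory.
def extract_text_paths(parent_dir = Dir.tmpdir)
  out_dir = Dir.mktmpdir("docsplit-text-", parent_dir)
  yield out_dir
  # The directory started out empty, so every .txt file in it is output.
  Dir.glob("*.txt", base: out_dir).sort.map { |name| File.join(out_dir, name) }
end

# Assumed usage with Docsplit (requires the docsplit gem):
#
#   paths = extract_text_paths do |dir|
#     Docsplit.extract_text("tmp/AISB08.pdf", output: dir)
#   end
#   paths.each { |path| puts File.read(path) }
#   FileUtils.rm_rf(File.dirname(paths.first)) unless paths.empty?
```

Using one fresh directory per document means there is no need to guess the output file name, at the cost of having to clean the directory up afterwards.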
