I didn't make a PR. I worked around the problem by putting the document into its own temporary subdirectory then using ls. I do think it's something that can be fixed, as it's just a forgot-to-think-about-the-return-value problem. But the PR backlog is growing.
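For anyone wanting the same workaround, here is a minimal sketch of the "own temporary subdirectory, then list it" approach. The helper name `extract_into_own_dir` is hypothetical and not part of Docsplit; the extraction itself is passed in as a block, so the helper only assumes that the extractor writes `.txt` files into the directory it is given.

```ruby
require "tmpdir"

# Hypothetical helper (not part of Docsplit): gives each source document
# a fresh, empty output directory, runs the caller's extraction block,
# and returns whatever .txt files appeared in that directory.
def extract_into_own_dir(source_path)
  out_dir = Dir.mktmpdir("extracted-text")

  # The block receives the source path and the output directory and is
  # expected to write its text files into that directory.
  yield source_path, out_dir

  # The directory started out empty, so every .txt file in it must have
  # come from this one extraction.
  Dir.glob(File.join(out_dir, "*.txt")).sort
end
```

With Docsplit this would be called roughly as `extract_into_own_dir("tmp/report.pdf") { |src, dir| Docsplit.extract_text(src, :output => dir) }`. The caller is responsible for removing the temporary directory once the text files have been read.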
related to #42
After extracting the text from a PDF or Doc file I need to do something with it. I understand not loading the string into Ruby (it could be huge), but it'd be helpful to get the output file path as a return value. Otherwise we have to use separate output dirs or try to reconstruct the path from other information, which feels wrong.
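To illustrate the reconstruction mentioned above, this is roughly what callers end up writing. The helper name `guessed_text_path` is hypothetical, and the `<basename>.txt` naming is an assumption about how the extractor names whole-document output; per-page extraction would need a different pattern.

```ruby
# Hypothetical helper: guesses where the extracted text ended up.
# ASSUMPTION: the output file is named "<source basename>.txt" and sits
# directly inside output_dir. Nothing in the return value confirms this.
def guessed_text_path(source_path, output_dir)
  basename = File.basename(source_path, ".*") # strips directory and extension
  File.join(output_dir, "#{basename}.txt")
end
```

The guess breaks silently if the naming scheme ever changes, which is why a returned path would be preferable.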
Currently Docsplit::TextExtractor#extract_text returns the source file paths. For transparent doc(x) file conversion it returns the intermediary tempfile pdf. E.g. when I map over an array with a pdf and a doc in my project's tmp dir, I get back the pdf's own path and the path of the doc's intermediary tempfile pdf.
Instead I'd like to be given the path of the output text files, so I can open them.
Would this be a good PR, or is there a deliberate reason to return these other file paths that could be documented?