Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for Excelx (xlsx) Exporter #64

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

athityakumar
Copy link
Member

@athityakumar athityakumar commented Oct 25, 2017

  • Fixes a silly yet major bug with linkage between daru & daru-io calls (for ALL write exporter methods, which yet again draws attention to handle Add tests for linkages between daru & daru-io calls #55 ASAP)
  • Adds support to Excelx Exporter (formatting support not added yet) with documentation and tests. Current arguments include sheet and boolean values for index, header and data (show them or not). And of course, filename for write method. Fix Support for Excelx Exporter #63.
  • All the write methods return true incase of no mishaps when the particular exporter is called.

# `Daru::DataFrame` instance variables
class Excelx < Base
Daru::DataFrame.register_io_module :to_excelx_string, self
Daru::DataFrame.register_io_module :write_excelx, self
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from standalone xlsx exporter, doesn't it make sense for the excel exporter to redirect here incase of .xlsx filename in excel exporter's write method? Though, for such a redirect to be possible, they would need to have similar kind of arguments. That is, formatting has to be supported by xlsx exporter like excel exporter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure. Yep, it looks humane, but if/when you'll add formatting support for Excel exporters -- conventions on formatting are pretty different in two versions of Excel files and corresponding libraries.
In fact, I believe that in 2017 support for Excel 1999-2003 format (xls) should be done in "maintenance" mode, while xlsx exporter should be shiny and friendly.

string = df.to_excelx_string(index: false)
df.write_excel('path/to/file.xlsx', index: false)
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am starting to think this part of README becames too long and repetitive. Just as an idea for the future:

  1. Leave 2-3 most "tasty" examples in REDME + link to...
  2. Formats.md with regular structure...
  3. Which is built automatically, gathering the comments from top of all exporters/importers files, with similar format.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For generating files like {FORMAT}_Importer.md, does an ERB template work? Or is there a better option?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's think. In my head, it (probably) does the following:

  • takes all the importers (exporters) code;
  • extract, say, top-of-the-file comment with explanations;
  • join them together with pretty headers;
  • store to Formats.md (one, not document-per-formatter, it is tiresome to browse).
    • probably generates TOC on the top of the file;
    • probably adds some "preface" at the top of the file (taken from some _preface.md).

Can be implemented with just Ruby code, or small ERB template, which is the simplest.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zverok - Acknowledged. Let us take this discussion to #71 please? 😄

@@ -80,6 +80,7 @@ def write(path)
contents.each { |content| csv << content }
csv.close
end
true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that this could act as a consistent means of checking whether the "writing into a file" actually took place. Else, the File.write(...) just returns the number of characters written - which may not mean much to the user/developer compared to a consistent (constant) true.

It's not really necessary, but I thought this would be better. Let me know if this can be reverted.

# `Daru::DataFrame` instance variables
class Excelx < Base
Daru::DataFrame.register_io_module :to_excelx_string, self
Daru::DataFrame.register_io_module :write_excelx, self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure. Yep, it looks humane, but if/when you'll add formatting support for Excel exporters -- conventions on formatting are pretty different in two versions of Excel files and corresponding libraries.
In fact, I believe that in 2017 support for Excel 1999-2003 format (xls) should be done in "maintenance" mode, while xlsx exporter should be shiny and friendly.

super(file_extension: '.xlsx')
end

# Exports an Excelx Exporter instance to an xlsx file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it doesn't export the exporter, it "exports the dataframe", or "performs the export(er)".

(If it is standard phrase for all exporters, this notice is related to all of them).

# @param path [String] Path of excelx file where the dataframe is to be saved
#
# @example Writing an Excelx Exporter instance to an xlsx file
# instance.write('filename.xlsx')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that such a simple example brings any clarity. Just remove it?

def process_offsets
@row_offset = @header ? 1 : 0
@col_offset = 0 unless @index
@col_offset ||= @dataframe.index.is_a?(Daru::MultiIndex) ? @dataframe.index.width : 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this or similar statement repeated in other places in codebase (calculating of index witdth)? Maybe move it to some utility module, or to base exporter?

@col_offset ||= @dataframe.index.is_a?(Daru::MultiIndex) ? @dataframe.index.width : 1
end

def fetch_headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "fetch" a good name? "Fetch" implies some external source to be used, while this method in fact "formats" or "renders" its own instance variables.

case idx
when Daru::Vector, Daru::MultiIndex, Array then idx.map(&:to_s)
else [idx.to_s]
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't Array(idx).map(&:to_s) work instead of this case?..

@data = data
@index = index
@sheet = sheet
@header = header
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, do we have this lot of boolean vars in other importers too? I believe that one hash @render_parts = {data: data, headers: headers, index: index} could be more effective and will allow to generalize a lot. In addition, looking at var names you can wrongly conclude that, say, @index contains dataframe's index, not "if we need to render index" boolean.

formatting(row, @data)
end

def formatting(idx, format)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the best name, as we'll add "real" Excel formatting once. Something like just to_a?.. Like this:

# or "to_strings"?..
def to_a(object, render = true)
  return [] unless render
  Array(object).map(&:to_s)
end

BTW, have you checked that to_s is good enough? For example, wouldn't numbers or dates be formatted as strings, instead of appropriate Excel column types?

@athityakumar
Copy link
Member Author

Oops, I didn't notice that this Pull Request had already been reviewed. I'll soon push the changes for the next review. Meanwhile, I've also opened #71 for keeping track of, in a different PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants