Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a warning to write_eml generates random packageId and system values #293

Open
amoeba opened this issue Jan 6, 2020 · 3 comments

Comments

@amoeba
Copy link
Collaborator

amoeba commented Jan 6, 2020

Stemming from #292,

Right now, when an eml object is written to disk with write_eml without a packageId, write_eml fills in packageId with a UUID and sets system to uuid:

EML/R/write_eml.R

Lines 27 to 31 in 6c2911c

## FIXME, insist on a packageId, or consider validation which can ignore id?
if (is.null(eml$packageId)) {
eml$packageId <- uuid::UUIDgenerate()
eml$system <- "uuid"
}

In #292, this created some understandable confusion for @scelmendorf because it meant that eml_validate's behavior was inconsistent depending on whether the object was written to disk first or not.

In #292, @cboettig wrote:

I think the best fix would be that write_eml should throw a warning if no packageId is found, and explain that it is adding an ID automatically. I'd love a PR for that if we have consensus on that.

Also in #292, @mobb wrote:

The temporary packageId and system are fine (as inserted by write_eml). However, users should be aware that this is happening, so I recommend a message to the screen if write_eml inserts these. For those who control their packageIds, the msg will help remind them to assign it.

I agree with the above and it seems we have a consensus but I wanted to file a standalone issue for discussion in case it was needed. Open to comments or suggested wording of the warnings/messages.

I propose:

  1. If packageId is not set when write_eml is called, issue a warning.
  2. The warning will have the text: "packageId generated automatically because it was not already set. See ?packageId for more information."
  3. Create a man page at ?packageId with more information. TBD. It'd explain how to set packageId and system and how to come up with sensible values.
@mbjones
Copy link
Member

mbjones commented Jan 12, 2020

While I fully appreciate the motivation for providing a packageId automatically, I am still uncomfortable with doing so in write_eml, which is meant to simply be a serialization of an in-memory object. Side-effects that aren't in user control often cause problems, and I think what is written to disk should be round-trippable with the in-memory object.

Instead of changing content, couldn't we simply provide warnings for any validation errors (including a missing packageId) so the user will know that their document doesn't validate when it is written to disk? This would allow folks that want to export a partial document for further downstream processing to do so, while still helping to prompt when required fields are missing.

@cboettig
Copy link
Member

@mbjones Yeah, I'm with you 💯 on that.

I think we should also add a helper routine to add_packageId() that could default to the current UUID method to avoid the possible complexity associated with this task. That would be far more transparent than doing it on write.

Perhaps we can also add some additional logic into the validator which would report that the EML was valid except for a missing packageId (possibly with a suggestion on how to add one?, e.g.)

EML file is valid except for a missing packageId. You can create one now with `add_packageId()`

I've always been a bit annoyed at the standard XML validation errors not being as user-friendly or comprehensive as they should be.

Thoughts?

@mbjones
Copy link
Member

mbjones commented Jan 13, 2020

Sounds great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants