Checksum, characterization and fixity checks

Jayanth Dungavath edited this page May 1, 2018 · 13 revisions

Fedora

  • We can provide multiple checksums for any given file while ingesting it (see Example (4): Uploaded file with checksum), and Fedora saves them along with the ingested file. All of these checksums can be retrieved from Fedora. However, there is no configuration setting that makes Fedora compute checksums other than the default SHA-1, which it computes and stores in "premis:hasMessageDigest". "the underlying storage application (ModeShape) requires SHA-1 for internal management of binary resources. I do not believe that is something that can be configured-away... if that is what you are interested in determining." --Andrew Woods, Fedora Project.

  • "Also, I grabbed Andrew Woods, who is here at LDCX - Andrew is the tech lead for Fedora, and asked about sha256 in Fedora. Modeshape, which Fedora relies upon for the file system, only does sha1. But we can ingest MD5, and/or sha1 and/or sha256 and tell Fedora (per object) which checksum to pay attention to for fixity checks. If we ingest a sha256 we can tell Fedora4 to recalculate a sha256, for that particular object, when it runs fixity checks. There is no mechanism for retrospectively generating sha256's for objects already in the repository that don't have one, but Andrew said if that is a need that we have, he would be glad to work with us to do it. Last, the Fedora Spec that is being drafted will approach fixity differently, which means it will be different in Fedora 5. See http://fedora.info/spec/". --Linda Newman.

  • "you have the option of providing any or all of the following digests for that resource: SHA-1, SHA-256, and/or MD5. If you provide any of these digests on-ingest, Fedora will internally calculate and verify a match with the provided digest(s)." We can update the default fixity algorithm (see Example (2): Update default fixity algorithm) so that, if we provide a checksum other than SHA-1 on ingest, Fedora runs fixity checks by computing the file's checksum with the algorithm named in the default fixity algorithm setting and comparing it against the checksum provided on ingest.

  • With the upcoming Fedora 5.0 release, we can retrieve just the stored checksums provided during ingest, instead of the entire content.

  • If you would like to fine-tune Fedora by changing the application configuration, refer to the configuration page. Look specifically at the configuration chain to understand the various config files and their priority.
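
The ingest-time checksums described above (MD5, SHA-1 and/or SHA-256) could be computed on the client side before upload. The following is a minimal sketch, not Fedora's own code: the function names are hypothetical, and the comma-separated "algorithm=hex" header layout is an assumption that should be checked against the Fedora example referenced above before use.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def compute_digests(
    stream: BinaryIO,
    algorithms: Iterable[str] = ("md5", "sha1", "sha256"),
    chunk_size: int = 1 << 20,
) -> Dict[str, str]:
    """Read the stream once and return a hex digest per requested algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # Read in chunks so large files are never held in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def digest_header_value(digests: Dict[str, str]) -> str:
    """Join digests as "algorithm=hex" pairs.

    ASSUMPTION: the token names and the comma-separated layout are
    illustrative only; confirm the exact format your Fedora version expects.
    """
    return ", ".join(f"{name}={value}" for name, value in sorted(digests.items()))


# Usage: digest an in-memory payload and print the candidate header value.
example = compute_digests(io.BytesIO(b"hello fixity"))
print(digest_header_value(example))
```

Reading the file once and feeding every hasher from the same chunk keeps the cost close to a single pass, which matters for large binaries.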

Hyrax

  • The checksum shown on the FileSet view page is computed during FileSet ingest and stored in Solr. A dedicated job computes and stores the checksum along with other details such as file title and page count. The actual job is part of Hydra Works; also look at the fits_document file in Hydra Works.

  • Check out the Hydra File Characterization repo.

  • The computed checksum mentioned above is produced by FITS. Because FITS computes an MD5 checksum by default, that value is stored in Solr and displayed on the FileSet view page. FITS has no built-in mechanism to compute a checksum other than MD5. See enable-checksum, and check your local ~/fits-1.2.0/xml/fits.xml file for the corresponding setting, which is set to true by default.

  • As already stated, the checksum is computed with MD5. It is stored in the "original_checksum_tesim" Solr field.

[Screenshot: screen shot 2018-04-04 at 2 01 53 pm]
  • As the screenshot above shows, the Hyrax characterization job stores not only the MD5 checksum but also the SHA1 extracted from Fedora. We can display SHA1 on the FileSet show page. If we need SHA256, we will have to implement our own jobs to compute and store it.
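
The comparison that such a custom job would perform can be sketched as follows. This is an illustration only, written in Python rather than as a Hyrax job: the function names and the dictionary shape of the Solr document are assumptions, and only the "original_checksum_tesim" field name comes from this page.

```python
import hashlib
from typing import Any, Mapping

# Field name taken from this page; everything else here is hypothetical.
CHECKSUM_FIELD = "original_checksum_tesim"


def md5_matches_solr(solr_doc: Mapping[str, Any], content: bytes) -> bool:
    """Return True if the MD5 of content equals a checksum stored in the Solr doc.

    ASSUMPTION: the field holds a list of hex strings (or a single string).
    """
    stored = solr_doc.get(CHECKSUM_FIELD) or []
    if isinstance(stored, str):
        stored = [stored]
    actual = hashlib.md5(content).hexdigest()
    # Compare case-insensitively, since hex digests may be stored in either case.
    return any(value.strip().lower() == actual for value in stored)


def sha256_of(content: bytes) -> str:
    """Compute the SHA256 hex digest that a custom job would need to store."""
    return hashlib.sha256(content).hexdigest()
```

A mismatch, or a document with no stored checksum, yields False, so a caller can treat both cases as a failed fixity check.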

Resources

Will continue to update this page as and when I find more information.