Command-line arguments with a memory (stored in YAML-files).
Documentation: https://shelephant.readthedocs.io
shelephant presents you with a way to copy files (from a remote, using SSH) in two steps:
- Collect a list of files that should be copied in a YAML-file, allowing you to review and customise the copy operation (e.g. by changing the order and making last-minute manual changes).
- Perform the copy, efficiently skipping files that are identical.
Typical workflow:
# Collect files to copy & compute their checksum (e.g. on remote system)
# - creates "shelephant_dump.yaml"
shelephant_dump *.hdf5
# - reads "shelephant_dump.yaml"
# - creates "shelephant_checksum.yaml"
shelephant_checksum
# Combine all needed info (locally)
# - reads "shelephant_dump.yaml" and "shelephant_checksum.yaml"
# - creates "shelephant_hostinfo.yaml"
shelephant_hostinfo --host myhost --prefix /some/path --files --checksum
# Copy from remote (can be restarted at any time, existing files are skipped)
# - reads "shelephant_hostinfo.yaml"
shelephant_get
- The filenames can be customised.
- To copy to a remote system use shelephant_send.
- Get details in the help of the respective commands, e.g. shelephant_dump --help.
- shelephant works for both local and remote copy actions.
- shelephant_dump: list filenames in a YAML file.
- shelephant_checksum: get the checksums of files listed in a YAML file.
- shelephant_hostinfo: collect host information (from a remote system).
- shelephant_get: copy from remote, based on earlier stored information.
- shelephant_send: copy to remote, based on earlier stored information.
- shelephant_rm: remove files listed in a YAML file.
- shelephant_cp: copy files listed in a YAML file.
- shelephant_mv: move files listed in a YAML file.
- shelephant_extract: isolate a (number of) field(s) in a (new) YAML file.
- shelephant_merge: merge two YAML files.
- shelephant_parse: parse a YAML file and print to screen.
This library is free to use under the MIT license. Any additions are very much appreciated, in terms of suggested functionality, code, documentation, testimonials, word-of-mouth advertisement, etc. Bug reports or feature requests can be filed on GitHub. As always, the code comes with no guarantee. None of the developers can be held responsible for possible mistakes.
(c - MIT) T.W.J. de Geus (Tom) | [email protected] | www.geus.me | github.com/tdegeus/shelephant
Install using conda:
conda install -c conda-forge shelephant
This will also download and install all necessary dependencies.
Install using pip:
pip install shelephant
This will also download and install the necessary Python modules.
# Download shelephant
git clone https://github.com/tdegeus/shelephant.git
cd shelephant
# Install
python -m pip install .
This will also download and install the necessary Python modules.
Suppose that we want to copy all *.txt files from a certain directory /path/where/files/are/stored on a remote host hostname.
First step, collect information on the host:
# connect to the host
ssh hostname
# go to the relevant location on the host
cd "/path/where/files/are/stored/on/remote"
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
# optional but useful, get the checksum of the files to copy
shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
# disconnect
exit # or press Ctrl + D
Second step, copy files to the local system, collecting everything in a single place:
# go to the relevant location on the local system
# (often this is a new directory)
cd "/path/where/to/copy/to"
# get the file-information compiled on the host
# and store in a (temporary) local file
# note that all paths are on the remote system,
# and that they are now copied using secure-copy (scp)
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/files/are/stored/on/remote" \
--files "files_to_copy.yaml" \
--checksum "files_checksum.yaml"
# finally, get the files using secure copy
# (the files are stored relative to the path of 'remote_info.yaml',
# identically to how they are relative to 'files_to_copy.yaml' on remote)
shelephant_get remote_info.yaml
If you use the default filenames for shelephant_dump (shelephant_dump.yaml) and shelephant_checksum (shelephant_checksum.yaml) remotely, you can also specify --files and --checksum without an argument.
An interesting benefit that derives from having computed the checksums on the host,
is that shelephant_get
can be stopped and restarted:
only files that do not exist locally, or that were only partially copied
(whose checksum does not match the remotely computed checksum), will be copied;
all fully copied files will be skipped.
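The restart rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not shelephant's implementation: the helper names sha256_of and needs_copy are hypothetical, and SHA-256 is an assumption (the checksum algorithm is not stated here).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(local_path: Path, remote_checksum: str) -> bool:
    """True if the file is missing locally or differs from the remote copy."""
    if not local_path.is_file():
        return True
    return sha256_of(local_path) != remote_checksum


# Demonstration in a scratch directory (file names are illustrative only).
with tempfile.TemporaryDirectory() as scratch:
    complete = Path(scratch) / "foo.txt"
    partial = Path(scratch) / "bar.txt"
    missing = Path(scratch) / "baz.txt"
    complete.write_bytes(b"fully copied")
    partial.write_bytes(b"fully")  # simulates an interrupted transfer
    expected = hashlib.sha256(b"fully copied").hexdigest()
    decisions = {
        "complete": needs_copy(complete, expected),
        "partial": needs_copy(partial, expected),
        "missing": needs_copy(missing, expected),
    }

print(decisions)
```

Only the fully copied file is skipped; the partial and the missing file are both flagged for copying.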
Let's further illustrate with a complete example. On the host, suppose that we have
/path/where/files/are/stored/on/remote
- foo.txt
- bar.txt
This will give files_to_copy.yaml:
- foo.txt
- bar.txt
and files_checksum.yaml (for example):
- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
This information will be collected in remote_info.yaml:
host: hostname
root: /path/where/files/are/stored/on/remote
files:
- foo.txt
- bar.txt
checksum:
- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
shelephant_get will now copy foo.txt and bar.txt relative to the directory of remote_info.yaml (in this case into the same folder as remote_info.yaml).
It will skip any file for which a local file with the same name and a matching checksum already exists.
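To make the path handling concrete, the sketch below pairs each remote source with its local destination. It is an assumption-laden illustration: the dict mirrors the fields of the remote_info.yaml example, plan_copy is a hypothetical helper, and the "host:path" form is the usual scp source syntax rather than anything confirmed about shelephant's internals.

```python
from pathlib import PurePosixPath

# Mirrors the fields of the 'remote_info.yaml' example as a plain dict.
remote_info = {
    "host": "hostname",
    "root": "/path/where/files/are/stored/on/remote",
    "files": ["foo.txt", "bar.txt"],
}


def plan_copy(info: dict, local_dir: str) -> list:
    """Pair each scp-style source ('host:path') with its local destination."""
    root = PurePosixPath(info["root"])
    target = PurePosixPath(local_dir)
    return [
        (f"{info['host']}:{root / name}", str(target / name))
        for name in info["files"]
    ]


# 'local_dir' stands for the directory that holds 'remote_info.yaml'.
pairs = plan_copy(remote_info, "/path/where/to/copy/to")
for source, destination in pairs:
    print(source, "->", destination)
```

Because the file names are kept relative on both sides, any sub-directory structure under the remote root would be reproduced under the local directory.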
Suppose that we want to restart multiple times, or that we update the files present on the remote after copying them initially. In that case, we can use previously computed checksums to avoid recomputing them (which can be costly for large files).
First step, update information on the host:
# connect to the host
ssh hostname
# go to the relevant location on the host
cd "/path/where/files/are/stored/on/remote"
# collect the previously computed information
shelephant_hostinfo -o precomputed_checksums.yaml -f files_to_copy.yaml -c files_checksum.yaml
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
# get the checksum of the files to copy, where possible reading precomputed values
shelephant_checksum -o files_checksum.yaml files_to_copy.yaml -l precomputed_checksums.yaml
# disconnect
exit # or press Ctrl + D
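The idea of reading precomputed values where possible can be sketched as a simple lookup. This is a hypothetical illustration, not how shelephant_checksum is implemented: the function name is invented, SHA-256 is an assumption, and the cache is keyed on the file name only (how shelephant decides that a stored checksum is still valid is not described here). File contents are held in memory to keep the example short.

```python
import hashlib


def checksums_with_cache(contents: dict, cache: dict) -> tuple:
    """Return (checksums, computed): reuse cached digests, hash the rest."""
    checksums = {}
    computed = []
    for name, data in contents.items():
        if name in cache:
            # Reuse the stored value, skipping the (possibly costly) hashing.
            checksums[name] = cache[name]
        else:
            checksums[name] = hashlib.sha256(data).hexdigest()
            computed.append(name)
    return checksums, computed


# Illustrative data: file name -> file content.
contents = {"foo.txt": b"foo", "bar.txt": b"bar", "new.txt": b"new"}
# Checksums stored during an earlier run (only 'foo.txt' is known).
cache = {"foo.txt": hashlib.sha256(b"foo").hexdigest()}

checksums, computed = checksums_with_cache(contents, cache)
print(computed)
```

Only the files without a stored checksum are hashed, which is where the saving for large files comes from.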
Second step, copy files to the local system, collecting everything in a single place:
# go to the relevant location on the local system
# (often this is a new directory)
cd "/path/where/to/copy/to"
# collect the previously computed information
shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
# list files currently present locally
shelephant_dump -o files_present.yaml *.txt
# get the checksum of the files to copy, where possible reading precomputed values
shelephant_checksum -o files_checksum.yaml files_present.yaml -l precomputed_checksums.yaml
# combine local files and checksums
shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
# get the file-information compiled on the host [as before]
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/files/are/stored/on/remote" \
--files "files_to_copy.yaml" \
--checksum "files_checksum.yaml"
# get the files using secure copy
# use the precomputed checksums instead of computing them
shelephant_get remote_info.yaml --local "precomputed_checksums.yaml"
Suppose that we want to copy all *.txt files from a certain local directory /path/where/files/are/stored/locally to a remote host hostname.
First, we will collect information locally:
# go to the relevant location (locally)
cd /path/where/files/are/stored/locally
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
Then, we will specify some basic information about the host:
# specify basic information about the host
# and store in a (temporary) local file
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/to/copy/to/on/remote"
Now we can copy the files:
shelephant_send files_to_copy.yaml remote_info.yaml
Suppose that copying was interrupted before completing. We can avoid recopying by again using the checksums. We therefore need to know which files are already present remotely and which checksum they have. To find out:
# connect to the host
ssh hostname
# go to the relevant location on the host
cd "/path/where/to/copy/to/on/remote"
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
# get the checksum of the files to copy
shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
# disconnect
exit # or press Ctrl + D
Now we will complement the basic host-info:
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/to/copy/to/on/remote" \
--files "files_to_copy.yaml" \
--checksum "files_checksum.yaml"
And restart the partial copy:
shelephant_send files_to_copy.yaml remote_info.yaml
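The decision behind restarting a partial send can be sketched as follows. This is a minimal illustration under stated assumptions: the dict mirrors the parallel files and checksum lists of the remote_info.yaml example, files_to_send and local_checksums are hypothetical names, and the digests are shortened placeholders rather than real checksums.

```python
# Mirrors the layout of the 'remote_info.yaml' example as a plain dict;
# the digests are shortened placeholders, not real checksums.
remote_info = {
    "files": ["foo.txt", "bar.txt"],
    "checksum": ["aaa", "bbb-partial"],
}

# Hypothetical local state: file name -> locally computed digest.
local_checksums = {"foo.txt": "aaa", "bar.txt": "bbb", "baz.txt": "ccc"}


def files_to_send(local: dict, info: dict) -> list:
    """Names that are absent remotely or whose remote digest differs."""
    # The two lists are parallel: entry i of 'checksum' belongs to entry i of 'files'.
    remote = dict(zip(info["files"], info["checksum"]))
    return [name for name, digest in local.items() if remote.get(name) != digest]


pending = files_to_send(local_checksums, remote_info)
print(pending)
```

Here foo.txt is skipped because its checksum matches, bar.txt is resent because the remote copy is incomplete, and baz.txt is sent because it is not yet present remotely.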