Object Storage Tutorial.
SNIA Tutorials on Object Storage:
The Storage Networking Industry Association (SNIA) is a not-for-profit global organization, made up of member companies spanning the global storage market.
Git tutorial https://github.com/cs-course/git-tutorial.
For those who want unlimited private repositories, try Bitbucket.
- Python Distributions:
- Option 1: Anaconda
- Download from https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe & install
- Option 2: WinPython
- Download from https://sourceforge.net/projects/winpython/files/WinPython_3.6/3.6.3.0/ & install
- Fast deployment with Docker:
- Option 3: Python Docker https://github.com/Zhan2012/python-lab
docker login daocloud.io && docker pull daocloud.io/zhan2016/python-lab:master-31a932d
Ongoing course: Java Programming, 2017-2018 2nd semester, just follow your teacher's guide.
Installation helper scripts https://github.com/Zhan2012/java-bundle (For adventurers).
Virtual Machine: VirtualBox, VMware ...
To run mock-s3 and s3proxy without trouble, Linux is a must; refer to https://github.com/cs-course/vagrant-tutorial.
To run OpenStack Swift or Ceph in Docker, refer to the Docker tutorial https://github.com/cs-course/docker-tutorial.
- Object Storage for Beginners:
- Option 1: Minio, latest version https://minio.io/downloads.html.
- Experimental Mock Servers:
- Option 2: mock-s3, a Python clone of fake-s3. Requires Python 2.7.
- Option 2.1: Python 3 port in progress, may be buggy https://github.com/Zhan2012/mock-s3.
- Option 3: s3proxy, access other storage via the S3 API. Binary bundles here
- Option 3.1: Use Java/Maven to build
mvn package -Dmaven.test.skip=true
- Option 4: fake-s3, a lightweight server clone of Amazon S3. This is the original project; it depends on Ruby.
- Option 5: s3mock, an S3 mock library for Java/Scala. Building with Java/SBT is required.
- Option 6: S3Mock, an S3 mock available as a Docker image or JUnit rule. Build with Java/Maven and run with Docker. Contributed by Adobe.
- Industry Level Projects:
- Option 7: OpenStack Swift, fast deployment with an all-in-one container: https://github.com/cs-course/openstack-swift-docker.
- Option 8: Ceph, docker files and images to run Ceph in containers: https://github.com/ceph/ceph-container.
Besides Option 1, Options 2 and 3 also offer compile-free executables.
- Standalone Utilities:
- Option 1: Minio Client https://docs.minio.io/docs/minio-client-quickstart-guide
- Option 2: s3cmd https://github.com/s3tools/s3cmd
- Run pip install s3cmd in a Python environment
- Configure for Minio https://docs.minio.io/docs/s3cmd-with-minio
- Option 3: aws-shell https://github.com/awslabs/aws-shell
- Run pip install aws-shell in a Python environment
- Configure for Minio https://docs.minio.io/docs/aws-cli-with-minio
- Official Manual https://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html
- APIs:
- Option 4: aws-sdk-java
- Option 5: boto
Options 2 and 3 are more general and versatile; both are widely used with various object storage services.
- COSBench: cloud object storage benchmark https://dl.acm.org/citation.cfm?doid=2479871.2479900
- COSBench: A Benchmark Tool for Cloud Object Storage Services http://www.cs.cmu.edu/~qingzhen/files/cosbench_cloud12.pdf
- COSBench: A benchmark tool for Cloud Storage https://www.snia.org/sites/default/files/files2/files2/SDC2013/presentations/Cloud/YaguangWang__COSBench_Final.pdf
Contribute your experiences at https://github.com/Zhan2012/obs-tutorial/wiki.
Report problems at https://github.com/Zhan2012/obs-tutorial/issues.
In computer programming, create, read, update, and delete (CRUD) are the four basic functions of persistent storage.
Operation | SQL | HTTP |
---|---|---|
Create | INSERT | PUT / POST |
Read (Retrieve) | SELECT | GET |
Update (Modify) | UPDATE | PUT / POST / PATCH |
Delete (Destroy) | DELETE | DELETE |
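To make the HTTP column of the table concrete, here is a minimal sketch of how the four operations look against an S3-style key space. The class `InMemoryObjectStore` and its `handle` method are hypothetical names made up for this illustration; it is a toy in-memory stand-in, not the API of Minio, mock-s3, or any real server, and the status codes are a simplified assumption.

```python
from typing import Dict, Optional, Tuple


class InMemoryObjectStore:
    """Toy stand-in for an object store; objects are addressed by (bucket, key)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def handle(self, method: str, bucket: str, key: str,
               body: Optional[bytes] = None) -> Tuple[int, bytes]:
        """Dispatch an HTTP-style verb and return (status_code, payload)."""
        name = (bucket, key)
        if method == "PUT":
            # Create and Update are the same call: PUT overwrites the whole object.
            self._objects[name] = body if body is not None else b""
            return 200, b""
        if method == "GET":
            # Read: return the stored bytes, or 404 if the key is unknown.
            if name in self._objects:
                return 200, self._objects[name]
            return 404, b""
        if method == "DELETE":
            # Delete: removing a missing key is not treated as an error here.
            self._objects.pop(name, None)
            return 204, b""
        # Verbs this sketch does not model (POST, PATCH, ...).
        return 405, b""


if __name__ == "__main__":
    store = InMemoryObjectStore()
    print(store.handle("PUT", "demo-bucket", "hello.txt", b"v1"))   # Create
    print(store.handle("GET", "demo-bucket", "hello.txt"))          # Read
    print(store.handle("PUT", "demo-bucket", "hello.txt", b"v2"))   # Update
    print(store.handle("DELETE", "demo-bucket", "hello.txt"))       # Delete
```

Note that Create and Update collapse into a single whole-object PUT in this sketch, which is the usual reason object stores are a poor fit for workloads that modify small parts of large objects.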
Measure throughput and latency under different object sizes, concurrency levels, and server counts.
Suggested topics:
- How does object size affect performance?
- For a particular application, is there a best way to fit it into OBS?
- What are the main factors behind I/O latency?
- Get the latency distribution first.
- What happens when clients are crowded?
- Why do tests 'fail' (not terminate)?
- What is the outcome of scaling out (adding more servers to the system)?
More insights are encouraged.
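For the "latency distribution" topic, a mean alone hides the slow tail, so it helps to look at percentiles. The following is a minimal sketch that summarizes a list of per-request latencies using the nearest-rank percentile method. The function names `percentile` and `summarize_latencies` are hypothetical helpers for this illustration, and the sample values in the usage part are invented.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an ascending-sorted, non-empty sequence (0 < p <= 100)."""
    if not sorted_samples:
        raise ValueError("need at least one sample")
    # Nearest rank: the smallest rank that covers at least p percent of the samples.
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize_latencies(samples: Sequence[float]) -> Dict[str, float]:
    """Return min, mean, p50, p95, p99 and max for a list of latencies."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Invented latencies in seconds: mostly fast requests plus one slow outlier.
    demo = [0.010, 0.011, 0.012, 0.010, 0.013, 0.011, 0.250]
    for name, value in summarize_latencies(demo).items():
        print(f"{name}: {value:.3f}")
```

Comparing p50 with p99 across object sizes or client counts is one simple way to see where the tail starts to grow.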
- How can you run these experiments with your own code (besides COSBench)?
- Using Python as Lab Platform
- Jupyter Notebook Tutorial https://github.com/cs-course/jupyter-tutorial
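One possible way to run such experiments from Python is sketched below: a thread pool issues requests at a chosen concurrency level and records per-request latency and overall throughput. The names `run_load`, `timed_call`, and `fake_put` are hypothetical, and `fake_put` only sleeps; it is an assumed placeholder for a real request (for example an upload through an S3 client library) that you would swap in yourself.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def timed_call(op: Callable[[int], None], i: int) -> float:
    """Run op(i) once and return how long it took, in seconds."""
    start = time.perf_counter()
    op(i)
    return time.perf_counter() - start


def run_load(op: Callable[[int], None], total_requests: int,
             concurrency: int) -> Tuple[List[float], float]:
    """Issue total_requests calls with `concurrency` worker threads.

    Returns (per-request latencies, throughput in requests per second).
    """
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda i: timed_call(op, i),
                                  range(total_requests)))
    elapsed = time.perf_counter() - wall_start
    return latencies, total_requests / elapsed


def fake_put(i: int) -> None:
    """Hypothetical stand-in for a real object upload; it only sleeps."""
    time.sleep(0.001)


if __name__ == "__main__":
    # Sweep the number of concurrent clients and watch throughput and latency.
    for clients in (1, 4, 16):
        latencies, throughput = run_load(fake_put, total_requests=64,
                                         concurrency=clients)
        mean_ms = 1000 * sum(latencies) / len(latencies)
        print(f"clients={clients} throughput={throughput:.0f} req/s "
              f"mean latency={mean_ms:.2f} ms")
```

Threads are used here because object storage requests are I/O bound; the same harness can be run inside a Jupyter notebook cell by calling `run_load` directly.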
Recent SNIA blog posts on Object Storage http://sniablog.org/category/object-storage/.
Enterprise level Object Store comparison.
Zhan.Shi @ 2018