Docker
Docker is a platform for making software reproducible, by reproducing the entire Linux environment it runs in.
These Docker images exist so far:
- conceptnet: Includes everything you need to work with ConceptNet except the actual data.
- conceptnet-web: An image that runs the ConceptNet Web server, including the browseable frontend and the API.
Okay, first you need the data. We used to include this in the Docker image, but the multi-gigabyte uploads and downloads were making Docker Hub very sad.
Go to http://conceptnet5.media.mit.edu/downloads/v5.4/ and download conceptnet5_db_5.4.tar.bz2 and conceptnet5_vector_space_5.4.tar.bz2, which together are about 7.6 GB of files to download.
Save them in a directory on some large disk (let's say it's /large-disk/conceptnet5.4) and extract them.
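If you prefer to script the download, here is a minimal sketch. It assumes the two archives are still served under the downloads URL above (old releases may have moved), that wget and a bzip2-capable tar are installed, and that you pass your large-disk directory as the argument. The function fetch_conceptnet54 and the two CONCEPTNET54_* variables are hypothetical names, not part of ConceptNet.

```shell
# Sketch: download and unpack the ConceptNet 5.4 data archives.
# Assumption: the files are still served at CONCEPTNET54_URL; old releases
# may have moved, so check the URL in a browser first if wget fails.
CONCEPTNET54_URL="http://conceptnet5.media.mit.edu/downloads/v5.4"
CONCEPTNET54_FILES="conceptnet5_db_5.4.tar.bz2 conceptnet5_vector_space_5.4.tar.bz2"

# Hypothetical helper. $1 is the directory on your large disk.
fetch_conceptnet54() {
    dest=${1:-/large-disk/conceptnet5.4}
    mkdir -p "$dest" || return 1
    for f in $CONCEPTNET54_FILES; do
        # -c resumes a partial download; -P saves the file into $dest
        wget -c -P "$dest" "$CONCEPTNET54_URL/$f" || return 1
        # -x extract, -j bzip2 decompression, -C unpack inside $dest
        tar -xjf "$dest/$f" -C "$dest" || return 1
    done
}

# Usage:
#   fetch_conceptnet54 /large-disk/conceptnet5.4
```

Afterwards, check that the extracted files ended up where the docker run command expects them: it mounts the data subdirectory, /large-disk/conceptnet5.4/data.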
Now set up Docker if you don't have it already. (You may want to tell it to put its image files on the same large disk.)
Once you've got Docker set up, type this:
sudo docker run -v /large-disk/conceptnet5.4/data:/conceptnet_data rspeer/conceptnet-web:5.4
Change the part that says /large-disk/conceptnet5.4 if that's not the actual directory it's in. /conceptnet_data is what the directory will be called inside the container.
Assuming everything worked, you'll be running the ConceptNet 5.4 server on your computer, on port 10054 inside the container.
If you want to use your server as an API endpoint, for example to measure how similar cats and dogs are, as shown here:
http://conceptnet5.media.mit.edu/data/5.4/assoc/c/en/cat?filter=/c/en/dog/.&limit=1
you can run your Docker container using this command:
sudo docker run -p 0.0.0.0:80:10054 -v /large-disk/conceptnet5.4/data:/conceptnet_data rspeer/conceptnet-web:5.4
The above command binds port 80 of the host machine (or any other port you choose) and forwards it to port 10054 in the Docker container. Read more about it in Expose Docker ports.
You should now be able to access the API here:
http://<host-machine-public-ip>:80/data/5.4/assoc/c/en/cat?filter=/c/en/dog/.&limit=1
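If you query the server from scripts, a small helper that builds the same kind of URL as the example above for any pair of English terms can save retyping. This is a sketch, assuming the -p 0.0.0.0:80:10054 mapping (so the API answers at http://localhost:80 on the host) and that curl is installed; conceptnet_assoc_url and CONCEPTNET_BASE are hypothetical names, not part of ConceptNet.

```shell
# Hypothetical helper: print the 5.4 "assoc" URL asking how related two
# English terms are, following the example URL pattern on this page.
# CONCEPTNET_BASE defaults to localhost:80, assuming the -p 0.0.0.0:80:10054 mapping.
conceptnet_assoc_url() {
    base=${CONCEPTNET_BASE:-http://localhost:80}
    printf '%s\n' "$base/data/5.4/assoc/c/en/$1?filter=/c/en/$2/.&limit=1"
}

# Usage (requires curl and a running server):
#   curl -s "$(conceptnet_assoc_url cat dog)"
#   CONCEPTNET_BASE=http://<host-machine-public-ip>:80 conceptnet_assoc_url cat dog
```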
ConceptNet 5.5 aims to use Docker more effectively, using best practices that have arisen since the release of 5.4.
You will need:
- 200 GB of disk space (it's okay if it's on a separate drive)
- 16 GB of RAM (32 GB is better)
- An OS that supports virtualization (Linux kernel 3.10 or later, or you might be able to make it work on macOS or Windows)
- Docker 1.12 or later
- Docker Compose 1.8 or later
- Git
These recent versions of Docker utilities are unlikely to be packaged with your OS, so you'll need to download them directly. Docker Compose 1.5 is the latest version packaged for Ubuntu at the moment, and it definitely won't work.
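To check whether your installed versions meet these minimums, a sketch like the following compares dotted version numbers field by field, so 1.10 counts as newer than 1.9 (a plain string comparison gets that wrong). The helper names are hypothetical, and the sed pattern assumes the "Docker version X.Y.Z, build ..." style of output that docker --version and docker-compose --version print.

```shell
# Hypothetical helper: succeed if dotted version $1 >= $2.
# Fields are compared numerically; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: pull "1.12.6" out of "Docker version 1.12.6, build ...".
# Assumes the number follows the word "version" (an optional "v" is skipped).
extract_version() {
    sed -n 's/.*version v\{0,1\}\([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   version_ge "$(docker --version | extract_version)" 1.12 || echo "Docker is too old"
#   version_ge "$(docker-compose --version | extract_version)" 1.8 || echo "Docker Compose is too old"
```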
Now get the ConceptNet 5.5 repository if you don't already have it:
git clone git@github.com:commonsense/conceptnet5 -b version5.5
cd conceptnet5
Docker is designed to run Linux containers on Linux systems, but there are now ways to run it inside a virtual machine on Windows 10 or macOS. I haven't used those versions, and they may require slight changes to the setup commands, particularly on Windows, where paths look different.
If you successfully run ConceptNet in Docker for Windows or Mac, I'd appreciate your help: describe what you did here.
Back to assuming you're on Linux. For convenience, add yourself to the docker group. Otherwise you'll have to put sudo before all of the following commands.
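A minimal sketch of that step, assuming a standard Linux install where the Docker package has already created a group named docker. The function names are hypothetical.

```shell
# Sketch: let your user run docker without sudo (hypothetical helper names).
# usermod -aG appends the docker group to your user's existing groups; the
# new membership only applies to new login sessions.
add_self_to_docker_group() {
    sudo usermod -aG docker "$(id -un)"
}

# Succeeds if the current session already has the docker group.
in_docker_group() {
    id -nG | tr ' ' '\n' | grep -qx docker
}
```

Run add_self_to_docker_group once, then log out and back in (or start a shell with newgrp docker) and check with in_docker_group. Keep in mind that membership in the docker group is effectively root access to the machine, so only do this on a machine you control.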
Building ConceptNet 5.5 requires about 200 GB of space. You may not have that much space available on your root partition (for example, on a desktop solid-state drive or a standard AWS server), in which case you'll need some kind of external disk.
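To check up front whether a disk has room for the build, a sketch like this compares the free space reported by df against a threshold. It is a hypothetical helper, not part of ConceptNet, and it treats a GB as 1024^3 bytes, which is close enough for a rough check.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at least
# $2 GB (1024^3 bytes) free. POSIX "df -Pk" prints one header line, then
# one line per filesystem whose 4th column is available 1024-byte blocks.
has_free_gb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage:
#   has_free_gb /bigdrive 200 || echo "not enough free space for the 5.5 build"
```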
Let's assume your disk is mounted at /bigdrive, and you're going to put the data in /bigdrive/conceptnet5.5. This path is actually written in the docker-compose.yml file, so if you want the data to go somewhere different, you should start by editing docker-compose.yml to change that path.
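If you do move the data, one way to change the path is a single sed substitution. This sketch assumes the path appears in docker-compose.yml literally as /bigdrive/conceptnet5.5; the function name is hypothetical, and the new path must not contain # or & characters, which sed would treat specially.

```shell
# Hypothetical helper: replace every literal /bigdrive/conceptnet5.5 in a
# compose file ($2, default docker-compose.yml) with a new path ($1).
set_conceptnet_data_path() {
    new_path=$1
    file=${2:-docker-compose.yml}
    # '#' as the sed delimiter avoids escaping the slashes in the paths;
    # writing to a temp file and moving it back works with any POSIX sed.
    sed "s#/bigdrive/conceptnet5\.5#${new_path}#g" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from the conceptnet5 checkout:
#   set_conceptnet_data_path /data/cn55
```

Review the result with git diff docker-compose.yml before building.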
Whatever path you end up using, make sure it exists.
mkdir /bigdrive/conceptnet5.5
The build itself can be done in one command:
./scripts/build-in-docker.sh
For me, this command takes about 6 hours to run.
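Since the build runs for hours, you may want it to keep going if your terminal or SSH session closes. This is a general shell sketch using nohup, not something provided by ConceptNet's scripts; run_logged is a hypothetical name.

```shell
# Hypothetical helper: run a command in the background, immune to the
# terminal hanging up, with stdout and stderr both written to a log file.
run_logged() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "started PID $! (log: $log)"
}

# Usage, from the conceptnet5 checkout:
#   run_logged build.log ./scripts/build-in-docker.sh
#   tail -f build.log
```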
If your build crashes in some weird state, or if you changed something about ConceptNet and want to rebuild it, you can ask Snakemake to remove all the files it built (while leaving files it downloaded alone):
docker-compose run conceptnet snakemake clean
If that's not enough, you can start completely fresh by just deleting the contents of /bigdrive/conceptnet5.5.
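A cautious sketch of that cleanup, which removes everything inside the directory (including hidden files) but keeps the directory itself so the path in docker-compose.yml stays valid. It assumes find supports -mindepth and -delete (GNU and BSD find both do). Files written by the container may be owned by root, in which case the equivalent sudo find /bigdrive/conceptnet5.5 -mindepth 1 -delete may be needed. The function name is hypothetical.

```shell
# Hypothetical helper: empty a data directory but keep the directory itself.
# Refuses an empty argument, "/", or a path that is not a directory.
clear_dir_contents() {
    dir=$1
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "refusing to clear '$dir'" >&2
        return 1
    fi
    # -mindepth 1 skips $dir itself; -delete works depth-first, so files
    # are removed before the subdirectories that contained them.
    find "$dir" -mindepth 1 -delete
}

# Usage:
#   clear_dir_contents /bigdrive/conceptnet5.5
```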