Rob Speer edited this page Oct 31, 2016 · 13 revisions

Docker is a platform that makes software reproducible by reproducing the entire Linux environment it runs in.

ConceptNet 5.5 uses Docker as the primary way to make its build process reproducible. You can skip Docker and install all of ConceptNet's dependencies yourself, but Docker guarantees you a container that satisfies all of them.

You will need:

  • 120 GB of disk space on the drive where Docker stores its data
  • 4 GB of RAM
  • The time and bandwidth to download about 12 GB of raw data
  • An OS that can run Docker (Linux kernel 3.10 or later; you might be able to make it work on macOS or Windows)
  • Docker 1.12 or later
  • Docker Compose 1.8 or later
  • Git

Versions this recent are unlikely to be packaged with your OS, so you will probably need to download them from Docker directly. At the time of writing, the latest Docker Compose packaged for Ubuntu is 1.5, and it definitely won't work.
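You can check these requirements from the shell before starting. The helpers below are a hypothetical sketch, not part of the ConceptNet repository; they assume sort -V is available (GNU coreutils and recent BSD sort have it) and that each tool's --version output contains a dotted version number.

```shell
# Hypothetical preflight helpers (not part of the ConceptNet repository).

# version_at_least HAVE NEED: succeed if dotted version HAVE is >= NEED.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# first_version TEXT: print the first dotted version number in TEXT,
# e.g. "docker-compose version 1.8.0, build f3628c7" gives "1.8.0".
first_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME MIN CMD...: run CMD (such as "docker --version") and report
# whether the version it prints is at least MIN.
check_tool() {
    name=$1 min=$2
    shift 2
    if ! out=$("$@" 2>/dev/null); then
        echo "$name: not found"
        return 1
    fi
    ver=$(first_version "$out")
    if version_at_least "$ver" "$min"; then
        echo "$name $ver: ok"
    else
        echo "$name ${ver:-unknown}: need $min or later"
        return 1
    fi
}

# free_gb PATH: whole gigabytes free on the filesystem that holds PATH.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}
```

For example, on Linux, where Docker keeps its data under /var/lib/docker by default (free_gb should print at least 120):

check_tool docker 1.12 docker --version
check_tool docker-compose 1.8 docker-compose --version
free_gb /var/lib/docker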

Now get the ConceptNet 5.5 repository if you don't already have it:

git clone git@github.com:commonsense/conceptnet5 -b version5.5
cd conceptnet5

Operating system support

Docker is designed to run Linux containers on Linux systems, but now there are ways to run it inside a virtual machine on Windows 10 or macOS. I've only minimally experimented with these.

In many cases, these setups run Docker inside a virtual machine. The most important thing is to make sure that virtual machine meets the requirements above, particularly the 120 GB of disk space. (Docker Machine allocates only 10 GB of disk space by default.)

If you're on Linux, you should either add yourself to the docker group or put sudo before all of the following commands.
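On most Linux distributions you join the group with sudo usermod -aG docker "$USER", then log out and back in, because group changes only apply to new login sessions (this follows Docker's own post-installation instructions). The hypothetical helpers below check whether the current session can skip sudo; they assume the group is named docker.

```shell
# in_group GROUP [LIST]: succeed if GROUP appears in the space-separated LIST,
# which defaults to the current session's groups as reported by "id -nG".
in_group() {
    list=${2-$(id -nG)}
    case " $list " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# docker_prefix: print "sudo" if this session needs it to use Docker,
# or nothing if it runs as root or is already in the docker group.
docker_prefix() {
    if [ "$(id -u)" -eq 0 ] || in_group docker; then
        echo ""
    else
        echo sudo
    fi
}
```

Leaving the substitution unquoted lets an empty prefix disappear, so the same line works either way:

$(docker_prefix) docker-compose up --build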

Running ConceptNet for the first time

At the command line, in the root of the ConceptNet repository (which contains a file called docker-compose.yml), just run this command:

docker-compose up --build

This will start downloading the data and loading it into PostgreSQL, and will also start serving the Web interface on localhost. (It uses the standard Web port, port 80. If your machine already runs a Web server, edit docker-compose.yml and change the ports entry from "80:80" to another host port, such as "8000:80"; the interface will then be at http://localhost:8000/.)
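If you'd rather script the port change, here is a minimal sketch. It assumes the ports entry appears in docker-compose.yml literally as "80:80", quotes included; set_host_port is a hypothetical helper, and it fails rather than guessing if the file is laid out differently.

```shell
# set_host_port FILE PORT: rewrite the "80:80" port mapping in FILE so the
# Web interface is published on host port PORT instead of 80.
# Keeps the original as FILE.bak (-i.bak works with both GNU and BSD sed),
# then confirms the new mapping is present.
set_host_port() {
    sed -i.bak "s/\"80:80\"/\"$2:80\"/" "$1" &&
        grep -q "\"$2:80\"" "$1"
}
```

For example, set_host_port docker-compose.yml 8000 publishes the interface on port 8000.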

The catch is that you won't be able to browse ConceptNet yet, because the database will still be loading. Loading data into a PostgreSQL database unavoidably takes a while; in my experience, you need to wait about 3 hours.
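Rather than checking back by hand, you can poll the server until it answers. This sketch assumes curl is installed and that a page which needs the database, such as a concept page like http://localhost/c/en/example, returns an HTTP error status until loading finishes; both the URL and the wait_for_http helper are assumptions for illustration.

```shell
# wait_for_http URL [INTERVAL] [MAX_TRIES]: poll URL with curl every INTERVAL
# seconds (default 60) until it returns a success status. Gives up with a
# nonzero exit after MAX_TRIES failed attempts; MAX_TRIES=0 (default) waits forever.
wait_for_http() {
    url=$1 interval=${2:-60} max=${3:-0} tries=0
    # -f: treat HTTP error statuses (4xx/5xx) as failures; -s: no progress output.
    until curl -fs -o /dev/null "$url"; do
        tries=$((tries + 1))
        if [ "$max" -gt 0 ] && [ "$tries" -ge "$max" ]; then
            echo "gave up on $url after $tries tries" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo "$url is responding"
}
```

For example, wait_for_http http://localhost/c/en/example 300 checks every five minutes (use localhost:8000 if you changed the port).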

The loading process will output several warnings that aren't important:

  • "WARNING: No password has been set for the database."

    • This is okay because the database is not accessible from the network, only from inside of the container.
  • "WARNING: you are running uWSGI as root !!! (use the --uid flag)"

    • This is expected: Docker always runs as root and has complete power over the containers it creates.
  • "LOG: checkpoints are occurring too frequently (16 seconds apart)"

    • PostgreSQL complains about this when it's loading lots of data. It'll be fine once the data is done loading.
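While the database loads, you may want to hide these three messages so that any other problems stand out. A minimal sketch, assuming the messages keep the wording shown above; it matches fixed substrings, so the service-name prefixes that docker-compose logs adds don't matter. hide_known_warnings is a hypothetical name.

```shell
# hide_known_warnings: copy stdin to stdout, dropping lines that contain any
# of the harmless messages printed while ConceptNet loads its database.
hide_known_warnings() {
    # grep exits 1 when no lines are left to print; treat that as success.
    grep -v -F \
        -e 'No password has been set for the database' \
        -e 'you are running uWSGI as root' \
        -e 'checkpoints are occurring too frequently' || [ "$?" -eq 1 ]
}
```

For example, from a second terminal in the repository:

docker-compose logs -f | hide_known_warnings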

Managing your Docker volumes

ConceptNet creates three named Docker volumes to store its data, so that when you start it again, it doesn't have to repeat the long downloading-and-loading process.

If you're changing the ConceptNet code, or if you start a build that fails (perhaps because you ran out of disk space), you may want to remove these volumes to start fresh. To remove a volume, type docker volume rm followed by the volume name.

Here's what the volumes contain:

  • conceptnet5_psql: The PostgreSQL database, in its loaded, non-portable form. Remove this to build a fresh database.
  • conceptnet5_cn5data: Where input data is downloaded to, and where intermediate data goes if you run the full build. If you want to restart the build process from scratch, remove this and conceptnet5_psql.
  • conceptnet5_nginx: The Web server's cache. The Web server takes advantage of the fact that each version of ConceptNet is immutable: once it renders a particular page, it saves it in the cache and never has to render it again. If you change the data or the page layout, remove conceptnet5_nginx to clear the cache.
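The guidance above can be captured in a small lookup, sketched here as a hypothetical helper. It assumes the default volume names listed above: Compose prefixes volume names with the project name, which defaults to the name of the directory holding docker-compose.yml, so a clone in a differently named directory gets different names (docker volume ls shows what you actually have). Also note that docker volume rm refuses to remove a volume that a container still uses, so remove the containers with docker-compose down first.

```shell
# volumes_for GOAL: print the ConceptNet volumes to remove for GOAL,
# following the description of each volume above.
volumes_for() {
    case $1 in
        fresh-db)      echo conceptnet5_psql ;;
        restart-build) echo conceptnet5_cn5data conceptnet5_psql ;;
        clear-cache)   echo conceptnet5_nginx ;;
        all)           echo conceptnet5_psql conceptnet5_cn5data conceptnet5_nginx ;;
        *)
            echo "usage: volumes_for fresh-db|restart-build|clear-cache|all" >&2
            return 1 ;;
    esac
}
```

For example, to restart the build process from scratch:

docker-compose down
docker volume rm $(volumes_for restart-build)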

Building ConceptNet from raw data

This step is for people who want to make changes to ConceptNet's code or input data and build their own version. It's not essential. You can skip it if you want!

Remove all of ConceptNet's data volumes (see above), then run:

docker-compose run conceptnet scripts/build.sh

Several hours later, you will have your own edition of ConceptNet, built from your local code.

This step has higher system requirements than the others. You'll need about 240 GB of disk space and 16 GB of RAM.
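The rebuild steps can be chained into one sketch. It assumes the default volume names listed earlier and runs docker-compose down first, since Docker won't remove volumes that containers still use. The run wrapper and the DRY_RUN variable are hypothetical conveniences for previewing the commands before spending several hours on a build.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rebuild_from_raw: remove all of ConceptNet's data volumes, then run the
# full build. Stops at the first step that fails.
rebuild_from_raw() {
    run docker-compose down &&
        run docker volume rm conceptnet5_psql conceptnet5_cn5data conceptnet5_nginx &&
        run docker-compose run conceptnet scripts/build.sh
}
```

Run DRY_RUN=1 rebuild_from_raw to see the three commands, then rebuild_from_raw to execute them.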
