Skip to content

The Configuration Folder

jashkenas edited this page Sep 13, 2010 · 14 revisions

The configuration folder is all that you need to deploy to a given machine (apart from the cloud-crowd gem itself) to make it into a productive citizen of your CloudCrowd cluster. Its contents include…

config.yml

config.yml is the main CloudCrowd configuration file. It contains all the fiddly little settings that you’ll want to tweak to get your installation running smoothly. Here’s the rundown of everything you can specify:

central_server The URL to the central server. If you’re developing, this is probably localhost. If you’ve got a full production deployment, then this is probably aiming at your load balancer. If you’re running an internal CloudCrowd installation, this DNS probably only resolves on your local network.
max_workers The maximum number of Workers that a Node is allowed to run. If a node reaches this number, it will be considered ‘busy’, and will have to wait for some of its workers to finish processing before accepting any new WorkUnits.
max_load The highest that the (one-minute) load average is allowed to climb before a Node starts refusing to take any more work. When the load drops back down, the node starts receiving work again. The optimal value for this depends on the number of processors in your node, and is probably higher than you expect — the load average trails the actual load over the course of a minute, and this may aggravate spiky behavior.
min_free_memory The lowest that available memory can drop before the node refuses to take more work. Measured in megabytes. This, in conjunction with max_workers, is a good way to ensure that your node doesn’t start to swap.
storage s3’ or ‘filesystem’. The storage system used by the AssetStore to store work unit results. ‘filesystem’ storage is only appropriate in development, on single-machine installations, or if you have a networked volume handy.
aws_access_key Your Amazon Web Services access key, for S3 storage.
aws_secret_key Your Amazon Web Services secret access key, for S3 storage.
s3_bucket The S3 bucket you’d like to store your CloudCrowd results in.
s3_authentication If this option is turned on with S3 storage, all results will be saved as private files, and the URLs returned will be temporarily authenticated for 24 hours
http_authentication If this option is turned on, all communication with the central server, including web requests to view the Operations Center, API calls, and requests to and from Nodes, will use HTTP basic authentication with the login and password specified.
login The common login for all CloudCrowd requests via HTTP basic authentication, if enabled.
password The password for all CloudCrowd requests via HTTP basic authentication, if enabled.
actions_path By default CloudCrowd looks in the ‘actions’ subdirectory of the configuration folder for all your installed actions. If you’d prefer to keep them somewhere else, you can specify an actions_path.
work_unit_retries The number of separate attempts that will be made to process an individual WorkUnit before marking it as failed.
local_storage_path If using the ‘filesystem’ storage, all results will be saved inside of this directory. The default value is '/tmp/cloud_crowd_storage'. Useful if you have a networked drive.
temp_storage_path Directory where all workers store their scratch files. The default is '/tmp/cloud_crowd_temp'
log_path Directory for storing server and node log files if daemonized. Defaults to log.
pid_path Directory for storing server and node PID files if daemonized. Defaults to tmp/pids.

database.yml

CloudCrowd uses ActiveRecord to manage the central server’s database, so you can use the standard configuration for any ActiveRecord compatible database. See the ActiveRecord Documentation for more information.

Only the central server needs to have the database configured in database.yml. Nodes will never connect directly to the database. To load the CloudCrowd schema into the database specified by database.yml, use:

crowd load_schema

actions

Please install all of the Ruby files for your custom actions into the actions folder. Actions are dispatched by file and class name, so a WebScraper action should be filed in actions/web_scraper.rb, and should be invoked by specifying action : 'web_scraper' in your job creation request. The actions folder has an interesting property that you can take advantage of: Nodes will only receive work for the actions that have been installed in their configuration folder. If you’d like to specialize some of the machines in your cluster to only perform specific, higher priority actions, then simply include only those actions that you wish to run.

config.ru

config.ru is a standard Rackup file that can be used by Rack-compliant servers, such as Thin, Passenger, and Unicorn to launch instances of the central server. For example, to launch three instances using Thin:

thin start -R config.ru -p 9173 --servers 3
Clone this wiki locally