The Configuration Folder

The configuration folder is all that you need to deploy to a given machine (apart from the cloud-crowd gem itself) to make it into a productive citizen of your CloudCrowd cluster. Its contents include…

config.yml

config.yml is the main CloudCrowd configuration file. It contains all the fiddly little settings that you’ll want to tweak to get your installation running smoothly. Here’s the rundown of everything you can specify:

central_server	The URL of the central server. If you’re developing, this is probably localhost. If you’ve got a full production deployment, then this probably points at your load balancer. If you’re running an internal CloudCrowd installation, this hostname probably only resolves on your local network.
max_workers	The maximum number of `Workers` that a `Node` is allowed to run. If a node reaches this number, it will be considered ‘busy’, and will have to wait for some of its workers to finish processing before accepting any new `WorkUnits`.
max_load	The highest that the (one-minute) load average is allowed to climb before a `Node` starts refusing to take any more work. When the load drops back down, the node starts receiving work again. The optimal value for this depends on the number of processors in your node, and is probably higher than you expect — the load average trails the actual load over the course of a minute, and this may aggravate spiky behavior.
min_free_memory	The lowest that available memory can drop before the node refuses to take more work. Measured in megabytes. This, in conjunction with max_workers, is a good way to ensure that your node doesn’t start to swap.
storage	‘s3’ or ‘filesystem’. The storage system used by the `AssetStore` to store work unit results. ‘filesystem’ storage is only appropriate in development, on single-machine installations, or if you have a networked volume handy.
aws_access_key	Your Amazon Web Services access key, for S3 storage.
aws_secret_key	Your Amazon Web Services secret access key, for S3 storage.
s3_bucket	The S3 bucket you’d like to store your CloudCrowd results in.
s3_authentication	If this option is turned on with S3 storage, all results will be saved as private files, and the URLs returned will be temporarily authenticated for 24 hours.
http_authentication	If this option is turned on, all communication with the central server, including web requests to view the Operations Center, API calls, and requests to and from Nodes, will use HTTP basic authentication with the login and password specified.
login	The common login for all CloudCrowd requests via HTTP basic authentication, if enabled.
password	The password for all CloudCrowd requests via HTTP basic authentication, if enabled.
actions_path	By default CloudCrowd looks in the ‘actions’ subdirectory of the configuration folder for all your installed actions. If you’d prefer to keep them somewhere else, you can specify an actions_path.
work_unit_retries	The number of separate attempts that will be made to process an individual `WorkUnit` before marking it as failed.
local_storage_path	If using the ‘filesystem’ storage, all results will be saved inside of this directory. The default value is `'/tmp/cloud_crowd_storage'`. Useful if you have a networked drive.
temp_storage_path	Directory where all workers store their scratch files. The default is `'/tmp/cloud_crowd_temp'`.
log_path	Directory for storing server and node log files if daemonized. Defaults to `log`.
pid_path	Directory for storing server and node PID files if daemonized. Defaults to `tmp/pids`.
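
To make the settings above more concrete, here is a minimal sketch in Ruby that parses a sample configuration and applies the three capacity limits the way the table describes them. The values are purely illustrative, the plain string keys are an assumption (the config.yml that ships with the gem may write its keys differently), and `busy?` is a hypothetical helper, not part of CloudCrowd.

```ruby
require 'yaml'

# Illustrative values only. The setting names come from the table above;
# the exact key syntax used by the shipped config.yml is an assumption.
SAMPLE_CONFIG = <<~YAML
  central_server: http://localhost:9173
  max_workers: 5
  max_load: 5.0
  min_free_memory: 150
  storage: filesystem
  local_storage_path: /tmp/cloud_crowd_storage
  temp_storage_path: /tmp/cloud_crowd_temp
  http_authentication: false
  work_unit_retries: 3
YAML

config = YAML.safe_load(SAMPLE_CONFIG)

# Hypothetical helper (not CloudCrowd's own code): a node counts as busy when
# it has reached max_workers, when the one-minute load average is above
# max_load, or when free memory (in megabytes) is below min_free_memory.
def busy?(config, workers:, load_average:, free_memory_mb:)
  workers >= config['max_workers'] ||
    load_average > config['max_load'] ||
    free_memory_mb < config['min_free_memory']
end

puts busy?(config, workers: 2, load_average: 1.2, free_memory_mb: 800)  # prints false
puts busy?(config, workers: 5, load_average: 1.2, free_memory_mb: 800)  # prints true
```

In this sketch, any one of the three limits is enough to make a node stop accepting work, which is why max_workers and min_free_memory are worth tuning together.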

database.yml

CloudCrowd uses ActiveRecord to manage the central server’s database, so you can use the standard configuration for any ActiveRecord-compatible database. See the ActiveRecord Documentation for more information.
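
As a rough illustration of what a "standard configuration" looks like, the Ruby sketch below parses a sample database.yml and checks that the usual ActiveRecord connection keys are present. The adapter, database name, and credentials are placeholder assumptions, not values CloudCrowd requires, and `missing_keys` is a hypothetical helper.

```ruby
require 'yaml'

# Placeholder connection settings using the usual ActiveRecord keys.
# None of these values come from CloudCrowd itself.
SAMPLE_DATABASE_YML = <<~YAML
  adapter: mysql2
  database: cloud_crowd
  username: crowd
  password: secret
  host: localhost
YAML

db_config = YAML.safe_load(SAMPLE_DATABASE_YML)

# Hypothetical sanity check: report any commonly required keys that are absent.
def missing_keys(db_config, required = %w[adapter database])
  required.reject { |key| db_config.key?(key) }
end

puts "adapter: #{db_config['adapter']}"
puts "missing: #{missing_keys(db_config).inspect}"
```

A hash of this shape is what ActiveRecord accepts when establishing a connection; which adapter you choose depends on the database you run on the central server.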

Only the central server needs to have the database configured in database.yml. Nodes will never connect directly to the database. To load the CloudCrowd schema into the database specified by database.yml, use:

crowd load_schema

actions

Please install all of the Ruby files for your custom actions into the actions folder. Actions are dispatched by file and class name, so a WebScraper action should be filed in actions/web_scraper.rb, and should be invoked by specifying action : 'web_scraper' in your job creation request. The actions folder has an interesting property that you can take advantage of: Nodes will only receive work for the actions that have been installed in their configuration folder. If you’d like to specialize some of the machines in your cluster to only perform specific, higher-priority actions, then simply include only those actions that you wish to run on those machines.
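
The naming convention described above can be sketched in a few lines of Ruby. All three helpers are hypothetical and written only for illustration; they are not CloudCrowd's own code, and the way the gem actually discovers actions may differ.

```ruby
# Hypothetical helper: convert a CamelCase class name such as "WebScraper"
# into the snake_case action name used in job requests.
def action_name_for(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical helper: the file an action class would live in.
def action_file_for(class_name, actions_dir = 'actions')
  File.join(actions_dir, "#{action_name_for(class_name)}.rb")
end

# Hypothetical helper: list the action names installed in a folder,
# i.e. the actions a node with this folder could receive work for.
def installed_actions(actions_dir)
  Dir.glob(File.join(actions_dir, '*.rb')).map { |path| File.basename(path, '.rb') }.sort
end

puts action_name_for('WebScraper')   # prints web_scraper
puts action_file_for('WebScraper')   # prints actions/web_scraper.rb
```

Under this sketch, specializing a machine amounts to controlling which files `installed_actions` would find in its actions folder.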

config.ru

config.ru is a standard Rackup file that can be used by Rack-compliant servers, such as Thin, Passenger, and Unicorn, to launch instances of the central server. For example, to launch three instances using Thin:

thin start -R config.ru -p 9173 --servers 3
