This is how the official agent 6 image available here is built.
Head over to datadoghq.com to get the official installation guide.
For a simple docker run, you can quickly get started with:
docker run -d -v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=<YOUR_API_KEY> \
datadog/agent:latest
The agent is highly customizable. Here are the most commonly used environment variables:
- DD_API_KEY: your API key (required)
- DD_HOSTNAME: hostname to use for metrics (if autodetection fails)
- DD_TAGS: host tags, separated by spaces. For example: simple-tag-0 tag-key-1:tag-value-1
- DD_CHECK_RUNNERS: the agent runs all checks in sequence by default (default value = 1 runner). If you need to run a high number of checks (or slow checks), the collector-queue component might fall behind and fail the healthcheck. You can increase the number of runners to run checks in parallel.
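As a minimal sketch, the variables above can be combined with the quick-start command. The runner count of 4 and the DOCKER override variable are assumptions made for illustration, not recommended values. DOCKER defaults to echo so that the script only prints the arguments it would pass.

```shell
#!/bin/sh
# DOCKER defaults to "echo" so the script only prints the arguments;
# run it with DOCKER=docker to actually start the container.
DOCKER="${DOCKER:-echo}"

start_agent() {
  "$DOCKER" run -d \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /proc/:/host/proc/:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -e DD_API_KEY="<YOUR_API_KEY>" \
    -e DD_TAGS="simple-tag-0 tag-key-1:tag-value-1" \
    -e DD_CHECK_RUNNERS=4 \
    datadog/agent:latest
}

start_agent
```

Quoting DD_TAGS keeps the space-separated tag list in a single argument. The preview printed by echo does not show those quotes.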
Starting with Agent v6.4.0, the agent proxy settings can be overridden with the following environment variables:
- DD_PROXY_HTTP: an http URL to use as a proxy for http requests.
- DD_PROXY_HTTPS: an http URL to use as a proxy for https requests.
- DD_PROXY_NO_PROXY: a space-separated list of URLs for which no proxy should be used.
Note: at the moment, the trace agent only supports the above proxy environment variables starting from version 6.5.0
For more information: https://docs.datadoghq.com/agent/proxy/#agent-v6
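One way to pass the proxy settings is an env file given to docker run with --env-file. The sketch below only writes the file. The proxy host names, the port and the file path are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical proxy endpoints and file location; replace with your own.
env_file="${TMPDIR:-/tmp}/datadog-proxy.env"

# Docker reads each line as NAME=value and does not strip quotes,
# so the space-separated no-proxy list is written unquoted.
cat > "$env_file" <<'EOF'
DD_PROXY_HTTP=http://proxy.example.internal:3128
DD_PROXY_HTTPS=http://proxy.example.internal:3128
DD_PROXY_NO_PROXY=http://169.254.169.254 http://registry.example.internal
EOF

# Then add: --env-file "$env_file"   to the docker run command.
echo "wrote $env_file"
```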
These features are disabled by default for security or performance reasons. You need to enable them explicitly:
- DD_APM_ENABLED: run the trace-agent along with the infrastructure agent, allowing the container to accept traces on 8126/tcp
- DD_LOGS_ENABLED: run the log-agent along with the infrastructure agent. See below for details
- DD_PROCESS_AGENT_ENABLED: enable live process collection in the process-agent. The Live Container View is already enabled by default if the Docker socket is available
Send custom metrics via the statsd protocol:
- DD_DOGSTATSD_NON_LOCAL_TRAFFIC: listen to dogstatsd packets from other containers, required to send custom metrics
- DD_HISTOGRAM_PERCENTILES: histogram percentiles to compute, separated by spaces. The default is "0.95"
- DD_HISTOGRAM_AGGREGATES: histogram aggregates to compute, separated by spaces. The default is "max median avg count"
- DD_DOGSTATSD_SOCKET: path to the unix socket to listen to. Must be in a rw mounted volume.
- DD_DOGSTATSD_ORIGIN_DETECTION: enable container detection and tagging for unix socket metrics. Running in host PID mode (e.g. with --pid=host) is required.
We automatically collect common tags from Docker, Kubernetes, ECS, Swarm, Mesos, Nomad and Rancher, and allow you to extract even more tags with the following options:
- DD_DOCKER_LABELS_AS_TAGS: extract docker container labels
- DD_DOCKER_ENV_AS_TAGS: extract docker container environment variables
- DD_KUBERNETES_POD_LABELS_AS_TAGS: extract pod labels
- DD_KUBERNETES_POD_ANNOTATIONS_AS_TAGS: extract pod annotations
You can either define them in your custom datadog.yaml, or set them as JSON maps in these envvars. The map key is the source (label/envvar) name, and the map value the Datadog tag name.
DD_KUBERNETES_POD_LABELS_AS_TAGS='{"app":"kube_app","release":"helm_release"}'
DD_DOCKER_LABELS_AS_TAGS='{"com.docker.compose.service":"service_name"}'
You can use shell patterns in label names to define simple rules for mapping labels to Datadog tag names using the same simple template system used by Autodiscovery. This is only supported by DD_KUBERNETES_POD_LABELS_AS_TAGS.
To add all pod labels as tags to your metrics where tags names are prefixed by kube_, you can use the following:
DD_KUBERNETES_POD_LABELS_AS_TAGS='{"*":"kube_%%label%%"}'
To add only the pod labels that start with app as tags to your metrics, you can use the following:
DD_KUBERNETES_POD_LABELS_AS_TAGS='{"app*":"kube_%%label%%"}'
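To make the pattern rules above concrete, here is a small sketch that mimics the mapping with the shell's own pattern matching. It is an illustration of the documented behaviour only, not the agent's implementation. The function name tag_name_for and the sample label names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: reproduces the documented label-to-tag mapping with
# the shell's pattern matching; this is not how the agent is implemented.
# Usage: tag_name_for <label-pattern> <tag-template> <label-name>
tag_name_for() {
  pattern=$1
  template=$2
  label=$3
  case $label in
    $pattern)
      # The label matches: substitute its name into the template.
      printf '%s\n' "$template" | sed "s|%%label%%|$label|g"
      ;;
    *)
      # No match: this label produces no tag.
      return 1
      ;;
  esac
}

tag_name_for 'app*' 'kube_%%label%%' 'app_version'
tag_name_for '*' 'kube_%%label%%' 'release'
tag_name_for 'app*' 'kube_%%label%%' 'release' || echo "release: no tag"
```

When setting the real environment variable, keep the JSON map in single quotes, as in the examples above, so that the shell does not interpret the double quotes or the asterisk.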
Integration credentials can be stored in Docker / Kubernetes secrets and used in Autodiscovery templates. See the setup instructions for the helper script and the agent documentation for more information.
You can exclude containers from the metrics collection and autodiscovery, if these are not useful for you. We already exclude Kubernetes and OpenShift pause containers by default. See the datadog.yaml.example file for more documentation and examples.
- DD_AC_INCLUDE: whitelist of containers to always include
- DD_AC_EXCLUDE: blacklist of containers to exclude
The format for these options is space-separated strings. For example, if you only want to monitor two images, and exclude the rest, specify:
DD_AC_EXCLUDE="image:.*"
DD_AC_INCLUDE="image:cp-kafka image:k8szk"
Please note that the docker.containers.running, .stopped, .running.total and .stopped.total metrics are not affected by these settings and always count all containers. This does not affect your per-container billing.
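The interaction of the two lists in the example above can be sketched as follows. This is only an illustration: it assumes that each image: entry is an unanchored regular expression and that the include list is checked before the exclude list. The function names and sample image names are hypothetical.

```shell
#!/bin/sh
set -f  # keep the shell from glob-expanding entries such as "image:.*"

DD_AC_EXCLUDE="image:.*"
DD_AC_INCLUDE="image:cp-kafka image:k8szk"

# Succeeds if any "image:<regex>" entry of the list in $1 matches the image name in $2.
matches_any() {
  for entry in $1; do
    case $entry in
      image:*)
        if printf '%s\n' "$2" | grep -Eq -- "${entry#image:}"; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}

# Succeeds if the image would be monitored; the include list is checked first.
is_monitored() {
  if matches_any "$DD_AC_INCLUDE" "$1"; then return 0; fi
  if matches_any "$DD_AC_EXCLUDE" "$1"; then return 1; fi
  return 0
}

for image in confluentinc/cp-kafka k8szk nginx; do
  if is_monitored "$image"; then
    echo "$image: monitored"
  else
    echo "$image: excluded"
  fi
done
```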
You can add extra listeners and config providers via the DD_EXTRA_LISTENERS and DD_EXTRA_CONFIG_PROVIDERS environment variables. They will be added on top of the ones defined in the listeners and config_providers section of the datadog.yaml configuration file.
The DCA is a beta feature. If you are facing any issues, please reach out to our support team.
Starting with Agent v6.3.2, you can use the Datadog Cluster Agent (DCA).
Cluster level features are now handled by the cluster agent, and you will find a [DCA] notation next to the affected features. Please refer to the below user documentation as well as the technical documentation here for further details on the instrumentation.
Please refer to the dedicated section about the Kubernetes integration for more details.
- DD_KUBERNETES_COLLECT_METADATA_TAGS: configures the agent to collect Kubernetes metadata (service names) as tags.
- DD_KUBERNETES_METADATA_TAG_UPDATE_FREQ: set the collection frequency in seconds for the Kubernetes metadata (service names) from the API Server (or the Datadog Cluster Agent if enabled).
- DD_COLLECT_KUBERNETES_EVENTS [DCA]: configures the cluster agent to collect Kubernetes events. See Event collection for more details.
- DD_LEADER_ELECTION [DCA]: activates the leader election. Will be activated if DD_COLLECT_KUBERNETES_EVENTS is set to true. The expected value is a bool: true/false.
- DD_LEADER_LEASE_DURATION [DCA]: only used if the leader election is activated. See the details here. The expected value is a number of seconds.
- DD_KUBE_RESOURCES_NAMESPACE [DCA]: configures the namespace where the Cluster Agent creates the configmaps required for the Leader Election, the Event Collection (optional) and the Horizontal Pod Autoscaling.
- DD_JMX_CUSTOM_JARS: space-separated list of custom jars to load in jmxfetch (only for the -jmx variants)
- DD_ENABLE_GOHAI: enable or disable the system information collector gohai (enabled by default if not set)
- DD_EXPVAR_PORT: change the port for fetching expvar public variables from the agent (defaults to 5000; you may then also have to change the agent_stat.yaml)
Some options are not yet available as environment variable bindings. To customize these, the agent supports mounting a custom /etc/datadog-agent/datadog.yaml configuration file (based on the docker or kubernetes base configurations) for these options, and using environment variables for the rest.
To run custom checks and configurations without building your own image, you can mount additional files in these folders:
- /checks.d/: custom checks in this folder will be copied over and used, if a corresponding configuration is found
- /conf.d/: check configurations and Autodiscovery templates in this folder will be copied over in the agent's configuration folder. You can mount a host folder, kubernetes configmaps, or other volumes. Note: autodiscovery templates now are directly stored in the main conf.d folder, not in an auto_conf subfolder.
For more information about the container's lifecycle, see SUPERVISION.md.
This sub-section is only valid for the agent versions < 6.3.2 or when not using the Datadog Cluster Agent.
To deploy the Agent in your Kubernetes cluster, you can use the manifest in manifests. Firstly, make sure you have the correct RBAC in place. You can use the files in manifests/rbac that contain the minimal requirements to run the Kubernetes Cluster level checks and perform the leader election.
kubectl create -f manifests/rbac
Please note that with the above RBAC, every agent has access to the API Server to list pods, services, and so on. This access is no longer needed when using the Datadog Cluster Agent: the agents only have access to the local kubelet, and only the Cluster Agent can access cluster level information (nodes, services...).
Once the RBAC is in place, you can then create the agents with:
kubectl create -f manifests/agent.yaml
The manifest for the agent has the KUBERNETES environment variable enabled, which enables the collection of local kubernetes metrics via the kubelet's API. For the event collection and the API server check, please read below.
If you want the event collection to be resilient, you can create a ConfigMap datadogtoken that agents will use to save and share a state reflecting which events were pulled last.
To create such a ConfigMap, you can use the following command:
kubectl create -f manifests/datadog_configmap.yaml
See details in Event Collection.
This sub-section is only valid for agent versions >= 6.3.2 and when using the Datadog Cluster Agent.
Event collection is handled by the cluster agent and the RBAC for the agent is slimmed down to the kubelet's API access. There is now a dedicated ClusterRole for the agent, which should be as follows:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
rules:
- apiGroups: # Kubelet connectivity
  - ""
  resources:
  - nodes/metrics
  - nodes/spec
  - nodes/proxy
  verbs:
  - get
It goes along with the ClusterRoleBinding and the ServiceAccount dedicated to the Datadog agents.
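A minimal sketch of what the accompanying ServiceAccount and ClusterRoleBinding could look like is given below. The object names, the default namespace and the output path are assumptions made for illustration. Refer to the manifests shipped with the agent for the authoritative versions. The script only writes the file and does not contact a cluster.

```shell
#!/bin/sh
# Names, namespace and file location below are assumptions for illustration.
manifest="${TMPDIR:-/tmp}/datadog-agent-rbac.yaml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datadog-agent
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent
subjects:
- kind: ServiceAccount
  name: datadog-agent
  namespace: default
EOF

# Review the file, then apply it with: kubectl apply -f "$manifest"
echo "wrote $manifest"
```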
Similarly to Agent 5, Agent 6 collects events from the Kubernetes API server.
1/ Set the collect_kubernetes_events variable to true in the datadog.yaml file. You can use the environment variable DD_COLLECT_KUBERNETES_EVENTS for this.
2/ Give the agents proper RBACs to activate this feature. See the RBAC section.
3/ A ConfigMap can be used to store the event.tokenKey and the event.tokenTimestamp. It has to be deployed in the default namespace and be named datadogtoken.
Run kubectl create configmap datadogtoken --from-literal="event.tokenKey"="0".
You can also use the example in manifests/datadog_configmap.yaml: https://github.com/DataDog/datadog-agent/blob/master/Dockerfiles/manifests/datadog_configmap.yaml
Note: When the ConfigMap is used, if the agent in charge (via the Leader election) of collecting the events dies, the next leader elected will use the ConfigMap to identify the last events pulled. This avoids collecting duplicate events and puts less stress on the API Server.
Datadog Agent 6 supports a built-in leader election option for the Kubernetes event collector and the Kubernetes cluster related checks (i.e. the Control Plane service check).
This feature relies on Endpoints. You can enable it by setting the DD_LEADER_ELECTION environment variable to true. The Datadog Agents nevertheless need a set of actions to be allowed prior to their deployment.
See the RBAC section for more details and keep in mind that these RBAC entities will need to be created before the option is set.
Agents coordinate by performing a leader election among members of the Datadog DaemonSet through kubernetes to ensure only one leader agent instance is gathering events at a given time.
This functionality is disabled by default. Enabling the event collection activates it (see Event collection), which avoids collecting duplicate events and reduces stress on the API server.
The leaderLeaseDuration is the duration for which a leader stays elected. It should be > 30 seconds and is 60 seconds by default. The longer it is, the less frequently your agents hit the apiserver with requests, but it also means that if the leader dies (and under certain conditions), events can be missed until the lease expires and a new leader takes over.
It can be configured with the environment variable DD_LEADER_LEASE_DURATION.
If you are using the DCA, find all the RBAC for the agent as well as the Cluster agent here.
In the context of using the Kubernetes integration, and when deploying agents in a Kubernetes cluster, a set of rights are required for the agent to integrate seamlessly.
You need to allow the agent to perform a few actions:
- get and update of the ConfigMaps named datadogtoken, to update and query the most up-to-date version token corresponding to the latest event stored in etcd.
- list and watch of the Events, to pull the events from the API Server, format and submit them.
- get, update and create for the Endpoint. The Endpoint used by the agent for the Leader election feature is named datadog-leader-election.
- list the componentstatuses resource, in order to submit service checks for the Control Plane's components status.
You can find the templates in manifests/rbac here. This will create the Service Account in the default namespace, a Cluster Role with the above rights and the Cluster Role Binding.
The agent can collect node labels from the APIserver and report them as host tags. This feature is disabled by default, as it is usually redundant with cloud provider host tags. If you need to do so, you can provide a node label -> host tag mapping in the DD_KUBERNETES_NODE_LABELS_AS_TAGS environment variable. The format is the inline JSON described in the tagging section.
By default, the agent uses the kubernetes node name as an alias that can be used to forward metrics and events. This makes it possible to submit events and metrics from remote hosts.
However, if you have several clusters where some nodes could have similar node names, some host alias collisions could occur. To prevent those, the agent supports the use of a cluster-unique identifier (such as the actual cluster name), through the environment variable DD_CLUSTER_NAME. That identifier will be added to the node name as a host alias, and avoid collision issues altogether.
Our default configuration targets Kubernetes 1.7.6 and later, as we rely on features and endpoints introduced in this version. More installation steps are required for older versions:
- RBAC objects (ClusterRoles and ClusterRoleBindings) are available since Kubernetes 1.6 and OpenShift 1.3, but under different apiVersion prefixes:
  - rbac.authorization.k8s.io/v1 in Kubernetes 1.8+ (and OpenShift 3.9+), the default apiVersion we target
  - rbac.authorization.k8s.io/v1beta1 in Kubernetes 1.5 to 1.7 (and OpenShift 3.7)
  - v1 in OpenShift 1.3 to 3.6
  You can apply our yaml manifests with the following sed invocations:
sed "s%authorization.k8s.io/v1%authorization.k8s.io/v1beta1%" clusterrole.yaml | kubectl apply -f -
sed "s%authorization.k8s.io/v1%authorization.k8s.io/v1beta1%" clusterrolebinding.yaml | kubectl apply -f -
or for Openshift 1.3 to 3.6:
sed "s%rbac.authorization.k8s.io/v1%v1%" clusterrole.yaml | oc apply -f -
sed "s%rbac.authorization.k8s.io/v1%v1%" clusterrolebinding.yaml | oc apply -f -
- The kubelet check retrieves metrics from the Kubernetes 1.7.6+ (OpenShift 3.7.0+) prometheus endpoint. You need to enable cAdvisor port mode for older versions.
- Our default daemonset makes use of the downward API to pass the kubelet's IP to the agent. This only works on versions 1.7 and up. For older versions, here are other ways to enable kubelet connectivity:
  - On version 1.6, use fieldPath: spec.nodeName and make sure your node name is resolvable and reachable from the pod
  - If DD_KUBERNETES_KUBELET_HOST is unset, the agent will retrieve the node hostname from docker and try to connect there. See docker info | grep "Name:" and make sure the name is resolvable and reachable
  - If the IP of the docker default gateway is constant across your cluster, you can directly pass that IP in the DD_KUBERNETES_KUBELET_HOST envvar. You can retrieve the IP with the ip addr show | grep docker0 command.
- Our default configuration relies on bearer token authentication to the APIserver and kubelet. On 1.3, the kubelet does not support bearer token auth; you will need to set up client certificates for the datadog-agent serviceaccount and pass them to the agent.
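Before piping the sed invocations listed earlier to kubectl or oc, it can help to preview what the substitution produces. The sketch below runs the same v1 to v1beta1 substitution on a throwaway sample file. The sample manifest content and the temporary directory are placeholders, not the real manifests.

```shell
#!/bin/sh
# Work on a throwaway copy so that nothing is sent to the cluster.
workdir=$(mktemp -d)

cat > "$workdir/clusterrole.yaml" <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
EOF

# Same substitution as in the text, captured instead of piped to kubectl.
converted=$(sed "s%authorization.k8s.io/v1%authorization.k8s.io/v1beta1%" "$workdir/clusterrole.yaml")
printf '%s\n' "$converted"

rm -rf "$workdir"
```

Note that the substitution is not idempotent: applied to a file that already uses v1beta1, it would produce v1beta1beta1, so run it against the unmodified manifests only.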
The Datadog Agent can collect logs from containers starting with version 6. Two installations are possible:
- on the host: where the agent is external to the Docker environment
- or by deploying its containerized version in the Docker environment
To run a Docker container which embeds the Datadog Agent to monitor your host, use the following command:
docker run -d --name datadog-agent \
-e DD_API_KEY=<YOUR_API_KEY> \
-e DD_LOGS_ENABLED=true \
-e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
-e DD_AC_EXCLUDE="name:datadog-agent" \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /opt/datadog-agent/run:/opt/datadog-agent/run:rw \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
datadog/agent:latest
The parameters related to log collection are the following:
- -e DD_LOGS_ENABLED=true: enables log collection when set to true. The Agent looks for log instructions in configuration files.
- -e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true: adds a log configuration that enables log collection for all containers (see Option 1 below)
- -v /opt/datadog-agent/run:/opt/datadog-agent/run:rw: to make sure you do not lose any logs from containers during restarts or network issues, the last line collected for each container is stored in this directory on the host.
- -e DD_AC_EXCLUDE="name:datadog-agent": prevents the Datadog Agent from collecting and sending its own logs. Remove this parameter if you want to collect the Datadog Agent logs.
Important notes: Integration Pipelines and Processors will not be installed automatically, as the source and service are set to the docker generic value.
The source and service values can be overridden thanks to Autodiscovery as described below; it automatically installs integration Pipelines that parse your logs and extract all the relevant information from them.
The second step is to use Autodiscovery to customize the source and service value. This allows Datadog to identify the log source for each container.
Since version 6.2 of the Datadog Agent, you can configure log collection directly in the container labels. Pod annotations are also supported for Kubernetes environments, see the Kubernetes Autodiscovery documentation: https://docs.datadoghq.com/agent/autodiscovery/#template-source-kubernetes-pod-annotations
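As a hedged sketch of label-based configuration, the snippet below attaches a log configuration to a container through a Docker label. The label name com.datadoghq.ad.logs comes from Datadog's Autodiscovery documentation rather than from this document, so verify it against the version you run. The container name, image, and the source and service values are hypothetical. DOCKER defaults to echo so that the script only prints the arguments.

```shell
#!/bin/sh
# DOCKER defaults to "echo" so the script only prints the arguments;
# run it with DOCKER=docker to actually start the container.
DOCKER="${DOCKER:-echo}"

# Assumed label name (from the Autodiscovery documentation); the source and
# service values are placeholders for your own application.
logs_label='com.datadoghq.ad.logs=[{"source": "nginx", "service": "webapp"}]'

run_labeled_container() {
  "$DOCKER" run -d --name webapp --label "$logs_label" nginx:latest
}

run_labeled_container
```

Single quotes around the label keep the JSON intact when it is passed to docker run.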
You can build your own Debian package using inv agent.omnibus-build.
Then you can call inv agent.image-build, which takes the Debian package generated above and uses it to build the image.
To build the image, you'll need the agent Debian package that can be found on this APT listing here.
You'll need to download one of the datadog-agent*_amd64.deb packages in this directory; it will then be used by the Dockerfile and installed within the image.
You can then build the image using docker build -t datadog/agent:master .
To build the jmx variant, add --build-arg WITH_JMX=true to the build command.