[CXP-2151] Updates to Processes and Containers documentation #29409

Open
wants to merge 8 commits into master
31 changes: 17 additions & 14 deletions content/en/infrastructure/containers/_index.md
@@ -35,23 +35,25 @@ Coupled with [Docker][2], [Kubernetes][3], [ECS][4], and other container technol

## Setup

To display data on the Containers view, enable the Process Agent.
To display data on the Containers view, enable container collection.

{{< tabs >}}
{{% tab "Docker" %}}

Set the `DD_PROCESS_AGENT_ENABLED` env variable to `true`.
The Datadog Agent enables container collection in Docker environments by default.

For verification, ensure that `DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED` is set to `true`.

For example:

```
-v /etc/passwd:/etc/passwd:ro
-e DD_PROCESS_AGENT_ENABLED=true
-e DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED=true
```
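
As a fuller illustration, a `docker run` invocation might look like the following sketch (the image tag, API key placeholder, and volume mounts are assumptions; adapt them to your deployment):

```shell
# Sketch: run the Datadog Agent with container collection explicitly enabled
docker run -d --name datadog-agent \
  -e DD_API_KEY=<YOUR_DATADOG_API_KEY> \
  -e DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED=true \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -v /etc/passwd:/etc/passwd:ro \
  gcr.io/datadoghq/agent:7
```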
{{% /tab %}}
{{% tab "Datadog Operator" %}}

The Datadog Operator enables the Process Agent by default.
The Datadog Operator enables container collection by default.

For verification, ensure that `features.liveContainerCollection.enabled` is set to `true` in your `datadog-agent.yaml`:
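
A minimal `datadog-agent.yaml` with this feature enabled might look like the following sketch (assuming the `v2alpha1` `DatadogAgent` resource; required fields such as credentials are omitted):

```yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    liveContainerCollection:
      enabled: true
```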

@@ -73,38 +75,40 @@ spec:
{{% /tab %}}
{{% tab "Helm" %}}

If you are using the [official Helm chart][1], enable the `processAgent.enabled` parameter in your [`values.yaml`][2] file:
If you are using the [official Helm chart][1], container collection is enabled by default.

For verification, ensure that the `processAgent.containerCollection` parameter is set to `true` in your [`values.yaml`][2] file:

```yaml
datadog:
# (...)
processAgent:
enabled: true
containerCollection: true
```

Then, upgrade your Helm chart.
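
For example, a minimal upgrade command, assuming a release named `datadog` installed from the `datadog/datadog` chart:

```shell
helm upgrade datadog datadog/datadog -f values.yaml
```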

In some setups, the Process Agent and Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens, the feature does not start, and the following warning displays in the Cluster Agent log: `Orchestrator explorer enabled but no cluster name set: disabling.` In this case, you must set `datadog.clusterName` to your cluster name in `values.yaml`.
In some setups, the Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens, the feature does not start, and the following warning displays in the Cluster Agent log: `Orchestrator explorer enabled but no cluster name set: disabling.` In this case, you must set `datadog.clusterName` to your cluster name in `values.yaml`.
Author comment: Remove the mention of the dedicated Process Agent, as container collection no longer runs in the Process Agent by default.


```yaml
datadog:
#(...)
clusterName: <YOUR_CLUSTER_NAME>
#(...)
processAgent:
enabled: true
containerCollection: true
```

[1]: https://github.com/DataDog/helm-charts
[2]: https://github.com/DataDog/helm-charts/blob/master/charts/datadog/values.yaml
{{% /tab %}}
{{% tab "Amazon ECS" %}}

Update your Task Definitions with the following environment variable:
Update your task definitions with the following environment variable:

```json
{
"name": "DD_PROCESS_AGENT_ENABLED",
"name": "DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED",
"value": "true"
}
```
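
For context, the environment variable belongs in the `environment` array of a container definition; a sketch with assumed container and image names:

```json
{
  "containerDefinitions": [
    {
      "name": "datadog-agent",
      "image": "public.ecr.aws/datadog/agent:latest",
      "environment": [
        {
          "name": "DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED",
          "value": "true"
        }
      ]
    }
  ]
}
```
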
@@ -169,14 +173,13 @@ ECS containers are tagged by:
Kubernetes containers are tagged by:

* `pod_name`
* `kube_pod_ip`
* `kube_service`
* `kube_namespace`
* `kube_replica_set`
* `kube_daemon_set`
* `kube_job`
* `kube_deployment`
* `kube_cluster`
* `kube_cluster_name`

If you have a configuration for [Unified Service Tagging][7] in place, Datadog automatically picks up `env`, `service`, and `version` tags. Having these tags available lets you tie together APM, logs, metrics, and container data.
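
For example, a sketch of the standard Unified Service Tagging labels on a Kubernetes pod template (values are placeholders):

```yaml
metadata:
  labels:
    tags.datadoghq.com/env: "prod"
    tags.datadoghq.com/service: "my-service"
    tags.datadoghq.com/version: "1.2.3"
```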

@@ -194,7 +197,7 @@ You can switch between the "Scatter Plot" and "Timeseries" tabs in the collapsib

By default, the graph groups by the `short_image` tag key. The size of each dot represents the number of containers in that group, and clicking on a dot displays the individual containers and hosts that contribute to the group.

The query at the top of the scatter plot analytic allows you to control your scatter plot analytic:
The options at the top of the graph allow you to control your scatter plot analytic:

* Selection of metrics to display.
* Selection of the aggregation method for both metrics.
@@ -226,7 +229,7 @@ You can see indexed logs that you have chosen to index and persist by selecting

{{< img src="infrastructure/livecontainers/errorlogs.png" alt="Preview Logs Side panel" style="width:100%;">}}

## Notes and known issues
## Additional information
Author comment: Neither of these seems like an issue, so the section title is updated to match the one on the Processes page: https://docs.datadoghq.com/infrastructure/process/?tab=linuxwindows#additional-information


* Real-time (2s) data collection is turned off after 30 minutes. To resume real-time collection, refresh the page.
* RBAC settings can restrict Kubernetes metadata collection. See the [RBAC entities for the Datadog Agent][14].
31 changes: 17 additions & 14 deletions content/en/infrastructure/process/_index.md
@@ -186,13 +186,13 @@ For example:
}
```

To start collecting process information in ECS Fargate, add the [`PidMode` parameter][3] to the Task Definition and set it to `task` as follows:
To start collecting process information in ECS Fargate, add the [`pidMode` parameter][3] to the Task Definition and set it to `task` as follows:

```text
"pidMode": "task"
```
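
The parameter sits at the top level of the task definition; a sketch with assumed family, container, and image names:

```json
{
  "family": "my-application",
  "pidMode": "task",
  "containerDefinitions": [
    {
      "name": "datadog-agent",
      "image": "public.ecr.aws/datadog/agent:latest"
    },
    {
      "name": "my-application",
      "image": "<YOUR_APP_IMAGE>"
    }
  ]
}
```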

Once enabled, use the `AWS Fargate` Containers facet on the [Live Processes page][1] to filter processes by ECS, or enter `fargate:ecs` in the search query.
Once enabled, use the `AWS Fargate` Containers facet on the [Live Processes page][1] to filter for processes running in ECS, or enter `fargate:ecs` in the search query.

{{< img src="infrastructure/process/fargate_ecs.png" alt="Processes in AWS Fargate" >}}

Expand All @@ -207,7 +207,7 @@ For more information about installing the Datadog Agent with AWS ECS Fargate, se

### I/O stats

I/O and open files stats can be collected by the Datadog system-probe, which runs with elevated privileges. To enable the process module of the system-probe, use the following configuration:
I/O and open files stats can be collected by the Datadog system-probe, which runs with elevated privileges. To collect these stats, enable the process module of the system-probe:

1. Copy the system-probe example configuration:

@@ -232,8 +232,10 @@ I/O and open files stats can be collected by the Datadog system-probe, which run
**Note**: If the `systemctl` command is not available on your system, run the following command instead: `sudo service datadog-agent restart`
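
The setting applied in these steps typically amounts to enabling the process module in `system-probe.yaml`; a minimal sketch, assuming the `system_probe_config.process_config.enabled` key path:

```yaml
system_probe_config:
  process_config:
    enabled: true
```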


### Optimize footprint for process collection
By default, the Datadog Agent has a separate Process Agent for container and process collection. You can consolidate container and process collection to the core Agent if you're running a Linux environment.
### Optimized process collection footprint
As of Agent v7.65.0, container and process collection run in the core Datadog Agent by default on Linux, reducing the Agent's overall footprint.

To verify or make this behavior explicit, you can enable the feature in your configuration.

{{< tabs >}}
{{% tab "Helm" %}}
@@ -273,6 +275,8 @@ process_config:
{{% /tab %}}
{{< /tabs >}}

Explicitly setting the configuration flag to `false` causes container and process collection to run in the separate Process Agent. In non-Linux environments, container and process collection always run in the separate Process Agent.
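
A sketch of the opt-out in `datadog.yaml`, assuming the flag is `process_config.run_in_core_agent.enabled`:

```yaml
process_config:
  run_in_core_agent:
    enabled: false
```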


### Process arguments scrubbing

@@ -391,8 +395,8 @@ You can also filter your processes using Datadog [tags][3], such as `host`, `pod
Datadog automatically generates a `command` tag, so that you can filter for:

- Third-party software, for example: `command:mongod`, `command:nginx`
- Container management software, for example: `command:docker`, `command:kubelet`)
- Common workloads, for example: `command:ssh`, `command:CRON`)
- Container management software, for example: `command:docker`, `command:kubelet`
- Common workloads, for example: `command:ssh`, `command:CRON`

#### Containerized environment tags

@@ -405,14 +409,13 @@ Furthermore, processes in ECS containers are also tagged by:
Processes in Kubernetes containers are tagged by:

- `pod_name`
- `kube_pod_ip`
- `kube_service`
- `kube_namespace`
- `kube_replica_set`
- `kube_daemon_set`
- `kube_job`
- `kube_deployment`
- `Kube_cluster`
- `kube_cluster_name`

If you have configuration for [Unified Service Tagging][4] in place, `env`, `service`, and `version` are picked up automatically.
Having these tags available lets you tie together APM, logs, metrics, and process data.
@@ -422,24 +425,24 @@ Having these tags available lets you tie together APM, logs, metrics, and proces

You can create rule definitions to add manual tags to processes based on the command line.

1. On the **Manage Process Tags** tab, select _New Process Tag Rule_ button
1. On the **Manage Process Tags** tab, select the _New Process Tag Rule_ button
2. Select a process to use as a reference
3. Define the parsing and match criteria for your tag
4. If validation passes, create a new rule

After a rule is created, tags are available for all process command line values that match the rule criteria. These tags are be available in search and can be used in the definition of [Live Process Monitors][6] and [Custom Metrics][13].
After a rule is created, tags are available for all process command line values that match the rule criteria. These tags are available in search and can be used in the definition of [Live Process Monitors][6] and [Custom Metrics][13].

## Scatter plot

Use the scatter plot analytic to compare two metrics with one another in order to better understand the performance of your containers.

To access the scatter plot analytic [in the Processes page][5] click on the _Show Summary graph_ button the select the "Scatter Plot" tab:
To access the scatter plot analytic [in the Processes page][5], click the _Show Summary graph_ button, then select the "Scatter Plot" tab:

{{< img src="infrastructure/process/scatterplot_selection.png" alt="Scatter plot selection" style="width:60%;">}}

By default, the graph groups by the `command` tag key. The size of each dot represents the number of processes in that group, and clicking on a dot displays the individual pids and containers that contribute to the group.
By default, the graph groups by the `command` tag key. The size of each dot represents the number of processes in that group, and clicking on a dot displays the individual processes and containers that contribute to the group.

The query at the top of the scatter plot analytic allows you to control your scatter plot analytic:
The options at the top of the graph allow you to control your scatter plot analytic:

- Selection of metrics to display.
- Selection of the aggregation method for both metrics.
@@ -39,11 +39,11 @@ You can generate a new process-based metric directly from queries in the [**Live

{{< img src="infrastructure/process/process2metrics_create.png" alt="Create a process-based metric" style="width:80%;">}}

1. **Select tags to filter your query**: The query syntax is the same as for [Live Processes][2]. Only processes matching the scope of your filters are considered for aggregation. Text search filters are supported only on the Live Processes page.
1. **Select tags to filter your query**: The available tags are the same as for [Live Processes][2]. Only processes matching the scope of your filters are considered for aggregation. Text search filters are supported only on the Live Processes page.
Author comment (@kkhor-datadog, May 19, 2025): The query syntax is not the same. Boolean operators are not supported here, unlike in the Processes page.

2. **Select the measure you would like to track**: Enter a measure such as `Total CPU %` to aggregate a numeric value and create its corresponding `count`, `min`, `max`, `sum`, and `avg` aggregated metrics.
3. **Add tags to `group by`**: Select tags to be added as dimensions to your metrics, so they can be filtered, aggregated, and compared. By default, metrics generated from processes do not have any tags unless explicitly added. Any tag available for Live Processes queries can be used in this field.
4. **Name your metric**: Fill in the name of your metric. Process-based metrics always have the prefix _proc._ and suffix _[measure_selection]_.
5. **Add percentile aggregations**: Select the _Include percentile aggregations_ checkbox to generate p50, p75, p90, p95, and p99 percentiles. Percentile metrics are also considered customer metrics, and billed accordingly.
5. **Add percentile aggregations**: Select the _Include percentile aggregations_ checkbox to generate p50, p75, p90, p95, and p99 percentiles. Percentile metrics are also considered custom metrics, and billed accordingly.

You can create multiple metrics using the same query by selecting the **Create Another** checkbox at the bottom of the metric creation modal. When selected, the modal remains open after your metric has been created, with the filters and aggregation groups already filled in.

@@ -67,7 +67,7 @@ To change the metric type or name, a new metric must be created.

{{< img src="infrastructure/process/process2metrics_dashboard_widget.png" alt="Graphing process distribution metrics in dashboards" style="width:80%;">}}

Once created, you can use process distribution aggregate and percentile metrics like any other in Datadog. For instance:
Once created, you can use process-based metrics like any other in Datadog. For instance:

- Graph process-based metrics in dashboards and notebooks to track the historical resource consumption of important workloads
- Create threshold or anomaly-based monitors on top of process-based metrics to detect when CPU or RSS memory dips or spikes unexpectedly
13 changes: 7 additions & 6 deletions content/en/monitors/types/process.md
@@ -25,7 +25,7 @@ Live Processes and Live Process Monitoring are included in the Enterprise plan.

## Overview

Live Process Monitors are based on data collected by the [Process Agent][1]. Create monitors that warn or alert based on the count of any group of processes across hosts or tags.
Live Process Monitors are based on data collected by [Live Processes][1]. Create monitors that warn or alert based on the count of any group of processes across hosts or tags.

Live Process Monitors are best used in the following scenarios:

@@ -39,19 +39,19 @@ Live Process Monitors are best used in the following scenarios:
There are two ways to create a Live Process Monitor:

- Using the main navigation: **Monitors --> New Monitor --> Live Process**.
- On the [Live Process page][4], search for a process you want to monitor. Then click the dropdown menu next to **+New Metric** and click **Create monitor**.
- On the [Processes page][4], search for a process you want to monitor. Then click the dropdown menu next to **+New Metric** and click **Create monitor**.

### Select processes

You can use either tags or a fuzzy text search to filter across all processes in your infrastructure. Matching processes and counts are displayed below the search:

{{< img src="monitors/monitor_types/process/select_processes.png" alt="Select processes" style="width:90%;">}}

After defining your search, a graph is displayed above the search inputs with an approximation of the total number of processes found. It is recommended to keep your monitor scoped to a few thousand processes. Use additional tags to narrow the search down or consider splitting a monitor into multiple ones if needed. For more granular data, see the [Live Process page][4].
After defining your search, a graph is displayed above the search inputs with an approximation of the total number of processes found. It is recommended to keep your monitor scoped to a few thousand processes. Use additional tags to narrow the search down or consider splitting a monitor into multiple ones if needed. For more granular data, see the [Processes page][4].

#### Tags search

Filter processes to monitor by their tags. Datadog recommends trying to filter processes by their tags before using the full text search.
Filter processes to monitor by their tags. Datadog recommends trying to filter processes by their tags before using the full text search. If the existing tags are insufficient, you can define [custom tags][8] for your processes.

#### Full text search

@@ -77,13 +77,13 @@ If you cannot scope processes down to the granularity you would like using tags,
- The process count was `above`, `above or equal to`, `below`, or `below or equal to`
- the threshold during the last `5 minutes`, `15 minutes`, `1 hour`, or larger. Additionally, you can use `custom` to set a value between 5 minutes and 24 hours.

Process Count, in this case, refers to the number of all matching processes that were alive during the time interval.
The process count refers to the number of all matching processes that were alive during the time interval.

Use thresholds to set a numeric value for triggering an alert. Datadog has two types of notifications: alert and warning. Live Process Monitors recover automatically based on the alert or warning threshold.

#### Best practices for timeframe selection

Live Process Monitors use a [rolling time window][7] to evaluate process count. In other words, every minute, the monitor checks the past X minutes and triggers if the alerting condition is met. Using evaluation windows shorter than 5 minutes is discouraged in order to prevent any false positives due to sporadic network disruption between the Process Agent and Datadog.
Live Process Monitors use a [rolling time window][7] to evaluate process count. In other words, every minute, the monitor checks the past X minutes and triggers if the alerting condition is met. Using evaluation windows shorter than 5 minutes is discouraged in order to prevent any false positives due to sporadic network disruption between the Agent and Datadog.

### Advanced alert conditions

@@ -104,3 +104,4 @@ For detailed instructions on the **Configure notifications and automations** sec
[5]: /monitors/configuration/#advanced-alert-conditions
[6]: /monitors/notify/
[7]: /monitors/configuration/?tab=thresholdalert#evaluation-window
[8]: /infrastructure/process/#rules-to-create-custom-tags