diff --git a/content/en/infrastructure/containers/_index.md b/content/en/infrastructure/containers/_index.md
index 6f425f65c790a..f96b7cdc7a0a4 100644
--- a/content/en/infrastructure/containers/_index.md
+++ b/content/en/infrastructure/containers/_index.md
@@ -35,23 +35,25 @@ Coupled with [Docker][2], [Kubernetes][3], [ECS][4], and other container technol
 
 ## Setup
 
-To display data on the Containers view, enable the Process Agent.
+To display data on the Containers view, enable container collection.
 
 {{< tabs >}}
 {{% tab "Docker" %}}
 
-Set the `DD_PROCESS_AGENT_ENABLED` env variable to `true`.
+The Datadog Agent enables container collection in Docker environments by default.
+
+For verification, ensure that `DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED` is set to `true`. For example:
 
 ```
 -v /etc/passwd:/etc/passwd:ro
--e DD_PROCESS_AGENT_ENABLED=true
+-e DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED=true
 ```
 
 {{% /tab %}}
 {{% tab "Datadog Operator" %}}
 
-The Datadog Operator enables the Process Agent by default.
+The Datadog Operator enables container collection by default.
 
 For verification, ensure that `features.liveContainerCollection.enabled` is set to `true` in your `datadog-agent.yaml`:
 
@@ -73,18 +75,20 @@ spec:
 
 {{% /tab %}}
 {{% tab "Helm" %}}
 
-If you are using the [official Helm chart][1], enable the `processAgent.enabled` parameter in your [`values.yaml`][2] file:
+If you are using the [official Helm chart][1], container collection is enabled by default.
+
+For verification, ensure that the `processAgent.containerCollection` parameter is set to `true` in your [`values.yaml`][2] file:
 
 ```yaml
 datadog:
   # (...)
   processAgent:
-    enabled: true
+    containerCollection: true
 ```
 
 Then, upgrade your Helm chart.
 
-In some setups, the Process Agent and Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens, the feature does not start, and the following warning displays in the Cluster Agent log: `Orchestrator explorer enabled but no cluster name set: disabling.` In this case, you must set `datadog.clusterName` to your cluster name in `values.yaml`.
+In some setups, the Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens, the feature does not start, and the following warning displays in the Cluster Agent log: `Orchestrator explorer enabled but no cluster name set: disabling.` In this case, you must set `datadog.clusterName` to your cluster name in `values.yaml`.
 
 ```yaml
 datadog:
@@ -92,7 +96,7 @@ datadog:
   clusterName:
   #(...)
   processAgent:
-    enabled: true
+    containerCollection: true
 ```
 
 [1]: https://github.com/DataDog/helm-charts
@@ -100,11 +104,11 @@ datadog:
 {{% /tab %}}
 {{% tab "Amazon ECS" %}}
 
-Update your Task Definitions with the following environment variable:
+Update your task definitions with the following environment variable:
 
 ```json
 {
-    "name": "DD_PROCESS_AGENT_ENABLED",
+    "name": "DD_PROCESS_CONFIG_CONTAINER_COLLECTION_ENABLED",
     "value": "true"
 }
 ```
@@ -169,14 +173,13 @@ ECS containers are tagged by:
 
 Kubernetes containers are tagged by:
 
 * `pod_name`
-* `kube_pod_ip`
 * `kube_service`
 * `kube_namespace`
 * `kube_replica_set`
 * `kube_daemon_set`
 * `kube_job`
 * `kube_deployment`
-* `kube_cluster`
+* `kube_cluster_name`
 
 If you have a configuration for [Unified Service Tagging][7] in place, Datadog automatically picks up `env`, `service`, and `version` tags. Having these tags available lets you tie together APM, logs, metrics, and container data.
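+
+For example, in Kubernetes you can apply these tags with the standard `tags.datadoghq.com` pod labels. A minimal sketch, assuming a Deployment (the workload name and tag values are placeholders):
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: web-store                          # placeholder workload name
+spec:
+  template:
+    metadata:
+      labels:
+        # Unified Service Tagging labels; values are illustrative
+        tags.datadoghq.com/env: "prod"
+        tags.datadoghq.com/service: "web-store"
+        tags.datadoghq.com/version: "1.2.0"
+    # (...) rest of the pod template (containers, and so on)
+```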
@@ -194,7 +197,7 @@ You can switch between the "Scatter Plot" and "Timeseries" tabs in the collapsib
 
 By default, the graph groups by the `short_image` tag key. The size of each dot represents the number of containers in that group, and clicking on a dot displays the individual containers and hosts that contribute to the group.
 
-The query at the top of the scatter plot analytic allows you to control your scatter plot analytic:
+The options at the top of the graph allow you to control your scatter plot analytic:
 
 * Selection of metrics to display.
 * Selection of the aggregation method for both metrics.
@@ -226,7 +229,7 @@ You can see indexed logs that you have chosen to index and persist by selecting
 
 {{< img src="infrastructure/livecontainers/errorlogs.png" alt="Preview Logs Side panel" style="width:100%;">}}
 
-## Notes and known issues
+## Additional information
 
 * Real-time (2s) data collection is turned off after 30 minutes. To resume real-time collection, refresh the page.
 * RBAC settings can restrict Kubernetes metadata collection. See the [RBAC entities for the Datadog Agent][14].
diff --git a/content/en/infrastructure/process/_index.md b/content/en/infrastructure/process/_index.md
index 375aae9818c48..ac370585ec30f 100644
--- a/content/en/infrastructure/process/_index.md
+++ b/content/en/infrastructure/process/_index.md
@@ -186,13 +186,13 @@ For example:
 }
 ```
 
-To start collecting process information in ECS Fargate, add the [`PidMode` parameter][3] to the Task Definition and set it to `task` as follows:
+To start collecting process information in ECS Fargate, add the [`pidMode` parameter][3] to the task definition and set it to `task` as follows:
 
 ```text
 "pidMode": "task"
 ```
 
-Once enabled, use the `AWS Fargate` Containers facet on the [Live Processes page][1] to filter processes by ECS, or enter `fargate:ecs` in the search query.
+Once enabled, use the `AWS Fargate` Containers facet on the [Live Processes page][1] to filter for processes running in ECS, or enter `fargate:ecs` in the search query.
 
 {{< img src="infrastructure/process/fargate_ecs.png" alt="Processes in AWS Fargate" >}}
 
@@ -207,7 +207,7 @@ For more information about installing the Datadog Agent with AWS ECS Fargate, se
 
 ### I/O stats
 
-I/O and open files stats can be collected by the Datadog system-probe, which runs with elevated privileges. To enable the process module of the system-probe, use the following configuration:
+I/O and open file stats can be collected by the Datadog system-probe, which runs with elevated privileges. To collect these stats, enable the process module of the system-probe:
 
 1. Copy the system-probe example configuration:
 
@@ -232,8 +232,10 @@ I/O and open files stats can be collected by the Datadog system-probe, which run
 
 **Note**: If the `systemctl` command is not available on your system, run the following command instead: `sudo service datadog-agent restart`
 
-### Optimize footprint for process collection
-By default, the Datadog Agent has a separate Process Agent for container and process collection. You can consolidate container and process collection to the core Agent if you're running a Linux environment.
+### Optimized process collection footprint
+As of Agent v7.65.0, container and process collection run in the core Datadog Agent by default on Linux, reducing the Agent's overall footprint.
+
+To verify this behavior, you can explicitly enable the feature:
 {{< tabs >}}
 {{% tab "Helm" %}}
 
@@ -273,6 +275,8 @@ process_config:
 {{% /tab %}}
 {{< /tabs >}}
 
+Explicitly setting this configuration flag to `false` causes container and process collection to run in the separate Process Agent. Note that in non-Linux environments, container and process collection always runs in the separate Process Agent.
+
 
 ### Process arguments scrubbing
 
@@ -391,8 +395,8 @@ You can also filter your processes using Datadog [tags][3], such as `host`, `pod
 
 Datadog automatically generates a `command` tag, so that you can filter for:
 
 - Third-party software, for example: `command:mongod`, `command:nginx`
-- Container management software, for example: `command:docker`, `command:kubelet`)
-- Common workloads, for example: `command:ssh`, `command:CRON`)
+- Container management software, for example: `command:docker`, `command:kubelet`
+- Common workloads, for example: `command:ssh`, `command:CRON`
 
 #### Containerized environment tags
 
@@ -405,14 +409,13 @@ Furthermore, processes in ECS containers are also tagged by:
 
 Processes in Kubernetes containers are tagged by:
 - `pod_name`
-- `kube_pod_ip`
 - `kube_service`
 - `kube_namespace`
 - `kube_replica_set`
 - `kube_daemon_set`
 - `kube_job`
 - `kube_deployment`
-- `Kube_cluster`
+- `kube_cluster_name`
 
 If you have configuration for [Unified Service Tagging][4] in place, `env`, `service`, and `version` are picked up automatically.
 Having these tags available lets you tie together APM, logs, metrics, and process data.
 
@@ -422,24 +425,24 @@ Having these tags available lets you tie together APM, logs, metrics, and proces
 
 You can create rule definitions to add manual tags to processes based on the command line.
 
-1. On the **Manage Process Tags** tab, select _New Process Tag Rule_ button
+1. On the **Manage Process Tags** tab, select the _New Process Tag Rule_ button
 2. Select a process to use as a reference
 3. Define the parsing and match criteria for your tag
 4. If validation passes, create a new rule
 
-After a rule is created, tags are available for all process command line values that match the rule criteria. These tags are be available in search and can be used in the definition of [Live Process Monitors][6] and [Custom Metrics][13].
+After a rule is created, tags are available for all process command line values that match the rule criteria. These tags are available in search and can be used in the definition of [Live Process Monitors][6] and [Custom Metrics][13].
 
 ## Scatter plot
 
 Use the scatter plot analytic to compare two metrics with one another in order to better understand the performance of your containers.
 
-To access the scatter plot analytic [in the Processes page][5] click on the _Show Summary graph_ button the select the "Scatter Plot" tab:
+To access the scatter plot analytic on the [Processes page][5], click the _Show Summary graph_ button, then select the "Scatter Plot" tab:
 
 {{< img src="infrastructure/process/scatterplot_selection.png" alt="Scatter plot selection" style="width:60%;">}}
 
-By default, the graph groups by the `command` tag key. The size of each dot represents the number of processes in that group, and clicking on a dot displays the individual pids and containers that contribute to the group.
+By default, the graph groups by the `command` tag key. The size of each dot represents the number of processes in that group, and clicking on a dot displays the individual processes and containers that contribute to the group.
 
-The query at the top of the scatter plot analytic allows you to control your scatter plot analytic:
+The options at the top of the graph allow you to control your scatter plot analytic:
 
 - Selection of metrics to display.
 - Selection of the aggregation method for both metrics.
@@ -491,7 +494,7 @@ You can customize integration views (for example, when aggregating a query for N
 
 ## Processes across the platform
 
-### Live containers
+### Live Containers
 
 Live Processes adds extra visibility to your container deployments by monitoring the processes running on each of your containers. Click on a container in the [Live Containers][9] page to view its process tree, including the commands it is running and their resource consumption. Use this data alongside other container metrics to determine the root cause of failing containers or deployments.
 
diff --git a/content/en/infrastructure/process/increase_process_retention.md b/content/en/infrastructure/process/increase_process_retention.md
index ff77d84b9b367..c28e3531a68fe 100644
--- a/content/en/infrastructure/process/increase_process_retention.md
+++ b/content/en/infrastructure/process/increase_process_retention.md
@@ -39,11 +39,11 @@ You can generate a new process-based metric directly from queries in the [**Live
 
 {{< img src="infrastructure/process/process2metrics_create.png" alt="Create a process-based metric" style="width:80%;">}}
 
-1. **Select tags to filter your query**: The query syntax is the same as for [Live Processes][2]. Only processes matching the scope of your filters are considered for aggregation. Text search filters are supported only on the Live Processes page.
+1. **Select tags to filter your query**: The available tags are the same as for [Live Processes][2]. Only processes matching the scope of your filters are considered for aggregation. Text search filters are supported only on the Live Processes page.
 2. **Select the measure you would like to track**: Enter a measure such as `Total CPU %` to aggregate a numeric value and create its corresponding `count`, `min`, `max`, `sum`, and `avg` aggregated metrics.
 3. **Add tags to `group by`**: Select tags to be added as dimensions to your metrics, so they can be filtered, aggregated, and compared. By default, metrics generated from processes do not have any tags unless explicitly added. Any tag available for Live Processes queries can be used in this field.
 4. **Name your metric**: Fill in the name of your metric. Process-based metrics always have the prefix _proc._ and suffix _[measure_selection]_.
-5. **Add percentile aggregations**: Select the _Include percentile aggregations_ checkbox to generate p50, p75, p90, p95, and p99 percentiles. Percentile metrics are also considered customer metrics, and billed accordingly.
+5. **Add percentile aggregations**: Select the _Include percentile aggregations_ checkbox to generate p50, p75, p90, p95, and p99 percentiles. Percentile metrics are also considered custom metrics and are billed accordingly.
 
 You can create multiple metrics using the same query by selecting the **Create Another** checkbox at the bottom of the metric creation modal. When selected, the modal remains open after your metric has been created, with the filters and aggregation groups already filled in.
 
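+As a rough sketch of the naming, suppose you create a metric named `web_cache` that tracks `Total CPU %`. The generated metric would look something like this (the name and measure suffix here are illustrative, not exact):
+
+```text
+proc.web_cache.total_cpu_pct
+```
+
+You could then query it with the `count`, `min`, `max`, `sum`, and `avg` aggregators, for example `avg:proc.web_cache.total_cpu_pct{*} by {command}`, or with percentile aggregators such as `p95:` if percentile aggregations are enabled.
+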
{{< img src="infrastructure/process/process2metrics_dashboard_widget.png" alt="Graphing process distribution metrics in dashboards" style="width:80%;">}} -Once created, you can use process distribution aggregate and percentile metrics like any other in Datadog. For instance: +Once created, you can use process-based metrics like any other in Datadog. For instance: - Graph process-based metrics in dashboards and notebooks to track the historical resource consumption of important workloads - Create threshold or anomaly-based monitors on top of process-based metrics to detect when CPU or RSS memory dips or spikes unexpectedly diff --git a/content/en/monitors/types/process.md b/content/en/monitors/types/process.md index 3441f2801dff6..4010052aeac93 100644 --- a/content/en/monitors/types/process.md +++ b/content/en/monitors/types/process.md @@ -25,7 +25,7 @@ Live Processes and Live Process Monitoring are included in the Enterprise plan. ## Overview -Live Process Monitors are based on data collected by the [Process Agent][1]. Create monitors that warn or alert based on the count of any group of processes across hosts or tags. +Live Process Monitors are based on data collected by [Live Processes][1]. Create monitors that warn or alert based on the count of any group of processes across hosts or tags. Live Process Monitors are best used in the following scenarios: @@ -39,7 +39,7 @@ Live Process Monitors are best used in the following scenarios: There are two ways to create a Live Process Monitor: - Using the main navigation: **Monitors --> New Monitor --> Live Process**. -- On the [Live Process page][4], search for a process you want to monitor. Then click the dropdown menu next to **+New Metric** and click **Create monitor**. +- On the [Processes page][4], search for a process you want to monitor. Then click the dropdown menu next to **+New Metric** and click **Create monitor**. ### Select processes @@ -47,11 +47,11 @@ You can use either tags or a fuzzy text search to filter across all processes in {{< img src="monitors/monitor_types/process/select_processes.png" alt="Select processes" style="width:90%;">}} -After defining your search, a graph is displayed above the search inputs with an approximation of the total number of processes found. It is recommended to keep your monitor scoped to a few thousand processes. Use additional tags to narrow the search down or consider splitting a monitor into multiple ones if needed. For more granular data, see the [Live Process page][4]. +After defining your search, a graph is displayed above the search inputs with an approximation of the total number of processes found. It is recommended to keep your monitor scoped to a few thousand processes. Use additional tags to narrow the search down or consider splitting a monitor into multiple ones if needed. For more granular data, see the [Processes page][4]. #### Tags search -Filter processes to monitor by their tags. Datadog recommends trying to filter processes by their tags before using the full text search. +Filter processes to monitor by their tags. Datadog recommends trying to filter processes by their tags before using the full text search. If the existing tags are insufficient, you can define [custom tags][8] for your processes. 
 
 #### Full text search
 
@@ -77,13 +77,13 @@ If you cannot scope processes down to the granularity you would like using tags,
 
 - The process count was `above`, `above or equal to`, `below`, or `below or equal to`
 - the threshold during the last `5 minutes`, `15 minutes`, `1 hour`, or larger. Additionally, you can use `custom` to set a value between 5 minutes and 24 hours.
 
-Process Count, in this case, refers to the number of all matching processes that were alive during the time interval.
+The process count refers to the number of matching processes that were alive during the time interval.
 
 Use thresholds to set a numeric value for triggering an alert. Datadog has two types of notifications: alert and warning. Live Process Monitors recover automatically based on the alert or warning threshold.
 
 #### Best practices for timeframe selection
 
-Live Process Monitors use a [rolling time window][7] to evaluate process count. In other words, every minute, the monitor checks the past X minutes and triggers if the alerting condition is met. Using evaluation windows shorter than 5 minutes is discouraged in order to prevent any false positives due to sporadic network disruption between the Process Agent and Datadog.
+Live Process Monitors use a [rolling time window][7] to evaluate process count. In other words, every minute, the monitor checks the past X minutes and triggers if the alerting condition is met. Using evaluation windows shorter than 5 minutes is discouraged, to prevent false positives due to sporadic network disruption between the Agent and Datadog.
 
 ### Advanced alert conditions
 
@@ -104,3 +104,4 @@ For detailed instructions on the **Configure notifications and automations** sec
 [5]: /monitors/configuration/#advanced-alert-conditions
 [6]: /monitors/notify/
 [7]: /monitors/configuration/?tab=thresholdalert#evaluation-window
+[8]: /infrastructure/process/#rules-to-create-custom-tags
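+
+As an illustration of how these conditions come together, the underlying query for a Live Process Monitor looks something like the following sketch. This assumes the process-alert query grammar used by the Monitors API; the search string, scope, and threshold are placeholders:
+
+```text
+processes('nginx').over('env:prod').rollup('count').last('5m') < 1
+```
+
+Read this as: count the processes matching `nginx` in `env:prod` over a rolling 5-minute window, and alert when the count drops below 1.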