docs/running.md: 1 addition & 24 deletions
@@ -19,7 +19,7 @@ sudo docker run \

 cAdvisor is now running (in the background) on `http://localhost:8080/`. The setup includes directories with Docker state cAdvisor needs to observe.

-**Note**:
+**Note**:
 - If docker daemon is running with [user namespace enabled](https://docs.docker.com/engine/reference/commandline/dockerd/#starting-the-daemon-with-user-namespaces-enabled),
 you need to add `--userns=host` option in order for cAdvisor to monitor Docker containers,
 otherwise cAdvisor can not connect to docker daemon.
@@ -122,26 +122,3 @@ cAdvisor is now running (in the foreground) on `http://localhost:8080/`.
 ## Runtime Options

 cAdvisor has a series of flags that can be used to configure its runtime behavior. More details can be found in runtime [options](runtime_options.md).
-
-## Hardware Accelerator Monitoring
-
-cAdvisor can export some metrics for hardware accelerators attached to containers.
-Currently only Nvidia GPUs are supported. There are no machine level metrics.
-So, metrics won't show up if no container with accelerators attached is running.
-Metrics will only show up if accelerators are explicitly attached to the container, e.g., by passing `--device /dev/nvidia0:/dev/nvidia0` flag to docker.
-If nothing is explicitly attached to the container, metrics will NOT show up. This can happen when you access accelerators from privileged containers.
-
-There are two things that cAdvisor needs to show Nvidia GPU metrics:
-- access to NVML library (`libnvidia-ml.so.1`).
-- access to the GPU devices.
-
-If you are running cAdvisor inside a container, you will need to do the following to give the container access to NVML library:
-```
--e LD_LIBRARY_PATH=<path-where-nvml-is-present>
---volume <above-path>:<above-path>
-```
-
-If you are running cAdvisor inside a container, you can do one of the following to give it access to the GPU devices:
-- Run with `--privileged`
-- If you are on docker v17.04.0-ce or above, run with `--device-cgroup-rule 'c 195:* mrw'`
-- Run with `--device /dev/nvidiactl:/dev/nvidiactl /dev/nvidia0:/dev/nvidia0 /dev/nvidia1:/dev/nvidia1 <and-so-on-for-all-nvidia-devices>`
docs/runtime_options.md: 12 additions & 12 deletions
@@ -10,7 +10,7 @@ This document describes a set of runtime flags available in cAdvisor.

 * `--env_metadata_whitelist`: a comma-separated list of environment variable keys that needs to be collected for containers, only support containerd and docker runtime for now.

-## Limiting which containers are monitored
+## Limiting which containers are monitored
 * `--docker_only=false` - do not report raw cgroup metrics, except the root cgroup.
 * `--raw_cgroup_prefix_whitelist` - a comma-separated list of cgroup path prefix that needs to be collected even when `--docker_only` is specified
@@ -134,8 +134,8 @@ cAdvisor stores the latest historical data in memory. How long of a history it s
 --application_metrics_count_limit=100: Max number of application metrics to store (per container) (default 100)
 --collector_cert="": Collector's certificate, exposed to endpoints for certificate based authentication.
 --collector_key="": Key for the collector's certificate
---disable_metrics=<metrics>: comma-separated list of metrics to be disabled. Options are accelerator,advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp. (default advtcp,cpu_topology,cpuset,hugetlb,memory_numa,process,referenced_memory,resctrl,sched,tcp,udp)
---enable_metrics=<metrics>: comma-separated list of metrics to be enabled. If set, overrides 'disable_metrics'. Options are accelerator,advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp.
+--disable_metrics=<metrics>: comma-separated list of metrics to be disabled. Options are advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,psi_avg,psi_total,referenced_memory,resctrl,sched,tcp,udp. (default advtcp,cpu_topology,cpuset,hugetlb,memory_numa,process,referenced_memory,resctrl,sched,tcp,udp)
+--enable_metrics=<metrics>: comma-separated list of metrics to be enabled. If set, overrides 'disable_metrics'. Options are advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,psi_avg,psi_total,referenced_memory,resctrl,sched,tcp,udp.
 --prometheus_endpoint="/metrics": Endpoint to expose Prometheus metrics on (default "/metrics")
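
To make the interaction between the two changed flags concrete, here is a minimal Python sketch of how an effective metric set could be resolved. It is not cAdvisor's code: the helper names `parse_metrics` and `effective_metrics` are hypothetical, and the rule that a non-empty `enable_metrics` value fully replaces the `disable_metrics` default is an assumption read from the "If set, overrides 'disable_metrics'" wording. The option and default lists are copied from the added (`+`) lines of this hunk.

```python
# Option names as listed on the "+" side of this hunk (no "accelerator",
# with "psi_avg" and "psi_total" added).
ALL_METRICS = frozenset(
    "advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,"
    "memory_numa,network,oom_event,percpu,perf_event,process,psi_avg,"
    "psi_total,referenced_memory,resctrl,sched,tcp,udp".split(",")
)

# Default value of --disable_metrics as documented in the hunk.
DEFAULT_DISABLED = frozenset(
    "advtcp,cpu_topology,cpuset,hugetlb,memory_numa,process,"
    "referenced_memory,resctrl,sched,tcp,udp".split(",")
)


def parse_metrics(value):
    """Parse a comma-separated flag value and reject unknown metric names."""
    names = {name.strip() for name in value.split(",") if name.strip()}
    unknown = names - ALL_METRICS
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    return names


def effective_metrics(enable="", disable=None):
    """Return the set of metrics that would be collected (assumed semantics)."""
    if enable:
        # Assumption: a non-empty enable list wins over any disable list.
        return parse_metrics(enable)
    disabled = DEFAULT_DISABLED if disable is None else parse_metrics(disable)
    return set(ALL_METRICS - disabled)
```

Under these assumptions, `effective_metrics(enable="psi_avg,psi_total")` yields only the two PSI metrics, while passing the removed `accelerator` name raises `ValueError`.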
 * [Intel® 64 and IA32 Architectures Performance Monitoring Events](https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia32-architectures-performance-monitoring-events.html)


 ##### Uncore Events configuration
 Uncore Event name should be in form `PMU_PREFIX/event_name` where **PMU_PREFIX** mean
 that statistics would be counted on all PMUs with that prefix in name.

-Let's explain this by example:
+Let's explain this by example:

 ```json
 {
@@ -260,7 +260,7 @@ Let's explain this by example:
 "uncore_imc_0/cas_count_write",
 "cas_count_all"
 ],
-"custom_events": [
+"custom_events": [
 {
 "config": [
 "0x304"
@@ -419,11 +419,11 @@ See example configuration below:
 ```

 In the example above:
-* `instructions` will be measured as a non-grouped event and is specified using human friendly interface that can be
-obtained by calling `perf list`. You can use any name that appears in the output of `perf list` command. This is
+* `instructions` will be measured as a non-grouped event and is specified using human friendly interface that can be
+obtained by calling `perf list`. You can use any name that appears in the output of `perf list` command. This is
 interface that majority of users will rely on.
 * `instructions_retired` will be measured as non-grouped event and is specified using an advanced API that allows
-to specify any perf event available (some of them are not named and can't be specified with plain string). Event name
+to specify any perf event available (some of them are not named and can't be specified with plain string). Event name
 should be a human readable string that will become a metric name.
 * `cas_count_read` will be measured as uncore non-grouped event on all Integrated Memory Controllers Performance Monitoring Units because of unset `type` field and
 `uncore_imc` prefix.
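
The `PMU_PREFIX/event_name` rule described above can be illustrated with a short Python sketch. This is only an approximation of the documented behavior, not cAdvisor's implementation: the function names `split_uncore_event` and `matching_pmus` are hypothetical, and the PMU names in the usage note are made up for illustration.

```python
def split_uncore_event(name):
    """Split 'PMU_PREFIX/event_name' into (pmu_prefix, event_name)."""
    prefix, sep, event = name.partition("/")
    if not sep or not prefix or not event:
        raise ValueError(f"expected PMU_PREFIX/event_name, got {name!r}")
    return prefix, event


def matching_pmus(event_name, available_pmus):
    """Return every PMU whose name starts with the event's PMU prefix."""
    prefix, _ = split_uncore_event(event_name)
    return sorted(pmu for pmu in available_pmus if pmu.startswith(prefix))
```

With a hypothetical PMU list of `uncore_imc_0`, `uncore_imc_1` and `uncore_cbox_0`, the event `uncore_imc/cas_count_read` would select both `uncore_imc_*` units, while `uncore_imc_0/cas_count_write` would select only `uncore_imc_0`.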
@@ -435,7 +435,7 @@ Resctrl file system is not hierarchical like cgroups, so users should set `--doc

 ```
 --resctrl_interval=0: Resctrl mon groups updating interval. Zero value disables updating mon groups.