
[ML] Inference request count telemetry per node #110947


Draft · wants to merge 10 commits into main

Conversation

@jonathan-buttner (Contributor) commented Jul 16, 2024

WIP

This PR wires up the telemetry code to record inference request counts per service, task type, and model id (when a model id is defined).

The inference section of the telemetry now looks like this:

"inference": {
        "available": true,
        "enabled": true,
        "models": [
            {
                "service": "cohere",
                "task_type": "RERANK",
                "count": 1
            },
            {
                "service": "cohere",
                "task_type": "TEXT_EMBEDDING",
                "count": 1
            },
            {
                "service": "openai",
                "task_type": "TEXT_EMBEDDING",
                "count": 1
            }
        ],
        "requests": [
            {
                "service": "cohere",
                "task_type": "rerank",
                "count": 1,
                "model_id": "rerank-english-v3.0"
            },
            {
                "service": "cohere",
                "task_type": "text_embedding",
                "count": 1,
                "model_id": "embed-english-v3.0"
            },
            {
                "service": "openai",
                "task_type": "text_embedding",
                "count": 1,
                "model_id": "text-embedding-3-small"
            }
        ]
    },
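As a rough illustration of the aggregation the `requests` section above implies, here is a self-contained sketch that groups request counts by (service, task_type, model_id). The class and method names here are hypothetical, for illustration only, and are not the PR's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical aggregator: groups inference requests by (service, task_type, model_id),
// mirroring the shape of the "requests" array in the telemetry above.
public class InferenceRequestStats {
    // Key for one telemetry bucket; model id may be null if it is not defined.
    record Key(String service, String taskType, String modelId) {}

    private final Map<Key, LongAdder> counts = new ConcurrentHashMap<>();

    // Called once per inference request.
    public void recordRequest(String service, String taskType, String modelId) {
        counts.computeIfAbsent(new Key(service, taskType, modelId), k -> new LongAdder())
              .increment();
    }

    // Returns the accumulated count for one bucket (0 if never recorded).
    public long count(String service, String taskType, String modelId) {
        LongAdder adder = counts.get(new Key(service, taskType, modelId));
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        InferenceRequestStats stats = new InferenceRequestStats();
        stats.recordRequest("cohere", "rerank", "rerank-english-v3.0");
        stats.recordRequest("openai", "text_embedding", "text-embedding-3-small");
        stats.recordRequest("openai", "text_embedding", "text-embedding-3-small");
        System.out.println(stats.count("openai", "text_embedding", "text-embedding-3-small")); // 2
        System.out.println(stats.count("cohere", "rerank", "rerank-english-v3.0")); // 1
    }
}
```

The point of the record key is that two requests to the same service, task type, and model collapse into a single counter, which is what produces one array entry per distinct combination.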

TODOs

  • Spin up a cluster and test that the APM metering works as expected
  • Make the changes in the telemetry repo to index the inference usage in a new index

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team v8.16.0 labels Jul 16, 2024
@jonathan-buttner jonathan-buttner added the cloud-deploy Publish cloud docker image for Cloud-First-Testing label Jul 16, 2024
@jonathan-buttner (Contributor Author)

@elasticmachine merge upstream

@prwhelan (Member)

@elasticmachine update branch

@elasticmachine (Collaborator)

merge conflict between base and head

@prwhelan (Member)

@elasticmachine update branch

@prwhelan (Member)

Tested with APM:

{
  "_index": ".ds-metrics-apm.app.elasticsearch-default-2024.07.22-000001",
  ...
  "data_stream": {
    "dataset": "apm.app.elasticsearch",
    "namespace": "default",
    "type": "metrics"
  },
  "es": {
    "inference": {
      "requests": {
        "count": {
          "total": 8
        }
      }
    }
  },
  ...
  "labels": {
    "model_id": ".elser_model_2_linux-x86_64",
    "otel_instrumentation_scope_name": "elasticsearch",
    "service": "elser",
    "task_type": "sparse_embedding"
  },
  ...
}
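In the APM document above, all increments carrying an identical label set accumulate into a single time series (`es.inference.requests.count.total` of 8 for one model). A minimal sketch of that label-keyed behavior, assuming a hypothetical `LabeledCounter` rather than Elasticsearch's actual metering API:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical label-keyed counter: increments with an identical label map
// accumulate into one series, as in the es.inference.requests.count.total metric above.
public class LabeledCounter {
    private final Map<Map<String, String>, LongAdder> series = new ConcurrentHashMap<>();

    public void increment(Map<String, String> labels) {
        // Copy into a TreeMap so equal label sets always hash to the same key.
        series.computeIfAbsent(new TreeMap<>(labels), k -> new LongAdder()).increment();
    }

    public long total(Map<String, String> labels) {
        LongAdder adder = series.get(new TreeMap<>(labels));
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        LabeledCounter counter = new LabeledCounter();
        Map<String, String> labels = Map.of(
            "service", "elser",
            "task_type", "sparse_embedding",
            "model_id", ".elser_model_2_linux-x86_64");
        for (int i = 0; i < 8; i++) {
            counter.increment(labels);
        }
        System.out.println(counter.total(labels)); // 8
    }
}
```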

Labels: cloud-deploy (Publish cloud docker image for Cloud-First-Testing), :ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v9.1.0