feat(seer grouping): Call Seer before creating a new group #71026

lobsterkatie · 2024-05-16T16:33:32Z

This uses the helpers added in #70999 to - depending on the state of the `projects:similarity-embeddings-metadata` and `projects:similarity-embeddings-grouping` flags - decide whether we should call Seer before creating a new group, make the API call if so, and then store the results and/or use them to actually prevent new group creation in favor of using an existing similar issue. The behavior is as follows:

| metadata  | grouping | call  | metadata in | metadata in | use Seer-matched |
|   flag    |  flag    | Seer? |   event?    |   group?    |  group, if any?  |
|-----------|----------|-------|-------------|-------------|------------------|
| off       | off      | no    | -           | -           | -                |
| on        | off      | yes   | yes *       | yes         | no               |
| on or off | on       | yes   | yes *       | only if new | yes              |

* For now, the only event with the data will be the event which triggers the Seer 
call, not subsequent events with that hash. In the long run we will probably need
to store the data on the `GroupHash` record itself. 
See https://github.com/getsentry/sentry/issues/70454.

This should be enough for us to run a POC on S4S and measure the effect on grouping.
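
The flag table above can be read as a small decision function. The following is a minimal sketch of that reading, not the code in this PR: `SeerBehavior` and `seer_behavior` are hypothetical names, and the string values simply mirror the table cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeerBehavior:
    """Hypothetical container for one row of the table above."""

    call_seer: bool
    store_metadata_in_event: bool
    # Mirrors the "metadata in group?" column: "yes", "only_if_new", or "no"
    store_metadata_in_group: str
    use_seer_matched_group: bool


def seer_behavior(metadata_flag: bool, grouping_flag: bool) -> SeerBehavior:
    """Map the two feature flags to the behavior described in the table."""
    if grouping_flag:
        # With the grouping flag on, the metadata flag doesn't matter
        return SeerBehavior(True, True, "only_if_new", True)
    if metadata_flag:
        # Metadata only: call Seer and record results, but don't regroup
        return SeerBehavior(True, True, "yes", False)
    # Both flags off: Seer is never called
    return SeerBehavior(False, False, "no", False)
```

Treating the grouping flag first keeps the "on or off" row of the table to a single branch.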

This adds two helpers, `should_call_seer_for_grouping` and `get_seer_similar_issues`, to be used when we (maybe) call Seer as part of event ingestion. `should_call_seer_for_grouping` does exactly what you'd think given the name, right now only basing the decision on feature flags and whether or not the event has a usable title and/or stacktrace. In the future we'll also include rate limit and killswitch checks, and any other criteria which it makes sense to add. `get_seer_similar_issues` is a wrapper around `get_similarity_data_from_seer` (which is what actually makes the API call to Seer). It extracts request data from the given event, makes the request, pulls together metadata about the results, and if a matching group is found and the flag is on, pulls the `Group` record out of the database. (I chose to put the feature flag check there rather than in the code where the grouping actually happens so that we can save the trip to the database if we're not going to end up using the results for grouping.) Code to actually use these helpers is added in #71026.
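
To make the division of labor between the two helpers concrete, here is a rough, self-contained sketch. It is not the Sentry implementation: the signatures, the plain-dict event, the `query_seer` and `fetch_group` callables, and the shape of the results are all assumptions made for illustration. Only the helper names and flag names come from the description above.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

METADATA_FLAG = "projects:similarity-embeddings-metadata"
GROUPING_FLAG = "projects:similarity-embeddings-grouping"


def should_call_seer_for_grouping(event: dict[str, Any], flags: set[str]) -> bool:
    """Hypothetical stand-in: flag check plus a usable title and/or stacktrace."""
    flag_on = bool(flags & {METADATA_FLAG, GROUPING_FLAG})
    has_usable_content = bool(event.get("title") or event.get("stacktrace"))
    return flag_on and has_usable_content


def get_seer_similar_issues(
    event: dict[str, Any],
    flags: set[str],
    query_seer: Callable[[dict[str, Any]], list[dict[str, Any]]],
    fetch_group: Callable[[int], Optional[dict[str, Any]]],
) -> tuple[dict[str, Any], Optional[dict[str, Any]]]:
    """Hypothetical wrapper: build the request, call Seer, collect metadata."""
    request_data = {"title": event.get("title"), "stacktrace": event.get("stacktrace")}
    results = query_seer(request_data)  # stands in for the real API call
    metadata = {"results": results}

    matched_group = None
    # Check the flag here so the database lookup is skipped entirely when
    # the results aren't going to be used for grouping.
    if results and GROUPING_FLAG in flags:
        matched_group = fetch_group(results[0]["group_id"])
    return metadata, matched_group
```

With only the metadata flag on, `fetch_group` is never invoked, which is the saved database trip mentioned above.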

vartec · 2024-05-16T21:04:55Z

src/sentry/event_manager.py

+                        # We only want to add this data to new groups, while we're testing
+                        # TODO: Remove this once we're out of the testing phase


Wouldn't we want to keep that for debugging even after testing phase?

We might end up keeping it, but the current plan is to take it out eventually. For debugging purposes, we have the same information ~~on the event~~ hopefully added to the corresponding GroupHash record. (See #70454.)

vartec · 2024-05-16T21:13:43Z

src/sentry/event_manager.py

+                is_new = False if seer_matched_group else True
+                is_regression = (
+                    False
+                    if is_new
+                    else _process_existing_aggregate(
+                        # If `seer_matched_group` were `None`, `is_new` would be true and we
+                        # wouldn't be here
+                        group=NonNone(seer_matched_group),
+                        event=event,
+                        incoming_group_values=group_creation_kwargs,
+                        release=release,
+                    )
+                )


purely opinion based, so entirely up to you, but I find just regular ifs more readable than using conditional expressions multiple times.

Suggested change

-                is_new = False if seer_matched_group else True
-                is_regression = (
-                    False
-                    if is_new
-                    else _process_existing_aggregate(
-                        # If `seer_matched_group` were `None`, `is_new` would be true and we
-                        # wouldn't be here
-                        group=NonNone(seer_matched_group),
-                        event=event,
-                        incoming_group_values=group_creation_kwargs,
-                        release=release,
-                    )
-                )
+                if is_new := not seer_matched_group:
+                    is_regression = False
+                    outcome = "new_group"
+                else:
+                    is_regression = _process_existing_aggregate(
+                        # If `seer_matched_group` were `None`, `is_new` would be true and we
+                        # wouldn't be here
+                        group=NonNone(seer_matched_group),
+                        event=event,
+                        incoming_group_values=group_creation_kwargs,
+                        release=release,
+                    )
+                    outcome = "seer_match"

and below...

                span.set_tag("outcome", outcome)
                metric_tags["outcome"] = outcome

It's true, I do love a good ternary. 🙂 I do sometimes get the sense that they're a little more idiomatic in JS/TS (which is - for better or worse - very much my first love, programming-language-wise) than in Python. Also for whatever reason it actually does make more sense to my brain. But you're right that there's a time and a place, and I do probably lean into it a little hard sometimes.

I'm going to leave it as is for now. [UPDATE: Partially true - I did change to `is_new = not seer_matched_group`.] It works, and there's lots more to do, but I do take the overall point.

(FWIW, the structure of `_save_aggregate_new` is different enough that when I port this there, I think I am going to have to structure it more with ifs than with ternaries.)
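
For readers following the thread, the change to `is_new = not seer_matched_group` is behavior-preserving: both forms depend only on the truthiness of the value. A quick check, using hypothetical stand-in values rather than real `Group` records:

```python
def is_new_ternary(seer_matched_group):
    # The original form from the diff
    return False if seer_matched_group else True


def is_new_not(seer_matched_group):
    # The simplified form mentioned in the update
    return not seer_matched_group


# Stand-ins for "no match" (None) and "match" (any truthy object)
for candidate in (None, object(), {"id": 1}):
    assert is_new_ternary(candidate) == is_new_not(candidate)
```

Python objects are truthy by default unless their class defines `__bool__` or `__len__`, so an ordinary matched-group object counts as a match in both forms.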

codecov · 2024-05-16T21:27:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.90%. Comparing base (d6767ca) to head (a60dc00).
Report is 2 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #71026       +/-   ##
===========================================
+ Coverage   56.77%   77.90%   +21.12%     
===========================================
  Files        6516     6525        +9     
  Lines      290322   290617      +295     
  Branches    50236    50286       +50     
===========================================
+ Hits       164834   226399    +61565     
+ Misses     120747    57971    -62776     
- Partials     4741     6247     +1506

see 1955 files with indirect coverage changes

jangjodi · 2024-05-17T15:17:19Z

src/sentry/event_manager.py

+                                "seer_similarity"
+                            ] = seer_response_data
+
+                    except Exception as e:


do we know what exception types we're catching here? or is this meant to be a generic catch so that we don't break anything and lose events?

Much more the latter. Any reasonably-anticipatable errors in that logic are caught before they bubble up this high, but the last thing I want to do is break ingestion, so it seemed wise to try-catch it just in case. (But yeah, I can add a comment, because it's a reasonable question. [UPDATE: Done.])
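
The "don't break ingestion" pattern described in this reply can be sketched as follows. This is an illustration under assumptions, not the code in `event_manager.py`: `find_seer_match_safely` and the `get_match` callable are hypothetical names.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def find_seer_match_safely(
    event: dict, get_match: Callable[[dict], Optional[Any]]
) -> Optional[Any]:
    """Return the Seer-matched group, or None if anything goes wrong."""
    try:
        return get_match(event)
    except Exception:
        # Deliberately broad: anticipated errors should already be handled
        # lower down, so this is only a last line of defense against losing
        # events. Log with traceback and fall back to "no match".
        logger.exception("Seer lookup failed; continuing without a match")
        return None
```

Returning `None` on failure means the caller follows the same path as when Seer finds no match, that is, it creates a new group.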


github-actions bot added the Scope: Backend label May 16, 2024

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 17bef00 to 17b2bb7 Compare May 16, 2024 17:11

vercel bot deployed to Preview May 16, 2024 17:14 View deployment

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from 7df4e9e to 49ec31c Compare May 16, 2024 17:17

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 17b2bb7 to 90f52bb Compare May 16, 2024 17:23

vercel bot deployed to Preview May 16, 2024 17:26 View deployment

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch 2 times, most recently from ebf05ed to 2d9484b Compare May 16, 2024 18:27

lobsterkatie marked this pull request as ready for review May 16, 2024 18:31

lobsterkatie requested a review from a team as a code owner May 16, 2024 18:31

lobsterkatie requested review from JoshFerge and jangjodi May 16, 2024 18:31

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 90f52bb to 42793eb Compare May 16, 2024 18:32

vercel bot deployed to Preview May 16, 2024 18:35 View deployment

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from 2d9484b to b2bc90b Compare May 16, 2024 19:56

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 42793eb to 5a1d263 Compare May 16, 2024 19:56

vercel bot deployed to Preview May 16, 2024 19:59 View deployment

lobsterkatie mentioned this pull request May 16, 2024

feat(seer grouping): Add Seer-related ingest helpers #70999

Merged

Base automatically changed from kmclb-add-seer-ingest-helpers to master May 16, 2024 20:55

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 5a1d263 to 9d64934 Compare May 16, 2024 20:58

vercel bot deployed to Preview May 16, 2024 21:00 View deployment

vartec reviewed May 16, 2024

vartec reviewed May 16, 2024

vartec approved these changes May 16, 2024

jangjodi reviewed May 17, 2024

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 9d64934 to 47d36a3 Compare May 17, 2024 17:08

vercel bot deployed to Preview May 17, 2024 17:12 View deployment

lobsterkatie added 5 commits May 17, 2024 10:20

add seer call to _save_aggregate

7cfa181

fix metrics

e3f4021

update comments to reflect possible seer call

c6caa2a

add TODOs

a4ad689

add tests for using seer during grouping

a60dc00

lobsterkatie force-pushed the kmclb-call-seer-before-creating-a-new-group branch from 47d36a3 to a60dc00 Compare May 17, 2024 17:21

vercel bot deployed to Preview May 17, 2024 17:24 View deployment

lobsterkatie merged commit a411b4a into master May 17, 2024
51 checks passed

lobsterkatie deleted the kmclb-call-seer-before-creating-a-new-group branch May 17, 2024 18:45

github-actions bot locked and limited conversation to collaborators Jun 2, 2024
