Skip to content

Removed upb_Decoder's stateful, bespoke logic for field lookup. #21707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2025

Conversation

copybara-service[bot]
Copy link

@copybara-service copybara-service bot commented May 12, 2025

Removed upb_Decoder's stateful, bespoke logic for field lookup.

upb_Decoder used to have its own logic for finding a MiniTable field by number. Instead of using the standard upb_MiniTable_FindFieldByNumber() function, it used a custom linear search that cached the most recently seen field index. The theory behind this design was that field numbers are usually serialized in increasing order, so by linear searching from the last seen index, we would have a small expected number of elements to inspect before finding the matching one.

In practice, it appears that we can use the generic searching logic with no loss in performance -- in fact, performance increases by as much as 6% (see attached benchmark results).

Removing the stateful search has several benefits:

  1. This simplifies the decoder since we no longer have to preserve this extra state (last_field_index). Note that it was not even possible to put this variable into upb_Decoder because it is preserved per parsed message, so we previously needed a stack of last_field_index values. This CL lets us get rid of that completely.
  2. This makes parsing performance less sensitive to whether field numbers are serialized in order.
  3. This simplification will help as we refine the dispatch logic that switches between the standard and fasttable decoders.

@copybara-service copybara-service bot force-pushed the test_757333347 branch 2 times, most recently from ac8144f to dcfee8e Compare May 13, 2025 18:00
upb_Decoder used to have its own logic for finding a MiniTable field by number.  Instead of using the standard `upb_MiniTable_FindFieldByNumber()` function, it used a custom linear search that cached the most recently seen field index.  The theory behind this design was that field numbers are usually serialized in increasing order, so by linear searching from the last seen index, we would have a small expected number of elements to inspect before finding the matching one.

In practice, it appears that we can use the generic searching logic with no loss in performance -- in fact, performance increases by as much as 6% (see attached benchmark results).

Removing the stateful search has several benefits:

1. This simplifies the decoder since we no longer have to preserve this extra state (`last_field_index`).  Note that it was not even possible to put this variable into `upb_Decoder` because it is preserved per parsed message, so we previously needed a *stack* of `last_field_index` values.  This CL lets us get rid of that completely.
2. This makes parsing performance less sensitive to whether field numbers are serialized in order.
3. This simplification will help as we refine the dispatch logic that switches between the standard and fasttable decoders.

PiperOrigin-RevId: 758304268
@copybara-service copybara-service bot merged commit 1c088d8 into main May 13, 2025
@copybara-service copybara-service bot deleted the test_757333347 branch May 13, 2025 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants