Skip to content

subscriber: use state machine to parse EnvFilter directives #3243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 8, 2025

Conversation

djc
Copy link
Contributor

@djc djc commented Mar 25, 2025

Intended to fix #3174. Have not done any benchmarks yet -- suggestions on how best to approach that are welcome.

This is a little bit more code, but IMO maybe slightly more readable?

It's correct enough that it passes the tests, there might be edge cases that aren't covered but those should be easy to solve.

@dpc
Copy link
Contributor

dpc commented Mar 26, 2025

> hyperfine --warmup 30 -N ./target/release/dotr-before ./target/release/dotr-after
Benchmark 1: ./target/release/dotr-before
  Time (mean ± σ):       1.2 ms ±   0.1 ms    [User: 0.4 ms, System: 0.7 ms]
  Range (min … max):     1.1 ms …   1.9 ms    2044 runs

Benchmark 2: ./target/release/dotr-after
  Time (mean ± σ):     613.8 µs ±  79.6 µs    [User: 289.1 µs, System: 254.0 µs]
  Range (min … max):   511.5 µs … 979.8 µs    5264 runs

Summary
  ./target/release/dotr-after ran
    2.03 ± 0.34 times faster than ./target/release/dotr-before 

Added true just to have a baseline:

Benchmark 1: ./target/release/dotr-before
  Time (mean ± σ):       1.2 ms ±   0.1 ms    [User: 0.4 ms, System: 0.7 ms]
  Range (min … max):     1.1 ms …   1.8 ms    2535 runs

Benchmark 2: ./target/release/dotr-after
  Time (mean ± σ):     611.7 µs ±  80.3 µs    [User: 295.6 µs, System: 244.1 µs]
  Range (min … max):   518.8 µs … 1663.5 µs    3825 runs

Benchmark 3: true
  Time (mean ± σ):     595.8 µs ±  74.1 µs    [User: 311.4 µs, System: 215.4 µs]
  Range (min … max):   512.3 µs … 1035.5 µs    5362 runs

This app sets RUST_LOG internally, so setting it from the outside doesn't make a difference.

AFAICT you fixed it and it's nearly free now.

@klensy
Copy link
Contributor

klensy commented Mar 27, 2025

After this, regex can be removed from Cargo.toml (for tracing-subscriber), only left in dev-deps?

@djc djc force-pushed the parse-directive branch from 12445d7 to b298124 Compare March 27, 2025 12:21
@djc
Copy link
Contributor Author

djc commented Mar 27, 2025

After this, regex can be removed from Cargo.toml (for tracing-subscriber), only left in dev-deps?

Nice catch, yes! Added that into the first commit.

@djc djc force-pushed the parse-directive branch 2 times, most recently from d533cdf to a4484c3 Compare March 29, 2025 10:38
Copy link
Contributor

@hds hds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rebase this against master. We merge everything in there first and then David or I will handle backporting to v0.1.x.

Before making this change, I think we need more tests to ensure that the behavior isn't changing. Especially because there are things I'm seeing in the code that are different to what the docs say.

I'm happy to have a look at writing some of those tests, but it may take me a little bit to get to it.

hds added a commit that referenced this pull request Apr 30, 2025
With a view to replacing the env filter parsing in #3243, this change
adds some additional tests to improve our confidence in not breaking
existing behavior. Tests for empty and invalid directives is added, as
well as tests for directives overriding less specific directives.

One of the latter tests (`more_specific_dynamic_filter_less_verbose`)
currently fails, which is a known issue reported in #1388. The test is
in place with `#[should_panic]` and can be reverted to a normal test
when that behavior is fixed.

The documentation on `parse`, `parse_lossy`, `from_env_lossy` and
`try_from_env` has also been made more explicit as to the result of an
empty directive (the default directive is used). This was already
documented on `with_default_directive`.
@djc djc force-pushed the parse-directive branch from a4484c3 to bab5257 Compare April 30, 2025 16:07
@djc djc requested a review from yaahc as a code owner April 30, 2025 16:07
@djc djc changed the base branch from v0.1.x to master April 30, 2025 16:07
@djc djc requested a review from davidbarsky as a code owner April 30, 2025 16:07
@djc djc force-pushed the parse-directive branch from bab5257 to 2007997 Compare April 30, 2025 16:21
@djc djc force-pushed the parse-directive branch from 2007997 to 06e5695 Compare April 30, 2025 16:26
for (i, c) in from.trim().char_indices() {
state = match (state, c) {
(Start, '[') => Span { span_start: i + 1 },
(Start, c) if !c.is_alphanumeric() => return Err(ParseError::new()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this arm to pass the new test cases from #3262. This seems like a reasonable way to weed out invalid stuff while allowing non-ASCII target names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to match the previous regex here, and is_alphanumeric is missing a couple of characters.

In the current implementation, a target accepts [\w:-] so as well as : and -, we need everything specified by \w, which the regex docs specify as: word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}).

Is there some way we can get those character classes in Rust?

Otherwise, the real thing that made that test invalid is that it started with a comma. I'm happy to modify the test if we want to accept starting with a comma (and ignore empty directives). I think it would be acceptable to be more permissive about what characters we accept for the target (or a span name for example), but we should not be more restrictive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a !['-', ':', '_'] clause, which covers the : and - that were there explicitly plus _ as the character from \p{Pc} that seems like something we want to support. Of course \p{Alphabetic} and \d are covered via is_alphanumeric(). I don't think there's a straightforward way to support check for the other properties, but it also seems pretty unlikely these will cause problems.

@djc
Copy link
Contributor Author

djc commented Apr 30, 2025

@djc djc requested a review from hds April 30, 2025 16:28
for (i, c) in from.trim().char_indices() {
state = match (state, c) {
(Start, '[') => Span { span_start: i + 1 },
(Start, c) if !c.is_alphanumeric() => return Err(ParseError::new()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to match the previous regex here, and is_alphanumeric is missing a couple of characters.

In the current implementation, a target accepts [\w:-] so as well as : and -, we need everything specified by \w, which the regex docs specify as: word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}).

Is there some way we can get those character classes in Rust?

Otherwise, the real thing that made that test invalid is that it started with a comma. I'm happy to modify the test if we want to accept starting with a comma (and ignore empty directives). I think it would be acceptable to be more permissive about what characters we accept for the target (or a span name for example), but we should not be more restrictive.

@djc djc force-pushed the parse-directive branch from 06e5695 to d2d99b6 Compare May 5, 2025 13:16
Copy link
Contributor

@hds hds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your work on this implementation!

@hds hds changed the title Use state machine to parse directives subscriber: use state machine to parse EnvFilter directives May 8, 2025
@hds hds merged commit 28561e0 into master May 8, 2025
57 checks passed
@hds hds deleted the parse-directive branch May 8, 2025 09:51
@djc
Copy link
Contributor Author

djc commented May 8, 2025

Thank you for all your work on this implementation!

Thank you for the careful reviews!

@hds hds mentioned this pull request May 21, 2025
21 tasks
hds pushed a commit that referenced this pull request May 28, 2025
There is a report in #3174 that even in release mode, building the regex
used to parse `EnvFilter` directives can take a relatively large amount
of time (600us).

This change replaces the `regex` based parsing of the directives with a
state machine implementation that is faster and also easier to reason
about.

Fixes: #3174
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tracing-subscriber's Directive parsing adds 600us to startup time
5 participants