This is an unstructured dumping ground for notes during development.
I've used smartstring for long-term string storage, but might want to consider flexstr as well.
My initial filter syntax design was focused on filtering of events in an event
stream, e.g. as observed by the fmt layer's output. However, tracing's model is
richer than that, and filtering spans is also necessary for a proper filter. In
fact, in construction of this language, I misunderstood what the env filter
`target[span]` did. I thought it enabled events within `target` within a span
named `span`, but in fact it enables all events within a span named `span`
which is itself marked with `target`.
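The difference between the two readings can be sketched in plain Rust. This is only an illustration of the description above, not code taken from `tracing_subscriber`; `SpanMeta` and both function names are hypothetical, and targets are compared exactly for simplicity.

```rust
/// Hypothetical minimal span metadata for this sketch.
pub struct SpanMeta<'a> {
    pub name: &'a str,
    pub target: &'a str,
}

/// The reading first assumed above: the event itself has `target`
/// and sits inside some span named `span`.
pub fn assumed_reading(
    target: &str,
    span: &str,
    event_target: &str,
    enclosing: &[SpanMeta<'_>],
) -> bool {
    event_target == target && enclosing.iter().any(|s| s.name == span)
}

/// The behavior as described above: any event inside a span named `span`
/// whose own target is `target`, whatever the event's target is.
pub fn described_reading(target: &str, span: &str, enclosing: &[SpanMeta<'_>]) -> bool {
    enclosing.iter().any(|s| s.name == span && s.target == target)
}
```

With a `request` span targeted at `my_app::http`, an event targeted at `my_app::db` inside it is enabled by `my_app::http[request]` under the described reading, but not under the assumed one.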
The difficulty in making a nice-to-use format is that `target` enables both
spans within `target` and events within `target`. The 99% case will want to
enable both together. However, separation has reasonably simple user stories as
well; consider enabling all spans for a crate, but only events within a specific
module (i.e. target).
One idea I had was to use `?span#target` to specify span name/target together
(basically stealing URL fragment syntax for it). That could allow the semantics:
- `target`: spans and events with target `target`
- `?span`: spans with name `span`
- `#target`: spans with target `target`
- `* #span`: events within a span with target `target`
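To make the proposed semantics concrete, here is a minimal sketch covering the first three forms from the list above. `Kind`, `Callsite`, and `Selector` are hypothetical names invented for this illustration; the `* #span` form is left out because it depends on the enclosing span stack, and exact string comparison is an assumption.

```rust
/// Whether a callsite is a span or an event (hypothetical type for this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Kind {
    Span,
    Event,
}

/// Just the metadata this sketch needs.
pub struct Callsite<'a> {
    pub kind: Kind,
    pub name: &'a str,
    pub target: &'a str,
}

/// The three context-free selector forms.
pub enum Selector<'a> {
    /// `target`: spans and events with this target.
    Target(&'a str),
    /// `?span`: spans with this name.
    SpanName(&'a str),
    /// `#target`: spans with this target.
    SpanTarget(&'a str),
}

impl Selector<'_> {
    /// Exact comparison only; prefix matching of targets is a separate question.
    pub fn matches(&self, site: &Callsite<'_>) -> bool {
        match *self {
            Selector::Target(t) => site.target == t,
            Selector::SpanName(n) => site.kind == Kind::Span && site.name == n,
            Selector::SpanTarget(t) => site.kind == Kind::Span && site.target == t,
        }
    }
}
```

The point of the split is that `Target` is the only form that enables events directly; the other two only ever select spans.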
Also, I think I want to invert the current CSS-inspired `?span > ?span` to be
`target ?span < ?span`, to maintain a consistent inside-out order. (Perhaps `+`
can be used here? But the inside-out order seems pertinent.)
Combined with a rule that any spans used to enable events are enabled, this seems somewhat reasonable, at least.
For recorded event filtering, it would certainly also be useful to be able to filter e.g. a JSON-serialized event stream and only have to process/deserialize a single event at a time. This can be done with e.g. JSON db queries, but using the same filter language as runtime filters is beneficial, and the domain knowledge added makes filtering easier.
I believe filtering on event fields to be impossible with today's `Subscribe`
(published `Layer`) design. The reason is that `Subscribe::enabled` gets only
`Metadata` and `Context`, and the recording of fields at `Subscribe::on_record`
only happens after `enabled` is determined.
This could be addressable by making `on_event` return `ControlFlow`; the
`Layered` collector would then only continue recording an event if subscribers
report `ControlFlow::Continue`.
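A rough sketch of that idea, using only `std::ops::ControlFlow`. The `EventSubscriber` trait, the `Event` struct, `MessageFilter`, `Recorder`, and `dispatch` are all hypothetical stand-ins made up for illustration; none of them is the real `tracing_subscriber` API.

```rust
use std::ops::ControlFlow;

/// Hypothetical stand-in for an event whose fields are already visible.
pub struct Event<'a> {
    pub message: &'a str,
}

/// Hypothetical subscriber trait whose `on_event` can stop further processing.
pub trait EventSubscriber {
    fn on_event(&mut self, event: &Event<'_>) -> ControlFlow<()>;
}

/// Filters on a field value, which an `enabled`-style check could not see.
pub struct MessageFilter {
    pub needle: &'static str,
}

impl EventSubscriber for MessageFilter {
    fn on_event(&mut self, event: &Event<'_>) -> ControlFlow<()> {
        if event.message.contains(self.needle) {
            ControlFlow::Continue(())
        } else {
            ControlFlow::Break(())
        }
    }
}

/// Records every event that reaches it.
#[derive(Default)]
pub struct Recorder {
    pub seen: Vec<String>,
}

impl EventSubscriber for Recorder {
    fn on_event(&mut self, event: &Event<'_>) -> ControlFlow<()> {
        self.seen.push(event.message.to_owned());
        ControlFlow::Continue(())
    }
}

/// Plays the role of the layered collector: stop at the first `Break`.
pub fn dispatch(layers: &mut [&mut dyn EventSubscriber], event: &Event<'_>) -> ControlFlow<()> {
    for layer in layers.iter_mut() {
        if let ControlFlow::Break(()) = layer.on_event(event) {
            return ControlFlow::Break(());
        }
    }
    ControlFlow::Continue(())
}
```

Placing `MessageFilter` before `Recorder` means the recorder only ever sees events whose message passed the filter, which is the behavior the paragraph above is after.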
Per my reading of tracing_subscriber::EnvFilter, it matches fields on entry
rather than recording and matching at filter time. This is good! AIUI it's an
optimization that allows a) the field matching to be memoized and b) a negative
filter to early-cancel contained spans/events. Unfortunately... more complicated
queries such as `(my_crate ?span_a ?span_b)=TRACE` don't
lend themselves as well to such caching. It's possible, and perhaps worth
doing, but basically requires making an automaton to handle more complex cases
which the query language supports, like `((?a ?b | ?a > ?c) & ?d)`. Plus, I
would like to support filtering recorded spans (e.g. tracing-memory, another
semi-abandoned project of mine), and those don't really have the same enter/exit
behavior... but maybe I can just "replay" the events to filter them through a
memoized approach, once it exists?
I think, as a first-pass proof of concept, I should serialize span fields into
`Extensions` and do a full match on each event, rather than putting the
development effort into generating the automaton while the project is still
experimental.
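As a sketch of what that unmemoized first pass could look like: each span keeps its recorded fields, and every event re-checks the whole enclosing span stack. `SpanRecord`, `SpanQuery`, and `event_enabled` are hypothetical names, and storing field values as strings is an assumption made to keep the example small.

```rust
use std::collections::HashMap;

/// Hypothetical per-span record, standing in for data kept in span extensions.
pub struct SpanRecord {
    pub name: String,
    pub fields: HashMap<String, String>,
}

/// Hypothetical query: a span name plus required field values.
pub struct SpanQuery {
    pub name: String,
    pub fields: Vec<(String, String)>,
}

impl SpanQuery {
    /// A span matches when its name and all required field values agree.
    fn matches(&self, span: &SpanRecord) -> bool {
        span.name == self.name
            && self.fields.iter().all(|(k, v)| span.fields.get(k) == Some(v))
    }
}

/// Unmemoized: re-checks the whole span stack (root first) for every event.
pub fn event_enabled(query: &SpanQuery, stack: &[SpanRecord]) -> bool {
    stack.iter().any(|span| query.matches(span))
}
```

This does work proportional to the stack depth on every event, which is exactly the cost an automaton or entry-time memoization would later remove.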
We completely punt on the static directive optimization that tracing_subscriber EnvFilter has for the time being. This will almost certainly need to be looked into at a later point to match env_logger/current perf for statically disabled events. (Note "static" here means always for the collector, not compile-time.) Callsite caching of static directives is certainly an important optimization.
- Nested span matching is based on CSS selectors; `element element` is transitive contains; `element > element` is direct contains. However, there's no way to notate root (`> element`, maybe? Makes the grammar awkward[^1]), and while that's not something CSS wants, it'd be useful for us.
- Field syntax should be ready for `valuable::Structable`.
  - Proposal: `valuable::Listable` can be handled by `[]`
  - Proposal: `valuable::Enumerable` can be handled by `= Name? Fields`
  - Weak proposal: `valuable::Tuplable` could be directly handled by `(,)`, but I'm not super happy with that solution since `()` is semantic for queries outside of `{}`.
  - I have no idea how to handle `valuable::Mappable` in a not-bad way.
  - All of this depends on how exactly tracing `valuable` support pans out. I probably shouldn't bother with implementing nested `Field`s yet, even.
- Field presence without a comparison value just asks for its presence.
- It'd be nice to provide a translation to JSONiq or similar for the JSON event formatter.
- The query language technically isn't a query language AIUI, as it doesn't return structured results; it only offers filtering. SQL for tracing events is a much bigger task than I'm personally willing to take on.
- We probably want a way to match `my_app` but not `my_app::module`. This is a concern even if `tracing` is adjusted to not match `tracing_filter`. Proposal:
  - String targets match the target exactly. Rationale: module nesting is common for `module::path` style automatic targets, but if someone specifies a custom target that does not meet this convention, they probably aren't using nesting and matching the exact specified target would work correctly.
  - `#my_app` to match exactly. Rationale: `#` is used to make strings "more literal", so `#my_app` would mean the pattern `my_app`, but "more literal". This would also mean that `"my_app"` would get the module matching behavior, but `#"my_app"#` would be an exact match. But maybe this is too mean to the parser, since "token kind" isn't LL(1) anymore?
  - Make `my_app` only match the literal target `my_app`, and add a fuzzy match syntax. Problem: this deviates greatly from existing practice, probably too much. However, it would be more consistent with field names (see next).
- Field names should certainly always be exact matches. This is different and somewhat inconsistent with target patterns.
- Matching `field = "string"` should probably be an exact match, as that's what `=` logically means. However, env_logger provides a regex match for the event message, and that's quite useful for working with less-structured events. Proposal: `field ~ "regex"` (or `~=`).
- Do we want a shorthand for `{ message ~ "regex" }`? Proposal: allow a tail `~ "regex"` in `Select` to mean a `{ message ~ "regex" }` event query.
event query. - String syntax doesn't provide escapes, which might be surprising? I just don't want to support them, though.
- `field.field` shorthand for `field = { field }` seems desirable.
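The CSS-style nesting from the first item in the list above (transitive `a b` versus direct `a > b`) can be sketched as a match against the current span stack. `Combinator`, `Step`, and `matches` are hypothetical names for this illustration; the sketch assumes the first step may match anywhere in the stack (there is no root notation) and that the event may sit arbitrarily deep below the last matched span.

```rust
/// How a step relates to the step before it (hypothetical names).
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Combinator {
    /// `a b`: `b` is anywhere inside `a`.
    Descendant,
    /// `a > b`: `b` is directly inside `a`.
    Child,
}

/// One span name plus the combinator linking it to the previous step.
pub struct Step<'a> {
    pub combinator: Combinator,
    pub name: &'a str,
}

/// Matches `steps` (outermost first) against `stack` (root first).
/// The combinator of the first step is ignored.
pub fn matches(steps: &[Step<'_>], stack: &[&str]) -> bool {
    match_from(steps, stack, true)
}

fn match_from(steps: &[Step<'_>], stack: &[&str], first: bool) -> bool {
    let (step, rest) = match steps.split_first() {
        Some(split) => split,
        None => return true, // every step found a span
    };
    // A `Child` step may only match the span right after the previous match.
    let limit = if step.combinator == Combinator::Child && !first {
        1
    } else {
        stack.len()
    };
    stack
        .iter()
        .take(limit)
        .enumerate()
        .any(|(i, name)| *name == step.name && match_from(rest, &stack[i + 1..], false))
}
```

This backtracking search is the naive form; it is the part an automaton would replace if caching ever becomes worthwhile.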
As currently specified, the query language uses the ASCII subset of
UAX31 Pattern Syntax.
That is, `\t\n\v\f\r` are considered whitespace, and we reserve ASCII symbols
``!"#$%&'()*+,-./;<=>?@[\]^`{|}~`` for syntax; any other characters are treated
as "pattern" characters. For convenience, `_:` are also considered pattern
characters; `_` is a valid Rust ident char, and `:` shows up in common targets.
The current grammar uses `"#&(),-<=>?{|}`. All of the other syntax characters
``!$%'*+./;[\]^`~`` are reserved and can be given semantics in future updates.
Also, invalid syntax can of course be given meaning.
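A minimal sketch of that character classification, using only the standard library. `CharClass`, `classify`, and `is_current_syntax` are hypothetical names. Treating the space character as whitespace is an assumption based on UAX31's Pattern_White_Space; the text above lists only the escape characters.

```rust
/// The three character classes described above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CharClass {
    Whitespace,
    /// ASCII punctuation reserved for syntax, whether used today or not.
    Syntax,
    Pattern,
}

/// Classifies a single character of a query string.
pub fn classify(c: char) -> CharClass {
    match c {
        // Assumption: ' ' is whitespace too, as in UAX31 Pattern_White_Space.
        ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' => CharClass::Whitespace,
        // `_` and `:` are deliberately carved out as pattern characters.
        '_' | ':' => CharClass::Pattern,
        // The remaining ASCII punctuation is exactly the reserved symbol set.
        c if c.is_ascii_punctuation() => CharClass::Syntax,
        // Letters, digits, and everything non-ASCII.
        _ => CharClass::Pattern,
    }
}

/// True for the symbols the current grammar actually uses.
pub fn is_current_syntax(c: char) -> bool {
    "\"#&(),-<=>?{|}".contains(c)
}
```

Keeping "reserved" and "currently used" as separate checks lets a lexer reject reserved-but-unused symbols with a dedicated error today and accept them later without changing the classification.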
We probably want to keep prefix matching for targets, so that a `my_app` query
returns all events from `my_app`, including ones from modules, which have
targets like `my_app::error` or `my_app::tracing`. However, it's worth noting
that a simple prefix matcher means that the `tracing` filter will also include
events from `tracing_subscriber` and `tracing_filter`, so perhaps a slightly
different rule is warranted; perhaps `== "my_app" || .starts_with("my_app:")`?
Or perhaps the regex `my_app(?-u:\b)` (any character not in `[0-9A-Za-z_]`
follows)?
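The difference between the naive prefix rule and the "exact or followed by `:`" rule can be sketched as follows. `prefix_match` and `module_match` are hypothetical helper names; the regex variant is not shown since it would need an external crate.

```rust
/// Naive prefix rule: `tracing` also matches `tracing_subscriber`.
pub fn prefix_match(pattern: &str, target: &str) -> bool {
    target.starts_with(pattern)
}

/// Exact match, or the pattern followed by a `:` path separator.
pub fn module_match(pattern: &str, target: &str) -> bool {
    match target.strip_prefix(pattern) {
        Some(rest) => rest.is_empty() || rest.starts_with(':'),
        None => false,
    }
}
```

Under `module_match`, `my_app` still covers `my_app::error`, while `tracing` no longer picks up `tracing_subscriber` or `tracing_filter`.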