
Update fixes #2

Merged (15 commits, Mar 12, 2024)

Conversation

@tomtau (Contributor) commented Feb 21, 2024

@TheVeryDarkness I previously merged that original PR by rebasing it, because I wanted to preserve a linear history without merge commits... but that created a merge conflict between the two master branches, so I created a new branch off the current master (update-fixes) and cherry-picked the new commits from your master.
How would you prefer to continue working on it going forward? Continue on that diverged master and I'll cherry-pick the new commits, or work directly on a branch?

Summary by CodeRabbit

  • New Features

    • Renamed function parse_with to parse_with_partial and added check_with_partial in repetition.rs.
    • Added methods try_parse_with_partial, try_parse_partial, check_with_partial, check_partial, parse_partial to TypedNode.
    • Introduced FullRuleStruct trait for rules with full capacity.
    • Defined PairContainer methods for handling pairs and children pairs.
    • Defined PairTree trait extending PairContainer with pair tree operation methods.
    • Introduced TypedParser trait for typed syntax tree production with parsing methods.
  • Enhancements

    • Simplified handling of documentation comments in grammar generation for improved readability.
    • Added box_all_rules configuration flag for controlling rule boxing behavior in the parser.
    • Enhanced code documentation with detailed attributes for enum variants and struct fields.
  • Refactor

    • Revised logic for generating rule enums and content type handling based on configuration flags.
    • Restructured documentation comment storage for better management.
    • Renamed functions and added methods to traits for improved functionality and clarity.
  • Documentation

    • Updated grammar and rule documentation for clarity and understanding.
    • Enriched library documentation with detailed comments and attributes.
  • Bug Fixes

    • Fixed attribute parsing to prevent potential runtime errors.
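The partial/full split listed above can be sketched as follows. This is a hypothetical stand-in, not pest3's actual API: the real methods thread positions, stacks, and trackers rather than plain string slices, and the `Digit` node here is invented purely for illustration. Only the naming pattern (`try_parse_partial` tolerates trailing input, `try_parse` requires consuming it all, `check_partial` validates without building a node) mirrors the summary.

```rust
/// Simplified stand-in for a typed syntax tree node.
trait TypedNodeSketch: Sized {
    /// Try to parse a prefix of `input`, returning the remainder and the node.
    fn try_parse_partial(input: &str) -> Option<(&str, Self)>;

    /// Parse, requiring that the whole input is consumed.
    fn try_parse(input: &str) -> Option<Self> {
        match Self::try_parse_partial(input) {
            Some((rest, node)) if rest.is_empty() => Some(node),
            _ => None,
        }
    }

    /// Check that a prefix matches, without keeping the node.
    fn check_partial(input: &str) -> bool {
        Self::try_parse_partial(input).is_some()
    }
}

/// Toy node matching a single ASCII digit.
struct Digit(char);

impl TypedNodeSketch for Digit {
    fn try_parse_partial(input: &str) -> Option<(&str, Self)> {
        let c = input.chars().next()?;
        if c.is_ascii_digit() {
            Some((&input[c.len_utf8()..], Digit(c)))
        } else {
            None
        }
    }
}

fn main() {
    // Partial parsing tolerates trailing input; full parsing rejects it.
    assert!(Digit::check_partial("7abc"));
    assert!(Digit::try_parse("7abc").is_none());
    assert_eq!(Digit::try_parse("7").map(|d| d.0), Some('7'));
    println!("ok");
}
```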


coderabbitai bot commented Feb 21, 2024

Walkthrough

This update introduces a comprehensive enhancement across the board for a JSON parser using the pest library. It includes a new JSON parser definition, streamlined handling of rule documentation, extended support for grammar and rule documentation, upgraded typed syntax tree functionality, and new traits for improved parsing operations. The changes aim to make the parser more robust, maintainable, and easier to understand.

Changes

Files Summary
derive/tests/json.pest, derive/tests/json.rs Introduced a parser for JSON with tests covering basic and structural parsing.
generator/src/..., meta/src/..., pest/src/... Enhanced documentation handling, simplified rule enum generation, and improved typed syntax tree functionality. Added support for grammar and rule documentation.
generator/src/typed/..., pest/src/typed/... Modified parsing logic with new traits and methods for better parsing operations and reachability analysis.
pest/src/{choice.rs, formatter.rs, lib.rs, sequence.rs} Added documentation and attributes, introduced core parts of pest3, and updated sequence type generation.

"In the realm of code, where the parsers play,
🐇 A rabbit hopped, crafting JSON's way.
With pest in paw, it weaved through the night,
🌟 Enhancing each rule, with documentation bright.
Through sequences, choices, and trees it did hop,
Leaving behind code, that's top of the crop!"
🚀🐰✨


@coderabbitai bot left a comment

Review Status

Actionable comments generated: 12

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 7d8add0 and 9cfbba9.
Files selected for processing (21)
  • derive/tests/json.pest (1 hunks)
  • derive/tests/json.rs (1 hunks)
  • generator/src/common.rs (1 hunks)
  • generator/src/typed/attr.rs (1 hunks)
  • generator/src/typed/config.rs (1 hunks)
  • generator/src/typed/generator.rs (8 hunks)
  • generator/tests/expected.rs (28 hunks)
  • generator/tests/syntax.pest (1 hunks)
  • meta/src/doc.rs (1 hunks)
  • meta/src/grammar.pest (2 hunks)
  • meta/src/parser.rs (8 hunks)
  • meta/tests/pest3sample.pest (2 hunks)
  • pest/src/choice.rs (1 hunks)
  • pest/src/formatter.rs (1 hunks)
  • pest/src/lib.rs (2 hunks)
  • pest/src/sequence.rs (3 hunks)
  • pest/src/typed/mod.rs (1 hunks)
  • pest/src/typed/template/mod.rs (3 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
  • pest/src/typed/traits.rs (6 hunks)
  • pest/src/typed/wrapper.rs (1 hunks)
Additional comments: 35
meta/src/doc.rs (1)
  • 5-6: The change from line_docs to grammar_doc in the DocComment struct shifts the documentation focus to the entire grammar file. This approach can provide a more structured overview but consider the impact on detailed documentation for individual rules.
pest/src/typed/mod.rs (1)
  • 1-1: The added documentation comment provides a clear overview of the module's purpose, enhancing readability and maintainability.
generator/src/typed/config.rs (1)
  • 6-6: The addition of the box_all_rules field introduces a new configuration option, enhancing flexibility in rule handling. Consider documenting its purpose and impact for clarity.
pest/src/lib.rs (1)
  • 20-20: The addition of the std module re-exporting items from the standard library could simplify usage but verify the necessity and impact of this approach to avoid unnecessary dependencies or confusion.
pest/src/typed/wrapper.rs (2)
  • 6-11: The addition of the get_const method to the String trait enhances usability by providing a convenient way to retrieve string constants.
  • 16-18: The enhancements to the Rule trait, including associated types and constants, improve its expressiveness and utility. Consider adding tests or examples to demonstrate these features' usage and benefits.
generator/tests/syntax.pest (1)
  • 1-1: The addition of the compatibility comment likely aids in understanding the test's purpose and context, enhancing clarity.
generator/src/typed/attr.rs (1)
  • 24-27: Handling the "box_all_rules" attribute enhances configurability. Consider a more user-friendly approach for unrecognized attributes, such as detailed error messages, instead of panicking.
generator/src/common.rs (1)
  • 45-55: Simplifying the handling of documentation comments in the generate_rule_enum function likely improves readability and maintainability. Verify that the simplification maintains the documentation's quality and detail.
derive/tests/json.rs (1)
  • 1-57: The tests for the JSON parser cover basic scenarios and use assertions to validate outcomes, enhancing reliability. Consider adding more diverse test cases to cover edge cases and complex JSON structures.
meta/tests/pest3sample.pest (1)
  • 83-89: The documentation updates for rules ~ and ^ provide clear explanations of their usage differences, enhancing understanding for users.
meta/src/grammar.pest (1)
  • 8-20: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [11-113]

Supporting /// and //! comments for documenting grammar rules and the grammar file enhances documentation practices. Ensure compatibility with existing grammar files and documentation practices.

pest/src/typed/template/repetition.rs (1)
  • 40-40: Renaming parse_with to parse_with_partial might clarify the function's purpose. Verify its impact on the codebase and ensure consistency in naming conventions.
pest/src/choice.rs (1)
  • 1-25: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [38-76]

The implementations of TypedNode and PairContainer traits for the generated choice types are correctly done, ensuring that these types can be seamlessly integrated into the parsing system. The TypedNode implementation provides a method for partial parsing, which is essential for the flexibility of the parsing process. The PairContainer implementation ensures that each choice type can correctly handle its child pairs, which is crucial for the traversal and manipulation of the parsed data. These implementations are well-designed and adhere to the expected contracts of the traits they implement.

generator/src/typed/generator.rs (5)
  • 47-56: The introduction of the boxed field in the RuleInfo struct and its determination logic within the new constructor method is a significant change. This logic decides whether a rule should be boxed based on the box_all_rules configuration flag or the rule's reachability. This is a clever way to manage memory and performance by conditionally boxing rules, potentially reducing stack usage for deeply nested or recursive rules. However, it's important to ensure that all downstream code that interacts with RuleInfo instances properly handles the boxed attribute to avoid logic errors or performance regressions.
  • 183-187: The conditional boxing of rule content in the create_rule function based on the boxed flag from RuleInfo is a critical update. This change allows for more flexible memory management, potentially improving performance for large or complex grammars by boxing only certain rule contents. It's essential to verify that this change does not introduce any unintended side effects, such as increased heap allocations for grammars where boxing is not necessary or beneficial. Testing with various grammar sizes and complexities would be advisable to assess the impact of this change.
  • 403-409: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [406-423]

The addition of a reachability parameter to the process_rule function and its utilization within the function to create a RuleInfo instance with reachability analysis is a noteworthy change. This enhancement allows for more sophisticated rule processing based on the reachability of rules within the grammar, potentially optimizing the generated parser code. It's crucial to ensure that the reachability analysis is accurate and that the resulting parser code maintains correctness while benefiting from any optimizations introduced by this change.

  • 436-442: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [439-448]

The process_rules function now performs reachability analysis by calling collect_reachability before processing individual rules. This pre-processing step is essential for enabling optimizations based on rule reachability within the grammar. It's important to verify that this reachability analysis is performed efficiently and does not introduce significant overhead to the code generation process, especially for large grammars with many rules.

  • 520-534: The collect_reachability function introduces a sophisticated mechanism for analyzing rule reachability within the grammar, which is used to determine whether certain rules should be boxed. This function plays a crucial role in optimizing the generated parser code by avoiding infinite size structs and managing memory more effectively. It's important to ensure that this function accurately captures all reachability relationships between rules and that the resulting reachability graph is used correctly throughout the code generation process.
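Reachability-based boxing addresses a concrete Rust constraint: a generated rule struct that (transitively) contains itself has no finite size unless the recursive edge goes through a heap indirection such as `Box`. A minimal sketch of the constraint, with a hypothetical `Value` type rather than anything this generator actually emits:

```rust
/// A JSON-like value. Recursion through `Vec` is fine because `Vec` already
/// stores its elements on the heap, but a direct self-reference must be boxed
/// or the type would have infinite size and fail to compile.
#[derive(Debug)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
    Parenthesized(Box<Value>),
}

fn main() {
    let v = Value::Parenthesized(Box::new(Value::Array(vec![Value::Number(1.0)])));
    assert!(matches!(v, Value::Parenthesized(_)));
    println!("{:?}", v);
}
```

This is why a `box_all_rules` flag or a reachability analysis is needed: only rules on a recursive cycle strictly require boxing, and boxing the rest trades extra heap allocations for smaller stack frames.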
pest/src/typed/template/mod.rs (2)
  • 403-403: The addition of the Default derive to the Empty struct is a good practice for types that can be instantiated with default values, especially for types representing "empty" or "no-op" behavior. This makes it easier to use the Empty struct in generic contexts where a default instance might be required.
  • 415-422: The implementation of the NeverFailedTypedNode trait for the Empty struct with the parse_with_partial method is logically sound. This method essentially returns the input position unchanged along with a new instance of Empty, which aligns with the expected behavior of a type that represents an "always successful" parse operation. This addition enhances the expressiveness of the parsing framework by providing a built-in way to represent operations that cannot fail.
generator/tests/expected.rs (6)
  • 6-6: The documentation comment for the Regular enum variant is clear and concise, providing a straightforward description of its purpose.
  • 8-8: The documentation comment for the Atomic enum variant is similarly clear and concise. It's good practice to keep documentation comments straightforward, aiding in maintainability and readability.
  • 10-10: The documentation comment for the NonAtomic enum variant follows the established pattern of clarity and conciseness. Consistency in documentation style across the codebase is beneficial for overall project maintainability.
  • 119-122: The addition of documentation attributes for the content and span fields in the r#Regular struct enhances the code's readability and maintainability by clearly describing the purpose and use of these fields. This is a good practice, especially in public or widely used structs.
  • 155-158: The documentation for the content and span fields in the r#Atomic struct follows the same positive pattern as seen with the r#Regular struct. Consistent and clear documentation like this is crucial for future code maintenance and understanding.
  • 191-194: The documentation for the content and span fields in the r#NonAtomic struct is clear and maintains consistency with the documentation style used throughout the file. This consistency in documentation style is commendable and should be maintained.
meta/src/parser.rs (8)
  • 118-118: The addition of the doc field to the ParseRule struct is a positive change for supporting documentation comments. This aligns well with the objective of enhancing parsing capabilities and improving documentation practices.
  • 334-334: The introduction of parse_grammar_doc function to parse grammar documentation comments is a valuable addition. It enhances the parser's capability to handle documentation comments, which is crucial for generating more informative and readable code.
  • 373-384: The modification in the parse_rule function to handle parsing of documentation comments is well-implemented. It ensures that documentation comments are correctly associated with their respective rules, improving the maintainability and readability of the parsed rules.
  • 478-489: The implementation of parse_grammar_doc function is concise and correctly extracts the content of grammar documentation comments. However, it's important to ensure that the comment syntax (//!) is consistently used and recognized throughout the project's documentation.
  • 491-502: Similarly, the parse_rule_doc function effectively extracts rule documentation comments. This addition is crucial for maintaining well-documented parsing rules, which can significantly aid in understanding and maintaining the parser's logic.
  • 850-856: The test grammar_test validates the parsing of grammar and rule documentation comments, which is essential for ensuring the new functionality works as expected. It's good practice to include such tests when introducing new parsing capabilities.
  • 1472-1498: The tests for parsing grammar documentation comments are thorough and cover various scenarios, including handling multiple lines of comments. These tests are crucial for ensuring the robustness of the documentation parsing functionality.
  • 1506-1524: The tests for parsing rule documentation comments are also comprehensive and ensure that the parsing functionality works as intended. It's important to maintain such tests to catch any potential issues early in the development process.
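The `//!` versus `///` convention discussed in these comments can be illustrated with a small grammar fragment. The rule below is hypothetical and written in pest 2-style syntax, which the pest3 grammar under review may not match exactly:

```pest
//! Grammar-level doc comment (`//!`): documents the whole grammar file
//! and is stored as the grammar's `grammar_doc`.

/// Rule-level doc comment (`///`): attached to the rule that follows it.
number = { ASCII_DIGIT+ }
```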

@@ -112,8 +112,11 @@ impl<'i> Partition<'i> {

/// Formatter options for [Span](crate::Span).
pub struct FormatOption<SpanFormatter, MarkerFormatter, NumberFormatter> {
/// Formatter for characters in span.

The comment for span_formatter is clear and concise, effectively explaining its purpose. However, it could be enhanced by specifying the expected behavior or format of the formatter for better clarity.

Consider expanding the comment to include examples or expected outcomes of the formatter's application.

pub span_formatter: SpanFormatter,
/// Formatter for position marker.

The comment for marker_formatter is straightforward but lacks detail on what constitutes a "position marker."

It would be beneficial to provide examples or further explanation of what markers are used for, enhancing understanding for future contributors.

pub marker_formatter: MarkerFormatter,
/// Formatter for line numbers and column numbers of start and end.

The comment for number_formatter is succinct but could be improved by detailing the types of numbers it formats (e.g., line and column numbers) and their context within the formatter's usage.

Adding context about where and how the numbers are used within the formatter could provide clearer guidance for developers.

Comment on lines +20 to +21
#[doc = ::core::stringify!(A choice for $type.)]
$variant($type)

The documentation string generated for each variant uses ::core::stringify! to include the type name in the documentation. This is a clever use of Rust's macro system to generate informative documentation dynamically. However, it might be more readable and informative to include both the variant name and the type in the documentation string to clarify the role of each variant in the generated choice type.

- #[doc = ::core::stringify!(A choice for $type.)]
+ #[doc = concat!("`", stringify!($variant), "`: A choice of type `", stringify!($type), "`.")]


Comment on lines 1 to 25
//! A [macro](crate::choice_type) to generate choice type
//! and generated choice types that have 2 to 15 variants,
//! which are named `Choice{i}`.

// TODO: peek to decide matched choice when possible.
/// Generate choice type.
///
/// ## Example
///
/// ```rust
/// pest::choice_type!(Choice2, (Choice0, T0), (Choice1, T1));
/// ```
#[macro_export]
macro_rules! choice_type {
( $name:ident, $( ( $variant:ident, $type:ident ) ),* $(,)? ) => {
/// Choice type generated by `pest::choice_type!`.
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub enum $name<$( $type ),*> {
-    $( $variant($type) ),*
+    $(
+        #[doc = ::core::stringify!(A choice for $type.)]
+        $variant($type)
+    ),*
}
impl<'i, R, $($type, )*>
$crate::typed::TypedNode<'i, R> for $name<$($type, )*>

📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [38-63]

The parsing logic within try_parse_with_partial is well-implemented, using snapshots to efficiently backtrack on failed parsing attempts. This approach allows for a clean and efficient way to try each choice in sequence until one succeeds or all fail. The use of ::core::option::Option::Some and ::core::option::Option::None is consistent and clear, ensuring that the function's return type is correctly handled. The logic here is solid, but adding comments to explain the parsing strategy and the role of snapshots could improve maintainability and readability for future contributors.

+ // Attempt to parse each choice in sequence, using stack snapshots for efficient backtracking.
  $(
      stack.snapshot();
      match $type::try_parse_with_partial(input, stack, tracker) {
          ::core::option::Option::Some((input, choice)) => {
              stack.clear_snapshot();
              let res = Self::$variant(choice);
              return ::core::option::Option::Some((input, res))
          }
          ::core::option::Option::None => {
              stack.restore();
          }
      }
  )*
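The snapshot/restore pattern the generated choice parser relies on can be sketched with a toy stack. This is a hypothetical stand-in (pest's real stack type has a different interface); it only demonstrates the protocol used in the loop above: snapshot before each choice, restore on failure, clear the snapshot on success.

```rust
/// Minimal stand-in for a parser stack with snapshot/restore semantics.
struct SnapshotStack<T> {
    items: Vec<T>,
    /// Saved lengths of `items`, one per open snapshot.
    snapshots: Vec<usize>,
}

impl<T> SnapshotStack<T> {
    fn new() -> Self {
        Self { items: Vec::new(), snapshots: Vec::new() }
    }
    fn push(&mut self, item: T) {
        self.items.push(item);
    }
    /// Remember the current stack depth before trying a choice.
    fn snapshot(&mut self) {
        self.snapshots.push(self.items.len());
    }
    /// The choice succeeded: drop the snapshot, keep its pushes.
    fn clear_snapshot(&mut self) {
        self.snapshots.pop();
    }
    /// The choice failed: roll the stack back to the snapshot.
    fn restore(&mut self) {
        let len = self.snapshots.pop().expect("no snapshot to restore");
        self.items.truncate(len);
    }
}

fn main() {
    let mut stack = SnapshotStack::new();
    stack.push("committed");

    stack.snapshot();
    stack.push("from failed choice");
    stack.restore(); // first choice failed: its push is rolled back

    stack.snapshot();
    stack.push("from successful choice");
    stack.clear_snapshot(); // second choice succeeded: keep its push

    assert_eq!(stack.items, ["committed", "from successful choice"]);
    println!("ok");
}
```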

Comment on lines 105 to 109
/// A pair that can be converted to a pair tree.
pub trait PairTree<R: RuleType>: PairContainer<R> + wrapper::Rule<R> {
/// Get pair span.
fn get_span(&self) -> (usize, usize);
/// Convert `Self` to a pair tree.

The PairTree trait extends PairContainer with additional functionality specific to pair trees, such as getting the span of a pair and converting it into a pair tree. This trait is well-designed, providing essential operations for working with pair trees in a typed context. The method as_pair_tree is particularly noteworthy for its ability to convert a complex structure into a simple, navigable pair tree. Ensuring that the documentation for this trait and its methods is clear and comprehensive will be beneficial for maintainability and usability.

+ /// `PairTree` extends `PairContainer` with methods specific to pair trees, enabling operations like retrieving spans and converting to pair trees.


Comment on lines +14 to +36
( $name:ident, $( ( $field:ident, $type:ident, $trivia:ident ) ),* $(,)? ) => {
/// Sequence type generated by `pest::sequence_type!`.
///
/// - `0`: No trivia.
/// - `1`: Optional trivia.
/// - `2`: Mandatory trivia.
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct $name<$( $type, const $trivia: ::core::primitive::u8, )*> {
-    $( $variant: $type ),*
+    $(
+        #[doc = ::core::stringify!(Field for $type.)]
+        pub $field: $type,
+    )*
}
impl<'i, $($type, const $trivia: ::core::primitive::u8, )*>
$name<$($type, $trivia, )*> {
/// As a tuple of references.
pub fn as_tuple(&self) -> ($(&$type, )*) {
( $( &self.$field, )* )
}
/// Into a tuple of values.
pub fn into_tuple(self) -> ($($type, )*) {
( $( self.$field, )* )
}

The macro for generating sequence types is well-designed, providing a clear and flexible way to define sequence types with varying numbers of fields and trivia handling. The use of const $trivia: ::core::primitive::u8 as a way to specify trivia handling for each field is a clever approach that allows for compile-time configuration of parsing behavior. However, it would be beneficial to expand the documentation to explain the trivia handling mechanism in more detail, including how the 0, 1, and 2 values affect parsing.

+ /// The `trivia` parameter for each field controls how whitespace and comments are handled:
+ /// - `0`: No trivia is considered.
+ /// - `1`: Optional trivia is allowed.
+ /// - `2`: Mandatory trivia must be present.


Comment on lines 55 to 65
if i > 1 {
input = $crate::typed::template::try_handle_trivia::<R, $trivia>(input, stack, tracker)?;
}
-    let (next, $variant) = $type::try_parse_with_partial(input, stack, tracker)?;
+    let (next, $field) = $type::try_parse_with_partial(input, stack, tracker)?;
input = next;
)*
let res = Self {
-    $($variant, )*
+    $($field, )*
};
::core::option::Option::Some((input, res))
}

📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [57-76]

The implementation of try_parse_with_partial within the macro uses a loop to parse each field in sequence, handling trivia according to the specified trivia parameter for each field. This approach is efficient and flexible, allowing for precise control over the parsing process. However, adding comments to explain the role of the trivia parameter in controlling the parsing flow and the importance of the order in which fields are parsed could improve the readability and maintainability of this code.

+ // Parse each field in sequence, handling trivia according to the specified `trivia` parameter.
+ // The order of fields is important for correctly parsing the sequence.
  $(
      i += 1;
      if i > 1 {
          input = $crate::typed::template::try_handle_trivia::<R, $trivia>(input, stack, tracker)?;
      }
      let (next, $field) = $type::try_parse_with_partial(input, stack, tracker)?;
      input = next;
  )*

@TheVeryDarkness (Contributor)

Sorry, I'll switch to that branch later.

@TheVeryDarkness (Contributor)

@tomtau I've cherry-picked all the commits I have, but the module system isn't finished yet, so please merge this later.

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 17

Configuration used: CodeRabbit UI

Commits reviewed: files that changed from the base of the PR, between 9cfbba9 and 9516746.
Files ignored due to path filters (1)
  • generator/Cargo.toml is excluded by: !**/*.toml
Files selected for processing (21)
  • derive/tests/grammar.pest (1 hunks)
  • derive/tests/sample.rs (1 hunks)
  • generator/src/typed/attr.rs (1 hunks)
  • generator/src/typed/generator.rs (13 hunks)
  • generator/src/typed/module.rs (4 hunks)
  • generator/src/typed/output.rs (1 hunks)
  • generator/tests/.gitignore (1 hunks)
  • generator/tests/expected.rs (30 hunks)
  • generator/tests/generator.rs (2 hunks)
  • generator/tests/syntax.pest (2 hunks)
  • pest/src/choice.rs (2 hunks)
  • pest/src/formatter.rs (2 hunks)
  • pest/src/lib.rs (2 hunks)
  • pest/src/sequence.rs (3 hunks)
  • pest/src/typed/mod.rs (1 hunks)
  • pest/src/typed/template/mod.rs (27 hunks)
  • pest/src/typed/template/repetition.rs (2 hunks)
  • pest/src/typed/tracker.rs (9 hunks)
  • pest/src/typed/traits.rs (8 hunks)
  • pest/src/typed/wrapper.rs (1 hunks)
  • pest/tests/predefined_node.rs (4 hunks)
Files not summarized due to errors (1)
  • generator/tests/expected.rs: Error: Message exceeds token limit
Files skipped from review due to trivial changes (1)
  • generator/tests/.gitignore
Files skipped from review as they are similar to previous changes (7)
  • generator/src/typed/attr.rs
  • generator/tests/syntax.pest
  • pest/src/choice.rs
  • pest/src/formatter.rs
  • pest/src/typed/mod.rs
  • pest/src/typed/template/mod.rs
  • pest/src/typed/wrapper.rs
Additional comments: 25
derive/tests/sample.rs (2)
  • 10-15: The test function `comment` correctly tests partial and complete parsing of a string containing a comment. It's good practice to include comments explaining the purpose of each test case for better maintainability.
  • 17-23: The test function `skip_on_two_end` effectively tests parsing behavior with trailing spaces. It's well-structured and follows best practices for testing edge cases.
generator/tests/generator.rs (2)
  • 22-57: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [10-37]

The test function generated correctly generates code, writes it to a file, formats it using rustfmt, and compares it with an expected output. It's a good practice to handle the output of external commands carefully and provide clear error messages, as done here.

  • 22-57: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [39-58]

The test function generated_sample follows the same pattern as generated for testing code generation with a different input. It's well-implemented and follows best practices for testing and error handling.

pest/src/lib.rs (1)
  • 30-68: The macro unicode for generating unicode property types is correctly defined and implemented. It follows Rust's conventions for macros, including documentation and trait implementations. This macro enhances the library's functionality by providing a structured way to handle unicode properties.
derive/tests/grammar.pest (1)
  • 39-49: The updated stack manipulation operations using pest::stack::push, pest::stack::pop, and related functions are correctly implemented within the grammar definitions. These changes improve clarity and maintainability by using a more structured approach to stack manipulation.
pest/tests/predefined_node.rs (2)
  • 16-20: Making the Rule enum public enhances the flexibility and usability of the library by allowing external modules to reference these rules directly.
  • 56-69: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [31-66]

The addition of the unicode!(MATH) declaration and the implementation of the check_with_partial method for the StrFoo struct are correctly done. These changes expand the functionality and improve the library's capability to handle unicode properties and parsing checks.

pest/src/typed/template/repetition.rs (1)
  • 99-132: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [40-129]

The renaming of parse_with to parse_with_partial and the addition of the check_with_partial function are correctly implemented. These changes improve the clarity and consistency of the API, making it more explicit when partial parsing checks are performed.

generator/src/typed/output.rs (1)
  • 168-170: Introducing aliases soi and eoi for SOI and EOI improves readability and usability by providing shorter and more intuitive names for these commonly used entities. This change is a good practice for enhancing the developer experience.
pest/src/sequence.rs (2)
  • 1-3: The file header provides a clear introduction to the purpose of the macro. It's concise and informative.
  • 93-93: The method for_each_child_pair in the PairContainer trait implementation is a good addition for handling child pairs. It's well-implemented and follows the principle of least surprise.
pest/src/typed/traits.rs (1)
  • 20-25: The method try_parse_with_partial in the TypedNode trait is a significant addition, allowing for partial parsing of input. Ensure that implementations of this method handle edge cases and partial inputs correctly to avoid parsing errors.
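The complete/partial split that `try_parse_with_partial` introduces can be illustrated with a toy parser. This is a sketch with made-up names, not pest3's actual signatures: a partial parse consumes a prefix and reports where it stopped, and a complete parse is simply a partial parse followed by an end-of-input check.

```rust
// "Partial" parse: consume a leading run of digits, returning the position
// reached and the parsed value. Fails only if no digit is present.
fn parse_digits_partial(input: &str) -> Option<(usize, u64)> {
    let end = input
        .char_indices()
        .take_while(|(_, c)| c.is_ascii_digit())
        .map(|(i, c)| i + c.len_utf8())
        .last()?;
    let value = input[..end].parse().ok()?;
    Some((end, value))
}

// "Complete" parse: delegate to the partial parse, then require that the
// whole input was consumed — the same shape `try_parse` can take when it
// delegates to `try_parse_with_partial`.
fn parse_digits(input: &str) -> Option<u64> {
    let (end, value) = parse_digits_partial(input)?;
    if end == input.len() {
        Some(value)
    } else {
        None
    }
}

fn main() {
    assert_eq!(parse_digits_partial("42abc"), Some((2, 42)));
    assert_eq!(parse_digits("42abc"), None); // trailing input rejected
    assert_eq!(parse_digits("42"), Some(42));
}
```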
generator/src/typed/module.rs (1)
  • 18-21: The addition of the Rule { argc: usize } variant to the RuleGenerics enum introduces a way to specify rules with a specific argument count. This is a positive change for flexibility but ensure that the handling of argc is consistent and well-documented across the codebase.
pest/src/typed/tracker.rs (2)
  • 172-174: The refinement of the record_option_during_with function signature simplifies the definition and ensures consistency in parameter types. This change enhances readability and maintainability of the code.
  • 207-213: The addition of the record_empty_during function is a thoughtful enhancement that provides a clear and concise way to handle cases where no result is expected from the provided function. This addition improves the expressiveness of the API.
generator/src/typed/generator.rs (1)
  • 179-183: The conditional logic for determining the content_type based on the boxed flag is clear and concise. However, it's essential to ensure that boxing is used judiciously to avoid unnecessary heap allocations, which can impact performance.

This change effectively allows for flexibility in handling rule content, enabling optimizations based on the specific needs of the grammar being processed.

generator/tests/expected.rs (8)
  • 1-7: The use of #[allow] attributes at the beginning of the Rule enum is appropriate for suppressing compiler warnings that are expected in this context. This approach helps maintain clarity and focus on the essential parts of the code.
  • 11-15: The documentation comments added to specific rules within the Rule enum (Regular, Atomic, NonAtomic) enhance the readability and maintainability of the code by clearly describing the purpose and usage of each rule. This is a good practice, especially in a parsing context where understanding the role of each rule is crucial.
  • 48-51: The definition of OptionalTrivia and MandatoryTrivia types in the trivia module using Rep and RepOnce generics is a concise and clear way to express the concept of optional and mandatory repetitions in the parsing context. This approach leverages Rust's type system to enforce parsing rules at the type level, enhancing code safety and readability.
  • 126-129: The r#Regular struct is well-documented, and its fields are clearly defined, aligning with the grammar specified in the documentation comment. However, it's important to ensure that the use of raw string literals (e.g., r#Regular) for naming is consistent and justified across the codebase to avoid confusion.
Verification successful

The verification process has confirmed the consistent use of raw string literals for struct names within the generator/tests/expected.rs file, as demonstrated by the multiple instances of struct declarations using the r# prefix. This consistency suggests that the naming convention is deliberate and aligns with the project's standards or requirements for naming structs in this context.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify the consistent use of raw string literals for struct names across the codebase.
rg 'pub struct r#'

Length of output: 1610

  • 169-179: The `r#Atomic` struct's implementation and documentation clearly articulate its purpose and structure, following the grammar specified. The use of `Sequence3` to compose the rule's structure demonstrates effective use of generics to model complex parsing patterns. This pattern of defining parsing rules should be consistently applied throughout the codebase for maintainability.
  • 233-241: The `r#NonAtomic` struct is another example of clear documentation and struct definition aligning with the specified grammar. The use of `Sequence2` here simplifies the representation of a sequence of parsing operations, showcasing the flexibility of the generics used in this parsing framework.
  • 291-294: The `r#ExactString` struct is correctly documented and defined. It represents a specific parsing rule for an exact string match. This kind of explicit documentation and struct definition for each parsing rule significantly aids in understanding the parser's behavior and debugging potential issues.
  • 1511-1515: The `generics` module re-exports from the `pest` crate, providing a centralized place for generic parsing constructs used throughout the file. This approach enhances modularity and reusability of parsing components. It's crucial to ensure that these re-exports are used consistently across the project to leverage their benefits fully.

Comment on lines +39 to +70
pub fn call(
&self,
name: &str,
rule_ref: &RuleRef<'_>,
args: Option<&ProcessedPathArgs>,
root: &TokenStream,
) -> TokenStream {
let generics = generics();
let name = format_ident!("r#{}", name);
match self {
Self::Rule => {
assert!(
rule_ref.args.is_none(),
"Unexpected arguments in `{}`.",
rule_ref,
);
let (path, name) = getter();
quote! { #root::rules::#( #path:: )* #name :: <'i> }
Self::Rule { argc } => match (argc, args) {
(0, None) => {
quote! { #root::rules::#name::<'i> }
}
(0, Some(_args)) => {
panic!("Unexpected arguments in `{}`.", rule_ref,);
}
(argc, None) => {
panic!("Expect {argc} arguments in `{rule_ref}`.");
}
(_argc, Some(ProcessedPathArgs::Slice(slice))) => {
panic!("Unexpected slice {slice} in `{rule_ref}`.");
}
(argc, Some(ProcessedPathArgs::Call(args))) => {
assert_eq!(
*argc,
args.len(),
"Argument count not matched in `{}`.",
rule_ref
);
quote! { #root::rules::#name::<'i, #(#args, )*> }
}

The call method has been significantly updated to handle different scenarios based on the rule type. It's crucial to ensure that the panic messages are clear and helpful for debugging. Additionally, the use of assert_eq! for argument count validation is appropriate, but consider adding more detailed error messages to assist in troubleshooting.

- panic!("Unexpected arguments in `{}`.", rule_ref,);
+ panic!("Unexpected arguments in `{rule_ref}`. No arguments expected for this rule type.");
- panic!("Expect {argc} arguments in `{rule_ref}`.");
+ panic!("Expected {argc} arguments in `{rule_ref}`, but none were provided.");
- panic!("Unexpected slice {slice} in `{rule_ref}`.");
+ panic!("Unexpected slice argument {slice} in `{rule_ref}`. Slice arguments are not supported for this rule type.");
- "Argument count not matched in `{}`.",
+ "Expected {argc} arguments, but received {} in `{rule_ref}`.", args.len()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
pub fn call(
&self,
name: &str,
rule_ref: &RuleRef<'_>,
args: Option<&ProcessedPathArgs>,
root: &TokenStream,
) -> TokenStream {
let generics = generics();
let name = format_ident!("r#{}", name);
match self {
Self::Rule => {
assert!(
rule_ref.args.is_none(),
"Unexpected arguments in `{}`.",
rule_ref,
);
let (path, name) = getter();
quote! { #root::rules::#( #path:: )* #name :: <'i> }
Self::Rule { argc } => match (argc, args) {
(0, None) => {
quote! { #root::rules::#name::<'i> }
}
(0, Some(_args)) => {
panic!("Unexpected arguments in `{}`.", rule_ref,);
}
(argc, None) => {
panic!("Expect {argc} arguments in `{rule_ref}`.");
}
(_argc, Some(ProcessedPathArgs::Slice(slice))) => {
panic!("Unexpected slice {slice} in `{rule_ref}`.");
}
(argc, Some(ProcessedPathArgs::Call(args))) => {
assert_eq!(
*argc,
args.len(),
"Argument count not matched in `{}`.",
rule_ref
);
quote! { #root::rules::#name::<'i, #(#args, )*> }
}
(0, Some(_args)) => {
panic!("Unexpected arguments in `{rule_ref}`. No arguments expected for this rule type.");
}
(argc, None) => {
panic!("Expected {argc} arguments in `{rule_ref}`, but none were provided.");
}
(_argc, Some(ProcessedPathArgs::Slice(slice))) => {
panic!("Unexpected slice argument {slice} in `{rule_ref}`. Slice arguments are not supported for this rule type.");
}
(argc, Some(ProcessedPathArgs::Call(args))) => {
assert_eq!(
*argc,
args.len(),
"Expected {argc} arguments, but received {} in `{rule_ref}`.", args.len()
);
quote! { #root::rules::#name::<'i, #(#args, )*> }
}

Comment on lines +47 to +56
fn new(
rule: &'g ParseRule,
config: Config,
reachability: &BTreeMap<&str, BTreeSet<&str>>,
) -> Self {
let rule_name = rule.name.as_str();
let boxed = true;
let boxed = match config.box_all_rules {
true => true,
false => !reachability.contains_key(rule_name),
};

The constructor for RuleInfo now includes logic to determine the boxed field based on the box_all_rules configuration flag and the presence of the rule name in the reachability map. This is a critical change as it affects how rules are processed and generated, potentially impacting the performance and memory usage of the generated parser.

Consider adding documentation comments to explain the significance of the boxed field and the conditions under which a rule is boxed. This will improve maintainability and understandability of the code.

+ // Determine whether the rule should be boxed based on configuration and reachability.


Suggested change
fn new(
rule: &'g ParseRule,
config: Config,
reachability: &BTreeMap<&str, BTreeSet<&str>>,
) -> Self {
let rule_name = rule.name.as_str();
let boxed = true;
let boxed = match config.box_all_rules {
true => true,
false => !reachability.contains_key(rule_name),
};
fn new(
rule: &'g ParseRule,
config: Config,
reachability: &BTreeMap<&str, BTreeSet<&str>>,
) -> Self {
// Determine whether the rule should be boxed based on configuration and reachability.
let rule_name = rule.name.as_str();
let boxed = match config.box_all_rules {
true => true,
false => !reachability.contains_key(rule_name),
};

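The reason a `boxed` flag exists at all can be shown with a minimal sketch: a rule type that (transitively) contains itself by value would have infinite size, so at least one edge of every cycle in the reference graph must go through a pointer such as `Box`. The types below are illustrative only, not the generated parser code.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    // Without the Box, `Expr` would contain an `Expr` by value and the
    // compiler would reject the type as having infinite size.
    Neg(Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Neg(inner) => -eval(inner),
    }
}

fn main() {
    // Neg(Neg(7)) evaluates back to 7; the cycle in the type is broken
    // by the heap indirection, at the cost of one allocation per node.
    let e = Expr::Neg(Box::new(Expr::Neg(Box::new(Expr::Num(7)))));
    assert_eq!(eval(&e), 7);
}
```

This is also why boxing everything (`box_all_rules = true`) is safe but potentially wasteful: only rules that sit on a cycle strictly need the indirection, which is what the reachability analysis determines.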
Comment on lines +114 to +162
#[derive(Clone, Debug)]
enum ModuleNode<'g> {
// MutableCollection(RefCell<HashMap<&'g str, Rc<ModuleNode<'g>>>>),
Collection(HashMap<&'g str, Rc<ModuleNode<'g>>>),
Map {
bound: HashSet<String>,
mapper: fn(&'g str) -> TokenStream,
},
Generics(RuleGenerics),
Imported(Rc<Self>),
}

impl<'g> ModuleNode<'g> {
/// [Err] here is actually not an error.
fn child(&self, node: &'g str) -> &Self {
match self {
Self::Collection(map) => map.get(node).unwrap_or_else(|| {
panic!(
"Node {} can't be resolved. Available nodes are {:?}.",
node,
map.keys()
)
}),
Self::Map { bound, mapper } => {
assert!(bound.contains(node), "{node} is out of bound.");
panic!("No child found.")
}
Self::Generics(_generics) => panic!("No child found."),
Self::Imported(rc) => rc.child(node),
}
}
fn get(
&self,
root: &TokenStream,
node: &'g str,
rule_ref: &RuleRef<'g>,
args: &Option<ProcessedPathArgs>,
) -> TokenStream {
match self {
Self::Collection(map) => panic!("Collection can't be called."),
Self::Map { bound, mapper } => {
assert!(args.is_none(), "Unexpected arguments.");
mapper(node)
}
Self::Generics(generics) => generics.call(node, rule_ref, args.as_ref(), root),
Self::Imported(rc) => rc.get(root, node, rule_ref, args),
}
}
}

The updates to the ModuleNode enum introduce new functionalities like handling collections, maps, and generics. Ensure that the panic messages in the child method are informative and guide the user towards resolving the issue. Additionally, the handling of generics in the get method appears to be well-implemented, but thorough testing is recommended to ensure that all edge cases are covered.

- panic!("Node {} can't be resolved. Available nodes are {:?}.", node, map.keys());
+ panic!("Node '{node}' can't be resolved. Available nodes are: {:?}", map.keys().collect::<Vec<_>>());
- panic!("No child found.");
+ panic!("No child found for the specified node. Ensure that the node name is correct and that it has been properly defined.");


Suggested change
#[derive(Clone, Debug)]
enum ModuleNode<'g> {
// MutableCollection(RefCell<HashMap<&'g str, Rc<ModuleNode<'g>>>>),
Collection(HashMap<&'g str, Rc<ModuleNode<'g>>>),
Map {
bound: HashSet<String>,
mapper: fn(&'g str) -> TokenStream,
},
Generics(RuleGenerics),
Imported(Rc<Self>),
}
impl<'g> ModuleNode<'g> {
/// [Err] here is actually not an error.
fn child(&self, node: &'g str) -> &Self {
match self {
Self::Collection(map) => map.get(node).unwrap_or_else(|| {
panic!(
"Node {} can't be resolved. Available nodes are {:?}.",
node,
map.keys()
)
}),
Self::Map { bound, mapper } => {
assert!(bound.contains(node), "{node} is out of bound.");
panic!("No child found.")
}
Self::Generics(_generics) => panic!("No child found."),
Self::Imported(rc) => rc.child(node),
}
}
fn get(
&self,
root: &TokenStream,
node: &'g str,
rule_ref: &RuleRef<'g>,
args: &Option<ProcessedPathArgs>,
) -> TokenStream {
match self {
Self::Collection(map) => panic!("Collection can't be called."),
Self::Map { bound, mapper } => {
assert!(args.is_none(), "Unexpected arguments.");
mapper(node)
}
Self::Generics(generics) => generics.call(node, rule_ref, args.as_ref(), root),
Self::Imported(rc) => rc.get(root, node, rule_ref, args),
}
}
}
#[derive(Clone, Debug)]
enum ModuleNode<'g> {
// MutableCollection(RefCell<HashMap<&'g str, Rc<ModuleNode<'g>>>>),
Collection(HashMap<&'g str, Rc<ModuleNode<'g>>>),
Map {
bound: HashSet<String>,
mapper: fn(&'g str) -> TokenStream,
},
Generics(RuleGenerics),
Imported(Rc<Self>),
}
impl<'g> ModuleNode<'g> {
/// [Err] here is actually not an error.
fn child(&self, node: &'g str) -> &Self {
match self {
Self::Collection(map) => map.get(node).unwrap_or_else(|| {
panic!(
"Node '{node}' can't be resolved. Available nodes are: {:?}.",
map.keys().collect::<Vec<_>>()
)
}),
Self::Map { bound, mapper } => {
assert!(bound.contains(node), "{node} is out of bound.");
panic!("No child found for the specified node. Ensure that the node name is correct and that it has been properly defined.")
}
Self::Generics(_generics) => panic!("No child found for the specified node. Ensure that the node name is correct and that it has been properly defined."),
Self::Imported(rc) => rc.child(node),
}
}
fn get(
&self,
root: &TokenStream,
node: &'g str,
rule_ref: &RuleRef<'g>,
args: &Option<ProcessedPathArgs>,
) -> TokenStream {
match self {
Self::Collection(map) => panic!("Collection can't be called."),
Self::Map { bound, mapper } => {
assert!(args.is_none(), "Unexpected arguments.");
mapper(node)
}
Self::Generics(generics) => generics.call(node, rule_ref, args.as_ref(), root),
Self::Imported(rc) => rc.get(root, node, rule_ref, args),
}
}
}

Comment on lines 408 to 416
"~" => res.add_option_trivia(inter.typename),
"^" => res.add_mandatory_trivia(inter.typename),
_ => {
let rule_info = RuleInfo::from(rule);
let rule_info = RuleInfo::new(rule, config, reachability);
res.insert_rule_struct(create_rule(
config,
&rule_config,
&rule_info,
inter.typename,

📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [394-413]

The addition of the reachability parameter to the process_rule function is a significant change that introduces the concept of reachability analysis into rule processing. This analysis can help optimize the generated parser by identifying which rules need to be boxed to prevent infinite recursion or excessively large structs.

It would be beneficial to add more detailed comments explaining how reachability analysis is performed and its impact on the generated parser. This will aid future maintainability and provide valuable context for developers working with this code.

+ /// Processes a single rule, taking into account reachability analysis to optimize boxing of rules.

Comment on lines +510 to +528
/// Wrap some nodes in [std::boxed::Box] to avoid infinite size struct,
/// which can break the edges in the reference graph,
/// and then collect reachability.
///
/// Rules that are not in map keys are wrapped.
///
/// We won't promise anything on which nodes are boxed.
fn collect_reachability(rules: &[ParseRule]) -> BTreeMap<&str, BTreeSet<&str>> {
let mut res = BTreeMap::new();
let rule_trivia = rules.iter().find(|rule| rule.name == "~");
for rule in rules {
let entry = res.entry(rule.name.as_str()).or_default();
collect_used_rule(rule, rule_trivia, entry);
}
// Length of any path is no more than `rules.len()`.
for _ in 0..rules.len() {
// Before the `i`-th iteration,
// `res[a]` contains all nodes that can be reached from `a`
// in no more than `i+1` steps.

The collect_reachability function is a new addition that performs reachability analysis to determine which rules should be boxed. This function is crucial for optimizing the generated parser and preventing issues with infinite recursion or excessively large structs.

While the function's implementation appears to be correct, adding more comprehensive comments explaining the algorithm and its purpose would greatly enhance the code's readability and maintainability.

+ /// Performs reachability analysis to identify rules that should be boxed to optimize the generated parser.
+ /// This analysis helps prevent infinite recursion and manage struct sizes by determining the reachability of rules.


Suggested change
/// Wrap some nodes in [std::boxed::Box] to avoid infinite size struct,
/// which can break the edges in the reference graph,
/// and then collect reachability.
///
/// Rules that are not in map keys are wrapped.
///
/// We won't promise anything on which nodes are boxed.
fn collect_reachability(rules: &[ParseRule]) -> BTreeMap<&str, BTreeSet<&str>> {
let mut res = BTreeMap::new();
let rule_trivia = rules.iter().find(|rule| rule.name == "~");
for rule in rules {
let entry = res.entry(rule.name.as_str()).or_default();
collect_used_rule(rule, rule_trivia, entry);
}
// Length of any path is no more than `rules.len()`.
for _ in 0..rules.len() {
// Before the `i`-th iteration,
// `res[a]` contains all nodes that can be reached from `a`
// in no more than `i+1` steps.
/// Wrap some nodes in [std::boxed::Box] to avoid infinite size struct,
/// which can break the edges in the reference graph,
/// and then collect reachability.
///
/// Rules that are not in map keys are wrapped.
///
/// We won't promise anything on which nodes are boxed.
/// Performs reachability analysis to identify rules that should be boxed to optimize the generated parser.
/// This analysis helps prevent infinite recursion and manage struct sizes by determining the reachability of rules.
fn collect_reachability(rules: &[ParseRule]) -> BTreeMap<&str, BTreeSet<&str>> {
let mut res = BTreeMap::new();
let rule_trivia = rules.iter().find(|rule| rule.name == "~");
for rule in rules {
let entry = res.entry(rule.name.as_str()).or_default();
collect_used_rule(rule, rule_trivia, entry);
}
// Length of any path is no more than `rules.len()`.
for _ in 0..rules.len() {
// Before the `i`-th iteration,
// `res[a]` contains all nodes that can be reached from `a`
// in no more than `i+1` steps.

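The fixed-point iteration described in the loop comments can be sketched with plain `BTreeMap`/`BTreeSet` operations. This is a simplified model of `collect_reachability`, not the actual implementation: start from each rule's direct references and, for as many rounds as there are rules, fold each node's successors' sets into its own. After round `i`, `res[a]` holds everything reachable from `a` in at most `i + 1` steps, so `rules.len()` rounds suffice for any path.

```rust
use std::collections::{BTreeMap, BTreeSet};

fn reachability<'a>(
    direct: &BTreeMap<&'a str, BTreeSet<&'a str>>,
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut res = direct.clone();
    // The longest simple path has at most `direct.len()` edges.
    for _ in 0..direct.len() {
        for name in direct.keys() {
            // Everything reachable through one more step from `name`.
            let reached: BTreeSet<&str> = res[name]
                .iter()
                .filter_map(|n| res.get(n))
                .flat_map(|s| s.iter().copied())
                .collect();
            res.get_mut(name).unwrap().extend(reached);
        }
    }
    res
}

fn main() {
    let mut direct = BTreeMap::new();
    direct.insert("expr", BTreeSet::from(["term"]));
    direct.insert("term", BTreeSet::from(["expr", "num"])); // expr <-> term cycle
    direct.insert("num", BTreeSet::new());
    let full = reachability(&direct);
    assert!(full["expr"].contains("num")); // expr -> term -> num
    assert!(full["expr"].contains("expr")); // expr reaches itself: a cycle, so box it
}
```

A rule that appears in its own reachability set sits on a cycle, which is exactly the case where boxing is required to keep struct sizes finite.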
Comment on lines +58 to +63
/// Check whether the some leading part of the result can be accepted.
fn check_with_partial(
input: Position<'i>,
stack: &mut Stack<Span<'i>>,
tracker: &mut Tracker<'i, R>,
) -> Option<(Position<'i>, Self)>;
/// Try parse given string into a node.
) -> Option<Position<'i>>;

The check_with_partial method adds the ability to check if a leading part of the input can be parsed without actually performing the parsing. This could be useful for lookahead operations. Documenting specific use cases and examples would enhance understanding.

+ /// This method checks if a leading part of the input can be parsed according to the node's rules.
+ /// It's useful for lookahead operations and determining if parsing should proceed.


Suggested change
/// Check whether the some leading part of the result can be accepted.
fn check_with_partial(
input: Position<'i>,
stack: &mut Stack<Span<'i>>,
tracker: &mut Tracker<'i, R>,
) -> Option<(Position<'i>, Self)>;
/// Try parse given string into a node.
) -> Option<Position<'i>>;
/// Check whether some leading part of the input can be accepted.
/// This method checks if a leading part of the input can be parsed according to the node's rules.
/// It's useful for lookahead operations and determining if parsing should proceed.
fn check_with_partial(
input: Position<'i>,
stack: &mut Stack<Span<'i>>,
tracker: &mut Tracker<'i, R>,
) -> Option<Position<'i>>;

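The check/parse split that `check_with_partial` introduces can be illustrated with a toy sketch (hypothetical names, not the trait's real signatures): a check reports where a match would end without constructing a node, which is exactly what a lookahead needs.

```rust
// "Check": would `kw` match at the start of `input`, and where would the
// match end? No node is built, only a position is returned.
fn check_keyword(input: &str, kw: &str) -> Option<usize> {
    input.starts_with(kw).then(|| kw.len())
}

// Positive lookahead built on the check: succeed iff `kw` matches here,
// but consume nothing.
fn peek_keyword(input: &str, kw: &str) -> bool {
    check_keyword(input, kw).is_some()
}

fn main() {
    assert_eq!(check_keyword("let x", "let"), Some(3));
    assert!(peek_keyword("let x", "let"));
    assert!(!peek_keyword("fn x", "let"));
}
```

Skipping node construction is the point: in a typed parser, a check path can avoid allocating the typed tree entirely, which matters when lookaheads are frequent.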
Comment on lines +110 to +118
/// Struct for rules with full capacity.
pub trait FullRuleStruct<'i>: wrapper::Rule {
/// Wrapped inner type.
type Inner: TypedNode<'i, Self::Rule>;
/// Wrapped content type.
type Content: From<Self::Inner>;
/// Create from span and content.
fn new(content: Self::Content, span: Span<'i>) -> Self;
}

The FullRuleStruct trait and its method new provide a structured way to create rule structs with full capacity. It's well-defined, but including examples of implementing and using this trait would be beneficial for developers.

+ /// Implement this trait for rule structs that encapsulate a complete rule with its content.
+ /// Example implementation:
+ /// ```
+ /// impl<'i> FullRuleStruct<'i> for MyRule<'i> {
+ ///     type Inner = MyInnerNode<'i>;
+ ///     type Content = MyContent<'i>;
+ ///     fn new(content: Self::Content, span: Span<'i>) -> Self {
+ ///         MyRule { content, span }
+ ///     }
+ /// }
+ /// ```

Comment on lines +144 to +152
/// A container of pairs.
pub trait PairContainer<R> {
/// for `Self` as a pair if it is, otherwise for each child pair.
fn for_self_or_for_each_child_pair(&self, f: &mut impl FnMut(Pair<R>)) {
self.for_each_child_pair(f)
}
/// For each child pair.
fn for_each_child_pair(&self, f: &mut impl FnMut(Pair<R>));
/// Convert children pairs into a [Vec].

The PairContainer trait's methods for_self_or_for_each_child_pair and vec_children_pairs are well-designed for handling pairs and their children. Expanding the documentation to explain the purpose and usage of each method would help future developers.

+ /// This trait is used to abstract over entities that can contain pairs, allowing for operations on those pairs or their children.
+ /// It provides methods to iterate over child pairs and to collect them into a vector.


Suggested change
/// A container of pairs.
pub trait PairContainer<R> {
/// for `Self` as a pair if it is, otherwise for each child pair.
fn for_self_or_for_each_child_pair(&self, f: &mut impl FnMut(Pair<R>)) {
self.for_each_child_pair(f)
}
/// For each child pair.
fn for_each_child_pair(&self, f: &mut impl FnMut(Pair<R>));
/// Convert children pairs into a [Vec].
/// A container of pairs.
/// This trait is used to abstract over entities that can contain pairs, allowing for operations on those pairs or their children.
/// It provides methods to iterate over child pairs and to collect them into a vector.
pub trait PairContainer<R> {
/// for `Self` as a pair if it is, otherwise for each child pair.
fn for_self_or_for_each_child_pair(&self, f: &mut impl FnMut(Pair<R>)) {
self.for_each_child_pair(f)
}
/// For each child pair.
fn for_each_child_pair(&self, f: &mut impl FnMut(Pair<R>));
/// Convert children pairs into a [Vec].

Comment on lines 249 to 260
) -> Option<(Position<'i>, T)> {
T::try_parse_with_partial(input, stack, tracker)
}
/// See [TypedNode::try_parse].
#[inline]
fn try_parse<'i, T: TypedNode<'i, R>>(input: &'i str) -> Result<T, Error<R>> {
T::try_parse(input)
}
/// See [TypedNode::try_parse_partial].
#[inline]
fn try_parse_partial<'i, T: TypedNode<'i, R>>(
input: &'i str,

📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [232-257]

The TypedParser trait introduces static methods for parsing that delegate to the corresponding methods in the TypedNode trait. Documenting the relationship between TypedParser and TypedNode would clarify the design intention and usage patterns.

+ /// `TypedParser` provides static methods for parsing that are designed to work with any type that implements `TypedNode`.
+ /// This trait serves as a utility for simplifying the invocation of parsing operations.

Comment on lines +271 to +279
/// Parse leading part of given input into a typed node.
fn parse_with_partial(input: Position<'i>, stack: &mut Stack<Span<'i>>)
-> (Position<'i>, Self);
/// Parse leading part of given input into a typed node.
fn parse_partial(input: &'i str) -> (Position<'i>, Self) {
let input = Position::from_start(input);
let mut stack = Stack::new();
Self::parse_with_partial(input, &mut stack)
}

The modifications to the NeverFailedTypedNode trait, specifically the addition of methods for parsing with partial input, enhance the trait's utility by supporting more flexible parsing scenarios. Adding examples or more detailed documentation on how to implement and use these methods could improve usability.

+ /// These methods extend `NeverFailedTypedNode` with the ability to parse partial input, offering more flexibility in parsing operations.
+ /// Implementers should ensure that parsing can proceed without failure, adhering to the trait's guarantees.


Suggested change

```diff
 /// Parse leading part of given input into a typed node.
+/// These methods extend `NeverFailedTypedNode` with the ability to parse partial input, offering more flexibility in parsing operations.
+/// Implementers should ensure that parsing can proceed without failure, adhering to the trait's guarantees.
 fn parse_with_partial(input: Position<'i>, stack: &mut Stack<Span<'i>>)
     -> (Position<'i>, Self);
 /// Parse leading part of given input into a typed node.
 fn parse_partial(input: &'i str) -> (Position<'i>, Self) {
     let input = Position::from_start(input);
     let mut stack = Stack::new();
     Self::parse_with_partial(input, &mut stack)
 }
```

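The required-method-plus-provided-default shape of the trait can be sketched in isolation. In the sketch below, `Pos`, `NeverFailedNode`, and `Whitespace` are hypothetical stand-ins for pest's `Position`/`Stack` machinery, not the real API: `parse_with_partial` is the required method, and `parse_partial` is a provided default that builds the starting state.

```rust
/// Hypothetical stand-in for pest's `Position`: a byte offset into the input.
#[derive(Debug, PartialEq)]
struct Pos(usize);

trait NeverFailedNode: Sized {
    /// Parse the leading part of the input starting at `pos`; must not fail.
    fn parse_with_partial(input: &str, pos: Pos) -> (Pos, Self);

    /// Provided default: start from the beginning of the input,
    /// mirroring how `parse_partial` delegates to `parse_with_partial`.
    fn parse_partial(input: &str) -> (Pos, Self) {
        Self::parse_with_partial(input, Pos(0))
    }
}

/// A node that consumes leading whitespace; it trivially never fails.
struct Whitespace(usize);
impl NeverFailedNode for Whitespace {
    fn parse_with_partial(input: &str, pos: Pos) -> (Pos, Self) {
        let n: usize = input[pos.0..]
            .chars()
            .take_while(|c| c.is_whitespace())
            .map(char::len_utf8)
            .sum();
        (Pos(pos.0 + n), Whitespace(n))
    }
}

fn main() {
    let (end, ws) = Whitespace::parse_partial("  x");
    assert_eq!(end, Pos(2)); // stopped before 'x'
    assert_eq!(ws.0, 2);
    println!("ok");
}
```

Implementers only write the stateful entry point; callers that just have a `&str` get the convenience method for free.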
@coderabbitai coderabbitai bot left a comment
Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 9516746 and 8dc3206.
Files selected for processing (2)
  • pest/src/typed/template/repetition.rs (3 hunks)
  • pest/src/typed/traits.rs (7 hunks)
Files skipped from review as they are similar to previous changes (1)
  • pest/src/typed/traits.rs
Additional comments: 1
pest/src/typed/template/repetition.rs (1)
  • 40-43: The function parse_with_partial has been correctly renamed and its signature appears to be consistent with the expected functionality. However, ensure that all references to this function throughout the codebase have been updated to reflect the new name.

Comment on lines +103 to +126
```rust
fn check_with_partial(
    mut input: Position<'i>,
    stack: &mut Stack<Span<'i>>,
    tracker: &mut Tracker<'i, R>,
) -> Option<Position<'i>> {
    for i in 0..MAX {
        match restore_on_none(stack, |stack| {
            check_unit::<R, T, TRIVIA>(input, stack, tracker, i)
        }) {
            Some(next) => {
                input = next;
            }
            None => {
                if i < MIN {
                    return None;
                } else {
                    break;
                }
            }
        }
    }

    Some(input)
}
```
The addition of the `check_with_partial` function is a significant enhancement. It's important to ensure that this function is properly documented, explaining its purpose, parameters, and return value. This will help maintain the code's readability and understandability.

Consider adding comprehensive documentation comments for the `check_with_partial` function to explain its usage and parameters.
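The MIN/MAX semantics that `check_with_partial` implements can be isolated into a small, self-contained sketch. Here pest's `Position`, `Stack`, and `Tracker` are replaced by a plain byte offset, and `check_unit`/`check_rep` are simplified illustrative stand-ins, not the crate's real signatures.

```rust
/// Stand-in for `check_unit`: try to match one unit (here, a single ASCII
/// digit) at `pos`, returning the new position on success.
fn check_unit(input: &str, pos: usize) -> Option<usize> {
    input[pos..]
        .chars()
        .next()
        .filter(|c| c.is_ascii_digit())
        .map(|c| pos + c.len_utf8())
}

/// Mirror of `check_with_partial`: consume between MIN and MAX units.
/// Fewer than MIN matches fails the whole repetition; once MIN is met,
/// a non-match merely stops the loop without consuming more input.
fn check_rep<const MIN: usize, const MAX: usize>(input: &str, mut pos: usize) -> Option<usize> {
    for i in 0..MAX {
        match check_unit(input, pos) {
            Some(next) => pos = next,
            None => {
                if i < MIN {
                    return None; // minimum not met: repetition fails
                } else {
                    break; // minimum met: stop here
                }
            }
        }
    }
    Some(pos)
}

fn main() {
    assert_eq!(check_rep::<2, 4>("123abc", 0), Some(3)); // 3 digits, within [2, 4]
    assert_eq!(check_rep::<2, 4>("1abc", 0), None); // only 1 digit, below MIN
    assert_eq!(check_rep::<0, 2>("12345", 0), Some(2)); // capped at MAX
    println!("ok");
}
```

The edge cases asserted in `main` (hitting MAX, failing MIN, stopping early) are exactly the ones the unit tests suggested below should cover.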

Comment on lines 99 to 129

```rust
        Some((input, Self { content: vec }))
    }
    #[inline]
    fn check_with_partial(
        mut input: Position<'i>,
        stack: &mut Stack<Span<'i>>,
        tracker: &mut Tracker<'i, R>,
    ) -> Option<Position<'i>> {
        for i in 0..MAX {
            match restore_on_none(stack, |stack| {
                check_unit::<R, T, TRIVIA>(input, stack, tracker, i)
            }) {
                Some(next) => {
                    input = next;
                }
                None => {
                    if i < MIN {
                        return None;
                    } else {
                        break;
                    }
                }
            }
        }

        Some(input)
    }
}
impl<T, const TRIVIA: u8, const MIN: usize, const MAX: usize> IntoIterator
    for RepMinMax<T, TRIVIA, MIN, MAX>
```
📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [40-126]

The logic within both parse_with_partial and check_with_partial functions seems to correctly implement the intended parsing behavior, with appropriate handling of the MIN and MAX constraints. However, it's crucial to add unit tests covering various edge cases, such as reaching the MAX limit, not meeting the MIN requirement, and handling trivia between parsed units.

Would you like assistance in creating unit tests for these functions to ensure their robustness and correctness?

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 27, 2024

@tomtau and could you please explain how a module path (mod1::mod2::mod3::...) is mapped to a path in the file system (dir1/dir2/dir3/...)?
I found that grammar.pest only accepts module paths, and parser.rs simply uses the module path as a file path and ignores built-in modules.
And when the grammar parser sees mod1::mod2, should it be converted to ./mod1/mod2.pest? And how could a user access files in ancestor folders, say ../common.pest?

@tomtau
Contributor Author

tomtau commented Feb 27, 2024

@TheVeryDarkness the pest3 module system isn't yet properly defined, there are two aspects to it: syntax and semantics.

syntax

I saw two proposals for it:

  1. specifying system paths directly (pest language evolution, pest#333 (comment)): `use "cool.pest"; use "this.pest" as that;`
  2. Rust-like modules (Implementation of modules/use statements, pest#660)

I think the current pest3 implementation was leaning towards the latter (even though the former was in the original proposal by @dragostis ) which seems fair: it's familiar to Rust users, it doesn't have issues with different system path notations on different operating systems, and can potentially be abstracted to "pest_vm".

And how could a user access files in ancestor folders, say ../common.pest?

If we go with that, I guess we can refer to ancestor modules using `super`, as in Rust?

semantics

There are multiple issues related to semantics, off the top of my head (there are likely more):

  • whether we add/allow mod {} syntax and its precedence over filesystem
  • mapping to filesystem
  • mapping to "virtual" modules in pest_vm (e.g. in the browser context where there's no OS filesystem)
  • whether there are any visibility modifiers
  • scoping (especially of trivia operators)

could you please explain how a module path (mod1::mod2::mod3::...) is mapped to a path in the file system (dir1/dir2/dir3/...)?
And when the grammar parser sees mod1::mod2, should it be converted to ./mod1/mod2.pest?

I think for this initial prototype, we can go with simple decisions: e.g. when the grammar parser sees mod1::mod2, it converts it to ./mod1/mod2.pest; trivia operators are scoped to each module (if one wants to re-use a trivia operator, they can re-import it explicitly); and there are no visibility modifiers...
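The simple mapping floated above can be sketched as a small function. This is a proposal sketch under the assumptions discussed here (`mod1::mod2` becomes `./mod1/mod2.pest`, and leading `super` segments climb into the parent directory), not pest3's actual implementation; the function name is illustrative.

```rust
/// Hypothetical mapping from a grammar-level module path to a .pest file
/// path: each `::` segment becomes a directory, the last segment becomes
/// the file name, and leading `super` segments walk up one directory each.
fn module_path_to_file(path: &str) -> String {
    let mut dirs: Vec<&str> = Vec::new();
    let mut prefix = String::from(".");
    for seg in path.split("::") {
        if seg == "super" && dirs.is_empty() {
            prefix.push_str("/.."); // ancestor access, as in Rust's `super`
        } else {
            dirs.push(seg);
        }
    }
    format!("{}/{}.pest", prefix, dirs.join("/"))
}

fn main() {
    assert_eq!(module_path_to_file("mod1::mod2"), "./mod1/mod2.pest");
    assert_eq!(module_path_to_file("super::common"), "./../common.pest");
    println!("ok");
}
```

A real implementation would also have to resolve built-in modules and pest_vm's "virtual" modules before ever touching the filesystem, which is exactly the semantics question left open above.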

@tomtau tomtau merged commit 0f33163 into pest-parser:master Mar 12, 2024
@tomtau tomtau deleted the update-fixes branch March 12, 2024 14:46
@tomtau
Contributor Author

tomtau commented Mar 12, 2024

@TheVeryDarkness I rebase-merged this one, so feel free to start a new branch off the latest master!

@TheVeryDarkness
Contributor

TheVeryDarkness commented Mar 12, 2024

Sorry, I've been busy these weeks :(
Maybe I can implement that feature this or next weekend.

@tomtau
Contributor Author

tomtau commented Mar 12, 2024

No worries!

@TheVeryDarkness TheVeryDarkness restored the update-fixes branch March 26, 2024 01:46
@tomtau tomtau mentioned this pull request Mar 27, 2024