Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Strings (multiple chars) as 'quotechars' #117

Closed
wants to merge 3 commits into from

Conversation

nickrobinson251
Copy link
Collaborator

@nickrobinson251 nickrobinson251 commented Jun 10, 2022

test/runtests.jl Outdated Show resolved Hide resolved
@nickrobinson251
Copy link
Collaborator Author

nickrobinson251 commented Jun 10, 2022

The test which is failing is

case = (str = " {\\}} ", kwargs = (), x = 0, code = (QUOTED | INVALID | ESCAPED_STRING | EOF), vpos = 3, vlen = 2, tlen = 6) oq = "{{"
cq = "}}"

(so after the replacement: str = " {{\\}}}} ")

where we're setting the wrong code, as we're setting adding INVALID_DELIMITERto the code.

More generally, i think this is parsing one characted too many e.g. #= text here =#" is parsing to "text here =":

julia> str = "#= text here =#";

julia> res = Parsers.xparse(String, str; openquotechar="#=", closequotechar="=#", escapechar=0x22, stripquoted=true);

julia> Parsers.getstring(str, res.val, 0x22)
"text here ="

EDIT: Solved!

@codecov
Copy link

codecov bot commented Jun 10, 2022

Codecov Report

Merging #117 (345007a) into main (f143231) will increase coverage by 0.17%.
The diff coverage is 96.49%.

@@            Coverage Diff             @@
##             main     #117      +/-   ##
==========================================
+ Coverage   87.65%   87.83%   +0.17%     
==========================================
  Files           9        9              
  Lines        2309     2351      +42     
==========================================
+ Hits         2024     2065      +41     
- Misses        285      286       +1     
Impacted Files Coverage Δ
src/utils.jl 90.81% <ø> (ø)
src/strings.jl 96.65% <94.44%> (-0.25%) ⬇️
src/Parsers.jl 94.36% <97.43%> (+0.11%) ⬆️
src/floats.jl 93.27% <0.00%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f143231...345007a. Read the comment docs.

@nickrobinson251
Copy link
Collaborator Author

okay, tests are passing -- so this could be ready for review

(now for me to go figure out if this is actually gonna help me over at nickrobinson251/PowerFlowData.jl#27 ...)

Are there any benchmarks we should run when we add things to the xparse, @quinnj ?

Also this changes oq and cq fields to a small union Union{UInt8, PtrLen}, which is think is fine, given we do it for delim, but let me know if anything should be changed (e.g. if it's better to add explicit oq isa ... branches)

And this makes the keyword names openquotechar/closequotechar a bit of a misnomer now Char isn't required, but i also think that's probably fine in practice

(also cc @BSnelling, since we chatted a bit about this)

end

# Used by CSV.jl
function checkdelim!(buf::AbstractVector{UInt8}, pos, len, options::Options)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment and type annotation here because i ended up being led astray by this function... and then wondered why we had it at all 😂

@@ -216,7 +223,11 @@ for useio in (false, true)
@test x == case.x
end
@test code == case.code
@test tlen == case.tlen
if Parsers.quoted(code)
@test tlen == ncodeunits(str)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i should probably add a comment here:

  • we don't check case.tlen for quoted input, because our test code replaces the quoted chars in the input with (poentially) multiple characters
  • but also for this to be the right check, the testcases have to be limited to inputs without characters after a delimiter, which they are right now, but we might add such testcases in the future so we should probably make the assumption explicit

@@ -96,15 +96,20 @@ function Options(
dateformat::Union{Nothing, String, Dates.DateFormat, Format},
ignorerepeated, ignoreemptylines, comment, quoted, debug, stripwhitespace=false, stripquoted=false)
asciival(wh1) && asciival(wh2) || throw(ArgumentError("whitespace characters must be ASCII"))
asciival(oq) && asciival(cq) && asciival(e) || throw(ArgumentError("openquotechar, closequotechar, and escapechar must be ASCII characters"))
(oq isa String || asciival(oq)) && (cq isa String || asciival(cq)) && asciival(e) || throw(ArgumentError("openquotechar, closequotechar, and escapechar must be ASCII characters"))
(oq == delim) || (cq == delim) || (e == delim) && throw(ArgumentError("delim argument must be different than openquotechar, closequotechar, and escapechar arguments"))
Copy link
Collaborator Author

@nickrobinson251 nickrobinson251 Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to think about how delim and oq/cq interact e.g. when delim::Char in oq::String

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably just check the delim and oq/cq (i) are not equal, (ii) none of the chars in one are in the other

@nickrobinson251
Copy link
Collaborator Author

closing in favour of #127 + #141

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant