Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse trailing comments from buses data in v30 files #78

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/PowerFlowData.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
module PowerFlowData

using DocStringExtensions
using InlineStrings: InlineString1, InlineString3, InlineString15
using InlineStrings: InlineString1, InlineString3, InlineString15, InlineString31
using Parsers: Parsers, xparse, checkdelim!
using Parsers: codes, eof, invalid, invaliddelimiter, newline, valueok, peekbyte
using PrettyTables: pretty_table
Expand Down
57 changes: 57 additions & 0 deletions src/parsing.jl
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,18 @@ const OPTIONS_SPACE = Parsers.Options(
ignorerepeated=true,
wh1=0x00,
)
# These Options are for supporting extracting end-of-line comments like
# `/* [STBC 1 ] */` which we want to parse to `STBC 1`.
const OPTIONS_COMMENT = Parsers.Options(
sentinel=missing,
quoted=true,
openquotechar="/* [",
closequotechar="] */",
stripquoted=true,
delim=' ',
ignorerepeated=true,
wh1=0x00,
)

@inline getoptions(delim::Char) = ifelse(delim === ',', OPTIONS_COMMA, OPTIONS_SPACE)

Expand Down Expand Up @@ -244,6 +256,51 @@ end
return block
end

###
### Buses
###

# In v30 files, we've seen trailing end-of-line comments. These can be present
# in any section, but so far we have only had a use-case for the comments in the
# buses section.
# See https://github.com/nickrobinson251/PowerFlowData.jl/issues/27
#
# The currently handles data which looks like:
# 111,'STBC ',161.00,1, 0.00, 0.00,227, 1,1.09814, -8.327, 1 /* [STBC 1 ] */
# And does _not_ handle data with a comma before the comment like:
# 111,'STBC ',161.00,1, 0.00, 0.00,227, 1,1.09814, -8.327, 1, /* [STBC 1 ] */
@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, why does this need to be a generated function?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately thiis is the necessary to avoid dynamic dispatch (#22)

whenever we call:

fo col in 1:ncols
  T = eltype(fieldtype(R, col))
  parse_value!(record, col, T, bytes, pos, len, options)

the type of T is not known til runtime, meaaning the parse_value! call has to look-up decide at runtime which method to call, which adds a big performance cost, since we have one parse_value! call per value/element in the file

so we use a generated function to unroll the loop and generate code like

parse_value!(record, 1, Int64, bytes, pos, len, options)
parse_value!(record, 2, Float64, bytes, pos, len, options)
...
parse_value!(record, 12, Bool, bytes, pos, len, options)

I think the other way we might have been able to avoid the dynamic dispatch is by manually doing what "union splitting" does e.g. like is done in ODBC.jl

CSV.jl uses a mix of these techniques e.g. splitting the big union of expected types and generating code for custom types

block = Expr(:block)
n = fieldcount(R)
# The last column is the comment. We need to handle the second-last column separately
# too in order to be sure we stop parsing it before hitting the comment.
for col in 1:(n - 2)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to handle the last two entires differently, becase the second-last entry (the last "real" column) we need to be sure to stop parsing before the comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm unfamiliar with the package, so I have some potentially basic questions.

I see that this function dispatches over Buses30. But it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations? Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations?

we define Buses30 as having 12 columns, with the 12th being the comments. In this function we parse the first (12 - 2) = 10 columns as usual. We then parse the 11th (and potentially last column present, if no comments) with whitespace as the delimiter (i.e. stop parsing when we hit a space or newline). If when parsing the 11th column we hit the newline character then there are no comments and we set the 12th column as missing, else if we hit whitespace we then parse the rest of the line as the 12th and final column. See the if newline in the code.

Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?

As far as i understood only the Buses comments ("augmented names") were needed, so this PR only parses comments from the Buses section (and only for v30 files). I doesn't extract comments present in any other section. Do we need to?

These are good question and i'm very happy to answer as many questions as you have!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we do use augmented names for other components, namely generators, loads, and branches (incl. transformers)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we need to functionality not just for buses?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, wait, for other components, these comments can be reconstructed from columns already in the data, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformers are the biggest pain 😂

T = eltype(fieldtype(R, col))
push!(block.args, quote
rec, pos, code = parse_value!(rec, $col, $T, bytes, pos, len, options)
end)
end
m = n - 1
Tm = eltype(fieldtype(R, m))
Tn = eltype(fieldtype(R, n))
push!(block.args, quote
# @show (rec, pos, code)
# Assume no comma delim before the comment, so need to set whitespace as delim.
# And move to next non-whitespace character, so we don't immediately hit a delim.
pos = checkdelim!(bytes, pos, len, OPTIONS_SPACE)
(rec, pos, code) = parse_value!(rec, $m, $Tm, bytes, pos, len, OPTIONS_SPACE)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set whitespace as the delim in order to not parse the comment when parsing the second-last entry (hence OPTIONS_SPACE) but if we parsed with comma as the delim til now then we may be at a whitespace so we use checkdelim! to move to the next non-whitespace char before trying to parse this second-last entry

if newline(code)
# If we hit a newline before delimiter there is no trailing comment.
push!(getfield(rec, $n)::Vector{$Tn}, missing)
else
(rec, pos, code) = parse_value!(rec, $n, $Tn, bytes, pos, len, OPTIONS_COMMENT)
end
end)
push!(block.args, :(return rec, pos))
# @show block
return block
end

###
### transformers
###
Expand Down
5 changes: 5 additions & 0 deletions src/types.jl
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,11 @@ struct Buses30 <: Buses
See [`Owners`](@ref).
"""
owner::Vector{OwnerNum}
"""
End-of-line comments. Not part of the official PSS/E format specification, but
present in some files nonetheless.
"""
comment::Vector{Union{InlineString31,Missing}}
end

"""
Expand Down
19 changes: 18 additions & 1 deletion test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ using Test
@test startswith(repr(mime, net.caseid), "CaseID: (ic = 0, sbase = 100.0, ")

@test repr(mime, net.buses; context=(:compact => true)) == "Buses with 2 records"
@test startswith(repr(mime, net.buses), "Buses with 2 records, 11 columns:\n──")
@test startswith(repr(mime, net.buses), "Buses with 2 records, 12 columns:\n──")

mt_dc_line = net.multi_terminal_dc.lines[1]
@test eval(Meta.parse(repr(mt_dc_line))) isa MultiTerminalDCLine
Expand Down Expand Up @@ -497,6 +497,23 @@ using Test
@test net_space.branches.j == net_space_manual.branches.j
end

@testset "End-of-line comments" begin
net_eol = parse_network("testfiles/eolcomments.raw")
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: add comment here about only Buses30 having comment column


# Only `Buses30` parse trailing EOL comments.
buses = net_eol.buses
@test length(buses) == 2
@test buses.i == [10010, 337918] # first col
@test buses.owner == [1, 1] # last col before comments
@test all(contains.(buses.comment, ["NOBO 1", "NAU_E2 1"]))

loads = net_eol.loads
@test length(loads) == 1
@test loads.i == [10010] # first col
@test loads.owner == [1] # last non-missing col
@test isequal(loads.intrpt, [missing]) # last col
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what exactly this test is checking for. Is it checking that the end of line comment has not been interpreted as intrpt? In that case should there also be a test that loads.scale is missing? scale is the 13th column so would be where the end of line comment could mistakenly end up.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just checking that loads parse as expected (but checking only select columns as a proxy, rather than every column)

scale is the 13th column so would be where the end of line comment could mistakenly end up.

i don't follow this - can you explain?

Anyway it sounds like if we want something like this PR then it does need to parse commnts from more than just Buses data, so this'd need updating

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scale is the 13th column so would be where the end of line comment could mistakenly end up.

I thought this test might be intended to check that the end of line comment hadn't been parsed and placed into the last column by mistake. In the test file for this test, loads has 12 columns, so I thought checking the 13th column was missing would be informative.

end

@testset "issues" begin
sz = parse_network("testfiles/spacezero.raw")
@test length(sz.buses) == 2
Expand Down
7 changes: 7 additions & 0 deletions test/testfiles/eolcomments.raw
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
0,100.0,30 / PSS(tm)E-30 RAW created Thu, Oct 13 2018 13:08
SEP 2018 V2 MODEL
MADE ON 13-Oct-2018 13:08@
10010,'NOBO ',161.00,1, 0.00, 0.00,327, 1,1.03182, 5.469, 1 /* [NOBO 1 ] */
337918,'3NAUNEPPE-2+',115.00,1, 0.00, 0.00,327, 1,1.01644, 0.679, 1 /* [NAU_E2 1 ] */
0 / end of bus cards
10010,'T1',1,327, 1, 7.698, 4.063, 9.463, 16.549, -1.802, 2.043, 1 /* [NOBO T1 ] */