Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: String indexer/query methods can't handle repeating fields #26

Closed
wokket opened this issue Aug 13, 2021 · 5 comments
Closed

Bug: String indexer/query methods can't handle repeating fields #26

wokket opened this issue Aug 13, 2021 · 5 comments
Assignees

Comments

@wokket
Copy link
Owner

wokket commented Aug 13, 2021

No support for handling multiple repeats (~ delim by default)

@sempervictus
Copy link
Contributor

Will take a look at this - feel free to assign me.

@sempervictus
Copy link
Contributor

Could i ask for a test-case which needs to pass here?

@wokket
Copy link
Owner Author

wokket commented Aug 18, 2021

I'll get you a code sample later (real work atm), but for:

OBX|1|NM|301.0500^White Blood Count (WBC)^00065227^6690-2^Leukocytes^pCLOCD|1|10.1|10\S\9/L|3.1-9.7|H||A~S|F|||201411130916|MYFAC^MyFake Hospital^L|

I'd expect a query string of F10.R1 to return either A or S depending on whether we're 0 or 1-based indexing (I'd have to check the spec for that).

@sempervictus
Copy link
Contributor

Thanks, will look into it today

sempervictus pushed a commit to sempervictus/rust-hl7 that referenced this issue Aug 18, 2021
Address wokket#26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per wokket#26 to find R1 and R2 from `A~S`, passing with
--features string_index.
@sempervictus
Copy link
Contributor

Proposed fix and added test in the refactor PR - #31.

sempervictus pushed a commit to sempervictus/rust-hl7 that referenced this issue Aug 18, 2021
Address wokket#26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per wokket#26 to find R1 and R2 from `A~S`, passing with
--features string_index.
sempervictus pushed a commit to sempervictus/rust-hl7 that referenced this issue Aug 19, 2021
GH seems to have gone nuts claming that the rebase/flatten branch
somehow changes commits going back to before i even started work
on this repo. So merge-squashing all the work into a single commit
to avoid dealing with nonsense - if attribution of bugs down the
line is an issue, feel free to blame me first and ask for a hand to
resolve.

```
commit 44de819 (origin/refactor/flatten, refactor/flatten)
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 19:40:17 2021 -0400

CI: test/bench with features

commit bafe1ca
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 12:10:21 2021 -0400

Cleanup and test `Field::is_subdivided` method

commit b245701
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 11:22:48 2021 -0400

Field convenience method and housekeeping

commit fd0d43f
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 11:02:13 2021 -0400

Split field in repeat delim

Address wokket#26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per wokket#26 to find R1 and R2 from `A~S`, passing with
--features string_index.

commit bb72ab5
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 04:42:46 2021 -0400

Fix benchmark

commit f91cd51
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 04:17:17 2021 -0400

Implement `new(&str)` for Message

The removal of native implementation initializers for Message has
resulted in users having to `use std::convert::TryFrom` whenever
instantiating `Message` structs. Unlike the more common From trait,
this one is not always in-scope making users import it.

Add `Message::new(&str)` which synchronously returns a `Message`
directly without Result wrapping - intended for use when developers
are sure of what they're parsing and can afford to skip validation
and async dispatch.

Testing:
  Added test to compare the results of `new` and `from_str`
  All tests passing

commit 8931d73
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 02:56:43 2021 -0400

Major refactor: flatten code and object structures

After converting Field from an enum to a Struct and implementing
various generic mechanisms for common functionality, we are seeing
legacy code and object structures cluttering and breaking efforts
respectively. The `MshSegment` does not behave like a generic one
in terms of structure or implementation, and being "in the line of
fire" whenever segments are used to be parsed out through a match
statement makes it very cumbersome, sometimes unworkable due to
borrow checks on the string references.

Remove the `Segment` enum and convert `GenericSegment` struct to be
the new `Segment` struct - possible because `Separators` live in
the `Message` structure permitting on-demand re-parse of a `Segment`
struct into any other named segment type such as `MshSegment`. This
is how the `msh()` implementation now works for `Message` - finding
the relevant `Segment` and returning a parsed `MshSegment` from it
and the `Message` `Separators` already stored.

This permits `query()` and `Index<&str>` mechanisms to search for
all types of segments and fields, using syntax a la `MSH.F3` for
which a test-case has been added to the feature-gated test.

Now that the object structures and interfaces are generic, simple,
and common - the code structure itself has been reorganized per wokket#25
into a flat set of files within src/.

Testing:
  Updated test object derivations to remove destructuring of enums.
  Added check to string-index for MSH fields.
  All tests passing with `index_string` feature enabled and without

commit 6201ef6
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Aug 18 00:48:04 2021 -0400

Move string indexing behind a feature flag

Instead of deprecating the `index(String)` and `index(&str)` calls
for `Message`, `GenericSegment`, and `Field`, move them behind a
feature flag for `string_index`.
The `query()` semantic has limits to how it handles called params
(attempts to convert the str idx calls to use `query()` reveal that
it converts an &str to str in the process), but the string indexing
approach isn't very Rusty in terms of idiomatic implementation
(despite bringing parity to libs in other languages). So instead of
throwing the baby out with the bath water, we can use feature flags
to allow users to choose how the library works for them.

Testing:
  Builds, passes tests, with the feature enabled and normally.

```
sempervictus pushed a commit to sempervictus/rust-hl7 that referenced this issue Aug 19, 2021
Address wokket#26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per wokket#26 to find R1 and R2 from `A~S`, passing with
--features string_index.
sempervictus pushed a commit to sempervictus/rust-hl7 that referenced this issue Aug 19, 2021
Address wokket#26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per wokket#26 to find R1 and R2 from `A~S`, passing with
--features string_index.
@wokket wokket closed this as completed in 2c2a78e Aug 20, 2021
sempervictus added a commit to sempervictus/rust-hl7 that referenced this issue Aug 25, 2021
 wokket#26

* Move string indexing behind a feature flag

Instead of deprecating the `index(String)` and `index(&str)` calls
for `Message`, `GenericSegment`, and `Field`, move them behind a
feature flag for `string_index`.
The `query()` semantic has limits to how it handles called params
(attempts to convert the str idx calls to use `query()` reveal that
it converts an &str to str in the process), but the string indexing
approach isn't very Rusty in terms of idiomatic implementation
(despite bringing parity to libs in other languages). So instead of
throwing the baby out with the bath water, we can use feature flags
to allow users to choose how the library works for them.

Testing:
  Builds, passes tests, with the feature enabled and normally.

* Major refactor: flatten code and object structures

After converting Field from an enum to a Struct and implementing
various generic mechanisms for common functionality, we are seeing
legacy code and object structures cluttering and breaking efforts
respectively. The `MshSegment` does not behave like a generic one
in terms of structure or implementation, and being "in the line of
fire" whenever segments are used to be parsed out through a match
statement makes it very cumbersome, sometimes unworkable due to
borrow checks on the string references.

Remove the `Segment` enum and convert `GenericSegment` struct to be
the new `Segment` struct - possible because `Separators` live in
the `Message` structure permitting on-demand re-parse of a `Segment`
struct into any other named segment type such as `MshSegment`. This
is how the `msh()` implementation now works for `Message` - finding
the relevant `Segment` and returning a parsed `MshSegment` from it
and the `Message` `Separators` already stored.

This permits `query()` and `Index<&str>` mechanisms to search for
all types of segments and fields, using syntax a la `MSH.F3` for
which a test-case has been added to the feature-gated test.

Now that the object structures and interfaces are generic, simple,
and common - the code structure itself has been reorganized per wokket#25
into a flat set of files within src/.

Testing:
  Updated test object derivations to remove destructuring of enums.
  Added check to string-index for MSH fields.
  All tests passing with `index_string` feature enabled and without

* Implement `new(&str)` for Message

The removal of native implementation initializers for Message has
resulted in users having to `use std::convert::TryFrom` whenever
instantiating `Message` structs. Unlike the more common From trait,
this one is not always in-scope making users import it.

Add `Message::new(&str)` which synchronously returns a `Message`
directly without Result wrapping - intended for use when developers
are sure of what they're parsing and can afford to skip validation
and async dispatch.

Testing:
  Added test to compare the results of `new` and `from_str`
  All tests passing

* Fix benchmark

* Split field in repeat delim

Address wokket#26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per wokket#26 to find R1 and R2 from `A~S`, passing with
--features string_index.

* Field convenience method and housekeeping

* Cleanup and test `Field::is_subdivided` method

* CI: test/bench with features

* String index - handle MSH off-by-one problem

The HL7 spec defines the first field of the `MSH` segment to be the
field separator directly following the "MSH" characters in the HL7
string. Machines parsing on that delimeter obviously can't handle
such meatland nonsense in format specification natively and need to
have a workaround implemented to offset by -1 for indices into the
`MSH` segment by field.

Implement the off-by-one calculation, and create a temporary hack of
sorts to return `&"|"` because attempts to slice into the `source`
of an MSH segment to pull out `seg[3..3]` violates borrow checker
constraints failing compilation.

@wokket: do you want me to port the change to your `query()` impl?

Relevant spec:
  https://hl7-definition.caristix.com/v2/HL7v2.8/Segments/MSH

Testing:
  Updated MSH string index test to check for MSH.F2 being the set
of delimeters despite that actually being the first field.
  Tests passing

* Update readme

* Fix OOB index attempt on fields vec of segment

Under normal `Index<usize>` and friends contexts, indexing out of
bounds is prevented by length checks against the relevant vectors.
However, when extracting a compound index by String or &str, the
decomposition of the compound idx calls into the `fields` Vec when
passing the index operation down to the fields themselves. This
indexing operation does not first check to ensure that it is within
the extents of the vector its touching, resulting in an OOB error
which otherwise appears to be handled in the `usize` versions.

Fix this by adding a bounds check in the compound index destructure
phase and extending tests to cover the condition.

Testing:
  Expanded tests to catch the error
  All tests passing

* Add OOB fix to `Segment::query()`

* First pass @ PR review notes

* Refactor `Field` for 3 tiers of subdivision

Thanks @wokket for spec clarification on the internal structures of
a complex HL7 `Field` - the first set of separation occurs in the
repeat delimeter, each of which is then treated effectively as a
whole field. The repeating segments are split like entire `Field`s
were prior to this commit - dividing them up by component delims,
each product of which is split into subcomponent delims.

Functionally, this introduces a new dimension to the data requiring
updates to the `Index` implementations to handle another level of
depth. `Query()` and string indexing are also impacted since their
syntax now maps correctly to the subdivisions of a `Field` struct.

Testing:
  Updated tests for added depth and separation semantic
  All tests passing
  Requires manual review from @wokket since this is a major change
and he knows the spec better.

* Expand str idx to take additional level of depth

With the added dimension of repeats pushing existing field vectors
down one level of nesting, the indexing mechanism to quickly pull
values needs to have an additional level of depth.
The Index implementations push down their called parameters to the
lower-level struct within the one handling the current depth of
indexing, so the only effort required to achieve this is to add one
more possible lentgh of index values to the conditional checking
which `uint` indexer to call.
Implement the additional change, add a test to Message extracting
the `S2` subcomponent of `OBR.F1.R1.C2`.

Testing:
  Added test for new functionality
  All tests passing

* Resolve clippy complaints

* Restore separate `query()` tests

* Simplify get_msh_and_read_field benchmark

Moved call to `msh()` out of the loop, removed the result matcher
and replaced with `unwrap()` since we're measuring field retreival.

* CI: use all features

* Fix typo in benchmark

* Simplify `msh()` per @wokket

* Borrow for benchmark iter

Co-authored-by: RageLtMan <rageltman [at] sempervictus>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants