Skip to content

Commit

Permalink
Major refactor: flatten code and object structures (#34). Resolves #26
Browse files Browse the repository at this point in the history
* Move string indexing behind a feature flag

Instead of deprecating the `index(String)` and `index(&str)` calls
for `Message`, `GenericSegment`, and `Field`, move them behind a
feature flag for `string_index`.
The `query()` semantic has limits to how it handles called params
(attempts to convert the str idx calls to use `query()` reveal that
it converts an &str to str in the process), but the string indexing
approach isn't very Rusty in terms of idiomatic implementation
(despite bringing parity to libs in other languages). So instead of
throwing the baby out with the bath water, we can use feature flags
to allow users to choose how the library works for them.

Testing:
  Builds, passes tests, with the feature enabled and normally.

* Major refactor: flatten code and object structures

After converting Field from an enum to a Struct and implementing
various generic mechanisms for common functionality, we are seeing
legacy code and object structures cluttering and breaking efforts
respectively. The `MshSegment` does not behave like a generic one
in terms of structure or implementation, and being "in the line of
fire" whenever segments are used to be parsed out through a match
statement makes it very cumbersome, sometimes unworkable due to
borrow checks on the string references.

Remove the `Segment` enum and convert `GenericSegment` struct to be
the new `Segment` struct - possible because `Separators` live in
the `Message` structure permitting on-demand re-parse of a `Segment`
struct into any other named segment type such as `MshSegment`. This
is how the `msh()` implementation now works for `Message` - finding
the relevant `Segment` and returning a parsed `MshSegment` from it
and the `Message` `Separators` already stored.

This permits `query()` and `Index<&str>` mechanisms to search for
all types of segments and fields, using syntax a la `MSH.F3` for
which a test-case has been added to the feature-gated test.

Now that the object structures and interfaces are generic, simple,
and common - the code structure itself has been reorganized per #25
into a flat set of files within src/.

Testing:
  Updated test object derivations to remove destructuring of enums.
  Added check to string-index for MSH fields.
  All tests passing with `index_string` feature enabled and without

* Implement `new(&str)` for Message

The removal of native implementation initializers for Message has
resulted in users having to `use std::convert::TryFrom` whenever
instantiating `Message` structs. Unlike the more common From trait,
this one is not always in-scope making users import it.

Add `Message::new(&str)` which synchronously returns a `Message`
directly without Result wrapping - intended for use when developers
are sure of what they're parsing and can afford to skip validation
and async dispatch.

Testing:
  Added test to compare the results of `new` and `from_str`
  All tests passing

* Fix benchmark

* Split field in repeat delim

Address #26 by updating `Field::parse()` to split on both the
component delimeter (as before) and the repeat delimeter to permit
component indexing/querying regardless of subdivision modality.

Testing:
  Add test per #26 to find R1 and R2 from `A~S`, passing with
--features string_index.

* Field convenience method and housekeeping

* Cleanup and test `Field::is_subdivided` method

* CI: test/bench with features

* String index - handle MSH off-by-one problem

The HL7 spec defines the first field of the `MSH` segment to be the
field separator directly following the "MSH" characters in the HL7
string. Machines parsing on that delimeter obviously can't handle
such meatland nonsense in format specification natively and need to
have a workaround implemented to offset by -1 for indices into the
`MSH` segment by field.

Implement the off-by-one calculation, and create a temporary hack of
sorts to return `&"|"` because attempts to slice into the `source`
of an MSH segment to pull out `seg[3..3]` violates borrow checker
constraints failing compilation.

@wokket: do you want me to port the change to your `query()` impl?

Relevant spec:
  https://hl7-definition.caristix.com/v2/HL7v2.8/Segments/MSH

Testing:
  Updated MSH string index test to check for MSH.F2 being the set
of delimeters despite that actually being the first field.
  Tests passing

* Update readme

* Fix OOB index attempt on fields vec of segment

Under normal `Index<usize>` and friends contexts, indexing out of
bounds is prevented by length checks against the relevant vectors.
However, when extracting a compound index by String or &str, the
decomposition of the compound idx calls into the `fields` Vec when
passing the index operation down to the fields themselves. This
indexing operation does not first check to ensure that it is within
the extents of the vector its touching, resulting in an OOB error
which otherwise appears to be handled in the `usize` versions.

Fix this by adding a bounds check in the compound index destructure
phase and extending tests to cover the condition.

Testing:
  Expanded tests to catch the error
  All tests passing

* Add OOB fix to `Segment::query()`

* First pass @ PR review notes

* Refactor `Field` for 3 tiers of subdivision

Thanks @wokket for spec clarification on the internal structures of
a complex HL7 `Field` - the first set of separation occurs in the
repeat delimeter, each of which is then treated effectively as a
whole field. The repeating segments are split like entire `Field`s
were prior to this commit - dividing them up by component delims,
each product of which is split into subcomponent delims.

Functionally, this introduces a new dimension to the data requiring
updates to the `Index` implementations to handle another level of
depth. `Query()` and string indexing are also impacted since their
syntax now maps correctly to the subdivisions of a `Field` struct.

Testing:
  Updated tests for added depth and separation semantic
  All tests passing
  Requires manual review from @wokket since this is a major change
and he knows the spec better.

* Expand str idx to take additional level of depth

With the added dimension of repeats pushing existing field vectors
down one level of nesting, the indexing mechanism to quickly pull
values needs to have an additional level of depth.
The Index implementations push down their called parameters to the
lower-level struct within the one handling the current depth of
indexing, so the only effort required to achieve this is to add one
more possible lentgh of index values to the conditional checking
which `uint` indexer to call.
Implement the additional change, add a test to Message extracting
the `S2` subcomponent of `OBR.F1.R1.C2`.

Testing:
  Added test for new functionality
  All tests passing

* Resolve clippy complaints

* Restore separate `query()` tests

* Simplify get_msh_and_read_field benchmark

Moved call to `msh()` out of the loop, removed the result matcher
and replaced with `unwrap()` since we're measuring field retreival.

* CI: use all features

* Fix typo in benchmark

* Simplify `msh()` per @wokket

* Borrow for benchmark iter

Co-authored-by: RageLtMan <rageltman [at] sempervictus>
  • Loading branch information
sempervictus authored Aug 20, 2021
1 parent d94f58a commit 2c2a78e
Show file tree
Hide file tree
Showing 12 changed files with 614 additions and 638 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,6 @@ jobs:
- name: Build
run: cargo build --verbose
- name: Run tests
run: cargo test --verbose
run: cargo test --verbose --all-features
- name: Run Benchmarks
run: cargo bench
run: cargo bench --all-features
3 changes: 3 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ description = "HL7 Parser and object builder? query'er? - experimental only at a
license = "MIT OR Apache-2.0"
repository = "https://github.com/wokket/rust-hl7/"

[features]
string_index = []

[lib]
name="rusthl7"
path="src/lib.rs"
Expand Down
46 changes: 32 additions & 14 deletions benches/simple_parse.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
use criterion::{criterion_group, criterion_main, Criterion};
use rusthl7::{message::*, segments::Segment};
use rusthl7::{message::*, segments::Segment, segments::MshSegment};
use std::convert::TryFrom;

fn get_sample_message() -> &'static str {
Expand All @@ -19,7 +19,7 @@ fn get_segments_by_name(c: &mut Criterion) {
let m = Message::try_from(get_sample_message()).unwrap();

b.iter(|| {
let _segs = m.generic_segments_by_name("OBR").unwrap();
let _segs = m.segments_by_name("OBR").unwrap();
//assert!(segs.len() == 1);
})
});
Expand All @@ -28,14 +28,11 @@ fn get_segments_by_name(c: &mut Criterion) {
fn get_msh_and_read_field(c: &mut Criterion) {
c.bench_function("Read Field from MSH (variable)", |b| {
let m = Message::try_from(get_sample_message()).unwrap();
let msh = m.msh().unwrap();

b.iter(|| {
let seg = m.segments.first();

if let Some(Segment::MSH(msh)) = seg {
let _app = msh.msh_3_sending_application.as_ref().unwrap(); // direct variable access
//println!("{}", _app.value());
}
let _app = &msh.msh_3_sending_application.as_ref().unwrap(); // direct variable access
//println!("{}", _app.value());
})
});
}
Expand All @@ -45,12 +42,9 @@ fn get_pid_and_read_field_via_vec(c: &mut Criterion) {
let m = Message::try_from(get_sample_message()).unwrap();

b.iter(|| {
let seg = &m.segments[1];

if let Segment::Generic(pid) = seg {
let _field = pid[3];
assert_eq!(_field, "555-44-4444"); // lookup from vec
}
let pid = &m.segments[1];
let _field = pid[3];
assert_eq!(_field, "555-44-4444"); // lookup from vec
})
});
}
Expand All @@ -66,6 +60,30 @@ fn get_pid_and_read_field_via_query(c: &mut Criterion) {
});
}

#[cfg(feature = "string_index")]
fn get_pid_and_read_field_via_index(c: &mut Criterion) {
c.bench_function("Read Field from PID (index)", |b| {
let m = Message::try_from(get_sample_message()).unwrap();

b.iter(|| {
let _val = m["PID.F3"]; // query via Message
assert_eq!(_val, "555-44-4444"); // lookup from vec
})
});
}

#[cfg(feature = "string_index")]
criterion_group!(
benches,
message_parse,
get_segments_by_name,
get_msh_and_read_field,
get_pid_and_read_field_via_vec,
get_pid_and_read_field_via_query,
get_pid_and_read_field_via_index
);

#[cfg(not(feature = "string_index"))]
criterion_group!(
benches,
message_parse,
Expand Down
8 changes: 4 additions & 4 deletions readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@ This library is trying to provide the _tooling_ you need to build robust HL7 bas
- [x] Initially use hl7 default separator chars
- [x] Use separator chars from the message
- [X] Add support for sub-field (component/subcomponent) items
- [ ] Field repeats (via `~`) are currently missing ([#26](https://github.com/wokket/rust-hl7/issues/26))
- [X] Initially, avoid any per-segment knowledge, requirement to read the spec too much etc.
- [x] Field repeats (via `~`)
- [x] Initially, avoid any per-segment knowledge, requirement to read the spec too much etc.
- Implementing all the segments, across all the hl7 versions, version-specific parsing etc is tooooo much while we're getting started.
- [-] Add support for [HL7 escape sequences](https://www.lyniate.com/knowledge-hub/hl7-escape-sequences/) ([#22](https://github.com/wokket/rust-hl7/issues/22))
- [x] Decoding of the most common escape sequences including `\E\`, `\R\`, `\S\` & `\T\`
Expand All @@ -28,5 +28,5 @@ This library is trying to provide the _tooling_ you need to build robust HL7 bas
- [x] Parse a message using a `TryFrom<&str>` impl rather than a dedicated parser
- [x] Index into messages using HL7 string index notation and binary methods
- [x] Index into sub-fields using HL7 string index notation and binary methods
- [X] Index into the segment enum using HL7 string index notation and binary methods
- [ ] Implement buffer-copy-free generic indexing into MSH
- [x] Index into the segment enum using HL7 string index notation and binary methods
- [x] Implement buffer-copy-free generic indexing into MSH
4 changes: 2 additions & 2 deletions src/escape_sequence.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,10 @@ use std::borrow::Cow;
///
/// ## Details
///
/// This decoder will replace some, **but not all** of the standard HL7 escape sequences.
/// This decoder will replace some, **but not all** of the standard HL7 escape sequences.
/// - `\E\`,`\F\`, '\R\`, `\S\`, `\T\` are all handled, and replaced with the Escape, Field, Repeat, Component and Sub-Component separator chars respectively
/// - `\X..\` hexidecimal erscape sequences are supported (2 hex digits per char)
///
///
/// The following sequences are **NOT** replaced by design and will be left in the string:
/// - `\H\` Indicates the start of highlighted text, this is a consuming application problem and will not be replaced.
/// - `\N\` Indicates the end of highlighted text and resumption of normal text. This is a consuming application problem and will not be replaced.
Expand Down
Loading

0 comments on commit 2c2a78e

Please sign in to comment.