Major refactor: flatten code and object structures (#34). Resolves #26

* Move string indexing behind a feature flag Instead of deprecating the `index(String)` and `index(&str)` calls for `Message`, `GenericSegment`, and `Field`, move them behind a feature flag for `string_index`. The `query()` semantic has limits to how it handles called params (attempts to convert the str idx calls to use `query()` reveal that it converts an &str to str in the process), but the string indexing approach isn't very Rusty in terms of idiomatic implementation (despite bringing parity to libs in other languages). So instead of throwing the baby out with the bath water, we can use feature flags to allow users to choose how the library works for them. Testing: Builds, passes tests, with the feature enabled and normally. * Major refactor: flatten code and object structures After converting Field from an enum to a Struct and implementing various generic mechanisms for common functionality, we are seeing legacy code and object structures cluttering and breaking efforts respectively. The `MshSegment` does not behave like a generic one in terms of structure or implementation, and being "in the line of fire" whenever segments are used to be parsed out through a match statement makes it very cumbersome, sometimes unworkable due to borrow checks on the string references. Remove the `Segment` enum and convert `GenericSegment` struct to be the new `Segment` struct - possible because `Separators` live in the `Message` structure permitting on-demand re-parse of a `Segment` struct into any other named segment type such as `MshSegment`. This is how the `msh()` implementation now works for `Message` - finding the relevant `Segment` and returning a parsed `MshSegment` from it and the `Message` `Separators` already stored. This permits `query()` and `Index<&str>` mechanisms to search for all types of segments and fields, using syntax a la `MSH.F3` for which a test-case has been added to the feature-gated test. Now that the object structures and interfaces are generic, simple, and common - the code structure itself has been reorganized per #25 into a flat set of files within src/. Testing: Updated test object derivations to remove destructuring of enums. Added check to string-index for MSH fields. All tests passing with `index_string` feature enabled and without * Implement `new(&str)` for Message The removal of native implementation initializers for Message has resulted in users having to `use std::convert::TryFrom` whenever instantiating `Message` structs. Unlike the more common From trait, this one is not always in-scope making users import it. Add `Message::new(&str)` which synchronously returns a `Message` directly without Result wrapping - intended for use when developers are sure of what they're parsing and can afford to skip validation and async dispatch. Testing: Added test to compare the results of `new` and `from_str` All tests passing * Fix benchmark * Split field in repeat delim Address #26 by updating `Field::parse()` to split on both the component delimeter (as before) and the repeat delimeter to permit component indexing/querying regardless of subdivision modality. Testing: Add test per #26 to find R1 and R2 from `A~S`, passing with --features string_index. * Field convenience method and housekeeping * Cleanup and test `Field::is_subdivided` method * CI: test/bench with features * String index - handle MSH off-by-one problem The HL7 spec defines the first field of the `MSH` segment to be the field separator directly following the "MSH" characters in the HL7 string. Machines parsing on that delimeter obviously can't handle such meatland nonsense in format specification natively and need to have a workaround implemented to offset by -1 for indices into the `MSH` segment by field. Implement the off-by-one calculation, and create a temporary hack of sorts to return `&"|"` because attempts to slice into the `source` of an MSH segment to pull out `seg[3..3]` violates borrow checker constraints failing compilation. @wokket: do you want me to port the change to your `query()` impl? Relevant spec: https://hl7-definition.caristix.com/v2/HL7v2.8/Segments/MSH Testing: Updated MSH string index test to check for MSH.F2 being the set of delimeters despite that actually being the first field. Tests passing * Update readme * Fix OOB index attempt on fields vec of segment Under normal `Index<usize>` and friends contexts, indexing out of bounds is prevented by length checks against the relevant vectors. However, when extracting a compound index by String or &str, the decomposition of the compound idx calls into the `fields` Vec when passing the index operation down to the fields themselves. This indexing operation does not first check to ensure that it is within the extents of the vector its touching, resulting in an OOB error which otherwise appears to be handled in the `usize` versions. Fix this by adding a bounds check in the compound index destructure phase and extending tests to cover the condition. Testing: Expanded tests to catch the error All tests passing * Add OOB fix to `Segment::query()` * First pass @ PR review notes * Refactor `Field` for 3 tiers of subdivision Thanks @wokket for spec clarification on the internal structures of a complex HL7 `Field` - the first set of separation occurs in the repeat delimeter, each of which is then treated effectively as a whole field. The repeating segments are split like entire `Field`s were prior to this commit - dividing them up by component delims, each product of which is split into subcomponent delims. Functionally, this introduces a new dimension to the data requiring updates to the `Index` implementations to handle another level of depth. `Query()` and string indexing are also impacted since their syntax now maps correctly to the subdivisions of a `Field` struct. Testing: Updated tests for added depth and separation semantic All tests passing Requires manual review from @wokket since this is a major change and he knows the spec better. * Expand str idx to take additional level of depth With the added dimension of repeats pushing existing field vectors down one level of nesting, the indexing mechanism to quickly pull values needs to have an additional level of depth. The Index implementations push down their called parameters to the lower-level struct within the one handling the current depth of indexing, so the only effort required to achieve this is to add one more possible lentgh of index values to the conditional checking which `uint` indexer to call. Implement the additional change, add a test to Message extracting the `S2` subcomponent of `OBR.F1.R1.C2`. Testing: Added test for new functionality All tests passing * Resolve clippy complaints * Restore separate `query()` tests * Simplify get_msh_and_read_field benchmark Moved call to `msh()` out of the loop, removed the result matcher and replaced with `unwrap()` since we're measuring field retreival. * CI: use all features * Fix typo in benchmark * Simplify `msh()` per @wokket * Borrow for benchmark iter Co-authored-by: RageLtMan <rageltman [at] sempervictus>
wokket · Aug 20, 2021 · 2c2a78e · 2c2a78e
1 parent d94f58a
commit 2c2a78e
Show file tree

Hide file tree

Showing 12 changed files with 614 additions and 638 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -19,6 +19,6 @@ jobs:
     - name: Build
       run: cargo build --verbose
     - name: Run tests
-      run: cargo test --verbose
+      run: cargo test --verbose --all-features
     - name: Run Benchmarks
-      run: cargo bench
+      run: cargo bench --all-features
diff --git a/Cargo.toml b/Cargo.toml
@@ -7,6 +7,9 @@ description = "HL7 Parser and object builder? query'er? - experimental only at a
 license = "MIT OR Apache-2.0"
 repository = "https://github.com/wokket/rust-hl7/"
 
+[features]
+string_index = []
+
 [lib]
 name="rusthl7"
 path="src/lib.rs"

diff --git a/benches/simple_parse.rs b/benches/simple_parse.rs
@@ -1,5 +1,5 @@
 use criterion::{criterion_group, criterion_main, Criterion};
-use rusthl7::{message::*, segments::Segment};
+use rusthl7::{message::*, segments::Segment, segments::MshSegment};
 use std::convert::TryFrom;
 
 fn get_sample_message() -> &'static str {
@@ -19,7 +19,7 @@ fn get_segments_by_name(c: &mut Criterion) {
         let m = Message::try_from(get_sample_message()).unwrap();
 
         b.iter(|| {
-            let _segs = m.generic_segments_by_name("OBR").unwrap();
+            let _segs = m.segments_by_name("OBR").unwrap();
             //assert!(segs.len() == 1);
         })
     });
@@ -28,14 +28,11 @@ fn get_segments_by_name(c: &mut Criterion) {
 fn get_msh_and_read_field(c: &mut Criterion) {
     c.bench_function("Read Field from MSH (variable)", |b| {
         let m = Message::try_from(get_sample_message()).unwrap();
+        let msh = m.msh().unwrap();
 
         b.iter(|| {
-            let seg = m.segments.first();
-
-            if let Some(Segment::MSH(msh)) = seg {
-                let _app = msh.msh_3_sending_application.as_ref().unwrap(); // direct variable access
-                                                                            //println!("{}", _app.value());
-            }
+            let _app =  &msh.msh_3_sending_application.as_ref().unwrap(); // direct variable access
+            //println!("{}", _app.value());
         })
     });
 }
@@ -45,12 +42,9 @@ fn get_pid_and_read_field_via_vec(c: &mut Criterion) {
         let m = Message::try_from(get_sample_message()).unwrap();
 
         b.iter(|| {
-            let seg = &m.segments[1];
-
-            if let Segment::Generic(pid) = seg {
-                let _field = pid[3];
-                assert_eq!(_field, "555-44-4444"); // lookup from vec
-            }
+            let pid = &m.segments[1];
+            let _field = pid[3];
+            assert_eq!(_field, "555-44-4444"); // lookup from vec
         })
     });
 }
@@ -66,6 +60,30 @@ fn get_pid_and_read_field_via_query(c: &mut Criterion) {
     });
 }
 
+#[cfg(feature = "string_index")]
+fn get_pid_and_read_field_via_index(c: &mut Criterion) {
+    c.bench_function("Read Field from PID (index)", |b| {
+        let m = Message::try_from(get_sample_message()).unwrap();
+
+        b.iter(|| {
+            let _val = m["PID.F3"]; // query via Message
+            assert_eq!(_val, "555-44-4444"); // lookup from vec
+        })
+    });
+}
+
+#[cfg(feature = "string_index")]
+criterion_group!(
+    benches,
+    message_parse,
+    get_segments_by_name,
+    get_msh_and_read_field,
+    get_pid_and_read_field_via_vec,
+    get_pid_and_read_field_via_query,
+    get_pid_and_read_field_via_index
+);
+
+#[cfg(not(feature = "string_index"))]
 criterion_group!(
     benches,
     message_parse,

diff --git a/readme.md b/readme.md
@@ -15,8 +15,8 @@ This library is trying to provide the _tooling_ you need to build robust HL7 bas
 - [x] Initially use hl7 default separator chars
 - [x] Use separator chars from the message
 - [X] Add support for sub-field (component/subcomponent) items
-    - [ ] Field repeats (via `~`) are currently missing ([#26](https://github.com/wokket/rust-hl7/issues/26))
-- [X] Initially, avoid any per-segment knowledge, requirement to read the spec too much etc.
+    - [x] Field repeats (via `~`)
+- [x] Initially, avoid any per-segment knowledge, requirement to read the spec too much etc.
     - Implementing all the segments, across all the hl7 versions, version-specific parsing etc is tooooo much while we're getting started.
 - [-] Add support for [HL7 escape sequences](https://www.lyniate.com/knowledge-hub/hl7-escape-sequences/) ([#22](https://github.com/wokket/rust-hl7/issues/22))
     - [x] Decoding of the most common escape sequences including `\E\`, `\R\`, `\S\` & `\T\`
@@ -28,5 +28,5 @@ This library is trying to provide the _tooling_ you need to build robust HL7 bas
 - [x] Parse a message using a `TryFrom<&str>` impl rather than a dedicated parser
 - [x] Index into messages using HL7 string index notation and binary methods
     - [x] Index into sub-fields using HL7 string index notation and binary methods
-    - [X] Index into the segment enum using HL7 string index notation and binary methods
-- [ ] Implement buffer-copy-free generic indexing into MSH
+    - [x] Index into the segment enum using HL7 string index notation and binary methods
+- [x] Implement buffer-copy-free generic indexing into MSH
diff --git a/src/escape_sequence.rs b/src/escape_sequence.rs
@@ -21,10 +21,10 @@ use std::borrow::Cow;
 ///
 /// ## Details
 ///
-/// This decoder will replace some, **but not all** of the standard HL7 escape sequences. 
+/// This decoder will replace some, **but not all** of the standard HL7 escape sequences.
 /// - `\E\`,`\F\`, '\R\`, `\S\`, `\T\` are all handled, and replaced with the Escape, Field, Repeat, Component and Sub-Component separator chars respectively
 /// - `\X..\` hexidecimal erscape sequences are supported (2 hex digits per char)
-/// 
+///
 /// The following sequences are **NOT** replaced by design and will be left in the string:
 /// - `\H\` Indicates the start of highlighted text, this is a consuming application problem and will not be replaced.
 /// - `\N\` Indicates the end of highlighted text and resumption of normal text.  This is a consuming application problem and will not be replaced.