Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #76

pulse-mind · 2024-10-16T18:28:53Z

…th files generated by the software SAGE COALA. Added a unit test with a file

pulse-mind · 2024-10-17T06:50:25Z

Thank you for the time spent on this project.
@nerdoc Is that possible to accept it ? If yes when do you plan to do it, merge and release ?

nerdoc · 2024-10-17T20:29:45Z

Ok, this is interesting, and I think it would be definitely a possibility to merge such a PR/idea.
But I think there has to be done some "thinkwork" before...
I can thin of 2 ways of handling that prepending header:

This header sage coala prepends, what is it? Should the data in it be parsed too in any way? Does it contain relevant data for the message, or is it just a container that "contains" edifact data? To be honest, maybe then it would be out of scope of pydifact to read that data, and a "sage coala" parser must be used, and pydifact could then parse the output of that parser - the edifact payload.
The header can be completely ignored, and sage coala is one of many companies that use a header for their purposes and just transport edifact data as payload. It could be accepted that pydifact loads these formats.

I could think about accepting it because of use case 2, as I myself maybe would have use for that too.

There is one thing I can see which could lead to a problem. If you use message.find("UNA"), this correctly finds 56xyz%<GARBAGE,+Foo'Bla++UNA:+,? 'UNB....
But it also would happily find UNA here:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, UNA sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. and use " sed " as control characters, which is not desireable.

ATM pydifact "identifies" a EDIFACT file using the UNA header, if present, and if not, it uses the default control characters and starts parsing.
There is no error handling if you throw a "Lorem ipsum" file onto pydifact.

Let me think about it a few days, but I think I can accept that PR. Thanks for adding a test to it - maybe you could shorten the sample file a bit to speed up the testing, if that is ok for you - I think it is not necessary to use the whole file.

nerdoc · 2024-10-17T20:30:37Z

If yes when do you plan to do it, merge and release ?
Oh, a release could be done shortly after the merge. This shouldn't be the problem.

pulse-mind · 2024-10-17T20:42:40Z

You're right, UNA can be found in the string where it is not a "UNA". So may be we can look for a regexp that match a normal UNA !? What is the normal format, I can build the regexp...
And for the testfile, I can try to reduce it...

nerdoc · 2024-10-17T20:54:56Z

It's not easy, and maybe impossible. As it would be perfectly valid EDIFACT to use "ABCDEF" as control characters, a file could start with UNAABCDEF. "F" would be the segment separator then, and every segment would be terminatef with "F". So no regexp would help much, the only pattern that comes to my mind: the control characters must not repeat themselves. So UNA123456 is valid, UNA122456 is invalid.
This could be done additionally as small check if the UNA Advise Segment is a valid one.

pulse-mind · 2024-10-18T07:25:01Z

Ok so let's think differently. We want to improve the current behavior.

Before, the code checks if it starts by "UNA".
So the goal is to add feature that does not break the actual behavior.

If I look at https://unece.org/fileadmin/DAM/trade/edifact/untdid/d422_s.htm.
There is always a ' (segment terminator) at the end. So we can think to two use case :

file is starting by UNA
file is not starting by UNA so we should look for 'UNA
And later we can improve that (start by step).

I can improve the code in this way ?

pulse-mind · 2024-10-18T16:34:15Z

I made another branch with this suggestion that sounds better, this is the commit : pulse-mind@303cfd5

nerdoc · 2024-10-20T11:23:18Z

That looks better. And It is even better than the current solution. I think this is a useful addition and will maybe solve other "non-strictly" compatibility problems - maybe there are other forms of EDIFACT file implementations that start with another header.

And it doesn't hurt current behaviour.

So yes, please implement this change in this PR, and I will merge it!

nerdoc · 2024-10-20T11:24:40Z

Just make sure you run "black" at the end, as the current patch is not accepted by the black format CI pre-commit hook! Thanks!

pulse-mind · 2024-10-20T13:15:33Z

cool @nerdoc . I closed this one and open a new pull request. I run black on the other branch, I hope I did it well.
I am not used with this even if I should :D.

(pydifact) ➜  pydifact git:(compatibility_edi_generated_by_sage_coala_v2) python -m black ./
All done! ✨ 🍰 ✨
33 files left unchanged.

Allow EDI file to have something before UNA or UNB. It can happens wi…

8c5ec73

…th files generated by the software SAGE COALA. Added a unit test with a file

pulse-mind added 2 commits October 18, 2024 08:54

Reduce the size file of sage_coala.ped

fd4c504

Improve syntax quality

3724dc1

pulse-mind closed this Oct 20, 2024

pulse-mind mentioned this pull request Oct 20, 2024

Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #77

Merged

nerdoc mentioned this pull request Oct 21, 2024

AssertionError #73

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #76

Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #76

pulse-mind commented Oct 16, 2024

pulse-mind commented Oct 17, 2024

nerdoc commented Oct 17, 2024

nerdoc commented Oct 17, 2024

pulse-mind commented Oct 17, 2024

nerdoc commented Oct 17, 2024

pulse-mind commented Oct 18, 2024

pulse-mind commented Oct 18, 2024

nerdoc commented Oct 20, 2024

nerdoc commented Oct 20, 2024

pulse-mind commented Oct 20, 2024

Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #76

Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #76

Conversation

pulse-mind commented Oct 16, 2024

pulse-mind commented Oct 17, 2024

nerdoc commented Oct 17, 2024

nerdoc commented Oct 17, 2024

pulse-mind commented Oct 17, 2024

nerdoc commented Oct 17, 2024

pulse-mind commented Oct 18, 2024

pulse-mind commented Oct 18, 2024

nerdoc commented Oct 20, 2024

nerdoc commented Oct 20, 2024

pulse-mind commented Oct 20, 2024