Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow EDI file to have something before UNA or UNB. It can happens with files generated by the software SAGE COALA. #76

Conversation

pulse-mind
Copy link
Contributor

fix #75

…th files generated by the software SAGE COALA.

Added a unit test with a file
@pulse-mind
Copy link
Contributor Author

Thank you for the time spent on this project.
@nerdoc Is that possible to accept it ? If yes when do you plan to do it, merge and release ?

@nerdoc
Copy link
Member

nerdoc commented Oct 17, 2024

Ok, this is interesting, and I think it would be definitely a possibility to merge such a PR/idea.
But I think there has to be done some "thinkwork" before...
I can thin of 2 ways of handling that prepending header:

  1. This header sage coala prepends, what is it? Should the data in it be parsed too in any way? Does it contain relevant data for the message, or is it just a container that "contains" edifact data? To be honest, maybe then it would be out of scope of pydifact to read that data, and a "sage coala" parser must be used, and pydifact could then parse the output of that parser - the edifact payload.
  2. The header can be completely ignored, and sage coala is one of many companies that use a header for their purposes and just transport edifact data as payload. It could be accepted that pydifact loads these formats.

I could think about accepting it because of use case 2, as I myself maybe would have use for that too.

There is one thing I can see which could lead to a problem. If you use message.find("UNA"), this correctly finds 56xyz%<GARBAGE,+Foo'Bla++UNA:+,? 'UNB....
But it also would happily find UNA here:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, UNA sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. and use " sed " as control characters, which is not desireable.

ATM pydifact "identifies" a EDIFACT file using the UNA header, if present, and if not, it uses the default control characters and starts parsing.
There is no error handling if you throw a "Lorem ipsum" file onto pydifact.

Let me think about it a few days, but I think I can accept that PR. Thanks for adding a test to it - maybe you could shorten the sample file a bit to speed up the testing, if that is ok for you - I think it is not necessary to use the whole file.

@nerdoc
Copy link
Member

nerdoc commented Oct 17, 2024

If yes when do you plan to do it, merge and release ?
Oh, a release could be done shortly after the merge. This shouldn't be the problem.

@pulse-mind
Copy link
Contributor Author

You're right, UNA can be found in the string where it is not a "UNA". So may be we can look for a regexp that match a normal UNA !? What is the normal format, I can build the regexp...
And for the testfile, I can try to reduce it...

@nerdoc
Copy link
Member

nerdoc commented Oct 17, 2024

It's not easy, and maybe impossible. As it would be perfectly valid EDIFACT to use "ABCDEF" as control characters, a file could start with UNAABCDEF. "F" would be the segment separator then, and every segment would be terminatef with "F". So no regexp would help much, the only pattern that comes to my mind: the control characters must not repeat themselves. So UNA123456 is valid, UNA122456 is invalid.
This could be done additionally as small check if the UNA Advise Segment is a valid one.

@pulse-mind
Copy link
Contributor Author

Ok so let's think differently. We want to improve the current behavior.

Before, the code checks if it starts by "UNA".
So the goal is to add feature that does not break the actual behavior.

If I look at https://unece.org/fileadmin/DAM/trade/edifact/untdid/d422_s.htm.
There is always a ' (segment terminator) at the end. So we can think to two use case :

  • file is starting by UNA
  • file is not starting by UNA so we should look for 'UNA
    And later we can improve that (start by step).

I can improve the code in this way ?

@pulse-mind
Copy link
Contributor Author

I made another branch with this suggestion that sounds better, this is the commit : pulse-mind@303cfd5

@nerdoc
Copy link
Member

nerdoc commented Oct 20, 2024

That looks better. And It is even better than the current solution. I think this is a useful addition and will maybe solve other "non-strictly" compatibility problems - maybe there are other forms of EDIFACT file implementations that start with another header.

And it doesn't hurt current behaviour.

So yes, please implement this change in this PR, and I will merge it!

@nerdoc
Copy link
Member

nerdoc commented Oct 20, 2024

Just make sure you run "black" at the end, as the current patch is not accepted by the black format CI pre-commit hook! Thanks!

@pulse-mind
Copy link
Contributor Author

cool @nerdoc . I closed this one and open a new pull request. I run black on the other branch, I hope I did it well.
I am not used with this even if I should :D.

(pydifact) ➜  pydifact git:(compatibility_edi_generated_by_sage_coala_v2) python -m black ./
All done! ✨ 🍰 ✨
33 files left unchanged.

@nerdoc nerdoc mentioned this pull request Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EDI files from SAGE COALA starting by DOC
2 participants