Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1865]: create processor to parse date time strings #2207

Merged
merged 21 commits into from
Dec 8, 2023

Conversation

dshunter107
Copy link
Contributor

@dshunter107 dshunter107 commented Nov 20, 2023

Closes #1865

This processor parses a DateTime string. It converts the string into a ZonedDateTime variable, and places the variable in the event variable. Tests are made to ensure that the code works properly.

I am still somewhat new, so any feedback is welcome.

Purpose

Remarks

PR introduces (a) breaking change(s): no

PR introduces (a) deprecation(s): no

Closes apache#1865

This processor parses a DateTime string. It converts the string into a
ZonedDateTime variable, and places the variable in the event variable.
Tests are made to ensure that the code works properly.

I am still somewhat new, so any feedback is welcome.
@github-actions github-actions bot added java Pull requests that update Java code pipeline elements Relates to pipeline elements backend Everything that is related to the StreamPipes backend testing Relates to any kind of test (unit test, integration, or E2E test). labels Nov 20, 2023
Copy link
Contributor

@bossenti bossenti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for providing this processor element @dshunter107 🙏🏼
This is really a decent start.

I've left you some remarks inline.

In addition, there is one thing missing for your pipeline element.
To get

.withLocales(Locales.EN)
.withAssets(Assets.DOCUMENTATION, Assets.ICON)

to work, we need to pass some resource files.
You can have a look in our docs or at this adapter for some references.

I'm happy to help wherever you need assistance 🙂

@bossenti bossenti changed the title Creation of DateTimeProcessor [#1865]: create processor to parse date time strings Nov 20, 2023
Incorporates feedback such as the following:
* Way to initialize variables with constructors
* simplifies methods with more efficient methods
* creates the resource file for the processor
* standardizes the names of inputs (i.e field)
* creates the documentation for the processor
@github-actions github-actions bot added the documentation Everything related to documentation label Nov 24, 2023
@dshunter107
Copy link
Contributor Author

Okay. That clears up a few things for me. I have added the resource files.

Thanks for all the feedback. I definitely feel like I am getting better with each pull request for each project. Everyone has been giving great feedback and helping me learn. Thanks!!

@bossenti
Copy link
Contributor

Okay. That clears up a few things for me. I have added the resource files.

Thanks for all the feedback. I definitely feel like I am getting better with each pull request for each project. Everyone has been giving great feedback and helping me learn. Thanks!!

Great to hear, hope you enjoy it 🙂
In case there is anything unclear to you, please never hesitate to reach out

@bossenti
Copy link
Contributor

bossenti commented Nov 24, 2023

I have adapted the asset files a bit, hope you are okay with it 🙂

I just tested it locally and have some last remarks:

  • You need to register your processor in the class TransformationExtensionModuleExport. Otherwise streampipes won't recognize it
  • Please align the id in ProcessingElementBuilders create() with the name of the resources directory
  • The timezone parameter has a lot of values:
    image

I think we need to sort them at least alphabetically. In addition, I would suggest to use a dropdown menu as input dialog (I can help you here, in case you are not sure about how to achieve)
Lastly, is my understanding correct that the timezone parameter should be optional?

@dshunter107
Copy link
Contributor Author

dshunter107 commented Nov 25, 2023

I have adapted the asset files a bit, hope you are okay with it 🙂

I just tested it locally and have some last remarks:

  • You need to register your processor in the class TransformationExtensionModuleExport. Otherwise streampipes won't recognize it
  • Please align the id in ProcessingElementBuilders create() with the name of the resources directory
  • The timezone parameter has a lot of values:
    image

I think we need to sort them at least alphabetically. In addition, I would suggest to use a dropdown menu as input dialog (I can help you here, in case you are not sure about how to achieve) Lastly, is my understanding correct that the timezone parameter should be optional?

  • I believe I registered the processor. Let me know if I did that correctly.
  • I aligned the Id with the resource directory
  • I sorted the time zone parameter options. I may need you to point me in the direction of how to do the dropdown menus. Are you saying that there is already code that can easily transform the radio group into a dropdown list? Or are you saying that I would need to create the code to do so?

Regarding the Timezone Parameter. I would say the answer to your question is "somewhat"; however, I am not so sure that "optional" is the best word to describe it. It was a little challenging for me to initially wrap my head around java DateTime, so I apologize if the following is not explained well.

There are two types of objects when it comes to DateTimes: (1) LocalDateTime, and (2) ZonedDateTime. Like the name suggests, LocalDateTime has both a date and a time but has no information about the time zone that it corresponds to. As a result, it does not map to a specific instance in the real world. It could be any of the 24 hours corresponding to the times in the world or even any of the myriad of ways different countries and locales recognize their time.

The LocalDateTime variable leads to the problem discussed in the Github issue associated with this pull request: the LocalDateTime will correspond to the timezone of the current user. To expand on this problem further with an example, if data was being pulled at 3PM in Colorado and was being sent to two apachestreams users in California and New York, the user in California would get a reading of 3pm in California, and the user in New York would get a reading of 3PM in New York. All 3 readings (California, Colorado, and New York) would correspond to a different instance in time.

My solution to this was to use the other DateTime variable: ZonedDateTime. In order to place a ZonedDateTime in the event, the time zone must be a part of the ZonedDateTime. Now the California user (once the time zone is entered) would see a reading of 3pm in Colorado, and the New York user would also see a reading of 3pm in Colorado. All 3 locations would correspond to the same instant. With that problem fixed, an interesting wrinkle occurs...

The ISODateTime standard could either be a string formatted for LocalDateTime or a string formatted for ZonedDateTime. In the event that the string is already formatted for ZonedDateTime, we would now have a potential for 2 separate time zones applying for the same datetime. One time zone would be from the user and one from the datetime string. In my solution, I made the assumption that the initial location would have the correct time zone in the string, so I ignored the user time zone in preference of the time zone for the datetime string. As a result, to describe my overall solution, I would say it was to make the time zone input mandatory but ignore it if the datetime string already has the time zone.

I think making the time zone input optional is definitely doable. The issues that would need to be addressed with that way are (1) if the user does not put in the time zone with a LocalDateTime string and (2) if the user puts in the time zone string with a ZonedDateTime string. One solution I thought of for issue 1 would be to have a default time zone like "US/Eastern". Let me know if you prefer the route for making the time zone optional.

As a side note: I am not aware of the relative frequency of LocalDateTime strings and ZonedDateTime strings. I made a guess that LocalDateTime strings would be more likely, so I created a user experience that would be the same if all of the strings were sent as ISODateTime standardized strings formatted as LocalDateTime strings.

@bossenti
Copy link
Contributor

I believe I registered the processor. Let me know if I did that correctly.
I aligned the Id with the resource directory
I sorted the time zone parameter options. I may need you to point me in the direction of how to do the dropdown menus. Are you saying that there is already code that can easily transform the radio group into a dropdown list? Or are you saying that I would need to create the code to do so?

Yes, that's perfectly done now

Yes there should already be the an element to create a dropdown, I'll have a look

@bossenti
Copy link
Contributor

Thank you very much for your detailed explanation and deliberate thoughts 🙏🏼

I would say it was to make the time zone input mandatory but ignore it if the datetime string already has the time zone.

With the context you provided, I'm totally fine with this solution 🙂
We only need to adapt the Documentation.md and the strings.en file since I described it as optional there 😅

@bossenti
Copy link
Contributor

With respect to the dropdown menu, you can pass an additional boolean parameter to requiredValueSelection. If this is set to true we should get the expected outcome.

requiredSingleValueSelection(Labels.withId(SELECTED_INPUT_TIMEZONE),
            Options.from(getTimeZoneOptions()),
                true)

Can you please verify this?

Please don't ask why this parameter is called horizontalRendering
Doesn't make any sense, imho 🙈

@dshunter107
Copy link
Contributor Author

I am unable to verify. I am unable to use the docker commands with CLI because none of the Pipeline elements show up. When I start the program with docker-compose up -d in the compose folder, I am able to see all of the pipeline elements except the DateTimeFromStringDataProcessor. I probably did something incorrectly.

I found the tutorial for DataProcessors at the following location:
https://streampipes.apache.org/docs/extend-tutorial-data-processors/
I followed all the steps, but I still am not able to find the processor at all. How were you able to test it when you sent me the photos?

@bossenti
Copy link
Contributor

Looks great!

image

However, I ran into an issue when testing the processing element end-to-end.
This relates to the propagation of events between pipeline elements and is not directly caused by these processing elements.
I'll try to find some time in the upcoming days to debug it - please bear with me 😊

@bossenti
Copy link
Contributor

In case you want to continue here: one possible workaround here would be to pass the timestamp as long value including milliseconds instead of the local date time object diretly

@dshunter107
Copy link
Contributor Author

Take your time, No worries.

Sure, whatever works. I included the timestamp as a long with milliseconds in the event as well as the time zone that the user specified.
I have a few clarifying questions:

(1) Did you mean to say the "Zoned Date Time" object?
(2) Are you saying that I should delete the "Zoned Date Time" object for now?
(3) If we have only the milliseconds, how would you like to handle different time zones? As of now, with only the milliseconds, if the data was taken in Florida at 4PM, a Streampipes user in California would have a reading for 1PM. This would result in the same instant, but a different time. Is that okay? I put the time zone provided by the user in the event as well, if we want to be able to specify the same instant but the time zone provided by the user.
(4) Could this issue be related to the OutputStrategy that I selected:

.outputStrategy(OutputStrategies.append( EpProperties.timestampProperty(OUTPUT_DATETIME_RUNTIME_NAME)))

I ask because I am not sure if the timestampProperty was designed for ZonedDateTime.

@bossenti
Copy link
Contributor

bossenti commented Dec 1, 2023

his would result in the same instant, but a different time. Is that okay?

It is not ideal, but I'm afraid there is no real alternative in StreamPipes so far.

(4) Could this issue be related to the OutputStrategy that I selected:

The output strategy is fine. Since you add the timezone as well, we should include this in the output strategy as well and adapt the documentation accordingly.

The only aspect I'm unsure about is whether having multiple timestamp properties is a problem. @tenthe @dominikriemer What do you think? Will this lead anywhere to problems?

One possible workaround is to use the time zone to reconstitute the
datetime after the event has been collected.
@dshunter107
Copy link
Contributor Author

Okay. I added the output strategy. Let me know if there is anything you would like me to do.

@bossenti
Copy link
Contributor

bossenti commented Dec 5, 2023

Great, thank you!

I'll give it a test in the upcoming days.
Appreciate your patience.

Copy link
Contributor

@bossenti bossenti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some final improvements, hope you are okay with them 😊

Please apologize the long review process for your PR

@dominikriemer
Copy link
Member

Thanks for your contribution @dshunter107!
That a very useful pipeline element.

@dominikriemer dominikriemer merged commit da05e53 into apache:dev Dec 8, 2023
18 checks passed
@dshunter107
Copy link
Contributor Author

Thanks for your contribution @dshunter107! That a very useful pipeline element.

You're welcome! Glad it works.

@dshunter107 dshunter107 deleted the parse-time branch December 15, 2023 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Everything that is related to the StreamPipes backend documentation Everything related to documentation java Pull requests that update Java code pipeline elements Relates to pipeline elements testing Relates to any kind of test (unit test, integration, or E2E test).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Processor: Datetime from String
3 participants