Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CALET text conversion #3

Merged
merged 1 commit into from
Sep 16, 2024
Merged

CALET text conversion #3

merged 1 commit into from
Sep 16, 2024

Conversation

athish-thiru
Copy link
Contributor

Added code for CALET text conversion. The code works by calling a general text parsing function that is meant to be used by all missions and then adding additional details that are specific CALET. However if a different code structure is preferred, I can change it.

gcn_classic_text_to_json/conversion.py Outdated Show resolved Hide resolved
gcn_classic_text_to_json/conversion.py Outdated Show resolved Hide resolved
gcn_classic_text_to_json/notices/calet/__init__.py Outdated Show resolved Hide resolved
gcn_classic_text_to_json/notices/calet/README.md Outdated Show resolved Hide resolved
gcn_classic_text_to_json/utils.py Outdated Show resolved Hide resolved

import requests

from ... import conversion, utils
Copy link

@Courey Courey Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a missing file that corresponds with this utils import? I see that you are importing conversion and utils from the grandparent package but I am unable to find utils.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is in #2

}


def text_to_json_calet(notice, record_number):
Copy link

@Courey Courey Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unusual to me. Typically we don't put logic in init.py files.
It is much more common to see something like the following:

from my_package import main_function
main_function()

so that would end up with the functions below existing in a separate file and then being imported and called as needed.

from file_you_created import create_all_calet_jsons
create_all_calet_jsons()

There are a couple of reasons for this:

  1. code readability. People typically aren't looking in init files for functions. In cases where the package is larger and there might be a large quantity of functions that go in there, it can get really long and really hard to read. It isn't hard to read here because there is not a lot of code, but it's good to get in the habit of putting things in a standard place.
  2. Again, not really an issue here because it's a small amount of code, but because everything in this file gets run at import that means if you have a lot of code in the init file, it can eventually lead to slow import times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially trying to configure the way you've described but I think the reason that I formatted the files this way was that I was struggling with figuring out the relatives import and tried a few different things and this format stuck. But now that I have relative imports working I could certainly change it back. I could define a conversion.py file for each subpackage which has most of the logic and keep the core stuff in the init.py

main.sh Outdated
@@ -0,0 +1,3 @@
#!/bin/bash

python3 -m gcn_classic_text_to_json.notices.calet.__init__
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this code intended to be run? Is it something that will be called from the command line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is how the code is intended to be run for now. It needs to be called from the command line with the repo directory as your root. This was just something quick that I got up so I could start testing if the conversions were accurate. If there are better ideas for how to call the subpackages, I can change it.

Copy link

@Courey Courey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, we're going to make a few changes but let's do them one bit at a time so that they don't get jumbled together.

Let's start with these steps:

  1. let's break the functions out into their own file(s) and run it like you have it here just to make sure that the imports are working properly. You can call them from the init files for now (or if you feel like you don't need the init files, you can just import them relatively instead of as a package).

  2. Then go ahead and add in your PR Utility functions #2 into this PR so utils is there. Typically to get a PR approved, all the code it needs to run is expected to be in the PR. There are times that you have to line them up in order and merge in order, but it can get complicated really fast if you do that. So since it's not a ton of additional code, we'll just close Utility functions #2 and bring that into this branch and they can go in together.

  3. you can remove the main.sh file because we are going to replace it. the main.sh is fine, it's just that it's not necessary to use bash to accomplish what you want to do. instead of bash main.sh you can just call it all with python. Here are two alternate ways to approach it:
    A. Inside the gcn_classic_text_to_json let's create a file called __main__.py and inside of that import the entry function (create_all_calet_jsons). It would look like this (but keep in mind that the import may have changed since you are moving things out into a file that isn't the init file) so it would look like this but with an updated import statement:

from gcn_classic_text_to_json.notices.calet import create_all_calet_jsons
if __name__ == "__main__":
    create_all_calet_jsons()

this way when you want to run the code, you can just do python3 -m gcn_classic_text_to_json to run it. This will only work if we leave them as packages, so if you decide that you just want to use relative imports instead of modules, try option B.

B. just create a file at the root of the project and put the import for create_all_calet_jsons import and function call like so (again keeping in mind any changes you have made to imports):

from gcn_classic_text_to_json.notices.calet import create_all_calet_jsons

create_all_calet_jsons()

when I was testing this locally, I just called it entrypoint.py and placed it at the root level. the way I call it in the terminal is python3 entrypoint.py. You can call it whatever you want, entrypoint as a file name is just a way of saying "get to the code starting here". Another one commonly used for cli execution is cli.py. so in that case you'd run python3 cli.py

The last one is not super duper important functionally because it does get called correctly the way you have it. I just wanted to show you other ways to do it that would keep it pure python. There are many ways to create an entrypoint to your code and usually we determine which one to use based on how the code will be called. If we wanted to turn the whole thing into an importable package that we would call from another code base, we'd use yet a different way of doing it. (which I am happy to pair with you on and show you or send you reading materials just so you know what all the options are, but I didn't want it to get too confusing here. so just let me know if you'd like that).

Imports in python are sometimes hard and confusing and take time to get used to. You are very much not alone in having encountered issues with them.

I have tested your code locally and it does run just fine and it seems to be doing exactly what you want it to be doing, so excellent job on getting it all to work! If you need any help or get bogged down anywhere in this change request, please don't suffer too long. I am happy to pair with you.

@athish-thiru athish-thiru requested a review from Courey September 10, 2024 03:04
Copy link

@Courey Courey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Added CALET text conversion

Added main.sh

Added README with keyword mapping

switched to using Beautiful Soup

small changes

updated README

refactored code to use email library

switched to using context manager to handle files

Added utility functions

removed parse_trigger_ids() (moved to conversion.py)

added general parsing function

moved utils.py

fixed error with set_alert_datetime()

removed redundant functions

refactored code and updated main call

Set up project environment

Added License, updated README and added main directory

updated README.md

updated README and .gitignore

switched to using sernum to save jsons

added bs4 to poetry
@athish-thiru athish-thiru merged commit 028cc18 into nasa-gcn:main Sep 16, 2024
1 check failed
@athish-thiru athish-thiru deleted the calet branch September 16, 2024 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants