-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lint and mypy CI jobs #3
Conversation
Add CI jobs for: - linting, using black - type checking, using mypy These jobs are similar to those used for the fms main repo. Signed-off-by: Andrea Frittoli <[email protected]>
pretraining/utils/dataset_utils.py
Outdated
if reader is not None: | ||
del reader | ||
if shard_file is not None: | ||
shard_file.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like more than just black reformatting changes? these are all required for mypy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this, because open_file
was used with a path
which according to the stubs does not work and also open_file
may return None.
I added the shard_file
file descriptor and added the logic to close files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without testing, I think we need to be cautious about making any changes that aren't just mypy type hints in this PR. reading closer... this change is opening a file and creating a new file object, and introduces new logic about where it gets closed.
I just tried locally doing f=open(some_arrow_file_path); reader = pa.ipc.open_file(f)
and got an error of TypeError: binary file expected, got text file
. I don't see any other issues on quick glance, but it's easy to miss things like double-closing a file that would cause failures.
To make this work, I think it would have needed to be shard_file = open(path, mode='rb')
. Without actually testing it, to be on the safe side, I think it'll be better to just use a targeted type ignore hint with a comment, or fix it some simpler way.
d81a3ce
to
72d61ab
Compare
Signed-off-by: Andrea Frittoli <[email protected]>
a7d22ee
to
d3ae276
Compare
79849df
to
a0f55be
Compare
pretraining/utils/train_utils.py
Outdated
@@ -115,7 +110,7 @@ def get_policies(cfg, rank): | |||
verify_bfloat_support = ( | |||
torch.version.cuda | |||
and torch.cuda.is_bf16_supported() | |||
and packaging.version.parse(torch.version.cuda).release >= (11, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this was using packaging
not version
, it makes me think that the import packaging.version
was failing and we're actually only importing pkg_resources (as packaging). I think the changes to the imports are changing what happens here such taht I'm not sure this will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code could have been either of them, and the fallback import suggests they would be identical in signature.
However the second one according to mypy is broken (there is no packaging module in pkg_resources).
I'll test it now in case return values changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have cuda installed on my laptop so I cannot verify completely.
From what I tested both behave the same, but to be on the safe side I added an ignore comment there too.
Based on the output of mypy, fixed the following: - Added ibm-fms to requirements - Ran no_implicit_optional (PEP 484 prohibits implicit Optional) - Exclude files that depend on untyped packages - Fix typing issues in dataset_util Signed-off-by: Andrea Frittoli <[email protected]>
Add isort config based on the fms setup. Fix imports where required. Include the root folder. Signed-off-by: Andrea Frittoli <[email protected]>
@afrittoli Is this change what we expected? 52235e4#diff-060f427d7e3e19de03067dd05b75e3758ba40f69745b2afa58a6ffe63024f81eL844 Doesn't look right to me. There are two issues I am seeing:
|
@lchu-ibm the way this Also some of the classes listed as options in the documentation were in the process of being removed in another PR, and weren't used anywhere else in this repo, so made sense to exclude from the type hint just to avoid conflicts. If you need more types as options though (and they support the call mechanism) feel free to add them. |
@nairbv I didn't make my question clear, my questions are:
class A:
def __init__(self, x: A) for type hint. If this is truly what we want, then there are ways to do it, but not like this.
|
Ah I guess I misunderstood. I think you can do If there are subclasses that need to be passed here, I think it's probably fine to add them, we weren't being very particular in deciding what to put there. If you're getting test failures locally though in your IDE on code that passes CI, there might just be some mismatch in mypy version or config. |
Can this be done? I didn't check it but aren't we using a variable before declaring it? not from a class perspective but just python... I think there are ways to use Self (not self), or "TreeNode" (as string), but definitely not TreeNode as how the compiler would know this var. |
That was in the original code, my change restricted the list of types to a subset because the generic
We can add more classes to the list or alternatively add a common parent that meets the contract and use its type instead. |
I think it is definitely my original question that confused both you and Brian. so let me explain - the original code uses class A:
def __init__(self, x: Z) where Z is the "to be unpacked parent class", and let's say Z has subset [A, B] this PR changed it to class A:
def __init__(self, x: [A,B]) which isn't correct based on my understanding of python as you put an undeclared variable A there. you need something like, e.g. (see my quote around A) class A:
def __init__(self, x: ["A",B]) |
Got it, thank you for the clarification. Even though |
Add CI jobs for:
For CI jobs to pass, implement corresponding fixes: