
Commit

Merge pull request #253 from aiqwe/improve_readibility
Simple enhancement for readability
hynky1999 authored Aug 14, 2024
2 parents b5443d2 + f266bdb commit 451e593
Showing 2 changed files with 5 additions and 2 deletions.
5 changes: 4 additions & 1 deletion src/datatrove/io.py
@@ -128,7 +128,10 @@ def list_files(
 Get a list of files on this directory. If `subdirectory` is given will search in `path/subdirectory`. If
 glob_pattern is given, it will only return files that match the pattern, which can be used to match a given
 extension, for example `*.myext`. Be careful with subdirectories when using glob (use ** if you want to match
-any subpath). Args: subdirectory: str: (Default value = "") recursive: bool: (Default value = True)
+any subpath).
+Args: subdirectory: str: (Default value = "")
+recursive: bool: (Default value = True)
 glob_pattern: str | None: (Default value = None)
 Returns: a list of file paths, relative to `self.path`
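The docstring's warning about globbing and subdirectories can be illustrated with a small local-disk sketch. `list_files_local` is a hypothetical helper written only for this illustration. It is not datatrove's implementation and does not share its filesystem layer. It assumes standard `pathlib` glob semantics: `*` stays within one path component, and `**` matches any subpath, including none.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_files_local(
    root: str,
    subdirectory: str = "",
    recursive: bool = True,
    glob_pattern: str | None = None,
) -> list[str]:
    """Hypothetical local-disk analogue of `list_files`; not datatrove's implementation.

    Returns sorted file paths relative to `root`.
    """
    base = Path(root) / subdirectory  # Path("x") / "" is just Path("x")
    if glob_pattern is None:
        # No pattern: all files, either at every depth or only directly under `base`.
        candidates = base.rglob("*") if recursive else base.glob("*")
    else:
        # With a pattern, depth is decided by the pattern itself (`recursive` is ignored
        # in this sketch): "*" never crosses a "/", while "**" matches any subpath.
        candidates = base.glob(glob_pattern)
    return sorted(p.relative_to(root).as_posix() for p in candidates if p.is_file())


# Usage: build a tiny throwaway tree and compare patterns.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["a.ds", "a.ds.index", "shard1/b.ds", "shard1/b.ds.index"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    top_level = list_files_local(tmp, glob_pattern="*.ds")     # ['a.ds']
    any_depth = list_files_local(tmp, glob_pattern="**/*.ds")  # ['a.ds', 'shard1/b.ds']
    in_subdir = list_files_local(tmp, subdirectory="shard1")   # ['shard1/b.ds', 'shard1/b.ds.index']
    flat = list_files_local(tmp, recursive=False)              # ['a.ds', 'a.ds.index']
```

The contrast between `top_level` and `any_depth` is the pitfall the docstring describes. A bare `*.ds` silently skips `shard1/b.ds`, and `*.ds` does not pick up `a.ds.index`.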
2 changes: 1 addition & 1 deletion src/datatrove/tools/check_dataset.py
@@ -74,7 +74,7 @@ def open_file(path):
 datafiles = input_folder.list_files(glob_pattern="*.ds")
 datafiles_index = input_folder.list_files(glob_pattern="*.ds.index")
 datafiles_loss = input_folder.list_files(glob_pattern="*.ds.loss")
-check_loss = not not datafiles_loss
+check_loss = bool(datafiles_loss)
 assert len(datafiles) == len(datafiles_index) and (not check_loss or len(datafiles) == len(datafiles_loss)), (
 "Mismatch between number of .ds, " ".ds.index and/or .ds.loss files"
 )
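The change in `check_dataset.py` preserves behavior. `not not x` and `bool(x)` give the same boolean for any object whose truth value is defined, and for a list both are `True` exactly when the list is non-empty. The sketch below restates the assertion as a hypothetical helper, `shard_counts_consistent`, which is not part of datatrove. It makes the enforced rule explicit: the number of `.ds` and `.ds.index` files must match. `.ds.loss` files are optional, but if any are present their count must match as well.

```python
from __future__ import annotations


def shard_counts_consistent(
    datafiles: list[str], datafiles_index: list[str], datafiles_loss: list[str]
) -> bool:
    """Hypothetical helper restating the check in check_dataset.py (not part of datatrove)."""
    # Loss files are optional: only compare their count when at least one exists.
    check_loss = bool(datafiles_loss)  # same value as the old `not not datafiles_loss`
    return len(datafiles) == len(datafiles_index) and (
        not check_loss or len(datafiles) == len(datafiles_loss)
    )


# `not not x` and `bool(x)` agree on every value below.
for value in ([], ["0.ds.loss"], "", "x", 0, 3, None):
    assert (not not value) is bool(value)

ds = ["0.ds", "1.ds"]
idx = ["0.ds.index", "1.ds.index"]
no_loss_ok = shard_counts_consistent(ds, idx, [])                            # True: no loss files to compare
full_loss_ok = shard_counts_consistent(ds, idx, ["0.ds.loss", "1.ds.loss"])  # True: all three counts match
partial_loss = shard_counts_consistent(ds, idx, ["0.ds.loss"])               # False: 2 .ds vs 1 .ds.loss
missing_index = shard_counts_consistent(ds, idx[:1], [])                     # False: 2 .ds vs 1 .ds.index
```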