[dataset] add shuffle at shards tar/raw file level #2424
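The raw and shard source datasets need a shuffle parameter. They did not shuffle before, and without it the unit tests will not pass.
OK.
Added two parameters.
The default value could simply be sys.max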
  self.dp = TextLineDataPipe(filenames).repeat(cycle).prefetch(
-     prefetch).shard(partition)
+     prefetch)
+ if shuffle:
+     self.dp = self.dp.shuffle(buffer_size=shuffle_size)
+ self.dp = self.dp.shard(partition)
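To make the intent of the change above concrete, here is a minimal pure-Python sketch of an optional buffered shuffle followed by sharding over a list of file names. It does not use the project's datapipe classes: `buffered_shuffle`, `shard`, `build_pipeline`, the `rank`/`world_size` arguments and the `seed` argument are all hypothetical names introduced for illustration. The sketch assumes every rank shuffles with the same seed, which is what keeps the shards disjoint here.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Shuffle a stream using a bounded buffer (hypothetical helper)."""
    rng = random.Random(seed)
    buf: List[T] = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Emit one random element once the buffer is full.
            yield buf.pop(rng.randrange(len(buf)))
    # Drain whatever is left in random order.
    rng.shuffle(buf)
    yield from buf


def shard(items: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Keep every world_size-th item, starting at rank (hypothetical helper)."""
    for i, item in enumerate(items):
        if i % world_size == rank:
            yield item


def build_pipeline(filenames, shuffle, shuffle_size, rank, world_size, seed=0):
    """Mirror the order in the diff: optional shuffle first, then shard."""
    stream = iter(filenames)
    if shuffle:
        stream = buffered_shuffle(stream, shuffle_size, seed)
    return shard(stream, rank, world_size)


filenames = [f"shard_{i:03d}.tar" for i in range(8)]
# Two ranks, same seed: each file ends up on exactly one rank.
parts = [list(build_pipeline(filenames, True, 4, r, 2)) for r in range(2)]
unshuffled = list(build_pipeline(filenames, False, 4, 0, 2))
```

Because the shuffle runs before `shard`, each rank sees a different subset of files on each seed, rather than a fixed subset in a fixed order.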
Shouldn't this shuffle come before prefetch? @Mddct
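One way to reason about this question is a small stand-alone experiment. The sketch below assumes an order-preserving prefetch stage (a background thread filling a bounded queue) and a seeded buffered shuffle; `prefetch` and `buffered_shuffle` are hypothetical stand-ins, not the project's datapipe methods. Under those assumptions the two orderings produce the same sequence, and the only difference is whether the prefetch queue holds shuffled or unshuffled items.

```python
import queue
import random
import threading


def prefetch(source, size):
    """Order-preserving prefetch: a background thread fills a bounded queue."""
    q = queue.Queue(maxsize=size)
    done = object()  # sentinel marking the end of the stream

    def worker():
        for item in source:
            q.put(item)
        q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item


def buffered_shuffle(source, buffer_size, seed):
    """Seeded shuffle over a bounded buffer (hypothetical helper)."""
    rng = random.Random(seed)
    buf = []
    for item in source:
        buf.append(item)
        if len(buf) >= buffer_size:
            yield buf.pop(rng.randrange(len(buf)))
    rng.shuffle(buf)
    yield from buf


names = [f"shard_{i:03d}.tar" for i in range(10)]
shuffle_then_prefetch = list(prefetch(buffered_shuffle(names, 4, seed=0), size=2))
prefetch_then_shuffle = list(buffered_shuffle(prefetch(names, size=2), 4, seed=0))
```

Whether the same holds for the real pipeline depends on how its prefetch stage is implemented, which this sketch does not confirm.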
No description provided.