
DataLoader support dict str #31481

Merged

Conversation

Contributor

@heavengate heavengate commented Mar 8, 2021

PR types

Function optimization

PR changes

APIs

Describe

DataLoader optimization

  • support data formats: dict, list, and str (see the usage sketch after this list)

  • log ERROR info when shared memory is insufficient

  • refine the blocking queue kill ENFORCE check

  • re-raise worker exceptions in the main process

  • add a CPU place guard for collate in workers to ensure tensor operations run on CPU
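
For illustration, here is a minimal usage sketch of the dict-format support described above, assuming the standard paddle.io.Dataset/DataLoader API; the RandomDictDataset below is a made-up example, not code from this PR:

```python
import numpy as np
from paddle.io import Dataset, DataLoader

class RandomDictDataset(Dataset):
    """Hypothetical dataset whose samples are dicts (the format added by this PR)."""
    def __init__(self, num_samples=16):
        self.num_samples = num_samples

    def __getitem__(self, idx):
        image = np.random.random([3, 32, 32]).astype('float32')
        label = np.random.randint(0, 10)
        # dict-format sample; list and str fields are supported as well
        return {'image': image, 'label': label}

    def __len__(self):
        return self.num_samples

if __name__ == '__main__':
    loader = DataLoader(RandomDictDataset(), batch_size=4, num_workers=2)
    for batch in loader:
        # each batch keeps the dict structure, with fields batched per key
        print(batch['image'].shape, batch['label'].shape)
```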

No effect on speed:

| Model       | batch_size | develop           | This PR           |
|-------------|------------|-------------------|-------------------|
| ResNet50    | 1*128      | 343.19 samples/s  | 343.73 samples/s  |
| ResNet50    | 8*128      | 2456.51 samples/s | 2462.64 samples/s |
| MobileNetV1 | 1*128      | 1045.33 samples/s | 1043.87 samples/s |
| MobileNetV1 | 8*128      | 3227.82 samples/s | 3225.13 samples/s |

TODO:

  • remove ENFORCE check in blocking queue Receive
  • refine CPU tensor pipeline
  • enhance the main process check when SIGBUS kills a sub-process

@paddle-bot-old

paddle-bot-old bot commented Mar 8, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@@ -0,0 +1,87 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
Contributor

2021 for new file

Contributor Author

Done, thanks!

structure.append('{}{}'.format(FIELD_PREFIX, field_idx))
flat_batch.append(field.numpy())
field_idx += 1
elif isinstance(field, (str, bytes, numbers.Number, np.number)):
Contributor

what is the difference between numbers.Number and np.number

Contributor Author

Done, thanks!
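
For readers following along, here is a quick plain-Python/NumPy illustration of the distinction the reviewer is asking about (not code from the PR): numbers.Number is the abstract base class at the top of Python's numeric tower, while np.number only matches NumPy scalar types.

```python
import numbers
import numpy as np

print(isinstance(3, numbers.Number))            # True:  builtin int is part of Python's numeric tower
print(isinstance(3, np.number))                 # False: builtin int is not a NumPy scalar
print(isinstance(np.float32(3.0), np.number))   # True:  NumPy scalar types subclass np.number
# On recent NumPy versions, NumPy scalars are also registered with the numbers ABCs,
# so checking both types is mostly a belt-and-braces measure:
print(isinstance(np.float32(3.0), numbers.Number))
```

Checking both keeps the branch robust regardless of whether a dataset field is a builtin or a NumPy scalar.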

@heavengate heavengate force-pushed the dataloader_supprot_dict_str branch from 7eab9e2 to 4be15f0 on March 10, 2021 16:34
"DataLoader workers.\n");
REGISTER_SIGNAL_HANDLER(
SIGBUS, SIGBUS_handler,
"ERROR: Unexpected BUS error encountered in DataLoader worker. "
Contributor

Would it be convenient to attach a sample of this kind of error output in a comment? I'd like to see the format.

Contributor Author

Done, thanks!

self.exc_msg = "".join(traceback.format_exception(*exc_info))

def reraise(self):
msg = "DataLoader worker({}) caught {} with message:\n{}".format(
Contributor

I'd also like to see an example of this improved error message.

Contributor Author

Done, thanks!
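
As context for the snippet above, here is a generic sketch of the capture-and-re-raise pattern (plain multiprocessing, not the PR's actual worker code; WorkerError and _worker_loop are illustrative names): the worker formats its traceback as a string, since traceback objects cannot be pickled, and the main process raises a new error that embeds the worker id and the original traceback.

```python
import sys
import traceback
import multiprocessing as mp

class WorkerError:
    """Illustrative carrier for an exception caught in a worker process."""
    def __init__(self, worker_id, exc_type_name, exc_msg):
        self.worker_id = worker_id
        self.exc_type_name = exc_type_name
        self.exc_msg = exc_msg

def _worker_loop(worker_id, queue):
    try:
        raise ValueError("bad sample")  # stand-in for a failing __getitem__
    except Exception:
        exc_info = sys.exc_info()
        # format the traceback inside the worker; traceback objects don't pickle
        msg = "".join(traceback.format_exception(*exc_info))
        queue.put(WorkerError(worker_id, exc_info[0].__name__, msg))

if __name__ == '__main__':
    queue = mp.Queue()
    proc = mp.Process(target=_worker_loop, args=(0, queue))
    proc.start()
    result = queue.get()
    proc.join()
    if isinstance(result, WorkerError):
        # re-raise in the main process with the worker id and original traceback attached
        raise RuntimeError("DataLoader worker({}) caught {} with message:\n{}".format(
            result.worker_id, result.exc_type_name, result.exc_msg))
```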

chenwhql previously approved these changes Mar 12, 2021

Contributor

@chenwhql chenwhql left a comment

Approving for now, but judging from these two error formats, users are still likely to be confused, DataLoader issues will probably keep coming in, and Kaipeng will likely keep having his daily work interrupted by all kinds of questions, so I hope this can be further polished in follow-up work. The main issues are:

  1. When users see the error there is no red highlight box, so they will most likely miss the key point.
  2. For the first type of error, we previously wrote a very detailed error message that, read through to the end, lets users solve the problem; we can revisit this later.
  3. The blocking queue error message is probably of no help for user debugging, so I suggest removing it; if necessary, the existing unit tests can be modified as well.

@heavengate
Contributor Author

Approving for now, but judging from these two error formats, users are still likely to be confused, DataLoader issues will probably keep coming in, and Kaipeng will likely keep having his daily work interrupted by all kinds of questions, so I hope this can be further polished in follow-up work. The main issues are:

  1. When users see the error there is no red highlight box, so they will most likely miss the key point.
  2. For the first type of error, we previously wrote a very detailed error message that, read through to the end, lets users solve the problem; we can revisit this later.
  3. The blocking queue error message is probably of no help for user debugging, so I suggest removing it; if necessary, the existing unit tests can be modified as well.

Yes. The EnforceNotKilled check in the blocking queue's Receive is currently relied on by the legacy PyReader cases in the test_multiprocess_reader unit test, so it cannot be removed yet. It will be further optimized in the next PR together with the CPU tensor pipeline adjustment; after that, the capture and handling of the SIGBUS signal should also improve. This will keep being refined alongside the follow-up work.

@heavengate heavengate requested a review from TCChenlong March 12, 2021 04:48

def default_collate_fn(batch):
"""
Default batch collating function for :code:`fluid.io.DataLoader`,
Contributor

paddle.io.DataLoader

Contributor Author

Done, thanks!
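
To make the discussion concrete, here is a rough NumPy-only sketch of what a collate function of this kind does (an illustration of the idea only, not Paddle's actual default_collate_fn): it recurses into dict and list samples and stacks the leaf fields across the batch.

```python
import numbers
import numpy as np

def collate_sketch(batch):
    """Stack a list of samples field by field, preserving dict/list structure (illustrative)."""
    sample = batch[0]
    if isinstance(sample, np.ndarray):
        return np.stack(batch, axis=0)
    elif isinstance(sample, (numbers.Number, np.number)):
        return np.array(batch)
    elif isinstance(sample, (str, bytes)):
        return list(batch)  # string fields are passed through untouched
    elif isinstance(sample, dict):
        # collate each key across the batch, keeping the dict structure
        return {key: collate_sketch([s[key] for s in batch]) for key in sample}
    elif isinstance(sample, (list, tuple)):
        # transpose the batch: list of samples -> list of batched fields
        return [collate_sketch(fields) for fields in zip(*batch)]
    raise TypeError("unsupported field type: {}".format(type(sample)))

# Example:
# collate_sketch([{'image': np.zeros([3, 4]), 'label': 1},
#                 {'image': np.ones([3, 4]), 'label': 0}])
# -> {'image': ndarray of shape (2, 3, 4), 'label': array([1, 0])}
```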

WorkerInfo: an instance of WorkerInfo which contains fields above.

.. note::
For mode usage and exampls, please see :code:`paddle.io.IterableDataset`
Contributor

For mode usage and exampls -> For more usage and examples

Contributor Author

Done, thanks!

Contributor

@TCChenlong TCChenlong left a comment

LGTM

Contributor

@qingqing01 qingqing01 left a comment

Need to test whether flatten_batch affects the original speed or not.

@heavengate
Contributor Author

Need to test whether flatten_batch affects the original speed or not.

The effect on original models is tested above; this PR has no effect on them (original model data are all in list format):

| Model       | batch_size | develop           | This PR           |
|-------------|------------|-------------------|-------------------|
| ResNet50    | 1*128      | 343.19 samples/s  | 343.73 samples/s  |
| ResNet50    | 8*128      | 2456.51 samples/s | 2462.64 samples/s |
| MobileNetV1 | 1*128      | 1045.33 samples/s | 1043.87 samples/s |
| MobileNetV1 | 8*128      | 3227.82 samples/s | 3225.13 samples/s |

After changing the PaddleClas data format to dict as shown below, this PR also has no effect on models with dict-format data; the speed test results are as follows:

{'image': transform(img, self.ops), 'label': int(label)}

| Model       | batch_size | develop           | This PR           |
|-------------|------------|-------------------|-------------------|
| ResNet50    | 1*128      | 343.19 samples/s  | 343.45 samples/s  |
| ResNet50    | 8*128      | 2456.51 samples/s | 2459.33 samples/s |
| MobileNetV1 | 1*128      | 1045.33 samples/s | 1044.12 samples/s |
| MobileNetV1 | 8*128      | 3227.82 samples/s | 3225.76 samples/s |
