export task progress #958

denniswambua · 2017-03-16T14:36:14Z

closes #951

ukanga · 2017-03-16T16:53:53Z

onadata/libs/utils/export_builder.py

+        meta = {'progress': additions}
+        if total:
+            meta.update({'total': total})
+        current_task.update_state(state='PENDING',


Is there a state other than PENDING that we could use here, for example STARTED? See http://docs.celeryproject.org/en/latest/reference/celery.states.html#all-states
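For illustration, here is a minimal sketch of reporting progress under a dedicated state name instead of PENDING. `FakeTask` and `report_progress` are hypothetical stand-ins so the snippet runs without a Celery worker. The only Celery behaviour assumed is that a task's `update_state` accepts `state` and `meta` keyword arguments, as in the diff above. The state name 'PROGRESS' is the custom name that comes up later in this review.

```python
class FakeTask:
    """Hypothetical stand-in for a Celery task; it only records update_state calls."""

    def __init__(self):
        self.state = None
        self.meta = None

    def update_state(self, state, meta):
        # A real task would write this to the result backend.
        self.state = state
        self.meta = meta


def report_progress(task, additions, total=None):
    """Build the meta payload as in the diff and publish it under a custom state."""
    meta = {'progress': additions}
    if total:
        meta['total'] = total
    task.update_state(state='PROGRESS', meta=meta)
    return meta


task = FakeTask()
report_progress(task, 50, total=200)
```

With a real task, a client polling the result would then see the custom state name together with the `progress` and `total` values.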

ukanga · 2017-03-16T16:56:19Z

onadata/libs/utils/export_builder.py

@@ -542,6 +565,7 @@ def write_row(row, csv_writer, fields):
                            self.pre_process_row(child_row, section),
                            csv_writer, fields)
            index += 1
+            track_task_progress(i, total_records)


Would it be better to make these updates in chunks/batches of 10, 100, or some X that is configurable in settings? Updating on every row could be overkill when you are iterating over 10k or 100k records.

You could also batch only for large datasets, i.e. if there are fewer than 10k records, do not batch.
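A rough sketch of the two suggestions combined: report every row for small datasets, and only every Nth row for large ones. The function name `should_update`, the `small_dataset_limit` parameter, and both default values are assumptions made for illustration, not code from this PR.

```python
def should_update(index, total, batch_size=100, small_dataset_limit=10000):
    """Return True when a progress update should be sent for this row.

    Small datasets report every row; larger ones report once per batch.
    """
    if total is not None and total < small_dataset_limit:
        return True
    return index % batch_size == 0


# Count how many updates each dataset size would trigger.
updates_small = sum(should_update(i, 500) for i in range(1, 501))
updates_large = sum(should_update(i, 100000) for i in range(1, 100001))
```

Under these assumed defaults a 500-row export sends 500 updates, while a 100k-row export sends 1,000 instead of 100,000.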

ukanga · 2017-03-16T16:58:28Z

onadata/libs/utils/export_tools.py

    else:
        records = query_data(xform, query=filter_query, start=start, end=end)
+        total_records = xform.num_of_submissions


Does this take into account the filter query? An export could be using filters where the number of records is less than the total number of submissions.

ukanga · 2017-03-20T08:46:12Z

onadata/libs/utils/export_tools.py

    else:
        records = query_data(xform, query=filter_query, start=start, end=end)

+        if filter_query:
+            total_records = query_data(xform, query=filter_query, start=start,
+                                       end=end, count=True)[0].get('count')


This makes a huge assumption that we will always receive a list of at least size one.

Well, since count is True, it looks like this will be a list of length 1, unless the query results in a generator, in which case maybe you'd need to call len on it? Is this guaranteed to be on line 15 of the apps.viewer.models.parsed_instance function? Unrelated, but why does the condition on lines 10-11 of apps.viewer.models.parsed_instance exist at all? It would put a lot of objects in memory and probably segfault on large datasets.
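One defensive way to read the count without assuming a non-empty list is sketched below. `extract_count` is a hypothetical helper. It assumes only what the diff suggests, namely that the count query yields an iterable of dict-like rows with a `count` key, and it also tolerates a generator, an empty result, or `None`.

```python
def extract_count(result, default=0):
    """Safely pull 'count' from the first row of a count-query result."""
    # Materialising is cheap here: a count query is expected to yield one row.
    rows = list(result) if result is not None else []
    if not rows:
        return default
    count = rows[0].get('count')
    return default if count is None else count


from_list = extract_count([{'count': 7}])
from_generator = extract_count(iter([{'count': 3}]))
from_empty = extract_count([])
```

Falling back to a default keeps the export running when the count is unavailable, at the cost of a less informative progress total.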

pld · 2017-03-20T19:04:37Z

onadata/libs/utils/async_status.py

@@ -3,14 +3,18 @@
 PENDING = 0
 SUCCESSFUL = 1
 FAILED = 2
+PROGRESS = 3
+RETRY = 4
+STARTED = 5


what's the difference between this and PROGRESS?

Between PENDING? PENDING is an unknown state; I have added PROGRESS to indicate that the task has started and is processing.
Docs: http://docs.celeryproject.org/en/latest/reference/celery.states.html#all-states

That makes sense. I am asking about the difference between PROGRESS and STARTED.

There is no big difference. STARTED means the job has started, and PROGRESS (a custom task state) reports how the task is progressing. STARTED is fired when the task starts but is then cleared, and we require a custom state when updating the task meta field.

pld · 2017-03-20T19:06:14Z

onadata/libs/utils/export_builder.py

+    """
+    try:
+        if additions % getattr(settings, 'EXPORT_TASK_PROGRESS_UPDATE_BATCH',
+                               100) == 0:


I'd put this in a DEFAULT_UPDATE_BATCH var, @ukanga what do you think?
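A small sketch of what that could look like: the fallback lives in a module-level constant and the settings lookup is wrapped in a helper. `get_update_batch` is a hypothetical name, and `SimpleNamespace` stands in for Django's `settings` object so the snippet runs on its own.

```python
from types import SimpleNamespace

# Fallback used when the setting is not defined.
DEFAULT_UPDATE_BATCH = 100


def get_update_batch(settings):
    """Return the configured batch size, or the module default."""
    return getattr(settings, 'EXPORT_TASK_PROGRESS_UPDATE_BATCH',
                   DEFAULT_UPDATE_BATCH)


# Stand-ins for a project without and with the setting.
unconfigured = SimpleNamespace()
configured = SimpleNamespace(EXPORT_TASK_PROGRESS_UPDATE_BATCH=500)
```

Naming the constant keeps the magic number out of the modulo check and makes the default easy to find and override.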

pld · 2017-03-20T19:07:30Z

onadata/libs/utils/export_builder.py

@@ -542,6 +565,7 @@ def write_row(row, csv_writer, fields):
                            self.pre_process_row(child_row, section),
                            csv_writer, fields)
            index += 1
+            track_task_progress(i, total_records)


You could also batch only for large datasets, i.e. if there are fewer than 10k records, do not batch.

pld · 2017-03-20T19:19:45Z

onadata/libs/utils/export_tools.py

    else:
        records = query_data(xform, query=filter_query, start=start, end=end)

+        if filter_query:
+            total_records = query_data(xform, query=filter_query, start=start,
+                                       end=end, count=True)[0].get('count')


Well, since count is True, it looks like this will be a list of length 1, unless the query results in a generator, in which case maybe you'd need to call len on it? Is this guaranteed to be on line 15 of the apps.viewer.models.parsed_instance function? Unrelated, but why does the condition on lines 10-11 of apps.viewer.models.parsed_instance exist at all? It would put a lot of objects in memory and probably segfault on large datasets.

pld · 2017-03-22T12:47:21Z

Is there a reason not to combine these into a single state constant?

denniswambua · 2017-03-22T13:01:42Z

I introduced PROGRESS mainly because of http://docs.celeryproject.org/en/latest/reference/celery.states.html#celery.states.state

Especially here

>>> state('PROGRESS') > state(STARTED)
True

>>> state('PROGRESS') > state('SUCCESS')
False
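To make the comparison above concrete, here is a simplified model of precedence-based ordering in which unknown (custom) state names share a single slot. The list below is an illustrative assumption, not Celery's actual table, and `precedence` and `outranks` are hypothetical helpers rather than Celery APIs.

```python
# Illustrative ordering only: earlier entries rank higher.
# None marks the slot that unknown/custom state names fall into.
PRECEDENCE = ['SUCCESS', 'FAILURE', None, 'REVOKED',
              'STARTED', 'RECEIVED', 'RETRY', 'PENDING']
NONE_INDEX = PRECEDENCE.index(None)


def precedence(state):
    """Return the rank of a state; custom names take the None slot."""
    try:
        return PRECEDENCE.index(state)
    except ValueError:
        return NONE_INDEX


def outranks(a, b):
    """True when state a takes precedence over state b."""
    return precedence(a) < precedence(b)


progress_over_started = outranks('PROGRESS', 'STARTED')
progress_over_success = outranks('PROGRESS', 'SUCCESS')
```

In this model a custom PROGRESS state ranks above STARTED and PENDING but below the final SUCCESS and FAILURE states, which matches the two comparisons quoted above.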

pld · 2017-03-22T13:03:13Z

cool, got it

ukanga requested changes Mar 16, 2017

denniswambua force-pushed the 951-export-task-progress branch from 62c2ac7 to a704a0e Compare March 20, 2017 07:12

ukanga reviewed Mar 20, 2017

ukanga approved these changes Mar 20, 2017

pld reviewed Mar 20, 2017

denniswambua added 5 commits March 22, 2017 15:18

Track celery export tasks

92a20a6

Signed-off-by: Dennis Wambua <[email protected]>

Added tests

548771a

Signed-off-by: Dennis Wambua <[email protected]>

flake8 fix

e75272c

Signed-off-by: Dennis Wambua <[email protected]>

Enable task started state, update task progress in batches and count records when filter query supplied

77c48ca

Signed-off-by: Dennis Wambua <[email protected]>

Default task progress update constant

df1e41a

Signed-off-by: Dennis Wambua <[email protected]>

denniswambua force-pushed the 951-export-task-progress branch from a704a0e to df1e41a Compare March 22, 2017 12:18

denniswambua changed the title ~~951 export task progress~~ export task progress Mar 22, 2017

ukanga merged commit 910e6c2 into master Mar 24, 2017

ukanga deleted the 951-export-task-progress branch March 24, 2017 07:21
