Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

export task progress #958

Merged
merged 5 commits into from
Mar 24, 2017
Merged

export task progress #958

merged 5 commits into from
Mar 24, 2017

Conversation

denniswambua
Copy link
Contributor

closes #951

meta = {'progress': additions}
if total:
meta.update({'total': total})
current_task.update_state(state='PENDING',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any other started other than pending? For example, started? http://docs.celeryproject.org/en/latest/reference/celery.states.html#all-states

@@ -542,6 +565,7 @@ def write_row(row, csv_writer, fields):
self.pre_process_row(child_row, section),
csv_writer, fields)
index += 1
track_task_progress(i, total_records)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to make this updates in chunks/batches of 10 or 100 or X? Something configurable in settings. Could be an overkill when you are iterating over 10k or 100k records.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also only batch for large datasets, i.e. if it's < 10k, do not batch

else:
records = query_data(xform, query=filter_query, start=start, end=end)
total_records = xform.num_of_submissions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this take into account the filter query? An export could be using filters where the number of records is less than the total number of submissions.

@denniswambua denniswambua force-pushed the 951-export-task-progress branch from 62c2ac7 to a704a0e Compare March 20, 2017 07:12
else:
records = query_data(xform, query=filter_query, start=start, end=end)

if filter_query:
total_records = query_data(xform, query=filter_query, start=start,
end=end, count=True)[0].get('count')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a huge assumption that we will always receive a list of at least size one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, since count is True, looks like this will be a list of length 1 unless the query results in a generator, in which case maybe you'd need to call len on it? Is this guaranteed to be on line 15 of the apps.viewer.models.parsed_instance function? Not related, but why does the condition on lines 10-11 of apps.viewer.models.parsed_instance exist at all? this would put a lot of objects in memory and probably segfault on large datasets

@@ -3,14 +3,18 @@
PENDING = 0
SUCCESSFUL = 1
FAILED = 2
PROGRESS = 3
RETRY = 4
STARTED = 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between this and PROGRESS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between PENDING? PENDING is a unkown state, I have added PROGRESS to indicate that the task has started and is processing.
Docs: http://docs.celeryproject.org/en/latest/reference/celery.states.html#all-states

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, I am asking about the difference between, PROGRESS and STARTED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no big difference STARTED means the job has started and PROGRESS (custom task state) is how the task is progressing. STARTED is fired when the task is started but is cleared and we require a custom state when updating the task meta field.

"""
try:
if additions % getattr(settings, 'EXPORT_TASK_PROGRESS_UPDATE_BATCH',
100) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put this in a DEFAULT_UPDATE_BATCH var, @ukanga what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

@@ -542,6 +565,7 @@ def write_row(row, csv_writer, fields):
self.pre_process_row(child_row, section),
csv_writer, fields)
index += 1
track_task_progress(i, total_records)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also only batch for large datasets, i.e. if it's < 10k, do not batch

else:
records = query_data(xform, query=filter_query, start=start, end=end)

if filter_query:
total_records = query_data(xform, query=filter_query, start=start,
end=end, count=True)[0].get('count')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, since count is True, looks like this will be a list of length 1 unless the query results in a generator, in which case maybe you'd need to call len on it? Is this guaranteed to be on line 15 of the apps.viewer.models.parsed_instance function? Not related, but why does the condition on lines 10-11 of apps.viewer.models.parsed_instance exist at all? this would put a lot of objects in memory and probably segfault on large datasets

Signed-off-by: Dennis Wambua <[email protected]>
Signed-off-by: Dennis Wambua <[email protected]>
Signed-off-by: Dennis Wambua <[email protected]>
records when filter query supplied

Signed-off-by: Dennis Wambua <[email protected]>
@denniswambua denniswambua force-pushed the 951-export-task-progress branch from a704a0e to df1e41a Compare March 22, 2017 12:18
@pld
Copy link
Member

pld commented Mar 22, 2017 via email

@denniswambua denniswambua changed the title 951 export task progress export task progress Mar 22, 2017
@denniswambua
Copy link
Contributor Author

I introduced PROGRESS mainly because of http://docs.celeryproject.org/en/latest/reference/celery.states.html#celery.states.state

Especially here

>>> state('PROGRESS') > state(STARTED)
True

>>> state('PROGRESS') > state('SUCCESS')
False

@pld
Copy link
Member

pld commented Mar 22, 2017 via email

@ukanga ukanga merged commit 910e6c2 into master Mar 24, 2017
@ukanga ukanga deleted the 951-export-task-progress branch March 24, 2017 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep track of export progress
3 participants