Update Average Wall Hours statistic to be exact rather than approximate in aggregate mode #964

Merged
merged 3 commits into from
Jun 17, 2019

jpwhite4 (Member) commented Jun 13, 2019

Description

Update the average wall hours statistic to be exact rather than approximate. Also update the help description to accurately describe the algorithm.

The keen observer may wonder why the regression test results are unchanged. The reason is that the old code produced the correct answer when all of the jobs that contributed to the calculation ended in the specified time range. This is the case for the regression tests.

To double-check the calculation manually, run a query on the fact tables and compare the result with the XDMoD output. Care must be taken because the wallduration is not the total wall time of the job; it is the contribution of the job's wall time within the time period. The following SQL can be used to validate the results of group by PI from 2016-12-01 to 2016-12-30 (the -31 in the SQL is not a typo: recall that an end time of 2016-12-30 includes 2016-12-30):

SELECT
    p.long_name,
    -- Sum each job's in-period wall time, then divide by SUM(3600)
    -- (3600 seconds times the number of jobs) to get average wall hours per job.
    SUM(CASE
        -- Job starts and ends inside the period: count all of its wall time.
        WHEN
            (jt.start_time_ts BETWEEN UNIX_TIMESTAMP('2016-12-01') AND UNIX_TIMESTAMP('2016-12-31')
                AND jt.end_time_ts BETWEEN UNIX_TIMESTAMP('2016-12-01') AND UNIX_TIMESTAMP('2016-12-31'))
        THEN
            jt.wallduration
        -- Job starts before the period and ends inside it: scale by the in-period fraction.
        WHEN
            (jt.start_time_ts < UNIX_TIMESTAMP('2016-12-01')
                AND jt.end_time_ts BETWEEN UNIX_TIMESTAMP('2016-12-01') AND UNIX_TIMESTAMP('2016-12-31'))
        THEN
            jt.wallduration * (jt.end_time_ts - UNIX_TIMESTAMP('2016-12-01')) / (jt.end_time_ts - jt.start_time_ts)
        -- Job starts inside the period and ends after it: scale by the in-period fraction.
        WHEN
            (jt.start_time_ts BETWEEN UNIX_TIMESTAMP('2016-12-01') AND UNIX_TIMESTAMP('2016-12-31')
                AND jt.end_time_ts > UNIX_TIMESTAMP('2016-12-31'))
        THEN
            jt.wallduration * (UNIX_TIMESTAMP('2016-12-31') - jt.start_time_ts + 1) / (jt.end_time_ts - jt.start_time_ts)
        -- Job starts before the period and ends after it: scale by period length over job length.
        WHEN
            (jt.start_time_ts < UNIX_TIMESTAMP('2016-12-01')
                AND jt.end_time_ts > UNIX_TIMESTAMP('2016-12-31'))
        THEN
            jt.wallduration * (UNIX_TIMESTAMP('2016-12-31') - UNIX_TIMESTAMP('2016-12-01')) / (jt.end_time_ts - jt.start_time_ts)
        ELSE jt.wallduration
    END) / SUM(3600) AS average_wallduration_per_job,
    -- Unscaled total wall hours and the number of jobs.
    SUM(jt.wallduration) / 3600.0,
    SUM(1)
FROM
    job_tasks jt,
    job_records jr,
    person p
WHERE
    p.id = jr.principalinvestigator_person_id
        AND jt.job_record_id = jr.job_record_id
        AND (jt.end_time_ts BETWEEN UNIX_TIMESTAMP('2016-12-01') AND UNIX_TIMESTAMP('2016-12-31')
        OR jt.start_time_ts BETWEEN UNIX_TIMESTAMP('2016-12-01') AND UNIX_TIMESTAMP('2016-12-31')
        OR (jt.start_time_ts < UNIX_TIMESTAMP('2016-12-01')
        AND jt.end_time_ts > UNIX_TIMESTAMP('2016-12-31')))
GROUP BY 1
ORDER BY 2 DESC;
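
The pro-rating performed by the CASE expression above can also be sketched outside the database, which may help when spot-checking a handful of jobs by hand. The following Python is a minimal sketch, not XDMoD code: the function names `prorated_wall_seconds` and `average_wall_hours` and the sample jobs are hypothetical, timestamps are assumed to be UTC (MySQL's `UNIX_TIMESTAMP()` uses the session time zone), and the `+ 1` second adjustment in the third branch of the query is not reproduced.

```python
from datetime import datetime, timezone


def prorated_wall_seconds(start_ts, end_ts, wallduration, period_start, period_end):
    """Return the share of a job's wallduration that falls inside the period.

    The four WHEN branches of the query collapse into one clamp of the job's
    [start, end] interval to the period, scaled by the job's total length.
    """
    job_span = end_ts - start_ts
    if job_span <= 0:
        # Zero-length job: avoid dividing by zero, return wallduration unchanged.
        return float(wallduration)
    overlap = min(end_ts, period_end) - max(start_ts, period_start)
    if overlap <= 0:
        return 0.0
    return wallduration * overlap / job_span


def average_wall_hours(jobs, period_start, period_end):
    """Average in-period wall hours per job, over jobs that touch the period.

    `jobs` is an iterable of (start_ts, end_ts, wallduration) tuples.
    """
    contributions = [
        prorated_wall_seconds(s, e, w, period_start, period_end)
        for (s, e, w) in jobs
        if e >= period_start and s <= period_end  # same rows as the WHERE clause
    ]
    if not contributions:
        return 0.0
    return sum(contributions) / (3600 * len(contributions))


# Period bounds matching the query; UTC is an assumption of this sketch.
period_start = int(datetime(2016, 12, 1, tzinfo=timezone.utc).timestamp())
period_end = int(datetime(2016, 12, 31, tzinfo=timezone.utc).timestamp())

# Made-up sample jobs: (start_ts, end_ts, wallduration in seconds).
jobs = [
    (period_start + 3600, period_start + 3 * 3600, 7200),  # fully inside: 2 h
    (period_start - 3600, period_start + 3600, 7200),      # straddles start: 1 h
]
print(average_wall_hours(jobs, period_start, period_end))  # 1.5
```

The second sample job runs for two hours but only one of them falls inside the period, so it contributes one hour and the average over the two jobs is 1.5 hours.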

jpwhite4 changed the title from "Update Average Wall Hours statistic" to "Update Average Wall Hours statistic to be exact rather than approximate in aggregate mode" on Jun 13, 2019
jpwhite4 merged commit 65d7a1b into ubccr:xdmod8.5 on Jun 17, 2019
jpwhite4 deleted the average_wall branch on June 17, 2019 17:20
jpwhite4 added this to the 8.5.0 milestone on Aug 5, 2019
plessbd added the "data quality" label (data quality issues such as improvements to SQL queries to improve precision or consistency) on Aug 8, 2019