An Ember problem causes Detailed Network test to fail on RedHat-7 VM and on MultiRank #108

jpvandy · 2016-02-08T22:30:04Z

This issue is about an Ember file-naming problem that breaks the Scheduler Detailed Network test.
Issue 147 is about an occasional Ember assert on a "bufLen" buffer usage problem.
Issue 274 is about an occasional Ember Time Limit failure.

The test was running on all other tested platforms. The failure appears to be related to overwriting the motif-xx.log files.
(These failures can trigger another problem: the SQE timelimit processor is really designed to deal with a time limit for a single SST run, but the Detailed Network test has a Python wrapper that loops through multiple sst invocations. Consequently, it can fail to terminate a wrapper loop.)

The failure is also observed with MULTI_RANK execution on the Serialization Branch.
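
For illustration only, the wrapper-loop concern above could be addressed by giving the wrapper its own overall deadline in addition to a per-run limit. This is a minimal sketch, not the actual test wrapper: `run_all`, `overall_limit_s`, and `per_run_limit_s` are hypothetical names, and the commands are stand-ins for real sst invocations.

```python
import subprocess
import sys
import time


def run_all(commands, overall_limit_s, per_run_limit_s):
    """Run each command in turn; stop launching once the overall budget is spent."""
    deadline = time.monotonic() + overall_limit_s
    results = []
    for cmd in commands:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Overall budget exhausted: record the skip instead of looping on.
            results.append((cmd, "skipped"))
            continue
        try:
            # subprocess.run kills the child and raises if the timeout expires.
            proc = subprocess.run(cmd, timeout=min(per_run_limit_s, remaining))
            results.append((cmd, proc.returncode))
        except subprocess.TimeoutExpired:
            results.append((cmd, "timeout"))
    return results


if __name__ == "__main__":
    # Stand-ins for real sst command lines, which are not shown in this thread.
    quick = [sys.executable, "-c", "pass"]
    print(run_all([quick, quick], overall_limit_s=60, per_run_limit_s=10))
```

With this shape, a single hung invocation costs at most `per_run_limit_s`, and the loop as a whole cannot outlive `overall_limit_s` by more than one run.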

vjleung · 2016-02-08T22:46:34Z

John,

So is it just how the RedHat-7 VM handles write permissions by default?

Vitus


nmhamster · 2016-02-08T22:54:34Z

I think this really comes down to how SST Ember does file opening. There is a potential ordering effect here on who owns the final FILE*. It's more of a bug in Ember that is now being exposed.
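
As a rough analogy for the ordering effect described above (Ember itself is C++ and uses FILE*, and its actual code is not shown in this thread), the sketch below shows how two writers that open the same log name in truncating mode can leave an empty file, and how a per-rank file name avoids the collision. The names `write_log` and `per_rank_name` and the `-rank<N>` suffix are hypothetical.

```python
import os
import tempfile


def write_log(path, text):
    """Open path in truncating mode, as a writer that assumes it owns the log."""
    with open(path, "w", newline="") as handle:
        handle.write(text)


def per_rank_name(path, rank):
    """Hypothetical naming scheme: add the rank so writers never share a file."""
    root, ext = os.path.splitext(path)
    return f"{root}-rank{rank}{ext}"


def demo(workdir):
    shared = os.path.join(workdir, "motif-3.log")

    # Shared name: rank 0 writes its output, then rank 1 opens the same
    # name, truncates it, and has nothing to write.
    write_log(shared, "rank 0 output\n")
    write_log(shared, "")
    shared_size = os.path.getsize(shared)

    # Per-rank names: each writer gets its own file, so nothing is lost.
    write_log(per_rank_name(shared, 0), "rank 0 output\n")
    write_log(per_rank_name(shared, 1), "")
    rank0_size = os.path.getsize(per_rank_name(shared, 0))
    return shared_size, rank0_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(demo(workdir))
```

Which writer "wins" with a shared name depends on the order of the opens, which is why a problem like this can appear on only some platforms or rank counts.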

nmhamster · 2016-02-09T15:05:32Z

John, can you try running this again and see if it restores old behavior at least? There is still an issue where a "job" spans multiple SST ranks.

jpvandy · 2016-02-09T15:09:35Z

Nightly seems to have restored the old behavior.

I’m on VPN from home at the moment; I'll be in after a bit.
I was just trying to run on yesterday's sandbox when your email came in.

John


nmhamster · 2016-02-09T21:00:50Z

John,

Can you confirm the test passes on RedHat 7.X?

Si Hammond

Scalable Computer Architectures
Sandia National Laboratories, NM, USA


jpvandy · 2016-02-09T21:14:26Z

The nightly FAILED on RedHat-7 as it had previously. It is looping on an empty motif-3.log file.
After updating Ember, I get the same result when I run it on the VM.

The other platforms that run it are happy.
John

jpvandy · 2016-02-19T17:53:58Z

Nothing appears to have changed with respect to the file overwrite problem.
(The SQE problem of this particular infinite loop has been corrected in the Test Suite.)
Is this a scheduler bug or an Ember bug?

vjleung · 2016-02-19T18:05:57Z

John,

I have lost track of this a little bit. The last thing I remember is Si saying the problem is not in the scheduler.

Vitus


nmhamster · 2016-02-19T19:19:04Z

This is an issue with Ember. :-(

S.

Si Hammond
Scalable Computer Architectures
Center for Computing Research
Sandia National Laboratories, NM, USA
[Sent from Remote Connection, Please excuse typos]


jpvandy · 2016-04-23T22:51:11Z

At this point this is a solid failure on the RedHat-7 VM, COERHEL-7, and on all three Multi Rank n=2 nightly tests: sst-test, El Capitan, and Yosemite.

nmhamster · 2016-04-24T01:06:19Z

@jpvandy - can you post the error into the issue please? Thank you!

nmhamster · 2016-04-24T01:07:05Z

Ok, no problem, I already have it! Sorry for the spam!

jwilso · 2016-06-29T16:58:39Z

@afrodri : Si should evaluate whether this is still a 6.0.0 issue.

nmhamster · 2016-07-05T16:46:29Z

Fixed.

jpvandy · 2016-07-14T15:49:12Z

This is not fixed. It may be deferred, or it may be "not to be fixed", but it's not fixed.

jpvandy · 2016-11-09T22:40:53Z

When the Open MPI version was updated to 1.8.8, this problem became ubiquitous across all platforms. It was fixed with PR 498 on Nov. 8th.

jpvandy assigned nmhamster Feb 8, 2016

nmhamster added SST-scheduler Bug labels Feb 9, 2016

nmhamster added this to the SST 6.0.0 milestone Feb 9, 2016

jpvandy added SST-ember and removed SST-scheduler labels Feb 19, 2016

jpvandy changed the title ~~The scheduler Detailed Network test fails on RedHat-7 VM~~ An Ember problem causes Detailed Network test to fail on RedHat-7 VM and on MultiRank Apr 6, 2016

jpvandy mentioned this issue Apr 29, 2016

9 tests fail when "mpirun -np 2" is applied to entire collection of 200+ tests #162

Closed

jpvandy mentioned this issue Jun 24, 2016

Ember Sweep has occasional Time Limit failures #274

Closed

nmhamster closed this as completed Jul 5, 2016

jpvandy reopened this Jul 14, 2016

jpvandy modified the milestones: Future, SST v6.0.0 Jul 14, 2016

jpvandy closed this as completed Nov 9, 2016

jpvandy modified the milestones: SST v6.1.0, Future Nov 9, 2016
