netcdf4 build fails on garnet (CrayXE6) #444

Open
cekees opened this issue Sep 5, 2014 · 6 comments

Comments

@cekees
Contributor

cekees commented Sep 5, 2014

Just now starting to look at this. I will post the whole build.log in a gist, but I'm guessing it's the error about not finding or linking to hdf5 (which is listed as a build dependency):

2014/09/05 10:54:23 - INFO: [package:run_job] configure: error: Can't find or link to the hdf5 library. Use --disable-netcdf-4, or see config.log for errors.
2014/09/05 10:54:23 - INFO: [package:run_job] patching file RELEASE_NOTES.md
2014/09/05 10:54:23 - INFO: [package:run_job] Hunk #1 succeeded at 5 with fuzz 2 (offset -4 lines).
2014/09/05 10:54:23 - INFO: [package:run_job] patching file libsrc4/nc4file.c
2014/09/05 10:54:23 - INFO: [package:run_job] patching file nc_test4/tst_nc4perf.c
2014/09/05 10:54:23 - INFO: [package:run_job] patching file nc_test4/tst_parallel3.c
2014/09/05 10:54:24 - INFO: [package:run_job] make: *** No targets specified and no makefile found. Stop.
2014/09/05 10:54:24 - ERROR: [package:run_job] Command '[u'/bin/bash', '_hashdist/build.sh']' returned non-zero exit status 2
2014/09/05 10:54:24 - ERROR: [package:run_job] command failed (code=2); raising

@cekees
Contributor Author

cekees commented Sep 5, 2014

Here are the config.log and build.log files

https://gist.github.com/cekees/5c4d308ee29dd9d72d1f

@cekees
Contributor Author

cekees commented Sep 5, 2014

I always struggle with configure output that has so many errors which are just the result of checks that don't necessarily need to pass. If it's only the last error that triggered the configure failure, then we may just need to add something like -ldl or turn off dynamic library support in netcdf.
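One way to cut through that noise is to list only the commands that actually exited non-zero and read the last one first. This is a minimal sketch, assuming the usual autoconf config.log layout in which each command is logged as "configure:<line>: <command>" and later followed by "configure:<line>: $? = <status>". The function name is made up for illustration, and the last failure is a good first guess rather than a guaranteed culprit.

```python
import re

# Prefix autoconf puts on each logged step, e.g. "configure:16585: cc -o conftest ..."
STEP = re.compile(r"^configure:(\d+): (.*)$")
# Exit-status report that follows a logged command, e.g. "$? = 1"
STATUS = re.compile(r"^\$\? = (\d+)$")


def failed_checks(log_text):
    """Return (configure_line, command, output) for every command that exited non-zero."""
    failures = []
    command = None  # (configure line number, command text) of the step in progress
    output = []     # compiler/linker output seen since that command was logged
    for line in log_text.splitlines():
        match = STEP.match(line)
        if match is None:
            # Not a "configure:<n>:" line, so it is output of the current command.
            if command is not None:
                output.append(line)
            continue
        lineno, text = int(match.group(1)), match.group(2)
        status = STATUS.match(text)
        if status is None:
            # A new step starts; forget output from the previous one.
            command, output = (lineno, text), []
        else:
            if command is not None and int(status.group(1)) != 0:
                failures.append((command[0], command[1], "\n".join(output)))
            command, output = None, []
    return failures
```

Calling failed_checks(open("config.log").read())[-1] would then give the last failing command together with its compiler or linker output.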

@ahmadia
Contributor

ahmadia commented Sep 8, 2014

Sorry, I should have kicked a rebuild off on Friday. I put something in this morning and it's still going. Here's what I can tell from looking at this and my own tree:

The last time I moved the branch was in July, and the associated hdf5 build with that was kduxq2nyak56. The one your profile is trying to link, bvqkn2n5n3ny, isn't in my build cache, so I don't know how you're creating it. The reason I'm curious about this is because the error you're seeing:

configure:16585: cc -o conftest -g -O2 -I/lustre/home1/u/cekees/.hashdist/bld/curl/lnnmx25bocra/include -I/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/include -I/lustre/home1/u/cekees/.hashdist/bld/mpi/jrk6uiczudla/include -I/lustre/home1/u/cekees/.hashdist/bld/patchelf/m5njcb5667me/include -L/lustre/home1/u/cekees/.hashdist/bld/curl/lnnmx25bocra/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/curl/lnnmx25bocra/lib -L/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/lib -L/lustre/home1/u/cekees/.hashdist/bld/mpi/jrk6uiczudla/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/mpi/jrk6uiczudla/lib -L/lustre/home1/u/cekees/.hashdist/bld/patchelf/m5njcb5667me/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/patchelf/m5njcb5667me/lib conftest.c -lhdf5  -lm -lz  >&5
/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/lib/libhdf5.a(H5PL.o): In function `H5PL_term_interface':
H5PL.c:(.text+0xa7): undefined reference to `dlclose'
/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/lib/libhdf5.a(H5PL.o): In function `H5PL_load':
H5PL.c:(.text+0x309): undefined reference to `dlsym'
H5PL.c:(.text+0x43e): undefined reference to `dlopen'
H5PL.c:(.text+0x457): undefined reference to `dlsym'
H5PL.c:(.text+0x584): undefined reference to `dlclose'
H5PL.c:(.text+0x609): undefined reference to `dlerror'
H5PL.c:(.text+0x81c): undefined reference to `dlclose'
collect2: error: ld returned 1 exit status
configure:16585: $? = 1
configure: failed program was:

Has to do with trying to link a statically linked library as if it had been dynamically linked (libdl.so is frequently implicitly available in dynamically linked libraries, but is not available by default in a static library).

But our general assumption is that we always work with dynamic libraries, particularly I/O interfaces like hdf5 that might be used across multiple modules (and therefore be linked in various separate DSOs), so I think the problem here is with your hdf5 build. You should have dynamic libraries in your HDF5 artifact, and NetCDF and other modules that depend on it should be linking dynamically.
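As a quick sanity check on that reading of the log, the undefined symbols can be pulled out of the linker output and compared against the dlfcn.h functions. This is only a sketch: the helper names are hypothetical, and the symbol set assumes a glibc system of this era, where these functions live in libdl.

```python
import re

# GNU ld quotes the symbol as `name' (backtick, then apostrophe).
UNDEFINED = re.compile(r"undefined reference to [`']([^`']+)'")

# Functions declared in <dlfcn.h>; assumed here to be provided by libdl.
LIBDL_SYMBOLS = {"dlopen", "dlsym", "dlclose", "dlerror"}


def undefined_symbols(linker_output):
    """Return the distinct undefined symbol names, sorted."""
    return sorted(set(UNDEFINED.findall(linker_output)))


def only_libdl_missing(linker_output):
    """True if there are undefined symbols and all of them come from libdl."""
    symbols = undefined_symbols(linker_output)
    return bool(symbols) and set(symbols) <= LIBDL_SYMBOLS


sample = """\
H5PL.c:(.text+0xa7): undefined reference to `dlclose'
H5PL.c:(.text+0x309): undefined reference to `dlsym'
H5PL.c:(.text+0x43e): undefined reference to `dlopen'
H5PL.c:(.text+0x609): undefined reference to `dlerror'
"""
print(undefined_symbols(sample))   # ['dlclose', 'dlerror', 'dlopen', 'dlsym']
print(only_libdl_missing(sample))  # True
```

If anything outside that set showed up, the missing-libdl explanation would not be the whole story.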

What branch of hashstack are you using, and how is it different from stable/garnet?

@cekees
Contributor Author

cekees commented Sep 8, 2014

I just pushed it to cekees/stable_garnet_update.

@ahmadia
Contributor

ahmadia commented Sep 8, 2014

Damn, sorry I missed this earlier:

From the configure log:

configure:16585: cc -o conftest -g -O2 -I/lustre/home1/u/cekees/.hashdist/bld/curl/lnnmx25bocra/include -I/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/include -I/lustre/home1/u/cekees/.hashdist/bld/mpi/jrk6uiczudla/include -I/lustre/home1/u/cekees/.hashdist/bld/patchelf/m5njcb5667me/include -L/lustre/home1/u/cekees/.hashdist/bld/curl/lnnmx25bocra/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/curl/lnnmx25bocra/lib -L/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/hdf5/bvqkn2n5n3ny/lib -L/lustre/home1/u/cekees/.hashdist/bld/mpi/jrk6uiczudla/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/mpi/jrk6uiczudla/lib -L/lustre/home1/u/cekees/.hashdist/bld/patchelf/m5njcb5667me/lib -Wl,-rpath=/lustre/home1/u/cekees/.hashdist/bld/patchelf/m5njcb5667me/lib conftest.c -lhdf5  -lm -lz  >&5

The configure test is trying to build an executable. On OS X/Linux, gcc will build dynamically-linked executables by default (that is, preferring dynamic libraries over static libraries). But on a Cray XE6, the reverse is true. You need to pass a special flag, -dynamic, to inform the compiler that you want the "normal" behavior. This is only true when building executables, so it doesn't turn up when dealing with most libraries, but it does turn up with tools like netcdf, where the configure system will try to build executables as part of setup.

You need to set LDFLAGS={{DYNAMIC_EXE_LINKER_FLAGS}} somewhere in the netcdf configuration. I'm still watching Garnet build a stack from this morning, but this looks like the problem (and I think this is the best solution).
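To make the idea concrete, here is a rough sketch of the two pieces: checking whether a logged link command asked for dynamic linking, and appending the flag to LDFLAGS only on the Cray. It assumes that DYNAMIC_EXE_LINKER_FLAGS expands to -dynamic on this machine, which the discussion above implies but does not show. Both helper names are hypothetical and are not part of hashdist.

```python
import shlex


def has_dynamic_flag(link_command):
    """True if the compiler wrapper was explicitly given -dynamic."""
    return "-dynamic" in shlex.split(link_command)


def with_dynamic_ldflags(env, machine):
    """Return a copy of env with -dynamic appended to LDFLAGS on a Cray XE6.

    Assumption: -dynamic is what DYNAMIC_EXE_LINKER_FLAGS stands for here.
    """
    env = dict(env)
    if machine == "CrayXE6":
        env["LDFLAGS"] = (env.get("LDFLAGS", "") + " -dynamic").strip()
    return env


# Shortened version of the failing configure test from the log.
failing = "cc -o conftest -g -O2 conftest.c -lhdf5  -lm -lz  >&5"
print(has_dynamic_flag(failing))  # False
print(with_dynamic_ldflags({"LDFLAGS": "-L/opt/lib"}, "CrayXE6")["LDFLAGS"])
```

Other machines are left untouched, which matches the point that gcc on OS X/Linux already prefers dynamic libraries.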

@ahmadia
Contributor

ahmadia commented Sep 9, 2014

This should work:

diff --git a/pkgs/netcdf4/netcdf4.yaml b/pkgs/netcdf4/netcdf4.yaml
index 9e6b589..2cc908a 100644
--- a/pkgs/netcdf4/netcdf4.yaml
+++ b/pkgs/netcdf4/netcdf4.yaml
@@ -14,6 +14,13 @@ build_stages:
   bash: |
     export CC=$MPICC

+- name: configure
+  mode: override
+  append: {LDFLAGS: {{DYNAMIC_EXE_LINKER_FLAGS}} }
+  when machine == 'CrayXE6':
+    extra: ['--host=cray']
+
+
 # http://www.unidata.ucar.edu/software/netcdf/docs/known_problems.html#clang-ncgen3
 - when: platform == 'Darwin'
   files: [fix_genlib.patch]
