Nothing happens after first round of calculations in all-to-one mode #367

Closed
oskeng opened this issue Nov 8, 2022 · 7 comments

@oskeng

oskeng commented Nov 8, 2022

When running in all-to-one mode, I get a "Time taken to solve linear system" message from every worker after the first round of calculations, but then nothing happens. The processes are, however, still running. See screenshot.

[Screenshot: Screenshot_20221108_141944]

Pairwise works fine.

My .ini is as follows. Any help is very much appreciated.

[Circuitscape Mode]
data_type = raster
scenario = all-to-one

[Version]
version = 5.11.2

[Habitat raster or graph]
habitat_file = /home/oskeng/Dropbox/Jobb/MIUN/Projekt/Konnektivitet_Norrbotten/Analys/Circuitscape/Input/test/calc_resistance_test_20m.asc
habitat_map_is_resistances = true

[Connection Scheme for raster habitat data]
connect_four_neighbors_only = false
connect_using_avg_resistances = false

[Short circuit regions (aka polygons)]
use_polygons = false
polygon_file = False

[Options for advanced mode]
ground_file_is_resistances = false
source_file = (Browse for a current source file)
remove_src_or_gnd = keepall
ground_file = (Browse for a ground point file)
use_unit_currents = false
use_direct_grounds = false

[Mask file]
use_mask = false
mask_file = None

[Options for one-to-all and all-to-one modes]
use_variable_source_strengths = false
variable_source_file = None

[Options for pairwise and one-to-all and all-to-one modes]
included_pairs_file = (Browse for a file with pairs to include or exclude)
use_included_pairs = false
point_file = /home/oskeng/Dropbox/Jobb/MIUN/Projekt/Konnektivitet_Norrbotten/Analys/Circuitscape/Input/test/calc_focal_points_ras_test_20m.asc

[Calculation options]
solver = cg+amg
print_timings = True
parallelize = True
max_parallel = 14

[Output options]
write_cum_cur_map_only = true
log_transform_maps = false
output_file = /home/oskeng/Dropbox/Jobb/MIUN/Projekt/Konnektivitet_Norrbotten/Analys/Circuitscape/Output/test/all-to-one/out_test
write_max_cur_maps = false
write_volt_maps = false
set_null_currents_to_nodata = false
set_null_voltages_to_nodata = false
compress_grids = false
write_cur_maps = false
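For anyone reproducing this, a minimal Python sketch (an illustration only, not a Circuitscape tool; `summarize_run_options` and `INI_TEXT` are hypothetical names) that reads the settings relevant to this report from an .ini like the one above, using the standard-library `configparser`. The embedded text is only an excerpt of the .ini above.

```python
import configparser

# Excerpt of the .ini from this issue (only the sections this check reads).
INI_TEXT = """\
[Circuitscape Mode]
data_type = raster
scenario = all-to-one

[Calculation options]
solver = cg+amg
print_timings = True
parallelize = True
max_parallel = 14
"""


def summarize_run_options(ini_text: str) -> dict:
    """Return the mode, solver and parallel settings from a Circuitscape-style .ini."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    calc = cfg["Calculation options"]
    return {
        "scenario": cfg["Circuitscape Mode"].get("scenario"),
        "solver": calc.get("solver"),
        # getboolean accepts "True"/"true"/"false" etc., as used in the file above
        "parallelize": calc.getboolean("parallelize"),
        "max_parallel": calc.getint("max_parallel"),
    }


if __name__ == "__main__":
    print(summarize_run_options(INI_TEXT))
    # {'scenario': 'all-to-one', 'solver': 'cg+amg', 'parallelize': True, 'max_parallel': 14}
```

To check a real file, pass its contents, e.g. `summarize_run_options(open("my_run.ini").read())` (the path is hypothetical).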
@ranjanan
Member

ranjanan commented Nov 8, 2022

Could you send me the data files?

@oskeng
Author

oskeng commented Nov 8, 2022

Could you send me the data files?

https://www.dropbox.com/s/bm4ixw9izrrcb5f/circuitscape_input.zip?dl=0

Edit: I used the "_20m" input

Many thanks!

@ranjanan ranjanan self-assigned this Nov 8, 2022
@oskeng
Author

oskeng commented Nov 10, 2022

Any feedback @ranjanan?

I'm in a bit of a hurry to decide on a simulation method and to prepare the data and .ini for a full-scale run on the supercomputer. Any help is greatly appreciated.

@oskeng
Author

oskeng commented Nov 25, 2022

Well, I also tried full-scale simulations on the high-memory cluster but got exactly the same behaviour:

  From worker 7:	[ Info: 2022-11-20 18:09:55 : Solving point 5 of 500
  From worker 7:	[ Info: 2022-11-20 18:10:17 : Solver used: AMG accelerated by CG
  From worker 5:	[ Info: 2022-11-20 18:10:37 : Solving point 3 of 500
  From worker 2:	[ Info: 2022-11-20 18:10:43 : Solving point 2 of 500
  From worker 4:	[ Info: 2022-11-20 18:10:44 : Solving point 1 of 500
  From worker 6:	[ Info: 2022-11-20 18:10:56 : Solving point 4 of 500
  From worker 5:	[ Info: 2022-11-20 18:10:59 : Solver used: AMG accelerated by CG
  From worker 8:	[ Info: 2022-11-20 18:11:02 : Solving point 6 of 500
  From worker 3:	[ Info: 2022-11-20 18:11:08 : Solving point 7 of 500
  From worker 4:	[ Info: 2022-11-20 18:11:08 : Solver used: AMG accelerated by CG
  From worker 2:	[ Info: 2022-11-20 18:11:10 : Solver used: AMG accelerated by CG
  From worker 6:	[ Info: 2022-11-20 18:11:18 : Solver used: AMG accelerated by CG
  From worker 8:	[ Info: 2022-11-20 18:11:25 : Solver used: AMG accelerated by CG
  From worker 3:	[ Info: 2022-11-20 18:11:29 : Solver used: AMG accelerated by CG
  From worker 7:	[ Info: 2022-11-20 18:21:12 : Time taken to construct preconditioner = 453.981257612 seconds
  From worker 5:	[ Info: 2022-11-20 18:21:36 : Time taken to construct preconditioner = 442.207463115 seconds
  From worker 4:	[ Info: 2022-11-20 18:21:47 : Time taken to construct preconditioner = 437.223975284 seconds
  From worker 3:	[ Info: 2022-11-20 18:21:49 : Time taken to construct preconditioner = 439.184317842 seconds
  From worker 6:	[ Info: 2022-11-20 18:21:51 : Time taken to construct preconditioner = 438.728753843 seconds
  From worker 2:	[ Info: 2022-11-20 18:22:47 : Time taken to construct preconditioner = 490.79701734 seconds
  From worker 8:	[ Info: 2022-11-20 18:22:49 : Time taken to construct preconditioner = 486.515464396 seconds
  From worker 4:	[ Info: 2022-11-20 19:33:37 : Time taken to solve linear system = 4295.120061158 seconds
  From worker 6:	[ Info: 2022-11-20 19:33:53 : Time taken to solve linear system = 4312.302845471 seconds
  From worker 5:	[ Info: 2022-11-20 19:35:10 : Time taken to solve linear system = 4395.431408273 seconds
  From worker 7:	[ Info: 2022-11-20 19:35:48 : Time taken to solve linear system = 4457.293068107 seconds
  From worker 2:	[ Info: 2022-11-20 19:35:53 : Time taken to solve linear system = 4377.015238594 seconds
  From worker 3:	[ Info: 2022-11-20 19:36:01 : Time taken to solve linear system = 4438.088842233 seconds
  From worker 8:	[ Info: 2022-11-20 19:37:00 : Time taken to solve linear system = 4441.687107102 seconds

Then nothing happened until I canceled the job the day after:

  [ Info: 2022-11-20 17:35:55 : Precision used: Double
  [ Info: 2022-11-20 17:35:55 : Starting up Circuitscape to use 7 processes in parallel
  [ Info: 2022-11-20 17:36:17 : Reading maps
  [ Info: 2022-11-20 17:43:24 : Resistance/Conductance map has 277679571 nodes
  [ Info: 2022-11-20 18:03:15 : There are 277679571 points and 1 connected components
  slurmstepd: error: *** JOB 21086745 ON b-cn0549 CANCELLED AT 2022-11-21T14:18:07 ***

But even though "nothing happened", it used a steady 1.33 TiB of memory for ~20 hours:

https://usage.hpc2n.umu.se/d/job-on-kebnekaise/job-on-kebnekaise?var-jobid=21086745&from=1668962100000&to=1669036687000&orgId=1

Absolutely lost here. Any idea, @ranjanan?
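To check the "nothing happens after the first round" symptom in a long log, here is a minimal Python sketch (not part of Circuitscape; `summarize_timings`, `TIMING_RE` and `SAMPLE_LOG` are hypothetical names, and the regular expression assumes the exact line format shown in the logs above). It counts how many timing lines each worker reported per step. In the full-scale log above, each of workers 2 to 8 reports exactly one "solve linear system" line, i.e. a single round, before the job stalls.

```python
import re
from collections import Counter, defaultdict

# Assumed format of the timing lines in the log above, e.g.
#   From worker 4:<TAB>[ Info: 2022-11-20 19:33:37 : Time taken to solve linear system = 4295.120061158 seconds
TIMING_RE = re.compile(
    r"From worker (\d+):\s*\[ Info: \S+ \S+ : "
    r"Time taken to (construct preconditioner|solve linear system) = ([\d.]+) seconds"
)

# Four lines copied from the log above (tab written as \t).
SAMPLE_LOG = """\
  From worker 4:\t[ Info: 2022-11-20 18:21:47 : Time taken to construct preconditioner = 437.223975284 seconds
  From worker 6:\t[ Info: 2022-11-20 18:21:51 : Time taken to construct preconditioner = 438.728753843 seconds
  From worker 4:\t[ Info: 2022-11-20 19:33:37 : Time taken to solve linear system = 4295.120061158 seconds
  From worker 6:\t[ Info: 2022-11-20 19:33:53 : Time taken to solve linear system = 4312.302845471 seconds
"""


def summarize_timings(log_text: str) -> dict:
    """Per step: how many times each worker reported it, and the mean duration in seconds."""
    counts = defaultdict(Counter)
    durations = defaultdict(list)
    for match in TIMING_RE.finditer(log_text):
        worker, step, seconds = int(match.group(1)), match.group(2), float(match.group(3))
        counts[step][worker] += 1
        durations[step].append(seconds)
    return {
        step: {"per_worker": dict(counts[step]), "mean_seconds": sum(d) / len(d)}
        for step, d in durations.items()
    }


if __name__ == "__main__":
    for step, info in summarize_timings(SAMPLE_LOG).items():
        print(step, info)
```

A worker that keeps progressing would show counts above 1 for "solve linear system"; a count of 1 for every worker matches the stall described in this issue.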

@ranjanan
Member

@oskeng #373 could solve this too. In a few hours, try updating Circuitscape to 5.12.1 and then try again.

@ranjanan
Member

#373 does indeed fix this!

@oskeng
Author

oskeng commented Jan 17, 2023

#373 does indeed fix this!

I can verify that everything now works as expected. Many thanks @ranjanan!!


2 participants