
Changes to MOOSE checkpoints prevent MRAD model restarts #369

Open
harterj opened this issue Mar 15, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@harterj

harterj commented Mar 15, 2024

Bug Description

The `checkpoint = true` setting in HPMR_thermal_ss.i used to write a directory called HPMR_dfem_griffin_ss_out_bison0_cp containing checkpoint files. With the recent changes to MOOSE checkpoints, it appears that only the main app input can write checkpoint files.

```
> mpiexec -n 48 ~/sawtooth-projects/dire_wolf/dire_wolf-opt -i HPMR_dfem_griffin_tr.i
*** ERROR ***
No checkpoint file found!
```
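For reference, the pre-change behavior relied on the shorthand checkpoint flag in the sub-app's `[Outputs]` block. A minimal sketch (the other output options in the real input are omitted here):

```
[Outputs]
  # Shorthand form: before the MOOSE checkpoint changes, this caused the
  # sub-app to write its own <file_base>_cp/ directory of checkpoint files.
  checkpoint = true
[]
```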

Steps to Reproduce

Obtain the latest versions of Dire Wolf and MOOSE. Run the steady-state case, HPMR_dfem_griffin_ss.i, as normal. When the simulation finishes, look in HPMR_dfem_griffin_ss_out_bison0_cp and notice that it contains 0 files. You will be unable to run the null transient, or any other simulation relying on restarts.

```
[hartjack][~/projects/virtual_test_bed/microreactors/mrad/steady/HPMR_dfem_griffin_ss_out_bison0_cp] (devel)> l
total 160K
drwx------ 2 hartjack hartjack   0 Mar 15 13:12 .
drwxrwxr-x 7 hartjack hartjack 14K Mar 15 13:51 ..
```

As a comparison to show the new functionality, add `checkpoint = true` to `[Outputs]` in HPMR_dfem_griffin_ss.i. Run the steady-state case and look at the two output directories, HPMR_dfem_griffin_ss_out_cp and HPMR_dfem_griffin_ss_out_bison0_cp. You will see checkpoints in the former directory but not the latter.

It is not as trivial as just using the main app's checkpoint files: the neutronics and conduction meshes are different, with different BCs. I did not create the mesh, but I'm guessing some changes might need to be made, hence tagging the model creators.

Impact

I am trying to use a copy of MRAD for another application and can't progress much further right now.

Tagging: @miaoyinb @nstauff @GiudGiud @markdehart

@harterj harterj added the bug Something isn't working label Mar 15, 2024
@miaoyinb (Collaborator)

@GiudGiud Is this the expected behavior of the new app version? I recently used the INL HPC blue_crab-opt compiled on 3/15/2024 for a different problem and the checkpoint files of the child app are generated as usual.

@GiudGiud (Collaborator)

We should still be able to generate sub-app checkpoints with `checkpoint = true` in the sub-app. I need to look into this.

@harterj (Author)

harterj commented Mar 21, 2024

Tag @YaqiWang

@lindsayad (Member)

I believe we've been moving toward having the main app handle all of the restart, even if the main input file "changes" from before-restart to after-restart.

@loganharbour

@harterj (Author)

harterj commented Mar 28, 2024

BTW, @GiudGiud helped me get this running, but obviously it is not the ideal fix. This is `[Outputs]` in HPMR_thermo_ss.i:

```
[Outputs]
  perf_graph = true
  exodus = true
  color = true
  csv = true
  [check]
    type = Checkpoint
    execute_on = FINAL
    num_files = 1e5
  []
[]
```

This will generate checkpoint files for the SubApp.
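For completeness, a restart input would then point at those checkpoint files. In MOOSE this is typically done through `restart_file_base` in the `[Problem]` block; a hedged sketch, where the path is illustrative (derived from the `[check]` output name above) and not taken verbatim from the MRAD inputs:

```
[Problem]
  # Hypothetical path: <steady-state file base>_check_cp/ plus a checkpoint
  # index; recent MOOSE also accepts LATEST to pick the newest checkpoint.
  restart_file_base = HPMR_thermo_ss_out_check_cp/LATEST
[]
```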
