Improve performance (mainly parallelization)
This MR aims to improve the performance of the code, but it also adds some features:
- a timing feature, which reports how much time is spent in each function (measured with std::chrono) and prints the results to two files (timing.txt and timing.json).
- improved verbosity: the log now shows the averaged speed and the estimated remaining time.
- the speed evolution is printed to a file, speed.csv, and several performance-related quantities are written to timing.json.
- a better format for durations: they were previously given in milliseconds and are now formatted according to their value (from microseconds to days).
Also, this MR bumps the C++ version to 17 (it was previously 14).
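To illustrate the value-dependent duration format mentioned in the list above, here is a minimal sketch. The function name format_duration, the unit labels (us, ms, s, min, h, d) and the number of decimals are assumptions chosen to resemble the log output shown later in this description; the actual implementation in this MR may differ.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper (not the MR's actual code): formats a duration
// using units that depend on its magnitude, from microseconds to days.
std::string format_duration(std::chrono::duration<double> d) {
  const double s = d.count();
  char buf[64];

  // Sub-millisecond durations are shown in microseconds.
  if (s < 1e-3) {
    std::snprintf(buf, sizeof buf, "%.3f us", s * 1e6);
    return buf;
  }
  // Sub-second durations are shown in milliseconds.
  if (s < 1.0) {
    std::snprintf(buf, sizeof buf, "%.6f ms", s * 1e3);
    return buf;
  }

  // Otherwise split into days, hours, minutes and remaining seconds.
  const long long whole = static_cast<long long>(s);
  const long long days = whole / 86400;
  const long long hours = (whole % 86400) / 3600;
  const long long minutes = (whole % 3600) / 60;
  const double seconds =
      s - static_cast<double>(days * 86400 + hours * 3600 + minutes * 60);

  // Only non-zero leading units are printed.
  std::string out;
  if (days > 0) out += std::to_string(days) + " d ";
  if (hours > 0) out += std::to_string(hours) + " h ";
  if (minutes > 0) out += std::to_string(minutes) + " min ";
  std::snprintf(buf, sizeof buf, "%.6f s", seconds);
  return out + buf;
}
```

With this sketch, a duration of 93.622214645 seconds is rendered as "1 min 33.622215 s", and 0.936230346 seconds as "936.230346 ms".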
Here is an example of the contents of the new my_sim/results/my_sim/timing.txt file (for a bouncing cube simulation):
Ran on 2024-04-12 18:23:39.98, sorted by execution sequence:
-------------------------------------------------------------------------------------------------
Name | Count | Time | Rel. time
-------------------------------------------------------------------------------------------------
initialise_simulation |1 |0.02s |0.01%
verbosity |100 |0.00s |0.00%
inject_particles |17500 |0.00s |0.00%
initialise_mpm_scheme |17500 |48.68s |18.55%
reset_nodes | 17500 | 0.11s | 0.04%
activate_nodes | 17500 | 0.30s | 0.11%
compute_shapefn | 17500 | 48.25s | 18.39%
shapefn_themselves | 17500000 | 12.00s | 4.57%
gradient | 17500000 | 33.48s | 12.76%
initialise_contact |17500 |0.00s |0.00%
compute_nodal_kinematics |17500 |11.38s |4.34%
map_mass_momentum_to_nodes | 17500 | 11.11s | 4.24%
compute_nodal_velocity | 17500 | 0.11s | 0.04%
apply_nodal_velocity_constraints | 17500 | 0.14s | 0.05%
compute_contact_forces |17500 |0.00s |0.00%
precompute_stress_strain |17500 |9.85s |3.76%
compute_strain | 17500 | 8.54s | 3.26%
update_volume | 17500 | 0.45s | 0.17%
pressure_smoothing | 17500 | 0.00s | 0.00%
compute_stress | 17500 | 0.85s | 0.32%
compute_forces |17500 |182.29s |69.47%
map_body_force | 17500 | 6.03s | 2.30%
apply_traction_on_particles | 17500 | 0.00s | 0.00%
apply_concentrated_force | 17500 | 0.00s | 0.00%
map_internal_force | 17500 | 176.24s | 67.17%
compute_internal_force | 907200000| 23.79s | 9.07%
update_nodal_internal_force | 907200000| 24.84s | 9.47%
compute_particle_kinematics |17500 |6.59s |2.51%
compute_acceleration_velocity | 17500 | 0.22s | 0.08%
motion_integration_specific_op | 17500 | 0.00s | 0.00%
compute_particle_velocity | 17500 | 3.03s | 1.16%
apply_particle_velocity_constraints | 17500 | 0.00s | 0.00%
compute_updated_position | 17500 | 3.31s | 1.26%
postcompute_stress_strain |17500 |0.00s |0.00%
locate_particles |17500 |0.89s |0.34%
write_csv |202 |2.47s |0.94%
-------------------------------------------------------------------------------------------------
Total of above | |262.17s |99.92%
-------------------------------------------------------------------------------------------------
Total | |262.39s |100.00%
-------------------------------------------------------------------------------------------------
Ran on 2024-04-12 18:23:39.98, sorted by cost:
-------------------------------------------------------------------------------------------------
Name | Count | Time | Rel. time
-------------------------------------------------------------------------------------------------
compute_forces |17500 |182.29s |69.47%
map_internal_force | 17500 | 176.24s | 67.17%
update_nodal_internal_force | 907200000| 24.84s | 9.47%
compute_internal_force | 907200000| 23.79s | 9.07%
map_body_force | 17500 | 6.03s | 2.30%
apply_traction_on_particles | 17500 | 0.00s | 0.00%
apply_concentrated_force | 17500 | 0.00s | 0.00%
initialise_mpm_scheme |17500 |48.68s |18.55%
compute_shapefn | 17500 | 48.25s | 18.39%
gradient | 17500000 | 33.48s | 12.76%
shapefn_themselves | 17500000 | 12.00s | 4.57%
activate_nodes | 17500 | 0.30s | 0.11%
reset_nodes | 17500 | 0.11s | 0.04%
compute_nodal_kinematics |17500 |11.38s |4.34%
map_mass_momentum_to_nodes | 17500 | 11.11s | 4.24%
apply_nodal_velocity_constraints | 17500 | 0.14s | 0.05%
compute_nodal_velocity | 17500 | 0.11s | 0.04%
precompute_stress_strain |17500 |9.85s |3.76%
compute_strain | 17500 | 8.54s | 3.26%
compute_stress | 17500 | 0.85s | 0.32%
update_volume | 17500 | 0.45s | 0.17%
pressure_smoothing | 17500 | 0.00s | 0.00%
compute_particle_kinematics |17500 |6.59s |2.51%
compute_updated_position | 17500 | 3.31s | 1.26%
compute_particle_velocity | 17500 | 3.03s | 1.16%
compute_acceleration_velocity | 17500 | 0.22s | 0.08%
motion_integration_specific_op | 17500 | 0.00s | 0.00%
apply_particle_velocity_constraints | 17500 | 0.00s | 0.00%
write_csv |202 |2.47s |0.94%
locate_particles |17500 |0.89s |0.34%
initialise_simulation |1 |0.02s |0.01%
verbosity |100 |0.00s |0.00%
initialise_contact |17500 |0.00s |0.00%
compute_contact_forces |17500 |0.00s |0.00%
postcompute_stress_strain |17500 |0.00s |0.00%
inject_particles |17500 |0.00s |0.00%
-------------------------------------------------------------------------------------------------
Total of above | |262.17s |99.92%
-------------------------------------------------------------------------------------------------
Total | |262.39s |100.00%
-------------------------------------------------------------------------------------------------
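The Count, Time and Rel. time columns above can be collected with a scope-based timer built on std::chrono. The following is only a sketch of that general technique: the names TimingRegistry, ScopedTimer and relative_percent are hypothetical and are not taken from this MR. It is also not thread-safe as written, so parallel regions would need per-thread registries or a lock.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Accumulated call count and wall time for one label (hypothetical type).
struct TimingEntry {
  std::uint64_t count = 0;
  Clock::duration total{0};
};

// Hypothetical registry mapping a label to its accumulated timings.
class TimingRegistry {
 public:
  void record(const std::string& name, Clock::duration dt) {
    TimingEntry& entry = entries_[name];
    ++entry.count;
    entry.total += dt;
  }
  const std::map<std::string, TimingEntry>& entries() const {
    return entries_;
  }

 private:
  std::map<std::string, TimingEntry> entries_;
};

// RAII timer: starts on construction, records the elapsed time on destruction.
class ScopedTimer {
 public:
  ScopedTimer(TimingRegistry& registry, std::string name)
      : registry_(registry), name_(std::move(name)), start_(Clock::now()) {}
  ~ScopedTimer() { registry_.record(name_, Clock::now() - start_); }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimingRegistry& registry_;
  std::string name_;
  Clock::time_point start_;
};

// Share of the total run time spent in one entry, in percent.
double relative_percent(Clock::duration part, Clock::duration total) {
  if (total.count() == 0) return 0.0;
  return 100.0 * std::chrono::duration<double>(part).count() /
         std::chrono::duration<double>(total).count();
}
```

A function would then be instrumented by declaring, for example, `ScopedTimer timer(registry, "compute_forces");` at the top of its body, so that every call increments the count and adds its elapsed time to the total.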
And here is the end of the output of a simulation:
[2024-04-10 17:57:53.277] [MPMExplicit] [info]
Step: 16625 of 17500 (95%)
Averaged speed so far: 186.90792694447953 iter/s
Estimated remaining time: 4.681449 s
[2024-04-10 17:57:54.213] [MPMExplicit] [info]
Step: 16800 of 17500 (96%)
Averaged speed so far: 186.90895857306566 iter/s
Estimated remaining time: 3.745139 s
[2024-04-10 17:57:55.149] [MPMExplicit] [info]
Step: 16975 of 17500 (97%)
Averaged speed so far: 186.91085412014934 iter/s
Estimated remaining time: 2.808826 s
[2024-04-10 17:57:56.082] [MPMExplicit] [info]
Step: 17150 of 17500 (98%)
Averaged speed so far: 186.91579834926912 iter/s
Estimated remaining time: 1.872501 s
[2024-04-10 17:57:57.017] [MPMExplicit] [info]
Step: 17325 of 17500 (99%)
Averaged speed so far: 186.91981180982094 iter/s
Estimated remaining time: 936.230346 ms
[2024-04-10 17:57:57.952] [MPMExplicit] [info] Rank 0, Explicit USF solver raw duration: 93.622214645 s
Total duration: 1 min 33.622215 s
Averaged speed: 186.9214487860292 iter/s
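The figures in the log above are consistent with a simple estimate: the averaged speed is the number of completed steps divided by the elapsed time, and the remaining time is the number of steps left divided by that speed (for example, 875 steps left at about 186.9 iter/s gives about 4.68 s). A minimal sketch of this computation, where the names Progress and estimate_progress are hypothetical and the actual code in this MR may compute it differently:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical result type: averaged speed (iter/s) and remaining time.
struct Progress {
  double speed;
  std::chrono::duration<double> remaining;
};

// Estimates the remaining time from the speed averaged over the run so far.
Progress estimate_progress(std::uint64_t step, std::uint64_t nsteps,
                           std::chrono::duration<double> elapsed) {
  Progress p{0.0, std::chrono::duration<double>::zero()};
  // No meaningful estimate before the first step has completed.
  if (step == 0 || elapsed.count() <= 0.0) return p;

  p.speed = static_cast<double>(step) / elapsed.count();
  const std::uint64_t left = nsteps > step ? nsteps - step : 0;
  p.remaining =
      std::chrono::duration<double>(static_cast<double>(left) / p.speed);
  return p;
}
```

Because the speed is averaged over the whole run, this estimate reacts slowly if the per-step cost changes during the simulation.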
Edited by Sacha Duverger