
Improve performance (mainly parallelization)

Sacha Duverger requested to merge improve/parallelization into custom_cbgeo

This MR aims to improve the performance of the code, but it also adds some features:

  • a timing feature that reports how much time is spent in each function (measured with std::chrono) and writes the results to two files (timing.txt and timing.json).
  • improved verbosity: the log now reports the averaged speed and the estimated remaining time.
  • the speed evolution is written to a file speed.csv, and several performance-related quantities are written to timing.json.
  • a better format for durations: they were previously always given in milliseconds and are now formatted according to their magnitude (from microseconds to days).
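
To illustrate the kind of std::chrono-based timing described above, here is a minimal sketch. It is not the code from this MR: the names `TimingEntry`, `TimingRegistry` and `ScopedTimer` are hypothetical, and the sketch assumes each timed section is identified by a string label and accumulates a call count and a total duration, which is what the Count and Time columns of timing.txt suggest.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical record: number of calls and accumulated time for one label.
struct TimingEntry {
  std::uint64_t count = 0;
  std::chrono::nanoseconds total{0};
};

// Hypothetical registry holding one TimingEntry per label.
class TimingRegistry {
 public:
  void add(const std::string& name, std::chrono::nanoseconds elapsed) {
    TimingEntry& entry = entries_[name];
    ++entry.count;
    entry.total += elapsed;
  }

  const std::map<std::string, TimingEntry>& entries() const { return entries_; }

  // Share of `run_total` spent under `name`, in percent (0 if unknown).
  double relative_time(const std::string& name,
                       std::chrono::nanoseconds run_total) const {
    const auto it = entries_.find(name);
    if (it == entries_.end() || run_total.count() <= 0) return 0.0;
    return 100.0 * static_cast<double>(it->second.total.count()) /
           static_cast<double>(run_total.count());
  }

 private:
  std::map<std::string, TimingEntry> entries_;
};

// RAII helper: measures its own lifetime with a monotonic clock and
// reports it to the registry when it goes out of scope.
class ScopedTimer {
 public:
  ScopedTimer(TimingRegistry& registry, std::string name)
      : registry_(registry),
        name_(std::move(name)),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    registry_.add(
        name_, std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimingRegistry& registry_;
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};
```

A section would be timed by wrapping it in a scope, for example `{ ScopedTimer timer(registry, "compute_forces"); /* work */ }`. The sketch is not thread-safe: if timed sections run concurrently, the registry would need a mutex or per-thread accumulation.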

Also, this MR bumps the C++ version to 17 (it was previously 14).

Here is an example of the contents of the new my_sim/results/my_sim/timing.txt file (for a bouncing cube simulation):

Ran on 2024-04-12 18:23:39.98, sorted by execution sequence:
-------------------------------------------------------------------------------------------------
                   Name                    |      Count      |       Time        |    Rel. time     
-------------------------------------------------------------------------------------------------
initialise_simulation                      |1                |0.02s              |0.01%             
verbosity                                  |100              |0.00s              |0.00%             
inject_particles                           |17500            |0.00s              |0.00%             
initialise_mpm_scheme                      |17500            |48.68s             |18.55%            
    reset_nodes                            |    17500        |    0.11s          |    0.04%         
    activate_nodes                         |    17500        |    0.30s          |    0.11%         
    compute_shapefn                        |    17500        |    48.25s         |    18.39%        
        shapefn_themselves                 |        17500000 |        12.00s     |        4.57%     
        gradient                           |        17500000 |        33.48s     |        12.76%    
initialise_contact                         |17500            |0.00s              |0.00%             
compute_nodal_kinematics                   |17500            |11.38s             |4.34%             
    map_mass_momentum_to_nodes             |    17500        |    11.11s         |    4.24%         
    compute_nodal_velocity                 |    17500        |    0.11s          |    0.04%         
    apply_nodal_velocity_constraints       |    17500        |    0.14s          |    0.05%         
compute_contact_forces                     |17500            |0.00s              |0.00%             
precompute_stress_strain                   |17500            |9.85s              |3.76%             
    compute_strain                         |    17500        |    8.54s          |    3.26%         
    update_volume                          |    17500        |    0.45s          |    0.17%         
    pressure_smoothing                     |    17500        |    0.00s          |    0.00%         
    compute_stress                         |    17500        |    0.85s          |    0.32%         
compute_forces                             |17500            |182.29s            |69.47%            
    map_body_force                         |    17500        |    6.03s          |    2.30%         
    apply_traction_on_particles            |    17500        |    0.00s          |    0.00%         
    apply_concentrated_force               |    17500        |    0.00s          |    0.00%         
    map_internal_force                     |    17500        |    176.24s        |    67.17%        
        compute_internal_force             |        907200000|        23.79s     |        9.07%     
        update_nodal_internal_force        |        907200000|        24.84s     |        9.47%     
compute_particle_kinematics                |17500            |6.59s              |2.51%             
    compute_acceleration_velocity          |    17500        |    0.22s          |    0.08%         
    motion_integration_specific_op         |    17500        |    0.00s          |    0.00%         
    compute_particle_velocity              |    17500        |    3.03s          |    1.16%         
    apply_particle_velocity_constraints    |    17500        |    0.00s          |    0.00%         
    compute_updated_position               |    17500        |    3.31s          |    1.26%         
postcompute_stress_strain                  |17500            |0.00s              |0.00%             
locate_particles                           |17500            |0.89s              |0.34%             
write_csv                                  |202              |2.47s              |0.94%             
-------------------------------------------------------------------------------------------------
Total of above                             |                 |262.17s            |99.92%            
-------------------------------------------------------------------------------------------------
Total                                      |                 |262.39s            |100.00%           
-------------------------------------------------------------------------------------------------

Ran on 2024-04-12 18:23:39.98, sorted by cost:
-------------------------------------------------------------------------------------------------
                   Name                    |      Count      |       Time        |    Rel. time     
-------------------------------------------------------------------------------------------------
compute_forces                             |17500            |182.29s            |69.47%            
    map_internal_force                     |    17500        |    176.24s        |    67.17%        
        update_nodal_internal_force        |        907200000|        24.84s     |        9.47%     
        compute_internal_force             |        907200000|        23.79s     |        9.07%     
    map_body_force                         |    17500        |    6.03s          |    2.30%         
    apply_traction_on_particles            |    17500        |    0.00s          |    0.00%         
    apply_concentrated_force               |    17500        |    0.00s          |    0.00%         
initialise_mpm_scheme                      |17500            |48.68s             |18.55%            
    compute_shapefn                        |    17500        |    48.25s         |    18.39%        
        gradient                           |        17500000 |        33.48s     |        12.76%    
        shapefn_themselves                 |        17500000 |        12.00s     |        4.57%     
    activate_nodes                         |    17500        |    0.30s          |    0.11%         
    reset_nodes                            |    17500        |    0.11s          |    0.04%         
compute_nodal_kinematics                   |17500            |11.38s             |4.34%             
    map_mass_momentum_to_nodes             |    17500        |    11.11s         |    4.24%         
    apply_nodal_velocity_constraints       |    17500        |    0.14s          |    0.05%         
    compute_nodal_velocity                 |    17500        |    0.11s          |    0.04%         
precompute_stress_strain                   |17500            |9.85s              |3.76%             
    compute_strain                         |    17500        |    8.54s          |    3.26%         
    compute_stress                         |    17500        |    0.85s          |    0.32%         
    update_volume                          |    17500        |    0.45s          |    0.17%         
    pressure_smoothing                     |    17500        |    0.00s          |    0.00%         
compute_particle_kinematics                |17500            |6.59s              |2.51%             
    compute_updated_position               |    17500        |    3.31s          |    1.26%         
    compute_particle_velocity              |    17500        |    3.03s          |    1.16%         
    compute_acceleration_velocity          |    17500        |    0.22s          |    0.08%         
    motion_integration_specific_op         |    17500        |    0.00s          |    0.00%         
    apply_particle_velocity_constraints    |    17500        |    0.00s          |    0.00%         
write_csv                                  |202              |2.47s              |0.94%             
locate_particles                           |17500            |0.89s              |0.34%             
initialise_simulation                      |1                |0.02s              |0.01%             
verbosity                                  |100              |0.00s              |0.00%             
initialise_contact                         |17500            |0.00s              |0.00%             
compute_contact_forces                     |17500            |0.00s              |0.00%             
postcompute_stress_strain                  |17500            |0.00s              |0.00%             
inject_particles                           |17500            |0.00s              |0.00%             
-------------------------------------------------------------------------------------------------
Total of above                             |                 |262.17s            |99.92%            
-------------------------------------------------------------------------------------------------
Total                                      |                 |262.39s            |100.00%           
-------------------------------------------------------------------------------------------------
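
The "sorted by cost" view above keeps the nesting of the entries and orders siblings by decreasing time at every level. One possible way to produce such an ordering is sketched below. The `TimingNode` type and the `sort_by_cost` function are hypothetical names chosen for the example, not the ones used in this MR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical tree node: one timed section and its nested sub-sections.
struct TimingNode {
  std::string name;
  double seconds;
  std::vector<TimingNode> children;
};

// Sort siblings by decreasing time, then recurse into each node's children.
// stable_sort keeps the original (execution) order for exactly equal times.
void sort_by_cost(std::vector<TimingNode>& nodes) {
  std::stable_sort(nodes.begin(), nodes.end(),
                   [](const TimingNode& a, const TimingNode& b) {
                     return a.seconds > b.seconds;
                   });
  for (TimingNode& node : nodes) {
    sort_by_cost(node.children);
  }
}
```

Sorting the unrounded times rather than the displayed ones matters here: entries printed as 0.00s are not necessarily equal, so their relative order can differ from the execution order.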

And here is the end of the output of a simulation, showing the new verbosity:

[2024-04-10 17:57:53.277] [MPMExplicit] [info] 
	Step: 16625 of 17500 (95%)
	Averaged speed so far: 186.90792694447953 iter/s
	Estimated remaining time: 4.681449 s

[2024-04-10 17:57:54.213] [MPMExplicit] [info] 
	Step: 16800 of 17500 (96%)
	Averaged speed so far: 186.90895857306566 iter/s
	Estimated remaining time: 3.745139 s

[2024-04-10 17:57:55.149] [MPMExplicit] [info] 
	Step: 16975 of 17500 (97%)
	Averaged speed so far: 186.91085412014934 iter/s
	Estimated remaining time: 2.808826 s

[2024-04-10 17:57:56.082] [MPMExplicit] [info] 
	Step: 17150 of 17500 (98%)
	Averaged speed so far: 186.91579834926912 iter/s
	Estimated remaining time: 1.872501 s

[2024-04-10 17:57:57.017] [MPMExplicit] [info] 
	Step: 17325 of 17500 (99%)
	Averaged speed so far: 186.91981180982094 iter/s
	Estimated remaining time: 936.230346 ms

[2024-04-10 17:57:57.952] [MPMExplicit] [info] Rank 0, Explicit USF solver raw duration: 93.622214645 s
	Total duration: 1 min 33.622215 s
	Averaged speed: 186.9214487860292 iter/s
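
The logged values are consistent with a simple estimate: the averaged speed is the number of completed steps divided by the elapsed time, and the remaining time is the number of remaining steps divided by that speed (for example, 175 remaining steps at about 186.92 iter/s gives about 936 ms). The sketch below shows this estimate together with a magnitude-dependent duration format. The function names are hypothetical, the "us", "h" and "d" unit labels are assumptions (only "ms", "s" and "min" appear in the log above), and the actual implementation in this MR may differ.

```cpp
#include <cstdio>
#include <string>

// Remaining time in seconds, assuming the average speed so far is maintained.
double estimate_remaining_seconds(long steps_done, long total_steps,
                                  double elapsed_seconds) {
  if (steps_done <= 0 || elapsed_seconds <= 0.0) return 0.0;  // no estimate yet
  const double speed = static_cast<double>(steps_done) / elapsed_seconds;
  return static_cast<double>(total_steps - steps_done) / speed;
}

// Format a duration given in seconds with a unit chosen from its magnitude.
std::string format_duration(double seconds) {
  char buffer[64];
  if (seconds < 1e-3) {
    std::snprintf(buffer, sizeof(buffer), "%.6f us", seconds * 1e6);
    return buffer;
  }
  if (seconds < 1.0) {
    std::snprintf(buffer, sizeof(buffer), "%.6f ms", seconds * 1e3);
    return buffer;
  }
  // One second or more: split into days, hours, minutes and leftover seconds.
  const long long whole = static_cast<long long>(seconds);
  const long long days = whole / 86400;
  const long long hours = (whole % 86400) / 3600;
  const long long minutes = (whole % 3600) / 60;
  const double rest = seconds - static_cast<double>(whole - whole % 60);

  std::string out;
  if (days > 0) out += std::to_string(days) + " d ";
  if (hours > 0) out += std::to_string(hours) + " h ";
  if (minutes > 0) out += std::to_string(minutes) + " min ";
  std::snprintf(buffer, sizeof(buffer), "%.6f s", rest);
  return out + buffer;
}
```

With the raw duration logged above, `format_duration(93.622214645)` yields `1 min 33.622215 s`, matching the "Total duration" line.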
Edited by Sacha Duverger
