The Multi-thread


Postby Dinosawer » Mon Jan 23, 2017 2:52 pm

pun totally intended :ghost:

So, given the current performance problems Josh is having, I thought I'd share some info on how multithreading is generally done in large (astro)physical simulations.
After all, conceptually they're not that different from LT, and all of them are multithreaded, since running larger ones on a single core simply isn't viable.
Disclaimer: I'm not an expert on multiprocessing and have little hands-on experience with the implementation, but I do have a good grasp of the theory.
This may not be applicable to LT, but I'll let Josh be the judge of that.

Example case
To illustrate the methods, I'll use the following example case.
Say we want to simulate a solar system. We have planets, the sun, moons, comets etc. We dump these into objects (which we'll call particles) with the following properties:
-mass
-position
-velocity
Every tick, we need to calculate the acceleration on each particle. Now, the acceleration (due to gravity) depends on the ass and location of every other particle,
so this is a nice way to demonstrate how to deal with thread communication too.
(spoiler alert: you make it so you don't have to. IIRC Josh spoke about each system being a thread and having to communicate data - that's not a good way of doing things; we're on shared memory, after all.)
In a classical singlethreaded approach, the pseudocode for this would be this:

Code: Select all
particles = list(all_particles)
while keepgoing:
    tick = something  # the timestep for this iteration
    for particle in particles:
        calculateAcceleration(particle, particles, tick)
    for particle in particles:
        particle.update()

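For concreteness, here's a toy version of what calculateAcceleration would look like for the gravity case (the field names .mass, .pos and .acc are my own, and I drop the tick argument since the acceleration itself doesn't need it):

```python
import math

G = 6.674e-11  # gravitational constant

# Toy gravity calculation: sum the pull of every other particle.
# Assumes particles expose .mass, .pos (a 3-tuple) and an .acc field
# to write the result to - illustrative naming, not LT's.
def calculate_acceleration(particle, particles):
    ax = ay = az = 0.0
    px, py, pz = particle.pos
    for other in particles:
        if other is particle:
            continue
        ox, oy, oz = other.pos
        dx, dy, dz = ox - px, oy - py, oz - pz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        a = G * other.mass / (r * r)  # |a| = G*m/r^2
        ax += a * dx / r              # project onto each axis
        ay += a * dy / r
        az += a * dz / r
    particle.acc = (ax, ay, az)
```

Note that this only ever reads the other particles and writes to its own particle, which is exactly what makes the approaches below work.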

The classic approach
This one is currently the most widely used, not because it's better, but because it's easy to integrate into a program that was single-threaded before.
Basically, we first make sure that the state the sim was in at the start of the iteration is written to memory that is NOT changed during the calculations.
The calculating functions write their results to other variables, and the state only gets updated at the very end, in update().
E.g., the particle will have fields x0, y0, z0, dx, dy, dz, etc.: calculations are written to dx, while the state is kept in x0.
This way you have no communication problems, as all the info each function needs sits in shared memory and nothing ever needs to be copied.
The actual multithreading happens as follows: loops where it is possible (in this case, both for loops inside the while) get split between cores.
E.g. if you have 100 particles and 4 cores, each core gets 25 of the loop iterations.
In the right framework (my thesis project Shadowfax used openMPI) you don't even have to worry about the implementation once the main framework is there -
I could just add new functions in loops and they would get multithreaded automatically! Very user-friendly.
You don't quite get a 1:1 gain when adding cores, because some parts stay single-threaded and there's waiting for synchronisation, but you should be able to get a nice performance boost.
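To make that concrete, here's a minimal Python sketch of the double-buffered idea (Particle, step and the 1D stand-in "acceleration" are my own illustration, not LT's or Shadowfax's actual code): reads during the tick only touch the frozen x0, results go to dx, and the loop iterations get split between threads:

```python
from concurrent.futures import ThreadPoolExecutor

class Particle:
    def __init__(self, x0):
        self.x0 = x0   # state at the start of the tick: read-only during calculations
        self.dx = 0.0  # result of this tick's calculation

    def update(self):
        self.x0 += self.dx  # commit the new state at the very end

def calculate(chunk, particles, tick):
    # stand-in calculation: each particle reads only the frozen x0 of the
    # others and writes only its own dx, so there are no data races
    for p in chunk:
        p.dx = tick * sum(q.x0 for q in particles if q is not p)

def step(particles, tick, n_threads=4):
    # split the loop iterations between threads: 100 particles on
    # 4 threads means 25 iterations each
    size = (len(particles) + n_threads - 1) // n_threads
    chunks = [particles[i:i + size] for i in range(0, len(particles), size)]
    with ThreadPoolExecutor(n_threads) as pool:
        for chunk in chunks:
            pool.submit(calculate, chunk, particles, tick)
    # the pool has joined by here, so committing the new state is safe
    for p in particles:
        p.update()
```

The key property is that nothing written during calculate() is ever read in the same tick, so no locks are needed at all.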

Task-based parallelism
An elegant approach, for a more civilised age.
I first learned about it at a talk about SWIFT. It's more efficient per core, but used less because you have to adapt your codebase to fit it.
The basic idea is this: instead of dividing the work up front, we define tasks. Each thread then picks a task, executes it, marks it as "done", picks a new task, and so on.
The advantage is that each thread keeps going as long as there is work, which avoids threads idling because they're done while other threads are still busy.
As for communication problems: same approach as above.
Now, of course, some tasks need to be done before others can start.
In my example, calculateAcceleration() must happen before particle.update().
This is handled by giving each task dependencies on other tasks. The threads then only pick up tasks for which all dependencies are met, and the work is automatically done in the right order.
This avoids large synchronisation losses and gives great efficiency on large numbers of threads. (The SWIFT authors speak of 60% efficiency on 512 cores, which is really good.)
Not sure if it's worth it for small-scale things like LT (at most 8 threads or so), but the approach is very elegant either way.
Paper: https://arxiv.org/pdf/1606.02738v1.pdf (open access, no paywall - the interesting parts for programmers are mostly in section three and onwards)
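As a rough illustration of the idea (a toy scheduler of my own, nothing like SWIFT's actual engine): each thread repeatedly claims a task whose dependencies are all done, runs it, and marks it done:

```python
import threading

class Task:
    def __init__(self, work, deps=()):
        self.work = work        # callable doing the actual computation
        self.deps = list(deps)  # tasks that must finish before this one
        self.claimed = False
        self.done = False

def run_tasks(tasks, n_threads=4):
    lock = threading.Lock()

    def worker():
        while True:
            ready = None
            with lock:
                if all(t.done for t in tasks):
                    return  # no work left anywhere: this thread is finished
                # only pick up a task whose dependencies are all met
                for t in tasks:
                    if not t.claimed and all(d.done for d in t.deps):
                        t.claimed = True
                        ready = t
                        break
            if ready is None:
                continue  # busy-wait until a dependency completes (kept simple on purpose)
            ready.work()
            with lock:
                ready.done = True

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For the example case you'd make one acceleration task and one update task per particle (or per block of particles), with each update depending on its acceleration task; the right execution order then falls out automatically.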

So, hope it helps Josh, or is otherwise interesting.
Dino out o7
Last edited by Dinosawer on Tue Jan 24, 2017 1:57 am, edited 1 time in total.
Warning: do not ask about physics unless you really want to know about physics.
The LT IRC / Alternate link || The REKT Wiki || PUDDING
Dinosawer
Admiral
 
Posts: 5447
Joined: Fri May 09, 2014 1:08 pm
Location: Belgium

Re: The Multi-thread

Postby Silverware » Mon Jan 23, 2017 4:05 pm

No, bad Dino. Giving me ideas for the stuff I wanted to code a decade ago...
I have to work on Purgatory...

Still, that's a damn good way of looking at threading; the task-based approach in particular would work really well for the masses of AI that LT will no doubt have.
ᕕ(ಠ‸ಠ)⊃━☆゚.*・。゚
Silverware
Vice Admiral
 
Posts: 2658
Joined: Sun Sep 07, 2014 3:23 pm
Location: Goattown-Three, Sigma Six, Goat Space

Re: The Multi-thread

Postby Aiha » Mon Jan 23, 2017 4:30 pm

Dinosawer wrote:Now, the acceleration depends on the ass and location [...]

Huh, I've been taking physics courses the last couple of semesters... guess I haven't learned this yet. :ghost:
Aiha
Ensign
 
Posts: 31
Joined: Wed Jul 09, 2014 2:00 pm

Re: The Multi-thread

Postby Dinosawer » Mon Jan 23, 2017 4:40 pm

:lol: That's what I get for writing on the train.
Eh, leaving it, it's funny :ghost:

Re: The Multi-thread

Postby Cornflakes_91 » Mon Jan 23, 2017 4:41 pm

Silverware wrote:the task based approach in particular would work really well for the masses of AI that LT will no doubt have.


Systems would also lend themselves well as task units, with process_solar_system() as individual task chunks.
Cornflakes_91
Admiral
 
Posts: 8896
Joined: Wed Mar 06, 2013 1:53 am
Location: Austria

Re: The Multi-thread

Postby Silverware » Mon Jan 23, 2017 4:43 pm

Cornflakes_91 wrote:
Silverware wrote:the task based approach in particular would work really well for the masses of AI that LT will no doubt have.


systems would also lend themself well as task units, process_solar_system() as individual task chunks.

Yeah, and you could schedule specific ones only once every 30 frames or so (or whenever you switch into them), as long as you keep a record of the last time they updated.

Re: The Multi-thread

Postby Cornflakes_91 » Mon Jan 23, 2017 4:45 pm

Silverware wrote:
Cornflakes_91 wrote:
Silverware wrote:the task based approach in particular would work really well for the masses of AI that LT will no doubt have.


systems would also lend themself well as task units, process_solar_system() as individual task chunks.

Yeah, and you could schedule specific ones only once every 30 frames or something. (or whenever you switch into them) so long as you keep a record of the last time they updated.


Heck, that could be internal to process_solar_system(), with each system itself keeping track of how often and when to run a full update.

The task scheduler would have no idea which systems are at what LOD level
(beyond is_loaded = TRUE).

Re: The Multi-thread

Postby Dinosawer » Mon Jan 23, 2017 4:51 pm

In a task-based scheme it's actually better to keep the tasks as small as possible, for optimal scheduling efficiency; process_system is too large.
Scheduling can be done much more easily - in Shadowfax we assigned each object a timestep of 1/2^k, and if the current time was a multiple of that step, the object was simulated that tick (or, in a task-based scheme, its task would be scheduled). (The tick per loop was the smallest timestep.)
That allows you to run different things at different rates easily.
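A toy sketch of that power-of-two timestep trick (my own illustration; Shadowfax's actual bookkeeping differs): with the smallest timestep being 1/2^k_max, an object with timestep 1/2^k is due every 2^(k_max - k) ticks:

```python
# Power-of-two timestep scheduling: each object carries an exponent k,
# meaning it wants a timestep of 1/2**k; k_max corresponds to the
# smallest timestep in the sim, which is one tick of the main loop.
class Body:
    def __init__(self, name, k):
        self.name = name
        self.k = k  # this body's timestep is 1/2**k

def due_this_tick(objects, tick_index, k_max):
    """Return the objects whose timestep divides the current time."""
    due = []
    for obj in objects:
        stride = 2 ** (k_max - obj.k)  # ticks between updates for this object
        if tick_index % stride == 0:
            due.append(obj)
    return due
```

On tick 0 everything is due; a k=1 object in a k_max=3 sim is then due again on ticks 4, 8, 12, and so on, while k=3 objects run every tick.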

Re: The Multi-thread

Postby Flatfingers » Mon Jan 23, 2017 5:13 pm

I don't know enough about this to agree or disagree that it could, on balance, help LT.

But I am curious: at what point do the costs of synchronization start to outweigh the benefits of task-ifying functionality?
Flatfingers
Vice Admiral
 
Posts: 4398
Joined: Sat Nov 24, 2012 12:45 am

Re: The Multi-thread

Postby Dinosawer » Mon Jan 23, 2017 11:10 pm

Depends on the case, but for SWIFT: never. They got near-perfect scaling on >32000 threads in a sim with 500 million particles (the larger the amount of stuff, the better it scales).

Mathematically speaking, in task-based approaches the synchronisation idle time (cores sitting idle waiting for one to finish) is at most as long as the largest task between two syncs. So as long as you don't try ridiculously small sims, there's a gain.

Re: The Multi-thread

Postby FormalMoss » Mon Jan 23, 2017 11:51 pm

Dinosawer wrote::lol: That's what I get for writing on the train.
Eh, leaving it, it's funny :ghost:


Yup, don't forget about:
    multuthreading
    neing
    mempry
:clap: :lol:
YAY PYTHON \o/

In Josh We Trust
-=326.3827=-


FormalMoss
Captain
 
Posts: 938
Joined: Fri Jun 06, 2014 8:06 am

Re: The Multi-thread

Postby FormalMoss » Tue Jan 24, 2017 12:45 am

Hi Josh, this might be one for Graphics Josh, but if it helps you to think in a "different light", then all the better, no?


I was drawn to Dino's link to Shadowfax and its gallery.

Let me draw your attention to the third movie (far right, a load of dots in a square). Its description reads:
"Single generator moving through a 2D Voronoi mesh, illustrating the continuous way in which the Voronoi mesh adapts to a continuous movement of the generators. Faces shrink and grow, but do not suddenly (dis)appear."



When I watched this vid, it got me thinking about using Voronoi meshes, except in 3D, for the impostors issue Josh was having a while ago.

So a Google search for "3d Voronoi mesh imposters" brought me to a PowerPoint presentation on GPU computational geometry.

A synopsis follows:

Good fits for the GPU:
- Voronoi diagrams, distance fields

Three research papers:

1. Generic Mesh Refinement on GPU
- Use/update coarse meshes on the CPU, generate refined meshes on the GPU to the desired LOD
- Main reason: overcome the bandwidth bottleneck
- Secondary reason: offload work from the CPU onto the GPU
- Performance: frame rates are equivalent, the number of vertices on the bus is greatly reduced, and the CPU is freed up to work on tasks other than refinement
- Conclusions:
  - Simple vertex-shader method for low-cost tessellation of meshes on the GPU
  - Reduced bandwidth on the graphics bus
  - CPU freed up

2. Dynamic LOD on GPU (page 33)
- Goal: LOD geometry on the GPU
- For max efficiency, data structures must support parallel algorithms
- Overview of the approach:
  - Creation: LOD atlas texture
  - Rendering: select the appropriate LOD level, render on the GPU
- Performance: the GPU approach is about 10x faster than the CPU approach

3. Isosurface Computation Made Simple: Hardware Acceleration, Adaptive Refinement and Tetrahedral Stripping


Edit: Yikes! That took an hour to review and write up! :wtf: :ugeek: :problem:

