

Re: Multicore ?

#16
DWMagus wrote:The biggest thing we need to consider is mostly Josh's time frame and what he believes he is capable of.

There is a lot of variance between different computers, cpus, cores, threads, etc. The idea behind multi-threading is to break up a large task into smaller tasks so that it is easier to process. This is true for large-scale operations where it is best to complete a task in the fastest way possible.

Where this gets into a problem for LT is that even if Josh is able to utilize multiple cores, and it is the most efficient ever, what it boils down to is that it means that on faster/multiple core machines the simulation will run... better? faster? It becomes a no-brainer.

We already know that a hexacore current gen AMD will run rings around a P4 machine. In the actual amount of 'work' to be done, it will be the same regardless of the machine you're on. i.e. The simulation will run at the same speed with the same LOD regardless of your machine. In that case, the amount of 'work' that needs to be done is static. You're only breaking up a finite amount of work to be processed faster.
First, I agree about the time frame - no sense in tackling the impossible.

I don't necessarily agree, however, that the amount of work needed is constant. That may well depend on the size of the player's empire. Assuming that
  • the player can have multiple assets (ships, stations)
  • each of these assets observes its surroundings and reports in real-time to the player
you have the same need for consistency around each asset as near the player himself. That means a bubble of high-detail LOD around each asset, and correspondingly more simulation work to do.

In practice, this could mean that the old P4 breaks down when the player adds his third station, a modern AMD hexacore can handle a maximum of 20 stations and a high end Intel hexacore can handle 30 stations.
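To make that scaling argument concrete, here's a tiny back-of-envelope model in Python. All the constants (per-bubble cost, background cost, machine budgets) are invented purely for illustration, not anything from the game:

```python
# Hypothetical cost model: total simulation work grows with the number of
# player assets, since each asset carries its own high-detail LOD bubble.
# All constants are made up for illustration.

def simulation_cost(num_assets, bubble_cost=100, background_cost=50):
    """Cost units per tick: one high-LOD bubble per asset plus a
    fixed low-LOD background simulation."""
    return background_cost + num_assets * bubble_cost

def max_assets(budget, bubble_cost=100, background_cost=50):
    """How many assets a machine with a given per-tick budget can afford."""
    return max(0, (budget - background_cost) // bubble_cost)

# An old P4 (small budget) chokes early; a modern hexacore copes with far more.
print(max_assets(350))    # -> 3
print(max_assets(2050))   # -> 20
```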

Re: Multicore ?

#17
As the universe grows larger, more events will happen. But these events can be queued. Josh can let the game engine decide how much spare calculating time there is, and pick a few items from the queue to calculate each second. Some items can take precedence over others, e.g. a carrier arriving at a hostile star base is more important than a transporter moving slower because of a nebula.

The end result would be that the events in the universe slow down. This might be wanted behaviour as the player can't possibly keep track of everything happening at once. A lot of things are happening at the same time in the game, but not everything needs to be calculated in real time. As there's only a finite time to process events in an infinite universe, I'm guessing Josh will have to do something like this. Regardless of the amount of cores.
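A minimal Python sketch of that queue idea, using the standard library's heap. The priorities and event names are made up; a real engine would derive them from game state:

```python
import heapq
import itertools

# Queue simulation events by priority and process only as many as the
# spare time budget allows each tick. Lower number = more urgent.

class EventQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, priority, event):
        heapq.heappush(self._heap, (priority, next(self._counter), event))

    def process(self, budget):
        """Pop up to `budget` events, most urgent first; the rest wait."""
        done = []
        while self._heap and len(done) < budget:
            _, _, event = heapq.heappop(self._heap)
            done.append(event)
        return done

q = EventQueue()
q.push(5, "transporter slowed by nebula")
q.push(1, "carrier arrives at hostile star base")
q.push(3, "trader docks at station")
print(q.process(2))  # only the two most urgent events run this tick
```

Low-priority events simply stay queued until a later tick has spare budget, which is exactly the "universe slows down" behaviour described above.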

Also, the comment Josh made about "fuzzy" makes me think he'll let the game guess sometimes. For instance, a factory won't reliably produce exactly x amount in y time. There are costs for broken-down machinery, unavailable resources, small theft, etc. The game doesn't need to keep track of factory stock levels at all times; that would be a waste of cycles. The game just needs to guess how many items were made between the times the factory was involved in an event (the player looking at the stock levels, traders arriving/leaving). Traders won't find the factory without goods when they arrive, because smart traders phone ahead and make a down payment. (So the useless trips of X-series universe traders shouldn't happen as much.)
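A sketch of that lazy "fuzzy" evaluation in Python. The production rate and efficiency range are invented assumptions; the point is only that stock is computed on demand, not ticked continuously:

```python
import random

# Rather than simulating every factory every tick, estimate production
# only when an event (a player look, a trader arriving) touches it.

class Factory:
    def __init__(self, rate_per_hour, seed=None):
        self.rate = rate_per_hour
        self.stock = 0
        self.last_event_time = 0.0
        self.rng = random.Random(seed)

    def stock_at(self, now):
        """Estimate stock when an event occurs, then record the event."""
        elapsed = now - self.last_event_time
        # Fuzzy factor covers breakdowns, missing resources, small theft.
        efficiency = self.rng.uniform(0.8, 1.0)
        self.stock += int(elapsed * self.rate * efficiency)
        self.last_event_time = now
        return self.stock

f = Factory(rate_per_hour=10, seed=42)
print(f.stock_at(now=5.0))  # somewhere between 40 and 50 units
```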
Beware of he who would deny you access to information, for in his heart he dreams himself your master.

Re: Multicore ?

#18
Working with compression and big corporate number-crunching real-time systems I tried to come up with a good idea to offer here.
The problem is that the hardware is not unlimited, and even as CPU core counts keep multiplying, inter-process communication will suck up processor bandwidth big time once there are too many threads, since the number of connections between them multiplies.

Let me go through the approaches that seem quite usable here and why they possibly aren't usable at all
(some of them are also a bad idea only for certain kinds of data):
1. (no)SQL Database in Memory
While this approach would be the simplest and perhaps most straightforward, it would be painfully slow.
Of course you can have a prioritized lazy-caching list where all queries are queued. This queue would become the bottleneck of the whole universe, first dropping lower-priority requests and then higher ones as load increases.
You could scale out by adding more processes, clustering the data and trying to guess intelligently which server process the next request should be routed to, but I think that would be too much overhead. Such a library would also introduce more dependencies, and the memory consumption can leave very large amounts of RAM unusable.

2. Use frameworks like OpenMP http://en.wikipedia.org/wiki/Openmp and Boost http://en.wikipedia.org/wiki/Boost_(C%2B%2B_libraries)
While they all promise the golden calf, I would be afraid of inter-framework problems, which are normally hard to trace. Also, if you use external libraries, you are bound to them and have to trust them somehow. While I have a little experience in multi-coring, I don't have enough trust in those frameworks... but who knows, maybe one should try it. And test it thoroughly! I would definitely run load tests on those frameworks before using them, which adds time overhead to this solution as well.

3. ultra-fast, different Data Structures
So let's take the example of the automated trader, wanting to know where she should go in her shiny spaceship.
What we need here is just: give me the price of product X at location Y (a million times over).
Imagine a hash table (or in Python: a dict) where you could define the key as X.Y, mapping to 500 for example.
Yes, this would be a monolithic structure in memory, providing the bottleneck for all to see, BUT:
it's extremely fast. If you would just think of queuing or caching the requests or the results: think again.
Hash tables are usually nearly as fast as a direct memory lookup (or a direct access to any item in an array).
Using the Take-a-Dozen algorithm or the secretary problem or whatever you would not even need to ask all of the planets and providers in the whole universe.
In this case we can at least provide a fixed structure and keep it really simple - but it's not very procedural.
You could build up different lists / hash tables for goods (one per good), weapons, ships, etc., but it would not solve the problem completely.
As noSQL Databases do have a similar structure, you could possibly use one - see 1. When it gets too big you need to put it (partially) on Disk too - see 4.
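Since the post already mentions Python dicts, here's the flat lookup table from point 3 in about ten lines. The goods, locations, and prices are made up:

```python
# Prices keyed by (product, location) in a plain dict, which in Python
# is a hash table with near constant-time access.

prices = {}

def set_price(product, location, price):
    prices[(product, location)] = price

def get_price(product, location):
    # A dict hit is close to a direct memory lookup: no query parsing,
    # no network round-trip, no database engine in between.
    return prices.get((product, location))

set_price("ore", "Sirius IV", 500)
set_price("ore", "Vega II", 430)
print(get_price("ore", "Sirius IV"))  # -> 500
```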

4. Use the OS
What we basically need is a non-blocking way of storing and retrieving data very fast.
The only place I found anything like this is webservers:
Solving the C10k problem http://en.wikipedia.org/wiki/C10k_problem
For example Tornado: http://www.tornadoweb.org/, also node.js.
I found that 'non-blocking' usually means to use events and let the OS handle all the trouble :D .
I have to admit that I'm not completely sold on using this technology.
But lets just think for a second: we need a savegame-file, right?
Why not procedurally write the savegame in (compressed) blocks (maybe use some index to keep track) and use the file interface of the OS to handle all the troubles? Using the non-blocking I/O interfaces, this could work.
Some years ago I was strictly against using files anywhere - each interface (I believed) should have used encrypted, compressed real-time XML-streams. Yes. I was that naive. :P
You wouldn't believe how many interfaces I've stumbled across that actually use files (or file-stream objects and such). So today's OSes are quite powerful and provide good caching and non-blocking access to a nearly unlimited amount of data.
(Note: For a nice idea to make File Systems smarter see ReiserFS 4 for nice B*-trees in a File System: http://en.wikipedia.org/wiki/Reiser4 )
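A small Python sketch of the block-plus-index savegame layout from point 4. This only shows the on-disk structure; a real implementation would go through the OS's non-blocking file I/O rather than an in-memory stream, and the block contents here are invented:

```python
import io
import zlib

# Append the savegame as compressed blocks to a stream, keeping an index
# of (offset, length) per block so any block can be read back alone.

class BlockStore:
    def __init__(self, stream):
        self.stream = stream
        self.index = []  # block id -> (offset, compressed length)

    def write_block(self, data: bytes) -> int:
        compressed = zlib.compress(data)
        offset = self.stream.tell()
        self.stream.write(compressed)
        self.index.append((offset, len(compressed)))
        return len(self.index) - 1  # block id

    def read_block(self, block_id: int) -> bytes:
        offset, length = self.index[block_id]
        self.stream.seek(offset)
        return zlib.decompress(self.stream.read(length))

store = BlockStore(io.BytesIO())
bid = store.write_block(b"sector 7: 3 stations, 12 ships")
store.write_block(b"sector 8: empty")
print(store.read_block(bid))
```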

5. asynchronous, event-driven architecture
While this is used by the webservers mentioned before, you can of course implement one yourself.
Katorone mentioned it earlier - this is the standard approach in client-server architecture. While I used to be a fan of all my objects piping down requests and answering them in the same manner, I now think that's too much of a hassle. Also, when the number of objects grows exponentially, pipes can be a pain in the ass. But I guess if something grows exponentially, everything is. :D

Logic
The thing we would need to know is: when should we put a cluster of stars into a separate process, assuming that we actually can communicate between processes quite fast?
The answer I can think of: when there is too much going on in any process/thread, it should split (like biological cells).

Splitting processes
So let's assume you start from the beginning in LT. The universe would be simulated in one process at first.
As soon as this thread reaches a certain threshold (CPU percentage would be a nifty metric, say 80% of one core), it would split itself in two, but lazily: the first process keeps the universe alive and slowly fills the second until both hold the same data. As soon as the first one experiences a change in the meantime, you need to introduce some version control, and then you have, again, the problem of overhead.

Reducing Requests in general
Another idea would be to put a price tag on the lookup:
the farther the object is from your own location, the more it costs to ask for its price.
This could effectively limit the number of global requests, albeit only to a certain extent, not eliminating the problem entirely.
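A quick Python sketch of that distance-priced lookup. The cost formula and positions are invented; with a tight budget, only nearby objects get queried:

```python
# Charge more (credits or simulation budget) for querying distant
# objects, so global sweeps of the whole universe become expensive.

def lookup_cost(own_pos, target_pos, base_cost=1.0, per_unit=0.5):
    distance = abs(target_pos - own_pos)
    return base_cost + per_unit * distance

def affordable_targets(own_pos, targets, budget):
    """Keep only the targets whose lookup fits the remaining budget,
    nearest first."""
    kept = []
    for pos in sorted(targets, key=lambda p: abs(p - own_pos)):
        cost = lookup_cost(own_pos, pos)
        if cost <= budget:
            kept.append(pos)
            budget -= cost
    return kept

# With a budget of 5.0, only the two nearby systems get queried.
print(affordable_targets(own_pos=0, targets=[1, 2, 10, 50], budget=5.0))
```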

So ... many words, I know. I'm sorry.
Maybe someone can learn from this in some way - and maybe somebody else gets the perfect idea for LT from it.
Have a nice one.

Re: Multicore ?

#19
What are the rules regarding profanity on this forum? All of this talk of "Th****ing" and "L***s" - don't you know that those are bad words?

Multithreaded codebases are the road to hell. I say that even as someone who develops in a beautiful managed environment such as .NET/C#. I'm SO very glad that I don't generally have to deal with it.

There is another slant to this though, in that it is probably premature to be thinking about threading for performance purposes. After all:
Premature optimisation is the root of all evil.
Nobody suspects a Toreador …

Re: Multicore ?

#20
ToreadorVampire wrote:Multithreaded codebases are the road to hell. I say that even as someone who develops in a beautiful managed environment such as .NET/C#. I'm SO very glad that I don't generally have to deal with it.
Oh god. I'm a systems admin in a large government entity (20k users, 15k machines, 1k servers, 8 Windows domains) and I can tell you that our codebase clusters are run into the f'ing ground by our development teams trying to do multithreading optimizations. As for databases? You would need to come up with your own, because no matter how efficient an off-the-shelf solution is, they still hog memory like nothing else. Recently we decided to test Oracle's claims about MySQL, so we put an instance on a box. Apparently they weren't lying when they said that MySQL will use as much RAM as you give it, up to 2TB. SQL is memory hungry. If you give it RAM, it will eat it.

Add in the overhead of multithreading? Man, we needed a full farm just for our main cluster. But the good news is, once we got off our high horse and stopped trying to utilize all these 'nifty' things, we were able to repurpose our single farm to handle all our databases without issue.
Early Spring - 1055: Well, I made it to Boatmurdered, and my initial impressions can be set forth in three words: What. The. F*ck.

Re: Multicore ?

#21
Now that Josh has moved scheduling to the GPU, isn't this the same thing as making something multicore? The only difference is that instead of using just the CPU, Josh is using the GPU as well (effectively multicore between GPU and CPU).

With this in place, is it pertinent to actively use the other cores if they are available to help out?

Granted, I don't know the depth of what this GPU scheduling is capable of, nor how much is even offloaded, but it sounds to me like a pseudo multi-core process already.

Any light on this Josh? Sorry for dredging up topics you have already 'ruled' on.

Re: Multicore ?

#22
DWMagus wrote:Now that Josh has moved scheduling to the GPU, isn't this the same thing as making something multicore? The only difference is that instead of using just the CPU, Josh is using the GPU as well (effectively multicore between GPU and CPU).

With this in place, is it pertinent to actively use the other cores if they are available to help out?

Granted, I don't know the depth of what this GPU scheduling is capable of, nor how much is even offloaded, but it sounds to me like a pseudo multi-core process already.

Any light on this Josh? Sorry for dredging up topics you have already 'ruled' on.
Pahaha please don't feel as if a reply from me constitutes "ruling" on a topic :P Unless locked...they're all open for discussion :D

The GPU scheduling is a bit different from multiprocessing. The main idea with the GPU scheduling is to "smooth out" the workload over multiple frames. GPU scheduling is different than CPU core scheduling in this sense, because there is only one GPU, and you do not send tasks to individual GPU cores, you just send a task to the GPU. However, some tasks take too long to send all at once. Most rendering-related things are fast enough to happen once every frame, but generating the field data for these new models requires significantly more than a single frame of GPU time. Altogether, the processes take probably 1 to 2 seconds of dedicated GPU time.

If you were to launch a 2-second GPU job all at once, you would crash the driver (at least on Windows), because the OS does not expect a GPU job to take that long. Furthermore, even if the job completed without crashing, you would have a 2-second lag before you saw the next frame of the game. Of course, for a game rendering 60 frames per second, that's absolutely unacceptable.

The point of the GPU scheduler is to figure out how much of each job it can execute each frame while still maintaining a smooth framerate. So it is not multiprocessing per se, because it still runs in serial with all of the rendering stuff. It just makes sure that GPU jobs don't clog up the rendering pipeline. In this sense, it sort of "feels" like multiprocessing, because you're submitting a large job but the rendering doesn't slow down. But it's not true multiprocessing, since, again, everything is running in serial (in general, one does not make "threaded" calls to GPUs, since, internally, the driver is not threaded, AFAIK).

Now, I did also mention that I implemented CPU multithreading, and that is multiprocessing, obviously. So the model generator is taking advantage of multiple CPU cores now.

So, finally, to get to the point/the real question "is it effectively multicore" - yes, in the sense that the GPU is a massively parallel thing and when it executes these jobs, it executes them on a bazillion different little execution units in parallel; no, in the sense that all of the GPU jobs are being processed in serial; no, in the sense that the GPU and CPU are in lock-step while the jobs are running (because the CPU needs to be able to time and control the GPU jobs), so the CPU is waiting on the GPU to complete them.
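The per-frame job slicing Josh describes can be sketched abstractly in Python. This is not his implementation: work units are plain counters here, whereas the real scheduler measures actual GPU time per slice:

```python
# A long job is cut into small slices; each frame executes only as many
# slices as the frame budget allows, so rendering never stalls.

class FrameScheduler:
    def __init__(self, slices_per_frame):
        self.budget = slices_per_frame
        self.jobs = []  # each job: list of slice callables

    def submit(self, job_slices):
        self.jobs.append(list(job_slices))

    def run_frame(self):
        """Run up to `budget` slices, then return so the frame can render."""
        ran = 0
        while self.jobs and ran < self.budget:
            job = self.jobs[0]
            job.pop(0)()          # execute one slice of the oldest job
            ran += 1
            if not job:
                self.jobs.pop(0)  # job finished; move to the next
        return ran

progress = []
sched = FrameScheduler(slices_per_frame=3)
# A "long" 7-slice job, like the multi-second field-data generation.
sched.submit(lambda i=i: progress.append(i) for i in range(7))
frames = 0
while sched.run_frame():
    frames += 1
print(frames, progress)  # drained over 3 frames instead of one huge stall
```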

Phew. Sorry. Turns out it was a pretty tricky question :?
“Whether you think you can, or you think you can't--you're right.” ~ Henry Ford

Re: Multicore ?

#23
How is frame rate stability currently? As in whatever metric -- 99th percentile frame time, minimum frame rates, standard deviation of frame rates, etc. As the recent TechReport article showed, high frame rates with instability can lead to a worse play experience than lower, constant frame rates. X3 in particular has a lot of problems in this area: pauses and skips everywhere.

I imagine with procedural LOD, you can mitigate this effect significantly on the GPU side. But there are other scenarios -- sudden jumps in disk latency, a path-finding routine taking longer than expected, etc. How well does scheduling handle those cases? A lot of that happens when the GPU is starved for data due to the CPU crunching away at something.

Re: Multicore ?

#24
jimhsu wrote:How is frame rate stability currently? As in whatever metric -- 99th percentile frame time, minimum frame rates, standard deviation of frame rates, etc. As the recent TechReport article showed, high frame rates with instability can lead to a worse play experience than lower, constant frame rates. X3 in particular has a lot of problems in this area: pauses and skips everywhere.

I imagine with procedural LOD, you can mitigate this effect significantly on the GPU side. But there are other scenarios -- sudden jumps in disk latency, a path-finding routine taking longer than expected, etc. How well does scheduling handle those cases? A lot of that happens when the GPU is starved for data due to the CPU crunching away at something.
Quite stable, as I am personally extremely sensitive to unstable framerates. However, since many of the CPU-intensive routines are still not implemented, it's hard to say yet.

In general, though, I am a fan of sampling-based or iterative randomized algorithms, and the good news about such algorithms is that they can run for as little or as much time as you like, and are easy to resume on the next frame. I plan to do most of the CPU-intensive stuff this way (like AI, for example) so that there won't be sudden stalls. Just like in the GPU scheduler, I will smooth CPU-intensive jobs across many frames.
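The "sampling-based, resumable" style described here can be sketched in Python. The scoring function is an invented stand-in for an AI decision problem; the point is that the search does a small, bounded amount of work per call and keeps its best answer between frames:

```python
import random

# An "anytime" randomized search: it improves its best answer a few
# samples at a time and can be paused after any frame.

def make_searcher(score, candidates, seed=0):
    rng = random.Random(seed)
    state = {"best": None, "best_score": float("-inf")}

    def step(samples_per_frame):
        """Run a bounded amount of work; safe to call once per frame."""
        for _ in range(samples_per_frame):
            c = rng.choice(candidates)
            s = score(c)
            if s > state["best_score"]:
                state["best"], state["best_score"] = c, s
        return state["best"]
    return step

# Hypothetical goal: find the candidate closest to 37.
step = make_searcher(score=lambda x: -abs(x - 37), candidates=list(range(100)))
for _frame in range(20):
    best = step(samples_per_frame=5)  # 5 samples per frame, no stalls
print(best)
```

The answer only ever improves with more frames, so the game can spend as little or as much time as the frame budget allows.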

Also, if it's something realllyyyyy intensive that can't be easily divided across multiple frames, I will put it in a different thread so that the GPU doesn't starve while the CPU crunches.

In summary, I'll do everything I can to mitigate jerkiness. I'm not a fan of it myself, and I imagine it would seriously hinder my enjoyment of the game :)

Re: Multicore ?

#25
Thanks for the explanation on the GPU scheduling.

I understand now. Basically what it sounds like is that even though a GPU has many "cores" inside you don't need to worry about it because the hardware is handling the scheduling instead of manually having to schedule it yourself. If a CPU could do this, we'd see leaps and bounds in terms of efficiency.

Re: Multicore ?

#27
JoshParnell wrote:PS ~ As an aside, it's not even clear to me that splitting the simulation would yield higher performance. As I said, the amount of object-object dependency would necessitate multiple lock acquires/releases for each entity, for each simulation tick. Which means multiple (evil) kernel calls for each entity, for each simulation tick. Off the top of my head, I'm going to say that the overhead of that will far outweigh the gain of threading.. :(
immutable data don't need no locks, PARADIGM SHIIIIIFT
peace, I'm out
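There is a real point under the quip, which a Python sketch can show (entity fields here are invented; in CPython, rebinding a single reference is atomic):

```python
import threading

# The simulation thread publishes a fresh, immutable snapshot each tick;
# reader threads grab whatever snapshot is current. Since snapshots are
# never mutated, readers need no locks.

def advance(state):
    # Build a *new* tuple-of-tuples world instead of mutating the old one.
    return tuple((name, pos + vel, vel) for name, pos, vel in state)

current = (("ship", 0, 2), ("station", 10, 0))  # published snapshot

def simulation_tick():
    global current
    current = advance(current)  # readers see old or new, never half-done

def reader(results):
    snapshot = current          # one atomic read; safe without a lock
    results.append(snapshot)

results = []
simulation_tick()
t = threading.Thread(target=reader, args=(results,))
t.start()
t.join()
print(results[0])
```

The trade-off, of course, is the cost of rebuilding state every tick instead of mutating it in place, which is exactly the kind of overhead Josh's quoted concern is about.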
woops, my bad, everything & anything actually means specific and conformed

Re: Multicore ?

#28
Since this thread is already resurrected, what about OOS simulation in one or more separate threads?
If the workload is split along entire systems belonging to either thread A or thread B, there should be a fairly low amount of object-object dependency between threads. An object is either in system X or system Y, and has no interaction with objects in the other system :) .
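A minimal Python sketch of that per-system split. The entities and the update rule are invented placeholders; the point is only that each system's batch can be simulated on its own worker with no cross-thread locking:

```python
from concurrent.futures import ThreadPoolExecutor

# Group out-of-system entities by star system and simulate each
# system's batch independently, since entities in different systems
# don't interact.

def simulate_system(entities):
    # No entity here touches another system, so no locks are needed.
    return [(name, pos + vel) for name, pos, vel in entities]

systems = {
    "X": [("trader", 0, 1), ("pirate", 5, -1)],
    "Y": [("miner", 3, 2)],
}

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(simulate_system, ents)
               for name, ents in systems.items()}
    updated = {name: f.result() for name, f in futures.items()}

print(updated)
```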

Re: Multicore ?

#29
Rabiator wrote:Since this thread is already resurrected, what about OOS simulation in one or more separate threads?
If the workload is split along entire systems belonging to either thread A or thread B, there should be a fairly low amount of object-object dependency between threads. An object is either in system X or system Y, and has no interaction with objects in the other system :) .
What if they're trading on the same market?
