Welcome back, bienvenue, aloha, etc etc. You know how this works by now. Onward!
Recap since Last Time:
- Moved all heavy physics code (kinematics, collision, attachments/parenting, sleeping) to C (HUGE gains)
- Added lots of new instrumentation & utilities for profiling; we have a much better handle on CPU time now
- Major improvements to HDR tonemapping to prevent color desaturation, especially in bright regions
- Added time acceleration (for testing purposes...)
- Ported dust-flecks from LTC++; dust clouds to follow shortly
- Had a lot of fun with Lindsey's new extrusion & stellation functions in our PCG library
- Added AI aiming / turret control to existing gameplay logic test
- Stress-tested gameplay script scheduler with great results
- Loads and loads and loads of fixes, optimizations, cleanups
- Scripting scripting scripting gameplay logic gameplay logic gameplay logic and such
(I feel compelled to emphasize that seeing these changes in bullet-point format doesn't even come close to conveying the amount of progress made since the last log.)
Gameplay Logic with Coroutines and Automatic Scheduling
While much of my time over the past few weeks has been spent spelunking in the various caverns of our existing systems in search of milliseconds, it has all been in service to the bigger-picture goal of supporting 'a lot' of gameplay going on at a large scale. Ultimately, although the journey likely won't be complete for yet another week or two, I'm happy to report that things thus far are going very well. In this log, I'll give a more complete picture of what that actually means!
Now, I know already that I'm going to indulge myself here in my technical explanations, particularly since I'm excited about the system at hand. As such, I'll offer a TL;DR for those who don't find my forays into techno-gibberish-land to be interesting.
TL;DR: It's now very easy to write gameplay logic that can run efficiently at huge scales in the midst of lots of other logic. Our systems take care of figuring out how, when, and which logic to run each frame to achieve maximal accuracy alongside maximal performance. While the system can support a tremendous amount of logic going on at once without hurting performance, it can also automatically and gracefully degrade the accuracy of simulation logic in order to maintain FPS when 'tremendous amount' turns into 'put out the fire on my CPU!!'
Those who enjoy gibberish, proceed; those who don't, skip to the next horizontal separator.
I've given a few code examples in the past of mod format, CType usage, Lua in LT, and so forth. What I haven't given is a look into how 'real' gameplay logic will look under our work-in-progress gameplay logic system (which is, for the moment, rather unimaginatively named the 'Script Scheduler'). Time to change that! How is it that we will develop that which will go on to bring life to LT? Let's have a look!
Meet my 'simple AI test' script:
Clearly I've left out a lot -- I've folded the initialization, which just sets up some state for the AI unit and is largely uninteresting. I've also omitted definitions for the two functions that seem to be doing the work here, one of which is aiTurretControl. All I've left is the outer body, but that's because the interesting thing about this code is not so much the content, but the structure. If you're code-savvy, you may have already picked up on the bizarre elegance of what's going on here: the script itself -- which is, at this point in testing, the only thing giving AI units any behavior at all -- is merely a function! Not only is it just a function, it's a rather bizarre one at that. It initializes some data and then goes into a seemingly-infinite loop (the 'while' block). There's precious little to indicate that it's part of a larger system. In fact, from the standpoint of the one writing this code, it almost looks like this is the only logic in existence! The code seems to completely deny being part of an ecosystem of gameplay logic. It is not worrying about keeping track of some 'state machine'; it's not defining a bunch of callback functions to interact with the outside world; aside from local variables there's virtually no persistent state of any kind! Overall, it looks nothing like what one would expect from a gameplay script.
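For readers who want to see the shape being described, here is a rough sketch. It is not the actual LT script: it uses Python generators as a stand-in for Lua coroutines, and every name in it (simple_ai_test, run_frames, the "updates" and "elapsed" fields) is made up for illustration. The point is only the structure: some setup, then an endless loop that hands control back once per pass.

```python
def simple_ai_test(unit):
    """Hypothetical AI script: setup, then a loop that never ends on its own."""
    # Initialization: plain local/unit state, no state machine, no callbacks.
    unit["updates"] = 0
    unit["elapsed"] = 0.0
    while True:
        # Stand-in for sleep(0): give control back and ask to be resumed
        # as soon as possible. The value sent back in is the elapsed time.
        dt = yield 0.0
        unit["elapsed"] += dt
        unit["updates"] += 1  # placeholder for maneuvering / turret logic


def run_frames(scripts, frames, frame_dt=1.0 / 60.0):
    """Toy driver (an assumption, not the real scheduler): resume every script each frame."""
    for script in scripts:
        next(script)  # run each script up to its first yield
    for _ in range(frames):
        for script in scripts:
            script.send(frame_dt)
```

From the script's point of view it is the only code in the world; the driver is what makes a hundred of them take turns.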
Remember a while back when I said something about the needs of gameplay logic being very similar to the needs of a process in an operating system? That is, in fact, almost exactly what's going on here! Our current implementation of the script manager functions similarly to an operating system scheduler, and gameplay logic scripts function similarly to a process. Indulge me in my desire to explain how and why this is beautiful for both the one writing code and the system running it.
I've expressed over and over since my performance failures in LTC++ that exploiting sparsity is key to being able to support a thousand AI ships running low-level maneuvering logic, while simultaneously having research modules all over the universe working toward technology breakthroughs, while simultaneously letting AI players in high places run cost-benefit analyses of hundreds of potential projects, while simultaneously simulating the black-box economies of colonies everywhere, while...you get the idea! Recognizing that these calculations each have their own unique demand patterns for CPU cycles, then being able to predict and accordingly exploit those patterns...well, this is the difference between LT and solitaire. But doing so while also making it tractable to write all that logic? I had a name for the problem: FPLT, the fundamental problem of LT. At the time, someone rightly pointed out that it was not so much the fundamental problem of Limit Theory as it was the fundamental problem of simulating any large, complex system in real-time -- a problem encountered by many others at many points in the past.
Operating systems get closer to solving that problem than any other piece of software known to me. Some solve it better than others... An operating system must cope with: an unknown amount of work needing to be performed by an unknown number of workers at unknown time intervals, interacting with one another in unknown ways, and having an unknown number of CPU cycles to spend each unit of time to accomplish all of this. By unknown, I mean unknown beforehand -- unknown at the time of building the operating system. Hence, a good OS must be prepared to efficiently support an arbitrarily-complex web of processes using a finite amount of computational power. It must also do so in such a way that the burden is not overwhelming on the programmer when he/she needs to write a program that performs such work. If you really think about it, this is quite an amazing feat! Say what you will, but modern OSs are indeed pretty good at this central job of theirs. How?? Chiefly, via two mechanisms: intelligent scheduling and virtualization of resources. The former attempts to maximize resource utilization such that the most work per unit of time is achieved (over both the short and long term). The latter makes it appear to programs that they have exclusive control over and unlimited usage of each resource, in order to ease the burden of programming. The most important resource in question is, of course, CPU time.
Enough of that, let's get back to our AI script and understand what's really going on. Clearly, something's up with that 'sleep' function. Indeed, it is central to how our script mechanism works. Under the hood, we're using Lua coroutines, a somewhat advanced programming concept to fully understand, but one with simple enough implications: scripts can run as though they were their own 'processes,' living within a function (or a multitude thereof), while still being able to give control back to the rest of the game so as not to forever stall the universe while one silly pilot tries to calculate a successful womp-rat-bullseye. Now we see what's actually happening here: those calls to sleep are 'giving up' (in coroutine parlance, 'yielding') control of the CPU to whatever thing gave it in the first place (here, the script manager). In the world of coroutines, one thinks less of code like this:
    other code
    other code
    other code
    other other other
    ...
    <-- I am here :'[ -->
    ...
    ...
And more like this:
    my code
    my code
    my code
    ...
    <-- yield; everyone ELSE is here >:] -->
    ...
    mine mine mine mine mine
This so-called inversion does wonders for being able to reason about game logic! We no longer have to consider the context in which our code is running -- much like a process in an OS need not worry about what else is actually going on with memory or the CPU (multithreading aside...). We simply write the entirety of what we want our game logic to do, from start to finish, and, wherever it makes the most sense (usually inside a looping construct), we will insert these 'sleep' calls to allow the rest of the world to do what it wants.
But that's not quite the end of the story, because this code is not really leveraging our system to the fullest -- it's being intentionally greedy in order to stress-test the script manager. As you might have guessed, the number inside the sleep call indicates how long the script wants to sleep. Sleep(0) essentially means "look, I'm going to let everybody else have some time to do their thing, but WAKE ME UP AS SOON AS YOU CAN." There is no 'sparsity' being exploited here. Let's consider a different script, one for a manufacturing module's logic, which is almost too good of an example for exploiting sparsity:
This one may have mistakes since I just cooked it up, but it's still very much representative of what writing LT logic will look like. Manufacturing here is dead-simple. Suppose we have a 'job' description (it will come from the blueprint of whatever's being manufactured) that includes some inputs (materials consumed), outputs (materials produced), and a time indicating how long a single run of the given job takes. All we do in our script is check that our inventory contains all of the required materials to run the job, then remove the prescribed amount of input material from inventory, call sleep with the given job time, and, upon waking, feed the outputs into inventory. We can easily put this inside a loop so that a manufacturing unit keeps manufacturing until some requested number of runs has been completed (or materials run out).
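The steps just described can be sketched roughly as follows. This is an illustration rather than the real LT script: it is written with Python generators standing in for Lua coroutines, and the job layout (an "inputs" dict, an "outputs" dict, a "batchTime" number), the inventory dict, and the helper run_to_completion are all assumptions made for the example.

```python
def manufacture(inventory, job, runs):
    """Hypothetical manufacturing logic: consume inputs, sleep, produce outputs."""
    completed = 0
    while completed < runs:
        # Stop early if any required material is missing.
        if any(inventory.get(item, 0) < qty for item, qty in job["inputs"].items()):
            break
        # Remove the prescribed amount of each input.
        for item, qty in job["inputs"].items():
            inventory[item] -= qty
        # Stand-in for sleep(job.batchTime): nothing to do until the run is over.
        yield job["batchTime"]
        # Upon waking, feed the outputs into inventory.
        for item, qty in job["outputs"].items():
            inventory[item] = inventory.get(item, 0) + qty
        completed += 1
    return completed


def run_to_completion(script):
    """Toy driver (assumption): honor every sleep instantly and total the time asked for."""
    slept = 0.0
    try:
        while True:
            slept += next(script)
    except StopIteration as done:
        return done.value, slept
```

With an inventory of 5 ore, a job that turns 2 ore into 1 plate in 10 seconds, and 5 requested runs, the script completes 2 runs, asks for 20 seconds of sleep in total, and stops with 1 ore and 2 plates when materials run out.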
Here we have a true example of something that will benefit immensely from sparsity. The vast majority of the time, manufacturing logic consists of precisely nothing. If you were to write this in onUpdate style, you would have a state variable, a switch or if/elseif/else sequence checking the state, and, within one of those branches, you would simply be checking if elapsedTime >= job.batchTime (which would be false for 99.9% of the time, until the final frame where the job completes). Now, run a million manufacturing jobs across the universe at the same time, and what will happen? The other style will bog down. Maybe it will take more than a million. But it will come to a grinding halt at some point. Our version with sleep? You will see no noticeable difference in performance until the number of jobs gets so large that it starts forcing other game memory out of RAM and into swap space. It will be long, long after the other system has come to a screeching halt. This is because, with a good scheduler, we need exactly no time at all to process sleeping jobs (this is a tiny lie, but only tiny -- asymptotically, I am not lying, because sleeping jobs require O(1) time for the whole lot of them).
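To make the "sleeping jobs cost nothing" claim concrete, here is one way a scheduler could be laid out. It is a sketch under assumptions, not the LT implementation: it keeps scripts in a min-heap keyed by wake time (Python's heapq), uses generators in place of Lua coroutines, and the names Scheduler, job, and resumes are invented for the example. Each frame it only peeks at the earliest wake time; if nothing is due, no script is touched.

```python
import heapq
import itertools


class Scheduler:
    """Hypothetical wake-time scheduler: sleeping scripts are never visited."""

    def __init__(self):
        self.now = 0.0
        self.queue = []               # min-heap of (wake_time, seq, slept_at, script)
        self.seq = itertools.count()  # unique tie-breaker; scripts are never compared
        self.resumes = 0              # how many times any script was actually run

    def add(self, script):
        self._resume(script, None)    # send(None) starts a fresh generator

    def _resume(self, script, dt):
        self.resumes += 1
        try:
            delay = script.send(dt)   # run until the script's next sleep
        except StopIteration:
            return                    # script finished; drop it
        heapq.heappush(self.queue, (self.now + delay, next(self.seq), self.now, script))

    def update(self, frame_dt):
        self.now += frame_dt
        # Collect due scripts first so a sleep of 0 waits for the next frame.
        due = []
        while self.queue and self.queue[0][0] <= self.now:
            due.append(heapq.heappop(self.queue))
        for _, _, slept_at, script in due:
            self._resume(script, self.now - slept_at)


def job(batch_time, log):
    """Toy script: sleep for batch_time, record the dt it was woken with, repeat."""
    while True:
        dt = yield batch_time
        log.append(dt)
```

In this layout the per-frame check for "is anyone due?" is a single comparison regardless of how many scripts are asleep; the heap push and pop cost O(log n), but only for scripts that actually wake.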
I have shown two examples at the extremes -- low-level AI maneuvering isn't a great example of sparsity, while manufacturing is a perfect one. But consider that LT logic spans the whole spectrum in-between, and that, by my estimation, that spectrum is heavily-biased toward the sparse side. What we have here is indeed the game-changer that we were looking for with respect to managing logic of all granularities. Huzzah! \o/
- Logic is never scheduled before requested, but may be scheduled after. For this reason, sleep returns 'delta time' -- the actual amount of time elapsed since the script yielded. AI uses this, for example, in the PID algorithm, which requires knowing elapsed time. The manufacturing script has no need for the true dt, though it does operate under the assumption that at least job.batchTime time will have elapsed when it is awoken (our scheduler, like most, guarantees an 'at least' relationship with respect to sleeping).
- The entirety of script execution can be bounded such that total script time is never allowed to take, for example, more than 10ms per frame. Doing so allows maintaining performance -- but our scheduling algorithm also ensures that each script will still be called with an importance that is proportional to how long it's been since the script should have awoken.
- Due to the above 'fairness' policy, dt becomes 'larger than expected' for each script at roughly the same rate, meaning that each script's temporal accuracy degrades at roughly the same rate, so the simulation accuracy degrades smoothly and fairly in the face of too much work.
- Since the scheduler is deeply concerned with time, it is very easy to profile the total CPU time of each script! This allows much more insightful information than just 'Ship.onUpdate' or 'AI.onUpdate,' as it was in LTC++ profilers. Now we will actually see accurate time consumption for each piece of logic, aggregated over all entities on which that logic is running!
- Speaking of which, even in its infancy, the scheduler has been profiled with tens of thousands of scripts running at the same time with no degradation
- The future of gameplay logic is BRIGHT!
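The budget and fairness points in the list above can be illustrated with a toy model. Everything here is an assumption made for the sketch: the budget is counted in script resumes per frame rather than measured milliseconds, "most overdue first" is just one policy that fits the description of importance growing with lateness, Python generators stand in for Lua coroutines, and the names spawn, run_frame, and ticker are invented.

```python
import heapq
import itertools


def spawn(queue, seq, script, now=0.0):
    """Start a script and queue it at the wake time it asks for."""
    delay = next(script)
    heapq.heappush(queue, (now + delay, next(seq), now, script))


def run_frame(queue, seq, now, budget):
    """Resume due scripts, most overdue first, but at most `budget` of them."""
    due = []
    while queue and queue[0][0] <= now and len(due) < budget:
        due.append(heapq.heappop(queue))   # earliest wake time = most overdue
    for _, _, slept_at, script in due:
        try:
            delay = script.send(now - slept_at)  # sleep 'returns' the true dt
        except StopIteration:
            continue
        heapq.heappush(queue, (now + delay, next(seq), now, script))


def ticker(name, period, log):
    """Toy script: wants to run every `period`, records the dt it actually got."""
    while True:
        dt = yield period
        log.append((name, dt))


def simulate(names, period, budget, frames):
    """Run `frames` one-second frames and return the (name, dt) log."""
    queue, seq, log = [], itertools.count(), []
    for name in names:
        spawn(queue, seq, ticker(name, period, log))
    for frame in range(1, frames + 1):
        run_frame(queue, seq, float(frame), budget)
    return log
```

With four scripts that each want to run every second and a budget of two resumes per frame, the scripts settle into taking turns: after the first frame, every one of them is woken with a dt of 2 instead of 1. No script is starved, and the lost accuracy is shared evenly, which is the behavior the fairness bullet describes.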
Whew. Took two-and-a-half turns of the hourglass that time...but I regret nothing! It has been such a long road to this system. As I mentioned above, it'll still be a week or two before the whole thing is finalized. But things are shaping up really nicely, both from the performance standpoint and from the ease-of-use standpoint.
It's a shame that I don't have more time to talk about the tremendously-exciting developments on Friday, what with Lindsey's implementation of extrusion and stellation for station-building, plus the addition of the icosahedron primitive. I did, at least, take an excessive number of screenshots while playing with the additions. So...gallery! Note that any lack of diversity here is not the fault of Lindsey, as our primitives library is now quite large, but is to blame on my obsession with icosahedra and their beautiful symmetry...which made it hard for me to focus on any other shape. There are also a few AI unit shots and nebula shots in there, just because.
I must now turn attention to the KS update! See you next time.
Limit Theory Devlog Gallery ~ November 6, 2017
A few of my favorites from the gallery, showcasing some very basic results using almost nothing but icosahedra, stellation, extrusion, and warping (the occasional torus does show up). Thanks, Lindsey!