Whew, roller-coaster of a week. Ultimately, a pretty crushing ending, sad to say. As stated in the last devlog, my plan for the week was to allocate 100% of effort to exploring the performance characteristics, and, ultimately, feasibility of the current LuaJIT solution.
The tentative conclusion, after getting a finer handle on where I lose microseconds with LJ, was...LJ is really close to being feasible. With the current tests, it is performant enough. But don't get too excited yet. Despite injecting some more logic, my tests aren't yet representative of the full scale of LT simulation. As such, I would have considered passing tests with 'good headroom' as a green light for LJ. In the perfect escalation of tensions, LJ did pass, but not with much headroom. What I mean by 'headroom' is basically 'spare time.' Essentially, the current solution manages to 'just scrape by' the current tests, which puts me in a very nerve-wracking situation. LT will quite obviously be more intensive than my tests, which means, presumably, LuaJIT would not perform well (FYI, I was not running these tests on my older computer, I was running them on a rather powerful laptop, so that contributes to the nerve factor). On the other hand, I haven't explored all optimization routes yet -- I could push more logic into C, I could continue to learn more about how to squeeze the most from LJ, etc.
Well, some of you will be happy to know that my first thought (believe it or not) was threading. Despite how frequently I complain about the difficulty of threading game logic, I actually made a great design decision when building the PAX demo: unlike all of my previous work, I (somewhat impulsively) decided to implement a 'data-driven' design for the component-based entities. I think I did so initially to minimize function call overhead (with this design, I write update & draw functions that take lists of specific object types and perform the necessary logic on them; this is in contrast to simply having 'monolithic' loops that call update/draw functions on one entity at a time). I quickly realized this was a great decision for graphics optimization, as it allowed me to eliminate a lot of OpenGL state change calls. In fact, the simple PAX renderer (theoretically) runs substantially faster than the old C++ LT renderer. But, most importantly, this design ended up being critical to opening a path for a limited amount of game logic threading.
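To make the batch-style update concrete, here is a minimal sketch in C. The `Ship` struct and `ships_update` function are hypothetical names invented for illustration, not the actual PAX demo code; the point is only that one call walks a whole list of same-typed objects instead of calling an update function on one entity at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component data; the field names are illustrative only. */
typedef struct {
    float x, y;   /* position */
    float vx, vy; /* velocity */
} Ship;

/* Batch update: a single call processes every ship in the list, so the
   call overhead is paid once per list rather than once per entity. */
void ships_update(Ship *ships, size_t count, float dt) {
    assert(ships != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        ships[i].x += ships[i].vx * dt;
        ships[i].y += ships[i].vy * dt;
    }
}
```

Because each element is touched independently of the others, a loop like this is also the kind of work that can be cut into ranges and handed to separate threads.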
Now, in C(++), this would have been a fairly simple ordeal. But we're in LJ. Lua has no native support for threading, but it's designed such that there's nothing stopping you from running multiple interpreter instances on multiple threads. The real problem, however, is that each such instance has its own data -- there would be no direct 'sharing.' Shared memory is absolutely necessary for getting any performance gain out of threaded logic. The alternative (and the mechanism that existing Lua threading packages use) is to send data as necessary between the different interpreters via communication channels. Frankly, this is a waste of time for most high-performance code. In my case, the time it would take to send even a basic snapshot of the state of some objects to another thread in order to have it process them would vastly outweigh the time it would have taken to simply perform the logic on one thread. I don't need to do a perf test to know that non-shared-memory multithreading is a waste of my time. OTOH, Lua isn't built for shared-memory threading. Two different interpreter states cannot share memory.
Anddd that's how my Friday was spent: striking out the 'cannot' and replacing it with 'should not try to.' I devised a decent little hack to force the interpreters to share memory. I had no illusions about how dangerous this was. Nonetheless, I basically leapt out of my chair with joy when a simple test actually showed my mechanism to be working, and showed threads smoothly sharing memory. It was fast. It was correct (no memory corruption). I built a threadpool utility and some functions to help me control an arbitrary number of worker threads splitting parallel code paths among themselves. Around midnight, I had finally put enough gears in place to try it out on the real thing: threading the logic for ships in the PAX demo.
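For a rough picture of what "worker threads splitting parallel code paths among themselves" looks like when memory really is shared, here is a small sketch in plain C with POSIX threads. It is not the Lua hack described above, and it is not a persistent threadpool: it simply spawns workers per call and joins them. `parallel_for`, `RangeFn`, `Job`, `MAX_WORKERS` and `scale_range` are all made-up names for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16 /* illustrative upper bound, not a tuned value */

/* Hypothetical callback: process items [begin, end) of a shared array. */
typedef void (*RangeFn)(void *data, size_t begin, size_t end);

typedef struct {
    RangeFn fn;
    void *data;
    size_t begin, end;
} Job;

static void *worker_main(void *arg) {
    Job *job = (Job *)arg;
    job->fn(job->data, job->begin, job->end);
    return NULL;
}

/* Split [0, count) into contiguous chunks, at most one per worker. Every
   worker receives the same `data` pointer: nothing is copied or sent. */
void parallel_for(RangeFn fn, void *data, size_t count, size_t workers) {
    pthread_t threads[MAX_WORKERS];
    Job jobs[MAX_WORKERS];
    size_t started = 0;

    assert(workers >= 1 && workers <= MAX_WORKERS);
    size_t chunk = (count + workers - 1) / workers; /* ceiling division */

    for (size_t i = 0; i < workers; ++i) {
        size_t begin = i * chunk;
        if (begin >= count) break;
        size_t end = (begin + chunk < count) ? begin + chunk : count;
        jobs[i] = (Job){fn, data, begin, end};
        if (pthread_create(&threads[started], NULL, worker_main, &jobs[i]) == 0) {
            ++started;
        } else {
            fn(data, begin, end); /* could not spawn: run on the calling thread */
        }
    }
    for (size_t i = 0; i < started; ++i) {
        pthread_join(threads[i], NULL);
    }
}

/* Example callback: double every float in the assigned range. */
void scale_range(void *data, size_t begin, size_t end) {
    float *values = (float *)data;
    for (size_t i = begin; i < end; ++i) {
        values[i] *= 2.0f;
    }
}
```

Calling `parallel_for(scale_range, values, n, 4)` doubles all `n` floats in place using up to four threads. The chunks do not overlap, so no locking is needed; that only holds while each item's update touches nothing outside its own range.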
And that's when everything caught on fire.
It took about five minutes to come to the definitive conclusion that shared-memory threading of any real complexity (e.g. accessing tables within tables) isn't possible in Lua. Best guess is that the garbage-collected architecture screws everything up. Instead of pointers, tables likely use state-relative indices, breaking any attempts to access memory from other interpreters. Whatever the case, I'm nearly certain that the GC is to blame. I can't possibly overstate how much I hate garbage collection.
All in all, a fairly heartbreaking week. Threading would have pushed me over the edge into the green -- I would have had a comfortable amount of headroom. Alas, 'twas not meant to be.
I'm not ready to give up on LJ yet. It has still come closer than any other solution. I will not discard it lightly. This week, I will delve even deeper into learning performance characteristics of LJ -- in particular, there is a built-in profiler that is (apparently) very good. I need to get that running so I can see what's going on and where my opportunities for buying time are. Catch is, to run the profiler I have to be running my program directly from LJ, not launching LJ from my program. A little finessing of the C engine will be necessary for me to convert it into a shared library, and then a little time will be required to make a Lua script that does the (basic) functionality that the C core did with respect to starting things up. Shouldn't be long before I can grab some nice profile results. After that? Who knows. I will try whatever I can. The goal is to buy headroom. The goal is to push LJ into the green.
I believe it can be done. Whether I can figure out how is another matter, and I suppose we'll all have to stay tuned to find out.
---
PS ~ In case anyone was going to point it out, the 'thread' construct in Lua (really a coroutine) is deceptively named, and has nothing to do with true hardware parallelism. It's not helpful in this quest, sadly. Also, LuaLanes, luasched, LuaTask, etc., etc...none of the existing packages for Lua threading implement the kind we need (shared-memory, preemptive). 'LuaThread' claims to be shared-memory and preemptive. Sadly, it has mysteriously disappeared from the internet, and the only similarly-named impostors I can find do not implement true shared-memory/preemptive threading (likely because, as I found, it is not possible).