Limit Theory Forums

Posted: **Mon Oct 09, 2017 1:12 pm**

So I was watching this video on how some guys over at Iron Galaxy developed an AI development method called the Shadow System. Long story short, during training sessions and regular play, everything about the world state and the player's actions is recorded in discrete "replay segments". The AI is trained on these replay segments of real players, learning how to be aggressive, how to defend, counter, dodge, combo, etc. in that particular player's playstyle. When it is then run in real time, the AI compares the current world state against similar world states that have been recorded, and then executes the replay segment with the closest starting world state.

I think this would be a fantastic way to train AI for LT, at least for some things like dogfighting and battle tactics. For long-term business strategies and exploration strategies, I have some ideas, but they would need to be tested.

I recommend you watch the video above but I'll include my notes in the spoiler.

Spoiler: SHOW

Now, the bulk of the video talks about 1v1 combat, which will of course happen, but presumably there will usually be more than two entities in a fight, and often there will be a variety of enemies, friendlies, neutrals, stations, planets, wormholes, etc. So even with this sort of system, accounting for everything will be impossible. But we can try.

Because most combat will probably be fleet v fleet combat, some additional things will need to be added.
First, a training command interface.
The training command interface allows a player to:

assign objectives - kill, destroy, protect, barricade, etc.
select single or multiple units and assign definable roles and hierarchies - artillery, flank, skirmish, escort, guard, etc.
give priorities to different assets/targets
select individual units/squadrons in a fleet to take control of in the middle of a battle to train that particular strategy/tactic
trigger events such as reinforcements, presence of neutrals, units switching sides in the middle of battle
define active contracts and relations between NPCs, such as assassination contracts held by a neutral, unsuspected party, or bribes which have been paid to make certain NPCs double-cross their team.

My Dojo's idea comes to mind here, specifically as an isolated training ground where you can practice your own skills whether beginner or expert, fight yourself, train the AI, and set up any number of scenarios from an Ambush to a Siege, destroying a mining operation/research lab, raiding and robbing traders, and so on.

Now, the biggest difference between this shadow system used for 1v1 vs 1000v1000 is the amount of information being recorded and accessed at any given time. It's unclear how much of this sort of information could be handled in a training session or in real-time play. Ideally each ship is aware of critical information and changes regarding its role, while different NPCs access all sorts of different information simultaneously. The fighter pilot pays attention to their squadmates' positions in relation to their own; the general doesn't care about the distance between individual fighters, but does care about the position of different squadrons. So the general AI is trained differently than the fighter AI, while the game is aware of all details at all times.

Beyond combat, it might be possible to Shadow-train AI in competitive but not combative things like business.
But business decisions can be much more abstract than combat ones. One idea for that is an option to create "Business plans" where the player defines the 1: ??? 2: ??? 3: ??? 4: Profit steps they intend to carry out. The plan itself monitors how close they are to achieving each step and what actions they took during that time, whether or not those actions seemed immediately relevant, since they may have been laying groundwork or contingencies for future steps.
The plan can also monitor different aspects of the game such as the market, any attacks/raids that occur, or other player-defined metrics.
The business plan can also be changed, but such changes can be logged as an adaptation to the new situation, such as reducing expected mining output, or "~~kill~~ buy off competitors", and so on.

In typing this out, it occurred to me that while such training has the potential to make impressively human-like AI, it might have too much information from too many people for too many possible scenarios to be practical for a world filled with infinite ProcGen AI. I'm not an expert, so maybe this exists, but there would need to be a way to aggregate the data from lots of training scenarios into a handful of files which adhere to the most commonly trained patterns, yet can implement variations in a similar fashion to the way humans do. But such a solution will take a more technically savvy mind than my own.

Anyways, thoughts?

Posted: **Mon Oct 09, 2017 4:33 pm**

Hyperion wrote: ↑
Mon Oct 09, 2017 1:12 pm
So I was watching this video on how some guys over at Iron Galaxy developed an AI development method called the Shadow System. Long story short, during training sessions and regular play, everything about the world state and the player's actions is recorded in discrete "replay segments". The AI is trained on these replay segments of real players, learning how to be aggressive, how to defend, counter, dodge, combo, etc. in that particular player's playstyle. When it is then run in real time, the AI compares the current world state against similar world states that have been recorded, and then executes the replay segment with the closest starting world state.

I think this would be a fantastic way to train AI for LT, at least for some things like dogfighting and battle tactics. For long-term business strategies and exploration strategies, I have some ideas, but they would need to be tested.

I recommend you watch the video above but I'll include my notes in the spoiler.
Spoiler: SHOW
AI for competitive Games

Shadow system - Copies player behavior

Training Sessions pitting character against copy of self, or against others
Step 1: Give the AI examples of how to be aggressive: Different kinds of attacks directed at a passive target, including up close, distance, combos, attacks with a set-up (jump attacks)
Step 2: Give AI examples of how to be defensive: Defend against the attacks you just showed it (Blocks, dodges, distance & orientation from enemy)
Step 3: during defense training, add countermeasures and counterattacks
Step 4: Repeat above until a variety of attacks, defenses, counters have been learned.

THIS IS A TRAINED AI WITH AN INDIVIDUAL PLAYER'S UNIQUE STYLE
The more training different AI's receive from a single player, the more intelligently that AI can respond to a number of different situations.
The more AIs which are trained by different players, the more unique encounters a player can experience with an AI

Training Sessions
RECORD EVERYTHING
World-states (position, speed, orientation, inventory, health, stamina, etc)
Actions the player does at a given world-state
Replay segments
short (~2-10sec), optionally player defined
Note beginning worldstate, actions, and final worldstate
With a bank of replay segments, find the best matching replay segment for current worldstate to get to most desirable worldstate at end of segment
Mind-Games
Guessing what the player will do next based on what they've done in the past
Include reactions and reaction time where the player is just standing
"It's possible to pre-aim and pre-fire, but it's not possible to have a 10ms reaction time."

The Shadow System - Architecture
To determine how similar 2 situations are, the SS uses Similarity Functions, Weights, Heuristics

Similarity (Health and Distance) ultimately defined by designer
Find closest replay example with that worldstate and execute that segment

Create a Score
Difference in health * weight = health score
Diff distance * weight = distance score
Diff Timer * w = timer score
Diff Meter * w = Meter Score
Diff Ammo * w = Ammo Score
Diff Stat * w = Stat Score
Add all scores, replay segment with lowest total is best match
Defining the importance of each is the main thing in balancing
Rather than a difficulty slider for accuracy and reaction time, measure things like distance, visibility, direction to target
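
The scoring steps in these notes can be sketched roughly as below. This is only a minimal illustration, not Iron Galaxy's actual code: the feature names, the weights, the `ReplaySegment` class, and the use of an absolute difference are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical feature names and weights, chosen only for illustration.
WEIGHTS = {"health": 1.0, "distance": 2.0, "timer": 0.5}

@dataclass
class ReplaySegment:
    name: str
    start_state: dict   # world state at the start of the segment
    actions: list       # recorded player inputs to play back

def score(current: dict, recorded: dict, weights: dict = WEIGHTS) -> float:
    """Sum of weighted absolute differences; lower means more similar."""
    return sum(w * abs(current[k] - recorded[k]) for k, w in weights.items())

def best_match(current: dict, bank: list) -> ReplaySegment:
    """Return the segment whose starting world state scores lowest."""
    return min(bank, key=lambda seg: score(current, seg.start_state))

bank = [
    ReplaySegment("rush_in", {"health": 90, "distance": 40, "timer": 60}, ["dash", "attack"]),
    ReplaySegment("back_off", {"health": 20, "distance": 5, "timer": 30}, ["block", "retreat"]),
]
now = {"health": 25, "distance": 8, "timer": 35}
chosen = best_match(now, bank)
```

Tuning `WEIGHTS` is where the "defining the importance of each" balancing work would happen: doubling the distance weight, for example, makes the shadow care more about range than about health when picking a segment.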

Heuristics
Players keep track of info that the game doesn't to change their behavior (how often opponent attacks high vs low and change block tactics to match)
Deliberately adding tracking for these trends lets shadows adjust their behavior like players do, or not, if the player does not
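
A trend tracker of this kind might look something like the sketch below. The class name, the "high"/"low" labels, and the `opponent_high_ratio` feature are made up for illustration; the video does not specify how the trends are stored.

```python
from collections import Counter

class AttackTrendTracker:
    """Counts an opponent's attack heights (hypothetical 'high'/'low' labels)."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, height: str) -> None:
        self.counts[height] += 1

    def high_ratio(self) -> float:
        total = self.counts["high"] + self.counts["low"]
        # With no observations yet, assume an even split.
        return 0.5 if total == 0 else self.counts["high"] / total

tracker = AttackTrendTracker()
for height in ["high", "high", "low", "high"]:
    tracker.observe(height)

# The ratio can be added to the world state as one more weighted feature.
world_state = {"health": 80, "distance": 12, "opponent_high_ratio": tracker.high_ratio()}
```

Because the ratio becomes part of the recorded world state, a shadow only reacts to it if its trainer did: a player who ignored the trend leaves segments where the ratio has no consistent effect.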

Because it's recording everything, the shadow system will capture strategic and tactical behaviors without explicit knowledge of them. It will also capture social behaviors.

Social coordination tracking
Position, route, actions, status of friendlies
position of Enemies
Status of Objectives

Will keep up with players as it's continuously copying behaviors as the metagame shifts

Pull in real data from other players
Self Reflection - Play against yourself to identify weaknesses
Filling Gaps - Use shadows to fill in vacancies for matchmaking
Dropin-Dropout - Take over a shadow at any point, or drop out and leave a shadow
Remix and breed shadows - Make new opponents that get better over time
AI tournaments - AI vs AI

I like these ideas and would certainly want my assets behaving in a manner and with similar skill to my own. Maybe this could be a simulator room where the player is free to perform various tasks and then have those actions copied onto something similar to a data chip which becomes an in-game item that can be used to train the player's assets in particular skills.

I see this being applied to a number of areas besides combat and trading. It could be useful for finding and mining asteroids efficiently, scanning for long range threats, among other things.

Posted: **Mon Oct 09, 2017 4:43 pm**

I'm not very optimistic about "copy the human player" as a learning model. What if the human actions are sub-optimal? What if there are multiple humans whose ideas of "best" actions are different -- couldn't that confuse the naïve NPC learner? What if some tactical problems have solutions that no human might imagine but an AI, unconstrained by human preconceptions, might produce?

I see that as a relatively brittle method.

AI-vs-AI learning, though... that, I like. I had some thoughts on how to "breed" better NPC ship behaviors on my own blog back in 2009, which I then tweaked for LT in the NPC Ship Combat Behavior Options thread. You might find some ideas there that could be applicable to this thread.

For discussion, though, three points:

1. NPC ships must have a wide range of competency levels. It cannot be enough that the bigger ship always wins -- boring!

In the context of this post, suppose we could find a way to train tactical brilliance into ship NPCs -- I contend we would not want every NPC pilot to operate at that level of skill.

It will IMO be vastly more fun to go with the model developed in 1990 for Wing Commander. In WC, a lot of ships were basically cannon fodder... but in certain missions, you could run into an enemy "ace" pilot. This simple two-state model was incredibly satisfying; after solving a "how do I beat these groups of unimaginative opponents" problem, you're suddenly faced with a single ship that flies faster, rolls faster, turns faster, and seems to anticipate your responses before you make them. Instead of nameless antagonists, now you're dealing with an enemy with a name who seems to have a personal vendetta against you. (Shades of the Nemesis System from Shadow of Mordor, which Cornflakes has been kind enough to explain to me.)

In the case of LT, I'd suggest that while it might be cool to train brilliant NPC pilots, that should not imply that every NPC pilot we encounter will express that level of competence. A visibly wide range of competencies will be more fun.

2. Environment must matter. I am tired beyond words of games whose designers think that "magic powers" specific to a character are what constitute tactical competence. Oh, hey, look, NPC Bob is spamming the magic power Fireball; "I can tactics!" No. No, you can't.

I've yammered about tactics elsewhere; in the context of this conversation, I'm suggesting that NPC pilot AI needs to understand not just "these are my weapons and defenses," but "I can recognize environmental phenomena in space, such as asteroids, radiation fields, and Space Dragons, and I know how to adapt my tactics to use those local phenomena to my advantage and/or my opponent's disadvantage."

3. Group intelligence > individual intelligence. In other words, "tactics" needs to be about more than just what an individual NPC pilot can do -- what about coordinated group actions?

NPC pilots, under certain circumstances, ought to be able to coordinate their actions so that the whole group becomes more effective than just the sum of its individual ships. So in addition to individual tactics, NPC ships ought to be able to learn squadron tactics; and beyond that there could even be fleet (mixed ship types) tactics.

As with individual NPC pilot competence, group competence should vary. A well-trained squadron might have a number of effective tricks for luring opponents into kill zones; a motley bunch of pirates might get in each other's way and refuse to defend each other.

Finally, if you really want an interesting idea for how NPCs might be persuaded to satisfy player interests, consider this JoshPost from August 21, 2013:

JoshParnell wrote: ↑
Wed Aug 21, 2013 3:51 pm
So before LT, I had an idea for a style of game creation that I wanted to try out someday, that relies exactly on evolution of procedural algorithms as you suggest. My idea was:

Build algorithm for a class of content, doesn't necessarily have to be consistently good - just capable of good output (much easier than building a consistently-good algorithm)

Expose the parameters of the algorithm to some sort of nonconvex optimization procedure (evolutionary computation comes to mind)

Build a simple website where two outputs are displayed side-by-side and the user is asked to choose the most appealing ("which weapon looks more aesthetically-pleasing")? This is essentially powering the optimizer's ability to make decisions about the fitness of an output.

Get a crowd of people who are interested in helping build the game (just by doing side-by-side content comparisons) and...evolve the algorithms to perfection!
Alternatively, you don't even have to evolve the algorithms, maybe just store the best-rated outputs if you want to go for a manual content game.

It's really "crowd-sourced" game development! I think it would be a really unique and interesting way of building content.

Posted: **Tue Oct 10, 2017 10:58 am**

Flatfingers wrote: ↑
Mon Oct 09, 2017 4:43 pm
Finally, if you really want an interesting idea for how NPCs might be persuaded to satisfy player interests, consider this JoshPost from August 21, 2013:

and using Hyperion's suggestion is fundamentally the same, except it provides the optimiser more data to work with.
it's not just a binary yes/no question but a "what would you do"

the same fundamental limitations still apply: people being people and doing things differently from each other.

also, the part where people do things differently could even be used as a feature
when you already have a couple of people doing AI training, keep their data separate and use that as different personalities
with people playing suboptimally (because they are people, not robots..) being a feature, as the AI then makes errors itself and doesn't act optimally all the time.

what i see as a bigger problem is how to store and search the recorded data sets fast and efficiently.
there's a lot of possible scenarios and a lot of possible reactions to those, that's a lot of data that has to be distributed in a game (and not as an attachment to some research paper) and to be searched a hundred times per second in a battle
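
One common way to cut down the search cost is to bucket recorded world states into coarse grid cells, so that only a handful of segments need a full similarity score each tick. The sketch below shows that idea only; it is not anything from the video, and the bucket widths, feature names, and `SegmentIndex` class are assumptions.

```python
from collections import defaultdict

# Hypothetical bucket widths per feature; coarser buckets mean fewer, larger cells.
CELL = {"health": 25, "distance": 10}

def cell_key(state: dict) -> tuple:
    """Quantize a world state into a coarse grid cell."""
    return tuple(int(state[k] // CELL[k]) for k in sorted(CELL))

class SegmentIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, state: dict, segment_id: str) -> None:
        self.cells[cell_key(state)].append((state, segment_id))

    def candidates(self, state: dict) -> list:
        """Segments in the same cell; only these need a full similarity score."""
        return [seg for _, seg in self.cells.get(cell_key(state), [])]

index = SegmentIndex()
index.add({"health": 90, "distance": 42}, "rush_in")
index.add({"health": 95, "distance": 48}, "strafe")
index.add({"health": 20, "distance": 5}, "back_off")
nearby = index.candidates({"health": 80, "distance": 41})
```

The lookup is a single dictionary access instead of a scan over every segment. The trade-off is that a close match sitting just across a cell boundary is missed unless neighbouring cells are checked too, and it does nothing for the download size of the data itself.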

Posted: **Tue Oct 10, 2017 12:09 pm**

From what I understand, recent Forza racing games use a technique like this to model human players' actions and create AI drivers that behave and drive like individual players; they then use these to populate single player races. Playing against them, opponent drivers feel more aggressive and demonstrate human-like behavior, such as mistiming turns or ramming other opponents. I'm not sure how much of this is bad driving AI and how much of it is humans being bad at racing games, and it is difficult to tell when behavior is just artificial rubberbanding, but at the very least it is a cool gimmick.

Posted: **Tue Oct 10, 2017 9:49 pm**

Cornflakes_91 wrote: ↑
Tue Oct 10, 2017 10:58 am
what i see as a bigger problem is how to store and search the recorded data sets fast and efficiently.
there's a lot of possible scenarios and a lot of possible reactions to those, that's a lot of data that has to be distributed in a game (and not as an attachment to some research paper) and to be searched a hundred times per second in a battle

That's actually an argument in favor of defining a "perfect" NPC pilot (as I suggested) and then, for each instance of an individual pilot, degrading its competence according to that pilot's personality traits.
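
As a rough sketch of what "degrading" a perfect pilot per personality could mean, assuming two made-up traits (`accuracy` and `discipline`) and an arbitrary 30-degree error cap:

```python
import random
from dataclasses import dataclass

@dataclass
class PilotTraits:
    # Hypothetical traits, each in [0, 1]; 1.0 means no degradation.
    accuracy: float
    discipline: float

def ideal_aim(target_bearing: float) -> float:
    """Stand-in for the 'perfect' pilot: aims exactly at the target bearing."""
    return target_bearing

def degraded_aim(target_bearing: float, traits: PilotTraits, rng: random.Random) -> float:
    """Add aim error that grows as accuracy drops (max 30 degrees, an arbitrary cap)."""
    max_error = 30.0 * (1.0 - traits.accuracy)
    return ideal_aim(target_bearing) + rng.uniform(-max_error, max_error)

def holds_formation(traits: PilotTraits, rng: random.Random) -> bool:
    """Undisciplined pilots sometimes ignore the ideal 'stay in formation' choice."""
    return rng.random() < traits.discipline

rng = random.Random(42)
ace = PilotTraits(accuracy=1.0, discipline=1.0)
rookie = PilotTraits(accuracy=0.4, discipline=0.5)
ace_shot = degraded_aim(90.0, ace, rng)
rookie_shot = degraded_aim(90.0, rookie, rng)
```

Only the trait values need to be stored per pilot, which is the storage argument being made here: one shared "perfect" behavior plus a few numbers per NPC, rather than a bank of recordings per personality.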

Posted: **Wed Oct 11, 2017 1:55 pm**

Flatfingers wrote: ↑
Tue Oct 10, 2017 9:49 pm
That's actually an argument in favor of defining a "perfect" NPC pilot (as I suggested) and then, for each instance of an individual pilot, degrading its competence according to that pilot's personality traits.

building a perfect pilot is a completely different thing than compressing data effectively, though.
The first thing is hard to test, the second thing not.

Building a perfect pilot needs a lot of testing anyway, so why not record and use the test sessions?

Posted: **Wed Oct 11, 2017 2:05 pm**

Someone say Shadows?

Spoiler: SHOW

--IronDuke

Posted: **Wed Oct 11, 2017 6:16 pm**

Cornflakes_91 wrote: ↑
Wed Oct 11, 2017 1:55 pm
Building a perfect pilot needs a lot of testing anyway, so why not record and use the test sessions?

I think you'll find that all solutions to codifying NPC pilot expertise will need to be tested pretty thoroughly.

So setting that aside, I wonder if what we're looking at here is the difference between action-based training (how closely did an NPC copy the trainer's actions?) and outcome-based training (who won across many simulated fights?).

What I'm thinking is that action-based training is more brittle than outcome-based training. That is, action-based training can be effective as long as the real scenario closely matches the training scenario; but outcome-based training becomes more effective as scenario components become more complex and variable.

The reason I think outcome-based training -- pitting pilots against each other in many, many simulated matchups -- is preferable for LT is because I'm hoping that, as I've suggested many times, many parts of normal space will be full of interesting "terrain." (Otherwise "tactics" is just who has more/bigger guns.) So, if space is relatively complex, and the terrain of local environments can vary a fair amount, then I believe it'll be more cost-effective to use a dynamic, simulated, outcome-based training model than to require a human to carefully define, and create, and correctly (as a pilot) take advantage of every possible combination of environmental features in an action-based training model.

For a real-world example of this, consider the times when you've had a computer problem and needed to talk to Tech Support. Who would you rather talk to: someone who's been trained to read from a pre-defined set of scripts? Or someone who's been trained to be good at general technical problem-solving?
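
To make the outcome-based idea a bit more concrete, here is a toy sketch in which candidate pilots are ranked purely by who wins across many simulated fights. The `simulate_fight` function is a placeholder for a real combat simulation, and reducing each pilot to a single "skill" number is an assumption made only to keep the example short.

```python
import itertools
import random

def simulate_fight(skill_a: float, skill_b: float, rng: random.Random) -> bool:
    """Toy stand-in for a full combat simulation: True if pilot A wins.
    The win chance depends only on relative skill, which is a big simplification."""
    return rng.random() < skill_a / (skill_a + skill_b)

def rank_by_outcome(pilots: dict, rounds: int, rng: random.Random) -> list:
    """Round-robin tournament; returns pilot names sorted by total wins."""
    wins = {name: 0 for name in pilots}
    for _ in range(rounds):
        for a, b in itertools.combinations(pilots, 2):
            winner = a if simulate_fight(pilots[a], pilots[b], rng) else b
            wins[winner] += 1
    return sorted(wins, key=wins.get, reverse=True)

# Hypothetical candidate behaviors, reduced to a single 'skill' number here.
pilots = {"cautious": 1.0, "reckless": 0.5, "flanker": 3.0}
ranking = rank_by_outcome(pilots, rounds=200, rng=random.Random(0))
```

Nothing in the ranking step looks at what the pilots actually did, only at who won. That is the difference from action-based training, where the score would be how closely a pilot's actions matched a recorded trainer.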

IronDuke wrote: ↑
Wed Oct 11, 2017 2:05 pm
Someone say Shadows?
Spoiler: SHOW

"The Babylon Project was our last, best hope for peace.

...it failed."

Posted: **Thu Oct 12, 2017 5:16 am**

Flatfingers wrote: What if the human actions are sub-optimal? What if there are multiple humans whose ideas of "best" actions are different -- couldn't that confuse the naïve NPC learner?

And? LT is trying to make the NPCs feel like real people, sub-optimal and tactically misguided people included.

Cornflakes_91 wrote: what i see as a bigger problem is how to store and search the recorded data sets fast and efficiently.
there's a lot of possible scenarios and a lot of possible reactions to those, that's a lot of data that has to be distributed in a game (and not as an attachment to some research paper) and to be searched a hundred times per second in a battle

This is for me the bigger issue too. The ability to have real human input, but in a way that doesn't require 10,000+ action-plan files for different situations. That's an issue I haven't been able to tease apart.

I'm okay with building the perfect pilot and then making it suboptimal in a couple dozen different ways and different magnitudes of suckage in each to get an enormous variety of behavior, but I think getting real data of human suckage and skill is preferable.

1. NPC ships must have a wide range of competency levels. It cannot be enough that the bigger ship always wins -- boring!

Of course. But the training on human data isn't meant to train tactical brilliance; it's meant to train tactical realism. Not everyone who can fly is an ace pilot, and not every trainer knows what they're doing.

2. Environment must matter. I am tired beyond words of games whose designers think that "magic powers" specific to a character are what constitute tactical competence. Oh, hey, look, NPC Bob is spamming the magic power Fireball; "I can tactics!" No. No, you can't.

1,000,000% agree. AI will of course need to have information on environmental conditions, but going from "There is an asteroid field 10,000km to my starboard" to "I can land on an asteroid and go dark to hide myself" is a significant hurdle. An advanced neural network might be able to figure it out, but barring that, I would say only directly programming that option, or training from human data in which a human did just that, would get the AI to do it as well.

3. Group intelligence > individual intelligence. In other words, "tactics" needs to be about more than just what an individual NPC pilot can do -- what about coordinated group actions?

That is why I suggested a training command interface, where a human can take control of not just a single ship, but of a whole squadron or whole fleet, create their own defined groups and then train those ships as a single unit, the same way they would train a single ship to handle different situations.

Finally, if you really want an interesting idea for how NPCs might be persuaded to satisfy player interests, consider this JoshPost from August 21, 2013:

You might like this video. My own thought on that was to have an algorithm that creates not 2, but 16 children each with a different slight variation from their parent(s), and the human chooses from 0-16 of the children to survive to the next generation, breeding the survivors with a few mutations. And yes, this could be applied to all sorts of things from aesthetics to combat.
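
A bare-bones sketch of that breeding loop, for the curious. The 16 children per generation comes from the idea above; the genome layout, mutation strength, crossover rule, and the fallback used when zero children are picked are all assumptions.

```python
import random

CHILDREN_PER_GENERATION = 16  # from the post; everything else here is assumed

def mutate(genome: list, rng: random.Random, strength: float = 0.1) -> list:
    """Copy the parent's parameters with a small random nudge on each one."""
    return [g + rng.gauss(0.0, strength) for g in genome]

def crossover(mom: list, dad: list, rng: random.Random) -> list:
    """Take each parameter from one parent or the other at random."""
    return [rng.choice(pair) for pair in zip(mom, dad)]

def next_generation(survivors: list, fallback: list, rng: random.Random) -> list:
    """Breed 16 children from the human-picked survivors.
    If the human picked none, restart from mutated copies of the fallback parent."""
    parents = survivors or [fallback]
    children = []
    for _ in range(CHILDREN_PER_GENERATION):
        mom, dad = rng.choice(parents), rng.choice(parents)
        children.append(mutate(crossover(mom, dad, rng), rng))
    return children

rng = random.Random(1)
seed_parent = [0.5, 0.5, 0.5]            # hypothetical tunable parameters
generation = next_generation([], seed_parent, rng)
picked = generation[:3]                   # stands in for the human's choice
generation = next_generation(picked, seed_parent, rng)
```

The `picked` line is where the human comes in: in a real tool it would be whatever subset of the 16 the player clicked on, whether those are ship hulls or combat behaviors.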

AI training with Shadows
