
AI gradient descent algorithm

#1
I would like to unify the entire AI under a gradient descent algorithm. That's basically what the maneuvering piece of the AI uses already – it computes a potential field based on the sum of all the actions that it wants to perform, then it uses gradient descent to follow the most favorable path through that field. I'd like to do the exact same thing with all actions, not just low-level maneuvering. To do so, however, I need to introduce low-level metrics to which the AI can dynamically attach scores.
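To make the maneuvering case concrete, here is a minimal sketch of potential-field steering via discrete gradient descent; the field shape, weights, and step size are invented for illustration and are not LT's actual code.

```python
import math

# Hypothetical illustration of potential-field steering: the ship samples the
# field around its position and steps toward the point of lowest potential.
# The attractor/repulsor weights and the step size are made-up parameters.

def potential(pos, goals, threats):
    """Sum of attractive wells (goals) and repulsive peaks (threats)."""
    x, y = pos
    p = 0.0
    for (gx, gy), weight in goals:
        p += weight * math.hypot(x - gx, y - gy)           # pull toward goals
    for (tx, ty), weight in threats:
        p += weight / (math.hypot(x - tx, y - ty) + 1e-3)  # push away from threats
    return p

def descend_step(pos, goals, threats, step=0.5):
    """Numerical gradient descent: probe nearby points, move to the best one."""
    x, y = pos
    candidates = [(x + dx, y + dy)
                  for dx in (-step, 0.0, step)
                  for dy in (-step, 0.0, step)]
    return min(candidates, key=lambda c: potential(c, goals, threats))

pos = (0.0, 0.0)
goals = [((10.0, 4.0), 1.0)]     # waypoint the ship wants to reach
threats = [((5.0, 2.0), 8.0)]    # obstacle to steer around
for _ in range(30):
    pos = descend_step(pos, goals, threats)
print(pos)  # steps toward the goal while skirting the threat
```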
This is pretty brilliant--and presumably one could adjust the AI difficulty by adjusting the speed of descent (e.g., step length). The unification is elegant and simple.

However, if Josh is introducing low-level metrics to attach scores, what guarantees that the scores will be differentiable (assuming a Newton-type algorithm)? Or is Josh using some amoeba-like method of descent?

Re: AI gradient descent algorithm

#4
Everything is treated discretely, so it doesn't matter whether the metrics are technically differentiable or not: you just measure the metric before and after performing an action, which tells you that action's impact on that metric. You then look at how you value the metric, and, by linearity, the value coefficient times the discrete change is what you might call the "value" of performing that action. E.g. my reputation with B increases by X when I perform A. My credits also decrease by Y. I value B's reputation at Z per unit and credits at W per unit. Hence, the action is worth Z*X - W*Y to me.

The equivalent of gradient descent with actions is just selecting the highest-value set of actions to perform, and then performing them.
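Read concretely, that scoring might look something like the following sketch; the metric names and numbers are invented, and only the "value coefficient times discrete change" idea comes from the post itself.

```python
# Hypothetical sketch of the discrete "value of an action" idea: measure each
# metric before and after the action, weight the change by how much the agent
# values that metric, and sum. All numbers are made up.

value_per_unit = {"reputation_B": 3.0, "credits": 1.0}   # Z and W in the post

def action_value(metric_deltas, value_per_unit):
    """Value of an action = sum over metrics of (value coefficient * change)."""
    return sum(value_per_unit[m] * delta for m, delta in metric_deltas.items())

# E.g. performing A raises reputation with B by X = 2 and costs Y = 5 credits,
# so the action is worth Z*X - W*Y = 3*2 - 1*5 = 1.
actions = {
    "perform_A":     {"reputation_B": +2.0, "credits": -5.0},
    "mine_asteroid": {"reputation_B": 0.0, "credits": +0.8},
}

best = max(actions, key=lambda a: action_value(actions[a], value_per_unit))
print(best, action_value(actions[best], value_per_unit))
```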

Hope that answered the questions :)
“Whether you think you can, or you think you can't--you're right.” ~ Henry Ford

Re: AI gradient descent algorithm

#5
Steepest descent is a greedy algorithm that is quite effective for "simple" problems but usually fails to find optimal solutions for problems with complex objective-value topography (many local optima).
So you should add a "long-term/long-view" component to AI thinking in order to make sure that an AI does not remain stuck in a local optimum, i.e. a sub-optimal behavior that requires a "tough" decision to get away from.

An example: you are in a system with asteroids, decide to mine, get equipped with mining equipment, and make some runs. Then you discover that the minerals are scarce or, for some reason, in low demand.
Steepest descent would still keep mining, because the alternative to the small gain is actually a short-term loss: having to buy, e.g., weapons to start a new career as a bounty hunter.

There are many solutions to this problem, the more so when a good, as opposed to a provably optimal, solution must be found. An example is tabu search, which uses memory to escape local minima and is particularly fitting for simulating people learning from past experience. In its simple form, it consists of keeping track of past decisions and avoiding falling back into the same pattern by making a return to a past decision taboo. This behavior could e.g. be triggered when an AI suffers a steady decline in revenue.
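As a minimal sketch of tabu search in this setting (the activity names, profit numbers, and tabu length are all invented for illustration):

```python
from collections import deque
import random

# Hypothetical sketch of a tabu-style rule for career choices: an activity the
# agent recently walked away from stays forbidden for a few decision cycles,
# forcing it to try something else instead of sliding back into the old rut.

TABU_LENGTH = 5  # how many decisions an abandoned activity stays forbidden

def choose_activity(estimated_profit, tabu):
    """Pick the best-looking activity that is not currently taboo."""
    allowed = {a: p for a, p in estimated_profit.items() if a not in tabu}
    return max(allowed, key=allowed.get)

tabu = deque(maxlen=TABU_LENGTH)
current = "mining"
for step in range(20):
    profit = {"mining": random.uniform(0, 2),
              "trading": random.uniform(0, 5),
              "bounty_hunting": random.uniform(-1, 8)}
    choice = choose_activity(profit, tabu)
    if choice != current:
        tabu.append(current)   # don't fall straight back into the old pattern
        current = choice
```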

Re: AI gradient descent algorithm

#6
Yes, descent is as smart or stupid as the objective function allows it to be :geek:

But the topology of my objective function is automatically influenced by the perception of hypothetical long-term gains, which allows the AI to reason about everything from ultra-short-term, reflex-like actions all the way to long-term life goals under the exact same framework. That was kind of the point in moving to a purely gradient-based method :)

In your example, the recognition that a career change would be the most lucrative course of action would cause the AI to propagate value backwards through the objective function, until it landed on the "buy weapons" action, for example. The backwards-propagation of value would allow for overcoming what would otherwise be a locally-suboptimal decision.

I've seen this technique before in Q learning, but I'm not sure if this exact formulation has a name.
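To illustrate the backward propagation with a toy value-iteration-style sweep (the action graph, rewards, and discount factor are invented; this is one reading of the idea, not necessarily the exact LT formulation):

```python
# Hypothetical sketch: propagate the value of a distant payoff backwards
# through a chain of prerequisite actions. Only the "value flows backwards
# from a hypothetical situation" idea comes from the discussion above.

# action -> (immediate reward, actions it enables)
graph = {
    "keep_mining":   (0.5, []),
    "buy_weapons":   (-5.0, ["hunt_bounties"]),
    "hunt_bounties": (8.0, ["hunt_bounties"]),
}

values = {a: 0.0 for a in graph}
discount = 0.9

for _ in range(50):  # sweep until the values settle
    for action, (reward, enables) in graph.items():
        future = max((values[e] for e in enables), default=0.0)
        values[action] = reward + discount * future

print(values)
# "buy_weapons" ends up far more valuable than "keep_mining" even though its
# immediate reward is negative, because the bounty-hunting payoff propagates back.
```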

I'm very excited about this technique, I think it's going to perform really well :D And I also think it has strong roots in human psychology :monkey:
“Whether you think you can, or you think you can't--you're right.” ~ Henry Ford

Re: AI gradient descent algorithm

#8
JoshParnell wrote:But the topology of my objective function is automatically influenced by the perception of hypothetical long-term gains, which allows the AI to reason about everything from ultra-short-term, reflex-like actions all the way to long-term life goals under the exact same framework. That was kind of the point in moving to a purely gradient-based method :)
Does it look like being able to look a couple of steps forward, like in chess, calculating the effect of every move and counter-move for a couple of steps?

If yes, that means you can escape a local optimum if (and only if) a better solution than said local optimum lies within your horizon, whose scope is limited by the number of steps you look forward; increasing that number increases the computational difficulty exponentially.

As opposed to that, a meta-heuristic forces an AI to leave a local optimum even when no better option is found in the local region within the scope of the horizon! It may sound crazy, but it allows you to force a change, i.e. to take a risk. Again, this would be triggered by a hard situation, such as a trend of ever-smaller revenues below a certain threshold.
A real-life example is emigrating to another country when life becomes too hard in your own: there is no guarantee, no "better" situation in the short to medium term, but the current situation is bad enough that taking the off chance is the best - if not the most rational - way to go.

The advantage is that when "aging" your universe during initial generation, this will force many niches to be filled by AIs. Most will initially make the same decision (the low-hanging fruit), competition will leave the less fit with low profits, and such a system would then force them to find new activities.
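As an illustration, such a trigger could be as simple as the following sketch (the threshold, window, and activity names are invented): when average revenue stays below a threshold long enough, the current activity is abandoned even with no proof that the alternative will pay off better.

```python
import random

# Hypothetical sketch of a "forced change" trigger: if average revenue over a
# recent window drops below a threshold, abandon the current activity and take
# a risk on a different one.

WINDOW = 10        # how many recent periods to average over
THRESHOLD = 1.0    # revenue per period below which the agent gets restless

def should_force_change(revenue_history):
    recent = revenue_history[-WINDOW:]
    return len(recent) == WINDOW and sum(recent) / WINDOW < THRESHOLD

activities = ["mining", "trading", "bounty_hunting"]
current = "mining"
history = []

for period in range(100):
    history.append(random.uniform(0.0, 1.5) if current == "mining"
                   else random.uniform(0.0, 6.0))
    if should_force_change(history):
        current = random.choice([a for a in activities if a != current])
        history.clear()   # judge the new career with a clean slate
```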

If not, and you did not mean just an "extended horizon", then I misunderstood :oops:

Re: AI gradient descent algorithm

#9
No, it's not forward-looking: as I said, value propagates backwards from hypothetical situations. There is no limit to how far off the situation can be. The propagation is also proportional to the magnitude of the value involved, which means that large optima will quickly propagate to change the local landscape even if they are extremely far away.

I promise I've thought it through :thumbup:
“Whether you think you can, or you think you can't--you're right.” ~ Henry Ford

Re: AI gradient descent algorithm

#10
Grumblesaur wrote:
JoshParnell wrote: I'm very excited about this technique, I think it's going to perform really well :D And I also think it has strong roots in human psychology :monkey:
Soon we'll see a field of Computational Psychology. :geek:
Actually, this sort of thing is already happening in economics. (Maybe in psychology or other fields too, but I wouldn't know.) And, yes, it is crazy.

Re: AI gradient descent algorithm

#11
JoshParnell wrote:Everything is treated discretely, so it doesn't matter whether the metrics are technically differentiable or not: you just measure the metric before and after performing an action, which tells you that action's impact on that metric. You then look at how you value the metric, and, by linearity, the value coefficient times the discrete change is what you might call the "value" of performing that action. E.g. my reputation with B increases by X when I perform A. My credits also decrease by Y. I value B's reputation at Z per unit and credits at W per unit. Hence, the action is worth Z*X - W*Y to me.

The equivalent of gradient descent with actions is just selecting the highest-value set of actions to perform, and then performing them.

Hope that answered the questions :)
By the way, Josh, thanks for answering my question!

Re: AI gradient descent algorithm

#12
CSE wrote:Steepest descent is a greedy algorithm that is quite effective for "simple" problems but usually fails to find optimal solutions for problems with complex objective-value topography (many local optima).
So you should add a "long-term/long-view" component to AI thinking in order to make sure that an AI does not remain stuck in a local optimum, i.e. a sub-optimal behavior that requires a "tough" decision to get away from.

An example: you are in a system with asteroids, decide to mine, get equipped with mining equipment, and make some runs. Then you discover that the minerals are scarce or, for some reason, in low demand.
Steepest descent would still keep mining, because the alternative to the small gain is actually a short-term loss: having to buy, e.g., weapons to start a new career as a bounty hunter.

There are many solutions to this problem, the more so when a good, as opposed to a provably optimal, solution must be found. An example is tabu search, which uses memory to escape local minima and is particularly fitting for simulating people learning from past experience. In its simple form, it consists of keeping track of past decisions and avoiding falling back into the same pattern by making a return to a past decision taboo. This behavior could e.g. be triggered when an AI suffers a steady decline in revenue.
Another possible algorithm is called simulated annealing (http://en.wikipedia.org/wiki/Simulated_annealing), which is particularly effective in discrete state spaces.
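The core acceptance rule of simulated annealing is tiny; here is a toy sketch (the objective function and cooling schedule are invented for illustration): a worse neighbor is sometimes accepted, with a probability that shrinks as the "temperature" cools, which is what lets the search climb out of local minima early on.

```python
import math
import random

# Hypothetical sketch of simulated annealing on a toy discrete objective.
# Worse moves are accepted with probability exp(-delta / T); T cools over time.

def objective(x):
    """Toy cost with several local minima; lower is better."""
    return (x - 7) ** 2 + 10 * math.sin(x)

state = 0
temperature = 10.0
for step in range(1000):
    candidate = state + random.choice([-1, 1])
    delta = objective(candidate) - objective(state)
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        state = candidate
    temperature = max(0.01, temperature * 0.995)

print(state, objective(state))  # usually ends near the global minimum
```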

Re: AI gradient descent algorithm

#13
JoshParnell wrote:No, it's not forward-looking: as I said, value propagates backwards from hypothetical situations. There is no limit to how far off the situation can be. The propagation is also proportional to the magnitude of the value involved, which means that large optima will quickly propagate to change the local landscape even if they are extremely far away.

I promise I've thought it through :thumbup:
Well, I guess I am not supposed to understand it, only to enjoy it when playing LT. I trust optimization-Josh to deliver a good AI algorithm, of course. :clap:
But if you find the time (e.g. while your left brain does some graphical stuff and your right brain is bored) to (re-)explain how a future situation can be evaluated (in order to propagate backwards) without being "looked forward" to, i.e. generated by a sort of stepwise construction of the future (which would be exponential in nature at greater depth), that would be interesting... :think:
mmoose wrote:Another possible algorithm is called simulated annealing (http://en.wikipedia.org/wiki/Simulated_annealing), which is particularly effective in discrete state spaces.
Indeed. Tabu search seemed more appropriate to me, however, because AIs will need a sort of memory system to remember their past (at least the interactions with the player), so it could just fit into that system. Simulated annealing seems to me less appropriate for an agent, because it actually jumps larger distances, so it represents a non-continuous behavior, which seems less natural (but it could be adapted to do the job). Other systems exist for meta-heuristic global optimization, such as my personal favorite, the ant algorithm, or the genetic algorithm, but they don't apply to individuals, only to populations, so they would be less intuitive to implement (e.g. they would need split personalities. Perhaps not so foreign, however. Let's ask the opinions of all the Joshes involved in the project :lol: )

Re: AI gradient descent algorithm

#14
I was going to mention the topographic metaphor and simulated annealing, but y'all have clearly taken care of that. :)

On the "topographic" notion, I'm wondering if some of the folks reading this thread might like to understand it better. Here's a simplified but, I hope, not wildly inaccurate explanation.

Think about all the things you could have chosen to do this morning, and the cost (however you want to define that) of doing each one of those things. Imagine drawing a 3D map of a landscape where each cost becomes a peak on the map. Some will be low (things you could easily do), and some will be very high (things you almost certainly would never choose to do).

Your goal is to get as much done as you can, for the lowest cost. On a map, that would be like finding the shortest path from your starting point in the middle of the map to the lowest point on the map.

Now imagine pouring a bucket of water onto the map. As water seeks the lowest level, the path that the water takes until it stops running will be the optimal path, which is equivalent to deciding "what things to do"...

...or at least, AN optimal path, because the water might get stuck in a hole that is large but still high up. There might be lower points on the map, but the water can't get there. This is what they're calling a "local minimum." In the same way that you might shake the map to let the water try a different path, a good goal-optimization algorithm has some way to bounce a tentative solution for "what to do" out of a local minimum, to check whether there are better solutions elsewhere.

What Josh is doing sounds more sophisticated than this. But I believe the basic process is pretty close.
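In code, that "shaking" step corresponds to perturbations or random restarts; here is a toy sketch (the landscape and parameters are invented): plain downhill steps get stuck in the shallow basin, while occasional random kicks find the deeper one.

```python
import random

# Hypothetical sketch of "shaking the map": greedy descent alone gets stuck in
# a local minimum, so occasionally restart from a random spot and keep the
# best solution found overall.

def cost(x):
    """Toy landscape: a shallow basin near x=2 and a deeper one near x=8."""
    return min((x - 2) ** 2 + 3, (x - 8) ** 2)

def descend(x):
    """Greedy integer descent: step left/right while it improves the cost."""
    while True:
        best = min((x - 1, x, x + 1), key=cost)
        if best == x:
            return x
        x = best

best_x = descend(0)                 # greedy alone stops in the basin near x=2
for _ in range(20):                 # "shake": restart from random spots
    candidate = descend(random.randint(-10, 20))
    if cost(candidate) < cost(best_x):
        best_x = candidate
print(best_x, cost(best_x))         # the shaken search finds the deeper basin near x=8
```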

The questions I have about it are:

1. Similar to CSE's question, how does the gradient algorithm make comparisons between hypothetical consequences and "real" consequences, and how does it apply those deltas to the objective function?

2. What are the frequencies at which an NPC makes its high-level (strategic), medium-level (operational), and low-level (tactical) hypotheticals? (This devolves into the "how many ships can be fighting each other around the player at one time" question, but it's also about how often an NPC can have a mid-life crisis.)

3. To what extent will an NPC's personality enable it to make sub-optimal but psychologically reasonable decisions?

Also, mmoose, after reading your Introduction, I have been hoping to see you join in on this subject and, I hope, the subject of the in-game economy for Limit Theory. :)

Re: AI gradient descent algorithm

#15
But I think I can do it. It would be the last real step towards an AI that literally does everything on its own. I would be little more than the creator of the game logic – simply giving the AI the rules of the gameplay – then stepping back to watch as it fits together all the pieces into a coherent fabric of intelligent behavior. No heuristic functions, no hints, no help whatsoever. Just the rules of the game and some CPU time. How great would that be!!!
Sounds awesome, doesn't it? :D

As humans we don't really know all actions we can take under all circumstances either. We can "guess" how something will work out, but we have no real way of knowing. But there's also learning from experience, and literally sleeping on it. While we sleep and dream, the problems of the previous day get processed into our memory and entire being.

To me, as a layman, it almost seems like Josh is building this way of interacting with reality in LT. The AI guesses how a choice will work out by simulating it 'in its head', much like humans do when they're troubled. They go over likely scenarios, and which choices they consider is based on their personality. A violent man might seek a violent solution.
How do we know which choices we have? By seeing others make these choices, by reading about them, or just through creativity. To me, this descent algorithm sounds a lot like "sleeping on it", but in the forward sense. What if the AI would also dream about ways to make its choices better in the future? It wanted to expand too quickly and made an enemy... shouldn't it expand more cautiously next time?
Perhaps the AI can't really know that it made an enemy because it expanded too quickly, but a possible conclusion of its own introspection could be that it should send out scouts first next time.

While the universe gets simulated, before the player enters it, the AI of each faction could become unique. Interactions could hone the AI's skill a bit relative to each faction's preferences/personality.
Wouldn't it be magnificent to discover the enemy has a weakness against lasers, and exploit that fact in a fight? Only to discover that the AI pursues research into better plating against lasers, and weapons that are more damaging to your preferred hull type? Of course, it only had this weakness against lasers because it regularly has conflicts with another faction that primarily uses missiles. Another faction that has learned to rely less on research and more on teamwork might even want to call its bigger friends to help out.

How would the AI start making choices? I propose to give each faction a score in each of the 'fight, flight or adapt' aspects, and have it apply these to problems it encounters. Anything that causes it to influence the territory of another faction, in a way that is negative for that faction, could be considered fighting. Anything that's reactive to the problem itself, in a peaceful, non-encroaching way, could be called adapting. Everything else is flight.

As in reality, once you're established it's much harder to flee. You're pot-committed and more likely to protect your assets, even if it doesn't make sense logically. The smaller the organisation (also relative to the opposition), the more likely it is to flee.
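As a toy sketch of that fight/flight/adapt scoring (all scores, weights, and the asset modifier are invented for illustration):

```python
# Hypothetical sketch of the proposed fight/flight/adapt scores: each faction
# weighs the three response types, and being established makes fleeing less likely.

def choose_response(faction, threat_severity):
    """Pick fight, adapt, or flee from personality scores and how dug-in the faction is."""
    scores = {
        "fight": faction["fight"] * threat_severity,
        "adapt": faction["adapt"],
        # the more established (assets), the harder it is to walk away
        "flee":  faction["flight"] / (1.0 + faction["assets"]),
    }
    return max(scores, key=scores.get)

small_upstart = {"fight": 0.2, "flight": 0.9, "adapt": 0.5, "assets": 0.1}
old_empire    = {"fight": 0.8, "flight": 0.4, "adapt": 0.6, "assets": 9.0}

print(choose_response(small_upstart, threat_severity=2.0))  # "flee"
print(choose_response(old_empire, threat_severity=2.0))     # "fight"
```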

By letting the simulation run for a while, I'm sure the AI can figure out smart things to do by trial and error. Didn't you learn both that a stove that's turned on is hot, and that your mother doesn't lie about hot stoves? Personally, I learned that by touching a hot stove. :D
Beware of he who would deny you access to information, for in his heart he dreams himself your master.
