Post

Natural Language Generator

#1
Ooooooh, here's something: it looks like some folks are working on code that does natural language generation from semantic data.

In other words, it takes information about a subject -- for example, what one NPC feels about another, or what her plans or goals are -- and expresses that information in natural language sentences.

I'm not sure what state of usability this software is in, or licensing/usage status, etc. But even if the software itself is not directly usable, the conceptual theory behind it might have some utility.
Post

Re: Natural Language Generator

#3
Bad news is that the page says the program is written in Java, which is similar to C++, but not similar enough for Josh to just clip the whole thing out and paste it in.

The conceptual theory behind "teaching" a computer how to use language is exceptionally complicated, because human language (specifically the grammar) doesn't translate well at all to a computer's binary functionality. It's easy to encode characters or symbols, but harder to encode an idea.

You'd basically have to give the computer a bunch of words, a list of ways to change the words based on their function, and rules regarding where to put the words and when to change them. Don't forget about words that change their spelling when you've pluralized them or changed their tense (e.g.: man vs. men, fly vs. flew, etc.). It becomes more and more complicated.
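
As a rough illustration of the "words plus change rules plus exceptions" approach described above, here is a minimal Python sketch. The word lists, function names, and spelling rules are all hypothetical and far from complete; the point is only that irregular forms (man/men, fly/flew) get looked up first and regular rules act as the fallback.

```python
# Hypothetical mini-lexicon: irregular forms are looked up first,
# regular spelling rules are the fallback. Not a complete grammar.
IRREGULAR_PLURALS = {"man": "men", "mouse": "mice", "child": "children"}
IRREGULAR_PAST = {"fly": "flew", "go": "went", "be": "was"}

VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Return a plural form using an exception table plus rough rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"


def past_tense(verb: str) -> str:
    """Return a past-tense form using an exception table plus rough rules."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"
    return verb + "ed"


if __name__ == "__main__":
    print(pluralize("man"), pluralize("colony"), past_tense("fly"), past_tense("dock"))
```

Every word the rules get wrong has to go into one of the exception tables, which is where the "more and more complicated" part comes in.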

So the information that these NPCs put out would need to be strictly limited, unless we want to get into the realm of computational linguistics :monkey: .

If you want to know how faction X feels toward Y (or indeed just person X to person Y), you'd probably want to just outline a sentence like "X is ________ toward Y", maybe with a few variations, to convey the idea.
Then you set some predetermined values, like 0 = Antagonistic, 25 = Aggressive, 50 = Ambivalent, 75 = Amicable, 100 = Aligned. There can be more in between, but since you probably don't want to define 101 different levels of friendliness, we could drop in some modifiers.

If you're lower than 25, but still closer to Aggressive than Antagonistic, you could call this "Very Aggressive". If you're between 75 and 100, and closer to 100 than 75, we could call that "Rather Aligned".

So instead of looking in the code and seeing that czerCorp.friendliness(joshEmp) = 67 (or something like that), the game will output "Czerka Corporation is rather amicable toward Joshonian Empire", where the attitude phrase ("rather amicable") is the only variable.
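
The fill-in-the-blank scheme above could be sketched roughly like this in Python. The function names are made up, and the exact modifier rule is an assumption that generalizes the three examples given (20 = "very aggressive", 67 = "rather amicable", 90 = "rather aligned"): pick the nearest anchor, then say "rather" if the score sits on the neutral side of it and "very" if it sits on the extreme side.

```python
# Anchor values taken from the post; everything else is a hypothetical sketch.
ANCHORS = [
    (0, "antagonistic"),
    (25, "aggressive"),
    (50, "ambivalent"),
    (75, "amicable"),
    (100, "aligned"),
]
NEUTRAL = 50


def attitude_phrase(score: int) -> str:
    """Map a 0-100 friendliness score to an adjective with an optional modifier."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Find the anchor closest to the score.
    value, label = min(ANCHORS, key=lambda anchor: abs(anchor[0] - score))
    # Exact hits, and anything nearest to the neutral anchor, get no modifier
    # (an assumption; the post does not cover this case).
    if score == value or value == NEUTRAL:
        return label
    # "rather" = closer to neutral than the anchor, "very" = further from it.
    toward_neutral = abs(score - NEUTRAL) < abs(value - NEUTRAL)
    return ("rather " if toward_neutral else "very ") + label


def describe(subject: str, target: str, score: int) -> str:
    """Fill the 'X is ____ toward Y' template."""
    return f"{subject} is {attitude_phrase(score)} toward {target}."


if __name__ == "__main__":
    print(describe("Czerka Corporation", "Joshonian Empire", 67))
```

Adding "a few variations" would just mean keeping several templates and picking one, with the attitude phrase computed the same way.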

This is sort of like the idea you have, but it's just dumbed down and dealt with on a case-by-case basis.

There are other applications of this idea, but writing a full code-to-human-readable-text parser (especially one with a large vocabulary or one capable of compound or complex sentences) is an undertaking the size of Limit Theory itself, if not more so. Don't get me wrong, I think it's an especially cool idea, particularly regarding immersion and replayability, but I think it's better to do it the easier way for now, given that LT's already been pushed back a few months.
Post

Re: Natural Language Generator

#4
Grumblesaur wrote:Bad news is that the page says the program is written in Java, which is similar to C++, but not similar enough for Josh to just clip the whole thing out and paste it in.

The conceptual theory behind "teaching" a computer how to use language is exceptionally complicated, because human language (specifically the grammar) doesn't translate well at all to a computer's binary functionality. It's easy to encode characters or symbols, but harder to encode an idea.
Depends which languages you're talking about. Sanskrit is a language that sticks to a strict set of grammatical rules exceedingly well, so well that there's actually quite a bit of interest in the AI community in it (link). In general, however, you're right. I don't want to have to learn foreign languages to converse with NPCs.
Grumblesaur wrote:You'd basically have to give the computer a bunch of words, a list of ways to change the words based on their function, and rules regarding where to put the words and when to change them. Don't forget about words that change their spelling when you've pluralized them or changed their tense (e.g.: man vs. men, fly vs. flew, etc.). It becomes more and more complicated.

So the information that these NPCs put out would need to be strictly limited, unless we want to get into the realm of computational linguistics :monkey: .
For the idea I have planned, without revealing too much I see NPCs only conversing on a small range of subjects. In turn, they will only respond in any meaningful way to anything said to them if it relates to a small range of subjects.
Post

Re: Natural Language Generator

#5
ThymineC wrote: Depends which languages you're talking about. Sanskrit is a language that sticks to a strict set of grammatical rules exceedingly well, so well that there's actually quite a bit of interest in the AI community in it (link). In general, however, you're right. I don't want to have to learn foreign languages to converse with NPCs.
Many older languages have very few rule exceptions, or at least fewer than modern languages (Esperanto doesn't count). Latin has only eleven irregular verbs (though due to derivational morphology, the actual number is much higher), but it has a fair share of goofy nouns. Sanskrit exists mainly as a written language, since anyone speaking it today is doing so in an effort to revive it. Regular inflectional morphology and strict sentence structure are excellent for computer interpretation.

Unfortunately, English and, for that matter, other major game languages (German, French, Spanish, Russian, Italian, Mandarin, Japanese, and Korean) are not so computer-friendly.
ThymineC wrote: For the idea I have planned, without revealing too much I see NPCs only conversing on a small range of subjects. In turn, they will only respond in any meaningful way to anything said to them if it relates to a small range of subjects.
Good for programming. Not great for immersion, but I don't imagine that being too much of a problem, since the amount of time you spend chatting with NPCs won't be quite as much as the amount you spend flying, shooting, or equipment fiddling.

Maybe I should engineer my current conlang project to be a computer-friendly language, though I'm not sure if tense/case particles would be better or worse than bound morphemes (like Latin's verb endings and noun declensions). Hmm.
Post

Re: Natural Language Generator

#6
Grumblesaur wrote:
ThymineC wrote: For the idea I have planned, without revealing too much I see NPCs only conversing on a small range of subjects. In turn, they will only respond in any meaningful way to anything said to them if it relates to a small range of subjects.
Good for programming. Not great for immersion, but I don't imagine that being too much of a problem, since the amount of time you spend chatting with NPCs won't be quite as much as the amount you spend flying, shooting, or equipment fiddling.
Good for immersion too, the way I'm thinking of it.
Post

Re: Natural Language Generator

#9
I used something very similar to this (https://code.google.com/p/simplenlg/) to generate readable story beats. As my engine directly generates story beats as a set of triples, it was pretty straightforward to use.
The results are OK: the generated text is readable... but not enjoyable.
I just tried it out for a few hours because I was tired of reading triples to understand the story Diegetisor (the engine I'm working on) was creating... (By the way, in the end I use Graphviz... much more readable.) Anyway, I think you can reach satisfying results with a little tweaking. A colleague of mine used it to generate multiple-choice questions and nobody ever noticed there was natural language generation behind it!
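
To give an idea of what turning triples into text involves, here is a tiny plain-Python sketch. It is not SimpleNLG's API and not how Diegetisor works; the `Triple` type and both function names are hypothetical, and it only handles present-tense, third-person-singular sentences.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A hypothetical (subject, verb, object) story beat."""
    subject: str
    verb: str  # base form, e.g. "attack"
    obj: str


def third_person(verb: str) -> str:
    """Inflect a base-form verb for third-person singular (rough rules only)."""
    if verb.endswith(("s", "x", "z", "ch", "sh", "o")):
        return verb + "es"
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in "aeiou":
        return verb[:-1] + "ies"
    return verb + "s"


def realise(triple: Triple) -> str:
    """Turn one triple into a capitalised present-tense sentence."""
    sentence = f"{triple.subject} {third_person(triple.verb)} {triple.obj}."
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    beats = [
        Triple("the pirate", "attack", "the convoy"),
        Triple("Mara", "carry", "the cargo"),
    ]
    for beat in beats:
        print(realise(beat))
```

Output like this is readable in the same flat way described above: every sentence has the same shape, which is why a real realiser adds aggregation, pronouns, and varied sentence structure.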
Post

Re: Natural Language Generator

#11
I would actually think the grammar would be one of the easiest parts, as it is mostly a set of logical rules with some exceptions. Even a lot of those exceptions come with their own rules. In the end it just comes down to lots of coding time. :D The problems in grammar come from those pesky words that share the same spelling but have totally different grammatical functions, like desert (to leave) and desert (arid region). :x Those are the only cases in which I can see problems arising, as they require you to figure out a word's meaning (and thus its function) from the context it is put in, which is something computers aren't very good at doing. :( Just look at Microsoft Word, for example. The only times you'll get in trouble for a grammar mistake are when it can't determine what grammatical function a word has because two words share the same spelling.

The hard part is making the computer understand the meaning of words. With development already pushed back and Josh saying LT is already near feature saturation, though, this probably is not something that should be added. Maybe LT2 :D.
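
For the desert/desert problem, a very crude way to guess a word's function from context is to look at the word right before it. The sketch below is only that: the cue-word sets and the function name are made up for illustration, and real part-of-speech taggers use far more context than one neighbouring word.

```python
# Hypothetical cue words; a real tagger would need much more than this.
DETERMINERS = {"the", "a", "an", "this", "that", "every"}
VERB_CUES = {"to", "will", "would", "they", "we", "i", "you", "never"}


def guess_role(words: list[str], index: int) -> str:
    """Guess whether words[index] acts as a noun or a verb from the previous word."""
    previous = words[index - 1].lower() if index > 0 else ""
    if previous in DETERMINERS:
        return "noun"
    if previous in VERB_CUES:
        return "verb"
    # No usable cue: this is exactly the case where simple rules give up.
    return "unknown"


if __name__ == "__main__":
    print(guess_role("they crossed the desert".split(), 3))
    print(guess_role("pilots never desert their wing".split(), 2))
```

The "unknown" branch is the interesting one, since that is where rules alone run out and some notion of meaning is needed.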
