3 November 2018

LoSt this month: Counting words

Bison skulls at a refinery (1892)
Michigan Carbon Works, Rogueville, Detroit

I've been stretching these semi-monthly updates to the point that there will probably only be 11 entries by the end of 2018, but I hope to pull a Phileas Fogg and deliver December's post in such a fashion that the year's updates will feel well-rounded overall ;)

October was a busy month, albeit not on the gamedev front.

In between life, I've been making some notes and plans related to LoSt. More locations and encounters remain the over-arching todo task. In addition to the "random roving reavers" that I mentioned before, I've been elaborating some sketches for something like a hunting lodge. The current template is a house with stuffed heads on the wall, where you can gain favor and experience by handing in rare specimens. As the game world grows, hunting will likely become a topic that interests and connects various factions, from lone trappers to colonial trading houses. Further on lies the theme of strategically destroying nature to gain geopolitical dominance, which would sadly be one of the few situations directly inspired by history, i.e. the mass slaughter of American bison. Of course I'd have to connect it to other themes in the game, and turn the dials up to 11, to get workable content. Unless the player outright seeks employment with the exploiters, running errands, clearing out dangerous biomes and the like, the exploiters might become adversaries in ongoing plots and shoot-'em-up sequences. I already have sketches for an encounter akin to a death squad, currently a template for my "random reavers", that could surely be used and fleshed out along with story lines on speciecide and terror.

Markov the mole 

One "little" thing I did start to kick around in October was a prototype for a loosely Markov-based dialogue generator :P Just dabbling, toying with ideas for later implementations.

I went and got some source text (screenplays mostly, lifted in all Freundschaftlichkeit), and made a very basic Markov generator. It doesn't give anything I can use straight off, but it gives a feeling for possible ways of randomizing text. There would have to be an organizing system, to make the text (seem to) recognize certain topics and follow some syntactic principles. Such an engine can be hand-tuned over time, with the "puzzle pieces" themselves, the snippets of words to string together, stripped from the text source and stored in some kind of database or (more likely) a convoluted Python dictionary. I can't say how fine-grained it's plausible to make something like that, or if the ideas I'm currently experimenting with will turn out useful at all. Surely, it won't be easy to make the output actually reflect the state of the game world. Ideally, NPCs should talk about other people, places and phenomena, so such a system would have to be purdy clever. I'll report back in a few years, hopefully, or who knows, maybe even before that?
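For the curious, a very basic word-level Markov generator might look something like the sketch below. This is an illustration of the general technique, not the prototype itself: the names build_chain and generate, the order parameter and the whitespace tokenizing are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()  # assumption: plain whitespace tokenizing
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain from a random state, stopping early at a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted() keeps seeded runs repeatable
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # this state was only ever seen at the very end of the text
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)  # slide the window one word forward
    return " ".join(out)


if __name__ == "__main__":
    source = "the cat sat on the mat and the dog sat on the cat"
    print(generate(build_chain(source), length=12, seed=3))
```

Followers are stored as plain lists with repeats, so a word that often follows another is also picked more often, which is all the "probability" a toy like this needs.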

Anyway, still on the Markov chains: you typically want as large a source text as possible, to offset the effect that rarely occurring words also link to fewer other words. They become less dynamic, tending to crop up in the same context over and over.

Comparing word counts with/without thesaurus
While repetition has its uses, a well-written text can often benefit from a larger vocabulary. But in Markov chain source texts, you want the ratio of word count to vocabulary to be as high as possible, because that gives each word/sequence a greater number of possible follow-ups. Hapaxes (words that occur only once in the text) are the worst in this sense, since each of them has at most one follow-up.
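The counting itself is simple enough to sketch. The snippet below is an assumed, minimal version (the function name vocabulary_stats and the lowercase-and-split tokenizing are made up for the example) that reports the word count, the vocabulary size, the ratio between the two, and the hapaxes.

```python
from collections import Counter


def vocabulary_stats(text):
    """Count tokens, distinct words, their ratio, and words occurring only once."""
    words = text.lower().split()  # assumption: lowercase + whitespace tokenizing
    counts = Counter(words)
    hapaxes = sorted(w for w, n in counts.items() if n == 1)
    return {
        "tokens": len(words),
        "vocabulary": len(counts),
        # higher is better for a Markov source: more occurrences per distinct word
        "tokens_per_word": len(words) / len(counts) if counts else 0.0,
        "hapaxes": hapaxes,
    }


if __name__ == "__main__":
    stats = vocabulary_stats("the cat and the dog and the bird")
    print(stats["tokens"], stats["vocabulary"], stats["tokens_per_word"])
    print(stats["hapaxes"])
```

Running the same function on the source text before and after any vocabulary-shrinking step gives a quick way to compare the two.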

I considered a thesaurus to bypass the problem somewhat. I tested this by running the input/output of my basic generator through a custom dictionary that was premade from an actual thesaurus (Moby Thesaurus). The basic idea was to find clusters of words with similar meanings, like "robbers", "rogues" etc., and replace all of them with one vanilla synonym like "villains", but with the rule that when the generator comes across "villains", another randomizer swaps that word for one of the synonyms.
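Roughly, the collapse-and-expand idea can be sketched like this. The CLUSTERS dictionary is a tiny hand-made stand-in, not data from Moby Thesaurus, and the function names are invented for the example: collapse would run over the source text before the chain is built, and expand over the generator's output.

```python
import random

# Hypothetical, hand-made cluster: one vanilla word mapped to its synonyms.
CLUSTERS = {"villains": ["villains", "robbers", "rogues"]}


def build_collapse_map(clusters):
    """Invert the clusters so every synonym points at its vanilla word."""
    return {syn: vanilla for vanilla, syns in clusters.items() for syn in syns}


def collapse(words, collapse_map):
    """Input side: replace each known synonym with its vanilla word."""
    return [collapse_map.get(w, w) for w in words]


def expand(words, clusters, rng):
    """Output side: swap each vanilla word for a randomly picked synonym."""
    return [rng.choice(clusters[w]) if w in clusters else w for w in words]


if __name__ == "__main__":
    cmap = build_collapse_map(CLUSTERS)
    flat = collapse("the rogues and robbers fled".split(), cmap)
    print(" ".join(flat))
    print(" ".join(expand(flat, CLUSTERS, random.Random())))
```

Collapsing shrinks the vocabulary the chain has to deal with, so the vanilla word inherits the follow-ups of every synonym in its cluster. The price is that the expanded synonyms land in contexts they never appeared in originally.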

Moby Thesaurus, while a great asset, didn't yield fantastic results with my primitive algorithms. Some coupled phrases are quite wonky, like "good piece of desert" equaling "honorable master of mare". Stuff like that can possibly be pruned away, and there's the occasional stroke of random coherence, like this generated insult: "You wretched, double-crossing natural! Nincompoop, futile bluff." So there may be an idea hidden in here somewhere. As of yet, it mostly sounds like someone is definitely speaking about something, but it's not very clear what:
Cut me free! Now why are you got opposite it. As a united in the British I destroy it! 'Cause I've got to break the activities destroy. And my performer normal our guns are what are you how. We hit the province. That don't you? You had a bleeding … High, huh? – No … tight.
As always,
