More random passwords, entropy and words

Following on from my previous blog post, I have written a new program, this time to create sentence-like structures.  It is very simple at the moment, but I will continue working on it on and off.  As with the previous programs it is written in Perl, and like randword it uses Adam Kilgarriff's BNC list from here: http://www.kilgarriff.co.uk/BNClists/all.al.gz, albeit in a slightly different way: it uses the parts of speech encoded therein.
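To give a rough idea of what "using the parts of speech" can look like, here is a minimal sketch in Python (not the Perl of randsent, and not taken from it).  It assumes each line of the list has four whitespace-separated fields: frequency, word, part-of-speech tag and file count.  The function name load_by_pos, the min_freq parameter and the sample lines are all made up for illustration.

```python
from collections import defaultdict


def load_by_pos(lines, min_freq=1):
    """Group words by part-of-speech tag.

    Assumes (this is a guess at the format, not a guarantee) that each
    line reads 'frequency word pos file_count'. Lines that do not fit
    that shape are skipped rather than raising an error.
    """
    by_pos = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # not a 'freq word pos files' line
        freq, word, pos = int(fields[0]), fields[1], fields[2]
        if word.startswith("!!") or freq < min_freq:
            continue  # skip summary rows and words that are too rare
        by_pos[pos].append(word)
    return by_pos


# Invented sample lines in the assumed format (the numbers are not real).
sample = [
    "1000 !!WHOLE_CORPUS !!ANY 50",
    "500 the at0 100",
    "120 dog nn1 40",
    "90 cat nn1 35",
    "3 quixotic aj0 2",
    "this line is malformed on purpose",
]

words = load_by_pos(sample)
print(words["nn1"])
```

Raising min_freq is one way to filter out some of the stranger, rarer words.  To read the real file, something like gzip.open("all.al.gz", "rt", encoding="latin-1") could be passed in place of sample, though the encoding there is a guess.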

While looking into the structure of English I found, among other things, some very interesting pages on this site: http://papyr.com/hypertextbooks/grammar/, specifically the pages on phrases: the noun phrase, the verb phrase, etc.  What I haven't found yet is anyone who has enumerated the probabilities of the structure.  What is the probability, for instance, that the subject phrase of a sentence is a simple pronoun?  What is the probability that the main adjective in an adjectival phrase is a verb participle?  I'll probably keep working on the way the structure is generated, but at the moment the recursion in the structure seems to have gotten away from me slightly.
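One way to keep that kind of recursion in check is to attach a probability to each expansion of a phrase and to cap the nesting depth.  The following is only a sketch in Python, not how randsent does it: the rules, the probabilities and the tiny vocabulary are invented placeholders, since I have no measured figures.

```python
import random

# Hypothetical grammar: each phrase type maps to (probability, expansion)
# pairs. The probabilities are made up, not measured from any corpus.
RULES = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.3, ["pronoun"]),
           (0.4, ["det", "noun"]),
           (0.3, ["det", "AP", "noun"])],
    "AP": [(0.7, ["adj"]),
           (0.3, ["adj", "AP"])],  # the recursive case
    "VP": [(0.5, ["verb"]),
           (0.5, ["verb", "NP"])],
}

# Tiny illustrative vocabulary keyed by word class.
WORDS = {
    "pronoun": ["she", "they"],
    "det": ["the", "a"],
    "noun": ["badger", "teapot"],
    "adj": ["green", "noisy"],
    "verb": ["paints", "follows"],
}


def expand(symbol, rng, depth=0, max_depth=4):
    """Expand a symbol into a list of words, choosing rules by weight."""
    if symbol not in RULES:
        return [rng.choice(WORDS[symbol])]  # a word class: pick a word
    options = RULES[symbol]
    if depth >= max_depth:
        # Past the cap, force the shortest expansion so recursion stops.
        expansion = min((exp for _, exp in options), key=len)
    else:
        weights = [p for p, _ in options]
        expansions = [exp for _, exp in options]
        expansion = rng.choices(expansions, weights=weights)[0]
    words = []
    for part in expansion:
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words


print(" ".join(expand("S", random.Random())))
```

With these particular rules and max_depth=4 a sentence can never be longer than ten words.  For actual passwords the generator should be random.SystemRandom() (or the secrets module) rather than the default random.Random().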

This new program, called "randsent", creates English-like sentences with a rough kind of grammar, so the sentences often look like English although they may not make a lot of sense.  Of course they're not meant to make sense but to be memorable.  I find it rather fun to run it and enjoy some of the sentences.  Some of the words from Kilgarriff's list are a bit strange at times.

I've read that this might be done with Markov chains, but I'm not familiar with how to do that.  Any interesting ideas, suggestions or rewrites are welcome, indeed encouraged.
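For what it's worth, the basic idea of a word-level Markov chain seems to be this: record which words follow which in some sample text, then walk from word to word by picking a recorded follower at random.  Here is a minimal Python sketch of that idea.  The function names and the toy sentence are my own inventions, and this is not part of randsent.

```python
import random
from collections import defaultdict


def build_chain(tokens):
    """Map each word to the list of words seen immediately after it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from 'start' for up to 'length' words."""
    out = [start]
    while len(out) < length:
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: nothing ever followed this word
        out.append(rng.choice(followers))
    return out


# Toy training text; a real chain would need a large corpus.
tokens = "the cat sat on the mat".split()
chain = build_chain(tokens)
print(" ".join(generate(chain, "the", 5, random.Random())))
```

Because followers are stored with repeats, common pairs are chosen more often.  That makes the output more natural but also more predictable, which probably costs some entropy compared with picking each word independently.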

As before, it can be downloaded as a plain Perl text file from here, or in the zip file with the other programs and the source frequency list from here.

Edit: If your browser displays iframes, here is an updating example:
$randsent -t 10 -p 1000
