More random passwords, entropy and words

Following on from my previous blog post, I have written a new perl program, this time to create sentence-like structures.  It is very simple at the moment, but I will continue working on it on and off.  As with the previous programs it is written in perl, and like randword it uses Adam Kilgarriff's BNC list from here http://www.kilgarriff.co.uk/BNClists/all.al.gz, albeit in a slightly different way: it uses the parts of speech encoded therein.

While looking into the structure of English I found, among other things, some very interesting pages from this site: http://papyr.com/hypertextbooks/grammar/, specifically the pages on the phrases: the noun phrase, the verb phrase, etc.  What I haven't found yet is anyone who has enumerated the probabilities of the structure.  What is the probability, for instance, that the subject phrase of a sentence is a simple pronoun?  What is the probability that the main adjective in an adjectival phrase is a verb participle?  I'll probably keep working on the way the structure is generated, but at the moment the recursion in the structure seems to have gotten away from me slightly.

This new program, called "randsent", creates English-like sentences with a rough grammar, so the sentences often read like English although they may not make a lot of sense.  Of course they're not meant to make sense but to be memorable.  I find it's rather fun to run it and enjoy some of the sentences.  Some of the words from Kilgarriff's list are a bit strange at times.
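To give a rough idea of how a sentence can be built by expanding phrases into parts of speech, here is a minimal sketch in Python.  It is not taken from randsent (which is written in perl): the lexicon, the phrase rules and the names `LEXICON`, `GRAMMAR`, `expand` and `random_sentence` are all made up for illustration, and the real program draws its words and tags from the BNC list.

```python
import random

# Hypothetical mini-lexicon keyed by part of speech (an assumption for this
# sketch; the real program reads words and tags from the BNC frequency list).
LEXICON = {
    "det": ["the", "a", "every"],
    "adj": ["green", "quiet", "lucky"],
    "noun": ["lawyer", "garden", "quest"],
    "verb": ["finds", "paints", "follows"],
    "pron": ["she", "it", "nobody"],
}

# Hypothetical phrase rules: each symbol expands to one of several sequences.
GRAMMAR = {
    "sentence": [["np", "vp"]],
    "np": [["pron"], ["det", "noun"], ["det", "adj", "noun"]],
    "vp": [["verb"], ["verb", "np"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol until only words are left."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(expand(part, rng))
    return words


def random_sentence(rng=None):
    """Return one sentence; uses the OS random source unless rng is given."""
    if rng is None:
        rng = random.SystemRandom()
    return " ".join(expand("sentence", rng))


print(random_sentence())
```

Every rule here is chosen with equal probability, which is exactly the gap mentioned above: realistic probabilities for each phrase structure would have to come from somewhere.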

I've read that this might be done with Markov chains, but I'm not familiar with how to do that.  Any interesting ideas, suggestions or rewrites are welcome, even encouraged.

As before, it can be downloaded as a simple perl text file from here or in the zip file with the other programs and the source frequency list from here.  

Edit: If you have iframes here is an updating example:
$ randsent -t 10 -p 1000


Random passwords, entropy and words

The concepts of passwords and entropy have generated a lot of discussion, especially since the xkcd comic.  One of the basic results is that a small increase in password length can add as much entropy as drawing from a larger range of characters, and the longer password could be much easier to remember.  Think of it like this: each character in a password might be one of, say, 26 lowercase letters or, if you add numbers, uppercase and punctuation, maybe around 100 possibilities for each character.  But the number of words in just the English language gives you a much larger number of possibilities for each word.
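The arithmetic behind this can be sketched in a few lines of Python.  This assumes every character or word is chosen uniformly and independently, so the entropy is simply the length times log2 of the number of choices; the function name `bits` is made up for this example, and the word count is the size of the frequency list mentioned later in this article.

```python
import math


def bits(choices, length):
    """Entropy in bits of `length` independent, uniform picks from `choices` options."""
    return length * math.log2(choices)


print("8 lowercase letters:          %.1f bits" % bits(26, 8))
print("8 characters from about 100:  %.1f bits" % bits(100, 8))
print("12 lowercase letters:         %.1f bits" % bits(26, 12))
print("4 words from 236660:          %.1f bits" % bits(236660, 4))
```

Under those assumptions, twelve lowercase letters (roughly 56 bits) already beat eight characters drawn from a set of about 100 (roughly 53 bits), and four words drawn uniformly from a list of 236660 come to roughly 71 bits.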

A long while ago I wrote a perl program for generating random passwords.  It and the new ones I talk about here are available under the GPL for you to download and use at the end of this article if you're interested.  To use them you will need a computer that runs perl.

My perl random password generator generates a password of random letters.  The command has lots of options to change the make-up of the characters and the way it chooses them.
$ randstring
$ randstring -t 5
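For readers who want to see the basic idea without perl, here is a minimal Python sketch of a random-letter password.  It is not the randstring source, and the function name `rand_string` and its parameters are assumptions made for this example.  It uses Python's `secrets` module, which draws from the operating system's random source.

```python
import secrets
import string


def rand_string(length=12, alphabet=string.ascii_lowercase):
    """Return `length` characters picked uniformly at random from `alphabet`."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Lowercase only, then a wider alphabet with digits and punctuation.
print(rand_string())
print(rand_string(12, string.ascii_letters + string.digits + string.punctuation))
```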
The idea of extending my random password generator to produce random words came when someone on the link list sent me the first two links below.

My first try was a variation on the original which generates a series of wordlike things.
$ randstring -w -t 5
 rie hiuv disu coayu
 esxyi sc aim kyw
 kionuj lujyc oni aoahy
 fii ausnzg gad puku
 zna vymiq as mam
Then I decided to write a program that chooses random words from a dictionary.
$ randword -t 5
 yeta fit fot brach
 casave tid oleous pram
 lawyer Ro coto testa
 drawly ras Trapa Ao
 cosec crappo ay hi
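The dictionary approach can be sketched in Python as follows.  Again this is an illustration and not the randword source: `load_words` and `rand_words` are made-up names, the default path is the usual Unix dictionary location, and the short word list in the demo is only there so the example runs without that file.

```python
import secrets


def load_words(path="/usr/share/dict/words"):
    """Read one word per line from a dictionary file, skipping blank lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]


def rand_words(words, count=4):
    """Return `count` words chosen uniformly at random, separated by spaces."""
    return " ".join(secrets.choice(words) for _ in range(count))


# Demo with a tiny stand-in list; with a real dictionary you would call
# rand_words(load_words()) instead.
sample = ["fit", "pram", "lawyer", "cosec", "drawly", "testa"]
print(rand_words(sample))
```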
Lastly I added a word generator that uses a weighted random choice based on a word frequency list so it has a tendency to choose more often used words.
$ randword -w -t 5
 look and april mr
 home not grass larger
 child that charts lucky
 the the very these
 andrew very she the
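A weighted choice like this can be sketched in Python with `random.SystemRandom().choices`, which accepts a list of weights.  This is an assumption about one way to do it, not the method randword actually uses, and the function name `weighted_words` and the small frequency table are invented for the example.

```python
import random


def weighted_words(freq, count=4, rng=None):
    """Pick `count` words, each with probability proportional to its frequency."""
    if rng is None:
        rng = random.SystemRandom()  # OS random source, not a seeded generator
    words = list(freq)
    weights = [freq[word] for word in words]
    return rng.choices(words, weights=weights, k=count)


# Made-up frequencies, only to show the shape of the data.
sample = {"the": 5000, "very": 800, "home": 300, "lucky": 40, "oleous": 1}
print(" ".join(weighted_words(sample)))
```

As the sample output above shows, common words such as "the" turn up again and again, which is the price of weighting by frequency.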
The randword command has many options, including an interesting one that limits the word selection to more popular or less popular words.  In this next example all the words have a frequency of more than 100 in the list, which in this case cuts the word list from 236660 entries to 26310.
$ randword -w -t 5 -p 100
 upon quest jokes to
 loved scale 's awards
 do in soft pounds
 for and it the
 us most your of
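The effect of such a cut-off on entropy can be estimated with a short Python sketch.  The function names and the tiny frequency table are assumptions for illustration and nothing here comes from the randword source.  For a uniform choice the entropy per word is log2 of the list size; for a weighted choice it is the Shannon entropy of the frequency distribution, which is never higher than the uniform figure.

```python
import math


def filter_by_frequency(freq, minimum):
    """Keep only the words whose frequency is more than `minimum`."""
    return {word: n for word, n in freq.items() if n > minimum}


def uniform_bits(freq):
    """Bits per word when every word in the list is equally likely."""
    return math.log2(len(freq))


def weighted_bits(freq):
    """Shannon entropy in bits per word when words are picked by frequency."""
    total = sum(freq.values())
    return -sum((n / total) * math.log2(n / total) for n in freq.values() if n > 0)


# Made-up frequencies, only to show the calculation.
sample = {"the": 400, "quest": 150, "oleous": 1, "brach": 1}
popular = filter_by_frequency(sample, 100)
print(sorted(popular))
print("uniform:  %.2f bits per word" % uniform_bits(sample))
print("weighted: %.2f bits per word" % weighted_bits(sample))
```

With the list sizes quoted above, a uniform choice drops from roughly 17.9 bits per word (236660 entries) to roughly 14.7 bits per word (26310 entries), and a frequency-weighted choice gives less than either figure.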
I'm not going to make a web page to do this at the moment, partly because the word-based password generators use quite a lot of memory and CPU resources and partly because someone has already done it (see below) but mainly because it's a silly idea to use a password from a website on the internet.  So for your benefit I am offering these scripts as a GPL program for those interested in using them.

I should note that the word frequency list is based on a 1989 one from Adam Kilgarriff's BNC lists, and my thanks go to him for permission to use it.


I've just discovered that someone wrote a webpage which generated "xkcd" style passwords:

Here are links to download my programs:
tgz archive (includes randstring, randword and the word frequency list).

They are, of course, works in progress and any suggestions, bug reports, etc will be gratefully accepted. 

The dictionary version looks for a standard Linux, BSD or macOS dictionary at "/usr/share/dict/words" with one entry (word) per line of what is probably ASCII text.  If you don't have that, you may have to get a copy of it.  I have not internationalized the programs.