Generating improbability
Now there are a few ways of going about this. Take the sentence "mathematics is the language of science". Using the lower-case alphabet and the space character, there are 38 positions to fill and 27 different characters that could go in each. Note that Dawkins did a similar computational experiment in The Blind Watchmaker.
- random chance - suppose you tried to generate every position and every character at once. To get the first character correct would be a 1 in 27 chance, as would getting the second character. To generate both characters correctly would be (1/27)^2, or 1 in 729. To generate three characters correctly would be (1/27)^3, or 1 in 19,683. To generate all 38 characters correctly would be 1 in 27^38, or about 1 in 2.5 * 10^54. So for this one-off event to happen by chance, with purely random input, would take an extraordinary amount of time.
- cumulative chance - instead of trying to generate it all in one go, generating each character individually could work better. Start with the first character: there's a 1 in 27 chance of generating 'm', so on average it takes 27 attempts. Once 'm' is finally generated, move on to generating the 'a'. Each subsequent character is now dependent on the earlier ones having been found. The expected number of attempts to generate 'math' becomes 27 + 27 + 27 + 27, or 108. For all 38 characters it's 27 * 38, or 1026 attempts on average. So progressively doing each step along the way takes away an extreme amount of improbability.
- evolving chance - back to generating in one go. Starting from an arbitrary string and tweaking it repeatedly will eventually produce the right answer. The mathematics for this is not easily expressed, though it can be expressed in code.
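The arithmetic for the first two approaches can be checked with a few lines of Java. This is only a minimal sketch, not the program used for the results; the class and method names are made up for illustration. It uses BigInteger because 27^38 does not fit in a long.

```java
import java.math.BigInteger;

// Hypothetical helper for checking the odds quoted in the list above.
class Improbability {

    // "1 in N" odds of getting every position right in a single attempt:
    // N = alphabetSize ^ length.
    static BigInteger allAtOnceOdds(int alphabetSize, int length) {
        return BigInteger.valueOf(alphabetSize).pow(length);
    }

    // Expected number of attempts when each character is retried until it
    // is correct: each position takes alphabetSize attempts on average.
    static int expectedCumulativeAttempts(int alphabetSize, int length) {
        return alphabetSize * length;
    }

    public static void main(String[] args) {
        String sentence = "mathematics is the language of science";
        int length = sentence.length();
        System.out.println("Positions: " + length);
        System.out.println("All at once: 1 in " + allAtOnceOdds(27, length));
        System.out.println("Cumulative: about "
                + expectedCumulativeAttempts(27, length) + " attempts");
    }
}
```

The gap between the two figures is the whole point: one grows exponentially with the length of the sentence, the other only linearly.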
The results
I wrote a Java program to simulate these different processes. The source code is available here for anyone wishing to run it themselves. Feel free to modify it and push it to its limits. It's very much a cobbled-together hack job; there wasn't much focus on having a clean interface. It's there to show I didn't make the results up, and while the randomness of the PC means the numbers will not turn out exactly the same as mine, they are a good approximation of the procedure.
- random chance will not generate an answer in any practical computation. Computer randomness is pseudo-random, derived from a seed, so the program may run indefinitely without ever finding the answer. Even if it were truly random, it would still take an almost infinitely long time. I ran the program quite a few times and there wasn't even a fragment of any word that could be considered English.
- cumulative chance has yielded similar results to the mathematical prediction of an average of 1026 iterations. Running it 10 times, I had the following results:
1058, 867, 1077, 1403, 776, 943, 1081, 893, 945, 880.
This is an average of 992 iterations per run. A second set of 10 runs yielded an average of 1028, and a third set of 100 runs yielded an average of 1008. After a few more attempts I did get 1026 as the average over 100 runs. The practical results agreed with the theoretical prediction.
- evolutionary chance brought an even quicker result. By adjusting the amount of mutation (a mutation rate of 1 would mean 'd' could only change to either 'c' or 'e'; 2 would cover the spread from 'b' to 'f'), the time taken could be reduced. The results for running it 100 times on different mutation rates were as follows:
A run through with a mutation variance of 5
Iteration: 1 - tptvelwhiqt irsirfpokxpub skrjcgrhcslt
Iteration: 2 - qrtyekzgimy iunhtirkibrubetjuhelueerot
Iteration: 3 - nqt efxfiru irmkojmmkdquddqesjeoxieoov
Iteration: 4 - rmtzedthiov ivmoohqrkdsufencsfhquiesly
Iteration: 5 - whteedtiity iyktnkpulfnudjmcrfgnuiewpz
Iteration: 10 - tftjeclaiww ivdtocswahguajjo fodwieurb
Iteration: 20 - hvtheixaifs in tfwlladguarexffqjpievcr
Iteration: 30 - natheoetias is thyhlamguadeohfinsieucx
Iteration: 40 - mathematigs is th tlacguaoepcflnniescg
Iteration: 50 - mathematics is th uladguauewdformiemce
Iteration: 60 - mathematics is thdilajguabe lfdskiebce
Iteration: 70 - mathematics is thwtlacguaje ofksniefce
Iteration: 80 - mathematics is thkjlauguape of syievce
Iteration: 90 - mathematics is the language of snience
Found iteration 97 - mathematics is the language of science
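One plausible reading of the run above is that characters which already match the target are kept, while every other character is nudged by a random offset of up to the mutation variance in either direction. The sketch below implements that reading. It is an assumption based on the printed output, not the original source: the class name, the method names, the ordering of the alphabet (space, then a to z, wrapping around at the ends) and the iteration cap are all hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of the "evolving chance" process.
class EvolvingChance {

    // Assumed alphabet: space followed by a-z, treated as a ring of 27 symbols.
    static final String ALPHABET = " abcdefghijklmnopqrstuvwxyz";

    // Builds the arbitrary starting string.
    static String randomString(int length, Random rng) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Keeps matching characters; shifts every other character by a random
    // offset in [-variance, +variance], wrapping around the alphabet.
    static String mutate(String current, String target, int variance, Random rng) {
        StringBuilder next = new StringBuilder(current);
        for (int i = 0; i < current.length(); i++) {
            if (current.charAt(i) == target.charAt(i)) {
                continue; // selection step: correct characters survive
            }
            int offset = rng.nextInt(2 * variance + 1) - variance;
            int index = Math.floorMod(
                    ALPHABET.indexOf(current.charAt(i)) + offset, ALPHABET.length());
            next.setCharAt(i, ALPHABET.charAt(index));
        }
        return next.toString();
    }

    // Returns the iteration at which the target appears, or -1 if it has
    // not appeared within maxIterations.
    static int evolve(String target, int variance, int maxIterations, Random rng) {
        String current = randomString(target.length(), rng);
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            if (current.equals(target)) {
                return iteration;
            }
            current = mutate(current, target, variance, rng);
        }
        return -1;
    }

    public static void main(String[] args) {
        String target = "mathematics is the language of science";
        int found = evolve(target, 5, 100_000, new Random());
        System.out.println("Found iteration " + found);
    }
}
```

The step that matters is the `continue` in `mutate`: without it the process is just a random walk, and it is the keeping of correct characters that lets the string converge.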
And what of those monkeys?
The first two methods simulate how a monkey would type. The first is the equivalent of letting a monkey type the entire post and making it start from scratch whenever there is an error. The second is like the monkey using the backspace key each time it gets something wrong. The evolving algorithm is unlike random chance; it illustrates that information of almost infinite improbability can emerge in a far shorter space of time.
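The backspace-style second method is simple enough to sketch in a few lines of Java. This is a rough illustration rather than the program behind the results; the class name, method names and alphabet ordering are assumptions made for the example.

```java
import java.util.Random;

// Hypothetical sketch of the "cumulative chance" monkey.
class CumulativeChance {

    // Assumed keyboard: space plus the lower-case letters (27 keys).
    static final String ALPHABET = " abcdefghijklmnopqrstuvwxyz";

    // Counts random keystrokes needed to type the target when every wrong
    // key is erased and retried. The target must only use ALPHABET
    // characters, otherwise the loop never ends.
    static int keystrokes(String target, Random rng) {
        int count = 0;
        for (int i = 0; i < target.length(); i++) {
            char guess;
            do {
                guess = ALPHABET.charAt(rng.nextInt(ALPHABET.length()));
                count++;
            } while (guess != target.charAt(i));
        }
        return count;
    }

    // Averages the keystroke count over a number of independent runs.
    static double average(String target, int runs, Random rng) {
        long total = 0;
        for (int run = 0; run < runs; run++) {
            total += keystrokes(target, rng);
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        String target = "mathematics is the language of science";
        System.out.println("Average over 100 runs: "
                + average(target, 100, new Random()));
    }
}
```

Individual runs vary quite a bit, but the average over many runs should settle near 27 * 38 = 1026.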
Including white space, my posts average around 7,000 characters. By the time a post is finished, the number of keys I have pressed is probably a lot higher once typos, spelling and grammatical errors, deletion of poor sentences, and proofreading are taken into account. All up, it wouldn't be entirely unfair to say I do about 10,000 keystrokes per post. Now if I do 200 keystrokes a minute, it would still take me 50 minutes or so to get the post to where it is now. To evolve or generate something like this by chance is theoretically possible, but practically impossible.
This is why, when we see a code or information, we know there's intelligence behind it. Look at the great pyramids: the symbols contained therein are the product of intelligence and intent. Anyone who has had previous dealings with the written word would be able to understand that the symbols are the same kind of construct. To go back further into human history, the same could be said of cave paintings. They do not have the refined symbolism of the written word, but they are evidently communication. Other forms of communication exist that are not so obvious, smoke signals for example. There are times too when we mistake natural order and randomness for communication: the stars say astrology is a joke, and our futures are not etched into the palms of our hands.
We can infer meaning from non-meaningful processes, and this is the problem with the statement "all codes are a product of intelligence". We see patterns all through nature: improbable shapes, repetition, improbable assortment, all of which come from natural processes. One of these patterns is the pattern of DNA, the double-helix structure that holds the instructions for the building blocks of life. How could this occur naturally? Well, that's for another entry. The point of the tests above was to highlight the difference between random chance and a cumulative or an evolutionary process. It was to show that improbable events can be created quite quickly using select processes.