Rethinking the Alphabet Order

Posted at — Jun 14, 2019

The first three letters of the alphabet are ABC. You can make the word CAB from them, but it’s not a commonly used word. You may intuit that SAT would be a better start than ABC, because it lets you make the words SAT and AT, both of which a two-year-old may know. Therein, in my opinion, lies the inefficiency of the English alphabet order.

Being able to combine letters into words as early as possible when learning the alphabet puts those letters in context and reinforces the learning. Using only words the learner already knows makes it highly contextual to that individual and should further enhance learning progress.

The consequence of this research could be an increase in the speed of learning the alphabet. The same approach could also be applied to other alphabets.

The Goal

The goal is to find the optimal sequence in which to learn the letters of the alphabet. The context is my two-year-old daughter, so I’ve collected a list of 100 words that she knows and will evaluate sequences against this vocabulary file. This ensures that any sequence found is highly relevant to her learning.

How the Score is Calculated

A score is important because we need a way to compare the quality of sequences against each other.

For each prefix of the sequence, from the first letter alone up to all 26 letters, we count how many words in the reference vocabulary can be made from those letters. We then sum these counts to arrive at a score.

For example, take the letters TASG.

Using the first letter we can’t make any words.
Using the first two letters we can make ‘at’. Score 1.
Using the first three letters we can make ‘at’, ‘sat’. Score 2.
Using the first four letters we can make ‘at’, ‘sat’, ‘gas’, ‘sag’. Score 4.
Sum all to arrive at a final score of 7.

The score favours sequences that can make as many words as possible early on, because this has a compounding effect: a word that becomes available early is counted again at every later step.
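The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the published source code: the function name `score` is my own, and it assumes a word counts as "makeable" when every one of its letters is in the prefix, with letters allowed to repeat within a word.

```python
def score(sequence, vocabulary):
    """Sum, over every prefix of `sequence`, the number of vocabulary
    words that can be spelled using only the letters in that prefix.

    Assumption: a letter may be reused within a word, so only the set
    of letters in the word matters.
    """
    total = 0
    known = set()
    for letter in sequence:
        known.add(letter)
        # Count the words whose letters are all known at this point.
        total += sum(1 for word in vocabulary if set(word) <= known)
    return total


# The worked example from the text: 0 + 1 + 2 + 4
vocabulary = ["at", "sat", "gas", "sag"]
print(score("tasg", vocabulary))  # prints 7
```

Because each word is re-counted at every prefix that contains its letters, moving a useful letter one position earlier raises the total by one for every word it unlocks.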

Method

In order to find the optimal sequence, we will write software to implement the following methods to see which yields the best result.

Basic Frequency - Count the occurrences of each letter in the vocabulary and rank the letters from most to least frequent to obtain a sequence.
Paired Frequency - Group the letters into pairs as they occur in the vocabulary. Rank the pairs by frequency to obtain a sequence.
Genetic Algorithm
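
The two frequency methods above might be sketched as follows. This is not the published code, and both function names are hypothetical. The post doesn’t spell out how ranked pairs become a sequence, so the second function assumes that adjacent letter pairs are counted and that letters are emitted in the order they first appear in the ranked pairs.

```python
from collections import Counter


def basic_frequency_order(vocabulary):
    """Rank letters from most to least frequent across all words."""
    counts = Counter(letter for word in vocabulary for letter in word)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return "".join(letter for letter, _ in counts.most_common())


def paired_frequency_order(vocabulary):
    """Rank adjacent letter pairs by frequency, then read letters off
    the ranked pairs, skipping letters already placed (an assumption)."""
    pairs = Counter(
        word[i:i + 2] for word in vocabulary for i in range(len(word) - 1)
    )
    order = []
    for pair, _ in pairs.most_common():
        for letter in pair:
            if letter not in order:
                order.append(letter)
    # Letters that only occur in one-letter words never appear in a pair.
    for word in vocabulary:
        for letter in word:
            if letter not in order:
                order.append(letter)
    return "".join(order)


sample = ["cat", "hat", "at"]
print(basic_frequency_order(sample))
print(paired_frequency_order(sample))
```

Neither function places a letter that never occurs in the vocabulary, so the resulting sequence can be shorter than 26 letters.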

Results

The following graph shows the results.

[Figure: Alphabet Order Results]

The letter sequences used:

Classic Alphabet: abcdefghijklmnopqrstuvwxyz
Basic Frequency: eraiontcpslghubdymwkzfjv
Paired Frequency: einrgaltpsckohmbduwfyjvz
Genetic Algorithm 1: ehmsaprtcnowigdylubkzqxjvf
Genetic Algorithm 2: hpatrewniosgcmlbydukfzxjqv
Genetic Algorithm 3: eamhrtocinwgpsldbyukjzvxqf

The three genetic algorithm sequences are the highest-scoring ones I’ve been able to obtain with that method. If anyone runs the program and obtains a higher score, contact me with your sequence.
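
For readers who want to experiment before running the published program, a generic genetic search over letter orderings might look like the sketch below. It is not the author’s implementation: the population size, the keep-the-top-half selection, the order crossover and the swap mutation are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import random


def score(sequence, vocabulary):
    """Sum over prefixes of the words spellable from each prefix's letters."""
    total, known = 0, set()
    for letter in sequence:
        known.add(letter)
        total += sum(1 for word in vocabulary if set(word) <= known)
    return total


def crossover(a, b, rng):
    """Order crossover: keep a slice of `a`, fill the rest in `b`'s order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    kept = a[i:j]
    rest = [x for x in b if x not in kept]
    return rest[:i] + kept + rest[i:]


def mutate(seq, rng):
    """Return a copy of `seq` with two random positions swapped."""
    i, j = rng.sample(range(len(seq)), 2)
    seq = seq[:]
    seq[i], seq[j] = seq[j], seq[i]
    return seq


def genetic_search(vocabulary, letters, generations=200,
                   population_size=30, seed=0):
    """Evolve permutations of `letters` towards a higher score."""
    rng = random.Random(seed)
    letters = list(letters)
    population = [rng.sample(letters, len(letters))
                  for _ in range(population_size)]

    def fitness(seq):
        return score(seq, vocabulary)

    for _ in range(generations):
        # Selection: the better half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children

    best = max(population, key=fitness)
    return "".join(best), fitness(best)


best, best_score = genetic_search(["at", "sat", "gas", "sag"], "tasg")
print(best, best_score)
```

Because the surviving half is carried over unchanged, the best score in the population never decreases from one generation to the next. Different seeds can still settle on different sequences.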

Conclusion

I thought the basic frequency method did a good job, almost doubling the score of the classic alphabet while being easy to implement. I honestly thought the genetic algorithm would better the basic frequency score by a few hundred. It still may do so with some more optimizations, but the diminishing returns on effort mean that I’ll just be satisfied with the result so far. I’ve thought of using a dictionary as the vocabulary, but this would require refactoring the code to make it more efficient, and again I would face diminishing returns.

I’ve published the source code on GitHub so you too may run it against your own vocabulary set if you wish to customize the alphabet order for your own child’s vocabulary.

Now it’s time to start teaching my daughter the alphabet.

Ralph Romero
