Let’s see some examples:

Autocorrect neverteless: my_autocorrect('neverteless')
Autocorrect nesseccary: my_autocorrect('nesseccary')
Autocorrect occurence: my_autocorrect('occurence')

We present just one case, where our vocabulary was taken from Moby Dick. That corpus certainly does not reflect the actual relative frequencies of English words, yet the tool still did a relatively good job. You can also try other distances, such as cosine distance or edit distance.
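To illustrate swapping in a different distance, here is a minimal sketch of the edit (Levenshtein) distance in plain Python. The function name `edit_distance` is an assumption made for this illustration and is not part of the code in this post. A lower value means the two words are closer.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance('neverteless', 'nevertheless')` is 1, because a single inserted letter fixes the typo.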
We will show how you can easily build a simple Autocorrect tool in Python with a few lines of code. What you will need is a corpus to build your vocabulary and the word frequencies. You can also have a look at the Spelling Recommender.

print(f'The first ten words in the text are: \n)
output = df.sort_values(, ascending=False).head()
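As a rough sketch of the vocabulary step, the snippet below counts words in a piece of text and turns the counts into relative frequencies. The helper name `build_vocabulary`, the simple regex tokenizer, and the short sample string are all assumptions for illustration. In practice you would read in a full corpus, such as the text of a book.

```python
import re
from collections import Counter


def build_vocabulary(text):
    """Return (counts, probs): raw word counts and relative frequencies."""
    # Assumed tokenizer: lowercase the text and keep runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    probs = {word: n / total for word, n in counts.items()}
    return counts, probs


# Tiny stand-in corpus, used only to show the shape of the result.
sample_text = (
    "Call me Ishmael. Some years ago, never mind how long precisely, "
    "I thought I would sail about."
)
counts, probs = build_vocabulary(sample_text)
```

The keys of `counts` form the vocabulary, and `probs` gives each word's share of all tokens in the corpus.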
We assume that you are familiar with the concepts of String Distance and String Similarities.
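As a quick refresher, the sketch below shows one possible string similarity: the Jaccard similarity over sets of character bigrams, used to rank vocabulary words against a misspelling. This is an illustrative assumption, not necessarily the measure used in this post. The names `bigrams`, `jaccard_similarity`, `suggest`, and the toy probabilities are hypothetical.

```python
def bigrams(word):
    """Set of overlapping two-character substrings of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard_similarity(a, b):
    """Jaccard similarity of the bigram sets of a and b, from 0.0 to 1.0."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        # Words shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def suggest(word, word_probs, top_n=5):
    """Rank vocabulary words by similarity, breaking ties by probability."""
    scored = [(jaccard_similarity(word, v), p, v) for v, p in word_probs.items()]
    scored.sort(reverse=True)
    return [(v, sim) for sim, p, v in scored[:top_n]]


# Toy vocabulary with made-up probabilities, for illustration only.
toy_probs = {"nevertheless": 0.2, "necessary": 0.3, "occurrence": 0.1, "never": 0.4}
best_word, best_score = suggest("neverteless", toy_probs)[0]
```

Here `best_word` is `'nevertheless'`, since it shares 9 of the 12 distinct bigrams found across the two words, giving a similarity of 0.75.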