An algorithm that can solve the word puzzle Wordle with an optimal number of guesses on HARD mode.

Overview

WordleSolver

How to use the program


Copy this project with git clone and run python3 solver.py in the terminal.

When you run the program, the algorithm gives you an educated guess, which you type into Wordle. Once Wordle shows which letters were right, you enter that result back into the program and get another guess. This process continues until you have solved the puzzle!

Inputting the result of your guesses is easy. If a character is gray, enter '_'; if it is yellow, enter the lowercase letter; and if it is green, enter the uppercase letter. For example, if the program told you to guess "aeros" and the result of the guess was:

[image: Wordle result for "aeros" with r in yellow and every other letter in gray]

You would enter the result as: __r__

Here is another example:

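[image: Wordle result with d and r in green in the first two spots, k in yellow in the fourth spot, and the other letters in gray]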

You would enter the result as: DR_k_
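
The input format above can be sketched in a few lines of Python. This is only an illustration of the encoding, not the code in solver.py: the function name parse_result and the shape of its return value are assumptions made for this example.

```python
def parse_result(guess, result):
    """Split a result string such as '__r__' into green, yellow, and gray hints.

    Hypothetical helper: the name and the return shape are illustrative only.
    Positions are 0-based.
    """
    if len(guess) != len(result):
        raise ValueError("guess and result must have the same length")

    greens = []   # (position, letter): letter is in this exact spot
    yellows = []  # (position, letter): letter is in the word, but not here
    grays = []    # (position, letter): letter came up gray

    for pos, (guessed, mark) in enumerate(zip(guess.lower(), result)):
        if mark == "_":
            grays.append((pos, guessed))
        elif mark.isupper():
            greens.append((pos, mark.lower()))
        elif mark.islower():
            yellows.append((pos, mark))
        else:
            raise ValueError(f"unexpected character {mark!r} in result")
    return greens, yellows, grays


greens, yellows, grays = parse_result("aeros", "__r__")
print(greens, yellows, grays)
```

For the "aeros" example, the only yellow hint is r in the third spot (position 2 when counting from zero), and the other four letters are gray.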

How the algorithm works

Here's a quick run-down of how the algorithm works. We keep a list of words that the answer can be and keep removing from the list until only one word remains or we guess the right answer. Each word has a unique number associated with it. We can use this number to quickly determine if a word can be an answer based on the results of other guesses. If a word cannot be the answer, it will be removed from our list. The key to the accuracy and efficiency of this algorithm is how this unique number is generated.

The number is a product of prime numbers, which lets us use modular arithmetic in a clever way! Each letter has six prime numbers associated with it: one "yellow" number and five "green" numbers. We use the yellow number when we know a letter is in the word but we don't know where. We use one of the green numbers when we know that a letter is in a specific spot. You can see these prime numbers in charDict.json.

To calculate the number of a word, we multiply together the yellow numbers of the characters that make up the word, along with one green number per character. The green number we multiply by depends on the position in which the letter appears. If the letter D appears in the first spot, we multiply by its 1st green number. If it is in the last spot of the word instead, we multiply by its 5th green number.

The reason we do this is that we can use modulo to check whether a word can be the answer based on the result of another guess. For example, if we guessed "aeros" and the word we were trying to find was "drink", we would find that r is somewhere in the word but not in the third spot. Say a word has number n. If n % (r's yellow number) does not equal 0, the word does not contain r, so it cannot be the answer and we can remove it from the list. Also, if n % (r's third green number) equals 0, the word has r in the third spot, so it cannot be the answer either. Similar logic is applied when multiple letters are yellow or some letters come up green.

The value of each word does not change, so we can compute it once and store it in a text file for later use, which is what I did in wordList.txt! If you would like to use a different set of words than the one I used, feel free to change the words.txt file and run process.py to generate a new wordList file.
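
Here is a minimal sketch of the prime-product idea, under stated assumptions. The real primes live in charDict.json and are not reproduced here, so this example hands out the first 156 primes in alphabetical order, six per letter (one yellow, then five green). The names first_primes, word_number, and is_consistent are hypothetical, and gray letters are left out because the text above does not spell out how they are handled.

```python
from string import ascii_lowercase


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


PRIMES = first_primes(26 * 6)
# Assumed layout (not the contents of charDict.json): for each letter,
# one yellow prime followed by five green primes, one per position.
YELLOW = {c: PRIMES[6 * i] for i, c in enumerate(ascii_lowercase)}
GREEN = {c: PRIMES[6 * i + 1:6 * i + 6] for i, c in enumerate(ascii_lowercase)}


def word_number(word):
    """Multiply each letter's yellow prime and its position's green prime."""
    n = 1
    for pos, letter in enumerate(word):
        n *= YELLOW[letter] * GREEN[letter][pos]
    return n


def is_consistent(n, yellows, greens):
    """Check a word number against (position, letter) hints, 0-based."""
    for pos, letter in yellows:
        if n % YELLOW[letter] != 0:      # the letter is missing from the word
            return False
        if n % GREEN[letter][pos] == 0:  # the letter sits in the excluded spot
            return False
    for pos, letter in greens:
        if n % GREEN[letter][pos] != 0:  # the letter is not in the known spot
            return False
    return True


# After guessing "aeros" against "drink": r is yellow in the third spot.
hint = [(2, "r")]
for word in ["drink", "aeros", "salad"]:
    print(word, is_consistent(word_number(word), hint, []))
```

Because every prime in the table is distinct, a word's number is divisible by a given prime exactly when that prime was multiplied in, which is what makes the modulo checks reliable.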

Optimizations

One way to make the algorithm take fewer guesses is to make smarter guesses, so one optimization I made is to take letter frequency into account. The primes associated with each letter aren't chosen arbitrarily: letters that appear more often get smaller primes. "e" is the most common letter and as such has the six smallest prime numbers. I sort the word list and make the algorithm always guess the word with the smallest number. So, our algorithm is more likely to guess a word with "e" in it than one with "q", since words with "e" will probably have smaller numbers. This is good because "e" is much more likely to be in the answer than "q". Also, I only need to sort the list once, in process.py, so there is no significant performance hit!
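
A rough sketch of this frequency-based assignment follows. It is not the code in process.py: the helper names are hypothetical, the four-word list is only a toy sample, and breaking ties between equally common letters alphabetically is an assumption made so the result is deterministic.

```python
from collections import Counter


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def build_prime_table(words):
    """Give the most common letters the smallest primes, six per letter."""
    counts = Counter("".join(words))
    # Most common first; ties broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    primes = first_primes(6 * len(ranked))
    return {c: primes[6 * i:6 * i + 6] for i, c in enumerate(ranked)}


def word_number(word, table):
    """Product of each letter's yellow prime and its positional green prime."""
    n = 1
    for pos, letter in enumerate(word):
        yellow, *greens = table[letter]
        n *= yellow * greens[pos]
    return n


words = ["eerie", "tease", "steer", "quack"]  # toy sample, not words.txt
table = build_prime_table(words)
ordered = sorted(words, key=lambda w: word_number(w, table))
print(table["e"])
print(ordered)
```

Sorting once by this number puts words built from common letters at the front of the list, so the first remaining entry can serve as the next guess.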

A drawback of this approach is that words made up of repeated common letters have very low values and are guessed much more often. This is not good because words with repeating letters make it harder to narrow down our potential guesses! For example, consider the word "esses", which is made up of only the two most common letters. It's good that our guesses consist of common letters, but it is bad that we only get information about two different letters. The way I fixed this is by multiplying the numbers of words that have characters repeated two or three times by a much bigger prime number, so they are weighed down and guessed less often.
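
The penalty could look roughly like the sketch below. The penalty value and the rule of applying it once whenever any letter repeats are assumptions for illustration; the text above does not say which prime is used or whether double and triple letters are treated differently.

```python
from collections import Counter

# Hypothetical penalty: 104729 is the 10,000th prime, far larger than any
# of the 156 per-letter primes, so it never collides with them.
REPEAT_PENALTY = 104729


def apply_repeat_penalty(word, number):
    """Multiply the word's number by the penalty if any letter repeats."""
    if max(Counter(word).values()) >= 2:
        return number * REPEAT_PENALTY
    return number


print(apply_repeat_penalty("esses", 10))  # penalized
print(apply_repeat_penalty("drink", 10))  # unchanged
```

Since the penalty is a prime that belongs to no letter, multiplying by it pushes the word later in the sorted list without changing the outcome of any yellow or green modulo check.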

Another optimization I made is taking into account how common a word is. There are a lot of niche words in the list that are very rarely used and are unlikely to be the answer to the puzzle. So, once I've narrowed down the possible words to fewer than a hundred, it makes sense to guess the more common words first. This is why I introduced a second number that is associated with each word. The second number is the frequency of the word in Wikipedia articles. Once there are fewer than 100 words in the list, the list is re-sorted by this second number rather than the first, so each guess will be the most common word remaining!
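
The switch between the two orderings can be sketched as follows. The Candidate record, the next_guess function, and the sample numbers are all made up for this example, and the sketch picks the best entry directly with min and max instead of re-sorting the whole list, which is a simplification rather than the approach described above.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    word: str
    number: int      # prime-product number (smaller means more common letters)
    frequency: int   # how often the word appears in the reference corpus


def next_guess(candidates, threshold=100):
    """Pick the next guess from the remaining candidates."""
    if len(candidates) < threshold:
        # Few words left: prefer the most common word.
        return max(candidates, key=lambda c: c.frequency).word
    # Many words left: prefer the word with the smallest number.
    return min(candidates, key=lambda c: c.number).word


# Made-up values purely for illustration.
remaining = [
    Candidate("drink", number=900, frequency=5000),
    Candidate("drank", number=700, frequency=1200),
    Candidate("drunk", number=950, frequency=3000),
]
print(next_guess(remaining))               # below the threshold: by frequency
print(next_guess(remaining, threshold=2))  # at or above it: by number
```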

Further Optimizations

As I mentioned before, one of the optimizations I made was having more common letters correspond to smaller prime numbers and sorting the list of words based on the number associated with each word. This is all done just once for each set of words in process.py and is very computationally efficient. However, if more accuracy is desired, the prime numbers associated with each letter can be regenerated after each guess, because the frequency of each letter among the remaining words is likely to change. This may increase accuracy slightly but would take much longer to process, which is why I opted against it. After each guess, I would have to re-check the frequency of each letter, recalculate the value of each word, and then re-sort the entire list based on this new value.

Sources

  • Wordle is by PowerLanguage
  • List of 5-letter words is based on SOWPODS and was taken from Word Game Dictionary. I suspect that PowerLanguage used the same source for Wordle, as he used a similar source for another project.
  • The frequency of words was taken from lexepedia with a minimum frequency of 1, length of 5, and only includes Wiktionary Words.
Owner
Akil Selvan Rajendra Janarthanan