Scott Pakin's handy-dandy
Word Filtering Program

Scott Pakin's handy-dandy word-filtering program is a tool that helps with word games such as Jotto, hangman, and Jumble, in which one needs to identify a mystery word. Simply follow the instructions below, and we'll identify the mystery word in no time.

Help

To use the program, simply fill in any line of the command form and press the associated Process button to process that command. Operations are cumulative. Hence, if you first specify that the mystery word starts with A and later specify that it starts with B, then all candidate words will be eliminated from the list as no word can start with both A and B.

The following commands are supported:

The word must contain {exactly, at least, at most} ⟨number⟩ letters, {not necessarily unique, all unique}.
Words containing a number of letters that does not match the given tally are filtered out of the list. If all unique is selected, a further restriction is that no letter can appear more than once anywhere in the word.
The word {must, must not} contain the letter(s) ⟨letters⟩ {anywhere in the word, at position ⟨number⟩, as the last letter of the word}.
With must, a word must contain all of the given letters to remain in the list. With must not, a word must contain none of the given letters to remain in the list. If multiple letters are specified, they will be searched for anywhere in the word.
The word must contain {exactly, at least, at most} ⟨number⟩ instance(s) of the letter ⟨letter⟩.
Words containing the wrong tally of the given letter are filtered out of the list.
The word must contain {exactly, at least, at most} ⟨number⟩ letter(s) from the list ⟨letters⟩ at {any, the same, a different} position.
The letters of each word in the word list are matched against ⟨letters⟩. Words containing the wrong tally of matched letters at the specified position type (any, same, or different) are filtered out of the list. Note, though, that same-position or different-position matches do not preclude the existence of any additional matches at the other position type. For example, if exactly 3 letters of SWORD must appear at a different position, the word birds is retained because the r, the d, and the s appear in different positions than in SWORD. However, brows is also retained because the r, the w, and the s appear in different positions than in SWORD—even though the o appears in the same position as in SWORD. If desired, further specifying that exactly 3 letters of SWORD must appear at any position will eliminate brows while retaining birds. (This sort of back-to-back invocation is likely to be popular when playing Word Mastermind-style games.)
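To make the three position types concrete, here is a minimal Python sketch of how such a tally could be computed. It is not the program's actual code: the function name count_matches and the handling of repeated letters are assumptions made for illustration.

```python
def count_matches(word, letters, kind):
    """Count letters of `letters` found in `word` at the given position type.

    Hypothetical helper; how the real program treats repeated letters is
    not documented, so each entry of `letters` is simply tested on its own.
    """
    word, letters = word.lower(), letters.lower()
    if kind == "same":
        # Letter i of the list must sit at position i of the word.
        return sum(1 for w, l in zip(word, letters) if w == l)
    if kind == "different":
        # Letter i of the list must occur in the word somewhere other than i.
        return sum(
            1
            for i, l in enumerate(letters)
            if any(w == l and j != i for j, w in enumerate(word))
        )
    if kind == "any":
        return sum(1 for l in letters if l in word)
    raise ValueError(f"unknown position type: {kind}")


# The SWORD example from the text above:
print(count_matches("birds", "SWORD", "different"))  # 3
print(count_matches("brows", "SWORD", "different"))  # 3
print(count_matches("birds", "SWORD", "any"))        # 3
print(count_matches("brows", "SWORD", "any"))        # 4
```

Under these assumptions both birds and brows pass "exactly 3 at a different position", and only birds passes "exactly 3 at any position", which agrees with the description above.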
The word must contain only the letters ⟨letters⟩, each {zero or more times, at most once per occurrence in the list, exactly once (i.e., forming an anagram)}.
With zero or more times, a word is retained only if each of its letters is drawn from the set ⟨letters⟩. With at most once per occurrence in the list, a word is kept only if each of its letters is drawn from the set ⟨letters⟩ and it repeats each letter no more times than it is repeated in ⟨letters⟩. With exactly once (i.e., forming an anagram), only words that are anagrams of ⟨letters⟩ are retained. For example, if ⟨letters⟩ is OSPST, exactly once matches only posts, spots, and stops; at most once per occurrence additionally matches opt, post, pot, top, and toss, among others; and zero or more times matches all of those words plus words such as soot, stoops, and tots.
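The three modes can be read as comparisons of letter counts. The following Python sketch illustrates that reading; the function name only_letters and the mode strings are made up for this example and are not taken from the program.

```python
from collections import Counter


def only_letters(word, letters, mode):
    """Return True if `word` uses only `letters` under the given mode.

    Hypothetical helper; the mode names are invented for this sketch.
    """
    have, allowed = Counter(word.lower()), Counter(letters.lower())
    if mode == "zero_or_more":
        # Every letter of the word must appear somewhere in the list.
        return set(have) <= set(allowed)
    if mode == "at_most_once":
        # No letter may be used more often than it occurs in the list.
        return all(have[c] <= allowed[c] for c in have)
    if mode == "exactly_once":
        # Same letters with the same counts, i.e. an anagram.
        return have == allowed
    raise ValueError(f"unknown mode: {mode}")


print(only_letters("posts", "OSPST", "exactly_once"))  # True
print(only_letters("toss", "OSPST", "exactly_once"))   # False
print(only_letters("toss", "OSPST", "at_most_once"))   # True
print(only_letters("soot", "OSPST", "at_most_once"))   # False
print(only_letters("soot", "OSPST", "zero_or_more"))   # True
```

Each mode is strictly looser than the one after it, which is why the OSPST example above grows from three words to many as the mode is relaxed.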
The word must match the pattern ⟨pattern⟩ with dashes matching {any letter, any other letter}.
The ⟨pattern⟩ field accepts upper- and lowercase letters and dashes. It filters the word list as follows:
  • The pattern must contain as many characters as each word that should be retained.
  • An uppercase letter in the pattern specifies that that letter must appear in the corresponding position in each word that should be retained.
  • A lowercase letter specifies that that letter must appear in each word but not at the position in which it appears in the pattern.
  • A dash means either that any letter can appear at the corresponding position or that only any other letter (i.e., any letter that is not present in ⟨pattern⟩) can appear at the corresponding position.
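The pattern rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not the program's own code; the function name matches_pattern and the dash_any_other flag are assumptions.

```python
def matches_pattern(word, pattern, dash_any_other=False):
    """Check `word` against a pattern of uppercase letters, lowercase
    letters, and dashes. Hypothetical helper written from the rules above.
    """
    word = word.lower()
    if len(word) != len(pattern):
        return False
    # Letters named anywhere in the pattern, used for "any other letter".
    named = {p.lower() for p in pattern if p != "-"}
    for w, p in zip(word, pattern):
        if p == "-":
            if dash_any_other and w in named:
                return False
        elif p.isupper():
            # Uppercase: this exact letter at this exact position.
            if w != p.lower():
                return False
        else:
            # Lowercase: present in the word, but not at this position.
            if w == p or p not in word:
                return False
    return True


print(matches_pattern("paced", "--a-D"))                      # True
print(matches_pattern("gland", "--a-D"))                      # False
print(matches_pattern("sale", "-A--", dash_any_other=True))   # True
print(matches_pattern("saga", "-A--", dash_any_other=True))   # False
```

The last two calls show why any other letter suits hangman: once every A has been revealed, a dash can no longer hide another A.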
The word {must, must not} match the regular expression ⟨regexp⟩.
Regular expressions are a powerful mechanism for describing letter patterns. You can search the Web for regular expression to find more thorough explanations and tutorials, but the basic syntax is as follows:
  • A letter in the regular expression matches the same letter in a word.
  • ^ matches the beginning of a word.
  • $ matches the ending of a word.
  • . matches any letter.
  • [letters] matches any letter in ⟨letters⟩. Ranges such as k-q are allowed.
  • [^letters] matches any letter not in ⟨letters⟩. Ranges such as k-q are allowed.
  • X? matches either zero or one occurrence of ⟨X⟩.
  • X* matches zero or more occurrences of ⟨X⟩.
  • X+ matches one or more occurrences of ⟨X⟩.
  • XY matches an ⟨X⟩ that is followed immediately by a ⟨Y⟩.
  • X|Y matches either an ⟨X⟩ or a ⟨Y⟩.
  • Parentheses can be used to group multiple subexpressions into a single, new subexpression.
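For readers who want to try the syntax outside the program, here is a small Python sketch that filters a word list with Python's re module. The program itself presumably relies on its own regular-expression engine, so details beyond the basic syntax listed above may differ; the function name filter_regex and the sample words are assumptions.

```python
import re


def filter_regex(words, regexp, must_match=True):
    """Keep the words that do (or do not) match `regexp`.

    Hypothetical helper. A match anywhere in the word counts unless the
    expression is anchored with ^ or $.
    """
    rx = re.compile(regexp, re.IGNORECASE)
    return [w for w in words if bool(rx.search(w)) == must_match]


words = ["apple", "poppy", "zipper", "happy"]
print(filter_regex(words, "^.pp"))                      # ['apple']
print(filter_regex(words, "pp[ly]"))                    # ['apple', 'poppy', 'happy']
print(filter_regex(words, "p+y$"))                      # ['poppy', 'happy']
print(filter_regex(words, "^[a-k]", must_match=False))  # ['poppy', 'zipper']
```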
{Intelligently, Randomly} guess a {word, letter} from the {original, current} list (honoring current word lengths). The best candidate found after ⟨number⟩ seconds or less is displayed below.
With Randomly, a word or letter is selected at random from either the original or current list of words. Only words that are no shorter than the shortest word in the list and no longer than the longest word in the list are eligible for selection. With Intelligently, the program selects the word or letter that is most likely to narrow down the list of remaining words. A word selected Intelligently from the original list is not necessarily a candidate for the mystery word. For instance, if the word is known to end with an E, there is little knowledge to be gained by guessing a word that also ends with an E; the program may therefore use that last position to test a letter that it knows nothing about. Intelligently guessing a word is a time-consuming process. Specifying a time limit tells the program to return the best word found at the point at which time ran out.
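The program's actual guessing strategy is not described here, so the following Python sketch shows only one plausible heuristic for guessing a letter: pick the letter whose presence or absence splits the current list most evenly, so that either answer discards as many words as possible. The function name guess_letter and the scoring rule are assumptions.

```python
from collections import Counter


def guess_letter(words):
    """Return the letter that best splits `words`, or None if empty.

    Hypothetical heuristic, not the program's algorithm: minimize the
    size of the larger group left after learning whether the letter
    is in the mystery word. Ties go to the alphabetically first letter.
    """
    if not words:
        return None
    total = len(words)
    # Number of words containing each letter at least once.
    present = Counter(c for w in words for c in set(w.lower()))
    return min(sorted(present), key=lambda c: max(present[c], total - present[c]))


print(guess_letter(["paced", "paved", "pawed", "payed"]))  # c
```

Letters shared by every remaining word (here P, A, E, and D) score worst under this rule because guessing them reveals nothing new, which mirrors the remark above about words that end with an E.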

Filtering commands are cumulative. For example, if you indicate that The word must contain the letter(s) A at position 1 then indicate that The word must contain exactly 2 instance(s) of the letter P, the resulting word list will contain only words that meet both criteria (e.g., apple and apropos but not aspirin or zipper). Similarly, if you indicate that The word must contain exactly 4 letters then indicate that The word must contain exactly 10 letters, the resulting word list will be empty as no word simultaneously contains exactly 4 letters and exactly 10 letters.
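Cumulative filtering amounts to applying each command to whatever the previous command left behind. A minimal Python sketch of the two examples above, using made-up helper names and a four-word sample list:

```python
def apply_filters(words, filters):
    """Apply each filter in turn to the words that survived the last one.

    Hypothetical helper; each filter is a function from word to bool.
    """
    for keep in filters:
        words = [w for w in words if keep(w)]
    return words


def starts_with_a(word):
    return word.startswith("a")


def has_two_ps(word):
    return word.count("p") == 2


sample = ["apple", "apropos", "aspirin", "zipper"]
print(apply_filters(sample, [starts_with_a, has_two_ps]))  # ['apple', 'apropos']

# Contradictory commands empty the list.
print(apply_filters(sample, [lambda w: len(w) == 4, lambda w: len(w) == 10]))  # []
```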

The Show words button reveals the list of remaining words. (The list is hidden by default because some Web browsers are extremely slow at displaying and updating text boxes containing large amounts of text.) The list can be edited manually if desired.

Example

Consider a game like Lingo in which players have to identify a five-letter word of which only the first letter is known at the outset. Each turn, a player guesses a word and is told which of the letters in that word are in the correct location in the mystery word, which letters are in an incorrect location, and which letters do not appear at all in the mystery word. Suppose that the given initial letter is P. One approach is as follows:

  1. The program begins with a list of 63823 words.
  2. Process The word must contain exactly 5 letters. 4662 words (only those that contain the correct number of letters) remain in the list.
  3. Process The word must contain the letter(s) P at position 1. This reduces the number of remaining words to 310.
  4. Process Intelligently guess a word from the original list. The program guesses arose.
  5. Suppose we're told that the A and E exist in the mystery word but not at their locations in arose and that none of the other letters in arose appear in the mystery word. Process The word must match the pattern a---e with dashes matching any letter and The word must not contain the letter(s) ROS anywhere in the word. Now only 15 words remain in the list. (Click on Show words to see them.)
  6. Process Intelligently guess a word from the original list. The program guesses gland.
  7. Suppose we're told that the mystery word contains D in the final position, the A at a position other than where it occurs in gland, and none of the other letters in gland. Process The word must match the pattern --a-D with dashes matching any letter and The word must not contain the letter(s) GLN anywhere in the word. Now only 4 words remain in the list (paced, paved, pawed, and payed).
  8. Process Intelligently guess a word from the original list. The program guesses chewy. Note how chewy contains the C from paced, the W from pawed, and the Y from payed.
  9. Suppose we're told that the mystery word contains an E but not at its position in chewy and that none of the other letters in chewy appear in the mystery word. Because all of the remaining words contain an E at the same position, we don't actually have to process The word must match the pattern --e-- with dashes matching any letter (although it won't hurt). However, processing The word must not contain the letter(s) CHWY anywhere in the word leaves only one word remaining: paved. This must be the mystery word!
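The final step of the walkthrough can be reproduced with a short Python sketch of the must/must not contain command. The function name contains_letters is an assumption, and the list holds only the four words that remain after step 7.

```python
def contains_letters(word, letters, must=True):
    """With must=True, keep words containing all of `letters`;
    with must=False, keep words containing none of them.
    Hypothetical helper written from the command's description.
    """
    hits = [c in word.lower() for c in letters.lower()]
    return all(hits) if must else not any(hits)


remaining = ["paced", "paved", "pawed", "payed"]
remaining = [w for w in remaining if contains_letters(w, "CHWY", must=False)]
print(remaining)  # ['paved']
```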

Each type of game may favor a different set of commands for filtering the word list. For example, for games like hangman, the most useful commands are likely to be The word must match the pattern ⟨some uppercase-letter pattern⟩ with dashes matching any other letter and Intelligently guess a letter from the current list. For games like Jumble, the most useful command is likely to be The word must contain exactly the letters ⟨jumbled letters⟩. For games like Jotto, The word must contain exactly ⟨number⟩ letter(s) from the set ⟨previous guess⟩ and Intelligently guess a word from the current list will be the most heavily utilized commands.

Terms of use

Scott Pakin's handy-dandy word-filtering program is

Copyright © 2017 Scott Pakin

You are allowed to run this program from http://www.pakin.org/wordfilter as much as you want, free of charge. You are allowed to provide links to http://www.pakin.org/wordfilter. You are not allowed to redistribute Scott Pakin's handy-dandy word-filtering program—that includes the HTML, JavaScript, CSS, and any other program components—either modified or unmodified. You are not allowed to embed Scott Pakin's handy-dandy word-filtering program within another Web page (e.g., using HTML frames).
