Spell Checking in PHP

Introduction

Spell checking is a tool that many people cannot do without. However, it is rare to find such a useful extra when filling out a form on a Web page. Today you’ll learn how to use the PHP spell checking functions to create a robust spell checker that can be used just about anywhere.

The pspell/aspell PHP functions

PHP handles all of its spell-checking functionality through the pspell/aspell libraries. These are available from SourceForge at http://aspell.sourceforge.net/ and http://pspell.sourceforge.net/. With these libraries, PHP can:

  • Spell check multiple languages
  • Determine if a word is valid for the given language
  • Suggest possible spellings for invalid words
  • Maintain personalized settings that allow a given user to add words to their own personalized dictionary

Surprisingly, all of this functionality is available through a few simple functions. Today, you’ll take those functions and create a single class that lets you manage multiple dictionaries and use them to spell-check any string quickly and effectively.

Here are the steps that need to be taken to work with the pspell library in PHP.

Step 1 – Getting a handle to the dictionary configuration

The first step to using the pspell functions in PHP is to initialize the dictionary. This can be done through the use of the pspell_config_create() function…

int pspell_config_create (string language [, string spelling [, string jargon [, string encoding]]])

…where language represents the ISO-639 language code for the dictionary you would like to load (for example, “en” for English).

Next, the spelling parameter is used for languages that have different dialects (such as American and British English). Currently, the only accepted values for spelling are “american”, “british”, and “canadian”, which represent different dialects of the English language.

The third parameter, jargon, is used in instances where two dictionaries using the same dialect are being loaded and contain extra information used to distinguish the two.

The final parameter, encoding, is the encoding used in the dictionary file and can be safely ignored.

In general, when working with pspell_config_create(), only the first two (perhaps three) parameters are used, and only the first parameter, language, is actually required.

When pspell_config_create() is called, it returns a handle to the configuration parameters, which then can be passed to the pspell_config_*() family of functions to modify specific dictionary configuration settings. Today you’ll only be using the pspell_config_ignore(), pspell_config_mode() and pspell_config_personal() functions. For a complete list of functions, consult the PHP manual.
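As a minimal sketch of this first step, the call below requests a configuration handle for American English. The helper name create_config_sketch() is hypothetical, and the code assumes the pspell extension is loaded; it returns null when it is not. Recent PHP releases return an object rather than an integer for this handle, which the sketch does not depend on.

```php
<?php
// Sketch: create a configuration handle for American English.
// create_config_sketch() is a hypothetical helper, not part of pspell.
function create_config_sketch()
{
    if (!function_exists('pspell_config_create')) {
        return null; // pspell extension not available
    }
    // Only the language is required; the spelling (dialect) is optional.
    return pspell_config_create('en', 'american');
}

$config = create_config_sketch();
echo $config === null ? "pspell is not available\n" : "configuration handle created\n";
```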

Step 2 – Modifying the configuration

Once you have the handle returned from the pspell_config_create() function, the next step is to use the pspell_config_*() family of functions to modify the configuration to your tastes. In this article, I’ll only cover those functions which you will be using, starting with pspell_config_ignore() whose syntax is…

int pspell_config_ignore (int dictionary_link, int n)

…where dictionary_link represents the handle you received, and n represents the size limit of words the spell checker will ignore.

For example, a value of 5 for n means that the spell checker will ignore all words of five or fewer characters and report no spelling errors for them. In general, a safe value is two or three, depending on how sensitive you would like the spell checker to be.

The second configuration function I’ll use is the pspell_config_personal() function. This function is used to define a “personal” dictionary to use in conjunction with the dictionary already determined. Personal dictionaries are useful because they allow specific “users” of the dictionary to load their own set of personalized words that either the spell checker wrongly labeled as misspelled or (for whatever reason) should be ignored.

The syntax is…

int pspell_config_personal (int dictionary_link, string file)

…where dictionary_link again represents the handle to the configuration and file represents the filename of the personal dictionary to load.

The final configuration function I’ll introduce is the pspell_config_mode() function. This function is used to determine the amount of time PHP will spend looking for words to suggest.

It can be set to one of the following values:

  • PSPELL_FAST (least number of suggestions)
  • PSPELL_NORMAL (more suggestions)
  • PSPELL_BAD_SPELLERS (a lot of suggestions)

The syntax for pspell_config_mode() is…

int pspell_config_mode (int dictionary_link, int mode)

…where mode represents one of the flag constants listed above.
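The three configuration calls can be combined as in the sketch below. The helper configure_sketch() and the wordlist file name are assumptions made for illustration; the values 3 and PSPELL_NORMAL simply mirror the defaults used later in this article.

```php
<?php
// Sketch: apply the three configuration settings discussed above.
// configure_sketch() and the wordlist path are hypothetical.
function configure_sketch($config, $personal_file = null)
{
    pspell_config_ignore($config, 3);            // skip very short words
    pspell_config_mode($config, PSPELL_NORMAL);  // balance speed and suggestions
    if ($personal_file !== null) {
        pspell_config_personal($config, $personal_file); // per-user wordlist
    }
    return $config;
}

if (function_exists('pspell_config_create')) {
    $config = configure_sketch(
        pspell_config_create('en'),
        '/path/to/personal_dict/user.pws'
    );
    echo "configuration updated\n";
} else {
    echo "pspell is not available\n";
}
```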

Step 3 – Creating an active instance of the dictionary

Once you have created a new dictionary configuration and modified the settings to your tastes, the last step is to call the pspell_new_config() function and pass it the handle you received from pspell_config_create(). At this point, pspell_new_config() returns another handle to an active dictionary, which you can then use to actually perform spellchecking operations.
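A short sketch of this step follows. The helper open_dictionary_sketch() is hypothetical, and the code assumes a dictionary for the requested language is installed; where none can be loaded, pspell_new_config() returns false and the sketch reports that instead.

```php
<?php
// Sketch: turn a configuration handle into an active dictionary handle.
// open_dictionary_sketch() is a hypothetical helper, not part of pspell.
function open_dictionary_sketch($language = 'en')
{
    if (!function_exists('pspell_new_config')) {
        return false; // pspell extension not available
    }
    $config = pspell_config_create($language);
    pspell_config_mode($config, PSPELL_FAST);
    // Returns false if no dictionary for $language can be loaded.
    return pspell_new_config($config);
}

$dictionary = open_dictionary_sketch('en');
echo $dictionary === false ? "no dictionary available\n" : "dictionary ready\n";
```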

Using the dictionary

Once you have created an active instance of the dictionary, you are ready to use it to check the spelling of words (or suggest the spelling of misspelled words) and work with the personal word list (if used).

These tasks are accomplished through the use of four functions:

  • pspell_check()
  • pspell_suggest()
  • pspell_add_to_personal()
  • pspell_save_wordlist()

Checking the spelling of a word

The simplest use of your spell checker is to determine whether a word is valid in the dictionary. This is accomplished through the pspell_check() function, which is defined as…

boolean pspell_check (int dictionary_link, string word)

…where dictionary_link now represents the value returned from pspell_new_config() and word represents the word to check.

If the word exists either in the original dictionary or the personal wordlist, pspell_check() returns true and the word is valid (returning false if the word was not found).
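To illustrate, the sketch below checks two words against the default English dictionary. The helper check_words_sketch() is hypothetical, and the result for each word depends on the dictionary installed on your system.

```php
<?php
// Sketch: report whether each word is found in the dictionary.
// check_words_sketch() is a hypothetical helper, not part of pspell.
function check_words_sketch($dictionary, array $words)
{
    $results = array();
    foreach ($words as $word) {
        $results[$word] = pspell_check($dictionary, $word);
    }
    return $results;
}

$dictionary = function_exists('pspell_new_config')
    ? pspell_new_config(pspell_config_create('en'))
    : false;

if ($dictionary === false) {
    echo "no dictionary available\n";
} else {
    $results = check_words_sketch($dictionary, array('test', 'mispellled'));
    foreach ($results as $word => $ok) {
        echo $word, ': ', ($ok ? 'found' : 'not found'), "\n";
    }
}
```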

Suggesting a correct spelling of a word

When a word is invalid, PHP can suggest a group of words that it believes might be what the given word was supposed to be. This can be done through the use of the pspell_suggest() function.

Calling this function is identical to pspell_check(), except pspell_suggest() will return an array of suggestions for the given word rather than a simple false if the word was not found.
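A typical pattern is to ask for suggestions only after pspell_check() has failed, as sketched here. The helper top_suggestions_sketch() and the limit of five are assumptions for illustration; the suggestions themselves depend on the installed dictionary and the configured mode.

```php
<?php
// Hypothetical helper: keep only the first $limit suggestions.
function top_suggestions_sketch(array $suggestions, $limit = 5)
{
    return array_slice($suggestions, 0, $limit);
}

$dictionary = function_exists('pspell_new_config')
    ? pspell_new_config(pspell_config_create('en'))
    : false;

$word = 'mispellled';
if ($dictionary === false) {
    echo "no dictionary available\n";
} elseif (!pspell_check($dictionary, $word)) {
    // pspell_suggest() returns an array of candidate spellings.
    $suggestions = pspell_suggest($dictionary, $word);
    echo implode(', ', top_suggestions_sketch($suggestions ?: array())), "\n";
}
```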

Adding words to the wordlist

Beyond the normal spell-checking dictionaries, PHP also supports the ability to create custom personalized dictionaries on a case-by-case (or user-by-user) basis. This is accomplished through the use of two functions: pspell_add_to_personal() and pspell_save_wordlist(). Their syntax is as follows, and should be self-explanatory:

int pspell_add_to_personal (int dictionary_link, string word)

int pspell_save_wordlist (int dictionary_link)

Note that these functions should only be used in conjunction with pspell_config_personal() (or similar). In order for a specific wordlist to be saved, the pspell_save_wordlist() function must be called.
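The sketch below shows one way these calls might fit together for a per-user wordlist. The helper personal_wordlist_path_sketch(), the user name 'alice', and the .pws file extension are assumptions; the directory is the placeholder path used elsewhere in this article and must exist and be writable for the code to do anything.

```php
<?php
// Hypothetical helper: build the wordlist file name for one user.
function personal_wordlist_path_sketch($dir, $user)
{
    // basename() keeps the user name from escaping the directory.
    return rtrim($dir, '/') . '/' . basename($user) . '.pws';
}

$dir = '/path/to/personal_dict/';
if (function_exists('pspell_config_create') && is_dir($dir) && is_writable($dir)) {
    $config = pspell_config_create('en');
    pspell_config_personal($config, personal_wordlist_path_sketch($dir, 'alice'));
    $dictionary = pspell_new_config($config);
    if ($dictionary !== false) {
        pspell_add_to_personal($dictionary, 'ttest'); // held in memory so far
        pspell_save_wordlist($dictionary);            // written to the wordlist file
    }
} else {
    echo "pspell or the wordlist directory is not available\n";
}
```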

The Script

Now that you have been introduced to the functions in today’s script, let’s see them in action. Today’s column is a wrapper class for the aforementioned functions that handles everything in a much cleaner fashion than if you were to call each function from within your script as necessary.

Start by creating the class and its member variables, follow with the constructor, and finish with the wrappers for the four functions outlined in the last section.

Step 1 – Initialize the Class

Code Flow

  • Create the class
  • Initialize the member variables

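<?php
class spell_checker {
    // Configurable settings
    var $personal_path = "/path/to/personal_dict/"; // directory holding personal wordlists
    var $skip_len      = 3;                         // passed to pspell_config_ignore()
    var $mode          = PSPELL_NORMAL;             // passed to pspell_config_mode()

    // Internal use only
    var $pspell_handle;          // handle to the active dictionary
    var $pspell_cfg_handle;      // handle to the configuration
    var $personal      = false;  // true when a personal wordlist is in use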

In our spell checking class, we initialize six variables. The first three, $personal_path, $skip_len, and $mode, are to be configured to taste and are used in the configuration functions. The last three, $pspell_handle, $pspell_cfg_handle, and $personal, are internal-use only variables and in general should not be touched.

Step 2 – The class constructor

The constructor for our spell checker class could be viewed as the “heart” of the script. It is here that we determine how the spell checker will function and initialize it as necessary. The constructor takes two parameters and can load a dictionary other than the default (English) when an ISO language identifier other than “en” is given as the first parameter.

Code Flow

  • Create a configuration handle using the specified language
  • Set configuration options
  • Load the personal word list (if used)
  • Create a handle to an active spell-checking session

Step 3 – Spell Check function wrappers

The final section of our class contains the wrappers that allow us to check, suggest, add, and save wordlists quickly and easily.

Code Flow

  • Create wrappers for spell checking functions
  • Finish the spell checker class

Notice that for those functions that work with the personal wordlist, a check exists to ensure that the object was initialized with personal wordlist support (determined by the $personal variable).
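The three steps can be sketched as one compact class. This is not the original listing: the class name SpellCheckerSketch, its method names, and the words() and misspelled() helpers are all hypothetical, and the code uses newer PHP syntax than the member declarations shown earlier. It assumes the pspell extension and a matching dictionary are installed, and throws an exception otherwise.

```php
<?php
// Hypothetical sketch of a pspell wrapper class; not the article's listing.
class SpellCheckerSketch
{
    private $dictionary;        // handle to the active dictionary
    private $personal = false;  // true when a personal wordlist is loaded

    public function __construct($language = 'en', $personal_file = null, $skip_len = 3)
    {
        if (!function_exists('pspell_config_create')) {
            throw new RuntimeException('The pspell extension is not available.');
        }
        $config = pspell_config_create($language);
        pspell_config_ignore($config, $skip_len);
        pspell_config_mode($config, PSPELL_NORMAL);
        if ($personal_file !== null) {
            pspell_config_personal($config, $personal_file);
            $this->personal = true;
        }
        $this->dictionary = pspell_new_config($config);
        if ($this->dictionary === false) {
            throw new RuntimeException("No dictionary available for '$language'.");
        }
    }

    public function check($word)
    {
        return pspell_check($this->dictionary, $word);
    }

    public function suggest($word)
    {
        return pspell_suggest($this->dictionary, $word);
    }

    // Personal-wordlist operations are refused unless one was configured.
    public function add($word)
    {
        return $this->personal && pspell_add_to_personal($this->dictionary, $word);
    }

    public function save()
    {
        return $this->personal && pspell_save_wordlist($this->dictionary);
    }

    // Split a string into words (ASCII letters and apostrophes only).
    public static function words($text)
    {
        preg_match_all("/[A-Za-z']+/", $text, $matches);
        return $matches[0];
    }

    // Return the words in $text that the dictionary does not recognise.
    public function misspelled($text)
    {
        return array_values(array_filter(self::words($text), function ($word) {
            return !$this->check($word);
        }));
    }
}

try {
    $checker = new SpellCheckerSketch('en');
    print_r($checker->misspelled('This is a mispellled word'));
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Keeping the word splitting in a separate static method means it can be tried out without a dictionary installed; the ASCII-only pattern is a simplification that would need widening for other languages.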

How to use the spell checker class

The spell checking class can be used in the following fashion:

The resulting output should be:

Notice that because ‘ttest’ was added to the personal wordlist, it was not shown as an incorrectly spelled word, while ‘mispellled’ was.

Final Notes

You have now learned how to construct a very useful spell-checking object that will allow you to check the spelling of any string accessible through PHP, thanks to the pspell library. Note that, in order for today’s Code Gallery Spotlight column to function, the pspell library must be installed and configured properly through PHP.

Furthermore, in order for the personal wordlists to function properly, the directory given by the $personal_path variable must exist, and the user that PHP runs under must be granted