Systems and methods for improved spell checking

A spell-checking system and method, applied in the field of spell checkers, that addresses the limitations of a standard dictionary, the difficulty of spell checking in context, and incorrectly entered words in queries, and thereby improves the quality of spell checking and of the suggested spelling alternatives.

Inactive Publication Date: 2007-05-10
MICROSOFT TECH LICENSING LLC

AI Technical Summary

Benefits of technology

[0009] The present invention relates generally to spell checkers, and more particularly to systems and methods for improving spell checking via utilization of query logs. Iterative transformations of search query strings along with statistics extracted from search query logs and/or web data are leveraged to provide possible alternative spellings for the search query strings. This provides a superior spell checking means that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows a means to provide a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring level by utilizing word unigram and bigram statistics extracted from query logs along with an iterative search. This provides substantially better spelling alternatives for a given query than employing only exact string matching. Thus, the present invention, for example, can tailor its suggested alternatives based on the recent history of popular concepts/queries. It can also tailor its corrections for a given user based on the corresponding prior query logs, enabling a much more relevant spelling alternative to be provided. Other instances of the present invention can receive input data from sources other than a search query input. This provides a method of utilizing the query log facilitated spell checking in the context of ordinary word processors and the like.
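As a rough illustration of the word unigram and bigram statistics mentioned above, the following minimal sketch counts both over a small in-memory query log. The function name `ngram_counts` and the sample log are hypothetical and chosen only for illustration; the patent text shown here does not specify how the statistics are stored or computed.

```python
from collections import Counter

def ngram_counts(queries):
    """Count word unigrams and word bigrams over an iterable of query strings."""
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        words = query.lower().split()
        unigrams.update(words)
        # Adjacent word pairs within the same query form the bigrams.
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

# Hypothetical toy log; a real query log would be far larger and noisier.
sample_log = [
    "platinum rings",
    "platinum rings",
    "detroit tigers",
    "detroit tigers tickets",
    "platnuin rings",
]
unigrams, bigrams = ngram_counts(sample_log)
```

Note that the misspelling `platnuin` is counted too. Errors present in the log end up in the statistics, so frequency has to be weighed alongside string similarity rather than trusted on its own.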

Problems solved by technology

As many have come to find out, if the site information or the search query is entered incorrectly, the cost in time to re-navigate can become quite high.
Browser or other search queries for information present a unique problem for spell checking applications, since the queries often consist of words that may not be found in a standard spell-checking dictionary, such as artist, product, or company names.
Another problem is that a word in a query may have been entered incorrectly, but not be spelled incorrectly (for example, “and processors” instead of “amd processors”).
As such, a standard dictionary, while suitable for spell checking in the context of word processing, may not be appropriate for type-in-line and search-query spell checking.
However, for many applications where spell checking is desired (e.g., text input provided to input boxes), a standard dictionary is not optimal for the problem.
Unfortunately, a problem with relying on query logs directly is that the logs will generally also contain a large number of input errors and return substring matches that are not relevant to a user's desired search.
These dynamic behaviors cannot be accounted for utilizing traditional dictionary and search query processing.
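
The "and processors" case above is one that a dictionary lookup cannot catch, because "and" is itself a valid word. The sketch below shows one way query-log bigram counts could prefer "amd" in that context. It is an assumption-laden illustration, not the patented method: the function name `best_first_word`, the counts, and the vocabulary are all hypothetical, and it only handles two-word queries.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def best_first_word(query, bigram_counts, vocabulary, max_distance=1):
    """For a two-word query, pick the first word whose bigram with the
    second word is most frequent, among words close to the typed one."""
    first, second = query.split()
    candidates = [w for w in vocabulary if edit_distance(first, w) <= max_distance]
    return max(candidates, key=lambda w: bigram_counts[(w, second)])

# Hypothetical counts standing in for statistics from a real query log.
bigram_counts = Counter({("amd", "processors"): 120, ("and", "processors"): 3})
vocabulary = {"and", "amd", "processors"}

suggestion = best_first_word("and processors", bigram_counts, vocabulary)
# suggestion is "amd" with these toy counts
```

The typed word stays in the candidate set (distance 0), so a query whose original wording is already the most frequent is left unchanged.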


Examples


Example 1

[0071] platnuin rings→platinum rings

Example 2

[0072] ditroitigers→detroit tigers

[0073] In the first example, a typical word processor spell checker might suggest only plantain and plantains as corrections for the misspelled word platnuin. In the second example, the typical word processor spell checker highlights the word ditroitigers as a misspelling but provides no correction suggestion. While a traditional trusted lexicon and corpus approach may not be able to solve this type of problem, it can be addressed with the present invention by utilizing large query logs.

[0074] If a misspelling such as ditroitigers is too far from the correct alternative according to a distance and threshold of choice, the correct alternative might not be found in one step. Nevertheless, employing an instance of the present invention, the correct alternative can be reached by allowing intermediate valid correction steps, such as ditroitigers→detroittigers→detroit tigers. The last formulation of the problem did not explicitly utilize a lexicon of t...
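
The chain ditroitigers→detroittigers→detroit tigers can be reproduced with a simplified sketch. It assumes whole-query Levenshtein distance, a threshold of 2, and a hypothetical table of query frequencies; the patent describes working at the substring level with unigram and bigram statistics, so this only illustrates the iterative idea. The name `iterative_correct` and all counts are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def iterative_correct(query, query_counts, max_distance=2, max_steps=10):
    """Repeatedly move to the most frequent logged query within max_distance,
    stopping when no strictly more frequent neighbor exists."""
    path = [query]
    current = query
    for _ in range(max_steps):
        neighbors = [q for q in query_counts
                     if edit_distance(current, q) <= max_distance]
        best = max(neighbors, key=lambda q: query_counts[q], default=current)
        if query_counts.get(best, 0) <= query_counts.get(current, 0):
            break  # fixed point: nothing nearby is more frequent
        current = best
        path.append(current)
    return path

# Hypothetical frequencies standing in for a real query log.
query_counts = {"detroittigers": 5, "detroit tigers": 500, "platinum rings": 300}

path = iterative_correct("ditroitigers", query_counts)
# path == ["ditroitigers", "detroittigers", "detroit tigers"]
```

With these assumptions the direct distance from ditroitigers to detroit tigers is 3, which exceeds the threshold, so a single-step search would miss it. The less frequent intermediate form detroittigers (distance 2, then distance 1) bridges the gap.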


Abstract

The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides a spell checking means that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows a means to provide a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring level by utilizing word unigram and/or bigram statistics extracted from query logs combined with an iterative search. This provides substantially better spelling alternatives for a given query than employing only substring matching. Other instances can receive input data from sources other than a search query input.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. patent application Ser. No. 10/801,968, filed on Mar. 16, 2004, entitled “SYSTEMS AND METHODS FOR IMPROVED SPELL CHECKING”, the entirety of which is incorporated herein by reference.

TECHNICAL FIELD

[0002] The present invention relates generally to spell checkers, and more particularly to systems and methods for improving spell checking via utilization of query logs.

BACKGROUND OF THE INVENTION

[0003] Interaction with automated programs, systems, and services has become a routine part of most people's lives, especially with the advent of the Internet. Web surfing or browsing for instance may even be the “new” national pastime for a certain segment of the population. In accordance with such systems, applications such as word processing have helped many become more efficient in their respective jobs or with their personal lives such as typing a letter or e-mail to a friend. Many automated features...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/00, G06F17/21, G06F17/27, G06F17/30, G06F40/189, G06F40/191
CPC: G06F17/273, G06F16/3322, G06F16/9532, G06F16/9535, G06F40/232, B63B2221/08, B63C2005/022, F16B5/02
Inventor: CUCERZAN, SILVIU-PETRU; BRILL, ERIC D.
Owner: MICROSOFT TECH LICENSING LLC