Method for creating error-correcting database, automatic error correcting method and system

This database and error correction technology is applied to generating error correction databases for character data. It addresses the poor applicability and unguaranteed accuracy of existing schemes, and achieves wide application, correction of input errors, and wide coverage.

Active Publication Date: 2008-08-13
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
4 Cites, 71 Cited by

AI Technical Summary

Problems solved by technology

[0006] However, existing automatic error correction schemes are mainly based on preset models, simple grammatical analysis, or simple word comparison. These approaches have clear limitations, and their accuracy cannot be guaranteed. In addition, error correction solutions designed for English generally cannot be applied directly to Chinese, and vice versa, so their applicability is poor.



Examples


Embodiment 1

[0069] This embodiment takes the query log as the data source. Generally, the query log is recorded by a search engine, and the query records of each user can be separated by IP address or user login name. The query records can also be recorded by local clients and then aggregated.

[0070] The query log generally includes the input history of user query keywords, for example:

[0071] 10.10.1.1 Shanghai 2008-02-25.09:00:00

[0072] 10.10.1.1 Wrestle 2008-02-25.11:00:00

[0073] 10.10.1.1 Bodou 2008-02-25.12:00:09

[0074] 192.10.1.1 Wrestle 2008-02-23.13:00:00

[0075] 192.10.1.1 bodou 2008-02-23.13:00:05

[0076] 192.10.1.1 Nanjing 2008-02-23.15:00:05

[0077] Each line in the above log information represents a user query string, and a line of records includes the following information: user identification (for example, account number, nickname, IP, etc., which can generally be used to uniquely represent a user), query ...
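The log layout above can be read into per-user, time-ordered records. The following is a minimal sketch, not the patent's implementation: it assumes each line is laid out as "user ID, query string, timestamp" separated by spaces, as in the sample, and the names `QueryRecord`, `parse_line`, `group_by_user`, and `consecutive_pairs` are hypothetical. Pairing neighbouring queries is only one plausible way to expose the sequence information; the source does not specify how pairs are selected in this embodiment.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class QueryRecord(NamedTuple):
    user_id: str         # e.g. an IP address or login name
    query: str           # the query string the user typed
    timestamp: datetime  # when the query was issued


def parse_line(line: str) -> QueryRecord:
    # Assumed layout: "<user id> <query> <YYYY-MM-DD.HH:MM:SS>".
    # The query may contain spaces, so split from both ends.
    user_id, rest = line.strip().split(" ", 1)
    query, stamp = rest.rsplit(" ", 1)
    return QueryRecord(user_id, query, datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S"))


def group_by_user(lines):
    """Group records by user ID and sort each user's records by time."""
    sessions = defaultdict(list)
    for line in lines:
        if line.strip():
            rec = parse_line(line)
            sessions[rec.user_id].append(rec)
    for recs in sessions.values():
        recs.sort(key=lambda r: r.timestamp)
    return dict(sessions)


def consecutive_pairs(records):
    """Yield (earlier query, later query, gap in seconds) for neighbouring queries."""
    for a, b in zip(records, records[1:]):
        yield a.query, b.query, (b.timestamp - a.timestamp).total_seconds()


SAMPLE_LOG = """\
10.10.1.1 Shanghai 2008-02-25.09:00:00
10.10.1.1 Wrestle 2008-02-25.11:00:00
10.10.1.1 Bodou 2008-02-25.12:00:09
192.10.1.1 Wrestle 2008-02-23.13:00:00
192.10.1.1 bodou 2008-02-23.13:00:05
192.10.1.1 Nanjing 2008-02-23.15:00:05
"""

sessions = group_by_user(SAMPLE_LOG.splitlines())
pairs_second_user = list(consecutive_pairs(sessions["192.10.1.1"]))
```

In the sample, the second user's queries "Wrestle" and "bodou" are only five seconds apart, which is the kind of sequence signal a mining step could examine.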

Embodiment 2

[0087] This embodiment takes the user's input method log as an example. The input method log may include the coded character strings input by the user and the corresponding input candidates. In this embodiment, the sequence information of the user's input is used to mine the required character error correction relationships, as follows:

[0088] Check whether any coded strings are directly adjacent to each other. If so, determine that the adjacent coded strings are in a character error correction relationship, and that the last coded string, the one used to input the candidate, is the correct one.

[0089] For the user's input history, the input method log can record information of the form "user ID-coded string-input candidate"; the "user ID" is an optional field. When the user corrects an error manually, the input method log may record information of the form "user ID-encoded string-encoded string-inp...
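The rule in [0088] can be sketched as follows. This is an illustrative sketch only: the `ImeRecord` type, the `correction_pairs` function, and the example strings are hypothetical, and the on-disk log format is not assumed. Where a record holds more than two coded strings, mapping every earlier string to the last one is an assumption; the source only states that adjacent coded strings are related and that the last one is correct.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class ImeRecord:
    coded_strings: List[str]       # coded strings in the order they were typed
    candidate: str                 # the candidate finally committed
    user_id: Optional[str] = None  # optional field, per [0089]


def correction_pairs(record: ImeRecord) -> Iterator[Tuple[str, str]]:
    """Yield (wrong, correct) pairs from one input method log record."""
    if len(record.coded_strings) < 2:
        return  # a single coded string carries no manual correction
    correct = record.coded_strings[-1]  # the last coded string is taken as correct
    for earlier in record.coded_strings[:-1]:
        if earlier != correct:
            yield earlier, correct


# Made-up example: the user first typed "bodoi", then "bodou".
example = ImeRecord(coded_strings=["bodoi", "bodou"], candidate="搏斗", user_id="u1")
pairs = list(correction_pairs(example))
```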

Embodiment 3

[0093] This embodiment also uses the input method log as an example. The difference from Embodiment 2 is that the input method log of this embodiment additionally records the user's deletion operations, such as the backspace key, delete key, Esc key, replacement operations, and so on. A replacement operation can be regarded as a combination of a deletion operation and a re-input operation.

[0094] Under normal circumstances, the user does not use deletion operations during input; the typical reason for one is that the user is correcting an error manually. Therefore, when a deletion operation appears in the user's input record, it can be determined that the record contains manual error correction information. In this embodiment, the following analysis and mining steps can be used to obtain the character error correction relationship:

[0095] Find whether the user has applied a delete operation during the input process, and if so, determine that the en...
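Detecting deletion operations in an input record might look like the sketch below. Step [0095] is truncated in the source, so how the strings before and after the deletion are paired is not specified here; the replay logic, the key names in `DELETE_KEYS`, and the functions `has_delete_operation` and `replay` are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical key names for the deletion operations listed in [0093].
DELETE_KEYS = {"BACKSPACE", "DELETE", "ESC"}


def has_delete_operation(events: Iterable[str]) -> bool:
    """Return True if the input record contains any deletion operation."""
    return any(e in DELETE_KEYS for e in events)


def replay(events: Iterable[str]) -> Tuple[List[str], str]:
    """Replay key events.

    Returns the text as it stood just before each run of deletions began,
    together with the final text. Simplification: the cursor is always at
    the end, so DELETE behaves like BACKSPACE, and ESC clears the buffer.
    """
    buffer: List[str] = []
    snapshots: List[str] = []
    deleting = False
    for e in events:
        if e in DELETE_KEYS:
            if not deleting and buffer:
                snapshots.append("".join(buffer))
            deleting = True
            if e == "ESC":
                buffer.clear()
            elif buffer:
                buffer.pop()
        else:
            deleting = False
            buffer.append(e)
    return snapshots, "".join(buffer)


# Made-up record: the user types "bodoi", presses backspace, then types "u".
events = list("bodoi") + ["BACKSPACE", "u"]
snapshots, final_text = replay(events)
```

Under these assumptions, each snapshot paired with the final text is a candidate (wrong, correct) relationship.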


Abstract

The present invention provides a method and device for generating an error correction database, and a method and system for automatic error correction. The method for generating the error correction database includes the following steps: collecting log information that includes users' input history records; using the sequence information of the input history records to obtain character error correction relationships from the log information; and storing the character error correction relationships to obtain an error correction database. By recording and collecting logs that contain information about the user's input process, the invention mines users' manual error correction information and generates an error correction database, so that more accurate automatic error correction can be provided to more users. The invention can also be used for personalized automatic error correction for an individual user. Because the error correction information is obtained from logs of the user's actual input process, it matches user needs better and is more accurate than error correction derived from computer analysis and debugging.
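The storing and lookup steps in the abstract can be illustrated with a small in-memory table. This is a sketch under stated assumptions, not the patent's storage design: the class `ErrorCorrectionDB`, its methods, the frequency counting, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


class ErrorCorrectionDB:
    """Maps a wrong string to the corrections users applied, with counts."""

    def __init__(self):
        self._table = defaultdict(Counter)

    def add(self, wrong: str, correct: str) -> None:
        # Store one mined character error correction relationship.
        self._table[wrong][correct] += 1

    def correct(self, text: str, min_count: int = 1) -> str:
        # Return the most frequent known correction, or the input unchanged.
        candidates = self._table.get(text)
        if not candidates:
            return text
        best, count = candidates.most_common(1)[0]
        return best if count >= min_count else text


db = ErrorCorrectionDB()
db.add("bodoi", "bodou")
db.add("bodoi", "bodou")
db.add("bodoi", "baodou")
suggestion = db.correct("bodoi")
```

Counting how often each correction was observed is one simple way to prefer corrections that many users made; a separate instance per user would be one way to approximate the personalized correction the abstract mentions.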

Description

Technical field

[0001] The invention relates to the technical field of computer character processing, and in particular to a method and device for generating an error correction database for character data, and to an automatic error correction method and system.

Background

[0002] As Internet technology is applied more and more widely, much of people's daily work and entertainment is carried out online, and users increasingly need to input information through computers to complete human-computer interaction. However, users often input wrong information that needs to be corrected. For example, input errors can be caused by accidentally touching other keys on the keyboard, or by inaccurate memory (this applies to both Chinese and English character input).

[0003] Traditional research on spelling correction began as early as the middle of the last century, but it was mainly aimed at text processing...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 苏雪峰
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD