Method for acquiring restricted-word information, method for optimizing output, and input method system

A technique involving restricted-word information and input methods, applied to the input/output processes of data processing, special-purpose data processing applications, instruments, and similar fields. It addresses problems such as reduced input efficiency, inconvenience to the user during input, and an increased number of candidates the user must choose from, with the effect of optimizing the character output process and improving intelligence.

Active Publication Date: 2009-04-29
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

Including a word such as "quantity general" in the input method lexicon can certainly increase the intelligence of the input method (achieving better intelligent word formation). However, such a word rarely appears on its own, so offering it as a standalone candidate may inconvenience the user, increase the number of candidates the user must choose from, and reduce input efficiency.


Examples


Embodiment 1

[0052] Referring to Figure 1, Embodiment 1 of a method for obtaining restricted-word information is shown, which may specifically include:

[0053] Step 101, obtain a target word;

[0054] The target word can be obtained from the Internet, that is, extracted directly from an Internet corpus (for example, a collection of web pages or a collection of search keywords) through statistics and screening, or it can be taken from an existing lexicon. The source need not be limited, as long as a set of target words can be obtained; those skilled in the art can set the size of that set according to actual needs.

[0055] Preferably, the obtained set of target words may also undergo an optimization step that uses attributes of the target words to remove some of them and further narrow the scope. For example, words whose Internet word frequency or thesaurus word frequency is less than or equal to a preset threshold are removed from th...
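The frequency-based narrowing described in [0055] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `filter_target_words`, the toy token list, and the threshold value are all assumptions made for the example.

```python
from collections import Counter


def filter_target_words(word_freq, threshold):
    """Drop words whose frequency is less than or equal to the threshold.

    word_freq: mapping of word -> frequency (hypothetical corpus statistics).
    threshold: preset threshold; the value used below is an arbitrary assumption.
    """
    return {word: freq for word, freq in word_freq.items() if freq > threshold}


# Toy corpus standing in for an Internet corpus or an existing lexicon.
corpus_tokens = ["input", "method", "input", "lexicon", "input", "method", "rare"]
word_freq = Counter(corpus_tokens)

# Keep only words that occur more than once.
targets = filter_target_words(word_freq, threshold=1)
print(targets)
```

In practice the frequency would come from whichever source the target words were drawn from (Internet word frequency or thesaurus word frequency), and the threshold would be tuned to the size of that source.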

Example 1

[0062] The feature information is: the feature value, in the preset corpus, of the single character at the beginning of the target word when used as a word prefix, and the feature value, in the preset corpus, of the single character at the end of the target word when used as a word ending;

[0063] The preset condition for judging is: whether at least one of the above feature values falls within the preset range.

[0064] For example, the character "quantity" in "quantity will" rarely appears at the beginning of a word; if its word-initial frequency is less than or equal to a preset threshold, "quantity will" can be determined to be a restricted word.

[0065] Of course, for a target word consisting of three or more characters, it is also possible to determine, for a character at a given position in the word, its feature value at the same position in words of the preset corpus.
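The positional check of Example 1 might look like the sketch below. It assumes the feature value is a raw count of corpus words that begin (or end) with a given character; the patent text does not fix the exact statistic, so this choice, the function names, and the placeholder letters standing in for characters are all hypothetical.

```python
from collections import Counter


def positional_counts(corpus_words):
    """Count how often each character starts and ends a word in the corpus."""
    prefix_counts = Counter(w[0] for w in corpus_words if w)
    suffix_counts = Counter(w[-1] for w in corpus_words if w)
    return prefix_counts, suffix_counts


def is_restricted_by_position(word, prefix_counts, suffix_counts, threshold):
    """Return True if at least one positional feature value is <= threshold.

    The first character is checked as a word prefix and the last character
    as a word ending. Counter returns 0 for characters never seen in that
    position.
    """
    first_value = prefix_counts[word[0]]
    last_value = suffix_counts[word[-1]]
    return first_value <= threshold or last_value <= threshold


# Toy corpus: single letters stand in for individual characters.
corpus_words = ["ab", "ac", "ad", "ba", "ca"]
prefix_counts, suffix_counts = positional_counts(corpus_words)

# "d" never starts a corpus word, so "da" is flagged; "ab" is not.
print(is_restricted_by_position("da", prefix_counts, suffix_counts, threshold=0))
print(is_restricted_by_position("ab", prefix_counts, suffix_counts, threshold=0))
```

The extension in [0065] to words of three or more characters would add one counter per interior position, following the same pattern.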

Example 2

[0067] The feature information is: the feature values, in the preset corpus, of the linguistic collocation relationships of the single-character words and/or multi-character words contained in the target word;

[0068] The preset condition for judging is: whether at least one of the above feature values falls within the preset range.

[0069] The linguistic collocation relationship may include collocation parameters between words, between words and parts of speech, and between parts of speech. Those skilled in the art can select or combine these collocation relationships according to actual needs.

[0070] For example, in the word "is to play", "is" is followed by a verb. Such a collocation is linguistically rare, so its collocation feature value is found to be less than or equal to the preset threshold, and it can then be determined that "yes" play"...
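One way to picture the word-to-part-of-speech collocation check of Example 2 is sketched below. It assumes a part-of-speech-tagged corpus and uses a raw count of (word, following part of speech) pairs as the collocation feature value; the tag names, the sample sentences, and both function names are illustrative assumptions rather than anything specified in the patent.

```python
from collections import Counter


def collocation_counts(tagged_sentences):
    """Count (word, part of speech of the next word) pairs in a tagged corpus."""
    counts = Counter()
    for sentence in tagged_sentences:
        for (word, _), (_, next_pos) in zip(sentence, sentence[1:]):
            counts[(word, next_pos)] += 1
    return counts


def is_restricted_by_collocation(parts, counts, threshold):
    """Return True if any adjacent pair inside the target word is rare.

    parts: the target word segmented into (word, part of speech) pairs.
    """
    return any(
        counts[(word, next_pos)] <= threshold
        for (word, _), (_, next_pos) in zip(parts, parts[1:])
    )


# Hypothetical tagged corpus: "is" is always followed by an adjective here.
tagged_sentences = [
    [("is", "v"), ("good", "adj")],
    [("is", "v"), ("red", "adj")],
    [("he", "pron"), ("is", "v"), ("tall", "adj")],
]
counts = collocation_counts(tagged_sentences)

# "is" followed by a verb never occurs, so this target word is flagged.
print(is_restricted_by_collocation([("is", "v"), ("play", "v")], counts, threshold=0))
print(is_restricted_by_collocation([("is", "v"), ("good", "adj")], counts, threshold=0))
```

Word-to-word and part-of-speech-to-part-of-speech parameters would use the same counting scheme with a different key for each pair.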

Abstract

The invention discloses a method for acquiring restricted-word information, comprising the steps of: acquiring a target word; acquiring the feature information corresponding to the target word; and judging whether the feature information or its corresponding numerical result meets a preset condition and, if so, determining the target word to be a restricted word and recording the related restriction information, which is used to restrict how the word is arranged when it is output alone. In an embodiment of the invention, the input method lexicon is preset to include restricted-word information; when the user inputs text, it is judged whether an output candidate meets the preset condition for applying the restriction information, and based on the result it is decided whether the candidate carrying restricted-word information is displayed and output. The user thereby obtains more effective output without additional operations, the character output process of the input method system is greatly optimized, and the intelligence of the input method system is improved.
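The output-side decision described in the abstract can be sketched roughly as follows. This assumes the "preset condition" is simply that a restricted word is about to be offered as a standalone candidate, and that such candidates are hidden rather than demoted; both points are assumptions, as are the `Candidate` type and the `visible_candidates` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    standalone: bool  # True if the candidate is offered as a word by itself


def visible_candidates(candidates, restricted_words):
    """Hide restricted words when they would be output alone.

    Restricted words that appear as part of a longer candidate are kept,
    so they can still contribute to intelligent word formation.
    """
    return [
        c for c in candidates
        if not (c.standalone and c.text in restricted_words)
    ]


restricted_words = {"quantity will"}
candidates = [
    Candidate("quantity will", standalone=True),
    Candidate("quantity will increase", standalone=False),
    Candidate("quality", standalone=True),
]

shown = visible_candidates(candidates, restricted_words)
print([c.text for c in shown])
```

A variant that moves restricted standalone candidates to the end of the list instead of dropping them would fit the same structure.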

Description

Technical field

[0001] The invention relates to the field of computer character input data processing, in particular to a method and device for acquiring restricted word information, a method for updating an input method thesaurus, a method for optimizing output and an input method system.

Background technique

[0002] With the popularization and development of computer technology and Internet technology, users with different professional fields, different interests and usage habits have higher and higher requirements for the intelligence of the input method system.

[0003] In the prior art, there has been a technique of obtaining an input method thesaurus by using a complex Internet corpus to count and screen. The obtained Internet thesaurus can contain many new words that cannot be obtained through previous closed corpus information (such as modern Chinese dictionaries, news, newspapers, etc.), thereby greatly improving people's input efficiency. However, due to the comp...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F3/023
CPC: G06F17/276, G06F3/018, G06F40/274
Inventor: 吕杰勇
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD