Method and apparatus for generating code frequency and inputting character codes, and text input apparatus
A character encoding and character input technology, applied in the field of input methods, that addresses the problems of frequency adjustment and sorting work, improving input efficiency and reducing the number of code selections.
Examples
Embodiment 1
[0046] Referring to FIG. 1, Embodiment 1 of the method for generating code frequency of the present invention comprises:
[0047] 101. Obtain the usage frequency of character codes. The method of the present invention can be applied to both single characters and phrases.
[0048] Before obtaining the usage frequency of the character codes, the method may further include:
[0049] Obtaining the usage frequency of a word, and using that frequency as the usage frequency of the character code corresponding to the word.
[0050] Obtaining the usage frequency of words specifically includes: collecting statistics on a targeted corpus to obtain the usage frequency of words.
[0051] Specifically, the usage frequency of words may be obtained as follows:
[0052] Collect statistics on a targeted corpus to obtain the usage frequency of words. The targeted corpus may include a forum corpus, a user chat corpus, or a web page corpus.
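The statistics described in paragraphs [0049] to [0052] can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation: the `CODE_TABLE` mapping, the pinyin-style codes, the pre-segmented corpus, and both function names are hypothetical and chosen for the example.

```python
from collections import Counter

# Hypothetical word -> character-code table (pinyin-style codes, illustration only).
CODE_TABLE = {"你好": "nihao", "拟好": "nihao", "世界": "shijie"}

def word_frequencies(corpus_sentences):
    """Count how often each known word occurs in pre-segmented sentences."""
    counts = Counter()
    for words in corpus_sentences:
        counts.update(w for w in words if w in CODE_TABLE)
    return counts

def code_frequencies(word_counts):
    """Use each word's usage frequency as the frequency of its character code."""
    by_code = {}
    for word, freq in word_counts.items():
        by_code.setdefault(CODE_TABLE[word], {})[word] = freq
    return by_code

# A tiny stand-in for a targeted corpus (e.g. forum or chat text), already segmented.
corpus = [["你好", "世界"], ["你好"], ["拟好", "世界"]]
freqs = code_frequencies(word_frequencies(corpus))
```

Grouping the frequencies by code makes it straightforward to rank the candidate words that share one code, which is where a reduction in code selections would presumably come from.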
[0053] 102....
Embodiment 2
[0095] Referring to FIG. 3, Embodiment 2 of the method for inputting text of the present invention comprises:
[0096] 301. Chinese corpus collection: use search engine technology to generate an Internet thesaurus for the input method based on the content of Internet web pages; the thesaurus can cover all types of popular words and forms a Chinese corpus.
[0097] The collection of the Chinese corpus usually includes the following steps:
[0098] First, crawl Chinese web pages (for example, 4 billion) from the Internet, including network news, forums, blogs, chat rooms, and other network content;
[0099] Second, assign a weight value to each crawled web page. For example, a lower weight value is assigned to duplicate web pages, spam web pages, pornographic web pages, and the like, and web pages with lower weight values are removed, so as to obtain a high-quality set of web pages for analysis (e.g., 1 billion). Alternatively, reduce the influence of some web pages on word frequency statistics thr...
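The weighting and removal step in paragraph [0099] might look like the following minimal sketch. The page records, the weight values, the `min_weight` threshold, and the function names are all assumptions made for illustration; the source does not say how weights are computed or where the cutoff lies.

```python
from collections import Counter

def filter_pages(pages, min_weight):
    """Keep only pages whose weight reaches the (hypothetical) threshold."""
    return [p for p in pages if p["weight"] >= min_weight]

def count_words(pages):
    """Count word occurrences across the retained, pre-segmented pages."""
    counts = Counter()
    for page in pages:
        counts.update(page["words"])
    return counts

# Hypothetical crawled pages with illustrative weights.
pages = [
    {"id": "a", "weight": 1.0, "words": ["输入法", "词频"]},
    {"id": "b", "weight": 0.1, "words": ["垃圾", "垃圾"]},  # e.g. a spam page
    {"id": "c", "weight": 0.8, "words": ["输入法"]},
]
kept = filter_pages(pages, min_weight=0.5)
counts = count_words(kept)
```

Filtering before counting keeps low-quality pages from inflating the frequency of words that appear mostly in spam or duplicated content.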