Method and apparatus for learning Chinese new words

The method applies to the field of learning new Chinese words. It addresses the problems of excessive subjective factors, heavy consumption of manpower and material resources, and limited application domains, and it reduces the difficulty of learning and of computation while achieving high accuracy.

Inactive Publication Date: 2005-06-22
PEKING UNIV
0 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0004] 1. New words are learned with low efficiency and accuracy
Learning new words directly from full articles requires complex lexical and syntactic analysis, so the learning algorithms have high complexity. Faced with massive amounts of web page information, these methods are sometimes unable to cope.
Moreover, because complex lexical and syntactic analysis is required and Chinese contains a large number of ambiguities, the accuracy of new word learning is also relatively low;
[0005] 2. New word learning is limited by domain
When the application environment of the new words has no clear domain characteristics, it is difficult to learn new words from a professional corpus, so the applicability of such domain-oriented learning methods is very limited;
[0006] 3. Too many subjective factors
Collecting and organizing a professional corpus usually requires considerable manpower and material resources and is time-consuming.
The resulting corpus samples are inevitably biased by human subjective factors, which ultimately affects the accuracy of new word learning.


Examples


Embodiment Construction

[0047] As shown in Figure 1, the present invention includes:

[0048] The input module is used for inputting the search engine log;

[0049] The word segmentation processing module deletes query words that consist of a single Chinese character or that contain non-Chinese components; counts the frequency of each query word in the search engine log, sets a threshold, and directly deletes query words whose frequency is below the threshold; and segments the remaining query words against the words in the existing vocabulary, retaining the query frequency of each part;

[0050] The combined extraction module processes the segmented query words as follows:

[0051] For a 2-character or 3-character query word, if the segmentation result is an existing vocabulary entry, the query word is deleted directly; otherwise, the query word is passed to the filter module as a new word;

[0052] For a 4-character query word, if the segmentation result is an existing vocabulary entry, it ...
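
As a rough illustration of paragraphs [0049] to [0051], the following Python sketch filters and counts log queries and then applies the 2- or 3-character rule. It is not the patent's implementation: the function names, the `min_count` parameter, the Unicode range used to detect Chinese characters, and the forward-maximum-matching segmenter are all assumptions made for illustration. Reading "the segmentation result is an existing vocabulary entry" as "the whole query is already a known word" is also an interpretation. The 4-character rule in [0052] is truncated in the source, so it is not sketched.

```python
import re
from collections import Counter

# Assumption: "Chinese" means the CJK Unified Ideographs block U+4E00..U+9FFF.
CHINESE_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def preprocess_queries(queries, min_count=2):
    """Drop single-character and non-Chinese queries, count the rest,
    and keep only queries seen at least `min_count` times (hypothetical threshold)."""
    counts = Counter(
        q for q in queries
        if len(q) > 1 and CHINESE_ONLY.fullmatch(q)
    )
    return {q: n for q, n in counts.items() if n >= min_count}


def segment(query, vocabulary, max_len=4):
    """Forward maximum matching against `vocabulary`.
    This segmenter is an assumed stand-in; the source does not name one."""
    parts, i = [], 0
    while i < len(query):
        # Try the longest piece first; a single character always matches.
        for size in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + size]
            if size == 1 or piece in vocabulary:
                parts.append(piece)
                i += size
                break
    return parts


def short_query_candidate(query, vocabulary):
    """Apply the 2- or 3-character rule: return None if the query is already
    a known word, otherwise return it as a candidate new word."""
    if len(query) not in (2, 3):
        raise ValueError("rule applies only to 2- or 3-character queries")
    if segment(query, vocabulary) == [query]:
        return None
    return query
```

Candidates returned by `short_query_candidate` would then be passed, with their query frequencies, to the filter module.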


Abstract

This invention discloses a method and a device for learning new Chinese words. A processing module processes the search engine log supplied through an input module: it deletes query words that are a single Chinese character or that contain non-Chinese components, sorts the remaining query words by query frequency, sets a threshold, and deletes the query words that fall below it. The word segmentation module then segments the remaining query words against the existing vocabulary when they have 4 or fewer characters. If a query word has more than 4 characters, 4 characters are taken at a time, advancing one character per step from the beginning until the last character is reached, and each group is processed by the 4-character method. The filter module ranks the resulting new words by frequency of occurrence, sets a new threshold, and deletes the new words below it; the output module outputs the rest.
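
The sliding 4-character split and the final frequency filter described in the abstract might look like the following Python sketch. The function names and the `min_count` parameter are hypothetical, and the one-character step between groups is a reading of the translated abstract rather than a confirmed detail.

```python
def four_char_windows(query):
    """Split a query longer than 4 characters into overlapping
    4-character groups, advancing one character per step.
    Queries of 4 or fewer characters are returned unchanged."""
    if len(query) <= 4:
        return [query]
    return [query[i:i + 4] for i in range(len(query) - 3)]


def filter_new_words(candidate_counts, min_count):
    """Rank candidate new words by frequency (highest first) and
    drop those below `min_count` (hypothetical threshold)."""
    ranked = sorted(candidate_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(word, n) for word, n in ranked if n >= min_count]
```

For a 6-character query, `four_char_windows` yields three groups, each of which would be handled by the 4-character rule before the surviving candidates reach `filter_new_words`.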

Description

Technical field:
[0001] The invention relates to a method and a device for learning new Chinese words.

Background art:
[0002] Word segmentation is the premise and basis of effective Chinese information processing. Word segmentation technology has been widely used in search engines, information retrieval and other fields. The size of the dictionary is one of the most important factors affecting word segmentation accuracy. The dictionaries used for word segmentation are usually incomplete: neither a general-purpose dictionary nor a domain-oriented professional dictionary can include every entry. Moreover, Chinese vocabulary is a dynamic, open collection, and a large number of new words will continue to emerge. Especially in the Web environment, fashionable new vocabulary reflecting the characteristics of the times appears from time to time. The updating speed of the manually maintained dictionary often lags behind the gene...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 龚笔宏, 冯是聪
Owner: PEKING UNIV