Chinese new word and expression detecting method and its detecting system

A detection method and detection system, applicable to special-purpose data processing, instruments, electrical digital data processing, and related fields. It addresses the low timeliness, incomplete search coverage, and inefficiency of existing new-word detection, and achieves high timeliness.

Active Publication Date: 2005-07-20
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0016] The object of the present invention is to overcome the low efficiency of existing new-word detection methods that rely on manual retrieval, and the technology based on automatic new-word extraction of co...

Examples

Embodiment Construction

[0061] The method of the present invention for detecting new Chinese words is described below through an embodiment, with reference to the accompanying drawings. As shown in Figure 1, the steps are:

[0062] 1. Web page collection. The shareware web page collection tool Offline Explorer is used to collect the web pages of designated news websites and store them on the hard disk according to the website structure. Other collection software can also be used, as long as it can complete the task of collecting web pages.
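As a rough illustration of storing pages "according to the website structure", the sketch below maps a page URL to a local path that mirrors the site's host and directory layout. It is not Offline Explorer's actual behaviour; the function name `local_path_for`, the `mirror` root directory, and the `index.html` convention are assumptions made for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = "mirror") -> PurePosixPath:
    """Map a page URL to a path under `root` that mirrors the site layout.

    Hypothetical helper: the layout <root>/<host>/<url path> is an assumption.
    """
    parts = urlsplit(url)
    path = parts.path
    # Treat directory-style URLs as an index page inside that directory.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return PurePosixPath(root) / parts.netloc / path.lstrip("/")


if __name__ == "__main__":
    print(local_path_for("http://news.example.com/2005/07/story.html"))
```

Keeping the host name as the first directory level lets pages from several news sites share one root without colliding.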

[0063] 2. Web page processing. As shown in Figure 2, this has four steps:

[0064] 1) Extract the text content and time information of the web page. The main function of this step is to extract the body text and its time from the web page. Each web page is first processed with template-based extraction of content and time information. When there is a template for the web page or the existing template does not match and the template-base...
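The template-based extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Template` structure, the regular expressions, the HTML markup they match, and the time format are all hypothetical. Because the source text is cut off before describing what happens when a template fails, the sketch simply returns `None` in that case.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Template(NamedTuple):
    """Hypothetical per-site template: where to find the time and the body."""
    time_pattern: re.Pattern
    time_format: str
    body_pattern: re.Pattern


class Page(NamedTuple):
    published: datetime
    text: str


# Assumed markup for an imaginary news site; real sites need their own templates.
EXAMPLE_TEMPLATE = Template(
    time_pattern=re.compile(r'<span class="time">(.*?)</span>', re.S),
    time_format="%Y-%m-%d %H:%M",
    body_pattern=re.compile(r'<div id="content">(.*?)</div>', re.S),
)


def extract(html: str, template: Template) -> Optional[Page]:
    """Return the page's time and plain text, or None if the template does not match."""
    time_match = template.time_pattern.search(html)
    body_match = template.body_pattern.search(html)
    if time_match is None or body_match is None:
        return None
    try:
        published = datetime.strptime(time_match.group(1).strip(), template.time_format)
    except ValueError:
        return None  # time string present but not in the expected format
    text = re.sub(r"<[^>]+>", "", body_match.group(1))  # drop inline tags
    text = re.sub(r"\s+", " ", text).strip()            # normalise whitespace
    return Page(published, text)
```

Returning `None` on a mismatch keeps the decision about any fallback extraction with the caller.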

Abstract

The invention relates to a method and system for detecting new Chinese words. It is an Internet-based method for automatically detecting new Chinese words, comprising web page collection, web page information processing, and new word searching, and it makes full use of the time information of the web pages collected from the Internet. The time information and the text content are extracted from each web page and kept separate; repeated strings are then searched for on that basis and saved into an original database. The original database is divided into a database for before a set time and a database for after it. Comparing the two databases yields a list of candidate new words, and the final result is then confirmed. The invention can find new multi-character words and new phrases built from multiple characters, with no limit on length or structure, and garbage strings are filtered out by word-formation rules. It offers high timeliness.
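The core comparison described in the abstract can be sketched roughly as below: collect repeated strings from dated texts, split them at a set time, and keep the strings that repeat after that time but not before it. This is a simplified illustration, not the patented procedure. The function names, the brute-force substring counting, the `max_len` cap (the abstract states no length limit), and the choice to count repetitions separately on each side of the cutoff are all assumptions; filtering by word-formation rules and the final confirmation step are not shown.

```python
from collections import Counter
from datetime import date
from typing import Sequence, Set, Tuple


def repeated_strings(texts: Sequence[str], min_len: int = 2,
                     max_len: int = 6, min_count: int = 2) -> Set[str]:
    """Return every substring of length min_len..max_len seen at least min_count times."""
    counts: Counter = Counter()
    for text in texts:
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return {s for s, c in counts.items() if c >= min_count}


def candidate_new_words(pages: Sequence[Tuple[date, str]], cutoff: date) -> Set[str]:
    """Strings that repeat on or after `cutoff` but do not repeat before it."""
    before = repeated_strings([text for day, text in pages if day < cutoff])
    after = repeated_strings([text for day, text in pages if day >= cutoff])
    return after - before


if __name__ == "__main__":
    pages = [
        (date(2005, 1, 1), "今天天气很好"),
        (date(2005, 1, 2), "今天天气不错"),
        (date(2005, 6, 1), "超女比赛今天开始"),
        (date(2005, 6, 2), "超女比赛今天结束"),
    ]
    print(sorted(candidate_new_words(pages, cutoff=date(2005, 3, 1))))
```

The raw candidate set still contains fragments such as substrings of a genuine new word, which is why the abstract follows the comparison with rule-based filtering and confirmation.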

Description

Technical field

[0001] The invention relates to a method for detecting new words, and in particular to a method for detecting new Chinese words and a detection system for the same.

Technical background

[0002] The continuous emergence of new words in natural language is an objective law. With the rapid development of the economy and society and increasingly frequent foreign exchanges, and especially with the widespread use of the Internet, this phenomenon has become more pronounced. According to research statistics, over the past 20 years more than 800 new words have been produced in China each year on average.

[0003] However, for a language such as Chinese, which has no clear boundaries between words, identifying new words is more difficult. In general, new Chinese words can be divided into the following categories according to their sources:

[0004] 1. Named entities: personal names, place names, transliterated names, product names, company names, institution names, etc.;

[0005] 2...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventors: 邹纲 (Zou Gang), 刘群 (Liu Qun)
Owner: HUAWEI TECH CO LTD