Keyword extension method and system and classification corpus labeling method and system

A keyword expansion technology, applied in the fields of instruments, computing and electric digital data processing, that addresses the heavy workload of building a thesaurus, the subjectivity of existing keyword expansion methods and the low accuracy of keyword expansion. It speeds up processing and makes keyword expansion convenient and highly accurate.

Status: Inactive. Publication date: 2015-04-15
PEKING UNIV FOUNDER GRP CO LTD +2
Cites: 5 · Cited by: 24

AI Technical Summary

Problems solved by technology

[0006] The technical problem solved by the present invention is that prior-art keyword expansion methods are subjective, require a heavy workload to build a thesaurus, and expand keywords with low accuracy. To address this, an objective, simple and accurate keyword expansion method is proposed.

Examples

Embodiment 1

[0054] This embodiment provides a keyword expansion method, whose flow chart is shown in Figure 1. It comprises the following steps:

[0055] (1) Perform retrieval with the predetermined initial keywords to obtain retrieved keywords. In this example, the article database is searched with the initial keyword to obtain highly relevant articles; these articles are then segmented into words, and the segmentation results are taken as the words obtained by the search. The occurrences of each retrieved word are counted, and words that occur more than a preset threshold of 50 times are taken as the retrieved keywords (the threshold is set according to the size of the article database and how commonly the retrieved keywords are used). Keywords obtained this way carry statistical significance, which makes it easier to find words related to each meaning of the initial keyword.
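Step (1) can be sketched as a search, segment, count and threshold pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: `search_articles`, `segment` and `retrieve_keywords` are hypothetical names, "relevance" is reduced to "the article contains the keyword", and whitespace splitting stands in for a real (e.g. Chinese) word segmenter.

```python
from collections import Counter


def segment(article):
    """Stand-in for word segmentation (assumption): split on whitespace."""
    return article.split()


def search_articles(keyword, articles):
    """Hypothetical relevance search: keep articles whose tokens include the keyword."""
    return [a for a in articles if keyword in segment(a)]


def retrieve_keywords(initial_keyword, articles, threshold=50):
    """Count word occurrences across the matching articles and keep the words
    that occur more than `threshold` times (the embodiment uses 50)."""
    counts = Counter()
    for article in search_articles(initial_keyword, articles):
        counts.update(segment(article))
    return {word for word, n in counts.items() if n > threshold}


articles = ["apple fruit red", "apple fruit sweet", "car engine"]
print(sorted(retrieve_keywords("apple", articles, threshold=1)))  # ['apple', 'fruit']
```

The toy corpus uses a threshold of 1 only because it is tiny; as the text notes, the real threshold depends on the database size and on how common the words are.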

[0056] (2) Use the keywords obtained by...

Embodiment 2

[0060] (1) Perform retrieval with the predetermined initial keywords to obtain retrieved keywords.

[0061] (2) Use the retrieved keywords as the basis for the next round of retrieval, and perform cyclic retrieval through keyword iteration.
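The cyclic retrieval in step (2), together with the stopping rule stated in the Abstract (stop when the keywords retrieved in two consecutive rounds differ by no more than a certain range), can be sketched as a loop. The source does not define the "error" between two rounds, so this sketch assumes Jaccard distance between the keyword sets. `retrieve`, `jaccard_distance`, `expand`, `tolerance` and `max_rounds` are all hypothetical names, and one round of retrieval is simplified to "count the words of every article that shares a token with the current keywords".

```python
from collections import Counter


def retrieve(keywords, articles, threshold):
    """One retrieval round (hypothetical): count the words of every article that
    shares at least one token with the current keyword set, and keep the words
    occurring more than `threshold` times."""
    counts = Counter()
    for article in articles:
        tokens = article.split()  # stand-in for word segmentation
        if keywords & set(tokens):
            counts.update(tokens)
    return {w for w, n in counts.items() if n > threshold}


def jaccard_distance(a, b):
    """Assumed error measure between two keyword sets (0 = identical)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def expand(initial_keyword, articles, threshold=0, tolerance=0.0, max_rounds=10):
    """Feed each round's keywords into the next round until two consecutive
    keyword sets differ by no more than `tolerance`."""
    previous = {initial_keyword}
    for _ in range(max_rounds):
        current = retrieve(previous, articles, threshold)
        if jaccard_distance(previous, current) <= tolerance:
            return current
        previous = current
    return previous  # round limit reached: return the last retrieved set


articles = ["apple fruit", "fruit juice", "juice drink", "car engine"]
print(sorted(expand("apple", articles)))  # ['apple', 'drink', 'fruit', 'juice']
```

Each round reaches one "hop" further through co-occurrence (apple, then fruit, then juice, then drink) until the set stops changing. The `max_rounds` cap is an added safeguard in case the sets never settle within the tolerance.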

[0062] In the retrieval process of (1) and (2) above, the retrieval method is as follows:

[0063] Search the article database with the preset keywords to obtain highly relevant articles, perform word segmentation on these articles, and remove stop words from the segmentation results. Then obtain the co-occurrence words that appear together with the preset keywords; these can be found with a sliding-window method, and they are used as the words obtained by retrieval. Because the retrieved words pass through word segmentation, stop-word removal and co-occurrence filtering in turn, this step-by-step filtering removes unnecessary redundant words to ob...
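The stop-word removal and sliding-window co-occurrence described above can be sketched as follows. This is an illustration only: the window size, the stop-word list and the function name `cooccurring_words` are assumptions, since the source does not specify them. The input is taken to be an already segmented token list, and keywords themselves are left out of the result.

```python
def cooccurring_words(tokens, keywords, stop_words, window=2):
    """Drop stop words, then slide a window of `window` tokens on each side of
    every keyword occurrence and collect the non-keyword words inside it."""
    filtered = [t for t in tokens if t not in stop_words]
    found = set()
    for i, token in enumerate(filtered):
        if token in keywords:
            lo = max(0, i - window)
            hi = min(len(filtered), i + window + 1)
            found.update(w for w in filtered[lo:hi] if w not in keywords)
    return found


tokens = "the apple is a red fruit and the car is fast".split()
stop_words = {"the", "is", "a", "and"}
print(sorted(cooccurring_words(tokens, {"apple"}, stop_words)))  # ['fruit', 'red']
```

Removing stop words before sliding the window matters here: otherwise "is" and "a" would occupy the window positions next to "apple" and push out the content words.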

Embodiment 3

[0070] A keyword expansion system, comprising:

[0071] (1) Acquisition unit: performs retrieval with the predetermined initial keywords to obtain retrieved keywords. In the keyword expansion system, the acquisition unit further includes a retrieved-keyword module, which counts the occurrences of the words obtained by retrieval and takes words whose frequency exceeds a preset threshold as the retrieved keywords.

[0072] In an alternative implementation, the acquisition unit includes a retrieval-and-ranking module for obtaining keywords: it counts the words obtained by retrieval and the occurrences of each word, sorts the words in descending order of occurrence count, and takes a certain top proportion of the ranked words as the retrieved keywords.
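The ranking alternative in [0072] replaces the fixed threshold with a top-proportion cut. A minimal sketch follows. It assumes that "a certain proportion" is taken over the number of distinct retrieved words and that at least one word is kept whenever any word was retrieved; both choices, and the name `top_proportion_keywords`, are hypothetical.

```python
from collections import Counter


def top_proportion_keywords(words, proportion=0.1):
    """Rank the distinct retrieved words by occurrence count (descending) and keep
    the leading `proportion` of them, at least one word if any were retrieved."""
    counts = Counter(words)
    k = max(1, int(len(counts) * proportion))
    # most_common keeps first-encountered order among equal counts
    return [w for w, _ in counts.most_common(k)]


words = "a b a c a b d e f g".split()
print(top_proportion_keywords(words, proportion=0.3))  # ['a', 'b']
```

A proportion adapts automatically to how many words a search returns, whereas the fixed threshold of [0055] has to be retuned for the database size.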

[0073] (2) Cyclic retrieval unit: uses the retrieved keywords as the basis for the next retrieval and performs cyclic retrieval through keyword i...

Abstract

Provided are a method and system for keyword expansion. The method performs a search with an initial keyword, uses the retrieved keywords as the basis for the next search, and repeats the search through keyword iteration; when the error between the keywords retrieved in two consecutive searches falls within a certain range, the retrieved keywords serve as the expansion keywords of the initial keyword. In this way, the various expressions of the initial keyword and the multiple implied meanings of the word are obtained, and the initial keyword is expanded effectively and reasonably, removing the prior-art need to build a corpus manually; the method is a convenient and highly accurate way to expand keywords. Also provided are a method and system for automatically labeling a classification corpus. The method determines one or more initial core keywords for each class, obtains the expanded keywords of each class by expanding its initial core keywords, performs a search with the expanded keywords of each class, and selects and annotates a class corpus from the results.
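The classification-corpus labeling method in the Abstract can be sketched as: expand each class's core keywords, search with the expanded set, and label what matches. This is a sketch under assumptions. `label_corpus` is a hypothetical name, any keyword-expansion function can be passed in as `expand`, matching is plain token overlap, and an article that matches several classes simply receives every matching label, because the source does not say how such overlaps are resolved.

```python
from typing import Callable, Dict, Iterable, List, Set


def label_corpus(
    class_core_keywords: Dict[str, Iterable[str]],
    articles: List[str],
    expand: Callable[[str], Set[str]],
) -> Dict[int, Set[str]]:
    """Expand each class's core keywords, search the articles with the expanded
    keywords, and annotate every matching article with that class label."""
    labels: Dict[int, Set[str]] = {}
    for label, core in class_core_keywords.items():
        expanded: Set[str] = set()
        for keyword in core:
            expanded |= expand(keyword)
        for idx, article in enumerate(articles):
            if expanded & set(article.split()):  # whitespace tokens as a stand-in
                labels.setdefault(idx, set()).add(label)
    return labels


synonyms = {"apple": {"apple", "pear"}, "car": {"car", "truck"}}
articles = ["pear pie", "truck road", "apple truck", "blue sky"]
print(label_corpus({"fruit": ["apple"], "vehicle": ["car"]}, articles, lambda kw: synonyms[kw]))
```

Here a fixed synonym table stands in for the expansion step. In the described system it would be the iterative keyword expansion, which is what lets "pear pie" be labeled as fruit even though it never mentions the core keyword "apple".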

Description

Technical field
[0001] The invention relates to a keyword expansion method and an automatic labeling method for classification corpora, and belongs to the technical field of electric digital data processing.
Background
[0002] A keyword is generally a concentrated expression of a class of related terms. To make what it expresses more comprehensive, a keyword generally carries several related meanings. To improve the hit rate of keyword retrieval, a predetermined initial keyword is generally expanded to obtain the various related words corresponding to it, and all of them are searched at the same time. The prior art provides a keyword expansion method: first, a database containing keywords, vocabulary and identification codes is established; each keyword is then mapped to at least one vocabulary item, and related keywords are mapped to an identification code; determine the identification code corresp...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/3322, G06F16/3338, G06F16/24573, G06F16/2455
Inventors: 叶茂, 汤帜, 徐剑波, 雷超, 金立峰
Owner: PEKING UNIV FOUNDER GRP CO LTD