Template-based method and system for mining lexical entity relevance

A lexical relevance mining technology, applied in special data processing applications, instruments, and electronic digital data processing, that addresses the problem that existing document mining does not reach the level of sentences or phrases.

Inactive Publication Date: 2010-09-15
吴毓杰 +1

AI Technical Summary

Problems solved by technology

Although these two methods are currently the mainstream approaches in document mining, they still operate at the level of whole articles and have not been extended down to the level of sentences or phrases.



Embodiment Construction

[0023] Specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0024] Based on the part-of-speech patterns, stop words, and named entities predefined by the user, and on the correlation of each pattern, the present invention uses sequential pattern mining to mine and present the association patterns that satisfy statistical independence or correlation criteria.
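The idea in this paragraph can be illustrated with a minimal sketch. It is not the patented procedure: it assumes sentences arrive already tagged as `(word, tag)` pairs, treats a single tag set (`ENTITY_TAGS`) as the user-defined part-of-speech template, mines only length-2 sequential patterns (entity A occurs before entity B in the same sentence), and uses lift (observed co-occurrence divided by the co-occurrence expected under independence) as a stand-in correlation test. All names, tags, and thresholds below are hypothetical.

```python
from collections import Counter

# Assumed user-defined parameters (hypothetical values for illustration).
STOP_WORDS = {"the", "a", "of"}
ENTITY_TAGS = {"NNP"}  # POS template: proper nouns count as named entities


def entity_sequence(sentence):
    """Keep, in order, the words whose tag matches the template and that are not stop words."""
    return [w for w, tag in sentence
            if tag in ENTITY_TAGS and w.lower() not in STOP_WORDS]


def mine_ordered_pairs(sentences, min_support=2, min_lift=1.0):
    """Mine ordered entity pairs (A before B) and score them by lift.

    Returns (a, b, support, lift) tuples sorted by descending lift.
    """
    n = len(sentences)
    single = Counter()  # number of sentences containing each entity
    pairs = Counter()   # number of sentences where a precedes b
    for sentence in sentences:
        seq = entity_sequence(sentence)
        single.update(set(seq))
        seen = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                if a != b:
                    seen.add((a, b))
        pairs.update(seen)  # count each pattern once per sentence

    results = []
    for (a, b), support in pairs.items():
        if support < min_support:
            continue
        # lift > 1 means the pair co-occurs more often than independence predicts
        lift = (support * n) / (single[a] * single[b])
        if lift > min_lift:
            results.append((a, b, support, lift))
    return sorted(results, key=lambda r: -r[3])
```

A full implementation would replace the pair counting with a general sequential pattern miner and the lift test with whichever correlation coefficient the user selects; the sketch only shows how the template, the stop words, and the statistical filter fit together.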

[0025] The framework of the present invention emphasizes multilingual applicability. A language-specific vocabulary extraction and part-of-speech tagging method is therefore needed to segment and tag the text before patterns that satisfy the named entity rules are mined from a large number of files. In the embodiment, this component can accordingly use different language families or different methods of annotating vocabulary to find the vocabulary and part of sp...
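One way to picture the swappable tagging component described above is a small registry keyed by language code. The sketch below is an assumption for illustration only: the registry, the function names, and the toy English tagger are hypothetical and do not come from the patent; a real system would register a proper tagger for each language.

```python
from typing import Callable, Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]   # [(word, pos_tag), ...]
Tagger = Callable[[str], TaggedSentence]

_TAGGERS: Dict[str, Tagger] = {}         # hypothetical language -> tagger registry


def register_tagger(lang: str, tagger: Tagger) -> None:
    """Plug in a language-specific extraction and tagging function."""
    _TAGGERS[lang] = tagger


def tag(text: str, lang: str) -> TaggedSentence:
    """Dispatch to the tagger registered for the given language."""
    try:
        tagger = _TAGGERS[lang]
    except KeyError:
        raise ValueError(f"no tagger registered for language {lang!r}") from None
    return tagger(text)


def toy_english_tagger(text: str) -> TaggedSentence:
    """Toy stand-in: capitalised tokens become NNP, everything else X."""
    return [(w, "NNP" if w[:1].isupper() else "X") for w in text.split()]


register_tagger("en", toy_english_tagger)
```

Because the mining stage only consumes `(word, tag)` pairs, it stays unchanged when a tagger for another language is registered.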


Abstract

The invention provides a template-based method and system for mining the correlation of vocabulary entities. Based on the part-of-speech patterns, stop words, and named entities predefined by a user, and on the correlation of each pattern, the invention uses a sequential pattern mining method to mine and present the association patterns that satisfy statistical independence or correlation criteria. The detailed rules of the process are parameterized, and other file information such as time, date, and source can be defined and added according to the user's preference. The user can thus obtain highly relevant named entities or vocabulary relation patterns from a designated file set within a limited time.

Description

Technical field

[0001] The invention relates to a text mining method and system in the field of information processing and information retrieval, and in particular to a template-based method and system for mining lexical entity relevance.

Background technique

[0002] The present invention operates on a large volume of documents: it takes the results of natural-language word extraction and part-of-speech tagging, applies predefined named entity rules, and then performs correlation mining with data mining rules. The invention draws on several fields of knowledge: (1) natural language processing: part-of-speech tagging, post term processing, and named entity rule research (named entity recognition); (2) data mining: sequential pattern mining and association mining; (3) domain knowledge research such as correlation coefficient verification.

[0003] In terms of the spirit of overall architecture design, the present invention is an i...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 吴毓杰, 卢阳正
Owner: 吴毓杰