Method for extracting text-oriented field term and term relationship

A technology for extracting terms and term relationships, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as low efficiency, the inability to effectively remove noise words from candidate words, and reliance on a single recognition model, and it improves the term recognition rate.

Status: Inactive. Publication Date: 2012-02-22
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0010] Patents ①-⑤ are mainly based on a single term recognition model, which cannot effectively remove noise words from the candidate words and performs poorly on derived terms.
[0011] Patent ⑥ only builds a prefix table to reduce the number of string matches and still uses an exhaustive method, which is inefficient.

Examples

Embodiment Construction

[0063] 1. Offline construction of domain terms. This stage consists mainly of the following processes: preprocessing of the original corpus, which includes word segmentation, part-of-speech tagging, and noise word filtering (Step 1 to Step 3); Internet word frequency filtering (Step 4); mixed word frequency filtering (Step 5); traditional feature extraction of domain terms (Step 6); Internet feature extraction of domain terms (Step 7 to Step 9); and establishment of the dual-model structure (Step 10 to Step 11). The whole process is shown in Figure 1:

[0064] Step 1: Perform Chinese word segmentation and part-of-speech tagging on the original corpus.

[0065] Step 2: From the word string obtained after Chinese word segmentation, retain the words tagged "noun", "verb", "adverb", "adjective", and "quantifier", and remove the stop words. After the above processing, the obtained n consecutive words (words t...
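The filtering part of Step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the corpus has already been segmented and tagged in Step 1, and the tag names, the stop-word list, and the function name `filter_tokens` are all hypothetical. How the retained words are then combined into candidates is truncated in the source and is not sketched here.

```python
# Assumed POS tag names (the source lists the categories, not a tag set).
KEPT_POS = {"noun", "verb", "adverb", "adjective", "quantifier"}
# Assumed, illustrative stop-word list; the source does not give one.
STOP_WORDS = {"的", "是", "有"}


def filter_tokens(tagged_tokens, kept_pos=KEPT_POS, stop_words=STOP_WORDS):
    """Keep (word, pos) pairs whose POS is retained and whose word is not a stop word."""
    return [
        (word, pos)
        for word, pos in tagged_tokens
        if pos in kept_pos and word not in stop_words
    ]


# Hypothetical output of Step 1 (segmentation plus part-of-speech tagging).
tagged = [
    ("数据", "noun"),
    ("的", "particle"),
    ("结构", "noun"),
    ("是", "verb"),
    ("非常", "adverb"),
    ("重要", "adjective"),
    ("。", "punctuation"),
]
retained = filter_tokens(tagged)
```

Note that "是" carries a retained tag but is still dropped, because stop-word removal is applied in addition to the part-of-speech filter.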

Abstract

The invention discloses a method for extracting text-oriented domain terms and term relationships. The method comprises the following steps: first, preprocessing the original corpus (sentence splitting, word segmentation, and part-of-speech tagging) to obtain a candidate word set, and filtering noise words; second, extracting term features from the original corpus and from the Internet, and separating terms from the candidate words with a dual-model structure algorithm; third, constructing a term dictionary using an inverted index method, and tagging the terms in a text to be identified using a longest match algorithm; and finally, performing multi-level label sequence tagging with a conditional random field model according to a multi-dimensional node labeling rule, to obtain the relationships among the terms in the text to be identified.
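The third step (term dictionary plus longest match) can be illustrated with a small sketch. The abstract does not specify how the inverted index is organized, so the layout below, which maps each first character to the terms beginning with it, is an assumption made for illustration; the names `build_term_index` and `tag_terms` are hypothetical, and English strings are used only for readability.

```python
from collections import defaultdict


def build_term_index(terms):
    """Map each first character to the terms starting with it, longest first.

    Assumed index layout; the source only says an inverted index is used.
    """
    index = defaultdict(list)
    for term in set(terms):
        if term:
            index[term[0]].append(term)
    for bucket in index.values():
        bucket.sort(key=len, reverse=True)
    return dict(index)


def tag_terms(text, index):
    """Return (start, end, term) spans found by greedy forward longest match."""
    spans = []
    i = 0
    while i < len(text):
        match = None
        for term in index.get(text[i], ()):
            if text.startswith(term, i):
                match = term  # buckets are longest-first, so the first hit is the longest
                break
        if match:
            spans.append((i, i + len(match), match))
            i += len(match)  # skip past the matched term
        else:
            i += 1
    return spans


index = build_term_index(["binary tree", "binary search tree", "tree", "search"])
spans = tag_terms("a binary search tree is a tree", index)
```

In this sketch the shorter terms "search" and "tree" inside "binary search tree" are not tagged separately, since the scan resumes after the end of the longest match.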

Description

Technical field

[0001] The invention relates to text mining and knowledge acquisition methods, and in particular to a method for extracting text-oriented domain terms and term relationships.

Background art

[0002] With the increasingly widespread application of Internet technology, online learning has become one of the main means by which people acquire knowledge, and terms, as the basic units of knowledge, are the cornerstone of building knowledge maps and knowledge navigation. Whether the goal is to classify texts in a specific field, to provide experienced practitioners with the knowledge structure and evolution rules of a specific field, or to give learners a correct learning path in a certain field, it is very important to obtain the term sets of different fields and the relationships between terms efficiently and accurately.

[0003] The applicant retrieved the following patent documents related to the present invention through a novelty search:

[0004] ①A method for...

Application Information

IPC(8): G06F17/30
Inventor: 郑庆华, 刘均, 罗俊英, 程晓程
Owner: XI AN JIAOTONG UNIV