
A Term Extraction Method Based on Definition and Relationship

A term extraction technique based on definitions and relationships, applied in the field of text mining, addresses the weak recognition of low-frequency terms (and the resulting omissions) and the poor extraction of long, multi-word terms, thereby improving term recognition.

Active Publication Date: 2020-09-22
TSINGHUA UNIV
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Therefore, term extraction methods in the prior art recognize low-frequency terms poorly, which easily leads to omissions, and they extract highly general terms and long, multi-word terms poorly.



Examples


Detailed Description of the Embodiments

[0062] The embodiments will be described in detail below in conjunction with the accompanying drawings.

[0063] Figure 1 is the overall flowchart of the term extraction method proposed by the present invention, which includes the following steps:

[0064] Step (1): text preprocessing, word segmentation, and word frequency statistics

[0065] Resources presented as HTML web pages come from the widest range of sources and are the easiest to obtain, so the present invention takes HTML text as the input data of the method. Because HTML resources are not plain text, the text must be cleaned during preprocessing.
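The kind of cleaning this paragraph calls for can be sketched with regular expressions. This is only an illustration under stated assumptions: the function name `clean_html`, the pattern names, and the choice of which blocks to discard (script, style, and table) are hypothetical and are not taken from the patent text.

```python
import html
import re

# Assumption of this sketch: whole <script>, <style>, and <table> blocks are discarded.
DROP_BLOCKS = re.compile(
    r"<(script|style|table)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Any remaining tag, including self-contained ones such as <img ...>.
ANY_TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def clean_html(raw: str) -> str:
    """Reduce an HTML string to plain text (hypothetical helper)."""
    text = DROP_BLOCKS.sub(" ", raw)   # remove the dropped blocks with their content
    text = ANY_TAG.sub(" ", text)      # strip the remaining tags, keep their text
    text = html.unescape(text)         # decode entities such as &amp;
    return WHITESPACE.sub(" ", text).strip()


sample = (
    '<p>A <b>term</b> is a word.</p>'
    '<table><tr><td>x</td></tr></table><img src="a.png">'
)
print(clean_html(sample))
```

Regular expressions are adequate for a sketch like this, but they do not handle nested tables or malformed markup; a real pipeline might prefer an HTML parser.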

[0066] Images in a web page are only links that carry no semantic information in the text, and tables are difficult to process because their formats vary, so the present invention uses regular expressions to label with

[0067]

[0068]

[0069]

[0070]

[0071]

[0072]

[0073]

[0074] The content in the tag is a...
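The word segmentation and word frequency statistics named in Step (1) can be sketched as follows. The sketch makes two assumptions: the input is already cleaned plain text, and a simple regular-expression tokenizer for English stands in for a real segmenter (Chinese text would need a dedicated word segmentation tool). The names `segment` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Stand-in tokenizer (assumption): alphanumeric words, optionally hyphenated.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def segment(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return [token.lower() for token in TOKEN.findall(text)]


def word_frequencies(text: str) -> Counter:
    """Count how often each token occurs in the text."""
    return Counter(segment(text))


freq = word_frequencies("Term extraction finds terms. Term extraction is hard.")
print(freq.most_common(3))
```

No stemming is applied here, so "term" and "terms" are counted separately.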


Abstract

The invention belongs to the field of text mining, and in particular relates to a term extraction method based on definitions and relationships. The method centers on mining term definitions and term relationships, combined with word-formation rules and boundary detection. Definitions are first extracted from the text, and high-quality initial candidate terms are generated from them; the candidate set is then continuously expanded according to the relationships between terms. The method helps improve the recognition of low-frequency terms and the extraction of highly general terms and long, multi-word terms.
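The two-stage idea in the abstract (seed candidates from definitions, then grow the set through term relationships) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the English cue phrases ("is defined as", "refers to", "is called", "is a kind of"), the regular expressions, and the function names are hypothetical and are not the patent's rules. Word-formation rules and boundary detection are left out.

```python
import re

ARTICLE = r"(?:(?:a|an|the)\s+)?"
# Hypothetical definition cue: "<term> is defined as / refers to / is called ..."
DEFINITION = re.compile(
    rf"^{ARTICLE}(?P<term>[a-z][a-z -]*?)\s+(?:is defined as|refers to|is called)\b",
    re.IGNORECASE,
)
# Hypothetical relation cue: "<child> is a kind of <parent>"
RELATION = re.compile(
    rf"^{ARTICLE}(?P<child>[a-z][a-z -]*?)\s+is a (?:kind|type) of\s+"
    rf"{ARTICLE}(?P<parent>[a-z][a-z -]*)$",
    re.IGNORECASE,
)


def seed_from_definitions(sentences):
    """Collect initial candidate terms from definition-like sentences."""
    seeds = set()
    for sentence in sentences:
        match = DEFINITION.match(sentence.strip())
        if match:
            seeds.add(match.group("term").lower())
    return seeds


def expand_by_relations(sentences, candidates):
    """Add a term when it is related to an already known candidate; repeat until stable."""
    terms = set(candidates)
    changed = True
    while changed:
        changed = False
        for sentence in sentences:
            match = RELATION.match(sentence.strip().rstrip("."))
            if not match:
                continue
            pair = {match.group("child").lower(), match.group("parent").lower()}
            if pair & terms and not pair <= terms:
                terms |= pair
                changed = True
    return terms


sentences = [
    "A decision tree is defined as a tree-shaped model of decisions.",
    "An ensemble model refers to a model that combines several learners.",
    "A rotation forest is a kind of random forest.",
    "A random forest is a kind of ensemble model.",
]
seeds = seed_from_definitions(sentences)
terms = expand_by_relations(sentences, seeds)
print(sorted(terms))
```

In this toy input, "random forest" and "rotation forest" each occur only in relation sentences, yet both are reached through their links to a defined term, which is the mechanism the abstract credits for better coverage of low-frequency terms.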

Description

Technical field

[0001] The invention belongs to the field of text mining, and in particular relates to a term extraction method based on definitions and relationships.

Background art

[0002] Terminology, as the conventional set of symbols for expressing professional concepts in a specific field, plays an important role in natural language processing tasks such as Chinese word segmentation and syntactic analysis. In building a domain knowledge base, terminology is the main embodiment of domain knowledge and plays an important role in expanding knowledge instances. Manually annotating terms in unstructured text consumes a great deal of manpower and time, and missed annotations reduce recall. Automatic term extraction has therefore received increasing attention from researchers.

[0003] Term extraction methods in the prior art mainly comprise two steps. The first step is to obtain the candidate terms through the unit calculation of t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06F40/289
Inventors: 许斌, 李思良, 杨玉基
Owner: TSINGHUA UNIV