
Domain term automatic extraction method based on abnormal sub-graph detection

A technology for automatic term extraction, applied in natural language data processing, instrumentation, electric digital data processing, etc. It can solve problems such as unstable term extraction.

Status: Inactive
Publication date: 2021-03-19
TIANJIN UNIV
Cites: 5; Cited by: 1

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the deficiencies of existing methods by proposing an automatic domain term extraction method based on abnormal subgraph detection, which solves the problem that term extraction performance is unstable in existing methods.




Embodiment Construction

[0039] The principles, advantages, and implementation steps of the present invention will be easier to understand from the algorithm description above and the following examples.

[0040] The present invention solves the existing problems through the following technical solution:

[0041] Step 1. Preprocess the text data with operations such as sentence segmentation and word segmentation, and perform part-of-speech tagging. Here, this is implemented with the THULAC word segmentation tool.
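Step 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patent's code: the sentence-boundary punctuation set and the helper names `split_sentences` and `preprocess` are hypothetical. Word segmentation and tagging are delegated to a `tagger` callable; with the `thulac` Python package, `thulac.thulac().cut(sentence)` is one candidate, since its documentation describes it as returning `[word, tag]` pairs.

```python
import re
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]                      # (word, part-of-speech tag)
Tagger = Callable[[str], Sequence[Sequence[str]]]

# Hypothetical boundary rule: split right after Chinese/ASCII sentence-final
# punctuation, or at line breaks. The patent does not specify this set.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？；!?;])\s*|\n+")


def split_sentences(text: str) -> List[str]:
    """Sentence segmentation: return the non-empty sentences of `text`."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def preprocess(text: str, tagger: Tagger) -> List[List[Token]]:
    """Split `text` into sentences, then word-segment and POS-tag each one.

    `tagger` maps one sentence to (word, tag) pairs, e.g. a THULAC instance's cut().
    """
    return [[(word, tag) for word, tag in tagger(s)] for s in split_sentences(text)]
```

For example, `preprocess("术语抽取。图！", lambda s: [(ch, "x") for ch in s])` yields two sentences, the second being `[("图", "x"), ("！", "x")]`. The character-splitting lambda is only a stand-in; a real run would pass THULAC's tagger so that tokens are words with genuine part-of-speech tags.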

[0042] Step 2. Select all possible words using the n-gram method and grammatical rules, and filter them by stop words and word frequency (the empirical threshold is 3). Additional linguistic rules can be added as filters for different domains. For example, in "tool realization", "tool" is a noun and "realization" is a verb; such a combination generally cannot form a valid phrase.
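A minimal sketch of Step 2 over the tagged sentences from Step 1 follows. The frequency threshold of 3 and the noun-followed-by-verb rule come from the text; the maximum n-gram length `max_n`, rejecting n-grams that begin or end with a stop word, the inclusive `>= 3` comparison, and THULAC-style tags (`n...` for nouns, `v` for verbs) are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag); THULAC-style tags assumed


def ngrams(tokens: Sequence[Token], max_n: int) -> Iterable[Tuple[Token, ...]]:
    """Yield every contiguous 1..max_n-gram of one tagged sentence."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def passes_rules(gram: Tuple[Token, ...], stopwords: Set[str]) -> bool:
    """Apply the stop-word filter and the example linguistic rule."""
    words = [w for w, _ in gram]
    tags = [t for _, t in gram]
    # Assumed stop-word policy: a candidate may not start or end with a stop word.
    if words[0] in stopwords or words[-1] in stopwords:
        return False
    # Rule from the text: a noun directly followed by a verb ("tool realization")
    # generally does not form a valid phrase.
    if any(a.startswith("n") and b == "v" for a, b in zip(tags, tags[1:])):
        return False
    return True


def extract_candidates(sentences: List[List[Token]], stopwords: Set[str],
                       max_n: int = 3, min_freq: int = 3,
                       joiner: str = "") -> Dict[str, int]:
    """Count rule-passing n-grams and keep those seen at least `min_freq` times."""
    counts: Counter = Counter()
    for tokens in sentences:
        for gram in ngrams(tokens, max_n):
            if passes_rules(gram, stopwords):
                counts[joiner.join(w for w, _ in gram)] += 1
    return {term: c for term, c in counts.items() if c >= min_freq}
```

With `max_n=2` and stop word 的, three copies of the sentence 异常/a 子图/n 的/u 节点/n keep 异常子图 ("abnormal subgraph") but drop 子图的 and 的节点, while three copies of 工具/n 实现/v drop the bigram 工具实现 ("tool realization") under the noun-verb rule. `joiner=""` suits Chinese, where words are written without spaces.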

[0043] Step 3. Build a network. Use the set of candidate terms selected in step 2 as ...
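The excerpt breaks off before it says how edges are defined, so the following is only an illustrative assumption: it links two candidate terms whenever both occur as word n-grams in the same sentence, a common co-occurrence construction. Using the Step 2 candidates as nodes follows the text; `build_cooccurrence_graph`, `max_n`, and the sentence-level window are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def build_cooccurrence_graph(sentences: List[List[str]], candidates: Set[str],
                             max_n: int = 3, joiner: str = "") -> Dict[str, Set[str]]:
    """Hypothetical network: one node per candidate term, and an undirected edge
    between two candidates that both appear in the same sentence."""
    graph: Dict[str, Set[str]] = {term: set() for term in candidates}
    for words in sentences:
        # All word n-grams of this sentence that are also candidate terms.
        present = {
            joiner.join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)
        } & candidates
        for u, v in combinations(sorted(present), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph
```

Because matching is done on word n-grams, a nested candidate such as 子图 ("subgraph") becomes a neighbor of 异常子图 ("abnormal subgraph") whenever both appear in one sentence. If the detection step needs edge weights, a `Counter` of co-occurring pairs could replace the plain neighbor sets.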



Abstract

The invention discloses an automatic domain term extraction method based on abnormal subgraph detection. The method comprises the following steps: first, preprocessing the text data and performing part-of-speech tagging; selecting all possible words through an n-gram method and/or grammatical rules, and filtering them by stop words and word frequency; constructing a network whose nodes are the selected candidate terms; calculating the attribute values that various automatic term extraction methods use as term features, and taking them as the feature values for subgraph detection; calculating a p-value for each node in the graph, which measures the likelihood that the node is a term; and, through an abnormal subgraph detection algorithm, extracting a subgraph that contains as many abnormal nodes and as few normal nodes as possible.
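The last three stages of the abstract (feature values, per-node p-values, abnormal subgraph extraction) can be sketched as follows. Everything beyond the stage names is an assumption, because the excerpt does not say which features are used, how p-values are combined, or which detection algorithm is applied. This sketch computes rank-based empirical p-values per feature and keeps each node's minimum, scores a node set with the Berk-Jones nonparametric scan statistic (a statistic used in published nonparametric graph-scan work), and greedily grows connected subgraphs from every seed node. `empirical_pvalues`, `berk_jones`, `greedy_subgraph`, and the `alpha_max` threshold are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def empirical_pvalues(features: Dict[str, List[float]]) -> Dict[str, float]:
    """Rank-based empirical p-value per node (smaller = more term-like).

    Assumes larger feature values indicate term-likeness. For feature k,
    p_k(v) is the share of nodes whose k-th value is >= v's; the node's
    p-value is the minimum over features (a hypothetical combination rule).
    """
    nodes = list(features)
    n = len(nodes)
    dims = len(features[nodes[0]])
    return {
        v: min(sum(features[u][k] >= features[v][k] for u in nodes) / n
               for k in range(dims))
        for v in nodes
    }


def bernoulli_kl(x: float, y: float) -> float:
    """KL(Bernoulli(x) || Bernoulli(y)) with 0 * log 0 = 0; requires 0 < y < 1."""
    kl = 0.0
    if x > 0:
        kl += x * math.log(x / y)
    if x < 1:
        kl += (1 - x) * math.log((1 - x) / (1 - y))
    return kl


def berk_jones(subset: Set[str], pvals: Dict[str, float], alpha_max: float) -> float:
    """Berk-Jones score: max over alpha <= alpha_max of |S| * KL(N_alpha / |S|, alpha)."""
    size = len(subset)
    best = 0.0
    for alpha in {pvals[v] for v in subset if pvals[v] <= alpha_max}:
        frac = sum(pvals[v] <= alpha for v in subset) / size
        if frac > alpha:  # only an excess of small p-values counts as abnormal
            best = max(best, size * bernoulli_kl(frac, alpha))
    return best


def greedy_subgraph(adj: Dict[str, Set[str]], pvals: Dict[str, float],
                    alpha_max: float = 0.15) -> Tuple[Set[str], float]:
    """From each seed, add the best-scoring neighbor while the score improves;
    return the highest-scoring connected node set found."""
    best_set: Set[str] = set()
    best_score = 0.0
    for seed in adj:
        current = {seed}
        score = berk_jones(current, pvals, alpha_max)
        while True:
            frontier = sorted(set().union(*(adj[v] for v in current)) - current)
            if not frontier:
                break
            gains = [(berk_jones(current | {u}, pvals, alpha_max), u) for u in frontier]
            new_score, node = max(gains, key=lambda g: g[0])
            if new_score <= score:
                break
            current, score = current | {node}, new_score
        if score > best_score:
            best_set, best_score = current, score
    return best_set, best_score
```

On a toy path graph a-b-c-d-e-f with one feature per node (10, 9, 8, 1, 2, 3), `greedy_subgraph(adj, empirical_pvalues(features), alpha_max=0.5)` returns `({"a", "b"}, 2 ln 3)`: adding c would lower the score. Greedy growth is used because searching all connected subgraphs exactly is generally intractable; it is a heuristic, not necessarily the detector the patent uses.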

Description

Technical field

[0001] The invention proposes an algorithm for automatically extracting domain terms. Specifically, it relates to an automatic term recognition method based on abnormal subgraph detection and belongs to the technical field of computer software.

Background art

[0002] The rapid development of technologies such as the mobile Internet, social media, and big data has led to an exponential increase in the amount of text data in cyberspace. How to use text mining technology to extract valuable information has become a concern in the computer field, and many of the models and technologies developed so far are based on massive text resources. However, because unstructured text is expressed flexibly, the same meaning can be conveyed in different forms and with different vocabulary, which makes such data very difficult to use. Extracting relevant domain terms from a large amount of text is the primary issue in text mining and information extraction, and it is also a basic issue in the fields ...


Application Information

IPC(8): G06F40/279; G06F40/205; G06F40/216
CPC: G06F40/279; G06F40/205; G06F40/216
Inventors: 李存壮 (Li Cunzhuang), 武南南 (Wu Nannan), 王文俊 (Wang Wenjun)
Owner: TIANJIN UNIV