A fine-grained domain terminology self-learning method based on context semantics

A context-semantics-based self-learning method, applied to the self-expansion of domain terminology, that addresses the difficulty of meeting terminology learning needs in fine-grained fields.

Active Publication Date: 2021-07-02
BEIJING UNIV OF TECH
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] Existing text terminology learning techniques rely on large training samples and struggle to meet the requirements of fine-grained domain terminology learning, where instance samples are smaller. To solve this problem, the present invention proposes a fine-grained domain terminology self-learning method based on context semantics. The method uses contextual semantic information to express the statistical and linguistic features of candidate terms in the corpus, measured by how many times the context information of each candidate term recurs. Drawing on the ideas of domain correlation and domain consistency, it uses the log-likelihood ratio to calculate the domain dependency bias value of each candidate term, and finally integrates the membership activation value of each candidate term to discover new terms in the field autonomously.
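The text available here does not give the exact formula for the domain dependency bias value. As an illustration only, the sketch below shows the standard two-corpus log-likelihood ratio often used to compare a term's frequency in a domain corpus against a reference corpus. The function name, the parameters, and the choice of a reference corpus are assumptions, not details taken from the patent.

```python
import math


def log_likelihood_ratio(count_domain, size_domain, count_reference, size_reference):
    """Two-corpus log-likelihood ratio (G2) for a single term.

    count_* : occurrences of the term in each corpus (hypothetical inputs)
    size_*  : total number of tokens in each corpus (must be positive)
    """
    total = size_domain + size_reference
    combined = count_domain + count_reference
    # Expected counts if the term were equally likely in both corpora.
    expected_domain = size_domain * combined / total
    expected_reference = size_reference * combined / total

    def contribution(observed, expected):
        # By convention 0 * log(0) is treated as 0.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2.0 * (
        contribution(count_domain, expected_domain)
        + contribution(count_reference, expected_reference)
    )
```

A term with the same relative frequency in both corpora scores 0, and the score grows as the frequencies diverge. The statistic is symmetric, so a real pipeline would also need to check in which corpus the term is over-represented.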

Examples

Embodiment Construction

[0032] The present invention is further described below with reference to the accompanying drawing and an embodiment:

[0033] The source data used in the domain term discovery method of the present invention come from the PLOS ONE website: 5000 articles were randomly crawled by searching for the keywords "fMRI" and "Cognitive Function";

[0034] The cognitive function term concept set consists of the 803 cognitive function terms on the Cognitive Atlas website;

[0035] The flow chart of the method of this embodiment is shown in Figure 1. The method specifically includes the following steps:

[0036] Step 1: Build the initial term set and the original target corpus

[0037] The initial term set is formed by selecting the 10 cognitive function terms that occur most frequently in the source data;

[0038] The original target corpus consists of 932 segments from the source data, each of which contains terms from the initial term set;
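Step 1 can be sketched roughly as follows. This is a minimal illustration, assuming the articles and segments are available as plain strings and that a term is matched as a case-insensitive whole phrase. The function names and the matching rule are hypothetical and are not specified in the patent text.

```python
import re
from collections import Counter


def _term_pattern(term):
    # Whole-phrase, case-insensitive match (an assumed matching rule).
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def build_initial_term_set(documents, concept_terms, k=10):
    """Return the k concept terms that occur most often in the documents."""
    counts = Counter()
    for term in concept_terms:
        pattern = _term_pattern(term)
        counts[term] = sum(len(pattern.findall(doc)) for doc in documents)
    # Terms that never occur are dropped even if fewer than k remain.
    return [term for term, count in counts.most_common(k) if count > 0]


def build_target_corpus(segments, initial_terms):
    """Keep only the segments that mention at least one initial term."""
    patterns = [_term_pattern(term) for term in initial_terms]
    return [seg for seg in segments if any(p.search(seg) for p in patterns)]
```

With the embodiment's data, `documents` would hold the crawled articles, `concept_terms` the 803 Cognitive Atlas terms, and `k=10`.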

[0039] Step 2: Construct the origi...

Abstract

Existing text terminology learning methods rely on large training samples and struggle to meet the needs of fine-grained domain terminology learning, where instance samples are smaller. To solve this problem, the present invention proposes a fine-grained domain terminology self-learning method based on context semantics. The method uses contextual semantic information to express the statistical and linguistic features of candidate terms in the corpus, measured by how many times the context information of each candidate term recurs. It calculates the domain dependency bias value of candidate terms with reference to the log-likelihood ratio, and finally integrates the membership activation value of each candidate term to discover new terms in the field autonomously. The fine-grained domain terminology self-learning technology described in the present invention can realize the self-learning of term sets and promote the construction of domain-specific ontologies. It can be applied to term discovery and extraction in fields such as cognitive function, and can also be used as a candidate concept generation tool in concept extraction methods.
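The abstract does not define how context recurrence is counted. One possible reading, shown purely as an assumption, is to record the words that appear near known terms and then score a candidate by how often its own neighbouring words were seen in those recorded contexts. The sketch handles single-token terms only, and the window size and function names are hypothetical.

```python
from collections import Counter


def context_counts(tokens, known_terms, window=2):
    """Count words found within `window` tokens of any known term."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in known_terms:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


def context_recurrence(tokens, candidate, known_contexts, window=2):
    """Sum how often the candidate's neighbours recur in known-term contexts."""
    score = 0
    for i, token in enumerate(tokens):
        if token == candidate:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            score += sum(known_contexts[word] for word in neighbours)
    return score
```

A candidate that shares many neighbouring words with known terms receives a higher score than one that appears in unrelated contexts.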

Description

Technical field

[0001] The invention relates to a big-data-driven self-learning method for domain terminology, and in particular to the self-learning of domain terminology sets from text data resources such as blogs, documents, and web pages, so as to realize the self-expansion of domain terminology databases.

Background art

[0002] Big data knowledge engineering is an important part of artificial intelligence research, and text data such as blogs, documents, and web pages are the most important knowledge sources. Traditional text-based terminology learning techniques mainly use conditional random fields and other machine learning methods based on large training samples. They aim to identify and extract core terms with large instance sizes in various fields, such as gene names and protein names in bioinformatics, or addresses and occupations in social media. However, with the continuous deepening of knowledge-driven artificial intelligence appl...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/211, G06F16/33, G06F16/36
CPC: G06F16/3335, G06F16/36, G06F40/211
Inventors: 张顺, 林绍福, 陈建辉, 高江帆, 何小波
Owner: BEIJING UNIV OF TECH