Subject term mining method and device, electronic equipment and storage medium

A subject term mining technology, applied in the field of data mining, that addresses problems such as low accuracy, the complex organizational structure of Chinese text, and the inability to accurately mine subject terms, and that achieves the effect of narrowing the mining scope.

Active Publication Date: 2021-05-11
BEIJING UNIV OF POSTS & TELECOMM +1
5 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0003] Chinese text has a more complex organizational structure than English text. In the prior art, subject terms in Chinese text are usually mined with methods designed for English text, or extracted manually. Both approaches suffer from low accuracy and cannot accurately mine potential subject terms composed of emerging professional vocabulary in Chinese texts.

Examples

Embodiment Construction

[0048] In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0049] It should be noted that, unless otherwise defined, the technical or scientific terms used in one or more embodiments of the present application shall have the ordinary meanings understood by those skilled in the art to which the present disclosure belongs. "First", "second" and similar words used in one or more embodiments of the present application do not indicate any order, quantity or importance, but are only used to distinguish different components. "Comprising", "including" and similar words mean that the elements or items appearing before the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "connected"...

Abstract

One or more embodiments of the invention provide a subject term mining method and device, electronic equipment and a storage medium. The method comprises: acquiring text data; filtering the text data based on a language model to determine a candidate term set; screening the candidate term set based on an unsupervised algorithm and a prediction model to determine an importance result for the candidate term set; and determining subject terms according to that importance result. The language model filters out character strings with low cohesion, which reduces the influence of loosely joined characters on subject term mining. The degree of freedom of a word in the text data reflects the uncertainty of its left and right adjacent characters and is used to find words that can be used freely and independently, which narrows the mining range. The method takes the complex structure of Chinese text corpora fully into account, identifies the subject terms of the text data through layer-by-layer screening, and can also mine potential subject terms composed of emerging specialized vocabulary according to the importance ranking.
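The filtering step in the abstract rests on two statistics: how tightly the characters of a string bind together, and how uncertain its left and right neighbouring characters are. The sketch below is one common way to compute such statistics, not the patented implementation. It assumes pointwise mutual information as the binding measure and neighbour entropy as the degree of freedom; the function names, thresholds and the division of every count by the text length are illustrative assumptions. The unsupervised algorithm and prediction model used for importance ranking are not specified in this excerpt and are not sketched here.

```python
import math
from collections import Counter, defaultdict


def candidate_stats(text, max_len=2):
    """Count all n-grams up to max_len and record their neighbouring characters."""
    counts = Counter()
    left = defaultdict(Counter)   # gram -> counts of the character just before it
    right = defaultdict(Counter)  # gram -> counts of the character just after it
    n = len(text)
    for size in range(1, max_len + 1):
        for i in range(n - size + 1):
            gram = text[i:i + size]
            counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + size < n:
                right[gram][text[i + size]] += 1
    return counts, left, right


def entropy(counter):
    """Shannon entropy (natural log) of a frequency table; 0.0 if it is empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def cohesion(gram, counts, total_chars):
    """Assumed binding measure: minimum PMI over all two-way splits of the gram."""
    p = counts[gram] / total_chars
    best = float("inf")
    for k in range(1, len(gram)):
        p_left = counts[gram[:k]] / total_chars
        p_right = counts[gram[k:]] / total_chars
        best = min(best, math.log(p / (p_left * p_right)))
    return best


def freedom(gram, left, right):
    """Degree of freedom: the smaller of the left and right neighbour entropies."""
    return min(entropy(left[gram]), entropy(right[gram]))


def filter_candidates(text, max_len=2, min_count=2,
                      min_cohesion=1.0, min_freedom=0.5):
    """Keep multi-character grams that pass the (hypothetical) thresholds."""
    counts, left, right = candidate_stats(text, max_len)
    total = len(text)
    kept = []
    for gram, c in counts.items():
        if len(gram) < 2 or c < min_count:
            continue
        if (cohesion(gram, counts, total) >= min_cohesion
                and freedom(gram, left, right) >= min_freedom):
            kept.append(gram)
    return kept
```

Taking the minimum of the two neighbour entropies means a string that always follows, or is always followed by, the same character is treated as part of a longer unit rather than as a free-standing word.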

Description

Technical field

[0001] One or more embodiments of this application relate to the technical field of data mining, and in particular to a method, device, electronic equipment and storage medium for mining subject terms (keywords).

Background

[0002] In the prior art, plagiarism checking of project texts must handle a large volume of text at a fine granularity. Quickly retrieving similar documents from the document database has become the primary problem in improving the accuracy and efficiency of plagiarism checking. Because scientific and technological projects and research documents usually revolve around a few keywords, and those keywords reflect the gist of the text to a certain extent, it is sufficient to find the industry keywords in each text and compare them in order to assess the similarity between texts.

[0003] Chinese text has a more complex organizational structure than English text,...
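The background's idea of comparing the keywords found in each text can be illustrated with a simple set-overlap score. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration; the excerpt does not say which comparison the authors use, and the function name and sample keywords are hypothetical.

```python
def keyword_similarity(keywords_a, keywords_b):
    """Jaccard similarity between two keyword collections (0.0 if both are empty)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical keyword lists mined from two project texts.
doc1 = ["数据挖掘", "主题词", "语言模型"]
doc2 = ["主题词", "语言模型", "神经网络"]
score = keyword_similarity(doc1, doc2)  # 2 shared keywords out of 4 distinct ones
```

A score near 1 would flag the two texts as candidates for closer plagiarism review, while a score near 0 suggests they cover different topics.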

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/289, G06N3/04
CPC: G06F16/3344, G06F40/289, G06N3/044, G06N3/045, Y02D10/00
Inventor: 熊永平, 曹滔宇, 朱承治, 谷纪亭, 徐翀
Owner: BEIJING UNIV OF POSTS & TELECOMM