Text subject indexing method and device, electronic device and computer storage medium

A text subject indexing technology, applied in the field of text processing, that addresses the long training time, poor versatility, and difficulty of application of existing methods. It reduces the amount of computation and the number of comparisons, which improves indexing efficiency.

Active Publication Date: 2020-01-24
INST OF SCI & TECHN INFORMATION OF CHINA
4 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, during implementation, the inventors of the present application found the following. The accuracy of statistical indexing methods is low, and the selected indexing words do not represent the content of the article well. The results of linguistic analysis indexing methods depend directly on the performance of the "rule base": owing to the complexity and flexibility of Chinese, predefined rules often have low coverage and require substantial manual intervention, and problems such as synonym recognition and word sense disambiguation lead to poor versatility and make the methods difficult to apply. Automatic indexing methods based on machine learning need to train multiple classifiers for different types of data, take a long time to train, suffer from data sparseness and overfitting, and cannot adapt to labeling with large-scale controlled vocabularies.

Examples

Detailed Description of the Embodiments

[0067] Embodiments of the present application are described in detail below, with examples shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the figures are exemplary, serve only to explain the present application, and are not to be construed as limiting it.

[0068] Those skilled in the art will understand that, unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be under...

Abstract

The embodiment of the invention relates to the technical field of text processing, and discloses a text subject indexing method and device, an electronic device, and a computer storage medium. The text subject indexing method comprises the following steps: determining a text word list of a to-be-indexed text; determining a text representation vector of the to-be-indexed text from the text word list, based on a predetermined word vector library; then, based on a mapping table between subject words and common words that is pre-established from a controlled vocabulary, taking, for each text word, the subject words whose association strength with that text word is greater than a first preset threshold as the subject words of that text word, thereby obtaining the subject words corresponding to each text word; determining a target subject word of the to-be-indexed text according to the text representation vector and the subject words corresponding to each text word; and performing subject indexing on the to-be-indexed text with the target subject word. As a result, the amount of computation and the number of comparisons are greatly reduced, and the efficiency of text subject indexing is greatly improved.
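The steps in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the abstract does not say how the text representation vector is computed or how the target subject word is selected, so the sketch assumes mean pooling of word vectors and ranking of candidates by cosine similarity. The names `WORD_VECTORS`, `MAPPING_TABLE`, `text_vector`, `subject_words_per_text_word`, and `index_text`, as well as all the toy data, are hypothetical.

```python
import math

# Hypothetical word-vector library (toy 3-dimensional vectors).
WORD_VECTORS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "training": [0.7, 0.3, 0.0],
    "yarn": [0.0, 0.1, 0.9],
    "machine learning": [0.85, 0.2, 0.05],
    "textiles": [0.05, 0.1, 0.95],
}

# Hypothetical mapping table derived from a controlled vocabulary:
# common word -> {subject word: association strength}.
MAPPING_TABLE = {
    "neural": {"machine learning": 0.8, "textiles": 0.05},
    "network": {"machine learning": 0.6},
    "training": {"machine learning": 0.7},
    "yarn": {"textiles": 0.9},
}


def text_vector(words):
    """Average the vectors of the text words found in the library (assumed pooling)."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 for empty or zero-length vectors."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def subject_words_per_text_word(words, first_threshold):
    """For each text word, keep subject words whose association strength exceeds the threshold."""
    return {
        w: [s for s, strength in MAPPING_TABLE.get(w, {}).items()
            if strength > first_threshold]
        for w in words
    }


def index_text(words, first_threshold=0.5, top_k=1):
    """Rank candidate subject words against the text vector (assumed selection rule)."""
    doc_vec = text_vector(words)
    per_word = subject_words_per_text_word(words, first_threshold)
    # Only subject words reachable through the mapping table are compared,
    # rather than every entry of the controlled vocabulary.
    candidates = {s for subjects in per_word.values() for s in subjects}
    ranked = sorted(
        candidates,
        key=lambda s: cosine(doc_vec, WORD_VECTORS.get(s, [])),
        reverse=True,
    )
    return ranked[:top_k]
```

In this sketch, calling `index_text(["neural", "network", "training", "yarn"])` compares the text vector against only the two candidate subject words reached through the mapping table. Restricting the comparison to mapped candidates is one plausible reading of how the number of comparisons is reduced.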

Description

Technical field

[0001] The embodiments of the present application relate to the technical field of text processing, and specifically, the present application relates to a text subject indexing method, device, electronic equipment, and computer storage medium.

Background art

[0002] Automatic subject indexing generally refers to the process of using a computer system to analyze, discover and extract key words that reveal the content of documents from the various elements that make up a document, such as titles, keywords, abstracts and body text. Indexed documents can extend from papers to other forms of electronic documents such as patents, books, and web page texts. Subject indexing can be done manually or by machine.

[0003] At present, automatic subject indexing methods can be divided into three categories according to technology: statistical indexing, linguistic analysis indexing and machine learning indexing. The main idea of statistical indexing is that the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/216, G06F40/30, G06F40/169, G06K9/62
CPC: G06F18/22, Y02D10/00
Inventor: 韩红旗, 薛陕, 刘志辉, 张运良, 悦林东, 高雄
Owner: INST OF SCI & TECHN INFORMATION OF CHINA