Method and device for obtaining lexical item paragraph association weights

A technique, applied in the field of obtaining term-paragraph association weights, that addresses the failure of existing methods to consider differences in how a document is represented.

Active Publication Date: 2020-09-01
CENT SOUTH UNIV
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] The prior art does not consider how a document's representation differs with the paragraph differences and adjacency relationships of a term that occurs at the same structural position of the document. To solve this problem, the present invention provides a method and device for obtaining term-paragraph association weights.

Examples

Specific Embodiment 1

[0060] Referring to Figure 1, the method for obtaining term-paragraph association weights in the first embodiment comprises the following steps:

[0061] A1. Based on a plurality of preset terms, the number of the document structure position in which each term is located, the number of the paragraph within that document structure position, and the weight of each term, obtain, for any paragraph in the document structure position corresponding to a given structure-position number, the number of terms in that paragraph and the total weight of all terms in that paragraph.

[0062] The paragraph numbers correspond to the order of the paragraphs within the document structure position in which they are located.
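Step A1 amounts to a group-by over term records keyed by (structure position number, paragraph number). The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation: each record is a hypothetical tuple (term_id, structure_no, paragraph_no, term_weight), and every record is counted, so a term that appears in two sentences of the same paragraph counts twice. The text above does not say which counting convention the method uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (an assumption, not taken from the patent):
# (term_id, structure_no, paragraph_no, term_weight)
Record = Tuple[int, int, int, float]
ParagraphKey = Tuple[int, int]  # (structure_no, paragraph_no)


def paragraph_stats(records: Iterable[Record]) -> Dict[ParagraphKey, Tuple[int, float]]:
    """Sketch of step A1: for each paragraph, identified by its structure
    position number and its paragraph number, return (term count, total weight)."""
    counts: Dict[ParagraphKey, int] = defaultdict(int)
    totals: Dict[ParagraphKey, float] = defaultdict(float)
    for _term_id, structure_no, paragraph_no, weight in records:
        key = (structure_no, paragraph_no)
        counts[key] += 1        # every record counts as one term occurrence
        totals[key] += weight   # accumulate the term weights of the paragraph
    return {key: (counts[key], totals[key]) for key in counts}


if __name__ == "__main__":
    records = [
        (1, 1, 1, 0.5),   # term 1, structure position 1, paragraph 1
        (2, 1, 1, 0.25),  # term 2, same paragraph
        (1, 1, 2, 0.5),   # term 1 again, paragraph 2 of the same structure position
        (3, 2, 1, 1.0),   # term 3, structure position 2, paragraph 1
    ]
    print(paragraph_stats(records)[(1, 1)])  # → (2, 0.75)
```

Counting distinct term IDs per paragraph instead would only change the `counts` update; which variant matches the patent cannot be determined from the excerpt.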

[0063] A2. Based on the number of terms in any paragraph in the document structure position corresponding to a given structure-position number, and the total weight of all terms in that paragraph, obtain the paragraph association weight of any of the plurality of preset terms...

Specific Embodiment 2

[0121] To better explain the present invention, refer to Figure 2. In this embodiment, the term document paragraph position table is input into the computer in advance, so this table is described first.

[0122] In this embodiment, the input is the term document paragraph position table words_list of a specific document. It is a database table containing all terms extracted from that document together with their document paragraph position information. A term with a given number may have multiple records in the table, either in different paragraphs of the same document structure position or in different sentences of the same paragraph. Table 1 gives the field definitions.

[0123] Table 1: Definition of the term document paragraph position table

[0124]
Field Name | Field Meaning | Field Type | Field Description
word_id | term number | INTEGER | A unique number for a particular term
word_weight | term basic weight | DECIMA...
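The point of [0122], that one term may have several records in words_list, can be shown with an in-memory SQLite mock-up. This is a hedged sketch: only `word_id` and `word_weight` come from Table 1. The position columns `struct_no`, `para_no` and `sent_no` are hypothetical names, because the remaining field definitions are truncated, and `REAL` is a placeholder for the truncated weight type.

```python
import sqlite3
from typing import Iterable, List, Tuple

# In-memory stand-in for words_list. Only word_id and word_weight come from
# Table 1; struct_no, para_no and sent_no are hypothetical position columns,
# and REAL is a placeholder for the truncated type of word_weight.
SCHEMA = """
CREATE TABLE words_list (
    word_id     INTEGER NOT NULL,  -- unique number of a particular term
    word_weight REAL    NOT NULL,  -- basic weight of the term
    struct_no   INTEGER NOT NULL,  -- hypothetical: document structure position
    para_no     INTEGER NOT NULL,  -- hypothetical: paragraph number
    sent_no     INTEGER NOT NULL   -- hypothetical: sentence number
)
"""


def load(rows: Iterable[Tuple[int, float, int, int, int]]) -> sqlite3.Connection:
    """Create the mock table in memory and insert the given records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO words_list VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def occurrence_summary(conn: sqlite3.Connection) -> List[Tuple[int, int, int]]:
    """Per term: (word_id, number of records, number of distinct paragraphs)."""
    query = """
        SELECT word_id, SUM(n), COUNT(*)
        FROM (SELECT word_id, struct_no, para_no, COUNT(*) AS n
              FROM words_list
              GROUP BY word_id, struct_no, para_no)
        GROUP BY word_id
        ORDER BY word_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    rows = [
        (7, 0.4, 1, 1, 1),  # term 7, paragraph 1, sentence 1
        (7, 0.4, 1, 1, 3),  # term 7, same paragraph, different sentence
        (7, 0.4, 1, 2, 1),  # term 7, another paragraph of the same structure position
        (8, 0.9, 1, 2, 2),  # term 8, a single record
    ]
    print(occurrence_summary(load(rows)))  # → [(7, 3, 2), (8, 1, 1)]
```

Term 7 has three records spread over two paragraphs, which is exactly the multiplicity that the per-paragraph statistics of step A1 have to account for.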

Abstract

The invention relates to a method and device for obtaining lexical item paragraph association weights. The method comprises the following steps: A1, based on a plurality of preset lexical items, the serial numbers of the document structure positions where the lexical items are located, the serial numbers of the paragraphs within those document structure positions, and the weights of the lexical items, obtain the number of lexical items in any paragraph in the document structure position corresponding to a given serial number, together with the total weight of all lexical items in that paragraph, where the paragraph numbers correspond to the order of the paragraphs within their document structure positions; A2, based on the number of lexical items in any such paragraph and the total weight of all lexical items in that paragraph, obtain the paragraph association weight of any lexical item among the plurality of preset lexical items.

Description

Technical Field

[0001] The present invention relates to the technical field of document extraction, and in particular to a method and device for obtaining term-paragraph association weights.

Background Art

[0002] At present, most Chinese text classification systems use words as feature items, called feature words. These feature words serve as the intermediate representation of a document and are used to compute the similarity between documents, and between documents and user targets. Usually, a score is calculated for each feature according to a feature evaluation function, the features are sorted by score, and the several highest-scoring features are selected as feature words.

[0003] The most commonly used and effective text representation method is to build a term-document matrix. Each element value in the term-document matrix represents the weight of the term on the corresponding row to the document on th...
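The two background ideas, a term-document matrix and score-sort-select feature selection, can be illustrated with a short Python sketch. It rests on assumptions that the background does not fix: documents are already tokenized into lists of terms, each matrix cell holds the raw term count (the simplest weight; schemes such as TF-IDF are common alternatives), and document frequency stands in for the unspecified feature evaluation function.

```python
from collections import Counter
from typing import List, Tuple

Matrix = List[List[int]]


def term_document_matrix(docs: List[List[str]]) -> Tuple[List[str], Matrix]:
    """Rows are terms (sorted), columns are documents; each cell holds the raw
    count of the term in the document, used here as the simplest weight."""
    vocab = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def top_k_features(vocab: List[str], matrix: Matrix, k: int) -> List[str]:
    """Score each term with a stand-in evaluation function (document
    frequency), sort by descending score, and keep the k best terms."""
    scores = {term: sum(1 for v in row if v > 0) for term, row in zip(vocab, matrix)}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [
        ["term", "weight", "paragraph"],
        ["term", "document"],
        ["paragraph", "term", "term"],
    ]
    vocab, matrix = term_document_matrix(docs)
    print(vocab)                             # → ['document', 'paragraph', 'term', 'weight']
    print(matrix[vocab.index("term")])       # → [1, 1, 2]
    print(top_k_features(vocab, matrix, 2))  # → ['term', 'paragraph']
```

Swapping in a different evaluation function only requires changing how `scores` is computed; the sort-and-truncate selection step stays the same.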

Application Information

IPC (8): G06F16/33; G06F40/216; G06F40/284
CPC: G06F16/3344; G06F40/216; G06F40/284; Y02P90/30
Inventors: 邓吉秋 (Deng Jiqiu), 路馥毓 (Lu Fuyu), 李晨菡 (Li Chenhan)
Owner: CENT SOUTH UNIV (Central South University)