Paragraph vectorization method and device

A paragraph vectorization technology, applied in the field of natural language processing, that solves the problem that sentence vectors cannot reflect the content and structural characteristics of normative texts.

Active Publication Date: 2019-12-03
BEIJING GRIDSUM TECH CO LTD
Cites: 6 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a paragraph vectorization method and device to at least solve the following technical problem in the prior art: when paragraphs are vectorized, distances are calculated based on the context of words and sentences and the vectors are then derived by methods such as clustering, so the resulting sentence vectors cannot reflect the content and structural characteristics of normative texts.

Examples

Embodiment 1

[0018] According to an embodiment of the present invention, a method embodiment of paragraph vectorization is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.

[0019] Figure 1 is a flowchart of a paragraph vectorization method according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0020] Step S102, constructing a feature set including a plurality of feature words.

[0021] Specifically, feature words are words that can characterize text to a certain extent. The present invention is mainly aimed at the vectorization of paragraphs. Therefore, the feature words in the feature set in the present invention are...
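The criteria for choosing feature words in [0021] are truncated in this excerpt, so the following is only a minimal sketch of the data structure a feature set could take, assuming the candidate feature words are already known. The helper name `build_feature_set` is hypothetical. Keeping the words in a fixed order matters because each position can later serve as one dimension of a paragraph vector.

```python
from typing import Iterable


def build_feature_set(candidate_words: Iterable[str]) -> list[str]:
    """Build an ordered, de-duplicated feature set (hypothetical helper).

    Assumption: candidate feature words are supplied by the caller; how they
    are selected is not given in this excerpt. First-appearance order is kept
    so each feature word maps to a stable vector dimension.
    """
    seen: set[str] = set()
    features: list[str] = []
    for word in candidate_words:
        word = word.strip()
        # Skip empty strings and duplicates, keeping the first occurrence.
        if word and word not in seen:
            seen.add(word)
            features.append(word)
    return features


if __name__ == "__main__":
    print(build_feature_set(["party A", "party B", "liability", "party A", ""]))
    # ['party A', 'party B', 'liability']
```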

Embodiment 2

[0056] According to an embodiment of the present invention, a product embodiment of a paragraph vectorization device is provided. Figure 3 shows a paragraph vectorization device according to an embodiment of the present invention. As shown in Figure 3, the device includes a construction module 101, a conversion module 103 and a vectorization module 105.

[0057] The construction module 101 is configured to construct a feature set comprising a plurality of feature words; the conversion module 103 is configured to replace words in the paragraph to be processed based on a preset knowledge base to obtain a converted paragraph; and the vectorization module 105 is configured to take the words in the converted paragraph that belong to the feature set as features of the converted paragraph and to vectorize the converted paragraph.
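The division of labour among modules 101, 103 and 105 can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions, not the patented implementation: the preset knowledge base is assumed to be a plain dictionary mapping surface words to canonical words (for example, synonyms to a preferred term), paragraphs are assumed to be already tokenized, and vectorization is assumed to be a term-count vector over the feature set. All class and method names are hypothetical.

```python
from collections import Counter


class ConstructionModule:
    """Module 101 (hypothetical sketch): builds an ordered feature set."""

    def build(self, words):
        # dict.fromkeys de-duplicates while preserving insertion order.
        return list(dict.fromkeys(words))


class ConversionModule:
    """Module 103 (hypothetical sketch): word replacement via a knowledge base."""

    def __init__(self, knowledge_base):
        # Assumed format: {surface word: canonical word}.
        self.knowledge_base = knowledge_base

    def convert(self, tokens):
        # Words absent from the knowledge base are kept unchanged.
        return [self.knowledge_base.get(t, t) for t in tokens]


class VectorizationModule:
    """Module 105 (hypothetical sketch): term counts over the feature set."""

    def __init__(self, feature_set):
        self.feature_set = list(feature_set)
        self._members = set(self.feature_set)

    def vectorize(self, converted_tokens):
        # Only words that belong to the feature set are counted as features;
        # all other words in the converted paragraph are ignored.
        counts = Counter(t for t in converted_tokens if t in self._members)
        return [counts[f] for f in self.feature_set]


if __name__ == "__main__":
    features = ConstructionModule().build(["buyer", "seller", "deliver", "buyer"])
    kb = {"purchaser": "buyer", "vendor": "seller", "ship": "deliver"}
    paragraph = ["the", "vendor", "shall", "ship", "goods", "to", "the", "purchaser"]
    converted = ConversionModule(kb).convert(paragraph)
    print(VectorizationModule(features).vectorize(converted))  # [1, 1, 1]
```

Running the conversion before vectorization means that, under these assumptions, two paragraphs that use different synonyms for the same concept produce the same vector. This is one plausible reading of why the knowledge-base replacement precedes feature extraction.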

[0058] In the embodiment of the present invention, the construction module 101 pre-constructs a feature set including a plurality of feature...

Abstract

The invention discloses a paragraph vectorization method and device. The method includes: constructing a feature set including a plurality of feature words; replacing words in a paragraph to be processed based on a preset knowledge base to obtain a converted paragraph; and taking the words in the converted paragraph that belong to the feature set as features of the converted paragraph to vectorize the converted paragraph. The invention solves the technical problem in the prior art that, when paragraphs are vectorized by calculating distances based on the context of words and sentences and then applying clustering or similar methods, the resulting sentence vectors cannot reflect the content and structural characteristics of normative texts.
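The three steps can be written compactly if one assumes a term-count representation, which the abstract itself does not specify. Let F = (f_1, ..., f_n) be the ordered feature set, K the knowledge-base replacement applied word by word (a word with no entry maps to itself), and p = (w_1, ..., w_m) the paragraph to be processed. The converted paragraph p' and its vector v(p') would then be:

```latex
p' = \big(K(w_1), \dots, K(w_m)\big), \qquad
\mathbf{v}(p') = (c_1, \dots, c_n), \qquad
c_i = \sum_{j=1}^{m} \mathbb{1}\!\left[\,K(w_j) = f_i\,\right], \quad i = 1, \dots, n
```

A word of p' that is not in F contributes to no component c_i, which corresponds to using only the feature-set words of the converted paragraph as its features.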

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and in particular to a paragraph vectorization method and device.

Background art

[0002] Vectorization of natural language is a demanding task in NLP (Natural Language Processing) and the basis for using various natural language models; the quality of vectorization directly affects the final accuracy. Although many companies use various vectorization technologies, and open-source platforms offer vectorization tools such as word2vector and sentence2vector, it is difficult for a single, unified abstraction method to find the feature points that are actually needed across documents with different characteristics and different requirements. For example, when parsing normative texts such as legal documents, it is necessary to divide the small paragraphs in the text into large paragraphs according t...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/22, G06F17/27, G06F16/35, G06F40/237
CPC: G06F16/35, G06F40/151, G06F40/205, G06F40/237, G06F16/00, G06F40/12
Inventors: 石鹏 (Shi Peng), 姜珂 (Jiang Ke)
Owner: BEIJING GRIDSUM TECH CO LTD