Corpus vectorization processing method and device, computer equipment and storage medium

A corpus vectorization processing method and device, applied in the field of data processing, that addresses the lack of processing for structured text and improves the practical effect of related models and algorithms.

Pending Publication Date: 2021-03-12
ZTE CORP

AI Technical Summary

Problems solved by technology

[0004] The main purpose of the present invention is to provide a corpus vectorization processing method, device, computer equipment and storage medium for processing structured text, so as to at least solve the technical problem that the prior art lacks processing of structured text.



Embodiment Construction

[0021] In order to illustrate the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without creative work.

[0022] As shown in Figure 1, a corpus vectorization processing method provided by an embodiment of the present invention may include the following steps.

[0023] Step S11: extract text data of a set type from the corpus, and perform word segmentation on the text data to obtain word segmentation data.
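Step S11 can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not confirm: the "set type" is taken to be CJK ideographs, and word segmentation is approximated by splitting into single characters. The names `CJK_RUN`, `extract_text` and `segment` are hypothetical and do not come from the patent; a real system would likely use a proper word segmenter.

```python
import re

# Assumption: the "set type" is CJK ideographs (U+4E00..U+9FFF).
# The source does not say which type of text is actually extracted.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_text(corpus: str) -> list[str]:
    """Return every run of the chosen character type found in the corpus."""
    return CJK_RUN.findall(corpus)


def segment(text_runs: list[str]) -> list[str]:
    """Naive stand-in for word segmentation: one token per character."""
    return [ch for run in text_runs for ch in run]


corpus = "ZTE 2021: 语料向量化, step S11!"
tokens = segment(extract_text(corpus))
print(tokens)
```

Keeping extraction and segmentation as separate functions mirrors the two actions named in the step, so either one can be swapped out without touching the other.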

[0024] The corpus vectorization processing method provided in this embodiment is applied to the processing of a corpus. The corpus can be, but is not limited to: single sent...



Abstract

The invention provides a corpus vectorization processing method and device, computer equipment and a storage medium, and relates to the technical field of data processing. The method comprises the following steps: extracting text data of a set type from a corpus, and performing word segmentation on the text data to obtain word segmentation data; constructing a structural representation vector corresponding to the word segmentation data; and determining a hidden-layer word-meaning representation vector corresponding to the word segmentation data based on the structural representation vector. Because the corpus is vectorized using structural information, the semantic features of structured text can be represented, the resulting vectors are better suited to natural language processing algorithms, and the practical effect of the related models and algorithms is improved.
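The last two steps of the abstract can be sketched as follows. This is an illustration under strong assumptions, not the patented construction: the structural representation vector is taken to be a multi-hot vector over a token's components, and the hidden-layer ("implicit strata") vector is taken to be a projection through a single untrained layer. The table `COMPONENTS` and the functions `structural_vector` and `hidden_vector` are hypothetical.

```python
import random

# Hypothetical decomposition table: token -> structural components.
# Illustrative only; the source does not describe how structure is encoded.
COMPONENTS = {
    "语": ["讠", "五", "口"],
    "词": ["讠", "司"],
    "料": ["米", "斗"],
}
VOCAB = sorted({c for parts in COMPONENTS.values() for c in parts})
INDEX = {c: i for i, c in enumerate(VOCAB)}


def structural_vector(token):
    """Multi-hot vector marking which known components the token contains."""
    vec = [0.0] * len(VOCAB)
    for comp in COMPONENTS.get(token, []):
        vec[INDEX[comp]] = 1.0
    return vec


def hidden_vector(struct_vec, weights):
    """Project the structural vector through one linear hidden layer."""
    return [sum(x * w for x, w in zip(struct_vec, row)) for row in weights]


HIDDEN_DIM = 4
rng = random.Random(0)
# One weight row per hidden unit. These values are random placeholders;
# a real system would learn them by training (assumption).
WEIGHTS = [[rng.uniform(-1.0, 1.0) for _ in VOCAB] for _ in range(HIDDEN_DIM)]

for tok in ["语", "词", "料"]:
    s = structural_vector(tok)
    h = hidden_vector(s, WEIGHTS)
    print(tok, s, [round(v, 3) for v in h])
```

In this sketch, tokens that share a component (here 语 and 词 share 讠) share a non-zero position in their structural vectors, which is one plausible way structural information could carry over into the hidden-layer representation.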

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and in particular to a corpus vectorization processing method, device, computer equipment and storage medium.

Background

[0002] Today's image processing and natural language processing solutions all use architectures based on deep learning or neural networks. The granularity of image processing is the pixel, and each pixel is represented by specific numerical image information (RGB value, gray value, transparency, etc.), so images can easily be fed to deep learning and neural networks. Unlike image processing, the granularity of natural language processing is text. Although each piece of text has a corresponding code, the code does not show the characteristics of the text, so using the code directly is meaningless. Therefore, in the field of natural language processing, the representation of text, that is, text vectorization technology, has b...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F40/289, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/045
Inventor: 胡恒
Owner: ZTE CORP