Text representation method and device based on string vector, electronic equipment and storage medium

A text representation and vector technology, applied in text database query, unstructured text data retrieval, electronic digital data processing, etc. It addresses the problems that words cannot be semantically represented, that word vectors are poorly learned, and that Chinese texts cannot be given an effective semantic representation, and it achieves the effect of avoiding semantic loss.

Pending Publication Date: 2021-09-03
BEIJING MININGLAMP SOFTWARE SYST CO LTD
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Incorrect word segmentation cascades its errors into the word vector stage, so the word vectors are learned poorly. As a result, words cannot be effectively represented semantically, and ultimately the Chinese text cannot be effectively represented semantically either.
[0003] No effective solution to the above problems has yet been proposed.

Examples

Embodiment Construction

[0032] To enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the application is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by persons of ordinary skill in the art from the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.

[0033] It should be noted that the terms "first" and "second" in the specification and claims of the present application and the above drawings are used to distinguish similar objects, but not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such ...

Abstract

The invention relates to a text representation method and device based on string vectors, electronic equipment and a storage medium. The method comprises: obtaining a target text, which is the text to be represented as a vector; performing character string matching on the target text with a multi-mode string matching model to obtain a first character string set and a second character string set, where the first set contains the character strings in the target text that the model matched and the second set contains the character strings in the target text that the model did not match; splicing the character strings of the first set and the second set into a plurality of fragment sentences according to their positions in the target text; and determining a text vector of the target text from the sentence vectors of the plurality of fragment sentences. The invention solves the technical problem in related technologies that text representation methods must segment Chinese text into words and thereby lose semantics.
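The four steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the matching model is built, how fragment sentences are delimited, or how vectors are combined. The sketch assumes a greedy longest-match scan over a small vocabulary as a stand-in for the multi-mode string matching model, punctuation as the fragment boundary, and plain averaging for sentence and text vectors. All function names and the toy vector table are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (start offset in the text, string, True if matched by the vocabulary)
Segment = Tuple[int, str, bool]


def match_strings(text: str, vocab: Set[str]) -> List[Segment]:
    """Greedy longest-match scan (assumed stand-in for the matching model)."""
    max_len = max((len(w) for w in vocab), default=0)
    segments: List[Segment] = []
    i, gap_start = 0, 0
    while i < len(text):
        hit = None
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in vocab:
                hit = text[i:i + n]
                break
        if hit is None:
            i += 1
            continue
        if gap_start < i:  # flush the unmatched run that precedes this hit
            segments.append((gap_start, text[gap_start:i], False))
        segments.append((i, hit, True))
        i += len(hit)
        gap_start = i
    if gap_start < len(text):
        segments.append((gap_start, text[gap_start:], False))
    return segments


def splice_fragments(segments: List[Segment],
                     delimiters: str = "，。！？,.!?") -> List[List[str]]:
    """Re-join strings in text order, cutting a fragment at punctuation (assumed)."""
    fragments: List[List[str]] = []
    current: List[str] = []
    for _, s, matched in sorted(segments):
        if matched:
            current.append(s)
            continue
        buf = ""
        for ch in s:
            if ch in delimiters:
                if buf:
                    current.append(buf)
                    buf = ""
                if current:
                    fragments.append(current)
                    current = []
            else:
                buf += ch
        if buf:
            current.append(buf)
    if current:
        fragments.append(current)
    return fragments


def mean(vectors: List[List[float]]) -> List[float]:
    """Component-wise average of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def text_vector(fragments: List[List[str]],
                table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average string vectors per fragment, then average the fragment vectors."""
    zero = [0.0] * dim  # strings missing from the toy table map to zeros
    return mean([mean([table.get(s, zero) for s in frag]) for frag in fragments])


text = "我爱北京，天安门很美"
segments = match_strings(text, {"北京", "天安门"})
first_set = [s for _, s, m in segments if m]       # matched strings
second_set = [s for _, s, m in segments if not m]  # unmatched strings
fragments = splice_fragments(segments)
table = {"我爱": [1.0, 0.0], "北京": [0.0, 1.0],
         "天安门": [1.0, 1.0], "很美": [1.0, 0.0]}
vector = text_vector(fragments, table, dim=2)
```

In practice the vocabulary scan would likely be replaced by a dedicated multi-pattern matcher, and the toy table by learned string vectors; those choices are outside what the abstract states.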

Description

Technical field

[0001] The present application relates to the field of natural language processing, and in particular to a text representation method and device based on string vectors, electronic equipment, and a storage medium.

Background technique

[0002] Natural Language Processing (NLP) is an important direction in the field of computer science and artificial intelligence. Related technologies mainly study the vectorization of words and characters, and word vectors are especially favored. Character vector research takes the single character as its granularity, considers only the co-occurrence relationship between characters, and makes no use of the semantic information carried by words, so it has received little attention in research on text vector representation. The word vector has been a crucial technical means of semantic representation in natural language processing over the past ten years. Its semantic representation effect is far superio...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/33, G06F16/903, G06F40/242, G06F40/289, G06F40/30, G06K9/62
CPC: G06F16/3344, G06F16/90344, G06F40/242, G06F40/289, G06F40/30, G06F18/214
Inventor: 梁吉光, 徐凯波
Owner: BEIJING MININGLAMP SOFTWARE SYST CO LTD