Word vector acquisition method and system, electronic equipment and storage medium

A word vector acquisition method and system, together with electronic equipment and a storage medium. The technology addresses the large resource consumption of acquiring word vectors and the difficulty of vectorizing text directly. It avoids cascading errors, is simple and easy to understand, and improves processing capability.

Inactive Publication Date: 2021-08-13
BEIJING XUEZHITU NETWORK TECH

AI Technical Summary

Problems solved by technology

[0003] The embodiments of the present application provide a word vector acquisition method, system, electronic device and storage medium, so as to solve at least the following problems: acquiring word vectors requires a large amount of resources for training, and word vectors cannot be obtained directly from existing resources; and text cannot be vectorized directly based on word vectors, so word segmentation is required, which leads to word segmentation errors and cascading errors.

Examples

Embodiment 1

[0058] This embodiment provides a method for acquiring word vectors. Please refer to Figures 1 and 2. Figure 1 is a flowchart of a method for obtaining word vectors according to an embodiment of the present application; Figure 2 is a framework diagram of a method for obtaining word vectors according to an embodiment of the present application. As shown in the figures, the method for obtaining a word vector includes the following steps:

[0059] Dictionary building step S1: extract words from the pre-trained word vector model, segment the words into characters, process the characters, and use the processed characters to form a dictionary;

[0060] Chi-square acquisition step S2: count the co-occurrence frequency of the characters, and calculate the chi-square of the characters according to the co-occurrence frequency;

[0061] Word vector calculation step S3: perform a weighted calculation on the chi-square to obtain a word vector.
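Steps S1 to S3 can be sketched roughly as follows. The source text is truncated and does not give the exact chi-square definition or weighting scheme, so everything concrete here is an assumption: the Pearson chi-square on a 2x2 contingency table, the use of word frequencies as the counting population, the chi-square-weighted average of the vectors of words that contain a character, and the names `chi_square_2x2`, `build_char_dictionary`, `char_vectors`, `word_vectors` and `word_counts` are all hypothetical.

```python
from collections import defaultdict


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no measurable association
    return n * (a * d - b * c) ** 2 / denom


def build_char_dictionary(word_vectors):
    """Step S1 (sketch): split every word into characters and collect them."""
    chars = set()
    for word in word_vectors:
        chars.update(ch for ch in word if not ch.isspace())
    return sorted(chars)


def char_vectors(word_vectors, word_counts):
    """Steps S2-S3 (sketch): chi-square-weighted average of word vectors."""
    total = sum(word_counts[w] for w in word_vectors)
    dim = len(next(iter(word_vectors.values())))

    # Number of word tokens whose word contains a given character.
    char_total = defaultdict(int)
    for w in word_vectors:
        for ch in set(w):
            char_total[ch] += word_counts[w]

    result = {}
    for ch in build_char_dictionary(word_vectors):
        words = [w for w in word_vectors if ch in w]
        weights = [
            chi_square_2x2(
                word_counts[w],                   # token is w (and contains ch)
                char_total[ch] - word_counts[w],  # contains ch, is another word
                0,                                # is w but lacks ch: impossible
                total - char_total[ch],           # neither
            )
            for w in words
        ]
        if sum(weights) == 0:
            weights = [1.0] * len(words)  # assumed fallback: plain average
        norm = sum(weights)
        result[ch] = [
            sum(wt * word_vectors[w][i] for wt, w in zip(weights, words)) / norm
            for i in range(dim)
        ]
    return result


# Toy data, invented purely for illustration.
word_vectors = {"北京": [1.0, 0.0], "京剧": [0.0, 1.0], "北方": [1.0, 1.0]}
word_counts = {"北京": 2, "京剧": 1, "北方": 1}
vectors = char_vectors(word_vectors, word_counts)
```

In this toy setup a character that appears in a single word simply inherits that word's vector, while a character shared by several words is pulled toward the words with which it is most strongly associated.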

[0062] In an embodiment, the dictionary building step...

Embodiment 2

[0102] Please refer to Figures 3 and 4. Figure 3 is a framework diagram of a device for obtaining word vectors according to an embodiment of the present application; Figure 4 is a schematic structural diagram of the word vector acquisition system of the present invention. As shown in Figures 3 and 4, the word vector acquisition system of the invention is applicable to the above-mentioned word vector acquisition method, and the word vector acquisition system includes:

[0103] Dictionary building unit 51: extract words from the pre-trained word vector model, segment the words into characters, process the characters, and use the processed characters to form a dictionary;

[0104] Chi-square acquisition unit 52: count the co-occurrence frequency of the characters, and calculate the chi-square of the characters according to the co-occurrence frequency;

[0105] Word vector calculation unit 53: perform a weighted calculation on the chi-square to obtain a word vector.

[0106] In an embo...

Embodiment 3

[0115] As shown in Figure 5, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.

[0116] Specifically, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

[0117] The memory 82 may include mass storage for data or instructions. By way of example and not limitation, the memory 82 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 82 may comprise removable or n...

Abstract

The invention discloses a word vector acquisition method and system, electronic equipment and a storage medium. The word vector acquisition method comprises the following steps: a dictionary building step: extracting words from a pre-trained word vector model, segmenting the words into characters, processing the characters, and forming a dictionary from the processed characters; a chi-square obtaining step: counting the co-occurrence frequency of the characters, and calculating the chi-square of the characters according to the co-occurrence frequency; and a word vector calculation step: carrying out a weighted calculation on the chi-square to obtain a word vector. When the method is applied to text analysis, the text can be vectorized directly based on the word vector without word segmentation, which avoids cascading errors caused by word segmentation errors.
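As a rough illustration of vectorizing text without word segmentation, the sketch below averages per-character vectors over the raw string. The abstract does not say how the vectors are combined, so mean pooling is an assumption, and `vectorize_text` and `char_vecs` are hypothetical names with invented toy values.

```python
def vectorize_text(text, char_vecs):
    """Average the vectors of known characters; no word segmentation step."""
    known = [char_vecs[ch] for ch in text if ch in char_vecs]
    if not known:
        return None  # nothing in the text has a vector
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy per-character vectors, invented for illustration only.
char_vecs = {"北": [1.0, 0.0], "京": [0.0, 1.0]}
text_vec = vectorize_text("北京!", char_vecs)  # "!" has no vector and is skipped
```

Because the text is processed character by character, a segmentation mistake cannot propagate into the vector, which is the cascading error the abstract refers to.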

Description

Technical field

[0001] The present application relates to the technical field of deep learning, and in particular to a method, system, electronic device and storage medium for acquiring word vectors.

Background technique

[0002] In recent years, with the development of deep learning technology, deep learning has become a necessary research method for most tasks in natural language processing, and plays a vital role in text representation, text classification, sentiment classification, automatic summarization and so on. Text representation is especially important: almost all natural language processing tasks require text representation, that is, text vectorization. However, Chinese and English differ greatly in how they are written. In English text, spaces separate words and so divide them naturally. In Chinese text there is no explicit boundary between words, and all words are written continuously, that is, Chinese ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/242, G06F40/216, G06F40/284, G06F40/289
CPC: G06F40/216, G06F40/242, G06F40/284, G06F40/289
Inventor: 梁吉光, 徐凯波
Owner: BEIJING XUEZHITU NETWORK TECH