Word vector generation method and device supporting polarity distinction and polysemy

A word vector and polarity technology, applied to a word vector generation method and device that supports polarity distinction and polysemy. It addresses the tendency of word vectors to produce matching errors and improves matching results.

Active Publication Date: 2019-04-12
安徽省泰岳祥升软件有限公司

AI Technical Summary

Problems solved by technology

[0006] This application provides a method and device for generating word vectors that support polarity distinction and polysemy, to solve the problem that word vectors constructed by traditional methods are prone to matching errors in cases of polysemy and antonyms.

Examples

Embodiment 1

[0076] Referring to Figure 1, which is a flow diagram of a word vector generation method that supports polarity distinction and polysemy: in this embodiment, the method for generating a word vector according to a resource file includes the following steps:

[0077] S101: Obtain a word vector model and a resource file in the current business scenario, where the resource file includes multiple sememes corresponding to the semantics in the current business scenario.

[0078] In this embodiment, after determining which business field the text information to be processed belongs to, it is first necessary to obtain the word vector model in the current business scenario, generally by calling the established word vector model in the server or database. The word vector model here refers to a set composed of a large number of word vectors, that is, through the training corpus and the association relationship between the words appearing in the business documents in the current business scenario, t...

Embodiment 2

[0117] This embodiment differs from Embodiment 1 in that, as shown in Figure 3, the step of determining the operation weight according to the semantic information and the set target word calculation value includes:

[0118] S201: From the semantic information, count all sememes belonging to the semantics that contains the largest number of sememes, together with the number of occurrences of each sememe;

[0119] S202: Determine the total value for the weight calculation as the sum of the total number of occurrences of all sememes in that semantics and the target word calculation value;

[0120] S203: Calculate the ratio of the number of occurrences of each sememe to the total value, and determine the operation weight of each sememe and the operation weight of the target word.

[0121] In this embodiment, determining the calculation weight needs to select the semantic information that contains the largest number of...
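
Steps S201 to S203 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function name `operation_weights`, the nested-dictionary layout of the semantic information, and the example counts are all hypothetical, and "largest number of sememes" is read here as the largest number of distinct sememes.

```python
def operation_weights(semantic_info, target_value):
    """Sketch of S201-S203 (hypothetical helper, not from the patent text).

    semantic_info: {semantics_name: {sememe: occurrence_count}} (assumed layout)
    target_value:  the preset target word calculation value
    """
    # S201: pick the semantics that contains the most (distinct) sememes
    # and keep its sememes with their occurrence counts.
    best = max(semantic_info.values(), key=len)

    # S202: total value = occurrences of all sememes in that semantics
    # plus the target word calculation value.
    total = sum(best.values()) + target_value

    # S203: each weight is the ratio of a count (or the target value) to the total.
    sememe_weights = {sememe: count / total for sememe, count in best.items()}
    target_weight = target_value / total
    return sememe_weights, target_weight


# Made-up example data for an ambiguous word.
info = {
    "fruit": {"fruit": 3, "food": 1},
    "company": {"brand": 2},
}
sememe_w, target_w = operation_weights(info, target_value=4)
print(sememe_w, target_w)
```

With this reading the weights of the sememes and of the target word sum to 1, so the later weighted summation stays on the same scale as the original vectors.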

Embodiment 3

[0138] Referring to Figure 5, in this embodiment the method for generating word vectors includes the following steps:

[0139] S301: Obtain a word vector model and a resource file in the current business scenario, and acquire a sentence text containing the target word, and the resource file includes multiple sememes corresponding to the semantics in the current business scenario;

[0140] S302: Determine the original word vector corresponding to the target word according to the word vector model; extract the semantic information corresponding to the target word from the resource file, where the semantic information includes the sememes under each semantics and the number of occurrences of each sememe;

[0141] S303: Determine a set of adjacent words of the target word in the sentence text, where the set of adjacent words is a set of multiple words adjacent to the target word in the sentence text;

[0142] S304: According to the adjacent word set and the semantic information, det...
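
As a rough illustration of step S303 only, the adjacent word set can be collected with a sliding window over the tokenized sentence. The function name `adjacent_words`, the `window` parameter, and the pre-tokenized input are assumptions made for this sketch; the text above does not state how many neighbouring words are taken.

```python
def adjacent_words(tokens, target, window=2):
    """Collect words within `window` positions of each occurrence of `target`.

    tokens: the sentence text, already split into words (assumed input form)
    window: how many positions on each side count as adjacent (assumed value)
    """
    neighbors = []
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            word = tokens[j]
            # Skip the target itself and avoid duplicates, keeping sentence order.
            if word != target and word not in neighbors:
                neighbors.append(word)
    return neighbors


sentence = ["I", "ate", "a", "fresh", "apple", "this", "morning"]
print(adjacent_words(sentence, "apple"))  # ['a', 'fresh', 'this', 'morning']
```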

Abstract

The invention provides a word vector generation method and device supporting polarity distinction and polysemy. Based on an established word vector model and a resource file in the current business scenario, the method performs a weighted operation on the value of each dimension of the target word's vector to generate a new word vector. The operation weights are determined from sememes selected in the resource file, either all sememes of the semantics containing the largest number of sememes or the sememes under the most relevant semantics. The word vector of the target word and the word vectors of the sememes are then summed according to these weights to obtain a new word vector that reflects the real semantics. The new word vector can be generated dynamically and reflects the actual semantic characteristics more accurately. Because the operation weights are determined from the semantic information, the influence of polarity and polysemy on matching results is markedly reduced, which addresses the tendency of word vectors constructed by traditional methods to be mismatched in cases of polysemy and antonyms.
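
The weighted summation described in the abstract can be sketched as follows. This is a minimal illustration, assuming that each sememe has its own vector of the same dimension as the target word's vector and that the operation weights are already known; the function name `weighted_word_vector` and all numeric values are invented for the example.

```python
def weighted_word_vector(target_vec, target_weight, sememe_vecs, sememe_weights):
    """Weighted sum of the target word vector and its sememe vectors.

    target_vec:     original vector of the target word (list of floats)
    target_weight:  operation weight of the target word
    sememe_vecs:    {sememe: vector}, same dimension as target_vec (assumed)
    sememe_weights: {sememe: operation weight}
    """
    # Start from the weighted target vector, dimension by dimension.
    new_vec = [target_weight * value for value in target_vec]
    # Add each sememe vector scaled by its operation weight.
    for sememe, weight in sememe_weights.items():
        vec = sememe_vecs[sememe]
        for d in range(len(new_vec)):
            new_vec[d] += weight * vec[d]
    return new_vec


# Made-up two-dimensional vectors and weights.
new_vec = weighted_word_vector(
    target_vec=[1.0, 0.0],
    target_weight=0.5,
    sememe_vecs={"fruit": [0.0, 1.0], "food": [1.0, 1.0]},
    sememe_weights={"fruit": 0.375, "food": 0.125},
)
print(new_vec)  # [0.625, 0.5]
```

Because the sememe vectors depend on which semantics was selected, the same target word can yield different new vectors in different contexts, which is the dynamic behaviour the abstract describes.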

Description

[0001] This application claims priority to the Chinese patent application filed with the China Patent Office on June 1, 2018, with application number 201810557309.9 and titled "Method and device for generating word vectors supporting polarity distinction and polysemy", the entire content of which is incorporated in this application by reference.

Technical field

[0002] The present application relates to the technical field of machine learning, in particular to a method and device for generating word vectors that support polarity distinction and polysemy.

Background

[0003] Word embedding is a way of representing words that allows computers to process human language in numerical form. A word vector represents a word as a vector of a certain dimension and reveals the relationship between that word and other words, for example [0.792, -0.177, -0.107, 0.109, -0.542, …]. The word vector is generally composed of words In the v...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/279, G06F40/30
Inventors: 杨凯程, 李健铨, 蒋宏飞
Owner: 安徽省泰岳祥升软件有限公司