Similarity comparison method and device based on word2vec technology

A similarity comparison technology based on word2vec, applied in the field of computer software, that addresses the problem of the same device being recorded under different names, reducing the amount of computation and improving work efficiency.

Active Publication Date: 2020-05-19
WUHAN OPTICS VALLEY INFORMATION TECH
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention proposes a similarity comparison method and device based on word2vec technology, aiming to solve the technical problem that the prior art cannot use word2vec technology to determine that devices with different names in the power grid industry are the same device.

Examples

Embodiment 1

[0053]

[0054]

[0055] Table 1 Encoding of words in the custom lexicon

[0056] Chinese short sentences to be compared: "10kV Huake University 1# Transformer Line" and "Hua Ke No. 1 10kV Transformer".

[0057] After transcoding, "10kV Huake University 1# Transformer Line" becomes "_1001_Hua Ke Da 1#_1002_1003_" and "Hua Ke No. 1 10kV Transformer" becomes "_Hua Ke No. 1_1001_1002_". The codes are then sorted:

[0058] "10kV Huake University 1# Transformer Line" becomes "_1001_1002_1003_Hua Ke Da 1#_".

[0059] "Hua Ke No. 1 10kV Transformer" becomes "_1001_1002_Hua Ke No. 1_".
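The transcoding and sorting steps in [0057] to [0059] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the lexicon below is an assumption inferred from the codes that appear in [0057] (Table 1 is not reproduced here), the function names `transcode`, `sort_codes`, and `render` are hypothetical, and the English renderings of the names stand in for the original Chinese strings.

```python
import re

# Assumed lexicon, inferred from the codes shown in [0057]; not the patent's Table 1.
LEXICON = {"10kV": "1001", "Transformer": "1002", "Line": "1003"}

def transcode(sentence, lexicon):
    """Replace lexicon words with their codes and keep all other text as residual parts."""
    # Longer words first so a long lexicon entry wins over a shorter one it contains.
    pattern = "|".join(re.escape(w) for w in sorted(lexicon, key=len, reverse=True))
    parts = []
    for piece in re.split(f"({pattern})", sentence):
        piece = piece.strip()
        if piece:
            parts.append(lexicon.get(piece, piece))
    return parts

def sort_codes(parts, lexicon):
    """Move codes to the front in ascending order; residual text keeps its original order."""
    known = set(lexicon.values())
    codes = sorted(p for p in parts if p in known)
    residual = [p for p in parts if p not in known]
    return codes + residual

def render(parts):
    """Join parts with underscores, matching the notation used in the text."""
    return "_" + "_".join(parts) + "_"

if __name__ == "__main__":
    for s in ("10kV Huake University 1# Transformer Line", "Hua Ke No. 1 10kV Transformer"):
        raw = transcode(s, LEXICON)
        print(render(raw), "->", render(sort_codes(raw, LEXICON)))
```

For the second phrase this yields "_Hua Ke No. 1_1001_1002_" before sorting and "_1001_1002_Hua Ke No. 1_" after, as in [0059]. The residual text is carried through verbatim, so the first phrase keeps "Huake University 1#" rather than the "Hua Ke Da 1#" rendering used in [0057].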

[0060] Because phrase 1 contains the sentence structure of phrase 2, the comparison can continue. Since the matched parts have already been transcoded, a similarity score is given for each matched segment, and Chinese similarity comparison is performed only on the original Chinese parts that were not transcoded, that is, "_Hua Ke Da 1#_" is compared with "_Hua Ke No. 1_", and r...
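One way to picture the check in [0060] is the sketch below. It is an illustration under stated assumptions: "contains the sentence structure" is read here as one phrase's set of codes containing the other's, every all-digit field is treated as a code, and `difflib.SequenceMatcher` is used only as a stand-in for the word2vec-based comparison of the untranscoded parts. The per-segment scores mentioned in the text are not specified in this excerpt, so the sketch returns the matched codes and the residual score separately rather than combining them. The names `split_parts` and `compare` are hypothetical.

```python
from difflib import SequenceMatcher

def split_parts(encoded):
    """Split an encoded phrase like "_1001_1002_Hua Ke No. 1_" into a code set and residual text."""
    fields = [f for f in encoded.strip("_").split("_") if f]
    # Assumption: any all-digit field is a lexicon code.
    codes = {f for f in fields if f.isdigit()}
    residual = " ".join(f for f in fields if not f.isdigit())
    return codes, residual

def compare(encoded_a, encoded_b):
    """Return (matched codes, residual similarity), or None if neither code set contains the other."""
    codes_a, rest_a = split_parts(encoded_a)
    codes_b, rest_b = split_parts(encoded_b)
    if not (codes_a >= codes_b or codes_b >= codes_a):
        return None  # structures differ, so the comparison stops here
    matched = sorted(codes_a & codes_b)
    # Stand-in string similarity in [0, 1]; the patent uses the word vector model instead.
    residual_score = SequenceMatcher(None, rest_a, rest_b).ratio()
    return matched, residual_score

if __name__ == "__main__":
    print(compare("_1001_1002_1003_Hua Ke Da 1#_", "_1001_1002_Hua Ke No. 1_"))
```

Keeping the code match and the residual score apart leaves the choice of weighting open, since the excerpt does not state how the two are combined.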

Abstract

The invention provides a similarity comparison method and device based on word2vec technology. The method comprises: obtaining network articles and a local lexicon, establishing a word2vec algorithm, and training on the network articles with the word2vec algorithm to generate a word vector model file; obtaining a plurality of statements to be compared, looking up their words in the lexicon, converting the words found in the lexicon into digital codes (the to-be-compared digital codes), and recording the words not found in the lexicon as to-be-compared words; and feeding the to-be-compared words into the word vector model file to obtain the similarity of the encoded statements. Through the word2vec algorithm and digital coding, devices that are the same but carry different names can be compared effectively, and working efficiency is improved.
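The last step of the abstract, looking up the to-be-compared words in the word vector model, comes down to comparing two vectors. The sketch below shows the usual cosine measure for that. It is only an illustration: the three-dimensional vectors and the words "substation", "station", and "feeder" are made up and stand in for a trained word2vec model file, and `word_similarity` is a hypothetical helper name.

```python
import math

# Made-up toy vectors standing in for a trained word2vec model file.
TOY_VECTORS = {
    "substation": [0.9, 0.1, 0.3],
    "station": [0.8, 0.2, 0.4],
    "feeder": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_similarity(w1, w2, vectors):
    """Return the cosine similarity of two words, or None if either is missing from the model."""
    if w1 not in vectors or w2 not in vectors:
        return None
    return cosine(vectors[w1], vectors[w2])

if __name__ == "__main__":
    print(word_similarity("substation", "station", TOY_VECTORS))
    print(word_similarity("substation", "feeder", TOY_VECTORS))
```

With these toy numbers "substation" scores closer to "station" than to "feeder". A word missing from the model returns None, which leaves the handling of out-of-vocabulary names to the caller.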

Description

Technical field

[0001] The invention relates to the technical field of computer software, in particular to a similarity comparison method and device based on word2vec technology.

Background

[0002] In the initial stage of information system construction, the functional departments of the power grid worked mainly to meet the production needs of each department, which left the systems without data exchange. Since 2017, however, the power grid has been taking inventory of its existing equipment. Because equipment names in each system were entered manually, the same equipment is named differently: by full name, abbreviation, place name, equipment use, and so on, which has made this work very difficult. The initial approach was to export the data of each system and compare it in EXCEL or purely by hand. The workload is huge, the comparison cycle is long, and the efficiency is low, which is unsustainable. And currently there...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284, G06F40/211, G06K9/62
CPC: G06F18/22, Y02D10/00
Inventor: 陈钢, 高波
Owner: WUHAN OPTICS VALLEY INFORMATION TECH