Method and system for determining word similarity

A word similarity calculation technology, applied in natural language data processing, instruments, electrical digital data processing, etc., which can solve the problem of decreased accuracy in similarity calculation.

Pending Publication Date: 2022-03-15
上海影谱科技有限公司
Cites: 0; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] For this reason, the embodiment of the present invention provides a method and system for determining word similarity. It addresses a weakness of existing low-resource word similarity calculation methods: they attend only to the order in which words are arranged, so when two short sentences contain the same phrases in a different order, the accuracy of the similarity calculation decreases.
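To illustrate the weakness described above, the hypothetical sketch below contrasts an order-sensitive measure (share of aligned positions holding the same phrase) with an order-insensitive one (Jaccard index of the phrase sets). Neither function is the patent's method; they only show why comparing positions penalizes a mere reordering of phrases. The English phrase lists are illustrative stand-ins for segmented Chinese text.

```python
def positional_similarity(a: list[str], b: list[str]) -> float:
    """Share of aligned positions holding the same token (order-sensitive)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


def set_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard index of the token sets (order-insensitive)."""
    union = set(a) | set(b)
    if not union:
        return 1.0
    return len(set(a) & set(b)) / len(union)


# Two hypothetical phrase lists that differ only in phrase order.
p1 = ["split", "wall-mounted", "air conditioner"]
p2 = ["wall-mounted", "split", "air conditioner"]
print(positional_similarity(p1, p2))  # 0.333...
print(set_similarity(p1, p2))         # 1.0
```

Only the last phrase lines up positionally, so the order-sensitive score drops to one third even though both lists carry the same information.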



Examples


Embodiment 1

[0053] As shown in Figure 1, this embodiment proposes a method for determining word similarity.

[0054] In this embodiment, the method is described using as an example two short sentences whose word similarity is to be calculated: "split wall-mounted air conditioner" and "household air conditioner".

[0055] The method includes:

[0056] S100. Preprocess the short sentences to be processed.

[0057] Specifically, this step includes removing English letters and numbers from the short sentences and removing Chinese stop words from them.

[0058] English letters, numbers, and Chinese stop words such as "Yes" and "I" are removed to reduce the amount of computation in subsequent operations. The two example short sentences in this embodiment contain nothing that needs to be removed.
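A minimal sketch of the S100 preprocessing, under stated assumptions. The stop-word set is hypothetical: the translated examples "Yes" and "I" are assumed to correspond to 是 and 我, and 的 is added as a common stop word. Stop words are removed here by plain substring replacement on unsegmented text. This is a simplification that can also delete those characters from inside longer words, so a real implementation would more likely filter tokens after segmentation.

```python
import re

# Hypothetical stop-word set (assumption, not taken from the patent).
STOP_WORDS = {"的", "是", "我"}


def preprocess(sentence: str, stop_words: set[str] = STOP_WORDS) -> str:
    """Remove ASCII letters, digits and stop words from a short sentence."""
    # Drop runs of English letters and digits.
    cleaned = re.sub(r"[A-Za-z\d]+", "", sentence)
    # Remove longer stop words first so they are not split by shorter ones.
    for word in sorted(stop_words, key=len, reverse=True):
        cleaned = cleaned.replace(word, "")
    return cleaned.strip()


print(preprocess("我的家用空调"))         # 家用空调
print(preprocess("分体壁挂式空调2023"))   # 分体壁挂式空调
```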

[0059] S200. Calculate the first similarity of the two short sentences at word granularity.

[0060] Specifically, this step includes:

[0061] Obtain the number of the same Chines...
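The excerpt breaks off after saying that S200 counts the Chinese characters the two sentences share, so the exact formula is not available. The sketch below assumes that "word granularity" means single Chinese characters and normalizes the shared-character count with the Dice coefficient, 2·common / (len1 + len2). Both the normalization and the Chinese forms of the example sentences ("分体壁挂式空调" and "家用空调") are assumptions.

```python
from collections import Counter


def char_similarity(s1: str, s2: str) -> float:
    """First-similarity sketch: shared characters, Dice-normalized (assumed formula)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0
    # Multiset intersection counts each shared character at most as often
    # as it occurs in the sentence where it is rarer.
    common = sum((Counter(s1) & Counter(s2)).values())
    return 2 * common / total


print(char_similarity("分体壁挂式空调", "家用空调"))  # 4/11, about 0.364
```

The two sentences share 空 and 调, so the sketch scores them 2·2 / (7 + 4).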

Embodiment 2

[0095] Corresponding to Embodiment 1 above, this embodiment proposes a system for determining word similarity, which includes:

[0096] The preprocessing module is used to preprocess the short sentences to be processed;

[0097] The first similarity calculation module is used to calculate the first similarity of the two short sentences at word granularity;

[0098] The second similarity calculation module is used to calculate the second similarity of the two short sentences at phrase granularity;

[0099] The length comparison value calculation module is used to compare the lengths of the two short sentences and calculate their length comparison value;

[0100] The final similarity acquisition module is used to perform a weighted calculation on the first similarity, the second similarity and the length comparison value of the two short sentences, and to obtain the final similarity of the two short sentences after data normalization.
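The module list above can be sketched as a small pipeline. Everything numeric here is hypothetical, because the excerpt gives no formulas:
- the weights `WEIGHTS`;
- Dice overlap for both the character-level and the phrase-level similarity;
- the length comparison value as shorter length divided by longer length;
- "data normalization" as division by the weight sum, which keeps the result in [0, 1].

The phrase lists are assumed to come from a Chinese word segmenter and are passed in directly.

```python
from collections import Counter
from collections.abc import Sequence

# Hypothetical weights for (first similarity, second similarity, length value).
WEIGHTS = (0.4, 0.4, 0.2)


def dice(a: Sequence[str], b: Sequence[str]) -> float:
    """Dice overlap of two multisets (characters of a str, or a list of phrases)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2 * sum((Counter(a) & Counter(b)).values()) / total


def length_value(s1: str, s2: str) -> float:
    """Length comparison value, assumed to be shorter length / longer length."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else min(len(s1), len(s2)) / longest


def final_similarity(s1: str, s2: str,
                     phrases1: list[str], phrases2: list[str],
                     weights: tuple[float, float, float] = WEIGHTS) -> float:
    first = dice(s1, s2)              # word (character) granularity
    second = dice(phrases1, phrases2)  # phrase granularity, ignores phrase order
    length = length_value(s1, s2)
    w1, w2, w3 = weights
    # Weighted sum normalized by the total weight.
    return (w1 * first + w2 * second + w3 * length) / (w1 + w2 + w3)


score = final_similarity("分体壁挂式空调", "家用空调",
                         ["分体", "壁挂式", "空调"], ["家用", "空调"])
print(round(score, 4))  # 0.4197
```

Because the phrase-level term counts shared phrases regardless of position, reordering the phrases of a sentence (for example "壁挂式分体空调") leaves every term, and therefore the final score, unchanged.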

[0101] The functions per...

Embodiment 3

[0103] Corresponding to the above embodiments, this embodiment proposes a computer storage medium containing one or more program instructions, which are executed by a system for determining word similarity to perform the method of Embodiment 1.



Abstract

The embodiment of the invention discloses a method and system for determining word similarity. It handles the case in which two short sentences contain similar phrases in a different order, and for specific business scenarios it achieves relatively high precision while using relatively few computing resources. The method exploits the observation that, in short descriptive sentences, moving phrases affects the information conveyed by the whole sentence far less than moving individual words. Accordingly, it compares not only the word-granularity similarity of the two short sentences but also their phrase-granularity similarity and their difference in length.

Description

Technical field

[0001] Embodiments of the present invention relate to the technical fields of natural language processing and word similarity calculation, and in particular to a method and system for determining word similarity.

Background

[0002] In the field of artificial intelligence, natural language processing is a fundamental research area. It covers sentiment analysis, named entity recognition, part-of-speech tagging, text clustering, Chinese word segmentation and many other popular topics. Among them, Chinese word segmentation divides sentences into phrases that a computer can recognize.

[0003] With the rapid development of computer technology and the Internet, the amount of data computers need to process has increased dramatically. Traditional word similarity calculations either rely on dictionaries, word vectors and other means that consume substantial computing resources to obtain high-precision similarity results, or use the intersecti...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F40/194; G06F40/289; G06K9/62
CPC: G06F40/194; G06F40/289; G06F18/22
Inventor: 孙其凡
Owner: 上海影谱科技有限公司