
Entity similarity matching method and system

An entity matching method and technology, applicable to fields such as instruments, electrical digital data processing, and computer components, that addresses the low efficiency of existing matching methods and improves matching efficiency.

Inactive Publication Date: 2021-01-29
SICHUAN CHANGHONG ELECTRIC CO LTD
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the low efficiency of existing short-text similarity matching methods, and proposes an entity similarity matching method and system.

Examples

Embodiment 1

[0034] The entity similarity matching method described in this embodiment of the present invention, as shown in Figure 1, includes the following steps:

[0035] Step S1: initialize the entity index table HASH_ENT, the word index table HASH_CHAR, and the stop word and high-frequency word table STTF_WORD. The entity index table HASH_ENT stores all entities; the word index table HASH_CHAR stores the mapping between entities and all of their words except the stop words and high-frequency words listed in STTF_WORD; and the stop word and high-frequency word table STTF_WORD stores the stop words and high-frequency words that occur in entities;

[0036] In this embodiment, the entity index table HASH_ENT and the word index table HASH_CHAR use hash indexes. The hashkey of the entity index table HASH_ENT is an auto-increment number sequence and its hashvalue is an entity; the hashkey of the word index table HASH_CHAR is a wo...
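Step S1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_indexes`, the sample titles, the whitespace tokenization, and the choice to store a set of entity keys as the HASH_CHAR value are all hypothetical, since the source text describing the actual hashvalue format is truncated.

```python
from collections import defaultdict

def build_indexes(entities, sttf_word):
    """Build HASH_ENT and HASH_CHAR from a list of entity strings."""
    # HASH_ENT: auto-increment integer key -> entity string.
    hash_ent = dict(enumerate(entities))
    # HASH_CHAR: word -> set of keys of the entities containing that word.
    # Assumption: a plain set of keys stands in for the source's hashvalue format.
    hash_char = defaultdict(set)
    for key, entity in hash_ent.items():
        for word in entity.lower().split():  # assumption: whitespace tokenization
            if word not in sttf_word:        # stop/high-frequency words are not indexed
                hash_char[word].add(key)
    return hash_ent, dict(hash_char)

# Hypothetical sample data, for illustration only.
STTF_WORD = {"the", "of"}
HASH_ENT, HASH_CHAR = build_indexes(
    ["Infernal Affairs", "Infernal Affairs II", "The Tomb Notes"], STTF_WORD
)
```

Because HASH_CHAR maps each word directly to the entities that contain it, a query only touches entities that share at least one indexed word with it, rather than every entity in HASH_ENT.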

Embodiment 2

[0061] Assume that the input to the module is a near match of the title "Infernal Affairs" (the string rendered literally in Step B as "no gap to"), and that the expected behavior is to return the most similar correct entity, "Infernal Affairs", within 100 ms. The specific implementation process of this embodiment of the present invention is as follows.

[0062] Step A: initialize the entity index table HASH_ENT, the word index table HASH_CHAR, and the stop word and high-frequency word table STTF_WORD, assuming that there are three title entities in the entity index table HASH_ENT: Infernal Affairs, Infernal Affairs, and Tomb Notes. The entity index table HASH_ENT is shown in Figure 3, the word index table HASH_CHAR is shown in Figure 4, and the stop word and high-frequency word table STTF_WORD is shown in Figure 5;

[0063] Step B: receive the input "no gap to", split it into the string sequence ["none", "between", "to"], and filter out stop words and high-frequency words;

[0064] Step C, search the word index table HASH_CHAR, and get the result: ["none":"0_3,1_...
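Steps B and C above can be sketched as follows. This is a hedged illustration only: the helper names `split_and_filter` and `lookup`, the English sample data, and the whitespace tokenization are assumptions, and the lookup returns plain lists of entity keys because the result format in the source is truncated.

```python
def split_and_filter(text, sttf_word):
    """Step B: split the input into tokens and drop stop/high-frequency words."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    return [tok for tok in tokens if tok not in sttf_word]

def lookup(tokens, hash_char):
    """Step C: look up each remaining token in the word index table."""
    # Tokens missing from the index map to an empty list.
    return {tok: sorted(hash_char.get(tok, ())) for tok in tokens}

# Hypothetical tables, for illustration only.
STTF_WORD = {"the"}
HASH_CHAR = {"infernal": {0, 1}, "affairs": {0, 1}, "tomb": {2}, "notes": {2}}

tokens = split_and_filter("the infernal affair", STTF_WORD)
results = lookup(tokens, HASH_CHAR)
print(tokens)   # ['infernal', 'affair']
print(results)  # {'infernal': [0, 1], 'affair': []}
```

The misspelled token "affair" finds nothing, but the correctly typed token still narrows the candidates to entities 0 and 1, which is what allows a near-match input to reach the right entity.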

Abstract

The invention relates to the technical field of natural language processing, aims to solve the low efficiency of existing short-text similarity matching methods, and provides an entity similarity matching method and system. The scheme is summarized as follows: initialize an entity index table, a word index table, and a stop word and high-frequency word table; receive an input character string, filter the stop words and high-frequency words out of the character string, and then segment the character string into a character string sequence; retrieve in the word index table according to the character string sequence to obtain retrieval results, aggregate and sort the retrieval results, and select the first N retrieval results; calculate the retrieval similarity of each of the N retrieval results, and select from them the retrieval results whose retrieval similarity is greater than a preset value; and retrieve in the entity index table according to the retrieval results whose retrieval similarity is greater than the preset value to obtain the corresponding entity character strings, and determine the target entity with the highest similarity from those entity character strings. The invention improves the efficiency of entity similarity matching.
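The pipeline summarized in the abstract can be sketched end to end as below. Several details are assumptions made for illustration, because the abstract does not specify them: the function names `build` and `match`, whitespace tokenization, defining "retrieval similarity" as the fraction of query tokens that hit an entity, the default values of `top_n` and `threshold`, and the use of `difflib.SequenceMatcher` for the final similarity ranking.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

def build(entities, stop):
    """Initialize the entity index table and the word index table."""
    hash_ent = dict(enumerate(entities))
    hash_char = defaultdict(set)
    for key, ent in hash_ent.items():
        for tok in ent.lower().split():
            if tok not in stop:
                hash_char[tok].add(key)
    return hash_ent, hash_char

def match(query, hash_ent, hash_char, stop, top_n=10, threshold=0.3):
    """Return the most similar entity string, or None if nothing qualifies."""
    # Segment the input and filter stop/high-frequency words.
    tokens = [t for t in query.lower().split() if t not in stop]
    if not tokens:
        return None
    # Retrieve in the word index and aggregate hits per entity key.
    hits = Counter()
    for tok in tokens:
        for key in hash_char.get(tok, ()):
            hits[key] += 1
    # Sort by hit count and keep the first N results.
    candidates = hits.most_common(top_n)
    # Assumed retrieval similarity: share of query tokens that hit the entity.
    kept = [key for key, n in candidates if n / len(tokens) > threshold]
    if not kept:
        return None
    # Look up the entity strings and pick the one most similar to the query.
    return max(
        (hash_ent[k] for k in kept),
        key=lambda ent: SequenceMatcher(None, query.lower(), ent.lower()).ratio(),
    )

# Hypothetical data, for illustration only.
STOP = {"the"}
ENT, CHAR = build(["Infernal Affairs", "Infernal Affairs II", "The Tomb Notes"], STOP)
best = match("infernal affair", ENT, CHAR, STOP)
print(best)  # Infernal Affairs
```

Only the candidates that survive the top-N cut and the threshold are compared against the query in full, so the expensive string comparison runs on a handful of entities instead of the whole table.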

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to an entity similarity matching method and system.

Background

[0002] Relatively mature similarity and retrieval algorithms exist for long texts, such as Google's simhash algorithm, which can efficiently find texts similar to a target text in a massive collection of long texts, but this algorithm is not suitable for short texts. Traditional algorithms such as edit distance and cosine similarity can be applied to short-text similarity matching, but their matching efficiency gradually decreases as the amount of data grows. Testing shows that the edit distance algorithm finds similar text efficiently among up to 1000 short texts but becomes inefficient beyond 1000, because it must compare the input short text with every short text in the database, and the calculation...
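The brute-force baseline described in the background can be illustrated with a short sketch. It uses the standard Levenshtein dynamic-programming recurrence; the function names `edit_distance` and `brute_force_match` and the sample strings are hypothetical, and the code is not taken from the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def brute_force_match(query, texts):
    """Compare the query with every stored text and return the closest one."""
    return min(texts, key=lambda t: edit_distance(query, t))

print(edit_distance("kitten", "sitting"))  # 3
print(brute_force_match("infernal affair", ["tomb notes", "infernal affairs"]))
```

Every query runs one distance computation per stored text, so the cost grows linearly with the size of the database, which is the scaling problem the background describes.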

Application Information

IPC(8): G06K9/62, G06F16/335, G06F16/31, G06F16/338
CPC: G06F16/325, G06F16/335, G06F16/338, G06F18/22
Inventor 周杰
Owner SICHUAN CHANGHONG ELECTRIC CO LTD