Lightweight text fuzzy search method supporting semantic association

A fuzzy search technology and search method, applied to unstructured text data retrieval, semantic analysis, and semantic tool creation. It addresses the problems of insufficiently lightweight processing, false filtering, and the lack of sentence-level features in fuzzy search, with the effects of reducing the memory burden and optimizing both the calculation process and the search process.

Active Publication Date: 2020-05-08
深圳前海黑顿科技有限公司

AI Technical Summary

Problems solved by technology

[0004] 1. Most current fuzzy text searches are not truly fuzzy. Their degree of fuzziness is low, and they do not support semantic association well, such as searching for synonyms and associated words of a keyword. Synonyms of keywords are therefore filtered out even when an application needs to retain them, which causes false filtering and lowers the recall rate. Moreover, when searching for keywords or key sentences in relatively long texts, the texts are processed with brute-force methods, so efficiency is low; that is, the search is not lightweight enough.
[0005] 2. Most current fuzzy text searches do not solve the two main problems of fuzzy string matching: space and time. Given the large amount of computation and storage involved, existing fuzzy matching algorithms often cannot meet real online requirements for time complexity and space complexity.
[0006] 3. Most current fuzzy text searches cannot capture sentence-level features. If the query text does not appear in the searched text, the search result is empty. However, the searched text may contain passages with a similar meaning. In practical applications, the expected behavior in this case is usually not an empty result but the passages with similar meaning returned as results.
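
The false-filtering problem in item 1 can be illustrated with a minimal sketch. It assumes a tiny hand-written synonym table (`SYNONYMS`) and whitespace tokenization; both are hypothetical stand-ins chosen for illustration and are not taken from the patent.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "big": {"large", "huge"},
    "eyes": {"eyeballs"},
}

def expand(term):
    """Return the term together with its associated words."""
    return {term} | SYNONYMS.get(term, set())

def keyword_hit(text, keyword, associate=True):
    """True if the keyword, or optionally any associated word, occurs in the text."""
    tokens = set(text.lower().split())
    candidates = expand(keyword) if associate else {keyword}
    return bool(tokens & candidates)

text = "a pair of large piercing eyes"
print(keyword_hit(text, "big", associate=False))  # False: the synonym is filtered out
print(keyword_hit(text, "big", associate=True))   # True: the synonym is retained
```

Without association the passage is missed, which is the recall loss described above; with association the synonym counts as a hit.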



Examples


Embodiment 1

[0062] Long text S1 = 'Hi everyone! Today I want to introduce my father to you. My father is a teacher and he is of medium height. A pair of piercing big eyes, and a few silver threads in the black and beautiful hair, it looks very old, but it looks full of vitality! '.

[0063] For short texts Q1 = 'big eyes' and Q2 = 'big eyes'

[0064] Call the encapsulated interface:

[0065] bluE(S, Q, autoSplit, isImagine, stop_words)

[0066] Search short text Q in long text S: result1=bluE(S=S1, Q=Q1), result2=bluE(S=S1, Q=Q2)

[0067] The returned results are:

[0068] {'match_str': 'a pair of piercing eyes', 'position': [33, 43], 'similarity': 0.4238}

[0069] {'match_str': 'big eyes', 'position': [39, 44], 'similarity': 0.3797}
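
The shape of the returned dictionaries can be mimicked with a small sketch. This is not the `bluE` implementation, which the excerpt does not disclose: `fuzzy_search` and `window_slack` are hypothetical names, and the standard-library `difflib.SequenceMatcher` ratio stands in for whatever similarity measure the patent uses. Positions here are half-open character offsets into the string passed in.

```python
from difflib import SequenceMatcher

def fuzzy_search(S, Q, window_slack=2):
    """Slide windows of roughly len(Q) characters over S and keep the most similar one."""
    best = {"match_str": "", "position": [0, 0], "similarity": 0.0}
    sizes = range(max(1, len(Q) - window_slack), len(Q) + window_slack + 1)
    for size in sizes:
        for start in range(max(1, len(S) - size + 1)):
            window = S[start:start + size]
            score = SequenceMatcher(None, Q, window).ratio()
            if score > best["similarity"]:
                best = {"match_str": window,
                        "position": [start, start + len(window)],
                        "similarity": score}
    best["similarity"] = round(best["similarity"], 4)
    return best

# The query contains a typo, so no exact match exists in the long text.
result = fuzzy_search("A pair of piercing big eyes", "big eyez")
print(result)
```

This brute-force scan is exactly the kind of enumeration the patent says it avoids; it is shown only to make the result format concrete.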

[0070] The convolution score distributions of Q1 and Q2 over S1 are, respectively (Note: at this point the convolution operation has been pre-screened, and S_conv segments with few non-zero units are not processed, resulting in on...
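
One way to read the pre-screened convolution score is sketched below. The excerpt does not define the convolution, so everything here is an assumption: a binary indicator over S plays the role of S_conv, a box kernel of width len(Q) produces the score, and windows with fewer than `min_nonzero` non-zero units are skipped. The names `match_indicator`, `conv_scores`, and `min_nonzero` are hypothetical.

```python
def match_indicator(S, Q):
    """1 where a character of S also occurs in Q, else 0 (stand-in for S_conv)."""
    q_chars = set(Q)
    return [1 if ch in q_chars else 0 for ch in S]

def conv_scores(S, Q, min_nonzero=2):
    """Box-kernel convolution of the indicator, with sparse windows screened out."""
    s_conv = match_indicator(S, Q)
    width = len(Q)
    scores = []
    for start in range(len(S) - width + 1):
        hits = sum(s_conv[start:start + width])
        # Pre-screening: windows with too few non-zero units get score 0 and
        # would not be passed on to any more expensive comparison.
        scores.append(hits / width if hits >= min_nonzero else 0.0)
    return scores

scores = conv_scores("abcxxabd", "abd")
print(scores.index(max(scores)))  # 5: the window "abd" at offset 5 scores highest
```

The screening step is what keeps the method cheap in this reading: most windows of a long text contain few matching characters and are discarded after a single sum.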

Embodiment 2

[0074] An example of using autoSplit:

[0075] Long text S2 = 'Johann Carl Friedrich Gauss (Johann Carl Friedrich Gauss) was a German mathematician who made significant advances in many fields, including number theory, algebra, statistics, analysis, differential geometry, geodesy, geophysics, mechanics, electrostatics, physics, astronomy, matrix theory, and optics. Gauss pointed out that regular triangles, regular quadrilaterals, regular pentagons, and regular polygons with twice the number of sides of any of these can be constructed geometrically with compass and straightedge, but research on this problem had since made little progress. On the basis of number theory, Gauss proposed a criterion for judging whether a regular polygon with a given number of sides can be geometrically constructed. For example, a regular heptadecagon (17-gon) can be inscribed in a circle with compass and straightedge....
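
The excerpt does not say what `autoSplit` does internally. Assuming it splits the long text into sentences before matching, a minimal sketch might look like the following; `split_sentences` is a hypothetical helper, and the punctuation set is an assumption.

```python
import re

def split_sentences(S):
    """Split S at sentence-ending punctuation, keeping each piece's start offset."""
    pieces = []
    for m in re.finditer(r"[^.!?。！？]+[.!?。！？]?", S):
        text = m.group()
        stripped = text.strip()
        if stripped:
            # Shift the offset past any leading whitespace that strip() removed.
            offset = m.start() + (len(text) - len(text.lstrip()))
            pieces.append((offset, stripped))
    return pieces

sample = "Gauss was a German mathematician. He studied number theory! Did he?"
for offset, sentence in split_sentences(sample):
    print(offset, sentence)
```

Keeping the offsets means a match found inside one sentence can be mapped back to a position in the original long text.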


Abstract

The invention discloses a lightweight text fuzzy search method supporting semantic association. The method improves on traditional sentence retrieval algorithms: it can retrieve sentences that are identical to the target sentence as well as sentences that are highly similar to it, and the required degree of similarity to the target sentence can be adjusted flexibly. Operation is fast: the traditional brute-force enumeration algorithm is abandoned in favor of methods such as a semantic graph, convolution, and dynamic programming, so the search process is optimized, search speed is greatly increased, and the size of the system is reduced. Internal and external optimization is performed for lightweight users and usage scenarios; the whole calculation process is optimized and the memory burden is reduced. The invention further provides an association mode that requires no local computation: a user can call the association module during fuzzy search without occupying local computing power. The system is flexible: the whole algorithm module is wrapped behind an interface, so a user can easily call it from different applications.
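
The abstract names dynamic programming without giving details. As a point of reference only, the sketch below uses the classic edit-distance recurrence adapted for approximate substring matching (often attributed to Sellers), which may differ from the patent's own formulation. The function name `best_substring_distance` is hypothetical.

```python
def best_substring_distance(S, Q):
    """Minimum edit distance between Q and any substring of S, plus that substring's end index."""
    prev = [0] * (len(S) + 1)          # an empty Q matches anywhere at zero cost
    for i in range(1, len(Q) + 1):
        curr = [i] + [0] * len(S)      # matching Q[:i] against nothing costs i
        for j in range(1, len(S) + 1):
            cost = 0 if Q[i - 1] == S[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitute
                          prev[j] + 1,         # drop a character of Q
                          curr[j - 1] + 1)     # absorb an extra character of S
        prev = curr
    end = min(range(len(S) + 1), key=prev.__getitem__)
    return prev[end], end

print(best_substring_distance("the quick brown fox", "quick"))  # (0, 9)
print(best_substring_distance("the quick brown fox", "brwn"))   # (1, 15)
```

The table has len(Q) rows and len(S) columns, but only two rows are held in memory at a time, which matches the abstract's emphasis on a small memory footprint.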

Description

Technical field

[0001] The invention relates to the field of text fuzzy search, in particular to a lightweight text fuzzy search method supporting semantic association.

Background technique

[0002] Text fuzzy search is widely applied. As networks develop, the amount of text information generated online is growing explosively, and harmful or destabilizing information is increasingly widespread. On public network platforms, therefore, much content must be reviewed before it can be displayed. In the early days of Internet content review, most of it was done manually, which was very inefficient, all the more so compared with the speed at which Internet text is generated. Many scholars and companies have therefore turned their attention to text fuzzy search, that is, to fuzzily find a given keyword or key sentence in a large amount of text informati...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/36, G06F40/247, G06F40/289, G06F40/30
CPC: G06F16/334, G06F16/367
Inventor: 裴正奇, 黄梓忱, 段必超, 段朦丽, 朱斌斌
Owner: 深圳前海黑顿科技有限公司