
Intelligent full-text retrieval method and system based on semantic understanding

An intelligent retrieval technology based on semantic understanding, applied to unstructured text data retrieval, semantic analysis, and natural language data processing. It addresses problems such as missed matches for differently worded expressions of the same concept, data sparsity, high dimensionality, missing semantic information, and difficulty in meeting users' needs, with the effect of improving retrieval accuracy.

Active Publication Date: 2021-06-01
SHANDONG EVAYINFO TECH CO LTD

AI Technical Summary

Problems solved by technology

Traditional full-text search splits the original data into words through word segmentation and uses an inverted index to link each keyword to all documents that contain it. At search time it typically just finds and returns the documents containing the search keywords. This is only a mechanical match on the literal text, so much information that expresses the same concept in different words is missed; that is, the keywords are not understood semantically.
For example, for the query "cities where the four seasons are like spring", users want results such as Kunming, Xiamen, and Dali, but traditional full-text search only matches articles containing keywords such as "four seasons" and "city", which makes it hard to satisfy users' real needs.
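The keyword-matching limitation described above can be illustrated with a minimal sketch of an inverted index. The two documents, the whitespace tokenizer, and the function names are hypothetical and chosen only for illustration; they are not taken from the patent or from any particular search library.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain at least one query token."""
    hits = set()
    for token in query.lower().split():
        hits |= index.get(token, set())
    return hits


# Hypothetical toy corpus: one relevant document, one literal keyword match.
docs = {
    "kunming": "kunming has mild weather all year round",
    "almanac": "the four seasons and the farming calendar of a city",
}
index = build_inverted_index(docs)
results = keyword_search(index, "city where four seasons are like spring")
print(sorted(results))  # ['almanac']
```

The document about Kunming is the one the user wants, but it shares no literal token with the query, so only the unrelated document is returned.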
[0006] In addition, most search fields in full-text retrieval are short texts. The particular nature of short text makes its classification differ from the traditional long-text classification process, and scholars have carried out a series of studies on the problems short text poses, such as data sparsity, high dimensionality, and lack of semantic information. The existing technology applies deep neural network (DNN) methods to short text classification and has achieved certain results, but it still faces challenges. For example, most short text classification models consider only the literal meaning, which works poorly for recognizing common polysemous words and does not solve the sparsity of short text.



Examples


Embodiment 1

[0044] In one or more implementations, an intelligent full-text retrieval method based on semantic understanding is disclosed. Referring to Figure 1, the method includes the following steps:

[0045] (1) Cut the received search sentence into short texts, perform word segmentation operations on the short texts, and obtain the word segmentation library corresponding to the short texts;

[0046] (2) Construct the semantic information vector and the dependency relationship vector of the short text; the semantic information vector comprises the central word and the word-sense co-occurrence words of the short text;

[0047] (3) Based on the semantic information vector and dependency vector of the short text, the similarity between the short text information and the relevant information in the intelligent index database is calculated, and then the search result set is obtained.
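Steps (1) to (3) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patent's implementation: whitespace tokenization stands in for word segmentation, a hand-written `CO_OCCURRENCE` table stands in for learned word-sense co-occurrence words, adjacent-token pairs stand in for parsed dependency relations, and the cosine measure and the `alpha` weight are assumptions. The central word is not modeled separately here.

```python
import math
import re
from collections import Counter

# Hypothetical co-occurrence table; a real system would derive this from data.
CO_OCCURRENCE = {
    "spring": ["mild", "warm"],
    "cities": ["city"],
}


def split_short_texts(sentence):
    """Step (1): cut a search sentence into short texts at punctuation."""
    return [p.strip() for p in re.split(r"[,.;!?]", sentence) if p.strip()]


def segment(short_text):
    """Step (1): whitespace tokenization as a stand-in for word segmentation."""
    return short_text.lower().split()


def semantic_vector(tokens):
    """Step (2): token counts expanded with co-occurrence words."""
    vec = Counter(tokens)
    for tok in tokens:
        for related in CO_OCCURRENCE.get(tok, []):
            vec[related] += 1
    return vec


def dependency_vector(tokens):
    """Step (2): adjacent-token pairs as a crude proxy for dependencies."""
    return Counter(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score(query, doc, alpha=0.7):
    """Step (3): weighted mix of semantic and dependency similarity."""
    q, d = segment(query), segment(doc)
    sem = cosine(semantic_vector(q), semantic_vector(d))
    dep = cosine(dependency_vector(q), dependency_vector(d))
    return alpha * sem + (1 - alpha) * dep


def search(sentence, index_texts, top_k=3):
    """Score every indexed text against each short text; keep the best hits."""
    best = {}
    for short in split_short_texts(sentence):
        for doc_id, text in index_texts.items():
            best[doc_id] = max(best.get(doc_id, 0.0), score(short, text))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [(doc_id, s) for doc_id, s in ranked[:top_k] if s > 0]


# Hypothetical index contents.
index_texts = {
    "kunming": "kunming is a city with mild weather",
    "stock": "stock prices fell sharply",
}
results = search("cities like spring", index_texts)
print(results)
```

In this toy run the query shares no literal token with the Kunming text, yet the expansion words "city" and "mild" give it a nonzero score, while the unrelated text is filtered out.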

[0048] Specifically, the process of building the intelligent index library in this embodiment includes...

Embodiment 2

[0077] In one or more implementations, an intelligent full-text retrieval system based on semantic understanding is disclosed, including:

[0078] The data preprocessing module is used to cut the received search sentence into short texts, perform word segmentation operations on the short texts, and obtain the word segmentation library corresponding to the short texts;

[0079] The short text vector construction module is used to construct the semantic information vector and the dependency relationship vector of the short text; the semantic information vector includes the central word and the word-sense co-occurrence words of the short text;

[0080] The data index module is used to calculate the similarity between the short text information and the relevant information in the intelligent index library based on the semantic information vector and the dependency relationship vector of the short text, and then obtain the search result set.
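One possible way to wire the three modules together is sketched below. The class names mirror the module names in this embodiment, but the `run` methods, the `RetrievalSystem` wrapper, the set-based vectors, and the Jaccard scoring are all assumptions made for illustration; the patent text here does not specify these interfaces.

```python
from dataclasses import dataclass


@dataclass
class ShortTextVectors:
    semantic: frozenset    # stand-in for the semantic information vector
    dependency: frozenset  # stand-in for the dependency relationship vector


class DataPreprocessingModule:
    def run(self, sentence):
        """Cut a sentence into short texts and return one token list each."""
        parts = [p.strip() for p in sentence.replace(";", ",").split(",")]
        return [p.lower().split() for p in parts if p]


class ShortTextVectorModule:
    def run(self, tokens):
        """Build both (toy) vectors for one short text."""
        return ShortTextVectors(
            frozenset(tokens), frozenset(zip(tokens, tokens[1:]))
        )


def jaccard(a, b):
    """Set overlap, used here as an assumed similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0


class DataIndexModule:
    def __init__(self, entries):
        self.entries = entries  # doc_id -> ShortTextVectors

    def run(self, query):
        """Return ids of indexed entries with nonzero similarity, best first."""
        scored = {
            doc_id: 0.5 * jaccard(query.semantic, v.semantic)
            + 0.5 * jaccard(query.dependency, v.dependency)
            for doc_id, v in self.entries.items()
        }
        hits = [d for d, s in scored.items() if s > 0]
        return sorted(hits, key=lambda d: -scored[d])


class RetrievalSystem:
    def __init__(self, documents):
        self.pre = DataPreprocessingModule()
        self.vec = ShortTextVectorModule()
        entries = {}
        for doc_id, text in documents.items():
            tokens = [t for short in self.pre.run(text) for t in short]
            entries[doc_id] = self.vec.run(tokens)
        self.index = DataIndexModule(entries)

    def search(self, sentence):
        """Run each short text through the modules and merge the hits."""
        results = []
        for tokens in self.pre.run(sentence):
            for doc_id in self.index.run(self.vec.run(tokens)):
                if doc_id not in results:
                    results.append(doc_id)
        return results


# Hypothetical documents.
system = RetrievalSystem({"a": "mild weather all year", "b": "stock prices fell"})
hits = system.search("mild weather, cheap flights")
print(hits)  # ['a']
```

Keeping each module behind a single method means a real word segmenter, vector builder, or similarity service could replace a toy class without changing the others.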

[0081] The specific implementation manners ...

Embodiment 3

[0083] In one or more embodiments, a terminal device is disclosed that includes a server. The server includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the intelligent full-text retrieval method based on semantic understanding of Embodiment 1. For the sake of brevity, details are not repeated here.

[0084] It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.

[0085] The memory may include read-only memor...


Abstract

The invention discloses an intelligent full-text retrieval method and system based on semantic comprehension, and the method comprises the steps: segmenting a received search statement into short texts, and carrying out the word segmentation operation of the short texts, so as to obtain a word segmentation library corresponding to the short texts; constructing a semantic information vector and a dependency relationship vector of the short text, wherein the semantic information vector comprises a head word and a word-meaning co-occurrence word of the short text; and based on the semantic information vector and the dependency relationship vector of the short text, performing similarity calculation on the short text information and related information in the intelligent index database to obtain a search result set. According to the method and system, the original data is divided into the multiple short texts to form the search text vector, and the similarity between the search text and the index database text is calculated by calling the semantic understanding interface of the artificial intelligence platform, so that the accuracy of full-text retrieval can be improved.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to an intelligent full-text retrieval method and system based on semantic understanding.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] Full-text retrieval takes all kinds of data, such as text, sound, and images, as its processing object, and provides a means of retrieving information according to the content of the data rather than its external characteristics. It includes two functions, data management and data query, and helps users quickly manage and retrieve large numbers of documents.

[0004] Lucene is an open source project of the Apache Software Foundation and one of the most popular Java-based open source full-text search toolkits. Lucene implements some common word segmentation algorithms and res...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/211, G06F40/289, G06F40/30
CPC: G06F16/3334, G06F16/3344, G06F16/3347, G06F40/211, G06F40/289, G06F40/30
Inventor: 吴士伟, 杨春, 李慧娟, 孙露, 孙浩, 辛国茂, 胡传会
Owner: SHANDONG EVAYINFO TECH CO LTD