Word vector-based high-efficiency semantic expansion retrieval method and device and storage medium

A word-vector-based technology, applied to high-efficiency semantic expansion retrieval methods, devices and storage media. It addresses the problems that synonyms and segmented words can differ greatly in meaning, that invalid words interfere with retrieval results, and that new words and hot words are difficult to cover in time. Its effects are to improve word segmentation, expand the semantic scope, and supplement the retrieved information.

Pending Publication Date: 2022-03-01
BANK OF COMMUNICATIONS

AI Technical Summary

Problems solved by technology

However, because of maintenance costs and other factors, this type of dictionary has several drawbacks. First, the number of words it contains is limited: generally only high-frequency words can be queried reliably, and proper nouns from specific fields may not be included. Second, its update frequency is very low, so it is difficult to cover new words and hot words in time. Third, its response speed is very slow.
[0005] To address this, some existing technologies pre-build indexes, which improves response speed, but very few of them are currently applied in professional fields. The difficulty is that much of the vocabulary in professional fields consists of niche words, and many synonyms and segmented words have very different meanings. In addition, if the retrieval query is a sentence, too many invalid words will interfere with the retrieval results.

Embodiment Construction

[0039] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments. This embodiment is carried out on the premise of the technical solution of the present invention, and detailed implementation and specific operation process are given, but the protection scope of the present invention is not limited to the following embodiments.

[0040] This embodiment provides a high-efficiency semantic expansion retrieval method based on word vectors. The method can be stored on a storage medium in the form of a computer program and is executed by a word-vector-based high-efficiency semantic expansion retrieval device (a computer system).

[0041] As shown in Figure 1, the method includes:

[0042] Step S1: Input the collected original text data, perform corpus cleaning, and obtain corpus in a unified format;

[0043] Specifically, this part is implemented based on the data module, which includes requirements analysis, text preparation,...
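
As an illustration of step S1 only, the following minimal Python sketch shows one way raw text could be brought into a unified format. The specific cleaning rules (Unicode NFKC normalization, stripping HTML-like tags, collapsing whitespace, dropping empty documents) and the function names clean_document and clean_corpus are assumptions made for this example; the text above does not state which rules the data module applies.

```python
import re
import unicodedata

# Hypothetical cleaning rules; the source does not specify them.
_TAG_RE = re.compile(r"<[^>]+>")   # HTML-like tags
_SPACE_RE = re.compile(r"\s+")     # any run of whitespace


def clean_document(raw: str) -> str:
    """Return one document in a unified plain-text format."""
    # Fold full-width letters, digits and spaces to their standard forms.
    text = unicodedata.normalize("NFKC", raw)
    # Replace markup with a space so adjacent words do not merge.
    text = _TAG_RE.sub(" ", text)
    # Collapse whitespace runs and trim both ends.
    return _SPACE_RE.sub(" ", text).strip()


def clean_corpus(raw_docs):
    """Clean every document and drop those that end up empty."""
    cleaned = (clean_document(doc) for doc in raw_docs)
    return [doc for doc in cleaned if doc]
```

A call such as clean_corpus(list_of_raw_reports) would then yield the uniform corpus that the later word segmentation step consumes.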

Abstract

The invention relates to a word vector-based high-efficiency semantic expansion retrieval method and device and a storage medium. The method comprises the following steps: S1, cleaning the corpus to obtain a corpus in a uniform format; S2, performing word segmentation on the corpus to obtain a vocabulary; S3, converting all words into word vectors using a trained word vector model; S4, building an index over the word vectors using a binary tree method based on cosine distance; S5, receiving a retrieval keyword, converting it into a word vector, and obtaining synonyms using the built index; S6, performing word segmentation on the keyword; S7, performing research report retrieval on the original word, the synonyms and the segmented words to obtain an original-word retrieval result, a synonym retrieval result and a segmented-word retrieval result; and S8, presenting research reports that appear in all three retrieval results at the same time as the highest-priority results. Compared with the prior art, the method has advantages such as improved response speed and accuracy.
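
Steps S4, S5 and S8 can be illustrated with a minimal Python sketch. It is one possible reading of "a binary tree method based on the cosine distance", not the patented construction: each node picks two pivot vectors and sends every word to the pivot it is closer to in cosine distance, and a query descends to a single leaf, so the lookup is approximate. The toy two-dimensional vectors, the function names, the leaf size, and the ordering of reports that are not in all three result sets are all assumptions made for the example.

```python
import math


def unit(v):
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Cosine distance between two unit vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def build_tree(items, leaf_size=3):
    """S4 (assumed form): recursively split (word, unit_vector) pairs."""
    if len(items) <= leaf_size:
        return {"leaf": items}
    pivot_a = items[0][1]
    # Second pivot: the vector farthest from the first one.
    pivot_b = max(items, key=lambda it: cosine_distance(pivot_a, it[1]))[1]
    left, right = [], []
    for item in items:
        closer_to_a = (cosine_distance(pivot_a, item[1])
                       <= cosine_distance(pivot_b, item[1]))
        (left if closer_to_a else right).append(item)
    if not left or not right:  # vectors coincide; stop splitting
        return {"leaf": items}
    return {"a": pivot_a, "b": pivot_b,
            "left": build_tree(left, leaf_size),
            "right": build_tree(right, leaf_size)}


def query(tree, vector, k=2, exclude=None):
    """S5: descend to one leaf, then rank its words by cosine distance."""
    q = unit(vector)
    node = tree
    while "leaf" not in node:
        go_left = cosine_distance(node["a"], q) <= cosine_distance(node["b"], q)
        node = node["left"] if go_left else node["right"]
    ranked = sorted(node["leaf"], key=lambda it: cosine_distance(q, it[1]))
    return [word for word, _ in ranked if word != exclude][:k]


def prioritize(original_hits, synonym_hits, segment_hits):
    """S8: reports found by all three searches come first.

    The order of the remaining reports (first seen) is an assumption.
    """
    in_all = set(original_hits) & set(synonym_hits) & set(segment_hits)
    seen, top, rest = set(), [], []
    for report in [*original_hits, *synonym_hits, *segment_hits]:
        if report in seen:
            continue
        seen.add(report)
        (top if report in in_all else rest).append(report)
    return top + rest


# Toy vectors standing in for the output of a trained word vector model.
TOY_VECTORS = {
    "loan": [1.0, 0.1], "credit": [1.0, 0.2], "lending": [1.0, 0.05],
    "bond": [0.1, 1.0], "debenture": [0.2, 1.0], "note": [0.0, 1.0],
}
index = build_tree([(w, unit(v)) for w, v in TOY_VECTORS.items()])
synonyms = query(index, TOY_VECTORS["loan"], k=2, exclude="loan")
ranked_reports = prioritize(["r1", "r2"], ["r2", "r3"], ["r2", "r4"])
```

With this toy data, the lookup for "loan" stays inside the leaf that holds the lending-related words, and report "r2", the only one returned by all three searches, is placed first. Because the index is built once ahead of time, each query compares the keyword against only a small leaf rather than the whole vocabulary, which is the source of the response-speed benefit the abstract claims.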

Description

Technical field

[0001] The invention relates to the field of text retrieval, and in particular to a word-vector-based high-efficiency semantic expansion retrieval method, device and storage medium.

Background

[0002] Financial companies such as banks and securities firms conduct research on specific industries, markets and policies and compile research reports. With a massive number of reports, practitioners spend a great deal of time selecting suitable ones, and strict searches on specific vocabulary may filter out reports containing similar information.

[0003] To address this, some existing technologies provide fuzzy query or retrieval methods to supplement the retrieval results. Usually, in the existing technologies, the search vocabulary is supplemented, and a synonym derivation algorithm is applied to the research report search scenario, thereby expanding proprietary words. This not only ensures that the relevant content of the original search terms can be displa...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/31, G06F16/33, G06F40/247, G06F40/289, G06F40/30
CPC: G06F16/316, G06F16/3344, G06F40/247, G06F40/289, G06F40/30
Inventor: 何夏辉, 俞书浩, 王凯丽, 曹心怡, 陈昱莹
Owner: BANK OF COMMUNICATIONS