A Context-Based Abstract Sample Information Retrieval System

A context-based information retrieval technique that addresses two problems: forming feature representations of samples from word vectors, and extracting word-meaning features. It thereby improves accuracy and extends the way sample features are constructed.

Active Publication Date: 2019-08-09
长源动力(北京)科技有限公司

AI Technical Summary

Problems solved by technology

The invention aims to overcome the difficulty in the prior art of forming a feature representation of a sample from Word2vector word vectors, and to solve the problem of extracting word-meaning features for the feature representation of abstract samples.

Examples

Embodiment Construction

[0026] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0027] As shown in Figure 1, the context-based abstract sample information retrieval system of the present invention includes a word segmentation module, a word-meaning feature extraction module, an abstract-word feature substitution representation module, an ST-IDF module and a classification module.

[0028] The abstract sample characterization method of the abstract sample information retrieval system includes the following steps:

[0029] Step 1: Use the word segmentation module to segment the abstract words of the sample. When a sample records information entirely in abstract words, those words cannot be segmented with a dictionary or thesaurus. This step therefore treats each abstract word simply as a string of ASCII characters. When the sample is a data link mes...
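The dictionary-free segmentation described in Step 1 can be illustrated with a minimal sketch. The paragraph above is truncated, so the delimiter rule here is an assumption rather than the patent's rule: a token is taken to be any maximal run of printable, non-space ASCII characters. The function name `segment_abstract_words` and the sample string are hypothetical.

```python
import re


def segment_abstract_words(sample: str) -> list[str]:
    """Split a sample into abstract-word tokens without any dictionary.

    Assumption (not from the patent text): a token is a maximal run of
    printable, non-space ASCII characters (codes 0x21 through 0x7E).
    Whitespace and non-ASCII characters act as separators.
    """
    return re.findall(r"[!-~]+", sample)


# Hypothetical sample made of opaque codes rather than natural-language words.
tokens = segment_abstract_words("J3.2 TN0417\tA7F")
print(tokens)
```

Each token is kept as an opaque string, so no language rules are needed at this stage; meaning is only inferred later from the contexts in which the tokens occur.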

Abstract

The invention proposes a context-based abstract sample information retrieval system. To represent the features of abstract samples, the system first uses Word2vector to extract word-meaning features and obtain a word vector for each abstract word. It then clusters these word vectors by "optimal fitness division" and, according to the clustering result, replaces each abstract word with the centroid of its cluster. Finally, from the centroids and the word frequencies of the abstract words they represent, it builds a word-vector cluster centroid frequency model (ST-IDF), which is used to characterize abstract samples. The invention reduces the number of clustering and fitness calculations, improves the performance of abstract sample similarity analysis, and improves sample classification accuracy.
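The substitution and frequency steps of the abstract can be sketched roughly as follows. This excerpt does not give the "optimal fitness division" clustering procedure or the exact ST-IDF weighting, so both are assumptions here: the cluster assignment is taken as a given input, and the weight is a TF-IDF-style analogue computed over cluster ids instead of words. The names `centroids` and `cluster_frequency_features` and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def centroids(word_vectors, assignment):
    """Mean vector of each cluster; `assignment` maps word -> cluster id."""
    sums, counts = {}, Counter()
    for word, cluster in assignment.items():
        vec = word_vectors[word]
        acc = sums.setdefault(cluster, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[cluster] += 1
    return {c: [x / counts[c] for x in acc] for c, acc in sums.items()}


def cluster_frequency_features(samples, assignment):
    """Represent each sample by weighted frequencies of its cluster ids.

    Assumption: weight = (cluster count / sample length) * (ln(N / df) + 1),
    a TF-IDF-style stand-in for the patent's ST-IDF model.
    """
    # Replace every abstract word by the id of the cluster (centroid) it belongs to.
    docs = [[assignment[w] for w in s if w in assignment] for s in samples]
    n_docs = len(docs)
    doc_freq = Counter(c for doc in docs for c in set(doc))
    features = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty samples
        features.append({
            c: (n / total) * (math.log(n_docs / doc_freq[c]) + 1.0)
            for c, n in tf.items()
        })
    return features


# Toy data: three abstract words, two clusters (assignment assumed given).
word_vectors = {"A1": [0.0, 0.0], "A2": [2.0, 0.0], "B1": [10.0, 10.0]}
assignment = {"A1": 0, "A2": 0, "B1": 1}
cents = centroids(word_vectors, assignment)
feats = cluster_frequency_features([["A1", "A2"], ["A1", "B1"]], assignment)
```

Because words in the same cluster share one centroid, the feature space shrinks from the vocabulary size to the number of clusters, which is what lets samples that use different but contextually similar abstract words be compared directly.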

Description

Technical field

[0001] The invention relates to the field of information retrieval of data link messages, semi-structured texts or ordinary texts, in particular to sample similarity analysis and classification based on word vector (Word2vector).

Background technique

[0002] Abstract words refer to special words in information retrieval samples that cannot be directly interpreted by language, that is, no known language rules (word meaning, grammar, word order) can directly identify their actual semantics. A large number of abstract words exist in information retrieval samples to varying degrees, such as military data link messages (Link-16, Link-22), semi-structured text (XML) or ordinary text for data exchange. At the same time, there are a large number of data link messages, semi-structured texts or ordinary texts that completely use abstract words to record information. For this situation, we call such messages or texts in information retrieval tasks abstract samples. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/332, G06F16/35, G06K9/62
CPC: G06F16/3329, G06F16/355, G06F18/23213
Inventor: 吴琳, 韩广, 袁鑫攀, 李亚楠
Owner: 长源动力(北京)科技有限公司