Text similarity ordering method based on ES search

A text similarity sorting method, applied to text similarity ranking based on ES (Elasticsearch) search, that addresses the inability to rank texts containing the same words in different orders and so produces more accurate results.

Inactive Publication Date: 2018-04-20
SICHUAN CHANGHONG ELECTRIC CO LTD
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0004] The technical problem addressed by the present invention is to provide a text similarity sorting method based on ES search that treats the positional order of the words in a text as a factor when calculating the degree of similarity between texts. This solves the problem that ES search cannot rank texts that contain the same words in different orders, and improves the accuracy of ES text similarity sorting.
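The problem stated above can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses two made-up example sentences; neither comes from the patent. A frequency-only representation maps both sentences to the same bag of words, so any score built only on word counts cannot tell them apart.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Whitespace tokenization is an assumption made for this sketch;
    # a real deployment would use a proper word segmenter.
    return Counter(text.lower().split())


first = "dog bites man"
second = "man bites dog"

# Both texts contain the same words with the same counts, so their
# frequency-only representations are identical despite the different order.
print(bag_of_words(first) == bag_of_words(second))
```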

Examples

Embodiment Construction

[0019] The present invention aims to provide a text similarity sorting method based on ES search that treats the positional order of the words in a text as a factor when calculating the degree of similarity between texts. This solves the problem that ES search cannot rank texts that contain the same words in different orders, and improves the accuracy of ES text similarity sorting.

[0020] The present invention first uses the default TF-IDF model of ES to calculate a similarity value between the query text and each text in the database, and retrieves a set of the highest-scoring texts. It then segments the texts into words, represents the texts as vectors, calculates the degree of similarity between texts by cosine similarity, and re-sorts the results. As shown in Figure 1, the method includes the following steps:
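The segmentation, vectorization, cosine similarity, and reordering stages described above can be sketched roughly as follows. This excerpt does not show how the patent encodes word position in its vectors, so the sketch swaps in a simple stand-in: adjacent word pairs (bigrams) are added as extra features alongside single words. The function names, the whitespace tokenizer, and the bigram scheme are all assumptions for illustration, not the patented method.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for real word segmentation.
    return text.lower().split()


def order_aware_vector(tokens):
    # Stand-in vectorization (not from the patent): single words plus
    # adjacent word pairs, so a change in word order changes the vector.
    features = Counter(tokens)
    features.update(zip(tokens, tokens[1:]))
    return features


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query, candidates):
    # Score every candidate against the query and sort by descending score.
    q = order_aware_vector(tokenize(query))
    scored = [(cosine(q, order_aware_vector(tokenize(c))), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


for score, text in rerank("man bites dog", ["dog bites man", "man bites dog"]):
    print(round(score, 3), text)
```

With these features, the candidate whose word order matches the query shares both its words and its word pairs, while the reordered candidate shares only the words, so it scores lower.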

[0021] a) Use the default TF-IDF model of ES to search the database and obtain a set of candidate texts, as follows:

[0022] Suppos...
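As a rough illustration of the preliminary search in step a), the sketch below builds an Elasticsearch Query DSL request body using a `match` query with a `size` limit. The field name `content`, the default of 50 hits, and the helper name `build_candidate_query` are hypothetical; the excerpt does not specify them. Note also that recent Elasticsearch releases score with BM25 by default rather than classic TF-IDF, so the default model named in the patent reflects older versions.

```python
import json


def build_candidate_query(query_text, field="content", size=50):
    # `field` and `size` are hypothetical defaults chosen for this sketch.
    # A `match` query analyzes the query text and scores documents with the
    # index's configured similarity model.
    return {
        "size": size,
        "query": {"match": {field: query_text}},
    }


body = build_candidate_query("text similarity ordering")
print(json.dumps(body, indent=2))
```

A body like this would be sent to the `_search` endpoint of the target index; the returned hits form the candidate set that the later steps re-sort.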

Abstract

The invention belongs to the technical field of big data and discloses a text similarity ordering method based on ES search. The method takes the positional order of the words in a text into account when calculating the degree of similarity between texts. This solves the problem that ES search cannot order texts that contain identical words in different orders, and improves the accuracy of ES text similarity ordering. The method comprises the following steps: a, a preliminary ES search is performed to obtain a set of similar texts; b, word segmentation is performed on the texts to obtain a segmented word set; c, on the basis of the segmented word set, the segmented texts are represented as vectors; d, the degree of similarity between text vectors is measured by cosine similarity; and e, the texts are reordered according to their cosine similarity values.

Description

Technical field

[0001] The invention belongs to the technical field of big data, and in particular relates to a text similarity sorting method based on ES search.

Background

[0002] As a real-time distributed search and analysis engine, ES can be used for full-text search, structured search, and analysis. It works by indexing the text. It segments the query text into words, counts how often each word appears in a database text and how many database texts contain the word, and then uses the formula of the TF-IDF model to calculate a similarity value between each database text and the query text. The search results are returned quickly, in descending order of similarity value.

[0003] Because ES searches quickly, its range of applications keeps growing. However, the ES default scoring rule adopts the TF-IDF algorithm model, which uses word frequency as the basic unit of text sim...
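The scoring idea described in the background can be sketched with one common textbook TF-IDF variant. This is not the exact formula ES uses; the smoothing term, the function name `tf_idf_score`, and the toy corpus are assumptions for illustration. The score depends only on how often each query word occurs in a text and how many texts contain it.

```python
import math


def tf_idf_score(query_terms, doc_tokens, corpus):
    # Textbook variant (an assumption, not ES's exact formula):
    # score = sum over query terms of tf * (log(N / df) + 1)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)                  # occurrences in this text
        df = sum(1 for d in corpus if term in d)     # texts containing the term
        if tf == 0 or df == 0:
            continue
        score += tf * (math.log(n_docs / df) + 1.0)
    return score


# Toy corpus of already-segmented texts, made up for this sketch.
corpus = [["a", "b"], ["a", "c"], ["b", "b"]]
ranked = sorted(corpus, key=lambda d: tf_idf_score(["b"], d, corpus), reverse=True)
print(ranked)
```

Because the score is built from counts alone, reordering the words inside a text leaves its score unchanged.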

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/27, G06F17/30
CPC: G06F16/30, G06F40/194, G06F40/289
Inventors: 文杰锋, 刘楚雄
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD