News sentence clustering method, device and storage medium based on semantic similarity

A semantic-similarity and clustering technology, applied to a news sentence clustering method, device and storage medium based on semantic similarity. It addresses the problems that existing methods ignore the importance of word elements and struggle to reconcile clustering quality with efficiency, and it achieves efficient clustering.

Active Publication Date: 2021-07-16
PING AN TECH (SHENZHEN) CO LTD
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] Commonly used news clustering methods often ignore the importance of word elements when calculating and comparing sentence similarity. Because many words are polysemous, these methods tend to group different types of information together.
Some news clustering methods do take into account the semantic information behind the surface text of keywords, and use a knowledge platform with rich semantic concepts as an intermediate reference space for calculating the similarity of news sentences, but they struggle to resolve the conflict between clustering quality and efficiency.


Embodiment Construction

[0036] The principle and spirit of the present invention will be described below with reference to several specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0037] Those skilled in the art will appreciate that the embodiments of the present invention can be implemented as a method, apparatus, device, system or computer program product. The present invention can therefore be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software.

[0038] According to an embodiment of the present invention, a method, device and storage medium for clustering news sentences based on semantic similarity are proposed.

[0039] Figure 1 is a schematic diagram of the operating environment of a preferred embodiment of the electronic device of the present invention.

[0...

Abstract

The invention provides a method for clustering news sentences based on semantic similarity, comprising the following steps: preprocessing the news sentences of a corpus to extract available words; training a continuous bag-of-words model on the available words to obtain an initial word vector for each available word; iteratively training the continuous bag-of-words model using the initial sentence vector of each news sentence together with the initial word vectors of the available words adjacent, on the left and right, to a given available word in that sentence, to obtain the current word vector of each available word in the news sentence and the final sentence vector of the news sentence; merging, for each news sentence, the average of the word vectors of all its available words, the one-hot vector of high-frequency words and the final sentence vector to obtain the semantic vector of the news sentence; and calculating the distance between semantic vectors to obtain the semantic similarity between different news sentences, and clustering the news sentences of the corpus accordingly. The invention also provides an electronic device and a computer-readable storage medium.
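The last two steps of the abstract (merging the three components into a semantic vector, then clustering by distance) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the word vectors and sentence vectors have already been trained, and it uses toy hand-written values in their place. Several choices are assumptions that the abstract does not specify: "merging" is read as concatenation, the "one-hot vector of high-frequency words" is read as a binary indicator over a fixed high-frequency vocabulary, the distance is cosine distance, and the clustering is a simple greedy threshold pass. All function names, the vocabulary and the threshold are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def high_freq_indicator(tokens, high_freq_words):
    """Binary indicator over a fixed high-frequency vocabulary (assumed reading)."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in high_freq_words]


def semantic_vector(tokens, word_vecs, sent_vec, high_freq_words):
    """Concatenate mean word vector, indicator vector and sentence vector."""
    average = mean_vector([word_vecs[t] for t in tokens])
    return average + high_freq_indicator(tokens, high_freq_words) + list(sent_vec)


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def greedy_cluster(vectors, threshold):
    """Assign each vector to the first cluster whose seed is within threshold."""
    clusters = []  # each cluster is a list of sentence indices; first is the seed
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine_distance(vector, vectors[cluster[0]]) <= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Toy stand-ins for trained vectors (hypothetical values, not real model output).
word_vecs = {
    "stocks": [1.0, 0.0], "rise": [0.9, 0.1], "markets": [0.8, 0.2],
    "team": [0.0, 1.0], "wins": [0.1, 0.9],
}
sentences = [["stocks", "rise"], ["markets", "rise"], ["team", "wins"]]
sent_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
high_freq_words = ["rise", "wins"]  # assumed high-frequency vocabulary

semantic_vecs = [
    semantic_vector(tokens, word_vecs, sent_vec, high_freq_words)
    for tokens, sent_vec in zip(sentences, sent_vecs)
]
clusters = greedy_cluster(semantic_vecs, threshold=0.3)
print(clusters)
```

With these toy inputs the two finance sentences fall into one cluster and the sports sentence into another. In practice the threshold and the clustering algorithm would need to be chosen for the corpus at hand.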

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a news sentence clustering method, device and storage medium based on semantic similarity.

Background technique

[0002] With the development of Internet technology, the amount of information owned by humans has shown explosive growth, and the amount of news text data has also increased rapidly. In the face of a huge amount of news corpus, clustering sentences can facilitate the inductive analysis of similar news and realize the comprehensive utilization of news data.

[0003] The current commonly used news clustering methods often ignore the importance of word elements when calculating and comparing sentence similarities. The existence of a large number of polysemous words makes it easy for these clustering methods to gather different types of information together. There are also some news clustering methods that take into account the role of semantic information behind the s...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/30, G06F40/284, G06F40/289
CPC: G06F16/358, G06F40/284, G06F40/289, G06F40/30
Inventor: 徐冰, 汪伟, 肖京
Owner: PING AN TECH (SHENZHEN) CO LTD