News sentence clustering method based on semantic similarity, device and storage medium

A news sentence clustering method based on semantic similarity, together with a device and a storage medium, in the field of computer technology. It addresses problems such as ignoring the importance of word elements and the difficulty of reconciling clustering quality with efficiency, with the aim of achieving efficient clustering.

Active Publication Date: 2018-02-09
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] The current commonly used news clustering methods often ignore the importance of word elements when calculating and comparing sentence similarity. The existence of a large number of polysemous words makes it easy for these clustering methods to gather different types of information together.
Some news clustering methods do take into account the semantic information behind the surface text of keywords, using a knowledge platform rich in semantic concepts as an intermediate reference space for calculating the similarity of news sentences, but they struggle to resolve the conflict between clustering quality and efficiency.

Examples

Embodiment Construction

[0036] The principle and spirit of the present invention will be described below with reference to several specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0037] Those skilled in the art know that the embodiments of the present invention can be implemented as a method, device, system or computer program product. Therefore, the present invention can be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or as a combination of hardware and software.

[0038] According to an embodiment of the present invention, a method, device and storage medium for clustering news sentences based on semantic similarity are proposed.

[0039] Figure 1 is a schematic diagram of the operating environment of a preferred embodiment of the electronic device of the present invention.

[0...

Abstract

The invention provides a news sentence clustering method based on semantic similarity. The method includes the following steps: preprocessing the news sentences of a corpus and extracting the available words; using the available words to train a continuous bag-of-words model to obtain an initial word vector for each available word; using an initial sentence vector of each news sentence, together with the initial word vectors of the available words adjoining a given available word on its left and right in that sentence, to train the continuous bag-of-words model iteratively, obtaining a current word vector for each available word in the news sentence and a final sentence vector for the news sentence; merging the average of the word vectors of all the available words, the one-hot vectors of high-frequency words, and the final sentence vector of each news sentence to obtain a semantic vector of the news sentence; and calculating the distances between the semantic vectors to obtain the semantic similarity between different news sentences, and clustering the news sentences of the corpus accordingly. The invention also provides an electronic device and a computer-readable storage medium.
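The final two steps of the abstract (merging the three components into a semantic vector, then clustering by distance) can be illustrated with a minimal sketch. It is not the patented implementation. Several things are assumptions made only for illustration: the word and sentence vectors are hand-written toy values rather than outputs of a trained continuous bag-of-words model, the "one-hot vectors of high-frequency words" are read as a presence indicator over a small high-frequency word list, cosine distance stands in for the unspecified distance measure, and a simple single-pass threshold grouping stands in for the unspecified clustering algorithm. The names `semantic_vector`, `cosine_distance` and `threshold_cluster` are hypothetical.

```python
import math


def semantic_vector(tokens, word_vecs, sent_vec, high_freq_words):
    """Concatenate: mean word vector + high-frequency-word indicator + sentence vector."""
    word_dim = len(next(iter(word_vecs.values())))
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if known:
        # Column-wise average of the word vectors found in this sentence.
        mean = [sum(col) / len(known) for col in zip(*known)]
    else:
        mean = [0.0] * word_dim
    present = set(tokens)
    # Assumption: one slot per high-frequency word, 1.0 if it occurs in the sentence.
    indicator = [1.0 if w in present else 0.0 for w in high_freq_words]
    return mean + indicator + list(sent_vec)


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def threshold_cluster(vectors, max_dist):
    """Single-pass grouping: join the first cluster whose seed is within max_dist."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_distance(vec, vectors[cluster[0]]) <= max_dist:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy data (hypothetical values, not trained embeddings).
word_vecs = {
    "stocks": [1.0, 0.0], "rise": [0.9, 0.1],
    "shares": [1.0, 0.1], "climb": [0.8, 0.2],
    "rain": [0.0, 1.0], "storm": [0.1, 0.9],
}
sentences = [["stocks", "rise"], ["shares", "climb"], ["rain", "storm"]]
sent_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
high_freq_words = ["stocks", "rain"]

vectors = [
    semantic_vector(tokens, word_vecs, sv, high_freq_words)
    for tokens, sv in zip(sentences, sent_vecs)
]
clusters = threshold_cluster(vectors, max_dist=0.3)
print(clusters)
```

With these toy values the two finance sentences fall into one group and the weather sentence into another. In practice the threshold, the distance measure and the clustering algorithm would all need to be chosen and tuned for the corpus at hand.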

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a news sentence clustering method, device and storage medium based on semantic similarity.

Background art

[0002] With the development of Internet technology, the amount of information owned by humans has shown explosive growth, and the amount of news text data has also increased rapidly. In the face of a huge news corpus, clustering sentences can facilitate the inductive analysis of similar news and realize the comprehensive utilization of news data.

[0003] The current commonly used news clustering methods often ignore the importance of word elements when calculating and comparing sentence similarities. The existence of a large number of polysemous words makes it easy for these clustering methods to gather different types of information together. There are also some news clustering methods that take into account the role of semantic information behind the s...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/358, G06F40/284, G06F40/289, G06F40/30
Inventors: 徐冰, 汪伟, 肖京
Owner: PING AN TECH (SHENZHEN) CO LTD