Document clustering method and platform, server and computer readable medium

A document clustering technology, applied in the computer field, that addresses the low accuracy and recall of fine-grained news event detection.

Active Publication Date: 2021-02-09
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] In existing technologies, the LDA (Latent Dirichlet Allocation, document topic generation) model or the KeyGraph algorithm is used to cluster news documents, but these methods tend to produce large clusters of event news documents, which lowers the accuracy and recall of fine-grained news event detection.



Examples


Embodiment Construction

[0073] In order for those skilled in the art to better understand the technical solution of the present disclosure, the document clustering method, platform, server and computer-readable medium provided by the present disclosure will be described in detail below with reference to the accompanying drawings.

[0074] Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0075] The terminology used herein is for describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise...


Abstract

The disclosure provides a document clustering method, including: constructing a word co-occurrence network from multiple documents to be clustered; calculating the link similarity between any two links connected to the same node in the word co-occurrence network; extracting a plurality of keyword communities from the word co-occurrence network according to the link similarity; assigning each document to be clustered to a corresponding keyword community according to the document representation vector of each document and the community representation vector of each keyword community; and generating an initial document cluster for each keyword community according to the assignment results, wherein all documents assigned to the same keyword community constitute one initial document cluster. The present disclosure also provides a document clustering platform, a server and a computer-readable medium.
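The steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how link similarity, community extraction, or the representation vectors are computed, so everything concrete here is an assumption made for illustration: link similarity is taken as the Jaccard overlap of the inclusive neighbor sets of the two non-shared endpoints, communities are formed by merging links whose similarity reaches a hypothetical `threshold`, and both representation vectors are plain bag-of-words counts compared by cosine similarity. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_cooccurrence(docs):
    """Word co-occurrence network: words are nodes, and two words are
    linked when they appear in the same document."""
    neighbors = defaultdict(set)
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def link_similarity(neighbors, i, j):
    """Similarity of links (i, k) and (j, k) that share node k.
    Assumed measure: Jaccard overlap of the inclusive neighbor sets of i and j."""
    ni = neighbors[i] | {i}
    nj = neighbors[j] | {j}
    return len(ni & nj) / len(ni | nj)


def keyword_communities(neighbors, threshold):
    """Assumed extraction rule: merge any two links that share a node and
    have similarity >= threshold (union-find over links). The words touched
    by each merged group of links form one keyword community."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Register every link so that isolated links also form a community.
    for k, nbrs in neighbors.items():
        for n in nbrs:
            find(tuple(sorted((k, n))))

    for k, nbrs in neighbors.items():
        for i, j in combinations(sorted(nbrs), 2):
            r1 = find(tuple(sorted((i, k))))
            r2 = find(tuple(sorted((j, k))))
            if r1 != r2 and link_similarity(neighbors, i, j) >= threshold:
                parent[r1] = r2

    groups = defaultdict(set)
    for edge in list(parent):
        groups[find(edge)].update(edge)
    return sorted(sorted(g) for g in groups.values())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_documents(docs, communities):
    """Assign each document to the keyword community whose vector is closest
    to the document vector; documents sharing a community form an initial cluster."""
    community_vectors = [Counter(c) for c in communities]
    clusters = defaultdict(list)
    for idx, words in enumerate(docs):
        doc_vector = Counter(words)
        best = max(range(len(community_vectors)),
                   key=lambda c: cosine(doc_vector, community_vectors[c]))
        clusters[best].append(idx)
    return dict(clusters)
```

With four toy documents, two about an earthquake and two about an election, calling `build_cooccurrence`, then `keyword_communities` with a threshold of 0.7, then `assign_documents` yields two keyword communities and two initial document clusters. Raising the threshold splits communities into smaller, possibly overlapping ones, since a word can belong to links in different groups.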

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, and in particular to a document clustering method and platform, a server, and a computer-readable medium.

Background technique

[0002] An event refers to something that happened in a certain place on a certain day. Many events occur and are reported around the world every day, generating a large volume of Internet news. Clustering each day's massive volume of Internet news documents makes it possible to detect fine-grained news events (such as daily-level news events) automatically and in real time, which can support public opinion analysis, news recommendation, or automatic article writing.

[0003] In existing technologies, the LDA (Latent Dirichlet Allocation, document topic generation) model or the KeyGraph algorithm is used to cluster news documents, but the above-mentioned methods tend to cluster large...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F18/23, G06F18/22
Inventor: 陈亮宇, 郭林森, 肖欣延, 吕雅娟, 佘俏俏
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD