Semantic-based document clustering method and system and computer equipment

A semantic-based document clustering technology, applied in the field of artificial intelligence, that addresses the problem of low clustering accuracy.

Active Publication Date: 2020-09-18
PING AN BANK CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of this, the purpose of the embodiments of the present invention is to provide a semantic-based document clustering method, system, computer equipment, and computer-readable storage medium that solve the problem of low clustering accuracy in the prior art.

Examples

Embodiment 1

[0058] Referring to Figure 1, a flow chart of the steps of a semantic-based document clustering method according to an embodiment of the present invention is shown. It can be understood that the flow chart in this method embodiment does not limit the order in which the steps are executed.

[0059] The details are as follows:

[0060] Step 100: obtain an input document and preprocess it to obtain a processed input document;

[0061] Specifically, a document is an article, or a sentence or paragraph, organized around a central meaning. The input may be a single document or multiple documents.

[0062] The processor pulls the input document from a preset path. Alternatively, a cache area can be set up: the server places newly uploaded documents in the cache area, and the processor in the server periodically extracts all documents in the cache area, at the configured pull frequency, as input documents and performs a series of preprocessing steps on them to complete the ...
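The cache-area variant of Step 100 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `DocumentCache` class, its `put` and `drain` methods, and the `preprocess` function are hypothetical names, and the preprocessing shown (lowercasing and splitting into alphanumeric tokens) is an assumption, since the excerpt does not spell out the preprocessing steps.

```python
import re
from collections import deque


class DocumentCache:
    """Hypothetical in-memory cache area holding newly uploaded documents."""

    def __init__(self):
        self._pending = deque()

    def put(self, text):
        # Called by the upload path whenever a new document arrives.
        self._pending.append(text)

    def drain(self):
        # Called once per pull interval: return every cached document
        # and leave the cache empty for the next interval.
        docs = list(self._pending)
        self._pending.clear()
        return docs


def preprocess(text):
    # Assumed preprocessing: lowercase, then split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


cache = DocumentCache()
cache.put("Loan approval RULES, updated.")
cache.put("Credit card fraud detection")

# One pull: every cached document becomes a preprocessed input document.
batch = [preprocess(doc) for doc in cache.drain()]
```

In a real deployment the pull would be driven by a scheduler at the configured frequency; here a single `drain()` call stands in for one pull.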

Embodiment 2

[0165] Referring to Figure 6, a schematic diagram of the program modules of the semantic-based document clustering system of the present invention is shown.

[0166] In this embodiment, the semantic-based document clustering system 20 may include, or be divided into, one or more program modules. The program modules are stored in a storage medium and executed by one or more processors to carry out the present invention and implement the semantic-based document clustering method described above. A program module, as referred to in the embodiments of the present invention, is a series of computer program instruction segments capable of completing a specific function, and is better suited than the program itself to describing how the semantic-based document clustering system 20 executes in the storage medium. The following description introduces the function of each program module of this embodiment:

[0167] The preprocessing module 200 is ...

Embodiment 3

[0177] Referring to Figure 7, a schematic diagram of the hardware architecture of the computer device according to Embodiment 3 of the present invention is shown. In this embodiment, the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a personal digital assistant (PDA), a smart phone, a notebook computer, a netbook, a personal computer, or another similar device. As shown in the figure, the computer device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and a semantic-based document clustering system 20, which can communicate with each other through a system bus. Specifically:

[0178] In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for...

Abstract

The invention relates to the technical field of artificial intelligence and provides a semantic-based document clustering method. The method comprises: obtaining an input document and preprocessing it; performing word frequency statistics and inverse document frequency calculation on each word contained in the processed input document to construct a word frequency-inverse document frequency (TF-IDF) matrix; inputting the words used in the word frequency statistics into a pre-stored natural language processing model to obtain a similarity matrix matched with the TF-IDF matrix; performing semantic propagation on the TF-IDF matrix according to the similarity matrix to obtain a second TF-IDF matrix; performing biclustering on the second TF-IDF matrix to obtain at least one bicluster, each bicluster comprising a document cluster and a word cluster; and assigning a label to each document in the document cluster according to the feature words contained in the word cluster. The invention addresses the low accuracy of document clustering results in the prior art. The invention also relates to the field of blockchains, in that the natural language processing model can be stored on a blockchain.
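The matrix construction and semantic propagation steps in the abstract can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than details from the patent: the function names `tf_idf_matrix` and `propagate` are hypothetical, the TF-IDF weighting is the plain textbook form (relative word frequency multiplied by the logarithm of the inverse document frequency), the similarity matrix is a hand-written toy instead of the output of a natural language processing model, and propagation is modelled as a matrix product of the TF-IDF matrix with the similarity matrix. The biclustering and labelling steps are not shown.

```python
import math


def tf_idf_matrix(docs):
    """Return (vocabulary, matrix): one row per document, one column per word."""
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    matrix = []
    for doc in docs:
        row = []
        for w in vocab:
            tf = doc.count(w) / len(doc)
            idf = math.log(n_docs / df[w])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


def propagate(matrix, similarity):
    """Assumed propagation rule: TF-IDF matrix times word similarity matrix."""
    n_words = len(similarity)
    return [
        [sum(row[k] * similarity[k][j] for k in range(n_words))
         for j in range(n_words)]
        for row in matrix
    ]


# Toy, already-tokenized documents.
docs = [["loan", "rate"], ["credit", "rate"], ["soccer", "goal"]]
vocab, tfidf = tf_idf_matrix(docs)

# Toy similarity matrix: identity plus a hand-picked 0.8 between "loan" and "credit".
sim = [[1.0 if i == j else 0.0 for j in range(len(vocab))]
       for i in range(len(vocab))]
i_loan, i_credit = vocab.index("loan"), vocab.index("credit")
sim[i_loan][i_credit] = sim[i_credit][i_loan] = 0.8

second = propagate(tfidf, sim)
```

Under these assumptions, the first document has zero weight for "credit" before propagation and a nonzero weight afterwards, because "loan" shares part of its weight with its semantic neighbour. That is the intuition behind clustering on the second matrix instead of the raw one.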

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a semantic-based document clustering method, system, computer equipment, and computer-readable storage medium.

Background

[0002] With the popularization of the Internet and the rapid growth of information, we face more and more text to process every day, and it is easy to be submerged in the ocean of information. With the development of artificial intelligence, people increasingly demand personalized services. Providing personalized services from massive amounts of text, and thereby a better service and experience, is a major challenge. The basis for meeting this challenge is the ability to classify text automatically. Text clustering technology has been widely used in personalized news recommendation, text sentiment analysis, text information filtering and so...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35, G06F40/284, G06F40/289, G06F40/30
CPC: G06F16/3344, G06F16/35, G06F16/355, G06F40/284, G06F40/289, G06F40/30
Inventor: 余显学
Owner: PING AN BANK CO LTD