
Text information clustering method and text information clustering system

A text information clustering method and system, applied in the field of text processing, that addresses problems in existing approaches such as heavy use of computing resources, degraded results, and slow computation. Its effects are faster computation, lower system resource consumption, and improved computing efficiency.

Status: Inactive. Publication date: 2017-09-05
ALIBABA GRP HLDG LTD
Cites: 5. Cited by: 11.

AI Technical Summary

Problems solved by technology

[0003] Existing text information clustering analysis becomes slow and occupies too many computing resources as the number of topics increases. However, if the number of topics is limited, articles on different topics are mixed together, which degrades the final result.
[0004] Therefore, a new text information clustering technology is needed to solve the problems of slow computation and excessive use of computing resources in the existing technology.



Examples


Example 1

[0034] The first embodiment of the present application proposes a text information clustering method. Figure 2 shows a flowchart of a text information clustering method according to an embodiment of the present application. The text information clustering method of the first embodiment includes the following steps:

[0035] Step S101, performing word segmentation processing on each piece of text information in multiple pieces of text information to form multiple words;

[0036] In this step, word segmentation processing may be performed on each piece of text information first. For example, "Python is an object-oriented, interpreted computer programming language" can be divided into "Python / is / a / oriented / object / interpretation / type / computer / program / design / language".

[0037] Through this step of processing, a sentence is divided into several words, which is convenient for subsequent processing operations.
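Step S101 can be illustrated with a minimal sketch. The function name `segment` and the regex-based approach are assumptions made for illustration only: the source does not say which segmenter is used, and text in languages without whitespace word boundaries would need a dictionary-based or statistical segmenter instead.

```python
import re

def segment(text: str) -> list[str]:
    """Split one piece of text into lowercase word tokens (hypothetical helper)."""
    # \w+ matches runs of letters, digits, and underscores, so hyphens,
    # commas, and spaces all act as separators.
    return re.findall(r"\w+", text.lower())

docs = [
    "Python is an object-oriented, interpreted computer programming language",
    "Java is a compiled, object-oriented language",
]

# One token list per piece of text information, ready for clustering.
tokenized = [segment(d) for d in docs]
print(tokenized[0])
```

Each document becomes a list of words, which is the input form the later clustering steps operate on.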

[0038] In this step, the words...

Example 2

[0051] The second embodiment of the present application proposes a text information clustering method. Figure 3 shows a flowchart of the text information clustering method according to the second embodiment of the present application. The text information clustering method of the second embodiment includes the following steps:

[0052] Step S201, performing word segmentation processing on each piece of text information in multiple pieces of text information to form multiple words;

[0053] Step S202, using the LDA algorithm to initially cluster the multiple pieces of text information after word segmentation according to the multiple words to form multiple first-level topics, and each of the first-level topics includes at least two pieces of text information;

[0054] Step S203, according to preset rules, based on the number of pieces of text information under each of the first-level topics, determine the number of second-level topics under each of...
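The two-level flow in steps S201 to S203 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: `lda_assign` is a tiny collapsed-Gibbs LDA written for this example, `second_level_count` is a hypothetical preset rule (one second-level topic per `docs_per_topic` documents), and the sketch does not enforce the requirement that each first-level topic contain at least two pieces of text information. The source excerpt does not state the actual rule or hyperparameters.

```python
import math
import random
from collections import defaultdict

def lda_assign(docs, k, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed-Gibbs LDA; returns the dominant topic index per document."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * k for _ in docs]                 # words of doc d in topic t
    topic_word = [defaultdict(int) for _ in range(k)]   # count of word w in topic t
    topic_total = [0] * k                               # total words in topic t
    z = []                                              # topic of every word token
    for d, words in enumerate(docs):
        z.append([rng.randrange(k) for _ in words])     # random initial topics
        for w, t in zip(words, z[d]):
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                t = z[d][i]                             # remove the current assignment
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights)[0]   # resample the topic
                z[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return [max(range(k), key=lambda j: doc_topic[d][j]) for d in range(len(docs))]

def second_level_count(n_docs, docs_per_topic=2):
    """Hypothetical preset rule: one second-level topic per `docs_per_topic` documents."""
    return max(1, math.ceil(n_docs / docs_per_topic))

def two_level_cluster(docs, n_first, docs_per_topic=2):
    """Map each document index to a (first-level topic, second-level topic) pair."""
    first = lda_assign(docs, n_first)                   # initial clustering (S202)
    groups = defaultdict(list)
    for idx, t in enumerate(first):
        groups[t].append(idx)
    result = {}
    for t, members in groups.items():
        k2 = second_level_count(len(members), docs_per_topic)   # rule from S203
        second = lda_assign([docs[i] for i in members], k2)     # cluster within topic t
        for idx, s in zip(members, second):
            result[idx] = (t, s)
    return result

docs = [
    ["python", "language", "code"],
    ["python", "code", "interpreter"],
    ["football", "match", "goal"],
    ["football", "goal", "league"],
    ["language", "code", "compiler"],
]
labels = two_level_cluster(docs, n_first=2)
print(labels)
```

Keeping `n_first` small keeps the first pass cheap, while the per-topic count lets larger first-level topics be split more finely than smaller ones.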

Example 3

[0077] The third embodiment of the present application proposes a text information clustering method. Figure 4 shows a flowchart of the text information clustering method according to the third embodiment of the present application. The text information clustering method of the third embodiment includes the following steps:

[0078] Step S301, performing word segmentation processing on each piece of text information in multiple pieces of text information to form multiple words;

[0079] Step S302, using the LDA algorithm to perform initial clustering on the plurality of pieces of text information after word segmentation according to the plurality of words to form a plurality of first-level topics, each of which includes at least two pieces of text information;

[0080] Step S303, according to preset rules, based on the number of pieces of text information under each of the first-level topics, determine the number of second-level topics under each...


Abstract

The embodiment of the invention discloses a text information clustering method and system. The clustering method comprises the following steps: word segmentation is performed on each piece of text information among multiple pieces of text information; primary clustering is performed on the segmented pieces of text information to form multiple primary topics, each of which includes at least two pieces of text information; according to preset rules, the number of secondary topics under each primary topic is determined based on the number of pieces of text information under that primary topic; and secondary clustering is performed on the at least two pieces of text information included in each primary topic, according to the number of secondary topics under that primary topic, to form multiple secondary topics. With this method, the total number of primary topics is reduced during primary clustering, which improves computing efficiency, and the number of secondary topics is determined dynamically according to the text information during secondary clustering, which speeds up the computation of secondary topics.

Description

Technical field

[0001] The present application relates to the field of text processing, in particular to a text information clustering method and a text information clustering system.

Background technique

[0002] Clustering text information by topic has very important applications in the field of text processing. Because text information covers a wide range of subjects and a huge amount of it is generated every day, large-scale text clustering analysis is of great significance.

[0003] Existing text information clustering analysis becomes slow and occupies too many computing resources as the number of topics increases. However, if the number of topics is limited, articles on different topics are mixed together, which degrades the final result.

[0004] Therefore, it is necessary to propose a new text information clustering technology to solve the problems of slow calculation and excessive...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/289, G06F40/53, G06V30/414, G06F18/23
Inventors: 付子豪, 张凯, 蔡宁, 杨旭, 褚崴
Owner: ALIBABA GRP HLDG LTD