Suffix tree clustering method on basis of word segmentation and part-of-speech analysis

A clustering method combined with word segmentation, applied in the field of computer science, that addresses the problems of long original documents and heavy consumption of computing resources by reducing the processing dimension.

Status: Inactive; Publication Date: 2013-07-31
BEIJING UNIV OF POSTS & TELECOMM
2 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0005] The suffix tree clustering method still has room for improvement. For example, if the original document is too long, the computer needs more time to process it; text also contains a lot of redundant information, and processing it consumes a lot of computing resources. Different types of text differ markedly in the structure of their words, so treating them all in the same way is clearly not the best choice.



Examples


Embodiment Construction

[0035] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0036] To illustrate the suffix tree clustering method based on word segmentation and part-of-speech analysis, here is an example of processing and clustering the document "A.txt". A.txt contains an introductory sentence about "basketball"; the specific content is: "Basketball is a ball game played by two teams, and each team has 5 players."
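
As a rough illustration of what a suffix tree clustering step receives as input, the sketch below enumerates the word-level suffixes of the example sentence. It is not the patent's procedure: the function name `word_suffixes` is hypothetical, and the regular-expression tokenizer is an assumption standing in for a real word segmenter applied to the original text.

```python
import re

def word_suffixes(sentence):
    """Return every word-level suffix of a sentence as a tuple of lowercase words.

    Assumption: a simple regex split stands in for real word segmentation.
    """
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [tuple(words[i:]) for i in range(len(words))]

sentence = ("Basketball is a ball game played by two teams, "
            "and each team has 5 players.")
suffixes = word_suffixes(sentence)

# Print the first few suffixes; a suffix tree would store all of them
# and share common prefixes between them.
for suffix in suffixes[:3]:
    print(" ".join(suffix))
```

The number of suffixes grows with the number of words in the document, which is why shortening the document before building the tree reduces the work the clustering step has to do.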

[0037] The process of a suffix tree clustering method based on word segmentation and pa...



Abstract

The invention provides a suffix tree clustering method based on word segmentation and part-of-speech analysis. The method comprises three parts: a document word segmentation module, a part-of-speech analysis module, and a suffix tree clustering module. It performs word segmentation, part-of-speech tagging, word weight calculation, and essence extraction on a document, thereby reducing the dimension of the original document and the complexity of the suffix tree clustering procedure while ensuring clustering accuracy.
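
The preprocessing stages named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the toy lexicon `TOY_POS`, the per-tag weights in `POS_WEIGHT`, the frequency-times-tag-weight formula, and the `top_k` cutoff are all hypothetical, since the source does not give the actual weighting scheme.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "basketball": "n", "ball": "n", "game": "n", "teams": "n",
    "team": "n", "players": "n", "played": "v", "has": "v", "is": "v",
    "a": "det", "each": "det", "by": "prep", "and": "conj",
    "two": "num", "5": "num",
}

# Assumed weights per tag; tags not listed here get weight 0.
POS_WEIGHT = {"n": 2.0, "v": 1.0}

def segment(text):
    """Word segmentation (assumption: a regex split on the English text)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tag(words):
    """Part-of-speech tagging via the toy lexicon; unknown words get 'x'."""
    return [(w, TOY_POS.get(w, "x")) for w in words]

def extract_essence(text, top_k=5):
    """Keep the top_k highest-weighted words, in their original order."""
    tagged = tag(segment(text))
    counts = Counter(w for w, _ in tagged)
    # Assumed weight: term frequency multiplied by the tag weight.
    weights = {w: counts[w] * POS_WEIGHT.get(p, 0.0) for w, p in tagged}
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = {w for w, score in ranked[:top_k] if score > 0}
    return [w for w, _ in tagged if w in keep]

sentence = ("Basketball is a ball game played by two teams, "
            "and each team has 5 players.")
essence = extract_essence(sentence)
print(essence)
```

The reduced word list, rather than the full sentence, would then be passed to the suffix tree clustering module, so the tree is built over fewer words.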

Description

Technical field

[0001] The invention relates to a suffix tree clustering method based on word segmentation and part-of-speech analysis applied to search engines, and belongs to the technical field of computer science.

Background technique

[0002] With the continuous development of information technology, the data on the network is increasing at an alarming rate, and people's demand for network content is also increasing. Network content search has become the most widely used Internet service at present. The search engine is the main channel for network content search, and all countries are developing search engines with independent intellectual property rights, and are constantly conducting research on key technologies of search engines.

[0003] The content on the Internet involves all aspects, and there is a large amount of unorganized and uncategorized information, which has caused certain difficulties for people who want to quickly obtain information on specific aspects...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 陆月明, 张吉伟, 党秋月
Owner: BEIJING UNIV OF POSTS & TELECOMM