Semantic analysis based text clustering system and method

A semantic analysis and text clustering technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problems of weak means of obtaining information, low information utilization, and difficult information search, and it reduces the clustering workload, gives better clustering results, and avoids duplicated effort.

Inactive Publication Date: 2014-12-03
ANHUI HUAZHEN INFORMATION SCI & TECH
Cites: 4. Cited by: 24.
AI Technical Summary

Problems solved by technology

[0002] In recent years, with the large-scale popularization of the network and the improvement of enterprise informatization, various resources have grown explosively. However, most of the information is stored in text databases, and the means of obtaining specific content from such semi-structured or unstructured data are weak, which makes information hard to search and leaves it underused.

Examples

Embodiment Construction

[0034] Referring to Figure 1, the semantic analysis-based text clustering system proposed by the present invention includes a preprocessing module, a semantic analysis module, a vector generation module and a clustering module, connected in sequence.

[0035] The preprocessing module performs Chinese word segmentation and stop-word filtering on the text.
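As a rough illustration of this preprocessing step, the sketch below segments Chinese text by forward maximum matching and then drops stop words. The patent text does not say which segmentation algorithm is used, so forward maximum matching is an assumption here, and `DICTIONARY`, `STOP_WORDS`, `segment` and `preprocess` are hypothetical names with toy data.

```python
# Toy word dictionary and stop-word list: hypothetical data for illustration only.
DICTIONARY = {"文本", "聚类", "语义", "分析", "系统", "方法"}
STOP_WORDS = {"的", "和", "是", "了", "在"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


def preprocess(text):
    """Segment the text, then filter out stop words and whitespace tokens."""
    return [t for t in segment(text) if t not in STOP_WORDS and not t.isspace()]


if __name__ == "__main__":
    print(preprocess("语义分析的文本聚类方法"))
```

A production system would more likely rely on a full segmentation lexicon or a statistical segmenter; the point here is only the order of operations, segmentation first and stop-word filtering second.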

[0036] The semantic analysis module performs semantic similarity analysis and feature item weight calculation, extracts keyword feature items, and normalizes the text, laying the foundation for text vectorization. The module has a built-in ontology and an entity dictionary. The ontology is used for semantic analysis of the text: its basic unit is the concept, concepts form a concept tree, and the concept tree forms the ontology. Conceptualizing the text addresses the problems of polysemy (one word with several meanings) and synonymy (several words with one meaning). The entity dictionary is used to extract entities from the text, so as to discard the content that has no practi...
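The concept tree and entity dictionary described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the concepts, the terms, the merging of term lookup and concept mapping into one table, and the path-based similarity measure, since the source does not specify how similarity is computed.

```python
# Hypothetical concept tree, stored as child concept -> parent concept.
PARENT = {
    "vehicle": None,
    "car": "vehicle",
    "bicycle": "vehicle",
}

# Hypothetical entity dictionary: surface term -> concept it denotes.
TERM_TO_CONCEPT = {
    "汽车": "car",
    "轿车": "car",
    "自行车": "bicycle",
    "单车": "bicycle",
}


def conceptualize(tokens):
    """Map tokens to concepts; tokens absent from the dictionary are discarded."""
    return [TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT]


def ancestors(concept):
    """Return the path from a concept up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def similarity(a, b):
    """Assumed measure: 1 / (1 + number of edges between a and b in the tree)."""
    path_a, path_b = ancestors(a), ancestors(b)
    for depth_a, node in enumerate(path_a):
        if node in path_b:
            return 1.0 / (1 + depth_a + path_b.index(node))
    return 0.0  # no common ancestor


if __name__ == "__main__":
    print(conceptualize(["汽车", "的", "轿车", "单车"]))
    print(similarity("car", "bicycle"))
```

Because two synonyms map to the same concept, they end up as the same feature item after conceptualization, which is one way normalization of this kind can shrink the feature space before vectorization.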

Abstract

The invention provides a semantic analysis-based text clustering system and method. Texts are clustered according to semantic analysis, which reduces the workload of the clustering algorithm, improves working efficiency, and gives a better clustering result. The system comprises a preprocessing module, a semantic analysis module, a vector generation module and a clustering module, connected in sequence. The preprocessing module performs Chinese word segmentation and stop-word filtering on the texts. The semantic analysis module performs semantic similarity analysis and feature item weight calculation, extracts keyword feature items and normalizes the texts; it contains an ontology and an entity dictionary, where the ontology is used for semantic analysis of the texts and the entity dictionary is used for entity extraction. The basic unit of the ontology is the concept, and concepts form a concept tree, which in turn forms the ontology. The vector generation module contains a vector space model and is used to vectorize the texts.
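To make the last two stages concrete, the following sketch turns tokenized texts into vector space model vectors and groups them. The abstract names neither the weighting scheme nor the clustering algorithm, so smoothed TF-IDF weights, cosine similarity, single-pass clustering and the `threshold` value are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dict: term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(vectors, threshold=0.3):
    """Put each vector in the first cluster whose seed is similar enough, else open a new one."""
    clusters = []  # lists of document indices; the first index in each list is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    docs = [["car", "engine", "car"], ["car", "wheel"],
            ["cluster", "text"], ["text", "vector", "cluster"]]
    print(single_pass_cluster(tfidf_vectors(docs)))
```

In the pipeline the abstract describes, the input tokens would be the normalized feature items produced by the semantic analysis module rather than raw words, so fewer distinct dimensions reach the clustering step.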

Description

Technical field

[0001] The invention relates to the technical field of text information processing, in particular to a text clustering system and method based on semantic analysis.

Background technique

[0002] In recent years, with the large-scale popularization of the network and the improvement of enterprise informatization, various resources have grown explosively. However, most of the information is stored in text databases, and the means of obtaining specific content from such semi-structured or unstructured data are weak, which makes information hard to search and leaves it underused. As a result, research on text mining, information filtering and information retrieval has reached an unprecedented peak. Fast, high-quality text clustering technology can organize a large amount of text information into a small number of meaningful clusters, so that the texts in the same cluster have a high degree of similarity, while the texts i...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 贾岩
Owner: ANHUI HUAZHEN INFORMATION SCI & TECH