POI (point of interest) Chinese text categorizing method based on local random word density model

A text classification technique based on vocabulary density, applied in text database clustering/classification, unstructured text data retrieval, and special data processing applications. It addresses the inability of existing methods to achieve good classification performance.

Inactive Publication Date: 2014-02-26
段炼
3 Cites, 26 Cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the embodiments of the present invention is to provide a method for classifying Chinese text based on the local stochastic vocabulary density model POI, wh



Examples


Embodiment Construction

[0088] To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0089] The application principle of the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0090] As shown in Figure 1, the POI Chinese text classification method based on the local random vocabulary density model according to the embodiment of the present invention comprises the following steps:

[0091] S101: Use a Bayesian classifier to determine whether the text topic is related to POI, then screen out feature words using the improved vocabulary concentration, dispersion, and frequency methods to construct a feature space;

[0092] S102: divide the ...
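Step S101 can be illustrated with a minimal sketch, under stated assumptions. The excerpt does not give the patent's "improved" concentration, dispersion, and frequency formulas, so the definitions below are common textbook stand-ins, not the inventors' method: concentration is the share of a word's document occurrences that fall in one category, dispersion is the share of that category's documents containing the word, and frequency is a log-scaled term count. The function names (`train_nb`, `predict_nb`, `score_features`, `select_features`) and the sample data are hypothetical, and texts are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect the counts a multinomial naive Bayes classifier needs."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def score_features(docs, cats):
    """Score (word, category) pairs; the formulas are assumed stand-ins."""
    df = Counter()               # documents containing the word, all categories
    df_c = defaultdict(Counter)  # documents containing the word, per category
    tf_c = defaultdict(Counter)  # total occurrences of the word, per category
    n_c = Counter(cats)          # documents per category
    for tokens, cat in zip(docs, cats):
        tf_c[cat].update(tokens)
        for w in set(tokens):
            df[w] += 1
            df_c[cat][w] += 1
    scores = {}
    for cat in n_c:
        for w, d in df_c[cat].items():
            concentration = d / df[w]        # how category-specific the word is
            dispersion = d / n_c[cat]        # how widely it spreads in the category
            frequency = math.log(1 + tf_c[cat][w])
            scores[(w, cat)] = concentration * dispersion * frequency
    return scores

def select_features(docs, cats, k):
    """Keep the k best-scoring words of each category as the feature space."""
    scores = score_features(docs, cats)
    selected = set()
    for cat in set(cats):
        words = [w for (w, c) in scores if c == cat]
        words.sort(key=lambda w: scores[(w, cat)], reverse=True)
        selected.update(words[:k])
    return selected

# Hypothetical, pre-segmented sample texts.
docs = [["餐厅", "地址", "电话"], ["酒店", "地址", "预订"],
        ["股票", "上涨"], ["球赛", "比分"]]
labels = ["poi", "poi", "other", "other"]
model = train_nb(docs, labels)

poi_docs = [["餐厅", "菜单", "地址"], ["餐厅", "外卖", "地址"],
            ["酒店", "客房", "地址"], ["酒店", "预订", "地址"]]
poi_cats = ["food", "hotel", "hotel", "hotel"][:0] or ["food", "food", "hotel", "hotel"]
features = select_features(poi_docs, poi_cats, k=1)
```

In this toy data, a word such as 地址 appears in every category, so its concentration is low and it is outranked by category-specific words. That is the behavior a concentration term is meant to produce, whatever exact formula the patent uses.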


Abstract

The invention discloses a POI (point of interest) Chinese text classification method based on a local random word density model. The method includes: judging whether the subject of a text is related to POI, and using a modified word concentration, dispersion, and frequency method to screen out feature words and build a feature space; dividing the texts into local areas according to the similarity between each text and the POI categories, converting the texts in each local area into feature vectors through feature mapping matrices, and using an SVM to perform POI text classification. The method offers good execution efficiency, classification coverage, and accuracy. It is planned to be combined with the large HowNet knowledge base to capture the semantic concepts of low-frequency and unseen words and to further improve the ability to distinguish POI texts. It solves the problem that existing conventional feature evaluation functions and text dimensionality reduction methods cannot achieve good classification performance.
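The local-area division and feature-mapping steps named in the abstract can be sketched as follows. This is only an illustration under assumptions: cosine similarity to a per-category centroid stands in for the unspecified "similarity degree", the mapping matrices are hand-written selection matrices rather than anything derived by the patent's method, and the final SVM training step is left out. All names (`assign_area`, `map_features`, `centroids`, `mappings`) and the sample data are hypothetical.

```python
import math

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    return [tokens.count(w) for w in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def assign_area(vec, centroids):
    """Pick the local area whose category centroid is most similar to vec."""
    return max(centroids, key=lambda name: cosine(vec, centroids[name]))

def map_features(vec, matrix):
    """Multiply a length-n row vector by an n x m feature mapping matrix."""
    n_out = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(n_out)]

# Hypothetical global feature space and per-area data.
vocab = ["餐厅", "菜单", "酒店", "客房"]
centroids = {"food": [1, 1, 0, 0], "hotel": [0, 0, 1, 1]}
mappings = {
    "food":  [[1, 0], [0, 1], [0, 0], [0, 0]],  # keeps the two food words
    "hotel": [[0, 0], [0, 0], [1, 0], [0, 1]],  # keeps the two hotel words
}

tokens = ["酒店", "客房", "客房"]            # a pre-segmented text
vec = bag_of_words(tokens, vocab)
area = assign_area(vec, centroids)
local_vec = map_features(vec, mappings[area])
```

The resulting `local_vec` is the lower-dimensional vector that a per-area classifier would consume. Keeping one mapping per area lets each classifier work only with the features relevant to its own group of POI categories.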

Description

Technical field

[0001] The invention belongs to the technical field of massive point-of-interest text classification, and in particular relates to a POI Chinese text classification method based on a local random vocabulary density model.

Background

[0002] The traditional method of collecting massive Point of Interest (POI) data is field surveying by surveying and mapping departments at all levels. Data collected this way is highly precise, but collection efficiency is low, information is updated slowly, and coverage is often insufficient. There are two other POI collection methods: production by professional companies and public VGI (volunteered geographic information) collection, such as Go2Map and Locationary. The former still requires a large amount of manual POI labeling, resulting in insufficient data depth and difficulty in updating; the latter involves the management of large volumes of multi-source POI data, platform heterogeneity, service popul...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 段炼, 胡宝清, 覃开贤
Owner: 段炼