Unnormalized language processing method based on web mining

A non-standard language processing method, applied in the fields of electrical digital data processing and special data processing applications. It addresses the problem of non-standard language in web text and is easy to operate.

Inactive Publication Date: 2010-06-30
张霄凯 +2

AI Technical Summary

Problems solved by technology

Effectively solves the problem o...



Embodiment Construction

[0025] The main purpose of the invention is to provide a method for processing non-standard language with minimal effort. The invention is described further below:

[0026] For typical non-standard language, the present invention adopts a pattern matching method based on the sequential covering algorithm. The specific implementation is as follows. First, typical non-standard words must be collected. To avoid being limited to one field, the collected data must not be concentrated in a single domain, as would happen, for example, if non-standard words were gathered only from car forums or mobile phone forums. For the sake of fairness, all extracted data are domain-independent. The following algorithm is employed to extract the rules that identify this non-standard NIL.

[0027] 1) Let S be the training data set and sen an instance in S. The rule set R is initially empty. If the keyword contained in s is a non-standard word, it is marked ...
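Because paragraph [0027] is truncated here, the patent's own procedure cannot be reproduced. What follows is only a minimal sketch of a generic sequential covering rule learner, under assumptions that are not in the source: each instance in S is a token sequence with a boolean label (True if it contains a non-standard word), a rule is a contiguous token n-gram, and the names `ngrams`, `covers`, `learn_rules`, `min_precision`, and `max_n` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]
Instance = Tuple[Sequence[str], bool]  # (tokens, is_nonstandard) -- assumed layout


def ngrams(tokens: Sequence[str], max_n: int = 2) -> Set[Pattern]:
    """All contiguous n-grams of length 1..max_n (candidate patterns)."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def covers(pattern: Pattern, tokens: Sequence[str]) -> bool:
    """True if the pattern occurs as a contiguous subsequence of tokens."""
    n = len(pattern)
    return any(tuple(tokens[i:i + n]) == pattern
               for i in range(len(tokens) - n + 1))


def learn_rules(S: List[Instance], min_precision: float = 0.9,
                max_n: int = 2) -> List[Pattern]:
    """Generic sequential covering: learn one rule, drop what it covers, repeat."""
    remaining = list(S)
    R: List[Pattern] = []  # the rule set starts empty
    while any(label for _, label in remaining):
        # Candidate patterns come only from still-uncovered positive instances.
        candidates: Set[Pattern] = set()
        for tokens, label in remaining:
            if label:
                candidates |= ngrams(tokens, max_n)
        best, best_key = None, None
        for p in sorted(candidates):  # sorted for deterministic tie-breaking
            pos = sum(1 for t, l in remaining if l and covers(p, t))
            neg = sum(1 for t, l in remaining if not l and covers(p, t))
            precision = pos / (pos + neg)
            if precision < min_precision:
                continue
            # Prefer more positives covered, then precision, then longer patterns.
            key = (pos, precision, len(p))
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:  # no acceptable rule is left
            break
        R.append(best)
        # Remove every instance the new rule covers before learning the next one.
        remaining = [(t, l) for t, l in remaining if not covers(best, t)]
    return R
```

A learned rule set can then be applied to new text with `any(covers(r, tokens) for r in R)`. Ranking candidates by the number of positives covered is one common heuristic; the truncated source may use a different criterion.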


Abstract

The invention provides an unnormalized language processing method based on web mining. It relates to the field of computer data mining, in particular to network emotion mining. The method processes unnormalized network language using minimally supervised learning. The common types of unnormalized language are reduced from six to two disjoint classes: typical unnormalized language and ambiguous unnormalized language. For typical unnormalized language, the invention provides a pattern matching algorithm based on sequential covering; for ambiguous unnormalized language, it provides a classification algorithm based on feature extraction. The result is fully normalized written text, which facilitates subjective opinion mining, so that information such as emotions, opinions, and suggestions can be extracted more completely.

Description

Technical field

[0001] The invention relates to the field of computer data mining, in particular to the technology of network emotion mining schemes.

Background technique

[0002] In recent years, the Internet has gained a very large number of users. Through Internet platforms, users often post personal opinions and comments, that is, subjective texts that do not merely state facts and whose main content includes the opinions, emotions, and attitudes of individuals, groups, organizations, and so on. Clearly, for other users as well as for the company that makes a given product, these user viewpoints are of practical use and have good reference and guidance value. Processing such opinion or comment text is therefore very meaningful. Most of these comments are the users' subjective expressions, and within a certain range a new type of language may appear: non-standard language. Non-normative language and noise are important features of subje...


Application Information

IPC(8): G06F17/30, G06F17/27
Inventor: 张霄凯, 杨帆, 史天艺, 尹航
Owner: 张霄凯