Correlation-coefficient-based method and system for mining weighted positive and negative association patterns between Chinese words

A correlation-coefficient and pattern-mining technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses three shortcomings of existing methods: they do not account for the differing importance of items and their differing weights across transactions, they produce redundant patterns, and they cannot mine matrix-weighted negative association patterns.

Inactive Publication Date: 2014-12-17
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS
4 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

Unweighted positive and negative association pattern mining considers neither the differing importance of items nor their differing weights in the transaction database, so it produces a large number of invalid, redundant and uninteresting association patterns.
Item-weighted positive and negative association pattern mining ignores the fact that the same item can carry different weights in different records of the transaction database.
Existing matrix-weighted methods mine matrix-weighted association rules effectively, but cannot mine matrix-weighted negative association patterns.


Examples

[0116] Example: The following is an example of a Chinese text database containing 5 Chinese document records and 5 feature-word items with their weights. The document set is {d1, d2, d3, d4, d5} and the feature-word set is {i1, i2, i3, i4, i5} = {program, queue, function, environment, member}.

[0117]

[0118] The mining method of the present invention is applied to this example to mine matrix-weighted positive and negative association patterns of Chinese feature words. The mining process is as follows (ms=0.15, mc=0.3, mFr=0.3, mNr=0.12, mi=0.26, =0.1):

[0119] 1. Mine the matrix-weighted frequent feature-word 1-itemset L1, as shown in Table 1, where n=5.

[0120] Table 1

C1      w(C1)   mwS(C1)
(i1)    2.8     0.56
(i2)    0.55    0.11
(i3)    2.6     0.52
(i4)    0.92    0.184
(i5)    0.84    0.168

[0121] It can be seen from Table 1 that L1 = {(i1), (i3), (i4), (i5)},

[0...
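The frequent 1-itemset step above can be reproduced with a short Python sketch. It is only an illustration: the relation mwS(C1) = w(C1) / n is inferred from the values in Table 1 rather than quoted from the patent, and the names `matrix_weighted_support`, `N_DOCS` and `MIN_SUPPORT` are hypothetical.

```python
# Weight totals w(C1) copied from Table 1; N_DOCS is n, MIN_SUPPORT is ms.
N_DOCS = 5
MIN_SUPPORT = 0.15

weights = {"i1": 2.8, "i2": 0.55, "i3": 2.6, "i4": 0.92, "i5": 0.84}


def matrix_weighted_support(total_weight, n_docs):
    """Assumption inferred from Table 1: mwS(C1) = w(C1) / n."""
    return total_weight / n_docs


# Support of every candidate 1-itemset.
supports = {
    item: matrix_weighted_support(w, N_DOCS) for item, w in weights.items()
}

# Keep the items whose support reaches the minimum support threshold.
frequent_1 = [item for item, s in supports.items() if s >= MIN_SUPPORT]

print(frequent_1)  # ['i1', 'i3', 'i4', 'i5']
```

Only i2 (support 0.11) falls below ms=0.15, which agrees with the L1 stated in [0121].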


Abstract

The invention discloses a correlation-coefficient-based method and system for mining weighted positive and negative association patterns between Chinese words. The method comprises: preprocessing the Chinese text with a Chinese text information preprocessing module; generating the candidate feature-word 1-itemsets with a Chinese feature-word candidate itemset generation module and, for each i ≥ 2, generating the candidate i-itemsets from the candidate (i-1)-itemsets, calculating the support of each candidate i-itemset to obtain the frequent itemsets and negative itemsets, and pruning itemsets according to their correlation to obtain the interesting feature-word frequent itemsets and negative itemsets; and calculating the interest and confidence of association rules with a Chinese feature-word positive and negative association rule generation and result display module, which displays to the user the interesting positive and negative feature-word association rules mined from the frequent and negative itemsets. The method and system avoid generating invalid and uninteresting Chinese feature-word association patterns and greatly improve mining efficiency. Applied to Chinese text information retrieval, the association rules enable query expansion and improve retrieval and query performance.
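The candidate-generation and itemset-splitting steps described in the abstract might look roughly like the Python sketch below. It makes several assumptions that this excerpt does not confirm: the join is the usual Apriori-style union of (i-1)-itemsets, a candidate below the minimum support is treated as a negative itemset, and `correlation` is the standard unweighted correlation coefficient computed from supports, not necessarily the matrix-weighted form used by the patent. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def generate_candidates(prev_itemsets, size):
    """Apriori-style join: union pairs of (size-1)-itemsets into size-itemsets."""
    prev = {frozenset(s) for s in prev_itemsets}
    candidates = set()
    for a, b in combinations(prev, 2):
        union = a | b
        if len(union) == size:
            candidates.add(union)
    return candidates


def split_by_support(candidates, support, min_support):
    """Split candidates into frequent and negative itemsets.

    `support` is any callable that maps an itemset to its support value.
    """
    frequent = {c for c in candidates if support(c) >= min_support}
    negative = set(candidates) - frequent
    return frequent, negative


def correlation(s_a, s_b, s_ab):
    """Standard correlation coefficient of itemsets A and B from their supports.

    Positive values suggest a positive association, negative values a
    negative one; 0.0 is returned when the denominator vanishes.
    """
    denom = sqrt(s_a * (1 - s_a) * s_b * (1 - s_b))
    return (s_ab - s_a * s_b) / denom if denom else 0.0


# Candidate 2-itemsets built from four single-item sets.
c2 = generate_candidates([{"i1"}, {"i3"}, {"i4"}, {"i5"}], 2)
print(len(c2))  # 6
```

Passing the support function as a callable keeps the sketch independent of how the weighted support of longer itemsets is defined, which this excerpt does not show.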

Description

Technical field

[0001] The invention belongs to the field of text mining and specifically relates to a correlation-coefficient-based method for mining weighted positive and negative association patterns between Chinese words, together with its mining system. It is applicable to discovering feature-word association patterns in Chinese text mining, to query expansion in Chinese text information retrieval, to cross-language information retrieval, and so on. Applying the positive and negative feature-word association patterns of the invention to web search engines such as Baidu and Google for query expansion helps improve their query performance and meet users' information needs.

Background

[0002] In the past 20 years, remarkable achievements have been made in association pattern mining research, which can be summarized as unweighted positive and negative association pattern mining technology, weighted positive and negative association pattern mining t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventors: 黄名选, 兰慧红
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS