High average utility sequence pattern mining method under non-overlapping condition

A high average utility sequence pattern mining technology under the non-overlapping condition, applied in data mining and special data processing applications. It addresses problems of existing methods, such as mined patterns that are excessively long, product combinations that have no decision-making significance for stores, and high time and space costs, so that the mining results better fit practical problems.

Pending Publication Date: 2020-07-31
HEBEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

Yao et al. published "A foundational approach to mining itemset utilities from databases," which first proposed the definition and mathematical model of high-utility pattern mining and also proposed judging whether a pattern may be a high-utility pattern from an estimated value of the pattern. In some cases this method generates a large number of candidates, so the time and space cost of mining is relatively high. Erwin et al. proposed a tree-structure-based mining method that performs better on dense data sets. In "An efficient algorithm for high utility itemset mining" and "Efficient algorithms for mining high utility itemsets from transactional databases," Tseng et al. designed a pattern growth method based on the UP-tree structure that compresses and stores the data and applies pruning strategies to reduce time and space overhead.
In the above methods, the utility of a pattern increases with its length, so considering only the overall utility of a pattern still has many shortcomings. A longer product combination yields a higher profit, but such a combination has no decision-making significance for the store.
[0016] CN110399406A discloses a method, device and computer-readable storage medium for mining global high-utility sequential patterns, which uses a linked-list data structure. It considers only the global utility of a pattern and does not take the pattern's length into account, so the mining results contain patterns that are excessively long and include useless items. CN109101530A discloses a high-utility event sequence pattern mining algorithm for security events, in which a pattern is high-utility when the cumulative sum of its item transaction attributes is greater than or equal to a given threshold. It does not consider that accumulating many low-impact transactions can also produce a high utility, so the reported high-utility patterns may contain such transactions. CN108733705A discloses a high-utility sequence pattern mining method and device for commodity sales. It does not consider the number of occurrences of a pattern in the sequence, so it can report commodity combinations whose total profit is high but for which user demand is very small.
[0017] In short, prior-art research on high-utility pattern mining has the defect that the number of candidate patterns is difficult to reduce when the pattern utility value does not satisfy the downward closure property.
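The following minimal sketch illustrates why the downward closure property can fail for an average-utility measure. It assumes, since the text here does not give the formula, that the average utility of a pattern is its support multiplied by the sum of its item utilities and divided by its length. The item utilities, supports and threshold are hypothetical values chosen only for illustration.

```python
def average_utility(pattern, support, item_utility):
    """Assumed definition: support * (sum of item utilities) / pattern length."""
    return support * sum(item_utility[x] for x in pattern) / len(pattern)


# Hypothetical utilities and supports, not taken from the patent.
item_utility = {"A": 1, "T": 10}
au_a = average_utility("A", support=5, item_utility=item_utility)    # 5 * 1 / 1
au_at = average_utility("AT", support=3, item_utility=item_utility)  # 3 * 11 / 2

min_au = 10  # hypothetical threshold
print("A  is high average utility:", au_a >= min_au)
print("AT is high average utility:", au_at >= min_au)
```

Under these assumptions the sub-pattern "A" falls below the threshold while its super-pattern "AT" meets it, so discarding "A" early would lose "AT". This is why a separate upper bound is needed for pruning.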



Examples


Embodiment 1

[0079] Given a DNA sequence S = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 = ATTCATCACATCA, the gap constraint is [0, 3] and the minimum average utility threshold is minun = 25. The utility value of each item in the character set is shown in Figure 3.

[0080] The first step is to read the sequence database SDB, the minimum gap min, the maximum gap max and the minimum average utility threshold minun:

[0081] Read in the given sequence database SDB, which contains one sequence S = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 = ATTCATCACATCA. The character set is {A, T, C}, the minimum gap min = 0, the maximum gap max = 3, and the minimum average utility threshold minun = 25.
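To make the inputs above concrete, here is a minimal sketch of counting occurrences of a pattern in S under the gap [min, max]. It assumes the usual reading of the non-overlapping condition: two occurrences may not use the same sequence position for the same pattern position. It uses a simple greedy leftmost search with backtracking, not the queue-based calculation the invention describes, so treat the result as a valid set of non-overlapping occurrences rather than the patented procedure. The function name is hypothetical.

```python
def nonoverlapping_occurrences(seq, pattern, min_gap, max_gap):
    """Greedy leftmost search for non-overlapping occurrences (illustrative only).

    Returns a list of occurrences as tuples of 1-based sequence positions.
    """
    m = len(pattern)
    # used[j] holds sequence positions already matched to pattern[j].
    used = [set() for _ in range(m)]
    occurrences = []

    def extend(j, prev, path):
        # Try to match pattern[j:] after sequence position prev.
        if j == m:
            return True
        lo = prev + 1 + min_gap
        hi = min(prev + 1 + max_gap, len(seq) - 1)
        for pos in range(lo, hi + 1):
            if seq[pos] == pattern[j] and pos not in used[j]:
                path.append(pos)
                if extend(j + 1, pos, path):
                    return True
                path.pop()  # dead end, backtrack
        return False

    for start in range(len(seq)):
        if seq[start] != pattern[0] or start in used[0]:
            continue
        path = [start]
        if extend(1, start, path):
            for j, pos in enumerate(path):
                used[j].add(pos)
            occurrences.append(tuple(p + 1 for p in path))
    return occurrences


occ = nonoverlapping_occurrences("ATTCATCACATCA", "AT", 0, 3)
print(len(occ), occ)
```

For the pattern "AT" with gap [0, 3], this search finds three occurrences in S, at positions (1, 2), (5, 6) and (8, 11).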

[0082] The second step is to generate a high average utility pattern set and a high upper bound pattern set with a length of 1:

[0083] Calculate the average utility value an...

Embodiment 2

[0138] Given a DNA sequence S = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 = ATTCATCACATC, the gap constraint is [0, 3] and the minimum average utility threshold is minun = 25. The utility value of each item in the character set is shown in Figure 3.

[0139] "The sixth step: when the high upper bound pattern set of length i+1 is empty, high average utility sequential pattern mining ends and the seventh step is executed.

[0140] Because the high upper bound pattern set of length 5 is empty in the fifth step, high average utility sequential pattern mining ends."

[0141] Apart from the above difference, everything else is the same as in Embodiment 1.

[0142] In the foregoing embodiments, the programming software is VC++ 6.0, the drawing tool is Visio 2015, the processor is Pentium (R) Dual-Core 32Processor+, and the operating system is Windows 7 or later, and the above software...


Abstract

The invention discloses a high average utility sequence pattern mining method under the non-overlapping condition and relates to the technical field of electric digital data processing. The method generates the candidate set by pattern growth, quickly calculates the average utility value of the candidate patterns online using a queue data structure, and can mine high utility sequence patterns under the non-overlapping condition. It overcomes the prior-art defect that the number of candidate patterns is difficult to reduce when the pattern utility value does not satisfy the downward closure property: the completeness of the calculation is ensured, the number of candidate patterns is greatly reduced through a pruning strategy, and the time and space efficiency of the calculation is improved.
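As a rough illustration of the level-wise idea in the abstract, the sketch below grows patterns one item at a time, prunes with an upper bound, and stops when no pattern of the next length survives. Everything specific in it is an assumption made for the example: the support function is a stand-in that counts contiguous matches (not the non-overlapping, gap-constrained support of the invention), the upper bound is a hypothetical one (support times the largest item utility), the average utility formula is assumed, and the item utilities are invented. The queue-based calculation is not reproduced.

```python
def count_contiguous(seq, pattern):
    """Stand-in support: number of start positions where pattern occurs contiguously."""
    return sum(seq.startswith(pattern, i) for i in range(len(seq)))


def mine_level_wise(alphabet, support, item_utility, threshold):
    """Pattern growth with upper-bound pruning (illustrative sketch)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive so that the loop terminates")
    max_u = max(item_utility.values())

    def upper_bound(p):
        # Hypothetical bound: support * largest item utility.
        return support(p) * max_u

    def avg_utility(p):
        # Assumed definition: support * sum of item utilities / length.
        return support(p) * sum(item_utility[x] for x in p) / len(p)

    # Length-1 patterns that pass the bound form the first frontier.
    frontier = [a for a in alphabet if upper_bound(a) >= threshold]
    high_avg = [p for p in frontier if avg_utility(p) >= threshold]
    while frontier:
        # Grow every surviving pattern by one item; keep those passing the bound.
        frontier = [p + a for p in frontier for a in alphabet
                    if upper_bound(p + a) >= threshold]
        high_avg.extend(p for p in frontier if avg_utility(p) >= threshold)
    return high_avg


seq = "ATTCATCACATCA"
utilities = {"A": 4, "T": 10, "C": 6}  # hypothetical item utilities
result = mine_level_wise("ATC", lambda p: count_contiguous(seq, p), utilities, 25)
print(result)
```

The design point is that the frontier is filtered by the bound rather than by the average utility itself, so a pattern can stay in the frontier for further growth even when its own average utility is below the threshold.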

Description

Technical field

[0001] The technical solution of the invention relates to the technical field of electrical digital data processing, in particular to a method for mining high average utility sequence patterns under the non-overlapping condition.

Background technique

[0002] With the rapid development of information technology, people have entered the Internet age, and a large amount of data has been generated. Against this background of data explosion, how to transform these data into useful information has become an urgent problem, and data mining emerged in response. Data mining is the analysis and automatic processing of massive data to obtain the relationships, trends and laws contained in the data. Currently, research on data mining is divided into two parts: association rule mining and sequential pattern mining. Association rule mining is to mine the connection between different items in the same transac...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2458, G06F16/903, G06N5/02
CPC: G06F16/2465, G06F16/90344, G06F2216/03, G06N5/025
Inventors: 武优西, 耿萌, 户倩, 雷荣, 刘锦, 陈明婕, 翟景琦
Owner: HEBEI UNIV OF TECH