Travel network community division method based on Simhash algorithm

A network community and algorithm technology applied to community division in complex tourism networks. It addresses problems of existing methods, such as sensitivity to isolated points, distorted clustering results, and a large amount of calculation. Its effects are improved division efficiency, a simple and convenient algorithm, and reduced storage space.

Status: Inactive. Publication date: 2015-12-09
SHAANXI NORMAL UNIV

AI Technical Summary

Problems solved by technology

The idea of the G-N algorithm is simple, but the method requires a large amount of calculation and has relatively high time complexity.
Typical clustering methods are K-Means and K-Medoids. Both run fast, but the choice of initial cluster centers affects the clustering result, and they are easily affected by outliers.
Representative hierarchical clustering algorithms include BIRCH, CURE, and Chameleon. Their disadvantage is that the result cannot be corrected once the process ends.
Grid clustering is poorly suited to processing large-scale data.
The current network community division methods therefore each have their own defects, which limits their application.

Examples

Embodiment 1

[0024] Taking Sina Weibo as an example, the travel network community division method based on the Simhash algorithm of the present invention is shown in Figure 1 and is implemented by the following steps:

[0025] (1) Crawl the user IDs and text data on the travel network and store them in a database, specifically including the following steps:

[0026] (1.1) Apply for a Sina App Key;

[0027] (1.2) According to the API provided by Sina, look up the URL of the required interface, the HTTP request method, and the request parameters, then request the user ID, the user's registered address (address1), the content of the user's microblog posts (text), and the address from which the user published each microblog (address2); the interface returns the data in JSON format;

[0028] (1.3) Use a Java program to process the JSON data returned by Weibo, and judge whether the first user's registered address (address1) is the same as the address (address2) from which the user published the text content; if not, determine that the text...
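The address comparison in step (1.3) might be sketched as follows. The patent describes a Java program; Python is used here only for brevity. The JSON keys `id`, `address1`, `address2`, and `text`, the sample payload, and the helper names are all assumptions for illustration, and real Weibo responses are laid out differently. Because the source is truncated after the comparison, the sketch only reports whether the two addresses differ and does not decide what follows from that.

```python
import json


def address_differs(record):
    """Return True when the registered and posting addresses differ."""
    registered = str(record.get("address1", "")).strip()
    posted = str(record.get("address2", "")).strip()
    return registered != posted


def flag_records(raw_json):
    """Parse a JSON array and pair each user ID with the comparison result."""
    records = json.loads(raw_json)
    return [(record["id"], address_differs(record)) for record in records]


# Hypothetical sample payload (assumed layout, not the real Weibo format).
SAMPLE = """
[
  {"id": "1001", "address1": "Shaanxi", "address2": "Beijing", "text": "..."},
  {"id": "1002", "address1": "Shaanxi", "address2": "Shaanxi", "text": "..."}
]
"""

print(flag_records(SAMPLE))
```

Records flagged `True` are the ones whose posting address differs from the registered address, which is the condition step (1.3) tests for.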

Abstract

The invention relates to a travel network community division method based on the Simhash algorithm. The method uses the Simhash algorithm to process texts and calculate their semantic fingerprints, uses the Hamming distance to compare the fingerprints and thus calculate text similarity, and thereby clusters similar users. Dimensionality reduction is applied to the high-dimensional feature vectors of short texts, which greatly reduces the storage space they occupy. The algorithm is also easy to implement, has a short calculation time, and processes text quickly, so the efficiency of community division in complex networks is improved. The method is significant for predicting trends and developments in travel activity, providing travel service information, recommending travel routes, and even predicting travel peaks.
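The fingerprint-and-distance idea in the abstract can be illustrated with a minimal sketch. Several choices below are assumptions that the source does not state: the 64-bit fingerprint width, MD5 as the per-token hash, term frequency as the token weight, whitespace tokenization (Chinese microblog text would need a word segmenter instead), the greedy grouping rule, and the `max_distance` threshold. It is not the patented procedure, only one common way to realize Simhash with a Hamming-distance comparison.

```python
import hashlib
from collections import Counter

BITS = 64  # assumed fingerprint width; the source does not specify one


def token_hash(token):
    """Map a token to a stable 64-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Compute a Simhash fingerprint, weighting each token by its frequency."""
    weights = Counter(text.lower().split())  # naive whitespace tokenizer
    totals = [0] * BITS
    for token, weight in weights.items():
        h = token_hash(token)
        for bit in range(BITS):
            # Add the weight where the token's hash bit is 1, subtract otherwise.
            totals[bit] += weight if (h >> bit) & 1 else -weight
    fingerprint = 0
    for bit, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def group_users(texts, max_distance=3):
    """Greedy grouping: a user joins the first group whose representative
    fingerprint is within max_distance, otherwise starts a new group."""
    groups = []  # list of (representative fingerprint, [user IDs])
    for user_id, text in texts.items():
        fp = simhash(text)
        for representative, members in groups:
            if hamming_distance(representative, fp) <= max_distance:
                members.append(user_id)
                break
        else:
            groups.append((fp, [user_id]))
    return [members for _, members in groups]
```

Each text is reduced to a single 64-bit integer regardless of vocabulary size, which is where the storage saving described in the abstract comes from. A smaller Hamming distance between two fingerprints indicates more similar texts.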

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to using the Simhash deduplication algorithm as a clustering algorithm for community division in complex tourism networks.

Background art

[0002] In recent years, complex networks have become a research hotspot in information science, sociology, physics, and even life science. Many systems in nature can be expressed as complex networks, such as social networks, communication networks, and the Internet. Forums, BBS, Weibo, travel websites, and other social platforms are widely used by travel enthusiasts because they are fast, cheap, and convenient. Communication on these social platforms has therefore gradually formed a complex tourism network.

[0003] At present, there are different kinds of algorithms for complex network community division. One kind is the subgraph strategy, such as spectral bisection and the K-L algorithm; their disadvantage is that the size of...

Application Information

IPC(8): G06F17/30, G06Q50/00
CPC: G06F16/3344, G06F16/35, G06F16/951, G06Q50/01
Inventors: 曹菡, 冯倩, 李程
Owner: SHAANXI NORMAL UNIV