Travel network community division method based on the Simhash algorithm

This network community division technique is applied to community division in complex tourism networks. It addresses problems of existing methods such as sensitivity to isolated points, clustering results that are affected by initialization, and a large amount of calculation. Its effects are improved division efficiency, a simple and convenient algorithm, and reduced storage space.

Inactive Publication Date: 2015-12-09
SHAANXI NORMAL UNIV
1 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

The G-N algorithm is conceptually simple, but it requires a large amount of calculation and has relatively high time complexity.
Typical clustering methods are K-Means and K-Medoids. Both run fast, but the choice of initial cluster centers affects the clustering result, and they are easily affected by outliers.


Examples

Example Embodiment

[0023] Example 1

[0024] Taking Sina Weibo as an example, the Simhash-based travel network community division method of the present invention, shown in Figure 1, is carried out by the following steps:

[0025] (1) Crawl the user IDs and text data on the travel network and store them in a database, as follows:

[0026] (1.1) Apply for a Sina APPkey;

[0027] (1.2) Following the API provided by Sina, look up the URL, HTTP request method, and request parameters of the required interface, then crawl the user ID, the user's registered address address1, the text content of the user's microblog posts, and the address address2 from which the user posted each microblog; the interface returns the data in JSON format;

[0028] (1.3) Use a Java program to process the JSON data returned from Weibo, and first determine whether the user's registered address address1 is the same as the address address2 from which the user published the text content. If they are not the same, determine whether the...
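The address comparison in step (1.3) can be illustrated with a minimal sketch. It is written in Python rather than Java for brevity, and the sample payload and its key names (`user_id`, `address1`, `address2`, `text`) are assumptions made for illustration, not the actual fields returned by the Sina API. Because the source text is cut off before it says what happens when the addresses differ, the sketch only separates the two cases.

```python
import json

# Hypothetical sample of crawled records; the key names are assumptions
# chosen to mirror address1 / address2 in the text, not real API fields.
SAMPLE = """
[
  {"user_id": "1001", "address1": "Beijing", "address2": "Beijing", "text": "weekend at home"},
  {"user_id": "1002", "address1": "Beijing", "address2": "Sanya", "text": "beach trip"}
]
"""


def split_by_address(raw_json):
    """Split records by whether the registered address equals the posting address."""
    records = json.loads(raw_json)
    same, different = [], []
    for record in records:
        if record["address1"] == record["address2"]:
            same.append(record)
        else:
            different.append(record)
    return same, different


same, different = split_by_address(SAMPLE)
print([r["user_id"] for r in same])       # users posting from their registered address
print([r["user_id"] for r in different])  # users posting from somewhere else
```

Keeping both lists lets whatever rule follows in the full patent text be applied to the differing records without re-parsing the JSON.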

Abstract

The invention relates to a travel network community division method based on the Simhash algorithm. The method uses the Simhash algorithm to process texts and compute their semantic fingerprints, uses the Hamming distance to compare the fingerprints and compute text similarity, and thereby clusters similar users. The high-dimensional feature vectors of short texts are reduced in dimensionality, which greatly reduces the storage space they occupy. The algorithm is also simple to implement, short in calculation time, and fast at text processing, which improves the efficiency of community division in complex networks. The method is of great significance for predicting trends and developments in travel activity, providing travel service information, recommending travel routes, and even predicting travel peaks.
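The fingerprint-and-compare idea in the abstract can be sketched as follows. This is a generic Simhash illustration under stated assumptions, not the patent's implementation: the tokenizer (lowercase word runs), the term-frequency weights, the MD5-based token hash, the 64-bit width, and the distance threshold of 3 are all choices made here for the example. Chinese microblog text would need a proper word segmenter in place of the regular expression.

```python
import hashlib
import re
from collections import Counter
from typing import Dict, List, Tuple


def simhash(text: str, bits: int = 64) -> int:
    """Return a Simhash fingerprint of `text` (bits: multiple of 8, at most 128)."""
    # Assumption: tokens are word runs, weighted by how often they occur.
    weights = Counter(re.findall(r"\w+", text.lower()))
    totals = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Add the weight where the token hash has a 1 bit, subtract otherwise.
            totals[i] += weight if (token_hash >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def group_users(texts: Dict[str, str], max_distance: int = 3) -> List[List[str]]:
    """Greedy grouping: a user joins the first group whose representative
    fingerprint is within `max_distance` bits, else starts a new group."""
    groups: List[Tuple[int, List[str]]] = []
    for user, text in texts.items():
        fp = simhash(text)
        for representative, members in groups:
            if hamming_distance(representative, fp) <= max_distance:
                members.append(user)
                break
        else:
            groups.append((fp, [user]))
    return [members for _, members in groups]


print(hamming_distance(0b1011, 0b0010))  # 2
print(group_users({"u1": "hiking in the mountains", "u2": "hiking in the mountains"}))
```

Each text is reduced to a single fixed-width integer, which is where the storage saving described in the abstract comes from; comparing two users then costs one XOR and a bit count.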

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to using the Simhash deduplication algorithm as a clustering algorithm for community division in complex tourism networks.

Background

[0002] In recent years, complex networks have become a research hotspot in information science, sociology, physics, and even the life sciences. Many systems in nature can be expressed as complex networks, such as social networks, communication networks, and the Internet. Forums, BBS, Weibo, travel websites, and other social platforms are widely used by travel enthusiasts because they are fast, cheap, and convenient. Communication on these platforms has therefore gradually formed a complex tourism network.

[0003] At present, there are several kinds of algorithms for complex network community division. One is the subgraph strategy, such as spectral bisection and the K-L algorithm; their disadvantage is that the size of...


Application Information

IPC(8): G06F17/30, G06Q50/00
CPC: G06F16/3344, G06F16/35, G06F16/951, G06Q50/01
Inventor: 曹菡, 冯倩, 李程
Owner: SHAANXI NORMAL UNIV