Clustering method, device and system

A clustering technology applied in the network field. It addresses the problems that users cannot obtain resources, that resources cannot be found by search, and that search results are wrong, and it achieves an objective and accurate processing method, accurate search results, and an improved user experience.

Active Publication Date: 2008-08-20
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0006] However, the defect of this method is that if the link to a resource the user cares about carries no description information matching the query word, the user cannot obtain that resource. In addition, resources that are the same as those behind the links matched by the query word may not be returned as related results, because their description information differs. Moreover, if the description information of an audio file has been changed while the text content of the file has not, a search based only on description information returns wrong results.
[0007] That is, the prior art has at least the following problem: it cannot search for resources with the same text content length according to the text content length of the audio file.

Examples

Embodiment 1

[0029] Referring to Figure 1, Embodiment 1 of the method of the present invention comprises the following steps:

[0030] Step 101, obtaining part of the text content of the media file;

[0031] Step 102, calculating clustering information of the media file according to that part of the text content of the media file.
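
Steps 101 and 102 can be illustrated with a minimal sketch. It assumes that the clustering information is an MD5 digest of the partial content, which is a guess motivated by the MD5 signature mentioned in Example 1 of this document. The function names and the sample links are hypothetical and do not come from the patent.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def clustering_key(partial_content: bytes) -> str:
    """Step 102 (assumed form): derive a cluster key from partial file content."""
    return hashlib.md5(partial_content).hexdigest()


def cluster_by_content(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group links whose partial content yields the same key."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for link, partial in files.items():
        clusters[clustering_key(partial)].append(link)
    return dict(clusters)


# Hypothetical input: link -> partial content obtained in step 101.
example = {
    "http://a.example/song.mp3": b"same audio bytes",
    "http://b.example/renamed.mp3": b"same audio bytes",
    "http://c.example/other.mp3": b"different audio bytes",
}
clusters = cluster_by_content(example)
```

In this sketch the two links with identical partial content fall into one cluster even though their names differ, which is the property the embodiment relies on: the key does not depend on description information.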

[0032] Embodiments of the present invention have the following advantages:

[0033] First, by acquiring part of the text content of a media file, the embodiment of the present invention can calculate the clustering information of the media file from that partial content;

[0034] Second, because the clustering information is calculated from part of the text content of the media file, it does not depend on the description information of the media file. This avoids wrong clustering caused by manual modification of the description information, so the processing method is objective and accurate....

Embodiment 2

[0073] Referring to Figure 2, Embodiment 2 of the method of the present invention is illustrated using an audio file as an example and comprises the following steps:

[0074] Step 201, obtaining the contents of the header and tail of the audio file;

[0075] MP3 and WMA files store a large amount of meta (metadata) information in the file header to identify various attributes of the file itself. An MP3 (Moving Picture Experts Group Audio Layer III) file in ID3V1 format (the first generation of tags; for details, see http://www.id3.org/ID3v1) has 128 bytes of meta information at the end of the file. Usually no more than 50k bytes of content are obtained from the header, and 5k of content from the tail.
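
As an illustration of step 201, the sketch below reads the head and tail of a file-like object and checks the tail for an ID3v1 tag. The 50k and 5k sizes and the 128-byte tag length come from the paragraph above; the `TAG` marker at the start of the tag is part of the ID3v1 format. The function names are hypothetical, and reading from a local file object rather than a network link is an assumption made to keep the example self-contained.

```python
import io

HEAD_BYTES = 50 * 1024  # "no more than 50k bytes" from the header
TAIL_BYTES = 5 * 1024   # "5k" from the tail
ID3V1_SIZE = 128        # ID3v1 tag length at the end of an MP3 file


def read_head_tail(f, head_bytes=HEAD_BYTES, tail_bytes=TAIL_BYTES):
    """Return (head, tail) slices of a seekable binary file object."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(0)
    head = f.read(min(head_bytes, size))
    f.seek(max(size - tail_bytes, 0))
    tail = f.read()
    return head, tail


def has_id3v1_tag(tail: bytes) -> bool:
    """An ID3v1 tag occupies the last 128 bytes and begins with b'TAG'."""
    return len(tail) >= ID3V1_SIZE and tail[-ID3V1_SIZE:-ID3V1_SIZE + 3] == b"TAG"


# Synthetic stand-in for an MP3 file: padding followed by a 128-byte tag.
fake_mp3 = io.BytesIO(b"\x00" * 100_000 + b"TAG" + b"\x00" * 125)
head, tail = read_head_tail(fake_mp3)
tagged = has_id3v1_tag(tail)
```

For a remote file, the same two slices could presumably be requested with HTTP Range headers such as `bytes=0-51199` and `bytes=-5120`, although the source does not say how the content is fetched.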

[0076] Step 202, analyzing the contents of the head and tail of the audio file;

[0077] Regarding MP3 and WMA head and tail files, referring to the MP3 file specification, there ...

Example 1

[0100] Example 1: Suppose the link of an audio file in WMA format is:

[0101] http://oursim.whu.edu.cn/houtai/edit/UploadFile/2006112073350103.wma. For this audio file, the process of calculating its MD5 signature includes:

[0102] 1. Acquire the header and tail contents of the WMA file at the link; the header and tail contents of the file are usually expressed in the form of a music URL link list:

[0103] Head: 2006112073350103_head 50k

[0104] Tail: 2006112073350103_tail 5k

[0105] 2. Analyze the header file and tail file of the WMA file obtained in the link:

[0106] a) First analyze the content of the header file.

[0107] The first 16 bytes of the header are 0x30 0x26 0xB2 0x75 0x8E 0x66 0xCF 0x11 0xA6 0xD9 0x00 0xAA 0x00 0x62 0xCE 0x6C, so it can be judged that the file to which the header belongs is a WMA format file.

[0108] The analyzer looks for the audio content start identifier 0x36 0x26 0xB2 0x75 0x8E 0x66 0xCF 0x11 0xA6 0xD9 0x00 0xAA 0x00 0x62 0xCE 0x6C,...
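
The two byte patterns quoted in [0107] and [0108] can be checked with a short sketch. The byte values are taken from the text above; the function names and the synthetic header are hypothetical. What the analyzer does after locating the start identifier is elided in the source, so the sketch stops at returning its offset.

```python
# 16-byte identifiers quoted in [0107] and [0108].
WMA_HEADER_ID = bytes.fromhex("3026B2758E66CF11A6D900AA0062CE6C")
AUDIO_START_ID = bytes.fromhex("3626B2758E66CF11A6D900AA0062CE6C")


def is_wma(head: bytes) -> bool:
    """A WMA file is recognized by the first 16 bytes of its header."""
    return head[:16] == WMA_HEADER_ID


def find_audio_start(head: bytes) -> int:
    """Return the offset of the audio content start identifier, or -1 if absent."""
    return head.find(AUDIO_START_ID, len(WMA_HEADER_ID))


# Synthetic header: file identifier, 84 bytes of padding, then the start identifier.
sample_head = WMA_HEADER_ID + b"\x00" * 84 + AUDIO_START_ID + b"audio"
wma_detected = is_wma(sample_head)
audio_offset = find_audio_start(sample_head)
```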

Abstract

Disclosed is a method for obtaining clustering information, comprising: obtaining part of the text content of a media file, and calculating the clustering information of the media file according to that part of the text content. A device and a system for obtaining clustering information are also disclosed. With the invention, resources that have no description information can be found by keyword search.

Description

Technical field

[0001] The present invention relates to the field of network technology, and in particular to a clustering method, device and system.

Background

[0002] The amount of resources stored on the Internet is huge and is constantly being updated and expanded. In particular, with the expansion of network bandwidth, media files, including audio and video files, have developed rapidly because they bring people great enjoyment. As the number of media files grows, it has become increasingly necessary to adapt to users' needs and provide them with accurate information about similar media files.

[0003] To search for the resources that users care about, links to the related resources must be found. The prior art provides a music search engine solution whose main process is:

[0004] The user enters a query word;

[0005] After receiving the query word, the search engine performs a correspondi...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 王志刚, 贾玉龙
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD