Document compression and decompression method and device

A document compression method and technology, applied in the fields of instruments, computing, and electrical digital data processing, that addresses the problems of low compression rate, large storage footprint, and poor targeting, and achieves a higher compression rate and reduced storage occupancy.

Active Publication Date: 2016-02-17
BEIJING QIHOO TECH CO LTD
5 Cites, 2 Cited by
AI Technical Summary

Problems solved by technology

[0006] However, these compression algorithms are general-purpose and not tailored to web pages. When they are used to compress web pages, the compression rate is low and the compressed pages take up more storage space.
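One way to see the limitation is to compare compressing pages one at a time with compressing similar pages together. The sketch below is only an illustration under stated assumptions: `zlib` stands in for a general-purpose compressor, and the template and page contents are invented for the example rather than taken from the patent.

```python
import zlib

# Hypothetical page template: pages from one site tend to share most markup.
TEMPLATE = (
    "<html><head><title>{title}</title>"
    "<link rel='stylesheet' href='/static/site.css'></head>"
    "<body><div class='nav'><a href='/'>Home</a><a href='/news'>News</a>"
    "<a href='/sports'>Sports</a><a href='/about'>About</a></div>"
    "<div class='content'>{body}</div>"
    "<div class='footer'>Copyright Example Site. All rights reserved.</div>"
    "</body></html>"
)

# Fifty synthetic pages that differ only in title and body text.
pages = [
    TEMPLATE.format(title=f"Article {i}", body=f"Text of article number {i}.").encode()
    for i in range(50)
]

# Each page compressed on its own: shared markup is encoded again every time.
separate = sum(len(zlib.compress(p)) for p in pages)

# All pages compressed as one stream: shared markup is encoded once and
# later pages become mostly back-references to earlier ones.
together = len(zlib.compress(b"".join(pages)))

print(f"separately: {separate} bytes, together: {together} bytes")
```

The gap between the two totals comes from redundancy across pages, which a compressor applied to each page in isolation cannot exploit.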

Detailed Description of Embodiments

[0062] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0063] Referring to Figure 1, a flow chart of the steps of an embodiment of a document compression method according to the present invention is shown. The method may specifically include the following steps:

[0064] Step 101, extracting multiple documents stored in advance;

[0065] In the embodiment of the present invention, the crawler can crawl the web pages of the Internet through the link relationship betwee...

Abstract

An embodiment of the invention provides a document compression method and device. The method includes: extracting a plurality of pre-stored documents; searching the plurality of documents for multiple target documents with similar content, each target document having a line number; serializing the multiple target documents according to their line numbers to obtain one or more data blocks; and compressing the one or more data blocks to obtain compressed objects. Because the method exploits the service characteristics of web pages during compression, it greatly improves the compression rate and reduces the storage space occupied.
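The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: grouping by URL host is an assumed stand-in for "similar content", the length-prefixed record layout and the 1 MiB block limit are hypothetical choices, and `zlib` is used only as a readily available compressor.

```python
import struct
import zlib
from collections import defaultdict
from urllib.parse import urlsplit

HEADER = ">QI"  # hypothetical record header: line number (u64), body length (u32)


def group_similar(docs):
    """Group (line_no, url, body) documents by URL host (assumed similarity rule)."""
    groups = defaultdict(list)
    for line_no, url, body in docs:
        groups[urlsplit(url).hostname].append((line_no, body))
    return groups


def serialize(targets, max_block_bytes=1 << 20):
    """Order target documents by line number and pack them into data blocks."""
    blocks, current = [], bytearray()
    for line_no, body in sorted(targets, key=lambda t: t[0]):
        record = struct.pack(HEADER, line_no, len(body)) + body
        # Start a new block when the current one would exceed the size limit.
        if current and len(current) + len(record) > max_block_bytes:
            blocks.append(bytes(current))
            current = bytearray()
        current += record
    if current:
        blocks.append(bytes(current))
    return blocks


def compress_documents(docs):
    """Return {host: [compressed block, ...]} for the extracted documents."""
    return {
        host: [zlib.compress(block, 9) for block in serialize(targets)]
        for host, targets in group_similar(docs).items()
    }


def decompress_block(blob):
    """Recover the (line_no, body) records stored in one compressed block."""
    data = zlib.decompress(blob)
    records, offset, header_size = [], 0, struct.calcsize(HEADER)
    while offset < len(data):
        line_no, size = struct.unpack_from(HEADER, data, offset)
        offset += header_size
        records.append((line_no, data[offset:offset + size]))
        offset += size
    return records
```

Storing the line number and length in front of each body is what lets `decompress_block` split a block back into individual documents; any other self-delimiting layout would serve the same purpose.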

Description

Technical field

[0001] The present invention relates to the technical field of computer processing, and in particular to a document compression method, a document decompression method, a document compression device, and a document decompression device.

Background

[0002] To build and update its index, a search engine's web crawler (also known as a spider) fetches a large number of web pages from the Internet every day, as many as several billion.

[0003] Except for web pages that can be definitively identified as spam, most of the web pages crawled by spiders each day are stored in a database in a certain format. This database is generally called a web page library.

[0004] After a long period of accumulation, hundreds of billions of web pages are stored in the web page library, and the average size of each original web page is 30-50 KB. The total storage required for hundreds of billions of web pages is therefore very large.

[0005...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/1744
Inventor: 武志刚, 魏少俊
Owner: BEIJING QIHOO TECH CO LTD