Data processing method, search method and device

A data processing technique, applied in the field of data processing, that addresses the problems of very large encoded files and system performance bottlenecks, and achieves a smaller storage footprint, faster transmission, and a better network experience.

Active Publication Date: 2018-08-24
NEW SINGULARITY INT TECHN DEV

AI Technical Summary

Problems solved by technology

[0002] At present, network services such as online shopping, information retrieval, and information websites need to process a very large amount of data (such as text). The traditional approach is to encode the data directly in a predetermined format, but the encoded file is still very large, which is unfavorable for later use (for example, storage and transmission).
[0004] The example text has a total of 170 characters (including punctuation marks). Assuming it is stored in UTF-8 (8-bit Unicode Transformation Format), with each character occupying 3 bytes, it produces a 510-byte file. Such a file takes up a lot of storage space and, because of the large amount of data, takes a long time to transmit.
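The byte count in [0004] can be checked with a short sketch. The sample string is a hypothetical stand-in for the example text (170 copies of one CJK character), not the text itself; ASCII characters would occupy only 1 byte each in UTF-8, so the 3-bytes-per-character figure assumes CJK characters and full-width punctuation.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical stand-in for the example text: 170 CJK characters,
# each of which needs 3 bytes in UTF-8.
sample = "\u822a" * 170
print(len(sample), utf8_size(sample))
```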
[0005] In addition, using traditional data processing methods in network services degrades the user experience.
Take search as an example: in the traditional search method, the original data is stored unchanged in the local file system, which consumes a large amount of storage space. In distributed search in particular, the search results contain a very large amount of data, so network transmission takes a long time and searches become slower.
Meanwhile, a traditional index is created as follows: after the index server receives the original data, it first creates an index and then stores the original data; when a user searches for the record, the server reads the original data from disk and returns it to the user. Disk input/output and network transmission therefore easily become the bottleneck that limits system performance, affecting the user experience.
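The traditional flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `NaiveIndexServer`, whitespace tokenization, and the in-memory dictionary standing in for disk storage are all hypothetical and do not come from the source.

```python
from collections import defaultdict


class NaiveIndexServer:
    """Toy model of the traditional flow: build the index, then store raw data unchanged."""

    def __init__(self) -> None:
        self.index = defaultdict(set)  # term -> ids of records containing it
        self.store = {}                # record id -> raw bytes (stands in for disk)

    def add(self, record_id: int, text: str) -> None:
        # Step 1: create the index (whitespace tokenization is an assumption).
        for term in text.split():
            self.index[term].add(record_id)
        # Step 2: store the original data as-is, with no compression.
        self.store[record_id] = text.encode("utf-8")

    def search(self, term: str) -> list:
        # Every hit is read back in full and returned to the caller, so both
        # storage reads and the response size grow with the raw data volume.
        hits = sorted(self.index.get(term, ()))
        return [self.store[i].decode("utf-8") for i in hits]
```

Because each record is stored and returned at its full encoded size, the cost of `search` is dominated by reading and transferring raw data, which is the bottleneck the passage describes.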

Method used



Examples


Embodiment Construction

[0037] Figure 1 is a schematic flowchart of an embodiment of the data processing method provided by the present invention. The method includes the following steps:

[0038] Step S11: calculate the compression rate of each word in the original data.

[0039] In this embodiment, the compression rate is the ratio of the number of bytes saved by compressing a given word to the number of bytes the word occupies when it is not compressed. Step S11 can calculate the compression rate of a word using a formula in which Co represents the compression rate, W_F represents the number of times the word appears in the original data, W_L represents the number of characters contained in the word, n is the number of bytes required to encode one character in the original data, and f is the compression factor, which indicates the number of bytes occupied by the position information, which is used to indicate the position of the voc...
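As a rough illustration of step S11, the sketch below computes a compression rate from the quantities defined in [0039]. The exact formula is not given in this text, so the expression used here is an assumption: it supposes that a compressed word is stored once (W_L × n bytes) plus f bytes of position information per occurrence, and that the uncompressed cost is W_F × W_L × n bytes. The function name and the sample values are hypothetical.

```python
def compression_rate(w_f: int, w_l: int, n: int, f: int) -> float:
    """Bytes saved by compressing a word, divided by its uncompressed size.

    ASSUMED formula (not taken from the source):
        Co = (W_F*W_L*n - (W_L*n + W_F*f)) / (W_F*W_L*n)
    w_f: occurrences of the word in the original data
    w_l: number of characters in the word
    n:   bytes needed to encode one character
    f:   bytes of position information stored per occurrence
    """
    original = w_f * w_l * n
    if original <= 0:
        raise ValueError("w_f, w_l and n must all be positive")
    compressed = w_l * n + w_f * f  # word stored once, plus one position per occurrence
    return (original - compressed) / original


# A 4-character word seen 10 times, 3 bytes per character, 2-byte positions.
print(compression_rate(w_f=10, w_l=4, n=3, f=2))
# A word seen only once gains nothing: the rate is negative under this assumption.
print(compression_rate(w_f=1, w_l=2, n=3, f=2))
```

Under this reading, frequent and long words get rates close to 1, while rare words get low or negative rates, which is consistent with comparing the rate against a preset threshold.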



Abstract

The present invention provides a data processing method, a data search method, and a data processing apparatus. The data processing method comprises: calculating the compression rate of each word in the original data; compressing the words in the original data whose compression rates are greater than a preset threshold to generate a high-word-frequency file, wherein the high-word-frequency file comprises those words and their position information in the original data; and, after deleting the words whose compression rates are greater than the preset threshold from the original data, compressing the remaining data to generate a non-high-word-frequency file. According to the embodiments of the present invention, the data processing method allows the data to occupy less storage space when stored and helps improve transmission speed during network transmission.
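The three steps in the abstract can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the compression-rate formula, whitespace tokenization, the JSON layout of both files, the use of `zlib`, the default values of `n`, `f`, and `threshold`, and all function names are assumptions introduced here.

```python
import json
import zlib
from collections import Counter


def rate(w_f: int, w_l: int, n: int = 3, f: int = 2) -> float:
    # ASSUMED formula: word stored once plus f bytes per occurrence.
    original = w_f * w_l * n
    return (original - (w_l * n + w_f * f)) / original


def split(tokens: list, threshold: float = 0.5) -> tuple:
    """Return (high_word_frequency_file, non_high_word_frequency_file) as bytes."""
    counts = Counter(tokens)
    high = {w for w, c in counts.items() if rate(c, len(w)) > threshold}
    positions = {}  # high-frequency word -> indices where it occurred
    rest = []       # original data with the high-frequency words deleted
    for i, tok in enumerate(tokens):
        if tok in high:
            positions.setdefault(tok, []).append(i)
        else:
            rest.append(tok)
    high_file = zlib.compress(json.dumps(positions).encode("utf-8"))
    low_file = zlib.compress(json.dumps(rest).encode("utf-8"))
    return high_file, low_file


def restore(high_file: bytes, low_file: bytes) -> list:
    """Rebuild the original token sequence from the two files."""
    positions = json.loads(zlib.decompress(high_file))
    rest = iter(json.loads(zlib.decompress(low_file)))
    placed = {i: w for w, idxs in positions.items() for i in idxs}
    total = len(placed) + len(json.loads(zlib.decompress(low_file)))
    return [placed[i] if i in placed else next(rest) for i in range(total)]


tokens = "the cat and the dog and the bird saw the cat".split()
high_file, low_file = split(tokens)
print(json.loads(zlib.decompress(high_file)))
print(restore(high_file, low_file) == tokens)
```

Storing each high-frequency word once with a list of positions is what makes the round trip lossless here; on very small inputs such as this one, the `zlib` headers can outweigh any savings, so the sketch demonstrates the data layout rather than an actual size reduction.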

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a data processing method, a search method and a device.

Background technique

[0002] At present, network services such as online shopping, information retrieval, and information websites need to process a very large amount of data (such as text). The traditional approach is to encode the data directly in a predetermined format, but the encoded file is still very large, which is unfavorable for later use (for example, storage and transmission). For example:

[0003] Astronautics: Also known as space flight, space flight, cosmonautics, or space shuttle. Refers to the navigation activities of spacecraft in space. Some scientists once called the navigation activities of spacecraft in the solar system as spaceflight, and the navigation activities of spacecraft outside the solar ...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 王忻
Owner: NEW SINGULARITY INT TECHN DEV