The invention discloses an inverted index representation method and system based on a data deduplication framework, suitable for search engines and social network data processing. The method comprises the following steps: 1, traversing the inverted lists in the inverted index, and identifying and recording sequential patterns that occur repeatedly across different inverted lists; 2, calculating the length of each sequential pattern, processing each pattern according to its length, and assigning each sequential pattern a pattern serial number according to the lexicographical order of the sequential patterns; 3, reducing the inverted index according to the sequential patterns, and storing the sequential patterns and the reduced inverted lists separately; and 4, difference processing: calculating the differences between adjacent document serial numbers in the sequential patterns, and recording the pattern serial numbers and the position offsets between adjacent pattern serial numbers, wherein the pattern serial numbers are expressed as two-tuples. The method and system remove repeated data from the inverted index, thereby reducing the number of stored document serial numbers, improving the compression ratio of the inverted index, shortening the query response time of the search engine, and improving the user experience.
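
The four steps can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the abstract does not say how patterns are recognized or how pattern length affects processing, so the sketch assumes a pattern is any run of `k` consecutive document serial numbers that appears in at least two lists. The function names, the parameter `k`, and the `(serial, position)` layout of the two-tuple are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def find_patterns(index, k=3):
    """Steps 1-2 (assumed form): collect runs of k consecutive doc IDs that
    occur in at least two different lists, then number them in
    lexicographic order of the runs."""
    seen_in = defaultdict(set)
    for term, postings in index.items():
        for i in range(len(postings) - k + 1):
            seen_in[tuple(postings[i:i + k])].add(term)
    repeated = sorted(run for run, terms in seen_in.items() if len(terms) >= 2)
    return {run: serial for serial, run in enumerate(repeated)}


def reduce_list(postings, patterns, k=3):
    """Step 3 (assumed form): greedy left-to-right scan. Matched runs are
    replaced by a (serial, position) two-tuple; other doc IDs are kept."""
    rest, refs, i = [], [], 0
    while i < len(postings):
        run = tuple(postings[i:i + k])
        if run in patterns:
            refs.append((patterns[run], i))  # hypothetical two-tuple layout
            i += k
        else:
            rest.append(postings[i])
            i += 1
    return rest, refs


def delta_encode(ids):
    """Step 4: keep the first doc ID, then store gaps between neighbours."""
    ids = list(ids)
    return ids[:1] + [b - a for a, b in zip(ids, ids[1:])]


def restore(rest, refs, patterns):
    """Rebuild the original sorted list from its reduced form."""
    by_serial = {serial: run for run, serial in patterns.items()}
    ids = list(rest)
    for serial, _position in refs:
        ids.extend(by_serial[serial])
    return sorted(ids)


# Toy inverted index: term -> sorted document serial numbers.
index = {
    "apple": [2, 5, 9, 14, 30],
    "banana": [1, 5, 9, 14, 22],
    "cherry": [5, 9, 14, 40, 41],
}

patterns = find_patterns(index)
reduced = {term: reduce_list(p, patterns) for term, p in index.items()}
encoded = {serial: delta_encode(run) for run, serial in patterns.items()}
```

In this toy index the run `(5, 9, 14)` is shared by all three lists, so it is stored once (delta-encoded) and each list keeps only its remaining document serial numbers plus one pattern reference. A fixed window length keeps the sketch short; a real system would more likely search for variable-length or maximal repeated runs.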