Broken document splicing method based on DFS and improved center clustering method

A broken-document splicing and clustering technology, applied in the fields of instruments, character and pattern recognition, computer components, etc. It addresses clustering algorithms that fail easily and splicing algorithms that lack robustness, with the effect of reducing the number of manual interventions, reducing human factors, and increasing the accuracy rate.

Active Publication Date: 2017-10-03
SOUTHWEST PETROLEUM UNIV

AI Technical Summary

Problems solved by technology

However, because shredded paper is cut in complex ways, clustering methods and splicing algorithms often cannot cope with special situations.
Current splicing algorithms have the following main disadvantages: the clustering algorithm is prone to failure.
In special situations, such as the blank space that precedes a paragraph in a document, current clustering algorithms often fail.
Splicing algorithms based on similarity rely on a single criterion, lack robustness, and cannot cover all the cases that arise during splicing.
Fragment splicing is an NP-hard problem, and as the number of fragments grows, this approach does not achieve good results.
On the other hand, the genetic algorithm, a commonly used optimization algorithm for NP-hard problems, does not perform well in practice, mainly because parameters such as population size, mutation rate, and crossover rate must be set manually. If they are set unreasonably, the result converges to a local optimum and the desired effect is not achieved.
[0003] To sum up, the problem in the prior art is that existing splicing algorithms use either clustering methods or genetic algorithms.
However, the clustering methods rely on too few clustering features, and the genetic algorithm requires manually set parameters. As a result, these algorithms perform poorly on actual fragment splicing.

Examples

Embodiment Construction

[0038] To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0039] The application principle of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0040] As shown in Figure 1, the broken document splicing method based on DFS and an improved center clustering method provided by the embodiment of the present invention includes the following steps:

[0041] S101: First splice based on similarity degree, then perform error correction based on difference degree, which avoids the genetic algorithm and improves the accuracy of the algorithm;

[0042] S102: Based on the improved central clustering method and the ...

Abstract

The invention belongs to the technical field of broken document splicing and discloses a broken document splicing method based on DFS and an improved center clustering method. The method comprises the following steps: splicing based on similarity, then correcting errors based on difference degree; and, based on the improved center clustering method and a DFS splicing restoration algorithm, fully exploiting the matching degree of two fragments using parameters such as similarity degree, difference degree, and black threshold in the DFS splicing restoration algorithm. The method further reduces human factors and obtains a good clustering effect. Splicing based on similarity followed by error correction based on difference degree avoids the genetic algorithm and improves the accuracy of the algorithm. Deriving a threshold range formula improves the effect of the center clustering method, and fully exploiting the matching degree of the two fragments through the similarity degree, difference degree, and black threshold parameters further reduces human intervention.
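The splice-then-correct idea in the abstract can be illustrated with a small depth-first search over the fragments of one row. This is only a sketch under stated assumptions, not the patented algorithm: the source does not define the similarity degree, difference degree, or black threshold, so here `similarity` is assumed to count pixel rows where both touching edges are black, `difference` is assumed to count rows where exactly one edge is black, and `black_threshold` is assumed to be the gray level below which a pixel counts as black. The function names and the `max_difference` parameter are hypothetical.

```python
from typing import List, Optional

Fragment = List[List[int]]  # grayscale pixel rows, 0 = black, 255 = white (assumed)


def _edges(a: Fragment, b: Fragment, black_threshold: int):
    """Binarized right edge of `a` paired row by row with the left edge of `b`."""
    right = [row[-1] < black_threshold for row in a]
    left = [row[0] < black_threshold for row in b]
    return list(zip(right, left))


def similarity(a: Fragment, b: Fragment, black_threshold: int = 128) -> int:
    """Assumed definition: rows where both touching edge pixels are black."""
    return sum(1 for r, l in _edges(a, b, black_threshold) if r and l)


def difference(a: Fragment, b: Fragment, black_threshold: int = 128) -> int:
    """Assumed definition: rows where exactly one touching edge pixel is black."""
    return sum(1 for r, l in _edges(a, b, black_threshold) if r != l)


def splice_row(frags: List[Fragment], start: int, max_difference: int,
               black_threshold: int = 128) -> Optional[List[int]]:
    """Order the fragments of one row left to right with a depth-first search.

    Candidates are tried in descending similarity; a candidate whose
    difference exceeds `max_difference` is rejected, and the search
    backtracks when no candidate can extend the current chain.
    """
    n = len(frags)
    order = [start]
    used = {start}

    def dfs() -> bool:
        if len(order) == n:
            return True
        last = frags[order[-1]]
        candidates = [i for i in range(n) if i not in used]
        candidates.sort(
            key=lambda i: similarity(last, frags[i], black_threshold),
            reverse=True,
        )
        for i in candidates:
            if difference(last, frags[i], black_threshold) > max_difference:
                continue  # difference-based rejection of a bad match
            order.append(i)
            used.add(i)
            if dfs():
                return True
            order.pop()  # backtrack
            used.remove(i)
        return False

    return order if dfs() else None
```

In this sketch the similarity ranking decides which neighbor is tried first, while the difference check acts as the correction step that discards a pairing and forces backtracking. The leftmost fragment `start` is assumed to be known, for example from a blank left margin.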

Description

Technical field

[0001] The invention belongs to the technical field of broken document splicing, and in particular relates to a broken document splicing method based on DFS and an improved center clustering method.

Background

[0002] At present, the problem of splicing fragments of regularly cut documents is mostly divided into two steps. The first step clusters all the fragments in order to find the fragments that originally belonged to the same row, which simplifies subsequent processing. In the second step, the fragments belonging to each row are spliced separately using a similarity algorithm or a genetic algorithm, with manual intervention in some special splicing situations. However, because paper fragments are cut in complex ways, clustering methods and splicing algorithms often cannot cope with special situations. The splicing algorithm at this stage mainly has the following disadvantages: the clustering algori...
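The first step of the two-step pipeline described in the background, grouping fragments that came from the same row, could be sketched as a plain threshold-based center clustering. This is a minimal illustration under stated assumptions: the patent's improved method and its threshold range formula are not reproduced here, the feature (a per-pixel-row flag for whether the row contains any black pixel) is an assumption, and `row_profile`, `cluster_rows`, `threshold`, and `black_threshold` are hypothetical names.

```python
from typing import List

Fragment = List[List[int]]  # grayscale pixel rows, 0 = black, 255 = white (assumed)


def row_profile(frag: Fragment, black_threshold: int = 128) -> List[int]:
    """Assumed feature: 1 for each pixel row that contains a black pixel, else 0."""
    return [1 if any(p < black_threshold for p in row) else 0 for row in frag]


def distance(u: List[float], v: List[float]) -> float:
    """L1 distance between two profiles of equal length."""
    return sum(abs(x - y) for x, y in zip(u, v))


def cluster_rows(frags: List[Fragment], threshold: float,
                 black_threshold: int = 128) -> List[List[int]]:
    """Group fragment indices whose text-line profiles lie close together.

    A fragment joins the nearest cluster if its distance to that cluster's
    center is at most `threshold`; otherwise it starts a new cluster.
    Each center is the running mean of its members' profiles.
    """
    centers: List[List[float]] = []
    clusters: List[List[int]] = []
    for idx, frag in enumerate(frags):
        prof = row_profile(frag, black_threshold)
        best, best_d = None, None
        for c, center in enumerate(centers):
            d = distance(prof, center)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            k = len(clusters[best])
            centers[best] = [(m * k + p) / (k + 1)
                             for m, p in zip(centers[best], prof)]
            clusters[best].append(idx)
        else:
            centers.append([float(p) for p in prof])
            clusters.append([idx])
    return clusters
```

A fixed `threshold` like this is exactly the kind of hand-set value that can fail on fragments dominated by blank space, such as the indent before a paragraph, which is the failure mode the background describes.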

Application Information

IPC(8): G06K9/62, G06K9/34
CPC: G06V10/267, G06V30/153, G06F18/23
Inventor: 李玲娜, 杨丰祥, 彭凯巍, 唐瑞
Owner SOUTHWEST PETROLEUM UNIV