
Similarity determination method for long text

A judgment method and similarity technology, applied in the information field

Inactive Publication Date: 2016-12-07
HUNAN ANTVISION SOFTWARE

AI Technical Summary

Problems solved by technology

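Judging the similarity of long texts simply and efficiently in information collection systems with large data volumes, where much of the collected content is duplicated or nearly duplicated.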



Embodiment Construction

[0018] As shown in Figure 1, a similarity determination method for long texts according to the present invention includes the following steps:

[0019] Step S101: Text sentence-fragment segmentation. The collected text content to be analyzed is segmented into sentence fragments. In this embodiment, texts A and B are taken as the texts to be analyzed. Because texts differ in length and contain many sentences and paragraphs, extracting sentence fragments is an important step. Different segmentation rules produce different fragments, so a single specified rule is applied uniformly when segmenting the content. The sentence-fragment sets obtained from texts A and B are denoted C and D, respectively.
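Step S101 can be illustrated with a minimal sketch. The text only states that one specified rule is applied uniformly; the concrete rule used here (splitting on common Chinese and Western punctuation) and the names `segment` and `_DELIMITERS` are assumptions made for illustration, not the rule of the invention.

```python
import re

# Assumed segmentation rule: break on common Chinese and Western punctuation
# and on newlines. The source does not specify the actual rule.
_DELIMITERS = re.compile(r"[。！？；，.!?;,\n]+")


def segment(text: str) -> list[str]:
    """Split text into non-empty sentence fragments using one fixed rule."""
    pieces = (piece.strip() for piece in _DELIMITERS.split(text))
    return [piece for piece in pieces if piece]


# Texts A and B, segmented with the same rule into fragment sets C and D.
text_a = "The Internet grows daily. Many pages repeat; some are near copies!"
text_b = "Many pages repeat, and the Internet grows daily."

C = segment(text_a)
D = segment(text_b)
```

Applying the same function to both texts is what keeps the resulting fragments comparable; a different delimiter set would produce different fragments for the same input.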

[0020] Step S102: Long-sentence combination. The long sentence fragments obtained after text segmentation are randomly combined. This step specifically includes the following sub-steps:

[0021] Step S1021: Sorting and screening, performing long sen...


Abstract

The invention relates to the technical field of information, and in particular to a similarity determination method for long text. The method comprises the following steps: (101) text sentence-fragment segmentation, in which the collected text to be analyzed is segmented into sentence fragments; (102) long-sentence combination, in which the long sentence fragments obtained after segmentation are randomly combined; and (103) text similarity judgment, in which it is determined whether the long-sentence combinations of the texts to be analyzed contain an identical set; if so, the contents of the texts are judged similar, and if not, they are judged not similar. The method is computationally simple and easy to implement, greatly shortens computation time, and greatly reduces space complexity; compared with other similarity determination methods, it is simple and very effective for collection systems with large data volumes.
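The three steps of the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the segmentation rule, the parameters `top_k` and `size`, and all function names are hypothetical, and the sketch enumerates every combination of the longest fragments deterministically rather than combining them randomly as the abstract describes.

```python
import re
from itertools import combinations

# Assumed segmentation rule (step 101); the source does not specify one.
_DELIMITERS = re.compile(r"[。！？；，.!?;,\n]+")


def segment(text):
    """Step 101: split text into non-empty sentence fragments."""
    return [p.strip() for p in _DELIMITERS.split(text) if p.strip()]


def long_combinations(fragments, top_k=4, size=2):
    """Step 102 (assumed form): keep the top_k longest distinct fragments
    and return every size-element combination as an unordered set."""
    longest = sorted(set(fragments), key=lambda f: (-len(f), f))[:top_k]
    return {frozenset(combo) for combo in combinations(longest, size)}


def is_similar(text_a, text_b, top_k=4, size=2):
    """Step 103: texts are judged similar if they share a combination."""
    combos_a = long_combinations(segment(text_a), top_k, size)
    combos_b = long_combinations(segment(text_b), top_k, size)
    return not combos_a.isdisjoint(combos_b)


a = ("Every day the Internet generates a large number of pages. "
     "Many of these pages are nearly identical copies. Short one. Tiny.")
b = ("Breaking news! Many of these pages are nearly identical copies. "
     "Every day the Internet generates a large number of pages. Ok.")
c = ("Completely different content here. "
     "Nothing is shared with the others. Another line. End.")

similar_ab = is_similar(a, b)
similar_ac = is_similar(a, c)
```

Storing each combination as a `frozenset` makes the comparison independent of fragment order, so the final check reduces to a set-intersection test, which is consistent with the abstract's claim that the calculation is simple.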

Description

Technical field

[0001] The invention relates to the field of information technology, and in particular to a similarity determination method for long texts.

Background

[0002] With the development of the Internet and the advent of the information age, the Internet has become a very important way for people to obtain information resources and an important platform for communication. Every day the Internet generates a large number of information resources, much of whose content is repetitive or similar. According to relevant statistics, near-duplicate webpages account for as much as 29% of all webpages on the Internet. In a large-scale information collection system, most of the collected webpage content is completely or nearly duplicated. Therefore, in an information collection system, judging the similarity of text content becomes a very imp...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/211, G06F40/253
Inventor: 唐义晴, 黄三伟
Owner: HUNAN ANTVISION SOFTWARE