Method and system for segmenting text paragraphs

A text paragraph segmentation method and system. It addresses two problems: searching a long text for particular semantics takes a lot of time, and small paragraphs are difficult to merge into semantic paragraphs. It achieves the effect of more accurate semantic information.

Status: Inactive; Publication Date: 2015-01-28
ANHUI HUAZHEN INFORMATION SCI & TECH
3 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

A text file may contain tens of thousands or even hundreds of thousands of words, so searching it for particular semantics takes a lot of time.
Segmenting the entire text file into paragraphs can improve search efficiency. However, paragraph segmentation in the prior art must take text and paragraph length constraints into account, and it still struggles to ensure that small, closely related paragraphs are merged, as far as possible, into semantic paragraphs of moderate length.



Detailed Description of Embodiments

[0028] As shown in Figure 1, an embodiment of the present invention provides a text paragraph segmentation method comprising the following steps:

[0029] Step 101: acquire Internet data from a storage system, wherein the Internet data includes HTML (HyperText Markup Language) text, title, meta and anchor text. HTML is currently the most widely used language on the Internet and the main language from which web documents are built. An HTML document is descriptive text composed of HTML tags (commands), which can describe text, graphics, animations, sounds, tables, links and so on. The structure of an HTML file has two parts, the head and the body. The head, which is where the title and meta elements appear, describes the information required by the browser, while the body contains the actual content to be presented. Anchor text is the text part of a hyperlink on a web page and is an important factor affecting how search engines rank a web page. Anchor text refers to a web p...
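The four fields named in Step 101 can be pulled out of a page with a small parser. The following is a minimal sketch using only Python's standard `html.parser` module. It is an illustration, not the patent's implementation: the class name `PageFields`, the function `extract_fields` and the returned dictionary layout are assumptions introduced here.

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Hypothetical helper: collects title, meta tags, anchor text and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}       # meta name -> content
        self.anchors = []    # text of each <a> element
        self.text = []       # visible text fragments, in document order
        self._in_title = False
        self._in_anchor = False
        self._skip = 0       # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta[attrs["name"]] = attrs["content"]
        elif tag == "a":
            self._in_anchor = True
            self.anchors.append("")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False
            if self.anchors:
                self.anchors[-1] = self.anchors[-1].strip()
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style content
        if self._in_title:
            self.title += data
            return
        stripped = data.strip()
        if not stripped:
            return
        if self._in_anchor:
            self.anchors[-1] += data
        self.text.append(stripped)


def extract_fields(html):
    """Return the title, meta, anchor text and body text of one HTML page."""
    parser = PageFields()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "anchors": parser.anchors,
        "text": parser.text,
    }


SAMPLE = (
    "<html><head><title>Demo page</title>"
    '<meta name="keywords" content="text, paragraphs"></head>\n'
    "<body><p>First paragraph.</p>"
    '<p>See <a href="/x">the docs</a> here.</p>'
    "<script>var a = 1;</script></body></html>"
)
fields = extract_fields(SAMPLE)
```

Anchor text is kept both in `anchors` and in the running `text` list, so later segmentation steps still see it in context.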

Abstract

The invention discloses a method and a system for segmenting text paragraphs. The method comprises the following steps: obtaining Internet data; roughly segmenting the Internet data into text paragraphs; performing paragraph correlation analysis on the roughly segmented paragraphs and regrouping them; merging the regrouped paragraphs into semantic paragraphs; and persistently serializing the semantic paragraphs into a storage system. The method helps give the system a uniform interface and design, makes full use of the advantages of paragraph-level text analysis, extracts more detailed and accurate semantic information from text of smaller granularity, and supports the collection, identification and analysis of information.
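The steps listed in the abstract can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the patented method: the visible text does not say how paragraph correlation is measured, so bag-of-words cosine similarity is swapped in here. Only adjacent paragraphs are merged, JSON stands in for the storage system, and the function names, `threshold` and `max_len` are all hypothetical.

```python
import json
import math
import re
from collections import Counter


def rough_split(text):
    """Rough segmentation: split on line breaks and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n+", text) if p.strip()]


def bag_of_words(paragraph):
    """Lowercased word counts for one paragraph."""
    return Counter(re.findall(r"\w+", paragraph.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (assumed correlation measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def merge_related(paragraphs, threshold=0.2, max_len=400):
    """Merge each paragraph into the previous one when they are related
    and the combined length stays within max_len characters."""
    merged = []
    for p in paragraphs:
        if merged:
            prev = merged[-1]
            related = cosine(bag_of_words(prev), bag_of_words(p)) >= threshold
            if related and len(prev) + len(p) + 1 <= max_len:
                merged[-1] = prev + " " + p
                continue
        merged.append(p)
    return merged


def segment(text, threshold=0.2, max_len=400):
    """Rough split followed by correlation-based merging."""
    return merge_related(rough_split(text), threshold, max_len)


def serialize(paragraphs):
    """Stand-in for persistence: encode the semantic paragraphs as JSON."""
    return json.dumps(paragraphs, ensure_ascii=False)


TEXT = "Cats chase mice.\nMice fear cats.\nStocks fell sharply today."
result = segment(TEXT)
stored = serialize(result)
```

The `max_len` cap reflects the goal of semantic paragraphs of moderate length: two related paragraphs are left separate if merging them would exceed it.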

Description

Technical field

[0001] The present invention relates to the technical field of data networks, and in particular to a text paragraph segmentation method and system.

Background

[0002] Text is the written form of language. From a literary point of view, it is usually a sentence or a combination of sentences with complete and systematic meaning. A text can be a sentence, a paragraph or a chapter. A text file may contain tens of thousands or even hundreds of thousands of words, so searching it for particular semantics takes a lot of time. Segmenting the entire text file into paragraphs can improve search efficiency. However, paragraph segmentation in the prior art must take text and paragraph length constraints into account, and it still struggles to ensure that small, closely related paragraphs are merged, as far as possible, into semantic paragraphs of moderate length.

Summary of the invention ...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 贾岩
Owner: ANHUI HUAZHEN INFORMATION SCI & TECH