Vision-based document segmentation


Status: Inactive; Publication Date: 2005-02-09

MICROSOFT CORP

Cites: 0; Cited by: 18


Problems solved by technology

[0004] These characteristics of web pages can reduce the accuracy of the search process.


Embodiment Construction

[0020] This invention describes vision-based document segmentation. Vision-based document segmentation identifies the portions of a document that carry its semantic content, based on the document's visual appearance. Vision-based document segmentation can be used in a number of different ways. For example, when searching for documents, segmentation allows search results to be based on the semantic content portions of the documents.

[0021] The discussion that follows is in terms of documents and the models used to describe the structure of documents. Documents may be in any of a variety of formats, such as formats in accordance with Standard Generalized Markup Language (SGML), for example Extensible Markup Language (XML) format or Hypertext Markup Language (HTML) format. In several embodiments, these documents are web pages in HTML format. The model discussed here may be any of a variety of models that describe the structure of a document. In several embodiments, the model ...
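As a rough illustration of what a model describing the structure of an HTML document can look like, the sketch below parses a page into a tree of nested elements using Python's standard html.parser module. The tree-of-dicts representation, the TreeBuilder class, the parse function, and the VOID_TAGS set are assumptions made for this example; the text above does not say which model the embodiments use.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, chosen for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Builds a nested dict tree from HTML (hypothetical helper, not from the patent)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(
                {"tag": "#text", "text": text, "children": []}
            )


def parse(html):
    """Return the root of the element tree for an HTML string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


tree = parse("<html><body><h1>Title</h1><p>Hello<br>world</p></body></html>")
body = tree["children"][0]["children"][0]
print([child["tag"] for child in body["children"]])
```

A tree like this records how elements nest in the markup, which is not necessarily how the page looks when rendered; that gap is what a vision-based approach is meant to address.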


Abstract

Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.
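The three steps named in the abstract (identify visual blocks, detect separators between them, construct a content structure) can be sketched in a much simplified form. The example below assumes the visual blocks are already known and are stacked vertically, treats each vertical gap between consecutive blocks as a separator, and builds the structure by recursively splitting at the widest gap. The Block and Node classes, the function names, and the widest-gap rule are all assumptions made for illustration, not the method claimed in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A visual block: a named region spanning rows top..bottom (hypothetical units)."""
    name: str
    top: int
    bottom: int


@dataclass
class Node:
    """A node of the content structure; a leaf has no children."""
    blocks: list
    children: list = field(default_factory=list)


def find_separators(blocks):
    """Sort blocks top to bottom and return (ordered, [(gap_height, index), ...])."""
    ordered = sorted(blocks, key=lambda b: b.top)
    seps = []
    for i in range(len(ordered) - 1):
        gap = ordered[i + 1].top - ordered[i].bottom
        if gap > 0:
            seps.append((gap, i))  # separator lies after ordered[i]
    return ordered, seps


def build_structure(blocks):
    """Recursively split the blocks at the widest separator(s)."""
    ordered, seps = find_separators(blocks)
    node = Node(blocks=ordered)
    if not seps:
        return node
    widest = max(gap for gap, _ in seps)
    start = 0
    for gap, i in seps:
        if gap == widest:
            node.children.append(build_structure(ordered[start:i + 1]))
            start = i + 1
    node.children.append(build_structure(ordered[start:]))
    return node


def outline(node):
    """Render the structure as nested lists of block names."""
    if not node.children:
        return [b.name for b in node.blocks]
    return [outline(child) for child in node.children]


page = [
    Block("header", 0, 50),
    Block("para1", 80, 200),
    Block("para2", 210, 300),
    Block("footer", 340, 380),
]
print(outline(build_structure(page)))
```

In this example the widest gap (40 units, before the footer) splits the page first, so the footer is separated from the rest; the header is then split from the two paragraphs, which are the most closely related blocks and are separated last.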

Description

Technical Field

[0001] The present invention relates to segmenting documents, and more particularly to vision-based document segmentation.

Background Technique

[0002] People have access to vast amounts of information. However, finding the specific information they need in any given situation can be quite difficult. For example, through the Internet, a vast amount of information is accessible to people in the form of web pages. The number of such web pages may be on the order of 1 million or more. In addition, the available web pages are constantly changing, with some pages being added, others being deleted, and others being modified.

[0003] Thus, when one desires to find out certain information, such as an answer to a question, the ability to extract specific information from this large source of information becomes very important. Processes and technologies were developed to allow users to search for information over the Internet, and are generally made available to...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F15/00, G06F, G06F17/00, G06F17/30, G06F40/143, G06K9/72, G06K15/00

CPC: G06F17/30716, G06F17/218, G06F17/2247, G06F16/34, G06F40/117, G06F40/143, G06F15/00, G06F17/00

Inventor: 文继荣, 俞诗鹏, 蔡登, 马维英

Owner: MICROSOFT CORP
