Method and device for generating structured information

A technology for generating structured information from web pages, applied in the computer field. It addresses problems such as poorly applicable extraction rules, thin content features, and excessive redundant information, and ensures the diversity of information sources.

Active Publication Date: 2017-11-24
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, this method of generating industry structured information requires the design of a large number of extraction rules. Because the extraction rules apply poorly across data sources, the content features of the generated structured information are relatively thin and it contains a lot of redundant information.

Examples

Embodiment Construction

[0035] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention, not to limit it. It should also be noted that, for convenience of description, only the parts related to the invention are shown in the drawings.

[0036] It should be noted that, where no conflict arises, the embodiments in the present application and the features in those embodiments can be combined with each other. The present application is described in detail below with reference to the accompanying drawings and embodiments.

[0037] Figure 1 shows a flow 100 of one embodiment of a method for generating structured information according to the present application. The method for generating structured information includes:

[0038] In step 110, all page contents of enterprise websites in a pred...

Abstract

The embodiment of the invention discloses a method and device for generating structured information. The method comprises: capturing all page content of enterprise websites in a predetermined field; categorizing the page content, according to its features, into the category of pre-structured enterprise information or into other categories, to obtain categorized pages; dividing the categorized pages into content pages and form pages, and labelling each categorized page with the corresponding tag; performing at least one of the following extractions on the categorized pages to obtain extracted information: body block extraction, structured extraction of body content, image-text block extraction, list block extraction, and structuring of content at predetermined locations; and constructing the structured information based on the extracted information. This implementation ensures the diversity of information sources, so that the generated structured information presents rich content features and contains less redundant information.
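
The flow in the abstract (categorize, label, extract, construct) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the keyword list in `ENTERPRISE_KEYWORDS`, the use of a `<form>` tag as a stand-in heuristic for the content page / form page label, the choice of `<p>` and `<li>` as body and list blocks, and the in-memory `SAMPLE_PAGES` dictionary that replaces the crawling step.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collects paragraph text, list items, and whether a <form> is present."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.list_items = []
        self.has_form = False
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True
        if tag in ("p", "li"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Normalize whitespace before storing the block text.
            text = " ".join("".join(self._buffer).split())
            if text:
                target = self.paragraphs if tag == "p" else self.list_items
                target.append(text)
            self._current = None


# Hypothetical page features; the source does not list the real ones.
ENTERPRISE_KEYWORDS = ("about us", "products", "contact")


def parse(html):
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector


def categorize(html):
    """Assign a page to the enterprise-information category or to 'other'."""
    lowered = html.lower()
    if any(keyword in lowered for keyword in ENTERPRISE_KEYWORDS):
        return "enterprise_info"
    return "other"


def label(blocks):
    """Placeholder heuristic: a page with a <form> is labelled a form page."""
    return "form_page" if blocks.has_form else "content_page"


def build_structured_info(pages):
    """Run categorize -> label -> extract and build one record per kept page."""
    records = []
    for url, html in pages.items():
        if categorize(html) != "enterprise_info":
            continue
        blocks = parse(html)
        records.append({
            "url": url,
            "label": label(blocks),
            "body": blocks.paragraphs,   # body block extraction
            "list": blocks.list_items,   # list block extraction
        })
    return records


# Stand-in for the captured page content of an enterprise website.
SAMPLE_PAGES = {
    "https://example.com/about": (
        "<h1>About us</h1><p>We make  pumps.</p>"
        "<ul><li>Pump A</li><li>Pump B</li></ul>"
    ),
    "https://example.com/contact": (
        "<p>Contact the sales team.</p><form><input name='email'></form>"
    ),
    "https://example.com/login": "<p>Sign in</p>",
}

records = build_structured_info(SAMPLE_PAGES)
for record in records:
    print(record)
```

In this sketch the login page falls into the "other" category and is dropped before extraction, which is one simple way the categorization step could reduce redundant information. A real system would presumably use a trained classifier rather than keyword matching.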

Description

Technical field

[0001] The present application relates to the field of computer technology, specifically to the field of computer network technology, and in particular to a method and device for generating structured information.

Background technique

[0002] With the continuous development of Internet technology, the amount of information carried by Internet web pages is increasing. How to extract the structured information required by a certain industry from the massive Internet webpage information is a problem that needs to be solved urgently.

[0003] The current industry structured information is usually extracted from the webpage information of the industry website according to the extraction rules. However, due to the complexity and irregularity of the webpage structure, it is often necessary to write specific extraction rules for specific data sources.

[0004] However, this method of generating industry structured information requires the design of a large number o...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/951, G06F16/9566, G06F40/289
Inventor: 钟辉强, 尹存祥, 沈剑平, 徐国强
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD