Method and device for processing text-related structural data

A structured-data processing technology, applied in the field of the Internet, that addresses the difficulty of displaying web pages on mobile devices, the inability of users to see web page content, and limited browser support.

Inactive Publication Date: 2014-01-01
SHENZHEN SHI JI GUANG SU INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0004] In the process of implementing the present invention, the inventor found that the prior art has at least the following problem: complex web pages usually cannot be supported directly by the browser of a mobile device, and the mobile network and the screen of the mobile device are limited. These conditions make it difficult to display web pages on mobile devices, so users usually cannot see the information related to the body of the web page in the browser of the mobile device.

Examples

Embodiment 1

[0078] Figure 1 is a flowchart of a method for processing text-related structured data according to Embodiment 1 of the present invention. As shown in Figure 1, the method in this embodiment may be executed by a device for processing text-related structured data. The method may include the following steps:

[0079] 100. Perform block processing on the nodes in the Document Object Model (DOM) tree of the web page according to the preset candidate block node types to obtain several candidate block nodes;

[0080] In this embodiment, the candidate block node type is the node type corresponding to a tag used to store the body of the web page. For example, in the prior art the tag that stores the body of the web page may be a DIV tag or a TABLE tag, and the corresponding one is used for storage. The node type corresponding t...
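
Step 100 can be illustrated with a minimal sketch. It assumes that DIV and TABLE are the only preset candidate block node types, as in the example of paragraph [0080], and it uses Python's standard html.parser module in place of a full DOM tree. The names CANDIDATE_TYPES, CandidateBlockCollector, and candidate_blocks are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser

# Assumption: DIV and TABLE are the preset candidate block node types,
# following the example tags named in paragraph [0080].
CANDIDATE_TYPES = {"div", "table"}


class CandidateBlockCollector(HTMLParser):
    """Collects the text held by each DIV/TABLE element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # stack of candidate blocks that are still open
        self.blocks = []       # completed candidate blocks, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in CANDIDATE_TYPES:
            self.open_blocks.append({"tag": tag, "text": []})

    def handle_data(self, data):
        # Text is credited to the innermost open candidate block only.
        if self.open_blocks and data.strip():
            self.open_blocks[-1]["text"].append(data.strip())

    def handle_endtag(self, tag):
        # Close a block only when the end tag matches the innermost open block.
        if self.open_blocks and self.open_blocks[-1]["tag"] == tag:
            block = self.open_blocks.pop()
            self.blocks.append({"tag": tag, "text": " ".join(block["text"])})


def candidate_blocks(html):
    """Return one dict per DIV/TABLE element found in the HTML string."""
    collector = CandidateBlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks
```

Calling candidate_blocks on a page string returns a list of dictionaries, each holding the tag name and the text found directly inside that block. Text outside any DIV or TABLE element is ignored, and text inside a nested candidate block is credited to the inner block rather than the outer one; both are simplifications chosen for this sketch.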

Embodiment 2

[0126] Figure 4 is a schematic structural diagram of an apparatus for processing text-related structured data according to Embodiment 2 of the present invention. As shown in Figure 4, the apparatus in this embodiment may include a block processing module 10, a filtering module 11, a data extraction module 12, and a display module 13.

[0127] The block processing module 10 performs block processing on the nodes in the DOM tree of the web page according to the preset candidate block node types to obtain several candidate block nodes; the candidate block node type is the node type corresponding to the tag used to store the main text of the web page. The filtering module 11 is connected to the block processing module 10, and the filtering module 11 is used to filter out several candidate block nodes processed by the block processing module 10 that store the main text of the webpage A candidate block...
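
The module arrangement can be pictured as a simple chain. The sketch below is illustrative only: it assumes each module can be modelled as a function, and that the data extraction module 12 and display module 13 follow the filtering module 11 in the same order as the method steps. The class name TextDataApparatus, the Block type, and the stub functions in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Block = dict  # hypothetical representation of one block node


@dataclass
class TextDataApparatus:
    """Chains the four modules named in paragraph [0126] (illustrative only)."""
    block_processing: Callable[[str], List[Block]]   # module 10
    filtering: Callable[[List[Block]], List[Block]]  # module 11
    data_extraction: Callable[[List[Block]], dict]   # module 12
    display: Callable[[dict], str]                   # module 13

    def run(self, html: str) -> str:
        candidates = self.block_processing(html)  # candidate block nodes
        blocks = self.filtering(candidates)       # block nodes kept after filtering
        data = self.data_extraction(blocks)       # text-related structured data
        return self.display(data)


# Usage with stand-in functions; real modules would operate on a DOM tree.
apparatus = TextDataApparatus(
    block_processing=lambda html: [{"text": part} for part in html.split("|")],
    filtering=lambda blocks: [b for b in blocks if len(b["text"]) > 3],
    data_extraction=lambda blocks: {"body": " ".join(b["text"] for b in blocks)},
    display=lambda data: data["body"],
)
```

Modelling each module as a replaceable function keeps the connections between modules explicit, so any one module can be swapped without touching the others.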

Embodiment 3

[0131] Figure 5 is a schematic structural diagram of an apparatus for processing text-related structured data according to Embodiment 3 of the present invention. The apparatus of the embodiment shown in Figure 5 builds on the embodiment shown in Figure 4 and may further include the following technical solutions.

[0132] As shown in Figure 5, the apparatus for processing text-related structured data in this embodiment further includes an integration module 14 and/or a packaging module 15. The embodiment shown in Figure 5 takes an apparatus that includes both the integration module 14 and the packaging module 15 as an example.

[0133] The integration module 14 can be connected to the block processing module 10 and the filtering module 11; the integration module 14 is used for the block processing module 10 to perform processing on the nodes in the document object model tree of the web page according to the pre...

Abstract

The invention discloses a method and device for processing text-related structural data, belonging to the field of Internet technology. The method includes: step 1, performing block processing on the nodes in the document object model tree of a web page according to preset candidate block node types to obtain several candidate block nodes, where a candidate block node type is the node type corresponding to a tag used to store the text of the web page; step 2, filtering out, from the several candidate block nodes, those whose probability of storing the text of the web page is smaller than a preset probability threshold, to obtain several block nodes; step 3, extracting the text-related structural data of the web page from the several block nodes; and step 4, displaying the text-related structural data of the web page. The text-related structural data of the web page includes at least a title, text information, and the text body. With this technical scheme, the text-related structural data can be extracted and displayed efficiently.
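
The threshold filter of step 2 can be sketched as follows. The abstract does not say how the probability is computed, so this sketch substitutes a simple link-density heuristic (the share of a block's characters that lie outside links) purely as a stand-in. The function names, the dictionary keys text and link_text, and the default threshold of 0.5 are all assumptions made for illustration.

```python
def body_probability(block):
    """Hypothetical score in [0, 1]: share of the block's characters outside links.

    This is a stand-in heuristic; the source does not define the probability.
    """
    total = len(block["text"])
    if total == 0:
        return 0.0
    return max(0.0, 1.0 - len(block["link_text"]) / total)


def filter_blocks(candidates, threshold=0.5):
    """Step 2: drop candidate blocks whose probability is below the threshold."""
    return [b for b in candidates if body_probability(b) >= threshold]


# Usage: a navigation bar made entirely of links is dropped,
# while a paragraph containing one short link is kept.
candidates = [
    {"text": "Home News Sports", "link_text": "Home News Sports"},
    {"text": "A long article paragraph with one link", "link_text": "link"},
]
kept = filter_blocks(candidates)
```

Under this stand-in score, blocks dominated by link text (navigation bars, link lists) fall below the threshold, which matches the stated goal of keeping only blocks likely to store the text of the web page.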

Description

Technical field

[0001] The embodiments of the present invention relate to the field of Internet technology, and in particular to a method and device for processing text-related structured data.

Background art

[0002] In the prior art, web pages such as WWW pages are mainly designed for personal computer (PC) browsers. Driven by technical development and commerce, web pages have become more and more complex in recent years and contain more and more content. For example, a web page can contain navigation, text, links, advertisements, JS, and other complex content.

[0003] With the rapid development of the mobile Internet and the widespread use of mobile devices such as mobile phones, users can go online from mobile devices anytime and anywhere. As a result, users have a growing demand for browsing web pages directly on mobile devices such as mobile phones.

[0004] In the process of implementing the present invention, the inventor foun...

Application Information

IPC IPC(8): H04L29/08, G06F17/30
Inventor 蔡兵徐羽彭默
Owner SHENZHEN SHI JI GUANG SU INFORMATION TECH