
Information graph extraction and retrieval method and device for information graph webpage

A technology concerning information graphs and webpages, applied in the fields of network data retrieval, other database retrieval, and network data query. It addresses the degraded user experience of existing approaches and makes infographic webpages convenient to retrieve.

Active Publication Date: 2019-03-29
ALIBABA (CHINA) CO LTD
Cites: 6 | Cited by: 0

AI Technical Summary

Problems solved by technology

Even when a small number of infographic webpages are recalled, presenting them in the traditional way, as text summaries, inevitably degrades the user experience.




Embodiment Construction

[0029] First, the prior-art terms used in the present invention are explained.

[0030] Inverted index: An inverted index, also called a reverse index, postings file, or inverted file, is an indexing method used in full-text search to store a mapping from a word to its storage locations in a document or a set of documents. It is the most commonly used data structure in document retrieval systems: given a word, the inverted index quickly returns the list of documents that contain it. An inverted index consists of two main parts, a "word dictionary" and an "inverted file". Each entry in the index table contains an attribute value and the address of every record that has that attribute value. Instead of each record determining its attribute values, the attribute value determines the positions of the records, which is why the index is called "inverted".
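To make the two parts of an inverted index concrete, here is a minimal in-memory sketch. It is not the patent's implementation: the integer document IDs and the naive lowercase-and-split-on-whitespace tokenizer are assumptions chosen for illustration. The keys of the returned dictionary act as the "word dictionary", and each value is that word's posting list (the "inverted file"), mapping document IDs to word positions.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each word to the documents that contain it and its positions there.

    Outer keys form the "word dictionary"; each value is the posting list
    ("inverted file") recording doc_id -> list of token positions.
    """
    index: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    # Convert nested defaultdicts to plain dicts so lookups never create entries.
    return {word: dict(postings) for word, postings in index.items()}


def lookup(index: dict[str, dict[int, list[int]]], word: str) -> list[int]:
    """Return the sorted IDs of the documents that contain `word`."""
    return sorted(index.get(word.lower(), {}))


if __name__ == "__main__":
    docs = {1: "infographic of GDP growth", 2: "GDP report text", 3: "weather infographic"}
    index = build_inverted_index(docs)
    print(lookup(index, "infographic"))  # [1, 3]
    print(index["gdp"])                  # {1: [2], 2: [0]}
```

The lookup cost depends only on the size of the word's posting list, not on the total number of documents, which is why retrieval systems favor this structure.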

[0031] Machine learning algorithms: machine learning is a multidisciplinary field involving probability theory, statistics, approximation the...



Abstract

The invention discloses an information graph (infographic) extraction and retrieval method and device for information graph webpages. In the method, a server identifies webpages that contain information graphs according to whether the webpages conform to preset features. The preset features belong to at least one of four feature types: a first feature type indicating information graph keywords; a second feature type indicating that the webpage content contains exactly one image; a third feature type indicating that the text length of the webpage content is smaller than a threshold value; and a fourth feature type indicating URL (uniform resource locator) patterns of information graph webpages. The server then extracts the structured information of each identified webpage together with the feature information of its information graph, obtaining the information graph features contained in the webpage. With this method, information graph webpages can be extracted and retrieved accurately according to users' retrieval needs.
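The four feature types in the abstract can be illustrated with a small rule-based check. This is a hedged sketch, not the patented method: the keyword list, the text-length threshold, the URL patterns, and the `Page`/`InfographicRules` types are all hypothetical, and the rule that a page must satisfy every enabled feature type is an assumption, since the abstract does not say how the types are combined.

```python
import re
from dataclasses import dataclass


@dataclass
class InfographicRules:
    # Hypothetical values for illustration only; the patent gives none.
    keywords: tuple = ("infographic", "in one chart")
    max_text_length: int = 200
    url_patterns: tuple = (r"/infographic/", r"/tujie/")
    # Which of the four feature types to apply (at least one is required).
    enabled: frozenset = frozenset({"keyword", "single_image", "short_text", "url"})


@dataclass
class Page:
    url: str
    title: str
    text: str
    image_count: int


def matches_features(page: Page, rules: InfographicRules) -> bool:
    """Return True if the page satisfies every enabled feature type (assumed AND rule)."""
    if not rules.enabled:
        raise ValueError("at least one feature type must be enabled")
    checks = {
        # Type 1: an information graph keyword appears in the title or text.
        "keyword": lambda: any(k in page.title or k in page.text for k in rules.keywords),
        # Type 2: the page content contains exactly one image.
        "single_image": lambda: page.image_count == 1,
        # Type 3: the content text is shorter than the threshold.
        "short_text": lambda: len(page.text) < rules.max_text_length,
        # Type 4: the URL matches a known infographic-page pattern.
        "url": lambda: any(re.search(p, page.url) for p in rules.url_patterns),
    }
    # Lambdas defer each check so only the enabled feature types are evaluated.
    return all(checks[name]() for name in rules.enabled)


if __name__ == "__main__":
    page = Page("https://example.com/infographic/2019/gdp", "GDP infographic", "Short caption", 1)
    print(matches_features(page, InfographicRules()))
```

Keeping each feature type as a separate, switchable check mirrors the abstract's "at least one of four types" wording: a deployment could enable only the URL rule for a site with a stable URL scheme, or combine several rules for stricter filtering.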

Description

Technical field

[0001] The invention relates to the technical field of webpage information identification, and in particular to a method and device for extracting the information graph features of a webpage, and a retrieval method and device for information graph webpages.

Background art

[0002] With the rapid development of Internet technology, all kinds of information have grown explosively, and large amounts of information are mixed together, so users must spend a lot of time screening valuable information out of the mass of network information. Because presenting information as text is not direct enough, information graphs have emerged as an alternative: they present data, information, knowledge, and the relationships between entities intuitively as visual graphics, and they can present complex information contexts to users simply and clearly in graph form. ...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8) G06F16/953
Inventors: 万明成 (Wan Mingcheng), 王刚 (Wang Gang)
Owner: ALIBABA (CHINA) CO LTD