Information mining method and apparatus, and apparatus used for information mining

An information mining technique, applied in the field of Internet information, that addresses the inability to mine new entities or the latest attributes of entities in a timely manner, improving timeliness and reducing computational complexity.

Active Publication Date: 2018-07-31
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

However, the update speed of the structured data of the website for the entity or the corresponding attribute of the ent

Examples

Embodiment 1

[0058] Referring to Figure 1, which shows a flow chart of the steps of Embodiment 1 of an information mining method of the present invention, the method may specifically include the following steps:

[0059] Step 101: obtain a target sentence containing a preset predicate from the webpage text corpus;

[0060] Step 102: extract a subject and an object from the syntactic analysis result corresponding to the target sentence;

[0061] Step 103: establish an entity-attribute pair based on the extracted subject and object, and save the entity-attribute pair.
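
Steps 101 to 103 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the predicate list, the `EntityAttributePair` type, and all function names are hypothetical, and the "syntactic analysis" is replaced by a plain string split around the predicate, whereas a real system would presumably use a dependency or constituency parser.

```python
from dataclasses import dataclass

# Hypothetical preset predicates; the source does not list the actual ones.
PRESET_PREDICATES = ("is directed by", "was born in")


@dataclass(frozen=True)
class EntityAttributePair:
    entity: str
    predicate: str
    attribute: str


def find_target_sentences(corpus, predicates=PRESET_PREDICATES):
    """Step 101: yield (sentence, predicate) for sentences containing a preset predicate."""
    for text in corpus:
        # Naive sentence splitting on periods; an assumption made for brevity.
        for sentence in text.split("."):
            sentence = sentence.strip()
            for predicate in predicates:
                if f" {predicate} " in sentence:
                    yield sentence, predicate
                    break


def extract_subject_object(sentence, predicate):
    """Step 102 stand-in: split around the predicate instead of running a parser."""
    subject, _, obj = sentence.partition(f" {predicate} ")
    return subject.strip(), obj.strip()


def mine_pairs(corpus):
    """Step 103: build entity-attribute pairs and save them in an in-memory set."""
    store = set()
    for sentence, predicate in find_target_sentences(corpus):
        subject, obj = extract_subject_object(sentence, predicate)
        if subject and obj:
            store.add(EntityAttributePair(subject, predicate, obj))
    return store
```

Calling `mine_pairs(["Film A is directed by Director B. The weather was nice."])` would keep only the first sentence, since the second contains no preset predicate.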

[0062] In the embodiment of the present invention, the webpage text corpus may be composed of webpage text, which may be used to represent the natural language text included in the webpage, and the webpage text may be derived from unstructured data or semi-structured data included in the webpage. Optionally, the webpage text may include: body text of the webpage. In addition, the preset webpage categories to which the webpa...

Embodiment 2

[0078] Referring to Figure 2, which shows a flow chart of the steps of Embodiment 2 of an information mining method of the present invention, the method may specifically include the following steps:

[0079] Step 201: obtain a target sentence containing a preset predicate from the webpage text corpus;

[0080] Step 202: extract a subject and an object from the syntactic analysis result corresponding to the target sentence;

[0081] Step 203: establish an entity-attribute pair according to the extracted subject and object;

[0082] Step 204: determine the first confidence corresponding to the entity-attribute pair;

[0083] Step 205: if the first confidence exceeds the first confidence threshold, save the entity-attribute pair.
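
Steps 204 and 205 can be sketched as follows. The excerpt does not say how the first confidence is computed, so this sketch assumes a simple agreement ratio: the share of observations for a given entity and predicate that support a particular attribute value. The function name and the 0.6 threshold are likewise hypothetical.

```python
from collections import Counter


def filter_by_confidence(observed_pairs, threshold=0.6):
    """Keep pairs whose assumed confidence exceeds the threshold.

    observed_pairs: iterable of (entity, predicate, attribute) tuples,
    one tuple per target sentence that produced the pair.
    """
    observed_pairs = list(observed_pairs)
    pair_counts = Counter(observed_pairs)
    # Total number of observations for each (entity, predicate) slot.
    slot_counts = Counter((entity, predicate) for entity, predicate, _ in observed_pairs)

    saved = {}
    for (entity, predicate, attribute), count in pair_counts.items():
        # Step 204 (assumed definition): agreement ratio within the slot.
        confidence = count / slot_counts[(entity, predicate)]
        # Step 205: save only if the confidence strictly exceeds the threshold.
        if confidence > threshold:
            saved[(entity, predicate, attribute)] = confidence
    return saved
```

With three observations of one attribute value and one observation of a conflicting value, only the majority value passes the threshold and is saved.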

[0084] Compared with Embodiment 1 of the method shown in Figure 1, in this embodiment of the present invention, before saving the entity-attribute pair, the first confidence level corresponding to the entity-attribute pair can be d...

Embodiment 3

[0108] Referring to Figure 3, which shows a flow chart of the steps of Embodiment 3 of an information mining method of the present invention, the method may specifically include the following steps:

[0109] Step 301: acquire multiple attributes corresponding to the entity;

[0110] Step 302: obtain key attributes that are directional to the entity from the multiple attributes corresponding to the entity;

[0111] Step 303: establish entity-key attribute pairs according to the entity and the key attributes, and save the entity-key attribute pairs.
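
Steps 301 to 303 can be sketched as follows. The excerpt does not define how an attribute is judged "directional", so this sketch assumes one plausible reading: an attribute is a key attribute when it is shared by very few entities and therefore points back to the entity. The function name and the `max_sharing_entities` parameter are hypothetical.

```python
from collections import defaultdict


def select_key_attributes(entity_attributes, max_sharing_entities=1):
    """Return (entity, key_attribute) pairs.

    entity_attributes: dict mapping each entity to a set of attribute strings
    (Step 301 is assumed to have produced this mapping).
    """
    # Record which entities carry each attribute.
    owners = defaultdict(set)
    for entity, attributes in entity_attributes.items():
        for attribute in attributes:
            owners[attribute].add(entity)

    key_pairs = set()
    for entity, attributes in entity_attributes.items():
        for attribute in attributes:
            # Step 302 (assumed criterion): few sharing entities means directional.
            if len(owners[attribute]) <= max_sharing_entities:
                # Step 303: establish and save the entity-key attribute pair.
                key_pairs.add((entity, attribute))
    return key_pairs
```

Under this assumption, a widely shared attribute such as a genre label is dropped, while an attribute held by a single entity is kept as a key attribute.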

[0112] In practical applications, there are various attributes corresponding to an entity. For an entity, the attribute obtained from the website is only its auxiliary information, so this attribute may not be able to meet the needs of users well.

[0113] In this embodiment of the present invention, key attributes that are directional to the entity can be obtained from multiple attributes corresponding to the entity, and an entity-key att...

Abstract

Embodiments of the invention provide an information mining method and apparatus, and an apparatus used for information mining. The method specifically comprises the steps of: obtaining a target sentence containing a preset predicate from a webpage text corpus; extracting a subject and an object from a syntactic analysis result corresponding to the target sentence; and establishing an entity-attribute pair according to the extracted subject and object, and storing the entity-attribute pair. Because webpage text is updated more promptly, new entities or the latest attributes of entities can be mined from it in a timely manner, improving the timeliness of entity information.

Description

Technical field

[0001] The invention relates to the field of Internet information technology, in particular to an information mining method and device, and a device for information mining.

Background art

[0002] With the rapid development of Internet information technology, especially wireless Internet information technology, information services have become more and more common. When an information service provider provides information services, for example when a search engine provides search services, it usually uses entities to do so. Specifically, objective things in the real world may be called entities, such as concepts, things, or events. For example, the film and television drama "I am a special soldier", the star "Andy Lau", and the writer "Huo Da" are all examples of entities. At the same time, each entity has attributes, which reflect the relevant information of the entity. For example, "military theme", "174cm", and "Hui nationality" ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/3344, G06F16/9535
Inventor: 邸楠, 尹顺顺, 邓超
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD