Data label mining method and data label mining system

A data label mining technology, applied in the field of Internet applications, that addresses problems such as incomplete coverage of data labels, high labor costs, and heavy workload.

Inactive Publication Date: 2014-10-01
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0014] The first method, manually constructing the data label system, requires building the system by hand and ensuring the accuracy of the hypernym-hyponym (upper-lower) relations, which brings a large workload and high labor costs. As a result, a manually constructed data label system contains few data labels, and the hypernym-hyponym relationships between them are relatively simple, resulting in art...



Embodiment Construction

[0059] The basic idea of the present invention is as follows. Count the number of occurrences of the open data labels obtained in advance from the Internet and of the basic data labels in the pre-built basic data label system, and count the number of co-occurrences of the open data labels and the basic data labels. Use the counted occurrences and co-occurrences to obtain a confidence. When the number of co-occurrences is greater than a preset first threshold, or the confidence is greater than a preset second threshold, or the confidence is greater than the sum of the average confidence and the standard deviation of the confidence, add the open data label to the basic data label system as a feature label of the basic data label.
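The counting step in [0059] can be illustrated with a minimal Python sketch. The text here does not give the confidence formula, so the ratio used below (co-occurrences divided by the occurrences of the open label) is an assumption, as are the function names, the record layout, and the sample labels.

```python
from collections import Counter
from itertools import product


def count_labels(records):
    """Count occurrences and co-occurrences.

    records: iterable of (open_labels, basic_labels) pairs, one pair per
    data item (hypothetical layout, not specified in the source).
    """
    open_counts = Counter()
    basic_counts = Counter()
    cooc_counts = Counter()
    for open_labels, basic_labels in records:
        open_set, basic_set = set(open_labels), set(basic_labels)
        open_counts.update(open_set)
        basic_counts.update(basic_set)
        # Every (open, basic) pair on the same item co-occurs once.
        cooc_counts.update(product(open_set, basic_set))
    return open_counts, basic_counts, cooc_counts


def confidence(open_label, basic_label, open_counts, cooc_counts):
    """Assumed formula: co-occurrences / occurrences of the open label."""
    occurrences = open_counts[open_label]
    if occurrences == 0:
        return 0.0
    return cooc_counts[(open_label, basic_label)] / occurrences


# Made-up sample data for illustration only.
records = [
    (["hotpot", "spicy"], ["restaurant"]),
    (["hotpot"], ["restaurant"]),
    (["hotpot", "cocktail"], ["bar"]),
]
open_counts, basic_counts, cooc_counts = count_labels(records)
hotpot_conf = confidence("hotpot", "restaurant", open_counts, cooc_counts)
```

Labels are deduplicated per item so that a label repeated on one item is counted once; whether the invention counts this way is not stated in the text.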

[0060] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0061...


Abstract

The invention provides a data label mining method comprising the following steps: counting the number of occurrences of open data labels obtained in advance from the Internet and of basic data labels in a pre-built basic data label system; counting the number of co-occurrences of the open data labels and the basic data labels; obtaining a confidence from the counted occurrences and co-occurrences; and, when the number of co-occurrences is greater than a preset first threshold, or the confidence is greater than a preset second threshold, or the confidence is greater than the sum of the average confidence and the standard deviation of the confidence, adding the open data label to the basic data label system as a feature label of the basic data label. The invention also provides a data label mining system. The technical scheme enables hierarchical mining of data labels and improves the accuracy of the hypernym-hyponym relations of the mined data labels.
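The three-way acceptance rule in the abstract can be sketched as follows. This is an illustration under assumptions: the abstract does not say over which set the average and standard deviation are taken, so the sketch uses all candidate pairs and the population standard deviation; the function name, input layout, and sample values are hypothetical.

```python
from statistics import mean, pstdev


def select_feature_labels(candidates, first_threshold, second_threshold):
    """Return the (open_label, basic_label) pairs accepted as feature labels.

    candidates: dict mapping (open_label, basic_label) to a tuple
    (co-occurrence count, confidence). Hypothetical input layout.
    """
    if not candidates:
        return []
    confidences = [conf for _, conf in candidates.values()]
    # Assumption: mean and standard deviation over all candidate pairs.
    adaptive_threshold = mean(confidences) + pstdev(confidences)
    selected = []
    for pair, (cooc, conf) in candidates.items():
        if (cooc > first_threshold
                or conf > second_threshold
                or conf > adaptive_threshold):
            selected.append(pair)
    return selected


# Made-up sample values for illustration only.
candidates = {
    ("hotpot", "restaurant"): (12, 0.60),
    ("cocktail", "bar"): (3, 0.90),
    ("parking", "restaurant"): (2, 0.10),
    ("wifi", "bar"): (1, 0.20),
}
accepted = select_feature_labels(candidates, first_threshold=10,
                                 second_threshold=0.95)
```

With these sample values, the first pair passes on its co-occurrence count alone, while the second pair fails both fixed thresholds and passes only because its confidence exceeds the mean plus one standard deviation. The third condition thus acts as a data-dependent threshold alongside the two preset ones.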

Description

【Technical field】

[0001] The invention relates to the field of Internet applications, in particular to a data label mining method and system.

【Background technique】

[0002] At present, there are mainly the following two data label mining methods:

[0003] The first method is to manually construct a data label system, describing the system by specifying the hypernym-hyponym (upper-lower) relationship between two data labels. For example, the following data labels are manually annotated with hypernym-hyponym relationships:

[0004] food
[0005] food, restaurant
[0006] food, restaurant, Chinese restaurant
[0007] food, restaurant, western restaurant
[0008] food, snacks
[0009] food, bar

[0010] Each row of data labels describes the hypernym-hyponym relationships among one group of data labels.

[0011] The second method is to build a data label system based on the Internet. Internet websites classify and index Internet pages. Fo...
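The label rows in [0004] to [0009] can be read as root-to-leaf paths, where each label is the hyponym of the label before it. The following Python sketch shows one way to hold such rows as a parent map; the function names and the choice of a dictionary are illustrative assumptions, not part of the patent text.

```python
def build_parent_map(rows):
    """Map each label to its direct hypernym (upper label).

    rows: list of label paths ordered from upper to lower, as in
    paragraphs [0004] to [0009].
    """
    parent = {}
    for row in rows:
        # Adjacent labels in a row form an (upper, lower) pair.
        for upper, lower in zip(row, row[1:]):
            parent[lower] = upper
    return parent


def ancestors(label, parent):
    """List the hypernyms of a label, nearest first."""
    chain = []
    while label in parent:
        label = parent[label]
        chain.append(label)
    return chain


rows = [
    ["food"],
    ["food", "restaurant"],
    ["food", "restaurant", "Chinese restaurant"],
    ["food", "restaurant", "western restaurant"],
    ["food", "snacks"],
    ["food", "bar"],
]
parent = build_parent_map(rows)
chain = ancestors("Chinese restaurant", parent)
```

A parent map assumes each label has a single hypernym, which holds for the example rows but may not hold for every label system.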


Application Information

IPC(8): G06F17/30
CPC: G06F16/9535
Inventor 林锡通
Owner BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD