Domain word acquisition method and system

A domain word acquisition method and system, applied in the field of Internet information processing, which addresses problems such as the limited number of acquired domain words and the absence of detailed entries (for example, specific car models).

Active Publication Date: 2016-02-24
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

However, this acquisition method relies on manual sorting, so the number of domain words acquired is limited. For example, car brands often include only the main brand, such as "Audi", without more detailed models, such as "Audi A6". The method is also strongly affected by the HTML format of the website: if the site is revised and its HTML format changes, the acquisition method must be modified again.



Embodiment Construction

[0062] The technical solution will be described in detail below in conjunction with specific embodiments and accompanying drawings.

[0063] As shown in Figure 1, in one embodiment, a domain word acquisition method comprises the following steps:

[0064] Step S10, acquiring network data, and classifying the network data by domain.

[0065] After the network data is acquired, any data that is not already classified by domain needs to be classified automatically, for example into sports, technology, home appliances, entertainment and so on. In this embodiment, the system uses the Naive Bayes algorithm for automatic classification. Naive Bayes is a classification algorithm based on probability and statistics. As shown in Figure 2, the specific steps of using this algorithm for domain classification include:

[0066] In step S100, a domain word probability table of each domain is established, and a priori probability ...
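A minimal sketch of how a per-domain word probability table and prior probabilities could drive Naive Bayes classification is given below. The excerpt above is truncated, so the details here (Laplace smoothing, log-probabilities, the function names `train` and `classify`, and the toy training data) are assumptions for illustration, not the steps of the patented method.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Build priors and a word probability table per domain.

    labeled_docs: list of (domain, list_of_words) pairs (hypothetical input format).
    """
    doc_counts = Counter()               # number of documents per domain
    word_counts = defaultdict(Counter)   # word frequencies per domain
    vocab = set()
    for domain, words in labeled_docs:
        doc_counts[domain] += 1
        word_counts[domain].update(words)
        vocab.update(words)

    total_docs = sum(doc_counts.values())
    priors = {d: n / total_docs for d, n in doc_counts.items()}

    word_probs = {}
    for d in doc_counts:
        total = sum(word_counts[d].values())
        # Laplace (add-one) smoothing is an assumption; it avoids zero probabilities.
        word_probs[d] = {
            w: (word_counts[d][w] + 1) / (total + len(vocab)) for w in vocab
        }
    return priors, word_probs, vocab


def classify(words, priors, word_probs, vocab):
    """Return the domain with the highest log posterior for the given words."""
    best_domain, best_score = None, -math.inf
    for d, prior in priors.items():
        score = math.log(prior)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log(word_probs[d][w])
        if score > best_score:
            best_domain, best_score = d, score
    return best_domain


# Toy, made-up training data for illustration only.
TRAINING = [
    ("sports", ["football", "match", "goal"]),
    ("sports", ["match", "team"]),
    ("technology", ["phone", "chip"]),
    ("technology", ["chip", "software", "phone"]),
]
priors, word_probs, vocab = train(TRAINING)
print(classify(["goal", "team"], priors, word_probs, vocab))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents contain more than a handful of words.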


Abstract

The invention relates to a domain word acquisition method and system. The method comprises the following steps: acquiring network data and classifying it by domain; extracting first candidate domain words from the classified network data; performing semantic integrity processing on the first candidate domain words to obtain second candidate domain words; and calculating the domain correlation of each second candidate domain word and comparing it with a correlation threshold to obtain the domain words. Because candidates are extracted from network data, processed for semantic integrity, and then filtered by domain correlation against a threshold, a large number of domain words can be obtained more accurately.
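The final filtering step can be sketched as follows. This excerpt does not give the formula for domain correlation, so the score used here (a term's document rate in the target domain divided by the sum of its document rates across all domains) is an assumed stand-in, and the names `domain_correlation` and `select_domain_words`, the threshold value, and the counts are all hypothetical.

```python
def domain_correlation(term, domain, doc_freq, doc_totals):
    """Assumed score in [0, 1]: share of the term's per-domain document rate
    that falls in the target domain. Not the formula from the patent."""
    rates = {d: doc_freq[d].get(term, 0) / doc_totals[d] for d in doc_totals}
    total = sum(rates.values())
    return rates[domain] / total if total else 0.0


def select_domain_words(candidates, domain, doc_freq, doc_totals, threshold):
    """Keep candidates whose correlation with the domain meets the threshold."""
    return [
        t for t in candidates
        if domain_correlation(t, domain, doc_freq, doc_totals) >= threshold
    ]


# Made-up counts: number of documents in each domain that contain each term.
doc_freq = {
    "automobile": {"Audi A6": 40, "price": 50},
    "sports": {"Audi A6": 1, "price": 45},
}
doc_totals = {"automobile": 100, "sports": 100}

print(select_domain_words(["Audi A6", "price"], "automobile",
                          doc_freq, doc_totals, threshold=0.8))
```

With these illustrative counts, a term concentrated in one domain scores close to 1, while a term spread evenly across domains scores near 1 divided by the number of domains, so the threshold separates the two.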

Description

Technical field

[0001] The invention relates to the field of Internet information processing, in particular to a method and system for acquiring domain words.

Background technique

[0002] Domain words are terms that appear often in a specific domain but rarely in unrelated domains. For example, "FAW-Volkswagen" is a domain word in the automobile domain, "low carbon" is a domain word in the environmental protection domain, and "Haier Refrigerator" is a domain word in the technology and home appliances domain.

[0003] With the rapid development of the Internet, the problem of information overload has become increasingly prominent. The quality of Internet information is uneven, and much high-quality information that is useful to users is buried among spam. How to extract useful information from massive Internet data more effectively and accurately has become a top priority for network information query. Various traditional vertical sear...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 刘怀军, 赵琳
Owner: TENCENT TECH (SHENZHEN) CO LTD