
Domain term obtaining method and system

A domain term acquisition method and system, applied in the field of Internet information processing, that address two problems: the limited number of domain terms acquired and the absence of detailed entries such as specific models.

Active Publication Date: 2012-08-22
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

However, this acquisition method relies on manual sorting, so the number of domain terms acquired is limited. For example, car brands often include only the main brand, such as "Audi", without more detailed models, such as "Audi A6". The method is also heavily affected by the HTML format of the website: if the site is revised and the HTML format changes, the acquisition method must be modified again.



Examples


Embodiment Construction

[0062] The technical solution will be described in detail below in conjunction with specific embodiments and accompanying drawings.

[0063] As shown in Figure 1, in one embodiment, a domain term acquisition method comprises the following steps:

[0064] Step S10: acquire network data and classify the network data by domain.

[0065] After the network data is obtained, any data that is not already classified by domain must be classified automatically, for example into sports, technology, home appliances, entertainment, and so on. In this embodiment, the system uses the Naive Bayes algorithm for automatic classification. Naive Bayes is a classification algorithm based on probability and statistics. As shown in Figure 2, the specific steps of using this algorithm for domain classification include:

[0066] Step S100: a domain word probability table is established for each domain, and a prior probability of eac...
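The classification step can be illustrated with a minimal Naive Bayes sketch. It is not the patent's implementation: the function names `train` and `classify`, the whitespace-free token lists, and the use of Laplace smoothing are all assumptions made for illustration. The sketch builds a per-domain word count table and a prior probability per domain, then assigns a document to the domain with the highest log-probability.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Build per-domain word counts and prior probabilities.

    labeled_docs: iterable of (domain, list_of_tokens) pairs (hypothetical format).
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for domain, tokens in labeled_docs:
        doc_counts[domain] += 1
        word_counts[domain].update(tokens)
        vocab.update(tokens)
    total_docs = sum(doc_counts.values())
    # Prior probability of each domain: its share of the training documents.
    priors = {d: n / total_docs for d, n in doc_counts.items()}
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    """Return the domain with the highest posterior log-probability."""
    best_domain, best_score = None, float("-inf")
    for domain, prior in priors.items():
        total = sum(word_counts[domain].values())
        score = math.log(prior)
        for t in tokens:
            # Laplace (add-one) smoothing is an assumption, used so that
            # unseen words do not zero out the whole product.
            p = (word_counts[domain][t] + 1) / (total + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain
```

In practice the token lists would come from a word segmenter applied to the crawled pages; that step is outside this sketch.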


Abstract

The invention relates to a domain term obtaining method and system. The method comprises the following steps: obtaining network data and classifying the network data by domain; extracting first candidate domain terms from the classified network data; applying semantic integrity processing to the first candidate domain terms to obtain second candidate domain terms; and calculating the domain correlation of each second candidate domain term and comparing it with a correlation threshold to obtain the domain terms. Because candidates are extracted from network data, checked for semantic integrity, and then filtered by domain correlation against a threshold, a large quantity of domain terms can be obtained more accurately.
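The final filtering step can be sketched as follows. The abstract does not say how domain correlation is computed, so the measure used here (the share of a term's occurrences that fall in the target domain) is an assumption, as are the function names `domain_correlation` and `select_domain_terms`, the default threshold of 0.8, and the choice to keep terms whose score is at or above the threshold.

```python
from collections import Counter


def domain_correlation(term, domain, term_counts):
    """Share of a term's occurrences that fall in `domain`.

    term_counts: dict mapping domain name -> Counter of term frequencies.
    This measure is an illustrative assumption, not the patent's formula.
    """
    in_domain = term_counts.get(domain, Counter())[term]
    overall = sum(counts[term] for counts in term_counts.values())
    return in_domain / overall if overall else 0.0


def select_domain_terms(candidates, domain, term_counts, threshold=0.8):
    """Keep the candidates whose correlation reaches the threshold."""
    return [
        t for t in candidates
        if domain_correlation(t, domain, term_counts) >= threshold
    ]
```

Under this measure a term such as "Audi A6" that occurs almost only in automobile pages scores close to 1, while a generic word spread evenly across domains scores low and is dropped.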

Description

Technical field

[0001] The invention relates to the field of Internet information processing, in particular to a method and system for acquiring domain terms.

Background

[0002] Domain terms are terms that appear often in a specific domain but rarely in other, unrelated domains. For example, "FAW-Volkswagen" is a term in the automobile domain, "low carbon" is a term in the environmental protection domain, and "Haier Refrigerator" is a term in the technology and home appliances domain.

[0003] With the rapid development of the Internet, the problem of information overload has become increasingly prominent. The quality of Internet information is uneven, and much high-quality information that is useful to users is mixed in with various kinds of spam. Extracting useful information from massive Internet data more effectively and accurately has become a top priority for network information query. Various traditional vertical sear...


Application Information

IPC(8): G06F17/30
Inventors: 刘怀军, 赵琳
Owner: TENCENT TECH (SHENZHEN) CO LTD