Identification method for webpage information related region

A method for identifying the geographic region associated with web page information, applicable to Internet information monitoring, mobile search, and information early warning. It addresses the low accuracy of region identification for web page information and improves both the accuracy and the granularity of identification.

Active Publication Date: 2014-06-11
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

Although these methods can identify the geographic regions related to web page information to some extent, their accuracy is often relatively low, for several reasons: the same place name can occur at different administrative levels; the same noun can have different meanings (for example, a place name or a personal name); information is often described by relative location (for example, "south of Beijing"); text contains many referring expressions; a single item may mention several different place names, especially place names of different categories; and text contains abbreviations and non-standard language. In addition, current natural language processing techniques are themselves of relatively low accuracy.



Examples


Embodiment Construction

[0035] A specific embodiment of the present invention is shown in Figure 1. Each step is described in detail below.

[0036] 1. Establish regional information ontology

[0037] Considering the characteristics of food safety incidents and the needs of subsequent event information extraction and tracking analysis, the regional information ontology for food safety incidents is constructed mainly according to standardized administrative divisions. For example, regions can first be divided into five general categories: Asia, Europe, Africa, America, and Oceania. Each category can be subdivided again; for example, Asia can be divided into East Asia, West Asia, South Asia, North Asia, Central Asia, and Southeast Asia. This continues until a category can no longer be divided, at which point it is a bottom-level element (that is, an instance). In addition, for each instance in the ontology, telephone area codes, zip codes, abbreviations, places ...
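The hierarchy described in [0037] can be sketched as a simple tree. This is a minimal illustration, not the patent's implementation: the `Region` class, its method names, the `World` root node, and the attribute keys (`area_code`, `zip_code`, `abbreviation`) are all assumptions chosen for the example, and the sample values for Beijing are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Region:
    """One node of a hypothetical regional information ontology."""
    name: str
    # repr=False avoids infinite recursion between parent and children.
    parent: Optional["Region"] = field(default=None, repr=False)
    children: List["Region"] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)

    def add_child(self, name: str, **attributes: str) -> "Region":
        """Create a sub-region under this node and return it."""
        child = Region(name, parent=self, attributes=dict(attributes))
        self.children.append(child)
        return child

    def is_instance(self) -> bool:
        """A bottom-level element (instance) has no further subdivisions."""
        return not self.children

    def path(self) -> List[str]:
        """Names from the root of the ontology down to this node."""
        names: List[str] = []
        node: Optional["Region"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Build a small fragment: top-level category, subdivision, then an instance.
world = Region("World")  # assumed root; the source starts at the five categories
asia = world.add_child("Asia")
east_asia = asia.add_child("East Asia")
china = east_asia.add_child("China")
beijing = china.add_child(
    "Beijing", area_code="010", zip_code="100000", abbreviation="京"
)
```

Here `beijing.path()` walks back up to the root, which is the kind of lookup a later matching step would need in order to assign a page to a region at any level of the hierarchy.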



Abstract

The invention discloses a method for identifying the region related to web page information. The method includes: 1) establishing a regional information ontology; 2) extracting the metadata and text of crawled web page information and segmenting the extracted title and text into words; 3) analyzing the pronouns among the words that refer to places, judging whether each such pronoun refers back to a preceding geographical term and, if so, replacing the pronoun with that geographical term; 4) analyzing the non-standard place names among the words and replacing them with standard names; 5) analyzing regional information expressed as a relative position, based on the regional information ontology, to obtain an accurate place name; 6) judging the analyzed web page information against the regional information ontology and classifying the web page information into the successfully matched region. The method greatly improves the accuracy of identifying the region related to web page information.
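Steps 3, 4, and 6 of the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the lookup tables `KNOWN_PLACES`, `ALIASES`, and `PLACE_PRONOUNS` are hypothetical stand-ins for the ontology, input is assumed to be already segmented into words (step 2), a pronoun is resolved with a "most recent preceding place name" heuristic that the source does not specify, and step 5 (relative positions) is omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical lookup tables standing in for the regional information ontology.
KNOWN_PLACES: Set[str] = {"Beijing", "Shanghai"}
ALIASES: Dict[str, str] = {"Peking": "Beijing"}  # non-standard -> standard
PLACE_PRONOUNS: Set[str] = {"there"}


def resolve_place_pronouns(tokens: List[str]) -> List[str]:
    """Step 3 (assumed heuristic): replace a place pronoun with the most
    recent preceding place name, standard or non-standard."""
    resolved: List[str] = []
    last_place: Optional[str] = None
    for token in tokens:
        if token in KNOWN_PLACES or token in ALIASES:
            last_place = token
            resolved.append(token)
        elif token.lower() in PLACE_PRONOUNS and last_place is not None:
            resolved.append(last_place)
        else:
            resolved.append(token)
    return resolved


def normalize_place_names(tokens: List[str]) -> List[str]:
    """Step 4: replace non-standard place names with their standard form."""
    return [ALIASES.get(token, token) for token in tokens]


def match_regions(tokens: List[str]) -> Dict[str, int]:
    """Step 6 (simplified): count mentions of each known place."""
    return dict(Counter(t for t in tokens if t in KNOWN_PLACES))


def identify_regions(tokens: List[str]) -> Dict[str, int]:
    """Run steps 3, 4, and 6 in the order given in the abstract."""
    tokens = resolve_place_pronouns(tokens)
    tokens = normalize_place_names(tokens)
    return match_regions(tokens)


sample = ["Tainted", "milk", "found", "in", "Peking", ";",
          "sales", "there", "halted"]
result = identify_regions(sample)
```

For the sample sentence, the pronoun and the non-standard name both end up counted under the standard name "Beijing". A real system would also need to tell place uses of a word like "there" apart from non-place uses, which this sketch does not attempt.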

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a method for judging and determining the region associated with information in a web page, which is mainly applied in the fields of Internet information monitoring, information early warning, mobile search and the like.

Background technique

[0002] In recent years, food safety incidents such as clenbuterol, dyed steamed buns, plasticizers, and poisonous cucumbers have occurred frequently, which have not only caused extremely bad social impacts, but also brought a lot of economic losses. In order to avoid or minimize the harm caused by these food safety incidents, event-based risk early warning technology has begun to receive great attention. For event-based risk early warning, it is necessary to discover information about these events in advance.

[0003] With the rapid development of the Internet, the number of Internet users is increasing, and the Internet ha...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951, G06F40/30
Inventor: 杨风雷, 黎建辉, 崔建业, 李晓东, 周园春, 归文胜, 汪海燕, 杨俊峰
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI