Method for extracting relationship among geographic entities contained in internet text

Classified under geographic entities and Internet technology and applied in the field of extracting geographic entity relations contained in Internet texts, the invention addresses problems such as heavy consumption of manpower and material resources, complicated text, and limited applicability. It achieves the effects of reducing labor costs, enriching retrieval methods, and improving operating efficiency.

Active Publication Date: 2017-09-19
INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS
Cites: 3; Cited by: 23

AI Technical Summary

Problems solved by technology

The pattern matching method requires in-depth analysis of a relation corpus and manual extraction and organization of relation patterns. Although it is highly accurate, it consumes a great deal of manpower and material resources and is unsuitable for large-scale relation extraction from text. A second approach improves speed and accuracy, but it requires a large manually annotated corpus: open text is heterogeneous (long texts, short texts, Internet slang, etc.), which makes corpus construction harder, and a limited set of manually defined relation types struggles to keep pace with the rapid growth and change of text. The frequency statistics method requires the words expressing a relationship to occur frequently, so it is hard to apply to sparsely distributed geographic entity relationship instances.




Detailed Description of the Embodiments

[0036] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0037] As shown in Figures 1, 2 and 3, the present invention comprises the following steps:

[0038] 1. Data preprocessing: The steps of data preprocessing include: web page crawling, text extraction, sentence segmentation, Chinese word segmentation and part-of-speech tagging, geographic entity recognition, and context construction;

[0039] Web page crawling: obtain web texts containing geographic entities, from which the spatial or semantic relationship between two geographic entities is to be extracted. The entries of an existing place-name database are used as the geographic entities; each entity is used in turn as a keyword to query a search engine for relevant HTML web pages, and the crawled page content serves as the raw corpus for geographic entity relationship extraction.
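The crawling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the search endpoint `SEARCH_ENDPOINT` and the helpers `build_query_url`, `LinkCollector` and `crawl_corpus` are hypothetical, and the HTTP client is injected as a `fetch` callable so that any downloader (or a test stub) can be plugged in.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlencode

# Hypothetical search endpoint: the patent does not name the search engine used.
SEARCH_ENDPOINT = "https://search.example.com/search"


class LinkCollector(HTMLParser):
    """Collect absolute hyperlinks from a search-result page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Keep only absolute links to keep the sketch simple.
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)


def build_query_url(entity: str) -> str:
    """Use one place name from the place-name database as the search keyword."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': entity})}"


def crawl_corpus(
    entities: Iterable[str],
    fetch: Callable[[str], str],
    max_pages_per_entity: int = 10,
) -> List[Tuple[str, str, str]]:
    """Query each place name in turn and crawl the HTML of the result pages.

    Returns (entity, url, html) records that form the raw corpus.
    """
    corpus: List[Tuple[str, str, str]] = []
    seen_urls = set()
    for entity in dict.fromkeys(entities):  # drop duplicate names, keep order
        collector = LinkCollector()
        collector.feed(fetch(build_query_url(entity)))
        for url in collector.links[:max_pages_per_entity]:
            if url in seen_urls:
                continue  # a page returned for several entities is crawled once
            seen_urls.add(url)
            corpus.append((entity, url, fetch(url)))
    return corpus
```

Injecting `fetch` keeps the query logic separate from networking; a production crawler would also need rate limiting and `robots.txt` handling, which this sketch omits.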

[0040] Text extraction: Find th...
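The later preprocessing steps listed in [0038] (sentence segmentation, geographic entity recognition and context construction) can be sketched roughly as below. This is an assumption-laden illustration: entity recognition is reduced to gazetteer lookup, the regex tokenizer in `tokenize` is a hypothetical stand-in for Chinese word segmentation and part-of-speech tagging, and the context is taken to be the words between two entities in the same sentence.

```python
import re
from typing import List, Sequence, Tuple

# Sentence delimiters (Chinese and ASCII); an assumed, simplified set.
_SENTENCE_DELIMS = re.compile(r"[。！？!?；;.]")


def split_sentences(text: str) -> List[str]:
    """Sentence segmentation: split plain text at sentence-final punctuation."""
    return [s.strip() for s in _SENTENCE_DELIMS.split(text) if s.strip()]


def recognize_entities(sentence: str, gazetteer: Sequence[str]) -> List[Tuple[str, int]]:
    """Gazetteer lookup: return (place name, offset) for names found, in text order."""
    hits = [(name, sentence.find(name)) for name in gazetteer]
    return sorted((h for h in hits if h[1] != -1), key=lambda h: h[1])


def tokenize(fragment: str) -> List[str]:
    """Hypothetical stand-in for Chinese word segmentation and POS tagging."""
    return re.findall(r"\w+", fragment.lower())


def build_contexts(text: str, gazetteer: Sequence[str]) -> List[Tuple[str, str, List[str]]]:
    """Context construction: the words between each pair of entities in a sentence."""
    contexts = []
    for sentence in split_sentences(text):
        hits = recognize_entities(sentence, gazetteer)
        for i, (head, start) in enumerate(hits):
            for tail, tail_start in hits[i + 1:]:
                between = sentence[start + len(head):tail_start]
                contexts.append((head, tail, tokenize(between)))
    return contexts
```

For example, given the gazetteer `["geographic institute", "remote sensing institute"]`, the sentence "The remote sensing institute is located in the north of the geographic institute." yields one context whose words are `is located in the north of the`.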


Abstract

The invention discloses a method for extracting the relationships among geographic entities contained in Internet text. The method comprises the steps of data preprocessing, document vectorization, weight calculation, keyword extraction and relational tuple construction. Specifically, network text containing geographic entities is input, and data preprocessing, aimed at extracting the spatial or semantic relationships among the geographic entities, yields the plain text of each web page and a set of candidate keywords. The text is then vectorized with a word-level vector space model and a word-context matrix is established; a newly designed weight calculation method computes the weights with respect to the geographic entities; and the word with the maximum weight in each context vector is selected as the keyword, from which a relational tuple is constructed, completing the geographic entity relationship extraction. The method provides a semantics-based retrieval mode, moving beyond conventional search technology that depends on keywords alone. Without requiring large-scale tagged corpora or a geographic knowledge base, it can quickly extract words that describe geographic relationships, which improves operating efficiency and greatly reduces labor cost.
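The vectorization, weighting, keyword-selection and tuple-construction steps in the abstract can be illustrated with the sketch below. The abstract does not spell out its "novel" weight calculation, so this sketch plainly swaps in ordinary TF-IDF computed over contexts; the `STOPWORDS` set is a hypothetical substitute for the candidate-keyword filtering done during preprocessing, and all function names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Context = Tuple[str, str, List[str]]  # (head entity, tail entity, context words)

# Hypothetical candidate filter standing in for POS-based candidate selection.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def word_context_matrix(contexts: Sequence[Context]) -> Tuple[List[str], List[List[int]]]:
    """Word-level vector space model: rows are words, columns are contexts."""
    vocab = sorted({w for _, _, words in contexts for w in words})
    counts = [Counter(words) for _, _, words in contexts]
    return vocab, [[c[w] for c in counts] for w in vocab]


def tfidf_weights(matrix: List[List[int]]) -> List[List[float]]:
    """TF-IDF over contexts (a stand-in for the patent's own weighting scheme)."""
    n_contexts = len(matrix[0]) if matrix else 0
    weights = []
    for row in matrix:
        df = sum(1 for tf in row if tf > 0)  # number of contexts containing the word
        idf = math.log(n_contexts / df)
        weights.append([tf * idf for tf in row])
    return weights


def extract_tuples(contexts: Sequence[Context]) -> List[Tuple[str, str, str]]:
    """Pick the highest-weighted candidate word per context as the relation keyword."""
    vocab, matrix = word_context_matrix(contexts)
    weights = tfidf_weights(matrix)
    row_of = {w: i for i, w in enumerate(vocab)}
    tuples = []
    for j, (head, tail, words) in enumerate(contexts):
        candidates = [w for w in dict.fromkeys(words) if w not in STOPWORDS]
        if not candidates:
            continue
        # max() keeps the first of equally weighted words, i.e. the earliest in the context.
        keyword = max(candidates, key=lambda w: weights[row_of[w]][j])
        tuples.append((head, keyword, tail))
    return tuples
```

With TF-IDF, a word such as "located" that recurs across many contexts is down-weighted, so direction words unique to one context (e.g. "north", "southeast") tend to win. Whether the patent's own weighting behaves this way is not stated in this excerpt.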

Description

Technical field

[0001] The invention relates to a method for extracting information from Internet texts, in particular to a method for extracting the geographic entity relations contained in Internet texts.

Background art

[0002] The core of entity relationship extraction research is to automatically extract the connections between named entities from Internet text data and assemble them into a relationship network, so that users can conveniently query every aspect of an entity. For example, from "Ma Yun, chairman of the board of directors of Alibaba Group in mainland China", the entities "Alibaba" and "Ma Yun" and the employment relationship between them can be extracted. Geographic entity relationship extraction is a subset of entity relationship extraction whose purpose is to extract the relationships between geographic entities from Internet texts. For example: from "the remote sensing institute is located in the north of the geographic institute", the entities "geographical institute" and "remote ...
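The "relationship network" mentioned in [0002] can be pictured as an index from each entity to the relation tuples it takes part in. The class below is a hypothetical sketch of such a structure, not something the patent defines; it stores each directed (head, relation, tail) tuple under both entities so that all relations of an entity can be looked up at once.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Edge = Tuple[str, str, str]  # (relation, other entity, "out" or "in")


class RelationNetwork:
    """Hypothetical entity index over directed (head, relation, tail) tuples."""

    def __init__(self) -> None:
        self._edges: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        # Record the edge on both endpoints, remembering its direction.
        self._edges[head].append((relation, tail, "out"))
        self._edges[tail].append((relation, head, "in"))

    def relations_of(self, entity: str) -> List[Edge]:
        # .get() avoids creating empty entries for unknown entities.
        return list(self._edges.get(entity, []))
```

For instance, after `add("Alibaba", "employment", "Ma Yun")`, querying either "Alibaba" or "Ma Yun" returns the employment link, with the direction marked.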


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/288, G06F16/9537
Inventors: Lu Feng (陆锋), Yu Li (余丽), Zhang Hengcai (张恒才), Peng Peng (彭澎), Qiu Peiyuan (仇培元), Mou Naixia (牟乃夏)
Owner INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS