Web text-based acquiring and screening method of seismic macroscopic anomaly information

A method for acquiring and screening seismic anomaly information, applied in the field of text data mining, that addresses the lack of a general method for applying topic-based classifiers and the difficulty of implementing one.

Active Publication Date: 2015-06-03
CHINA AGRI UNIV

AI Technical Summary

Problems solved by technology

However, no general method has been proposed for applying topic-based classifiers, and such classifiers are difficult to implement.

Embodiment Construction

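[0118] The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the invention but do not limit its scope.

[0119] The present invention provides a web text-based method for acquiring and screening seismic macroscopic anomaly information. It crawls earthquake-themed web text and screens out the information related to seismic macroscopic anomalies.

[0120] Figure 1 is a flow chart of the web text-based method for acquiring and screening seismic macroscopic anomaly information. The implementation steps are as follows:

[0121] Step 1: information acquisition.

[0122] (1) Relevance judgment

[0123] Relevance judgment is the first stage of topic information acquisition. Its main task is to judge the topic relevance of the current web text. Figure 2 shows the flow chart of the method for calculating the topic relevance of page content. For post-list pages of Tieba and keyword-search pages of Weibo, the topic relevance of the page does not need to be calculated. The cosine threshold is set to 0.1 for general web pages, 0.3 for Tieba, and 0.1 for Weibo.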
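
The relevance check described in [0123] can be illustrated with a minimal Python sketch. It assumes that both the page and the topic descriptor group are represented as raw term-frequency vectors and that a page counts as relevant when its cosine value is at or above the threshold; the source states only the threshold values, so the weighting scheme, the comparison direction, and the names `THRESHOLDS`, `cosine`, and `is_relevant` are illustrative assumptions.

```python
import math
from collections import Counter

# Threshold values are taken from the text; the dictionary keys are hypothetical labels.
THRESHOLDS = {"web": 0.1, "tieba": 0.3, "weibo": 0.1}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


def is_relevant(page_terms, topic_terms, source):
    """Assumption: relevant means cosine >= the per-source threshold."""
    page_vec = Counter(page_terms)    # raw term frequencies (assumed weighting)
    topic_vec = Counter(topic_terms)
    return cosine(page_vec, topic_vec) >= THRESHOLDS[source]
```

A caller would tokenize the page text first (for Chinese text, with a word segmenter) and pass the resulting term lists to `is_relevant`.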

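[0124] (2) Link ranking

[0125] Link ranking is the second step of topic information acquisition. Its main task is to determine the priority crawling strategy of the topic crawler. Figure 3 shows the process for ranking the URL links within a page; this is where the topic-first crawling strategy of the topic crawler is realized. For general web pages, the cosine value of the page must be included as context relevance when the cosine value of a link is calculated; Tieba and Weibo pages do not need this.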
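
The link ranking in [0125] can be sketched as a priority queue over scored URLs. The source does not say how the page cosine is combined with the link's own cosine, so the weighted sum and the `context_weight` parameter below are assumptions, and `link_score` and `Frontier` are hypothetical names rather than Heritrix classes.

```python
import heapq


def link_score(anchor_sim, page_sim, source, context_weight=0.5):
    """Score a link from its own cosine value and, for general web pages,
    the cosine value of the containing page (assumed weighted sum)."""
    if source == "web":
        return (1 - context_weight) * anchor_sim + context_weight * page_sim
    return anchor_sim  # Tieba and Weibo links ignore page context


class Frontier:
    """Crawl frontier that always returns the highest-scoring URL first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, score):
        # heapq is a min-heap, so the score is negated
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Popping from the frontier yields the most topic-relevant link seen so far, which is one way to realize a topic-first crawl order.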

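[0126] (3) Information extraction

[0127] Information extraction is the third step of topic information acquisition. Its main task is to locate and extract specific seismic macroscopic anomaly information from topic-relevant web text pages. Figure 4 shows the information extraction algorithm flow. Because Tieba and Weibo pages have a fixed structure, their content can be extracted conveniently with regular expressions.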
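
The regular-expression extraction mentioned in [0127] might look like the following sketch. The `post-content` class name is a hypothetical placeholder, since the source does not give the actual Tieba or Weibo markup or the patterns used, and the sketch does not handle nested `div` elements.

```python
import re

# Hypothetical markup: the real Tieba and Weibo class names are not given in the source.
POST_PATTERN = re.compile(r'<div class="post-content">(.*?)</div>', re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_posts(html):
    """Return the plain text of every post body found in the page."""
    posts = []
    for raw in POST_PATTERN.findall(html):
        text = TAG_PATTERN.sub("", raw).strip()  # drop inline tags
        if text:
            posts.append(text)
    return posts
```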

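[0128] Through the above steps, the present invention obtains information related to seismic macroscopic anomalies from web text, and crawls web information using topic relevance judgment and a priority strategy. The method can extract seismic macroscopic anomaly topic information from general web pages, forums (Baidu Tieba), and social networks (Sina Weibo).

[0129] Step 2: information screening.

[0130] (1) Subjective sentence judgment.

[0131] Figure 5 shows the process for judging subjective sentences. A likelihood index is calculated according to Bayes' formula; when the likelihood index is greater than 1, the sentence is considered subjective.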
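
One way to read the likelihood index in [0131] is as the ratio of the posterior for "subjective" to the posterior for "objective" under a naive Bayes model. The sketch below follows that reading; the word-independence assumption, the add-one smoothing, and the class name `SubjectivityModel` are assumptions, not details from the source.

```python
import math
from collections import Counter


class SubjectivityModel:
    """Naive Bayes sketch trained on tokenized subjective and objective sentences."""

    def __init__(self, subjective_sents, objective_sents):
        self.subj = Counter(w for s in subjective_sents for w in s)
        self.obj = Counter(w for s in objective_sents for w in s)
        self.vocab = set(self.subj) | set(self.obj)
        total = len(subjective_sents) + len(objective_sents)
        self.prior_subj = len(subjective_sents) / total
        self.prior_obj = len(objective_sents) / total

    def _log_prob(self, word, counts):
        # Add-one (Laplace) smoothing, an assumption of this sketch
        return math.log((counts[word] + 1) / (sum(counts.values()) + len(self.vocab)))

    def likelihood_ratio(self, words):
        """P(subjective | words) / P(objective | words), computed in log space."""
        log_ratio = math.log(self.prior_subj) - math.log(self.prior_obj)
        for w in words:
            if w in self.vocab:  # unseen words carry no evidence
                log_ratio += self._log_prob(w, self.subj) - self._log_prob(w, self.obj)
        return math.exp(log_ratio)

    def is_subjective(self, words):
        # Decision rule from the text: index greater than 1 means subjective
        return self.likelihood_ratio(words) > 1
```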

[0132] (2) Text subjectivity judgment.

[0133] Figure 6 shows the process for judging text subjectivity. The threshold for subjectivity judgment is 0.5.
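
The source gives only the 0.5 threshold for [0133], not the quantity it is applied to. The sketch below assumes that quantity is the fraction of sentences judged subjective, and takes the sentence-level classifier as a parameter; the function names and the sentence-splitting rule are hypothetical.

```python
import re


def split_sentences(text):
    """Split on common Chinese and English sentence terminators."""
    parts = re.split(r"[。！？!?.]+", text)
    return [p.strip() for p in parts if p.strip()]


def subjectivity_score(text, is_subjective_sentence):
    """Fraction of sentences judged subjective (assumed definition)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if is_subjective_sentence(s))
    return hits / len(sentences)


def is_subjective_text(text, is_subjective_sentence, threshold=0.5):
    # 0.5 is the threshold stated in the text
    return subjectivity_score(text, is_subjective_sentence) > threshold
```

Whether texts above the threshold are kept or discarded is not stated in this excerpt, so the sketch only labels the text and leaves the filtering decision to the caller.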

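[0134] (3) Seismic macroscopic anomaly matching.

[0135] Figure 7 shows the flow of the seismic macroscopic anomaly matching method. In the web text that is topic-relevant and has been filtered by subjectivity, subject words (the things involved) and behavior words are matched to obtain the seismic macroscopic anomaly information.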
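
The matching of subject words and behavior words in [0135] can be sketched as a co-occurrence check. The word lists below are hypothetical examples, and requiring both kinds of word in the same sentence is an assumption; the source does not specify the matching window or the vocabulary.

```python
# Hypothetical vocabularies; the patent's actual word lists are not given in this excerpt.
SUBJECT_WORDS = {"fish", "dog", "well water", "snake"}
BEHAVIOR_WORDS = {"jump", "bark", "turn muddy", "leave burrow"}


def match_anomalies(sentences):
    """Return (sentence, subjects, behaviors) for each sentence that
    contains at least one subject word and at least one behavior word."""
    matches = []
    for sentence in sentences:
        lowered = sentence.lower()
        subjects = sorted(w for w in SUBJECT_WORDS if w in lowered)
        behaviors = sorted(w for w in BEHAVIOR_WORDS if w in lowered)
        if subjects and behaviors:
            matches.append((sentence, subjects, behaviors))
    return matches
```

Substring matching is used here for brevity; for Chinese text a word segmenter followed by set membership would avoid accidental partial matches.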

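[0136] This embodiment is based on the Heritrix framework and applies the seismic macroscopic anomaly topic descriptor group, customizing, for the three information sources of general web pages, Tieba, and social networks, ...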

Abstract

The invention belongs to the field of text data mining and provides a web text-based method for acquiring and screening seismic macroscopic anomaly information, used to collect and screen seismic macroscopic anomaly text from the Internet. Built on the Heritrix framework, the method uses a group of seismic macroscopic anomaly topic descriptors to customize a crawling strategy, covering topic relevance judgment, link ranking, and information extraction, for three information sources: general web pages, forums (Tieba), and social networks. The crawled topic-relevant pages are then screened in three stages: subjective sentence judgment, text subjectivity judgment, and seismic macroscopic anomaly matching. The method provides a scientific, efficient, and accurate technical means for collecting seismic macroscopic anomaly information online and greatly improves the efficiency of information acquisition.

Description

Technical field

[0001] The invention belongs to the field of text data mining and relates to a web text-based method for acquiring and screening seismic macroscopic anomaly information, used to crawl earthquake-themed web text and screen out information related to seismic macroscopic anomalies.

Background

[0002] With today's increasingly abundant means of communication, the public often reports the macroscopic earthquake anomalies they observe to earthquake authorities through the Internet. Likewise, earthquake authorities can use information technology to collect macroscopic earthquake anomaly information from the Internet to support their forecasting work. However, as information technology develops and people rely more on the Internet, the volume of information it carries keeps growing. How to obtain and screen out useful macroscopic seismic anomaly inform...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 李林方帅曹津张晓东赵明明王竹叶思菁姚晓闯朱德海
Owner: CHINA AGRI UNIV