Method for automatically extracting institution entity nouns from news pages

An automatic extraction technology for news pages, applied in the computer field, that addresses low accuracy and efficiency and the missing relationship between entity nouns and news information, and improves efficiency.

Pending Publication Date: 2022-04-12

中译语通科技(成都)有限公司


Problems solved by technology

Manual data processing requires a lot of human resources, while traditional machine learning methods extract entity names and words with relatively low accuracy and efficiency, and do not capture the relationship between entity names and news information.


Embodiment Construction

[0020] As shown in Figures 1, 2 and 3, a method for automatically extracting institution entity nouns from news pages comprises the following steps:

[0021] (1) Extract and parse the text content of the news page: use the requests and lxml libraries of Python 3 to extract the text content of the page's HTML DOM structure, or the page's common tags, and then parse it with etree.HTML and cssselect selectors;

[0022] (2) Use a Python 3 BloomFilter to de-duplicate the extracted text content, and delete spaces, titles and symbols to obtain the text content to be input;

[0023] (3) Feed the text content to be input into the labeling model for sequence processing, and obtain a predicted label for each word in the text content;

[0024] (4) Compare each word's predicted label with the existing entity noun database, and confirm the entity noun category corresponding to each predicted label;

[0025] (5) Correspo...
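Step (2) above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source names only "the BloomFilter of Python3", so the `BloomFilter` class, its bit-array size and hash count, and the `clean` and `deduplicate` helpers are assumptions written with the standard library alone. Title removal is not shown, because the source does not say how titles are identified.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array (sizes are assumptions)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def clean(text):
    """Delete whitespace, punctuation and symbols; keep letters, digits and CJK characters."""
    return re.sub(r"[\W_]+", "", text)


def deduplicate(paragraphs):
    """Return cleaned paragraphs in order, skipping any the filter reports as seen."""
    seen = BloomFilter()
    unique = []
    for paragraph in paragraphs:
        cleaned = clean(paragraph)
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        unique.append(cleaned)
    return unique
```

A Bloom filter never misses a true duplicate but can occasionally report an unseen item as seen, so a small share of distinct paragraphs may be dropped; a larger bit array lowers that rate. Deleting all spaces suits Chinese text, where words are not space-delimited.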


Abstract

The invention provides a method for automatically extracting institution entity nouns from a news page. The method comprises: extracting and parsing the text content of the news page; de-duplicating the extracted text and deleting spaces, titles and symbols to obtain the text content to be input; feeding that text into a labeling model for sequence processing to obtain a predicted label for each word; comparing each predicted label with an existing entity noun database to confirm the corresponding entity noun category; and storing each news page together with its entity nouns and their categories. The method extracts entity nouns from news pages quickly and marks news content, names and categories against one another to form structured data.
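The labeling, comparison and storage steps can be illustrated with a short sketch. Several things here are assumptions rather than details from the source: the labeling model is taken to emit BIO-style tags (`B-ORG`, `I-ORG`, `O`), the entity noun database is modeled as a plain dictionary from name to category, and `decode_entities`, `build_record`, `ENTITY_DATABASE` and the example URL are hypothetical names.

```python
def decode_entities(tokens, labels):
    """Merge B-/I- tagged tokens into (entity_text, tag_type) spans."""
    entities, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity begins; flush any span still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current and label[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent tag closes the open span.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


# Hypothetical entity noun database: entity name -> category.
ENTITY_DATABASE = {
    "中译语通": "company",
    "联合国": "international organization",
}


def build_record(url, tokens, labels, database=ENTITY_DATABASE):
    """Keep decoded entities found in the database and attach them to the page."""
    entities = []
    for name, _tag_type in decode_entities(tokens, labels):
        category = database.get(name)
        if category is not None:
            entities.append({"name": name, "category": category})
    return {"url": url, "entities": entities}
```

For example, `build_record("https://example.com/news/1", list("中译语通在联合国发言"), ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "B-ORG", "I-ORG", "I-ORG", "O", "O"])` yields one record that links the page to both entity names and their categories, which is the kind of structured data the abstract describes.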

Description

Technical field

[0001] The invention belongs to the field of computers, and in particular relates to a method for automatically extracting institution entity nouns from news pages.

Background

[0002] With the development of Internet technology, a large volume of news information appears in daily life, and processing Internet news information has become a crucial task in many industries. Entity names and their relationships extracted from news content can be used in abstracts or as keywords to ease retrieval and screening of news information. Manual data processing requires a lot of human resources, while traditional machine learning methods extract entity names and words with low accuracy and efficiency, and do not capture the relationship between entity names and news information.

Contents of the invention

[0003] The present invention provides a method for automatically extracting entity nouns from news pages, which can qu...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/958, G06F16/335, G06F40/205, G06F40/295, G06N3/04, G06N3/08

Inventor: 夏朝高华伟

Owner: 中译语通科技(成都)有限公司
