Method for automatically extracting institutional entity nouns from news pages
A technique for automatic extraction from news pages, applied in the computer field, that addresses problems such as low precision and efficiency and the lack of links between entity nouns and news items, thereby improving efficiency.
Detailed Description of Embodiments
[0020] As shown in Figures 1, 2, and 3, a method for automatically extracting institutional entity nouns from news pages comprises the following steps:
[0021] (1) Extract and parse the text content of the news page: use the requests and lxml libraries of Python 3 to extract the text content of the page's HTML DOM structure, or the page's common tags, and then parse it with etree.HTML and cssselect selectors;
[0022] (2) Use a Python 3 Bloom filter (BloomFilter) to de-duplicate the extracted text content, and delete spaces, titles, and symbols to obtain the text content to be input;
[0023] (3) Feed the text content to be input into the labeling model for sequence labeling, and obtain a predicted label for each word in the text content;
[0024] (4) Compare each word's predicted label with the existing entity noun database, and determine the entity noun category corresponding to each predicted label;
[0025] (5) Correspo...
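Steps (1), (2), and (4) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent. The function and class names (extract_paragraphs, BloomFilter, clean, deduplicate, match_entities) are hypothetical; the Bloom filter is a small hand-written stand-in for whatever BloomFilter package the method uses; the parser uses an XPath query rather than cssselect; title removal is not shown; and the B-ORG / I-ORG label scheme and the dictionary-style entity database are assumptions, since the text does not specify the label format. The labeling model of step (3) is not implemented and its output is passed in as a plain list.

```python
import hashlib
import re


def extract_paragraphs(html):
    """Step (1): return the text of every <p> element in an HTML string."""
    # lxml is a third-party package; it is imported here so the rest of
    # the sketch can run without it installed.
    from lxml import etree

    tree = etree.HTML(html)
    paragraphs = []
    for node in tree.xpath("//p"):
        text = "".join(node.itertext()).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


class BloomFilter:
    """Tiny Bloom filter; the sizes are arbitrary, illustrative choices."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def clean(text):
    """Step (2): delete whitespace and symbols, keeping word characters."""
    return re.sub(r"[\W_]+", "", text)


def deduplicate(paragraphs):
    """Step (2): keep the first occurrence of each cleaned paragraph."""
    seen = BloomFilter()
    unique = []
    for paragraph in paragraphs:
        cleaned = clean(paragraph)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


def match_entities(tokens, labels, entity_db):
    """Step (4): join B-ORG/I-ORG spans and look each one up in entity_db.

    Returns (name, category) pairs; category is None when the name is
    not in the database. The label scheme is an assumption.
    """
    entities, current = [], []
    # A trailing "O" sentinel flushes a span that ends the sequence.
    for token, label in list(zip(tokens, labels)) + [("", "O")]:
        if label == "I-ORG" and current:
            current.append(token)
            continue
        if current:
            name = "".join(current)
            entities.append((name, entity_db.get(name)))
            current = []
        if label == "B-ORG":
            current = [token]
    return entities
```

A Bloom filter can report a false positive, so in rare cases this sketch would drop a paragraph that is not actually a duplicate; a larger bit array lowers that risk. In use, the output of deduplicate(extract_paragraphs(html)) would be passed to the labeling model, and the model's per-token labels would then go to match_entities together with the entity noun database.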


