Text keyword automatic extraction method and device based on co-occurrence language network

An automatic keyword extraction technology for network text, applied in the field of keyword extraction. It addresses the tendency of statistical methods to ignore low-frequency keywords and the weak generalization ability of language analysis methods, and achieves automatic keyword extraction.

Inactive Publication Date: 2020-09-18
SICHUAN JIUZHOU ELECTRIC GROUP

AI Technical Summary

Problems solved by technology

[0011] This method addresses the shortcoming of supervised machine learning, which requires a large amount of manually labeled data, overcomes the weak generalization ability of language analysis methods, and avoids the tendency of statistical methods to ignore low-frequency but very important keywords. By using a language network model, the method automatically extracts keywords from network text without relying on dictionaries or training samples.



Examples


Embodiment Construction

[0063] Before any embodiment of the invention is described in detail, it is to be understood that the invention is not limited in application to the details of construction shown in the following description or in the accompanying drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative improvements belong to the protection scope of the present invention.

[0064] The text keyword automatic extraction method based on a co-occurrence language network, as shown in Figure 1, includes the following steps:

[0065] S1: Preprocessing the web page:

[0066] In a focused crawler, in order to improve the accuracy of judging the relevance of webpage content to topics, webpage text preprocessing needs to perform operations such as webpage cleaning, word segmentation, and feature extra...
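The webpage cleaning and word segmentation operations named above can be illustrated with a minimal sketch. It is not the patent's implementation: the class `_TextExtractor`, the functions `clean_html` and `tokenize`, and the `STOPWORDS` set are hypothetical names introduced here, and the regular-expression tokenizer assumes whitespace-delimited text. Chinese web text would need a dedicated word segmenter instead.

```python
import re
from html.parser import HTMLParser

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html):
    """Webpage cleaning: strip markup and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def tokenize(text):
    """Word segmentation (assumed: lowercase alphanumeric runs) with stopword removal."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


page = (
    "<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
    "<body><h1>Keyword Extraction</h1><p>The network of words.</p></body></html>"
)
tokens = tokenize(clean_html(page))
```

The token list produced here is the kind of input a later graph-construction step would consume.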



Abstract

The invention discloses a text keyword automatic extraction method and device based on a co-occurrence language network. The method overcomes the need of supervised machine learning for a large amount of manually annotated data, overcomes the weak generalization ability of language analysis methods, and avoids the tendency of statistical methods to ignore low-frequency but very important keywords. The method comprises the steps of preprocessing a webpage, constructing a language network graph model, jointly extracting candidate keyword features, comprehensively sorting the candidate keywords, and outputting the keywords. Through web text preprocessing, co-occurrence language network model construction, joint keyword feature extraction, and candidate keyword sorting optimization, the extracted keywords have good readability, coherence, and relevance. The method can be widely applied in fields such as natural language processing, information retrieval, text mining, sentiment analysis, and multi-modal human-computer interaction.
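The pipeline in the abstract (network construction, joint feature extraction, comprehensive sorting) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented algorithm: the sliding-window size `window`, the choice of word frequency and node degree as the two features, the mixing weight `alpha`, and the function names `build_cooccurrence_graph` and `rank_keywords` are all hypothetical, since the visible text does not specify them.

```python
from collections import Counter, defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph: the weight of an edge counts how often two
    distinct words appear within `window` positions of each other."""
    graph = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def rank_keywords(tokens, window=2, alpha=0.5, top_k=5):
    """Score each word by a weighted mix of normalized frequency and
    normalized degree in the co-occurrence graph (assumed features)."""
    if not tokens:
        return []
    graph = build_cooccurrence_graph(tokens, window)
    freq = Counter(tokens)
    degree = {w: len(graph.get(w, ())) for w in freq}
    max_freq = max(freq.values())
    max_degree = max(degree.values()) or 1  # avoid division by zero
    scores = {
        w: alpha * freq[w] / max_freq + (1 - alpha) * degree[w] / max_degree
        for w in freq
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = ["keyword", "extraction", "network", "keyword",
          "graph", "network", "keyword"]
top = rank_keywords(tokens, top_k=2)
```

Because degree is one of the features, a word that occurs rarely but links to many distinct neighbours can still score well, which is one plausible way a graph-based approach can surface low-frequency keywords that a pure frequency count would miss.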

Description

Technical Field

[0001] The invention relates to the field of keyword extraction, in particular to a text keyword automatic extraction method and medium based on a co-occurrence language network.

Background

[0002] With the rapid development of network technology and the advent of the big data era, a large amount of network text data has been generated in cyberspace. In the analysis of big text data, keyword extraction is a basic task of practical importance. A keyword is the smallest semantic unit. Keyword extraction, also known as keyword tagging, refers to extracting words or phrases related to the subject of the text from one or more texts. In the early days, because full-text search was not supported, authors were required to set keywords in their papers manually so that the papers could be searched by keyword. The traditional way of manually labeling keywords can no longer cope effectively with today's big text data. Therefore,...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/242, G06F40/216, G06F40/211, G06F40/253, G06F16/35, G06F16/951, G06F16/901, G06N20/00
CPC: G06F40/289, G06F40/242, G06F40/216, G06F40/211, G06F40/253, G06F16/355, G06F16/951, G06F16/9024, G06N20/00
Inventor: 刘斌, 王维, 赵火军, 聂常赟
Owner SICHUAN JIUZHOU ELECTRIC GROUP