
New keyword extraction technology

A keyword extraction technology that addresses the low accuracy of existing keyword extraction methods, achieving high accuracy and considerable practical value.

Status: Inactive. Publication date: 2017-08-25
SICHUAN YONGLIAN INFORMATION TECH CO LTD
Cites 3 documents; cited by 5.

AI Technical Summary

Problems solved by technology

[0003] To find non-high-frequency words that contribute strongly to the topic of multi-topic documents and use them as keywords, to extract topic words from documents automatically, and to overcome the low accuracy of commonly used keyword extraction methods, the present invention provides a new keyword extraction technology.

Detailed Description of the Embodiments

[0021] To find non-high-frequency words that contribute strongly to the topic of multi-topic documents and use them as keywords, to extract topic words from documents automatically, and to overcome the low accuracy of commonly used keyword extraction methods, the present invention is described in detail with reference to Figures 1 to 5. The specific implementation steps are as follows:

[0022] Step 1: Segment the text into words using Chinese word segmentation. The segmentation procedure is as follows:

[0023] Step 1.1: Using the "word segmentation dictionary", scan the Chinese character string to be segmented from start to end, look up candidate words in the system dictionary, and mark each dictionary word as it is encountered. If there is no match in the dictionary, simply split...
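The dictionary lookup in Step 1.1 can be illustrated with forward maximum matching, a common dictionary-based segmentation scheme. This is a minimal sketch under assumptions: the patent does not say it uses forward maximum matching, the maximum word length `max_len` and the toy dictionary are hypothetical, and because the fallback rule above is truncated, unmatched characters are simply emitted here as single-character tokens.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedy forward maximum matching against a dictionary.

    Assumption: a character with no dictionary match is emitted as a
    single-character token (the original fallback rule is truncated).
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as the last resort.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


toy_dictionary = {"关键词", "关键", "提取", "技术"}
print(forward_max_match("关键词提取技术", toy_dictionary))  # ['关键词', '提取', '技术']
```

Because longer candidates are tried first, the longer entry "关键词" wins over its prefix "关键". Production segmenters typically add statistical disambiguation on top of pure dictionary matching.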

Abstract

The invention provides a new keyword extraction technology. Word position weights and part-of-speech (word class) weights are determined during Chinese word segmentation preprocessing. The relatedness between pairs of words is calculated with reference to the core word, i.e., the word that contributes most to the text. A multi-topic network model is then constructed, an objective function is built to extract connecting words, and a cross function fuses these connecting words into the multi-topic network model to obtain a new model graph, from which the top-ranked words are extracted as the text's keywords. The technology is highly accurate and of considerable practical value: it precisely calculates each word's contribution to the main idea of the text, takes the multi-topic nature of documents into account, distinguishes different features, and provides a sound theoretical basis for subsequent text similarity analysis and text clustering.
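The first stages described in the abstract (weighting words by position and part of speech, choosing the core word, and relating other words to it) can be sketched roughly as follows. This is an illustrative simplification, not the patent's method: the weight tables are hypothetical, Jaccard overlap of the sentences containing two words stands in for the unspecified relatedness measure, the final ranking rule is an assumption, and the multi-topic network, objective function, and cross-function fusion are not modeled.

```python
from collections import defaultdict

# Hypothetical weights; the patent's actual values are not given in this excerpt.
POSITION_WEIGHT = {"title": 3.0, "first_sentence": 2.0, "body": 1.0}
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "other": 0.2}


def contribution_scores(sentences):
    """sentences: list of (position, [(word, pos), ...]) tuples.

    Assumed rule: a word's contribution is the sum, over its occurrences,
    of position weight * part-of-speech weight.
    """
    scores = defaultdict(float)
    for position, tagged in sentences:
        for word, pos in tagged:
            scores[word] += POSITION_WEIGHT[position] * POS_WEIGHT.get(pos, POS_WEIGHT["other"])
    return dict(scores)


def relatedness(sentences, a, b):
    """Stand-in measure: Jaccard overlap of the sentence sets containing a and b."""
    in_a = {i for i, (_, tagged) in enumerate(sentences) if any(w == a for w, _ in tagged)}
    in_b = {i for i, (_, tagged) in enumerate(sentences) if any(w == b for w, _ in tagged)}
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 0.0


def extract_keywords(sentences, top_k=3):
    """Return the core word and the top_k words under an assumed ranking rule."""
    scores = contribution_scores(sentences)
    core = max(scores, key=scores.get)  # core word: highest contribution
    # Assumed rule: boost each word by its relatedness to the core word.
    final = {w: s * (1.0 + relatedness(sentences, core, w)) for w, s in scores.items()}
    ranked = sorted(final, key=lambda w: (-final[w], w))
    return core, ranked[:top_k]


example = [
    ("title", [("keyword", "noun"), ("extraction", "noun")]),
    ("first_sentence", [("keyword", "noun"), ("network", "noun"), ("build", "verb")]),
    ("body", [("network", "noun"), ("topic", "noun"), ("the", "other")]),
]
print(extract_keywords(example))  # ('keyword', ['keyword', 'extraction', 'network'])
```

In this toy input, "extraction" and "network" both have a contribution of 3.0, but "extraction" ranks higher because the sentences it appears in overlap more with those of the core word "keyword".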

Description

Technical field
[0001] The invention relates to the technical field of semantic networks, in particular to a new keyword extraction technology.
Background
[0002] Keywords are a set of words that describe the subject of a text and serve as a brief summary of it. From its keywords, users can quickly get a rough idea of a document's content, so document keywords help users find the documents they need, or related ones, within large document collections. Apart from academic papers, however, most documents carry no keywords, especially the many web pages on the Internet. Manual keyword extraction by language experts is highly accurate, but extracting keywords manually from massive document collections is cumbersome and infeasible. Commonly used keyword extraction methods include term frequency-inverse document frequency (TF-IDF), information gain and oth...
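For comparison, the TF-IDF baseline named in the background can be sketched in a few lines using the common weighting tf(w, d) · log(N / df(w)); variants differ in smoothing and normalization, and the function name and toy corpus here are illustrative. Because the score grows with term frequency, topic-relevant words that are not repeated often tend to rank low, which is the weakness the invention sets out to address.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by tf * log(N / df).

    docs is a list of already-segmented documents (lists of tokens).
    """
    tokens = docs[doc_index]
    if not tokens:
        return []
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(tokens)
    scores = {w: (c / len(tokens)) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


corpus = [
    ["keyword", "extraction", "keyword", "text"],
    ["text", "clustering", "text"],
    ["text", "similarity", "analysis"],
]
print(tfidf_keywords(corpus, 0))  # ['keyword', 'extraction', 'text']
```

Here "text" occurs in every document, so its IDF, and therefore its score, is zero.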

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27
CPC: G06F40/289
Inventor: 金平艳
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD