Topic content aggregation analysis method based on TF-IDF and domain dictionary

A TF-IDF-based analysis method applied in the field of intelligence information processing. It addresses problems such as the complexity and uncertainty of aggregated information, and is easy to maintain and extend while guaranteeing accuracy.

Status: Inactive. Publication date: 2019-08-09
AGRI INFORMATION INST OF CAS
Cites 5; cited by 2

AI Technical Summary

Problems solved by technology

[0005] In view of the shortcomings of existing algorithms and the uncertainty and complexity of aggregated information, the purpose of this invention is to aggregate the information resources of a specific topic automatically and efficiently. To this end, it proposes a topic content aggregation analysis method based on TF-IDF and domain dictionaries.

Detailed Description of Embodiments

[0074] The present invention is described in detail below, together with the technical problems its technical solutions solve and the beneficial effects they achieve. The described examples are intended only to aid understanding of the invention and do not limit it in any way.

[0075] The present invention proposes a topic content aggregation analysis method based on TF-IDF and a domain dictionary. Its flow chart is shown in Figure 1.

[0076] Step S1: Obtain the specified domain and one initial demand word for topic aggregation, and select the domain dictionary according to the initial demand word.

[0077] The domain dictionary is built on the scientific thesaurus of the designated domain, which contains the domain's descriptors, its non-descriptors, and the semantic relations between terms.
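Paragraph [0077] describes the dictionary only in terms of descriptors, non-descriptors and semantic relations. The Python sketch below shows one way such a thesaurus could back Step S1 and a dictionary-based retrieval query. The names `DomainDictionary`, `entry_terms`, `related`, `select_dictionary` and `build_query` are hypothetical, and OR-ing a descriptor with its non-descriptors is an assumed query strategy, not one stated in this excerpt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainDictionary:
    """Hypothetical in-memory model of a domain thesaurus (not the patent's own schema)."""

    domain: str
    # descriptor -> non-descriptors (entry terms) replaced by that descriptor;
    # a descriptor with no entry terms is listed with an empty set.
    entry_terms: dict[str, set[str]] = field(default_factory=dict)
    # descriptor -> semantically related descriptors (broader, narrower or related terms)
    related: dict[str, set[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reverse index: every known term -> its preferred descriptor.
        self._preferred = {d: d for d in self.entry_terms}
        for descriptor, entries in self.entry_terms.items():
            for entry in entries:
                self._preferred[entry] = descriptor

    def normalize(self, term: str) -> str | None:
        """Return the preferred descriptor for a term, or None if it is not in the dictionary."""
        return self._preferred.get(term)

    def synonyms(self, term: str, include_related: bool = False) -> set[str]:
        """Descriptor plus its non-descriptors; optionally add related descriptors."""
        descriptor = self.normalize(term)
        if descriptor is None:
            return {term}  # unknown terms are searched literally
        result = {descriptor} | self.entry_terms[descriptor]
        if include_related:
            result |= self.related.get(descriptor, set())
        return result


def select_dictionary(domain: str, registry: dict[str, DomainDictionary]) -> DomainDictionary:
    """Step S1 (sketch): pick the dictionary registered for the specified domain (KeyError if none)."""
    return registry[domain]


def build_query(dictionary: DomainDictionary, terms: list[str]) -> str:
    """Boolean query: synonyms OR-ed inside each clause, clauses OR-ed together."""
    clauses = []
    for term in terms:
        alternatives = sorted(dictionary.synonyms(term))
        clauses.append("(" + " OR ".join(f'"{a}"' for a in alternatives) + ")")
    return " OR ".join(clauses)
```

For example, with a hypothetical agricultural dictionary `DomainDictionary("agriculture", entry_terms={"rice": {"paddy", "Oryza sativa"}})`, `normalize("paddy")` returns `"rice"` and `build_query(d, ["paddy"])` yields `("Oryza sativa" OR "paddy" OR "rice")`. Related descriptors are left out of the query by default because broadening along semantic relations trades precision for recall.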

[0078] Acquisition of initial demand words: comprehensivel...

Abstract

The invention discloses a topic content aggregation analysis method based on TF-IDF and a domain dictionary, belonging to the field of information processing. An extended word set for the topic demand words is obtained using TF-IDF. Using the extended word set as the retrieval text, the various types of resources in a resource pool are retrieved according to a domain-dictionary-based retrieval strategy; resources are then deleted or supplemented through auditing, and each target resource type is ordered and released according to the timeliness, authority and relevancy of its resources. Compared with traditional methods, the extended word set ensures the recall of resources within the topic, while the domain-dictionary-based retrieval strategy and the ranking model tailored to each resource type ensure their precision; the auditing principle ensures the quality of resources within the topic. The method is easy to extend and maintain, and it greatly reduces the time and labor cost of aggregating topic resource content.
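The abstract names two computational steps without giving formulas in this excerpt: building an extended word set with TF-IDF, and ordering each resource type by timeliness, authority and relevancy. The Python sketch below illustrates both under stated assumptions. Candidate terms are scored by summing TF-IDF (length-normalized term frequency times IDF = ln(N/df)) over corpus documents that contain the demand word; timeliness decays with a hypothetical five-year half-life; authority is assumed to be a pre-normalized score in [0, 1]; relevance is the share of the extended word set found in a resource; and the weights are placeholders. The names `tfidf_expand`, `Resource` and `rank_resources` are illustrative, and the auditing step is not modeled.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


def tfidf_expand(seed: str, corpus: list[list[str]], top_k: int = 5) -> list[str]:
    """Extended word set: the seed plus up to top_k co-occurring terms ranked by summed TF-IDF."""
    n_docs = len(corpus)
    # Document frequency of each term over the whole corpus.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = Counter()
    for doc in corpus:
        if seed not in doc:
            continue  # only documents mentioning the demand word contribute
        tf = Counter(doc)
        for term, count in tf.items():
            if term == seed:
                continue
            # Length-normalized TF times IDF = ln(N / df); terms in every document score 0.
            scores[term] += (count / len(doc)) * math.log(n_docs / df[term])
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seed] + [term for term, score in ranked if score > 0][:top_k]


@dataclass
class Resource:
    title: str
    tokens: list[str]
    year: int
    authority: float  # assumed pre-normalized to [0, 1], e.g. from a source or journal ranking


def rank_resources(
    resources: list[Resource],
    expanded_terms: list[str],
    current_year: int,
    weights: tuple[float, float, float] = (0.3, 0.3, 0.4),  # placeholder (timeliness, authority, relevance)
    half_life: float = 5.0,  # hypothetical: timeliness halves every 5 years
) -> list[Resource]:
    """Order resources by a weighted sum of timeliness, authority and relevance."""
    w_time, w_auth, w_rel = weights
    terms = set(expanded_terms)

    def score(r: Resource) -> float:
        age = max(current_year - r.year, 0)
        timeliness = 0.5 ** (age / half_life)
        # Relevance: share of the extended word set that appears in the resource.
        relevance = len(terms & set(r.tokens)) / len(terms) if terms else 0.0
        return w_time * timeliness + w_auth * r.authority + w_rel * relevance

    return sorted(resources, key=score, reverse=True)
```

With the toy corpus `[["rice", "blast", "fungus"], ["rice", "blast", "resistance", "gene"], ["wheat", "rust", "fungus"], ["rice", "yield"]]`, `tfidf_expand("rice", corpus, top_k=3)` returns `["rice", "yield", "blast", "gene"]`. Summing scores across documents favors terms that co-occur with the demand word repeatedly; taking the maximum instead would favor a single strong co-occurrence. Separate weight tuples per resource type would mirror the abstract's "targeted" ranking model for each type.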

Description

Technical field

[0001] The invention proposes a topic content aggregation analysis method based on TF-IDF and a domain dictionary, and belongs to the field of intelligence information processing.

Background

[0002] In the Internet age, the publication and distribution of academic resources have shifted toward digital and virtual formats, knowledge circulates ever faster, and the production cycle of research output has shortened. The number of papers published after 1950 has reached 400 times the total number published before then. In this era of knowledge explosion, resource overload buries researchers' knowledge-discovery needs, and accurately finding academic resources for a specific professional field has become an increasingly prominent problem.

[0003] In order to help the scientific research team keep abreast of the development trend and the latest resear...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F16/33; G06F16/335; G06F16/35
CPC: G06F16/3335; G06F16/335; G06F16/35
Inventors: 赵瑞雪 (Zhao Ruixue), 寇远涛 (Kou Yuantao), 张洁 (Zhang Jie), 鲜国建 (Xian Guojian), 仲跻亮 (Zhong Jiliang)
Owner AGRI INFORMATION INST OF CAS