A document analysis method and device

A document analysis method and device, applied in the information field, that addresses problems such as low topic intelligibility and the resulting reduction in document analysis efficiency.

Active Publication Date: 2020-07-28
ALIBABA GRP HLDG LTD
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] The embodiments of the present application provide a document analysis method and device to solve a problem of prior-art document analysis methods: the topics determined by analyzing a document set or corpus have low intelligibility, so documents must be analyzed repeatedly, which reduces the efficiency of document analysis.

Embodiment Construction

[0025] In the embodiments of the present application, several topics, and several central words contained in each topic, are preset, and the probability of each central word belonging to each topic is determined. Initial probabilities of each non-central word in the training document set belonging to each topic, and initial probabilities of each training document belonging to each topic, are set randomly, and training is performed to obtain the final probability of each non-central word belonging to each topic and the final probability of each training document belonging to each topic. When a document set to be analyzed is received, for each word segment in that document set, the probability that the word segment belongs to each topic is determined according to the probability of each central word belonging to each topic, the final probability of each non-central word belonging to each topic, and the final probability of each training document belonging to each topic, and finally determin...
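
The excerpt does not say which training algorithm produces the final probabilities, so the following is only a minimal sketch under stated assumptions. It assumes a simple iterative re-estimation loop (not necessarily the patent's procedure): each token's topic responsibility is taken as proportional to the product of its word's topic probabilities and its document's topic probabilities; central words keep their preset probabilities, while non-central words and training documents start from random values and are re-estimated from the accumulated responsibilities. The names `train`, `normalize`, `central`, `word_topic` and `doc_topic` are hypothetical.

```python
import random


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def train(docs, central, num_topics, iterations=50, seed=0):
    """Hypothetical seeded-topic training loop (assumed, not the patent's exact method).

    docs:    training documents, each a list of word segments.
    central: preset central words -> fixed per-topic probabilities.
    Returns (word_topic, doc_topic): final per-topic probabilities of every
    non-central word and of every training document.
    """
    rng = random.Random(seed)
    # Random initial topic probabilities for each non-central word ...
    word_topic = {}
    for doc in docs:
        for w in doc:
            if w not in central and w not in word_topic:
                word_topic[w] = normalize([rng.random() for _ in range(num_topics)])
    # ... and for each training document.
    doc_topic = [normalize([rng.random() for _ in range(num_topics)]) for _ in docs]

    for _ in range(iterations):
        word_acc = {w: [0.0] * num_topics for w in word_topic}
        new_doc_topic = []
        for d, doc in enumerate(docs):
            doc_acc = [0.0] * num_topics
            for w in doc:
                w_probs = central[w] if w in central else word_topic[w]
                # Assumed token responsibility: word probability x document probability.
                resp = normalize([p * q for p, q in zip(w_probs, doc_topic[d])])
                doc_acc = [a + r for a, r in zip(doc_acc, resp)]
                if w in word_acc:  # central words are never re-estimated
                    word_acc[w] = [a + r for a, r in zip(word_acc[w], resp)]
            new_doc_topic.append(normalize(doc_acc))
        doc_topic = new_doc_topic
        word_topic = {w: normalize(acc) for w, acc in word_acc.items()}
    return word_topic, doc_topic
```

For example, with `central = {"football": [1.0, 0.0], "election": [0.0, 1.0]}` and the toy corpus `[["football", "goal", "football"], ["election", "vote", "election"]]`, the non-central word `goal` is pulled toward topic 0 and `vote` toward topic 1, because each co-occurs only with one topic's central word. Keeping the central words fixed is what anchors each topic to human-chosen words.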

Abstract

Embodiments of the invention provide a document analysis method and device. The method comprises the following steps: presetting a plurality of themes and a plurality of head words included in each theme, and determining the probability of each head word belonging to each theme; randomly setting an initial probability of each non-head word in a training document set belonging to each theme and an initial probability of each training document belonging to each theme; carrying out training to obtain a final probability of each non-head word belonging to each theme and a final probability of each training document belonging to each theme; and, when a to-be-analyzed document set is received, for each word segment in the to-be-analyzed document set, determining the probability of the word segment belonging to each theme according to the probability of each head word belonging to each theme, the final probability of each non-head word belonging to each theme, and the final probability of each training document belonging to each theme, and determining the probability of each to-be-analyzed document belonging to each theme. Through the above method, the intelligibility of document analysis results is improved and document analysis efficiency is increased.
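
The abstract does not specify how the three sets of probabilities are combined for a new document, so the following is a minimal sketch under explicit assumptions. A word segment that is a head (central) word uses its preset probabilities; a known non-head word uses its trained probabilities; an unseen word segment falls back to the mean topic distribution of the training documents (one possible, assumed use of the training documents' final probabilities). Each to-be-analyzed document's theme probabilities are then taken as the mean over its word segments, which is also an assumption. The function names `analyze` and `average` are hypothetical.

```python
def average(vectors):
    """Element-wise mean of equal-length probability vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def analyze(new_docs, central, word_topic, doc_topic):
    """Hypothetical inference step for a to-be-analyzed document set.

    central:    head word -> preset per-theme probabilities.
    word_topic: non-head word -> final per-theme probabilities from training.
    doc_topic:  final per-theme probabilities of the training documents.
    Returns one per-theme probability list per new document.
    """
    # Assumed fallback for unseen word segments: mean training-document distribution.
    prior = average(doc_topic)
    results = []
    for doc in new_docs:
        seg_probs = []
        for seg in doc:
            if seg in central:
                seg_probs.append(central[seg])
            elif seg in word_topic:
                seg_probs.append(word_topic[seg])
            else:
                seg_probs.append(prior)
        # Assumed document-level aggregation: mean over the word segments.
        results.append(average(seg_probs) if seg_probs else prior)
    return results
```

With toy values `central = {"football": [1.0, 0.0], "election": [0.0, 1.0]}`, `word_topic = {"goal": [0.9, 0.1], "vote": [0.2, 0.8]}` and `doc_topic = [[0.8, 0.2], [0.4, 0.6]]`, the document `["football", "goal", "weather"]` (where `weather` is unseen) gets theme probabilities of about `[0.83, 0.17]`.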

Description

Technical field

[0001] The present application relates to the field of information technology, and in particular to a document analysis method and device.

Background

[0002] With the development of the information society, analyzing documents makes it possible to understand the themes they contain and, based on these themes, to learn important and valuable information such as public behavior habits and public concerns. How to determine the latent themes of a large number of documents has therefore become a technology of wide interest.

[0003] In the prior art, latent topic information in large-scale document collections or corpora is identified by performing document analysis on the data of the collection or corpus and determining the topic information of each document or each corpus item. Such document analysis methods are mainly realized using Latent Dirichlet Alloc...
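
For reference, Latent Dirichlet Allocation, the prior-art model named above, treats each document d as a mixture of K topics. The standard per-word decomposition is shown below in conventional LDA notation (theta for document-topic proportions, phi for topic-word distributions, alpha and beta for the Dirichlet hyperparameters); these symbols are not taken from the patent.

```latex
p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
            = \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k},
\qquad \theta_d \sim \operatorname{Dir}(\alpha), \quad \phi_k \sim \operatorname{Dir}(\beta)
```

In plain LDA every topic is learned without supervision, which is why the resulting topics can be hard to interpret; the method described here instead presets central words for each topic.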

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/216; G06F40/284
CPC: G06F40/216; G06F40/284
Inventors: 周扬 (Zhou Yang), 蔡宁 (Cai Ning), 任望 (Ren Wang), 熊军 (Xiong Jun), 何帝君 (He Dijun), 张凯 (Zhang Kai), 杨旭 (Yang Xu)
Owner: ALIBABA GRP HLDG LTD