Text annotation method and system based on mixed topics

The technology applies topic modeling to text annotation. It addresses the shortcomings of existing approaches (the heavy manual editing that hand-built semantic annotation requires, and automatically learned topics that are hard to interpret, too coarse-grained, and offer no control over granularity) and does so at low cost.

Active Publication Date: 2016-06-22
NEUSOFT CORP
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] However, neither of these two approaches is perfect. The former can produce semantic annotations that are easy to understand and conform to human intuition, but it requires a huge amount of manual editing, which is infeasible in many environments. The latter can learn latent topics automatically through machine learning, but the meaning of the learned topics is often difficult to interpret, their granularity is often too coarse, and there is no method for controlling that granularity.

Examples

Embodiment Construction

[0064] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that these embodiments may be practiced without these specific details.

[0065] To solve the aforementioned problems, a method that combines both approaches is needed. The combination should meet the following requirements: first, the domain ontology and the resource text can be processed at the same time; second, the algorithm is simple, ideally using the traditional algorithm as a black box; third, the system should scale well.

[0066] There may be many ways to combine a domain ontology with topic analysis. One that comes readily to mind is the following: use some concept extraction method to extract concept texts from the enterprise resource text, where each concept text contains a detailed specification of a concept; then, these generated concept texts are...

Abstract

The invention provides a text annotation method and system based on mixed topics. The method comprises the following steps: learning from an acquired concept text using the LDA (Latent Dirichlet Allocation) algorithm, that is, setting a first target explicit topic and learning it to obtain a first target explicit topic-word probability distribution; learning from an acquired resource text using the LDA algorithm, that is, setting a second target explicit topic and a target implicit topic, and learning from an initialization result of the second target explicit topic and the target implicit topic to obtain a second target explicit topic-word probability distribution and a target implicit topic-word probability distribution; and annotating a text to be annotated according to the second target explicit topic-word probability distribution and the target implicit topic-word probability distribution. By mixing explicit topics with implicit topics, the method and system address the scalability problem of a domain ontology at low cost and improve the quality of text annotation.
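The three steps in the abstract can be illustrated with a small, self-contained sketch. This is not the patented implementation: it is a toy collapsed Gibbs sampler written under several assumptions. Each concept text is assumed to map to exactly one explicit topic, so the first stage reduces to smoothed word frequencies. Carrying the explicit topics into the second stage as a seeded word prior is one possible reading of "initialization result". The function names (`explicit_topic_words`, `mixed_lda`, `annotate`), the `strength` parameter, and the toy texts are all hypothetical.

```python
import random
from collections import Counter

def explicit_topic_words(concept_texts, vocab, smooth=0.01):
    # Stage 1 (assumed simplification): one explicit topic per concept text,
    # so its topic-word distribution is just smoothed word frequencies.
    dists = {}
    for name, text in concept_texts.items():
        counts = Counter(w for w in text.split() if w in vocab)
        total = sum(counts.values()) + smooth * len(vocab)
        dists[name] = {w: (counts[w] + smooth) / total for w in vocab}
    return dists

def mixed_lda(docs, vocab, explicit, n_implicit, iters=200,
              alpha=0.1, beta=0.01, strength=50.0, seed=0):
    # Stage 2: collapsed Gibbs sampling over explicit + implicit topics.
    # Explicit topics get a word prior seeded from stage 1 (an assumption);
    # implicit topics get a flat prior and are learned freely.
    rng = random.Random(seed)
    names = list(explicit) + [f"implicit_{i}" for i in range(n_implicit)]
    K = len(names)
    prior = [{w: beta + (strength * explicit[n][w] if n in explicit else 0.0)
              for w in vocab} for n in names]
    prior_sum = [sum(p.values()) for p in prior]
    tokens = [[w for w in d.split() if w in vocab] for d in docs]
    nkw = [Counter() for _ in range(K)]   # topic-word counts
    nk = [0] * K                          # tokens per topic
    ndk = [[0] * K for _ in docs]         # document-topic counts
    z = []
    for d, ws in enumerate(tokens):       # random initial assignment
        zd = [rng.randrange(K) for _ in ws]
        for w, k in zip(ws, zd):
            nkw[k][w] += 1; nk[k] += 1; ndk[d][k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, ws in enumerate(tokens):
            for i, w in enumerate(ws):
                k = z[d][i]
                nkw[k][w] -= 1; nk[k] -= 1; ndk[d][k] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + prior[j][w])
                           / (nk[j] + prior_sum[j]) for j in range(K)]
                k = rng.choices(range(K), weights)[0]
                z[d][i] = k
                nkw[k][w] += 1; nk[k] += 1; ndk[d][k] += 1
    return {names[k]: {w: (nkw[k][w] + prior[k][w]) / (nk[k] + prior_sum[k])
                       for w in vocab} for k in range(K)}

def annotate(text, phi, vocab, alpha=0.1, iters=20):
    # Stage 3: estimate the topic mixture of a new text with phi held fixed
    # (a simple EM fold-in; the source does not specify the inference step).
    ws = [w for w in text.split() if w in vocab]
    names = list(phi)
    theta = {n: 1.0 / len(names) for n in names}
    for _ in range(iters):
        acc = {n: alpha for n in names}
        for w in ws:
            post = {n: theta[n] * phi[n][w] for n in names}
            s = sum(post.values())
            for n in names:
                acc[n] += post[n] / s
        total = sum(acc.values())
        theta = {n: acc[n] / total for n in names}
    return theta

# Toy data (hypothetical): two explicit concepts and one implicit theme.
concept_texts = {
    "database": "database sql query index table",
    "network": "network router packet tcp ip",
}
resource_texts = [
    "sql query table index database",
    "database index query sql table",
    "router packet tcp ip network",
    "network tcp router ip packet",
    "pizza coffee lunch menu cafeteria",
    "lunch menu pizza cafeteria coffee",
    "sql table lunch coffee query",
    "router ip pizza menu packet",
]
vocab = sorted({w for t in list(concept_texts.values()) + resource_texts
                for w in t.split()})
explicit = explicit_topic_words(concept_texts, vocab)
phi = mixed_lda(resource_texts, vocab, explicit, n_implicit=1)
theta_db = annotate("sql query table", phi, vocab)
theta_net = annotate("router tcp packet", phi, vocab)
print(max(theta_db, key=theta_db.get), max(theta_net, key=theta_net.get))
```

In this sketch the explicit topics keep their human-readable names from the concept texts, while the implicit topic is free to absorb vocabulary that no concept text covers. That mirrors the abstract's claim that mixing the two kinds of topic extends a domain ontology cheaply. A production system would more likely use an established LDA library than a hand-written sampler.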

Description

Technical field

[0001] The present invention relates to the technical field of text annotation, and more specifically, to a method and system for text annotation based on mixed topics.

Background art

[0002] With the spread of the mobile Internet and social networks, a large amount of User Generated Content (UGC) has been produced. However, because people tend to use different words and expressions for similar content, the word-based inverted index that traditional search engines widely use to manage UGC cannot reveal the inherent relatedness of UGC, and cannot effectively maintain, retrieve, or recommend these texts. It is therefore necessary to understand the meaning of the text at the semantic level.

[0003] UGC can be deeply understood using Natural Language Processing (NLP) technology, but due to the complexity of human natural language, precise understanding is impossible and often unnecessary. In fact, if...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventors: 王勇, 赵立军
Owner: NEUSOFT CORP