Mixed-topic-based text marking method and system

A topic-based text technology, applied to mixed-topic text labeling methods and systems, that addresses problems such as imperfect existing approaches, overly coarse topic granularity, and heavy manual editing work, and achieves a low-cost effect.

Active Publication Date: 2014-04-02
NEUSOFT CORP
2 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0006] However, neither of these two ideas is perfect. The former can produce semantic annotations that are easy to understand and conform to human intuition, but it requires a huge amount of manual editing work, which is simply impossible in many environments. The latter can learn latent topics automatically through machine learning methods, but the meaning of the learned topics is often difficult to interpret, their granularity is often too coarse, and there is no method for controlling that granularity.

Examples

Embodiment Construction

[0064] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that these embodiments may be practiced without these specific details.

[0065] In order to solve the aforementioned problems, a method that combines both approaches is needed. This combination should meet the following requirements: first, the domain ontology and the resource text can be processed at the same time; second, the algorithm is simple, preferably using a traditional algorithm as a black box; third, the system should have good scalability.

[0066] There may be many ways to combine domain ontology and topic analysis. One that is easy to imagine is the following: use some concept extraction method to extract concept texts from the enterprise resource text, each concept text containing a detailed specification of a concept; then, these generated concept texts are...

Abstract

The invention provides a mixed-topic-based text marking method and system. The method comprises the following steps: learning from an acquired concept text using the LDA (Latent Dirichlet Allocation) algorithm, namely setting a first target explicit topic and learning the first target explicit topic to obtain the probability distribution of first target explicit topic-words; learning from an acquired resource text using the LDA algorithm, namely setting a second target explicit topic and a target implicit topic, and learning from an initialization result of the second target explicit topic and the target implicit topic to obtain the probability distribution of second target explicit topic-words and the probability distribution of target implicit topic-words; and marking a text to be marked according to the probability distribution of the second target explicit topic-words and the probability distribution of the target implicit topic-words. Because explicit topics and implicit topics are mixed, the scalability problem of the domain ontology can be solved at low cost, and the text marking quality can be improved.
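
The two learning stages and the marking step can be illustrated with a minimal Python sketch. It is not the patented implementation: the tiny collapsed Gibbs sampler, the toy concept and resource texts, the names `gibbs_lda`, `seed_from_explicit` and `label`, the hyperparameters, and the rule for seeding the second stage from the first are all assumptions made for illustration. The source only states that LDA is applied to concept text and then to resource text with explicit and implicit topics.

```python
import math
import random
from collections import Counter

def gibbs_lda(docs, n_topics, init=None, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word distribution per topic.

    init: optional function (doc_index, word) -> topic id, or None for a random topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_dk = [[0] * n_topics for _ in docs]        # document-topic counts
    n_kw = [Counter() for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                         # tokens assigned to each topic

    def bump(d, k, w, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    z = []  # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = init(d, w) if init else None
            if k is None:
                k = rng.randrange(n_topics)
            z[d].append(k)
            bump(d, k, w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, z[d][i], w, -1)  # remove the token, then resample its topic
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + len(vocab) * beta) for t in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, z[d][i], w, +1)

    return [{w: (n_kw[k][w] + beta) / (n_k[k] + len(vocab) * beta) for w in vocab}
            for k in range(n_topics)]

# Stage 1 (toy data): one explicit topic per concept text.
concepts = {"database": "sql index query table transaction".split(),
            "network": "router packet tcp latency socket".split()}
names = list(concepts)
phi_explicit = gibbs_lda([concepts[n] for n in names], len(names),
                         init=lambda d, w: d, iters=20)

# Stage 2: explicit topics seeded from stage 1, plus extra implicit topics.
resources = ["sql query slow index table".split(),
             "tcp packet loss router latency".split(),
             "meeting budget deadline meeting budget".split()]
n_implicit = 1  # assumed number of implicit topics

def seed_from_explicit(d, w):
    # Assumed seeding rule: a word known from the concept texts starts in its
    # most probable explicit topic; any other word starts in a random topic.
    probs = [phi.get(w, 0.0) for phi in phi_explicit]
    return probs.index(max(probs)) if max(probs) > 0 else None

phi_mixed = gibbs_lda(resources, len(names) + n_implicit, init=seed_from_explicit)
topic_names = names + [f"implicit_{i}" for i in range(n_implicit)]

def label(text, phi, topic_names, top_n=1):
    """Rank topics by the log-likelihood of the known words under each topic."""
    words = [w for w in text.split() if w in phi[0]]
    scores = [sum(math.log(p[w]) for w in words) for p in phi]
    ranked = sorted(range(len(phi)), key=lambda k: scores[k], reverse=True)
    return [topic_names[k] for k in ranked[:top_n]]

print(label("slow sql query on table", phi_mixed, topic_names))
```

Scoring a text against single-topic likelihoods is a deliberate simplification of the marking step; a fuller treatment would infer a topic mixture for the new text before choosing labels.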

Description

Technical field

[0001] The present invention relates to the technical field of text annotation, and more specifically, to a method and system for text annotation based on mixed topics.

Background

[0002] With the spread of the mobile Internet and social networks, a large amount of User Generated Content (UGC) has been generated. However, because people often use different words and expressions to express similar content, the word-based inverted index that traditional search engines widely use to manage UGC cannot reveal the inherent relevance of UGC, and cannot effectively maintain, retrieve and recommend these texts. Therefore, it is necessary to understand the meaning of texts at the semantic level.

[0003] UGC can be deeply understood using Natural Language Processing (NLP) technology, but due to the complexity of human natural language, precise understanding is impossible and often unnecessary. In fact, if the text can be se...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventors: 王勇, 赵立军
Owner: NEUSOFT CORP