
A Semantic Classification Method of Network Text Based on Baidu Encyclopedia

The technology, which combines Baidu Encyclopedia with network text and is applied in the field of semantic classification of network text, can solve problems such as the need for a large amount of training data, the inability to enumerate training data exhaustively, and the inability to process phrases missing from that data.

Status: Inactive
Publication Date: 2015-11-11
HUAQIAO UNIVERSITY
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

In practice, however, the phrases related to a given category can be polysemous, diverse, and unbounded in number, so limited training data cannot cover them exhaustively.
Taking the military category as an example, 'F35, J-9, J-10...' are all military-related phrases, and there are clearly infinitely many such phrases. As a result, classification algorithms such as SVM and KNN require a large amount of training data and cannot handle phrases that do not appear in the training data and emergent...

Examples

Embodiment Construction

[0043] Each open category of a Baidu Encyclopedia entry is a semantic topic. A meaningful Chinese text expresses its specific semantic theme through certain phrases. These phrases exist in Baidu Encyclopedia in the form of encyclopedia entries, referred to below as entries. By observing and analyzing the relationship among texts, entries, and semantic topics, we arrive at the following basic viewpoints:

[0044] Viewpoint 1. Entries are the extension (denotation) of knowledge relations. The basic unit used to express content in Chinese natural language is the entry. Entries are polysemous, varied, and impossible to enumerate exhaustively. They are the extension of knowledge relations and the external representation of the meaning the text wants to express. Therefore, the traditional method of training and classifying on entry statistics often requires a large amount of training data, and cannot deal with new words that do not...

Abstract

The invention discloses a semantic classification method for web text based on Baidu Baike (Baidu Encyclopedia). The method uses Baidu Baike to map a piece of text from a denotative collection of entries to a semantic theme space that reflects its connotation, and then calculates the similarity between texts and the similarity between a text and a category, according to the statistical regularity of the text's semantic themes, to complete the classification. The method avoids statistical approaches that depend on exhaustively enumerating entries, and addresses the problem that traditional text classification algorithms need a lot of training data and cannot deal with network vocabulary and new vocabulary.
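The mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the `ENTRY_TOPICS` table, the category names, the helper functions, and the choice of cosine similarity are all assumptions made for illustration, and extracting entries from raw text (for example by word segmentation and a Baidu Baike lookup) is left out.

```python
from collections import Counter
from math import sqrt

# Hypothetical entry -> open-category table. In the described method this
# information would come from Baidu Baike; the values here are made up.
ENTRY_TOPICS = {
    "F35": ["military", "aircraft"],
    "J-10": ["military", "aircraft"],
    "aircraft carrier": ["military", "ship"],
    "striker": ["sports", "football"],
    "penalty kick": ["sports", "football"],
}


def topic_vector(entries):
    """Map the entries found in a text to a topic-frequency vector."""
    counts = Counter()
    for entry in entries:
        for topic in ENTRY_TOPICS.get(entry, []):
            counts[topic] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors (an assumed metric)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(entries, category_profiles):
    """Return the category whose topic profile is most similar to the text."""
    vec = topic_vector(entries)
    return max(category_profiles, key=lambda c: cosine(vec, category_profiles[c]))


# Category profiles built from a few example entries per category.
CATEGORY_PROFILES = {
    "military": topic_vector(["F35", "aircraft carrier"]),
    "sports": topic_vector(["striker", "penalty kick"]),
}

# "J-10" was not used to build any profile, yet it shares topics with "F35".
print(classify(["J-10"], CATEGORY_PROFILES))
```

Because similarity is computed over topics rather than over the entries themselves, a text containing only "J-10" can still be compared with a profile built from other military entries, which is the property the abstract emphasizes.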

Description

Technical field

[0001] The invention relates to a semantic classification method for network text based on Baidu Encyclopedia.

Background technique

[0002] The network has entered the Web 2.0 era. User-oriented network applications, in which users provide and share resources, are developing rapidly, and massive amounts of new information appear every day. How to obtain the content that is really needed is a major problem. In order to effectively manage, filter, and use these resources, content-based document management has gradually become a dominant technology in the field of information systems, known as information retrieval (IR). Text classification is an important part of information retrieval technology; it refers to determining the category of a text according to the content of the natural language text, given a predetermined set of categories. Processing these classified data often requires the application of text mining technology, involving text similarity calculation, ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 陈叶旺
Owner: HUAQIAO UNIVERSITY