Text clustering method, question-answering system applying same and search engine applying same

Text clustering technology, applied in the field of natural-language pattern recognition, can solve problems such as language barriers.

Status: Inactive; Publication Date: 2012-09-19
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 0; Cited by: 23

AI Technical Summary

Problems solved by technology

[0007] However, the clustering of texts in the prior art only consi...



Embodiment Construction

[0055] To make the purpose, technical solution, and advantages of the present invention clearer, the text clustering method according to an embodiment of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

[0056] The text clustering method according to a specific embodiment of the present invention is described in detail below with reference to the flowchart in Figure 2, taking texts in two languages, Chinese and English, as examples.

[0057] First, Chinese and English texts are clustered separately. The clustering can adopt a content-dependent clustering method or a content-independent clustering method.

[0058] The content-dependent clustering method uses a similarity function to describe the degree of similarity between texts accord...
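One common similarity function of this kind is cosine similarity over term-frequency vectors. The sketch below is an illustration only: the source text is truncated at this point and does not say which function the embodiment uses, so the choice of cosine similarity, the whitespace tokenization, and the name `cosine_similarity` are all assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-tokenized term-frequency vectors.

    Hypothetical helper for illustration; not taken from the patent text.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Dot product over the terms that occur in both texts.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Texts that share no terms score 0, and texts with identical term distributions score 1 (up to floating-point rounding). Chinese text would need a word segmenter before this function applies, since it is not whitespace-delimited.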


Abstract

The invention provides a text clustering method, together with a question-answering system and a search engine that apply it. The method comprises the following steps: 1) clustering the texts of each language separately; 2) extracting feature-word vectors from the clustered texts of each language; and 3) calculating the similarity between the feature-word vectors of texts in different languages and clustering all the texts accordingly. With the method, the question-answering system, and the search engine, texts in multiple languages can be clustered correctly.
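The three steps in the abstract can be sketched roughly as follows, using Chinese and English as the two languages. This is a minimal sketch under stated assumptions, not the patented implementation: the greedy single-pass clustering, the top-k frequency vectors, the bilingual `lexicon` used to compare vectors across languages, the thresholds, and every function name are hypothetical choices made for illustration. Input texts are assumed to be already tokenized.

```python
import math
from collections import Counter

Tokens = list[str]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_one_language(docs: list[Tokens], threshold: float) -> list[list[Tokens]]:
    """Step 1 (assumed stand-in): greedy single-pass clustering within one language."""
    clusters: list[list[Tokens]] = []
    for doc in docs:
        for cluster in clusters:
            # The first document of a cluster serves as its representative.
            if cosine(Counter(doc), Counter(cluster[0])) >= threshold:
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def feature_vector(cluster: list[Tokens], top_k: int = 5) -> Counter:
    """Step 2: keep the top_k most frequent words of a cluster as its vector."""
    counts = Counter(tok for doc in cluster for tok in doc)
    return Counter(dict(counts.most_common(top_k)))


def translate(vec: Counter, lexicon: dict[str, str]) -> Counter:
    """Map feature words into the other language; unknown words are dropped."""
    out: Counter = Counter()
    for word, weight in vec.items():
        if word in lexicon:
            out[lexicon[word]] += weight
    return out


def merge_across_languages(zh_clusters, en_clusters, lexicon, threshold):
    """Step 3: pair clusters whose feature-word vectors are similar enough."""
    pairs = []
    for zi, zc in enumerate(zh_clusters):
        zv = translate(feature_vector(zc), lexicon)
        for ei, ec in enumerate(en_clusters):
            if cosine(zv, feature_vector(ec)) >= threshold:
                pairs.append((zi, ei))
    return pairs


# Toy data, invented for this example.
zh_docs = [["手机", "电池"], ["手机", "屏幕"], ["天气", "预报"]]
en_docs = [["phone", "battery"], ["weather", "forecast"]]
lexicon = {"手机": "phone", "电池": "battery", "屏幕": "screen",
           "天气": "weather", "预报": "forecast"}

zh_clusters = cluster_one_language(zh_docs, threshold=0.4)
en_clusters = cluster_one_language(en_docs, threshold=0.4)
pairs = merge_across_languages(zh_clusters, en_clusters, lexicon, threshold=0.4)
```

On the toy data, the two phone-related Chinese texts form one cluster and pair with the English phone cluster, while the weather clusters pair with each other, so `pairs` is `[(0, 0), (1, 1)]`. Clustering within each language first keeps the cross-language comparison at the cluster level, which is far cheaper than comparing every pair of texts.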

Description

Technical field

[0001] The invention relates to the field of pattern recognition and, more specifically, to a pattern recognition method for natural language.

Background art

[0002] With the spread of information networks, the emergence of massive amounts of electronic text urgently requires machines to classify text automatically. Automatic text classification saves considerable manpower and material resources and avoids the long cycles, high cost, and low efficiency of manual classification. It sorts a large amount of text into categories according to content, helping people process and organize text data effectively.

[0003] This demand is even stronger for search engines. People increasingly rely on search for knowledge and information. Facing hundreds of millions of web pages and information resources, the biggest problem that search engines will face is how to quickly and accurately provide...


Application Information

IPC(8): G06F17/30
Inventors: 沈文竹, 吴甜, 柴春光, 吴华
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD