Information processing method and device

An information processing method and device using text vector technology, applied in the field of information processing. It addresses problems in existing approaches, such as vocabulary generation errors caused by ignoring how often words occur, and reduced topic clustering speed and accuracy, and thereby improves the speed and accuracy of topic clustering.

Status: Inactive. Publication date: 2015-05-06
中国联合网络通信有限公司广东省分公司 +1

AI Technical Summary

Problems solved by technology

[0003] In the existing technology, at least two words are manually extracted from the data source to form a vocabulary, at least two of them are selected as keywords according to the weight of each word in the vocabulary, and topic clustering is then performed directly according to the similarity between the keywords. However, extracting words directly from the data source without considering how frequently they occur there introduces errors into the generated vocabulary. In addition, discrete points are not removed before the keywords are clustered, which reduces the speed and accuracy of topic clustering.

Examples

Embodiment Construction

[0023] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0024] Figure 1 is a schematic flowchart of the first embodiment of the information processing method provided by the embodiments of the present invention. The method described in this embodiment includes the following steps:

[0025] S101. Parse a previously acquired set of Hypertext Markup Language (HTML) documents and extract the text data set contained in the HTML document set.
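
As a rough illustration of step S101, the following minimal sketch extracts plain text from a list of HTML strings using only the Python standard library. The names `TextExtractor` and `extract_text_data_set` are hypothetical, and the choice to skip `<script>` and `<style>` content is an assumption; the source does not say how the parsing is carried out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content (an assumption)."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text_data_set(html_documents):
    """Return one plain-text string per HTML document in the input list."""
    texts = []
    for html in html_documents:
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        texts.append(parser.text())
    return texts
```

Each returned string corresponds to one document, so the result can be passed on to the word segmentation step as the text data set.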

[0026] In some feasible implementation manners, a web crawler may be used to pre-obt...

Abstract

The embodiment of the invention discloses an information processing method and device. The method comprises the following steps: a previously acquired HTML document set is parsed, and the text data set contained in it is extracted; word segmentation is performed on the text data set to obtain a text segmentation table; word frequency analysis is performed on all words in the text segmentation table, and a text vector space matrix is constructed; discrete point text vectors in the text vector space matrix are eliminated, and a text similarity matrix is obtained for the text vectors that remain; topic clustering is then performed on the text data set according to the text similarity matrix. With this method, the vocabulary can be constructed accurately, and because topic clustering is performed after the discrete point text vectors are eliminated, both the speed and the accuracy of topic clustering are improved.
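
The steps after word segmentation can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not specify the weighting scheme, the criterion for a discrete point, the similarity measure, or the clustering algorithm. Here the sketch assumes raw term counts, a minimum total count (`min_count`) for a word to enter the vocabulary, cosine similarity, a discrete point defined as a vector whose best similarity to any other vector falls below a threshold, and clustering by connected components of the thresholded similarity graph. All function names and default thresholds are hypothetical, and the input is assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def build_vector_space(segmented_texts, min_count=2):
    """Build a term-count matrix over words seen at least min_count times overall."""
    totals = Counter(word for words in segmented_texts for word in words)
    vocabulary = sorted(w for w, c in totals.items() if c >= min_count)
    matrix = []
    for words in segmented_texts:
        counts = Counter(words)
        matrix.append([counts[w] for w in vocabulary])
    return vocabulary, matrix


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarity between all text vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def remove_discrete_points(vectors, threshold=0.1):
    """Return indices of vectors whose best similarity to another vector
    is at least threshold (assumed definition of a non-discrete point)."""
    sims = similarity_matrix(vectors)
    kept = []
    for i, row in enumerate(sims):
        best = max((s for j, s in enumerate(row) if j != i), default=0.0)
        if best >= threshold:
            kept.append(i)
    return kept


def cluster_topics(sims, threshold=0.5):
    """Label each vector with the connected component it belongs to in the
    graph that links two vectors when their similarity is >= threshold."""
    n = len(sims)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sims[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

A typical call sequence would be `build_vector_space`, then `remove_discrete_points` to select the rows to keep, then `similarity_matrix` on the kept rows, and finally `cluster_topics`. Filtering before the similarity matrix is built means isolated texts never enter the clustering step, which is the ordering the abstract describes.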

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to an information processing method and device.

Background

[0002] With the rapid development of Internet technology, people increasingly obtain trending information through the Internet. How to extract the latest hot topics from the massive data on the Internet has become an important research topic.

[0003] In the existing technology, at least two words are manually extracted from the data source to form a vocabulary, at least two of them are selected as keywords according to the weight of each word in the vocabulary, and topic clustering is then performed directly according to the similarity between the keywords. However, directly extracting at least two words from the data source to form a vocabulary without considering the frequency of the extracted words in the data source will lead to certain errors in the generatio...

Application Information

IPC(8): G06F17/30, G06F17/27
Inventor 李慧苏茂金成旭强刘卉芳王保华万源沅刘辉蒙小辉林振华彭宇山郭伟
Owner 中国联合网络通信有限公司广东省分公司