Key new word finding method based on multidimensional word and sentence characteristics and sentiment analysis

A technology for sentiment analysis and new word discovery, applied in special data processing applications, natural language data processing, instruments, etc. It addresses problems such as low accuracy and difficult threshold setting, with the effect of improving accuracy, recall, and precision.

Status: Inactive
Publication Date: 2018-07-10
CHINA JILIANG UNIV
0 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

The unsupervised method requires thresholds to be set on the statistics it uses. The word and sentence feature statistics mainly include the word itself, part of speech, left and right entropy, mutual information, and TF/IDF. This method is widely applicable, but the thresholds are difficult to set and the accuracy rate is low.
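To make the statistics named above concrete, the following is a minimal sketch of how left and right entropy and (pointwise) mutual information might be computed for candidate strings over a pre-segmented corpus. The function names, the use of token bigrams as candidates, and the base-2 logarithm are assumptions made for illustration; the patent text shown here does not specify its exact formulas or thresholds.

```python
import math
from collections import Counter


def branching_entropy(neighbors):
    """Shannon entropy (bits) of a Counter of neighboring tokens."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_stats(sentences, n=2):
    """Return {ngram: (count, pmi, left_entropy, right_entropy)}.

    `sentences` is a list of token lists (hypothetical input format).
    """
    unigrams, ngrams = Counter(), Counter()
    left, right = {}, {}
    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngrams[gram] += 1
            if i > 0:  # token immediately to the left of the candidate
                left.setdefault(gram, Counter())[tokens[i - 1]] += 1
            if i + n < len(tokens):  # token immediately to the right
                right.setdefault(gram, Counter())[tokens[i + n]] += 1

    total_uni = sum(unigrams.values())
    total_ngr = sum(ngrams.values())
    stats = {}
    for gram, count in ngrams.items():
        p_gram = count / total_ngr
        # Probability of the n-gram if its tokens were independent.
        p_indep = math.prod(unigrams[t] / total_uni for t in gram)
        pmi = math.log2(p_gram / p_indep)
        stats[gram] = (
            count,
            pmi,
            branching_entropy(left.get(gram, Counter())),
            branching_entropy(right.get(gram, Counter())),
        )
    return stats


# Toy corpus of already-segmented sentences (illustrative only).
corpus = [["a", "b", "c"], ["d", "b", "c"], ["a", "b", "e"]]
stats = candidate_stats(corpus)
```

A candidate with high mutual information and high entropy on both sides tends to behave like a free-standing word. Any cut-off applied to these values has to be chosen per corpus, which is the threshold-setting difficulty described above.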


Examples


Embodiment Construction

[0027] In order to make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0028] In the following description, many details are set forth to provide a full understanding of the present invention. However, the present invention can also be implemented in ways other than those described here. Therefore, the protection scope of the present invention is not limited by the specific embodiments disclosed below.

[0029] Figure 1 shows a technical roadmap of the key new word finding method based on multidimensional word and sentence characteristics and sentiment analysis of the present invention.

[0030] Step 1: use web crawler technology to collect the comment text of a certain type of commodity from the e-commerce platform. The crawler technology starts from the URL of a...


Abstract

The invention provides a key new word finding method based on multidimensional word and sentence characteristics and sentiment analysis. The method comprises the following steps: step 1, crawling the required comments from an e-commerce platform; step 2, pre-processing the comments; step 3, carrying out word segmentation on the comments with the NLPIR word segmentation tool; step 4, constructing and filtering repeated strings from the segmented text by using the multidimensional word and sentence characteristics; step 5, manually screening accurate new words from a training sample; step 6, counting digital combinations and new class combinations of the new words; step 7, adjusting the filtering threshold value of step 4 according to the counting result, adding the new class combinations, and filtering to obtain the repeated string set of the test sample; step 8, adding a user dictionary to optimize the word segmentation; step 9, carrying out dependency grammar analysis on the word segmentation result by using LTP (Language Technology Platform); step 10, carrying out sentiment marking on the governing words in the dependency relations by using the CRF++ tool; step 11, taking as key new words the repeated strings that are sentiment words and the repeated strings whose governing words are sentiment words. The method is used to mine new words from a large volume of comment texts on certain e-commerce products and obtain the new words that play a key role in the comments, so that word segmentation accuracy is improved and a good foundation is provided for text mining and analysis work.
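The selection rule in step 11 can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the `governors` mapping (each word mapped to the set of words that govern it in the dependency parse), and the sample data are all hypothetical, and the outputs of LTP and CRF++ are assumed to have already been converted into these plain Python structures.

```python
def select_key_new_words(repeated_strings, sentiment_words, governors):
    """Keep repeated strings that are sentiment words themselves, or that
    are governed by a sentiment word in some dependency relation.

    repeated_strings: iterable of candidate strings (output of step 7)
    sentiment_words:  iterable of words marked as sentiment-bearing (step 10)
    governors:        dict mapping a word to the set of its governing words
                      (derived from step 9); hypothetical structure
    """
    sentiment = set(sentiment_words)
    key_new_words = []
    for candidate in repeated_strings:
        is_sentiment = candidate in sentiment
        has_sentiment_governor = bool(governors.get(candidate, set()) & sentiment)
        if is_sentiment or has_sentiment_governor:
            key_new_words.append(candidate)
    return key_new_words


# Illustrative data only.
repeated = ["给力", "续航", "物流"]
sentiment = {"给力", "持久"}
governors = {"续航": {"持久"}, "物流": {"到"}}

print(select_key_new_words(repeated, sentiment, governors))
# prints ['给力', '续航']
```

Here "给力" is kept because it is itself a sentiment word, "续航" is kept because its governing word "持久" is a sentiment word, and "物流" is dropped because neither condition holds.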

Description

Technical field

[0001] The invention relates to the field of electronic commerce, and specifically to a key new word finding method based on multidimensional word and sentence characteristics and sentiment analysis.

Background technique

[0002] Word segmentation is a very important research topic in the field of Chinese natural language processing, and new word discovery is an important part of optimizing word segmentation. Unlike English and other Western languages, Chinese has no fixed separators between words, so word segmentation is usually a necessary first step in Chinese information processing tasks. Words encountered during segmentation that are not included in the segmentation tool's dictionary (that is, unregistered words; the new words referred to in this document are unregistered words) significantly degrade segmentation performance, so new word discovery is of considerable importance. However, in re...
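The effect of unregistered words on segmentation, and the benefit of adding discovered words to a user dictionary (step 8 of the abstract), can be illustrated with a toy dictionary-based segmenter. The sketch below uses forward maximum matching, a simple classical technique chosen here only for illustration; it is not the algorithm used by NLPIR, and the dictionary contents and `max_len` value are assumptions.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


base_dictionary = {"这", "手机", "很"}
sentence = "这手机很给力"

# "给力" is unregistered, so it is broken into single characters.
print(fmm_segment(sentence, base_dictionary))
# prints ['这', '手机', '很', '给', '力']

# After the discovered word is added as a user-dictionary entry,
# it is kept as one token.
print(fmm_segment(sentence, base_dictionary | {"给力"}))
# prints ['这', '手机', '很', '给力']
```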


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/374, G06F16/9535, G06F16/955, G06F40/211, G06F40/289
Inventor: 徐新胜, 俞飞
Owner: CHINA JILIANG UNIV