A method for establishing a semantic dictionary based on comment data

A semantic dictionary construction method, applicable to electrical digital data processing, special-purpose data processing applications, instruments, and similar fields. It addresses the problems of low efficiency and insufficient scale and achieves high efficiency.

Active Publication Date: 2017-02-15
北京众荟信息技术股份有限公司

AI Technical Summary

Problems solved by technology

At present, the dictionary resources collected and sorted by ha




Embodiment Construction

[0027] In order to make the above objects, features and advantages of the present invention more obvious and understandable, the present invention will be further described below through specific embodiments and accompanying drawings.

[0028] To construct the semantic dictionary, the present invention adopts a bootstrapping-based method. Bootstrapping, also called self-expansion, is a semi-supervised machine learning method that can extract semantic dictionaries and templates at the same time. The approach rests on the observation that extracted templates can be used to extract new instances, which in turn can be used to extract new templates. Its advantage is that it requires no labeled training corpus and only a few seeds. First, the initial seed words are obtained through manual intervention, the templates are obtained by using the seed words, and then the seed words are obtained through ...
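The alternation between templates and instances described in [0028] can be illustrated with a minimal sketch. It is not the patented procedure: the template shape (the word to the left and the word to the right of a known word), the function name `bootstrap`, the parameters `rounds` and `top_k`, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter

def bootstrap(sentences, seeds, rounds=2, top_k=2):
    """Alternate between extracting templates and extracting new words.

    A "template" here is simply (left word, right word); this shape is an
    assumption for illustration, not the template form used in the patent.
    """
    lexicon = set(seeds)
    templates = set()
    for _ in range(rounds):
        # Known words -> templates: count the contexts around lexicon words.
        counts = Counter()
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in lexicon:
                    counts[(tokens[i - 1], tokens[i + 1])] += 1
        templates |= {tpl for tpl, _ in counts.most_common(top_k)}
        # Templates -> new words: any word filling a template slot is added.
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in templates:
                    lexicon.add(tokens[i])
    return lexicon, templates

# Hypothetical, already-segmented review sentences and a single seed word.
sentences = [
    "the room was clean".split(),
    "the lobby was clean".split(),
    "the pool was closed".split(),
    "our room is small".split(),
    "our bathroom is small".split(),
]
lexicon, templates = bootstrap(sentences, {"room"})
```

Starting from one seed, the contexts around "room" become templates, and those templates pull in the other words that occur in the same slots. A real system would score templates and words before accepting them, since unfiltered expansion drifts quickly.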


Abstract

The invention relates to a method for establishing a semantic dictionary based on comment data. The method comprises the steps of: 1) establishing a seed semantic dictionary from a small amount of comment data; 2) performing word segmentation on the comment data; 3) judging the semantic class of the comment data word by word and replacing the words with semantic class tags; 4) generating templates according to the name of each semantic class and the specific words contained in each semantic class; 5) applying the templates to the tag-replaced comment data to extract the semantic words of each semantic class; 6) scoring the templates according to their importance, generalization performance, and accuracy; 7) selecting the highest-scoring templates, calculating the scores of the semantic words extracted by each template, and selecting the highest-scoring semantic words to expand the semantic dictionary; 8) iterating steps 3) to 7) and obtaining the final semantic dictionary and template library when the iteration ends. The method allows a large-scale semantic dictionary to be obtained in a short time and allows multiple semantic classes to be extracted at the same time.
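One pass through steps 3) to 7) can be sketched as follows. This is an illustration under stated assumptions, not the method as claimed: the abstract does not give the template form or the scoring formulas, so the `(left token, class, right token)` template, the boundary markers `^` and `$`, the score `accuracy * log2(1 + new words)`, and every name (`expand_once`, `tag_sentence`, `top_templates`, `top_words`) are hypothetical. The input is assumed to be already segmented (step 2).

```python
import math
from collections import defaultdict

def is_tag(token):
    return token.startswith("<") and token.endswith(">")

def tag_sentence(tokens, dictionary):
    """Step 3: replace each known word with its semantic class tag."""
    word_to_class = {w: c for c, words in dictionary.items() for w in words}
    tagged = [f"<{word_to_class[t]}>" if t in word_to_class else t for t in tokens]
    return ["^"] + tagged + ["$"]  # boundary markers (an assumption)

def expand_once(corpus, dictionary, top_templates=2, top_words=2):
    tagged = [tag_sentence(s, dictionary) for s in corpus]

    # Step 4 (simplified): a template is (left token, class, right token).
    templates = set()
    for sent in tagged:
        for i in range(1, len(sent) - 1):
            if is_tag(sent[i]):
                templates.add((sent[i - 1], sent[i][1:-1], sent[i + 1]))

    # Step 5: apply every template to the tag-replaced corpus.
    hits, misses, found = defaultdict(int), defaultdict(int), defaultdict(set)
    for sent in tagged:
        for i in range(1, len(sent) - 1):
            for tpl in templates:
                left, cls, right = tpl
                if (sent[i - 1], sent[i + 1]) != (left, right):
                    continue
                if sent[i] == f"<{cls}>":
                    hits[tpl] += 1            # slot holds a known word of this class
                elif is_tag(sent[i]):
                    misses[tpl] += 1          # slot holds a word of another class
                else:
                    found[tpl].add(sent[i])   # untagged word: new candidate

    # Step 6: illustrative score, NOT the patent's formula.
    def score(tpl):
        accuracy = hits[tpl] / (hits[tpl] + misses[tpl])
        return accuracy * math.log2(1 + len(found[tpl]))

    # Step 7: keep the best templates, then the best words per class.
    best = sorted(templates, key=lambda t: (-score(t), t))[:top_templates]
    word_scores = defaultdict(lambda: defaultdict(float))
    for tpl in best:
        for word in found[tpl]:
            word_scores[tpl[1]][word] += score(tpl)
    expanded = {c: set(words) for c, words in dictionary.items()}
    for cls, scores in word_scores.items():
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        expanded[cls].update(w for w in ranked[:top_words] if scores[w] > 0)
    return expanded, best

# Hypothetical segmented hotel reviews and a two-class seed dictionary.
corpus = [
    ["the", "room", "is", "clean"],
    ["the", "lobby", "is", "clean"],
    ["the", "bed", "is", "comfortable"],
    ["the", "room", "is", "quiet"],
]
seed = {"FACILITY": {"room"}, "OPINION": {"clean"}}
expanded, best = expand_once(corpus, seed)
```

Step 8) would correspond to calling `expand_once` repeatedly, feeding each expanded dictionary back in until no new words are added or a round limit is reached. Both classes are expanded in the same pass, which mirrors the abstract's point that multiple semantic classes can be extracted at the same time.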

Description

Technical Field

[0001] The invention belongs to the technical fields of information technology and data mining, and in particular relates to a method for constructing a semantic dictionary based on review data.

Background

[0002] With the rapid development of e-commerce, online reviews have gradually entered public view, increasingly influence users' choices, and in turn have a growing effect on brands. Taking the hotel industry as an example, hotels hope to use technical means to obtain user comments and feedback, which can guide brand management and operations management and improve brand image and service quality. Users want to read other users' reviews to understand a hotel's advantages and disadvantages, which serve as an important reference for booking. Tripadvisor research shows that more than 85% of users attach great importance to the quality of hotel reputation, and nearly 90...


Application Information

IPC(8): G06F17/30
CPC: G06F16/374
Inventors: 林小俊, 张猛, 暴筱
Owner: 北京众荟信息技术股份有限公司