Time window and semantic meaning-based word variant normalization method and system

A time-window and word-variant technique, applied in the field of social network data analysis, addresses problems such as unpredictable associations, improving accuracy and reducing scale.

Active Publication Date: 2017-11-03
INST OF INFORMATION ENG CHINESE ACAD OF SCI
Cites: 9; Cited by: 16

AI Technical Summary

Problems solved by technology

An important problem with real-network test sets is that the known community structure is derived from human observation and experience, whereas community discovery algorithms generally start from the topological structure, so it is impossible to predict how closely the two correspond.



Examples


Embodiment Construction

[0082] The variant normalization framework of the present invention is shown in Figure 6. The specific steps are as follows:

[0083] (1) Discovery of social network candidate words. Specifically, it can be divided into two steps:

[0084] The structure of the candidate word extraction module is shown in Figure 4. This experimental scheme compensates for the shortcomings analyzed above, namely candidate word sets that are too large or too small.

[0085] The experimental steps are as follows:

[0086] 1) Division of corpus

[0087] a) Division by time: based on the assumption of time-space distribution, the microblogs in the corpus that were posted within the 7 days before the variant word appeared are selected, according to the time of each microblog, to form a candidate corpus set D1.

[0088] b) Division by semantics: based on the assumption of semantic similarity, add the microblogs in the candidate corpus set D1 that are semantically similar to the microblogs that appear...
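
The two division steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Post` record, the function names, the bag-of-words cosine similarity, the 0.3 threshold, and the choice to exclude posts made at exactly the variant's appearance time are all hypothetical. The source specifies only the 7-day window and the notion of semantic similarity to the microblog containing the variant.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import sqrt


@dataclass
class Post:
    """Hypothetical microblog record: tokenized text plus posting time."""
    tokens: list
    posted_at: datetime


def divide_by_time(corpus, variant_time, window_days=7):
    """Build D1: posts from the `window_days` days before the variant appeared."""
    start = variant_time - timedelta(days=window_days)
    return [p for p in corpus if start <= p.posted_at < variant_time]


def cosine(a, b):
    """Cosine similarity of two token lists (bag-of-words is an assumption)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def divide_by_semantics(d1, variant_post, threshold=0.3):
    """Build D2: posts in D1 similar enough to the post containing the variant."""
    return [p for p in d1 if cosine(p.tokens, variant_post.tokens) >= threshold]
```

Filtering by time first keeps the more expensive similarity comparison restricted to a small window of posts, which matches the order of steps a) and b).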



Abstract

The invention discloses a word variant normalization method and system based on time windows and semantic meaning. The method comprises the following steps: 1) according to the appearance time of a given word variant in a social network, selecting the corpora within a set time period before that appearance time to form a candidate corpus set D1; 2) adding the corpora in the candidate corpus set D1 that are semantically similar to the corpus in which the word variant is located to a candidate corpus set D2; 3) extracting candidate words from the set D2 to obtain a candidate word set; and 4) calculating a score for each pair of candidate word and word variant according to the literal similarity and the context feature similarity between the candidate word and the word variant, determining the candidate word corresponding to the word variant according to the calculation result, and taking the determined candidate word as the normalized word of the word variant. The system comprises an acquisition module, a filter module, an obtaining module, and a normalized word obtaining module. The method and system make the text of social networks more normalized, which facilitates public opinion analysis and hotspot time tracing.
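
Step 4 of the abstract can be illustrated with a short Python sketch. It is only a sketch under assumptions: the source does not say how literal similarity or context feature similarity is measured, nor how the two are combined. Here `difflib.SequenceMatcher` stands in for literal similarity, cosine similarity over bags of context words stands in for context feature similarity, and the weighted sum with weight `alpha` is hypothetical, as are all function names.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def literal_similarity(a, b):
    """Surface-form similarity in [0, 1]; SequenceMatcher is a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def context_similarity(ctx_a, ctx_b):
    """Cosine similarity between bags of context words (an assumed feature set)."""
    ca, cb = Counter(ctx_a), Counter(ctx_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def normalize(variant, variant_ctx, candidates, alpha=0.5):
    """Return the best-scoring candidate; `candidates` maps word -> context words.

    The weighted sum is hypothetical: the source does not state how the two
    similarities are combined into a score.
    """
    def score(word):
        return (alpha * literal_similarity(variant, word)
                + (1 - alpha) * context_similarity(variant_ctx, candidates[word]))
    return max(candidates, key=score)
```

For example, `normalize("appel", ["eat", "fruit"], {"apple": ["eat", "fruit", "red"], "maple": ["tree", "leaf"]})` selects the candidate that is closer to the variant both in spelling and in context.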

Description

Technical field

[0001] The invention relates to the field of social network data analysis, and provides a method and system for normalizing variant words based on time windows and semantics, so as to achieve more targeted and accurate normalization of variant words in social networks.

Background

[0002] With the rapid development of social networks, hundreds of millions of pieces of information are posted on social networking platforms every day, bringing about an explosive growth of information. Information comes in various forms, including text, pictures, audio, and video. Among these, the text in social networks is characterized by randomness and informality. Variant words, as an irregular form of language, are a distinctive feature of Internet language. People often replace relatively serious, standardized, or sensitive words with relatively irregular and insensitive words in order to avoid censorship, express emotion, satirize, or entertain. A new word used to replace the o...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/295, G06F40/30
Inventor: 沙灜, 施振辉, 李锐, 梁棋, 邱咏钦, 王斌
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI