Two-level text similarity calculation method based on subjective and objective semantics

A technology in the field of semantic similarity and text similarity, applied in computing, special data processing applications, instruments, etc. It addresses the problems that text representations are high-dimensional and that the accuracy or rationality of similarity results needs improvement. Its effects are improved accuracy and rationality, reduced storage space, and a concise representation of words.

Status: Inactive
Publication Date: 2014-03-26
NANJING UNIV OF POSTS & TELECOMM
Cites: 3; Cited by: 23

AI Technical Summary

Problems solved by technology

[0016] The purpose of the present invention is to provide a two-level text similarity calculation method based on subjective and objective semantics. The method addresses two problems: the high dimensionality of text representations, and the need to improve the accuracy or rationality of text similarity calculation results.



Examples


Embodiment Construction

[0051] For convenience of description, assume the following application example: several texts of different categories are randomly drawn from Sogou news texts, and the similarity between these texts is calculated.

[0052] The specific embodiment of the present invention is as follows:

[0053] (1) Download a corpus of tens of thousands of Sogou news articles from the Internet, build a text corpus, perform word segmentation and keyword extraction, and build a text index;

[0054] (2) Divide each text whose similarity is to be calculated into two parts: topic information and text content information;

[0055] (3) Treat the topic information of the text as a sentence, segment the topic sentence into words, and filter out adverbs, prepositions, pronouns, conjunctions, and words that are substrings of other words to obtain a sentence-word vector. The word similarity calculation method that combines subjective and objective semantics is then applied to the word vector similarity calculation to obtain the text ...
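The filtering in step (3) and the comparison of two word vectors can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the patent: the part-of-speech tag names, the function names, and above all the aggregation rule in `vector_similarity` (a symmetric average of best matches), since the source text is truncated before it gives the actual formula. The word-level similarity is passed in as a function, so an exact-match stand-in is used in place of the subjective and objective measure.

```python
from typing import Callable, List, Tuple

# Hypothetical tag names for the word classes that step (3) filters out.
FILTERED_POS = {"adverb", "preposition", "pronoun", "conjunction"}


def filter_topic_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Drop filtered word classes, then drop words contained in another kept word."""
    kept: List[str] = []
    for word, pos in tagged:
        if pos not in FILTERED_POS and word not in kept:
            kept.append(word)
    # A word is removed when it is a proper substring of another kept word.
    return [w for w in kept if not any(w != other and w in other for other in kept)]


def vector_similarity(
    a: List[str], b: List[str], word_sim: Callable[[str, str], float]
) -> float:
    """Assumed aggregation: average each word's best match, in both directions."""
    if not a or not b:
        return 0.0
    a_to_b = sum(max(word_sim(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(word_sim(y, x) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


def exact_match(x: str, y: str) -> float:
    """Stand-in word similarity used only for this illustration."""
    return 1.0 if x == y else 0.0


tagged_topic = [
    ("Nanjing", "noun"),
    ("Nanjing University", "noun"),
    ("very", "adverb"),
    ("expands", "verb"),
]
topic_vector = filter_topic_words(tagged_topic)
```

In this example "very" is dropped as an adverb and "Nanjing" is dropped because it is a substring of "Nanjing University". Any word similarity function with the same signature can replace `exact_match`.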


Abstract

A two-level text similarity calculation method based on subjective and objective semantics. The text is divided into a topic and a main body. A topic-word vector is built by filtering, and a low-dimensional main body-word vector is built by extracting keywords. A word semantic similarity calculation method that combines subjective and objective semantics is used to calculate word vector similarity, yielding the topic similarity and the main body similarity respectively, from which the text similarity is obtained. Word semantic similarity is calculated on the basis of HowNet and the word-text index of a corpus, so that words are expressed concisely and the results agree with both subjective concepts and the objective semantic environment. In calculating text similarity, equal importance is attached to the topic and the main body, and the combined subjective and objective word similarity method is used. This avoids high-dimensional text-word vectors, extracts text information fully, and improves the accuracy of text similarity results. The method is suitable for text similarity analysis under various circumstances.
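The two combinations described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not give the formulas, so the Jaccard overlap used for the corpus-based (objective) similarity, the linear mix with weight `alpha`, and all function names are hypothetical. The HowNet-based (subjective) similarity is represented by a caller-supplied function, not by any real HowNet API. The 0.5/0.5 weighting of topic and main body is one reading of "equal importance".

```python
from typing import Callable, Dict, Set


def objective_similarity(w1: str, w2: str, index: Dict[str, Set[int]]) -> float:
    """Assumed corpus-based measure: Jaccard overlap of the document-id sets
    in which each word occurs, read from a word-text index."""
    docs1, docs2 = index.get(w1, set()), index.get(w2, set())
    union = docs1 | docs2
    return len(docs1 & docs2) / len(union) if union else 0.0


def combined_word_similarity(
    w1: str,
    w2: str,
    subjective: Callable[[str, str], float],  # placeholder for a HowNet-based score
    index: Dict[str, Set[int]],
    alpha: float = 0.5,  # assumed mixing weight, not from the source
) -> float:
    """Linear mix of the subjective and objective word similarities (assumption)."""
    return alpha * subjective(w1, w2) + (1 - alpha) * objective_similarity(w1, w2, index)


def text_similarity(topic_sim: float, body_sim: float) -> float:
    """Topic and main body weighted equally."""
    return 0.5 * topic_sim + 0.5 * body_sim


# Toy word-text index: word -> ids of the documents that contain it.
word_text_index = {"car": {1, 2, 3}, "auto": {2, 3, 4}}
```

With this toy index, "car" and "auto" share two of four documents, so their objective similarity is 0.5. Because the topic and main body scores are combined only at the last step, each level can use its own, smaller word vector.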

Description

[0001] Technical field

[0002] The invention relates to the technical field of Chinese information processing, in particular to a two-level text similarity calculation method based on subjective and objective semantics.

Background

[0003] With the spread of computers among individual users and the rapid development of Internet technology, the number of Internet users and websites has grown explosively, and the amount of information on the Internet has increased massively. Text is one of the important information carriers in the computer and Internet world. Text similarity calculation is the basis of text information processing methods such as text classification and text clustering, and it is of great significance for improving the results of text classification and text clustering. Scholars at home and abroad have done a great deal of research in the field of text similarity calculation. The current mainstream similarity calculation methods are:

[0004] ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 成卫青, 吴旭东, 黄卫东, 范恒亮
Owner: NANJING UNIV OF POSTS & TELECOMM