Dialogue short text clustering method based on form and semantic similarity

A short text clustering technique based on semantic similarity, applied in text database clustering/classification, unstructured text data retrieval, instrumentation, etc. It targets the problem that short texts, which typically carry a single prominent topic, are not handled well by existing methods.
CN104008166A · Inactive · Publication Date: 2014-08-27 · EAST CHINA NORMAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
EAST CHINA NORMAL UNIV
Publication Date
2014-08-27
Estimated Expiration
Not applicable · inactive patent


Abstract

The invention discloses a dialogue short text clustering method based on form and semantic similarity. Form similarity is measured by string edit distance similarity, and semantic similarity is computed from the HowNet and WordNet knowledge bases; weights for short texts and for words are introduced when calculating the similarity between short texts. To a certain extent, the method mitigates the problems caused by noise in dialogue short texts (irregular expressions and input errors), by synonyms, and by semantic gaps, and it therefore achieves a considerable improvement over clustering methods based on bag-of-words vectors.
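As an illustration of the form similarity mentioned above, the following is a minimal sketch of an edit-distance-based similarity between two short texts. The function names `edit_distance` and `form_similarity`, and the normalization by the length of the longer string, are assumptions made for this example; the abstract does not state the exact formula used by the invention, and the semantic (HowNet/WordNet) component and the weights are not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, using two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters of a to reach the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def form_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; the normalization is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # A query with an input error stays close to its intended form.
    print(form_similarity("whats the weather", "whats the waether"))
```

A similarity of this kind tolerates small typing errors, which is the kind of noise the abstract says dialogue short texts contain; word-level meaning would still need the separate semantic similarity.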

Description

Technical field

[0001] The invention belongs to the technical field of short text clustering, and relates to a method for clustering dialogue short texts based on string edit distance similarity and the semantic similarity of words.

Background art

[0002] With the rapid development of mobile communication and the mobile Internet, various intelligent human-machine dialogue systems have emerged, such as Siri, Google Now, and Xiaoi Robot. Taking Xiaoi Robot as an example, its user base has exceeded 100 million, and it handles 10 billion dialogue visits every year, generating a large amount of valuable dialogue text data. These data are important sources for mining user interests and for improving the knowledge bases of intelligent dialogue systems. Cluster analysis of these dialogue texts can group similar texts together and form several important cluster centers, which can improve the efficiency of mining user interests and extracting knowledge t...

Claims
