Question deduplication method and device, electronic equipment, and computer-readable storage medium

A technology for questions and standard questions, applied in computing, electrical digital data processing, and special data processing applications. It addresses problems such as poor classification caused by a random number of clusters, and achieves high accuracy.

Active Publication Date: 2019-08-16
TENCENT TECH (SHENZHEN) CO LTD +2

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to solve at least one of the above-mentioned technical defects, in particular the defect that randomness in the number of clusters leads to poor classification of questions.


Examples

Embodiment 1

[0035] This embodiment of the application provides a question deduplication method. As shown in Figure 1, the method includes:

[0036] S101. Perform word segmentation on multiple question corpora to obtain the question words corresponding to each question corpus, and calculate the term frequency-inverse document frequency (TF-IDF) of each question word based on a first quantity of basic question corpora;

[0037] The Internet hosts a large number of forums and platforms, such as the China Agricultural Technology Promotion Information Service Platform and Zhihu. User 1 posts a question on a forum or platform, and user 2 can post a corresponding answer there. A forum or platform therefore accumulates a large number of questions. The first quantity of basic question corpora may refer to all or part of the questions on one forum or platform, or to all or part of the questions on multiple forums or platforms. Multiple ...
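Step S101 can be illustrated with a minimal sketch. The tokenizer below is a stand-in that only handles space-separated text (a Chinese corpus would need a real word segmenter), the smoothed IDF formula is one common variant rather than the one the application necessarily uses, and the names `segment`, `build_idf`, and `tf_idf` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable


def segment(question: str) -> list[str]:
    """Stand-in word segmentation (assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def build_idf(base_corpus: list[str]) -> Callable[[str], float]:
    """Return an IDF lookup computed over the basic question corpus."""
    n = len(base_corpus)
    doc_freq: Counter = Counter()
    for question in base_corpus:
        # Count each word once per question (document frequency).
        doc_freq.update(set(segment(question)))

    def idf(word: str) -> float:
        # Smoothed variant (assumption): unseen words get a finite weight.
        return math.log((1 + n) / (1 + doc_freq[word])) + 1.0

    return idf


def tf_idf(question: str, idf: Callable[[str], float]) -> dict[str, float]:
    """TF-IDF weight of every word in one question."""
    words = segment(question)
    if not words:
        return {}
    counts = Counter(words)
    return {w: (c / len(words)) * idf(w) for w, c in counts.items()}


if __name__ == "__main__":
    base = ["how to grow rice", "how to grow wheat", "when to water rice"]
    print(tf_idf("How to grow rice?", build_idf(base)))
```

Words that appear in every basic question (here "to") receive the lowest IDF, so they contribute least to the weights of any question.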

Embodiment 2

[0046] This embodiment of the application provides another possible implementation. Building on Embodiment 1, it further includes the method shown in Embodiment 2, in which S102 specifically includes:

[0047] Step A: For any two question corpora, establish two question vectors, one for each of the two question corpora, based on the term frequency-inverse document frequency (TF-IDF) of the question words in each of them, and calculate the similarity between the two question vectors;

[0048] Step B: If the similarity is greater than a preset first threshold, classify the two question corpora into the same question category; if the similarity is not greater than the preset first threshold, classify the two question corpora into two different question categories;

[0049] Repeat Step A and Step B until all of the question corpora have been classified into their corresponding question categories.
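Steps A and B can be sketched as follows. The text does not say which similarity measure is used or how the pairwise decisions are combined, so two assumptions are made here: the similarity is cosine similarity over sparse vectors (dictionaries of word weights), and questions linked by any above-threshold pair are merged into one category with a union-find structure. The names `cosine_similarity` and `classify` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse question vectors (assumed measure)."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(vectors: list[dict[str, float]], threshold: float) -> list[list[int]]:
    """Group question indices whose pairwise similarity exceeds the threshold."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent links to the category representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step A and Step B, repeated over every pair of questions.
    for i, j in combinations(range(len(vectors)), 2):
        if cosine_similarity(vectors[i], vectors[j]) > threshold:
            parent[find(j)] = find(i)  # same question category

    categories: dict[int, list[int]] = {}
    for i in range(len(vectors)):
        categories.setdefault(find(i), []).append(i)
    return list(categories.values())


if __name__ == "__main__":
    vectors = [
        {"grow": 1.0, "rice": 1.0},
        {"grow": 1.0, "rice": 0.9},
        {"water": 1.0, "wheat": 1.0},
    ]
    print(classify(vectors, threshold=0.8))
```

With this design the number of categories is not chosen in advance; it falls out of the threshold, which matches the stated aim of avoiding a randomly chosen cluster count.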

[0050] For...

Embodiment 3

[0097] This embodiment of the application provides a question deduplication device. As shown in Figure 2, the question deduplication device 20 may include a word segmentation calculation module 201, a classification module 202, and a determination module 203, wherein:

[0098] The word segmentation calculation module 201 is used to perform word segmentation on multiple question corpora to obtain the question words corresponding to each question corpus, and to calculate the term frequency-inverse document frequency (TF-IDF) of each question word based on a first quantity of basic question corpora;

[0099] The classification module 202 is used to classify the multiple question corpora based on the TF-IDF of the question words corresponding to each question corpus, to obtain multiple question categories;

[0100] The determination module 203 is configured to determine standard questions corresponding to each question categor...
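The three-module structure can be sketched as a small pipeline. This is only an illustration of how modules 201, 202, and 203 might be composed: the class name `QuestionDeduplicationDevice`, its field names, and all three stand-in functions in the demo are hypothetical, and the rule used to pick a standard question (the shortest member of a category) is a placeholder, since the text above does not state the actual rule.

```python
from dataclasses import dataclass
from typing import Callable

Weights = dict[str, float]


@dataclass
class QuestionDeduplicationDevice:
    # Module 201: word segmentation plus per-word weights for each question.
    segment_and_weigh: Callable[[list[str]], list[Weights]]
    # Module 202: groups question indices into question categories.
    classify: Callable[[list[Weights]], list[list[int]]]
    # Module 203: picks one standard question per category.
    pick_standard: Callable[[list[str]], str]

    def run(self, questions: list[str]) -> list[str]:
        weights = self.segment_and_weigh(questions)
        categories = self.classify(weights)
        return [
            self.pick_standard([questions[i] for i in members])
            for members in categories
        ]


def uniform_weights(questions: list[str]) -> list[Weights]:
    """Stand-in for module 201 (assumption): every word gets weight 1.0."""
    return [{w: 1.0 for w in q.lower().split()} for q in questions]


def group_identical(weights: list[Weights]) -> list[list[int]]:
    """Stand-in for module 202 (assumption): same word set, same category."""
    groups: dict[frozenset, list[int]] = {}
    for i, w in enumerate(weights):
        groups.setdefault(frozenset(w), []).append(i)
    return list(groups.values())


def shortest(members: list[str]) -> str:
    """Placeholder rule for module 203 (assumption): shortest question wins."""
    return min(members, key=len)


if __name__ == "__main__":
    device = QuestionDeduplicationDevice(uniform_weights, group_identical, shortest)
    print(device.run(["How to grow rice", "how to grow RICE", "When to water wheat"]))
```

Keeping each module behind a plain callable makes it straightforward to swap the stand-ins for a real segmenter, a similarity-based classifier, or a different standard-question rule.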

Abstract

The embodiments of the invention provide a question deduplication method and device, electronic equipment, and a computer-readable storage medium. The method comprises: performing word segmentation on a plurality of question corpora to obtain the question words corresponding to each question corpus, and calculating the term frequency-inverse document frequency (TF-IDF) of each question word based on a first quantity of basic question corpora; classifying the plurality of question corpora based on the TF-IDF of the question words corresponding to each question corpus to obtain a plurality of question categories; and determining a standard question corresponding to each question category based on at least one question corpus in that category. According to the embodiments of the invention, the number of question categories is calculated rather than chosen at random, accuracy is high, a standard question is obtained for each question category, and a large number of questions can be deduplicated effectively.

Description

Technical field

[0001] The present application relates to the field of Internet information technology, and in particular to a method, device, electronic equipment, and computer-readable storage medium for question deduplication.

Background

[0002] An automatic question answering system (Question-Answer System, QA system), also called a chatbot system, is an intelligent chat system that relies on advanced Internet information technology and uses communication tools to enable communication between humans and machines.

[0003] Current automatic question answering systems are mainly retrieval-based. A large number of questions and answers are obtained from a preset information service platform, clustered and merged, and the resulting questions and answers are stored in the automatic question answering system. When the system receives the target question, it matches th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F16/35
CPC: G06F16/3329, G06F16/35
Inventors: 王卓然, 亓超, 马宇驰, 陈华荣, 秦海龙, 郭伟
Owner: TENCENT TECH (SHENZHEN) CO LTD