Problem deduplication method, apparatus, electronic device, and computer-readable storage medium

A question deduplication technology, applied in computing, electrical digital data processing, instruments, etc., that addresses problems such as poor classification caused by randomness in the number of clusters, and achieves high accuracy.

Active Publication Date: 2021-11-26
TENCENT TECH (SHENZHEN) CO LTD +2

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to solve at least one of the above-mentioned technical defects, in particular the defect that randomness in the number of clusters leads to poor classification of questions.

Examples

Embodiment 1

[0035] This embodiment of the application provides a problem deduplication method. As shown in Figure 1, the method includes:

[0036] S101. Perform word segmentation on multiple question corpora to obtain the question words corresponding to each question corpus, and calculate the word frequency-inverse text frequency (TF-IDF) of each question word based on a first quantity of basic question corpora;

[0037] The Internet hosts a large number of forums and platforms, such as the China Agricultural Technology Promotion Information Service Platform and Zhihu. One user posts a question on a forum or platform, and another user can post a corresponding answer there. A forum or platform therefore corresponds to a large number of questions. The first quantity of basic question corpora may refer to all or part of the questions on one forum or platform, or to all or part of the questions on multiple forums or platforms. Multiple ...
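Step S101 can be illustrated with a minimal sketch. The function names (`segment`, `inverse_document_frequency`, `tf_idf`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration; the application does not state which word segmenter or which TF-IDF variant it uses, and a real system for Chinese text would need a proper word segmentation tool.

```python
import math
from collections import Counter


def segment(question):
    # Hypothetical stand-in for a real word segmenter:
    # lowercase the text and split on whitespace.
    return question.lower().split()


def inverse_document_frequency(base_corpus):
    """Return (idf, n_docs) computed over the basic question corpora.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one
    common variant and an assumption here.
    """
    n_docs = len(base_corpus)
    doc_freq = Counter()
    for question in base_corpus:
        # Count each word at most once per question.
        doc_freq.update(set(segment(question)))
    idf = {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }
    return idf, n_docs


def tf_idf(question, idf, n_docs):
    """Return a {word: weight} mapping for one question corpus."""
    words = segment(question)
    counts = Counter(words)
    # A word never seen in the base corpus has document frequency 0.
    default_idf = math.log(1 + n_docs) + 1.0
    return {
        word: (count / len(words)) * idf.get(word, default_idf)
        for word, count in counts.items()
    }


if __name__ == "__main__":
    base = ["how to grow rice", "how to plant wheat", "when to water rice"]
    idf, n_docs = inverse_document_frequency(base)
    print(tf_idf("how to grow rice", idf, n_docs))
```

In this sketch a word that appears in every base question (such as "to") receives the lowest weight, while rarer words receive higher weights, which is the property the later classification step relies on.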

Embodiment 2

[0046] This embodiment of the present application provides another possible implementation. It builds on Embodiment 1 and additionally includes the method shown in Embodiment 2, wherein S102 specifically includes:

[0047] Step A: For any two question corpora, establish two question vectors, one for each of the two question corpora, based on the word frequency-inverse text frequency of the question words in each of them, and calculate the similarity between the two question vectors;

[0048] Step B: If the similarity is greater than a preset first threshold, classify the two question corpora into the same question category; if the similarity is not greater than the preset first threshold, classify the two question corpora into two different question categories;

[0049] Repeat step A and step B until multiple question corpora are classified into corresponding question categories.
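Steps A and B can be sketched as follows. Two points are assumptions rather than details from the application: cosine similarity is used as the similarity measure (the text only says "similarity"), and pairs that exceed the threshold are merged transitively with a union-find structure, which is one way to resolve repeating the steps over all pairs. The names `cosine_similarity` and `group_by_similarity` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def group_by_similarity(vectors, first_threshold):
    """Group vector indices into question categories.

    Step A: compute the similarity of every pair of question vectors.
    Step B: put a pair in the same category when the similarity is
    greater than first_threshold. Merging is transitive (assumption).
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine_similarity(vectors[i], vectors[j]) > first_threshold:
            parent[find(i)] = find(j)

    categories = {}
    for i in range(len(vectors)):
        categories.setdefault(find(i), []).append(i)
    return list(categories.values())


if __name__ == "__main__":
    vectors = [
        {"rice": 1.0, "grow": 1.0},
        {"rice": 1.0, "grow": 0.9},
        {"wheat": 1.0},
    ]
    print(group_by_similarity(vectors, first_threshold=0.8))
```

Because categories emerge from the threshold rather than from a cluster count chosen in advance, the number of categories is an output of the procedure, not an input to it.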

[0050] For...

Embodiment 3

[0097] This embodiment of the application provides a problem deduplication device. As shown in Figure 2, the problem deduplication device 20 may include a word segmentation calculation module 201, a classification module 202, and a determination module 203, wherein:

[0098] The word segmentation calculation module 201 is used to perform word segmentation on multiple question corpora to obtain the question words corresponding to each question corpus, and to calculate the word frequency-inverse text frequency of each question word based on the first quantity of basic question corpora;

[0099] The classification module 202 is used to classify the multiple question corpora, based on the word frequency-inverse text frequency of the question words corresponding to each question corpus, to obtain multiple question categories;

[0100] The determination module 203 is configured to determine standard questions corresponding to each question categor...
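The three-module structure can be sketched as a small class that wires the modules together. Everything below is illustrative: the class and field names are hypothetical, and the three toy stand-ins (raw word counts instead of TF-IDF, grouping by identical word sets instead of a similarity threshold, and choosing the shortest question as the standard question) are assumptions chosen only to keep the example self-contained. The rule the application actually uses to determine a standard question is not shown in this excerpt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = Dict[str, float]


@dataclass
class ProblemDeduplicationDevice:
    # Module 201: questions -> one weight vector per question.
    word_segmentation_calculation: Callable[[List[str]], List[Vector]]
    # Module 202: vectors -> categories, each a list of question indices.
    classification: Callable[[List[Vector]], List[List[int]]]
    # Module 203: questions of one category -> its standard question.
    determination: Callable[[List[str]], str]

    def deduplicate(self, questions: List[str]) -> List[str]:
        vectors = self.word_segmentation_calculation(questions)
        categories = self.classification(vectors)
        return [
            self.determination([questions[i] for i in members])
            for members in categories
        ]


def toy_vectors(questions):
    # Stand-in for module 201: raw word counts, not real TF-IDF.
    return [dict(Counter(q.lower().split())) for q in questions]


def toy_classify(vectors):
    # Stand-in for module 202: same word set means same category.
    groups = {}
    for index, vector in enumerate(vectors):
        groups.setdefault(frozenset(vector), []).append(index)
    return list(groups.values())


def toy_standard_question(members):
    # Stand-in for module 203: pick the shortest question (assumption).
    return min(members, key=len)


if __name__ == "__main__":
    device = ProblemDeduplicationDevice(
        toy_vectors, toy_classify, toy_standard_question
    )
    print(device.deduplicate(
        ["how to grow rice", "rice how to grow", "when to water wheat"]
    ))
```

Passing the modules in as callables keeps each stage replaceable, so a real segmenter or a threshold-based classifier could be substituted without changing the pipeline itself.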

Abstract

Embodiments of the present application provide a problem deduplication method, device, electronic equipment, and computer-readable storage medium. The method comprises: performing word segmentation on a plurality of question corpora to obtain the question words corresponding to each question corpus; calculating the word frequency-inverse text frequency (TF-IDF) of each question word based on a first quantity of basic question corpora; classifying the question corpora, based on the word frequency-inverse text frequency of the question words corresponding to each question corpus, to obtain multiple question categories; and determining the standard question corresponding to each question category based on the at least one question corpus in that category. The embodiments determine the number of question categories with high accuracy, obtain a corresponding standard question for each question category, and can effectively deduplicate a large number of questions.

Description

Technical field

[0001] The present application relates to the field of Internet information technology, and in particular to a method, device, electronic equipment, and computer-readable storage medium for problem deduplication.

Background

[0002] An automatic question answering system (Question-Answer System, QA system), also called a chatbot system, is an intelligent chat system that relies on Internet information technology and enables communication between humans and machines through communication tools.

[0003] Current automatic question answering systems are mainly retrieval-based. A large number of questions and answers are obtained from a preset information service platform and, after clustering and merging, the resulting questions and answers are stored in the automatic question answering system. When the system receives the target question, it matches th...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/332, G06F16/35
CPC: G06F16/3329, G06F16/35
Inventor: 王卓然, 亓超, 马宇驰, 陈华荣, 秦海龙, 郭伟
Owner: TENCENT TECH (SHENZHEN) CO LTD