Text corpus processing method and device, storage medium and electronic equipment

A text corpus processing technology, applied in the field of text corpus processing methods, devices, storage media and electronic equipment. It addresses the slow speed, low efficiency and poor results of deduplication operations on text corpora, improving deduplication speed and accuracy and reducing the pressure of text information analysis.

Pending Publication Date: 2022-04-12
TENCENT TECH (SHENZHEN) CO LTD
Cites: 0. Cited by: 1.

AI Technical Summary

Problems solved by technology

In related technologies, text information is extracted from a text corpus slowly and with low accuracy. As a result, deduplication of the text corpus is inefficient and ineffective, and is difficult to apply to deduplication scenarios involving massive text corpora.

Examples

Detailed Description of Embodiments

[0030] The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the embodiments of the present application.

[0031] It should be noted that the terms "first" and "second" in the description and claims of the embodiments of the present application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of ...

Abstract

Embodiments of the invention disclose a text corpus processing method and apparatus, a storage medium and an electronic device. The method comprises: obtaining a text corpus; performing word segmentation on the text corpus to obtain a word sequence corresponding to the text corpus; performing information extraction on the word sequence to obtain feature information and weight information corresponding to each word in the word sequence, where the weight information is determined according to the semantic importance and the position importance of the word in the word sequence; performing hash mapping on the feature information corresponding to each word to obtain coded information corresponding to each word; obtaining weighted coded information corresponding to each word from that word's coded information and weight information; and performing a fusion operation on the weighted coded information corresponding to each word to obtain text information corresponding to the text corpus. The method and the device can improve the speed and accuracy of text corpus deduplication.
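
The steps in the abstract resemble a weighted, SimHash-style fingerprint, and the following Python sketch illustrates that reading. It is only an illustration under stated assumptions, not the patented implementation: the source does not name SimHash, and the whitespace tokenizer, the MD5-based 64-bit hash, the weight formula, the use of the word itself as its feature information, and every function name below are hypothetical stand-ins.

```python
import hashlib

BITS = 64  # fingerprint width; an assumption, not stated in the source


def segment(text):
    """Stand-in for word segmentation: lowercase whitespace split."""
    return text.lower().split()


def word_weight(word, position, total, semantic_weights=None):
    """Hypothetical weight: semantic importance times position importance."""
    semantic = (semantic_weights or {}).get(word, 1.0)
    # Assumed position scheme: earlier words count slightly more.
    positional = 1.0 + (total - position) / total
    return semantic * positional


def hash_bits(feature):
    """Hash mapping: turn a feature string into BITS values in {0, 1}."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    value = int.from_bytes(digest[:BITS // 8], "big")
    return [(value >> i) & 1 for i in range(BITS)]


def fingerprint(text, semantic_weights=None):
    """Fuse the weighted codes of all words into one BITS-bit integer."""
    words = segment(text)
    totals = [0.0] * BITS
    for pos, word in enumerate(words):
        w = word_weight(word, pos, len(words), semantic_weights)
        # Weighted coding: a 1 bit contributes +w, a 0 bit contributes -w.
        for i, bit in enumerate(hash_bits(word)):
            totals[i] += w if bit else -w
    # Fusion: keep the sign of each accumulated column.
    result = 0
    for i, t in enumerate(totals):
        if t > 0:
            result |= 1 << i
    return result


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Under this reading, two texts could be treated as duplicates when `hamming(fingerprint(a), fingerprint(b))` falls below a chosen threshold. Both the Hamming comparison and any threshold value are common SimHash practice assumed here, not details given in the source.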

Description

Technical field

[0001] The embodiments of the present application relate to the technical field of artificial intelligence, and in particular to a text corpus processing method, device, storage medium, and electronic equipment.

Background art

[0002] With the development of computer technology, applications that rely on the analysis of text information have become increasingly common. For example, advertisement recommendation, news promotion, and the sharing of various media content all rely on the analysis of text information. To reduce the pressure of text information analysis, deduplication operations must be performed on massive text corpora. In related technologies, the accuracy and speed of extracting text information from a text corpus are low, so deduplication of the text corpus is inefficient and ineffective, and is difficult to apply to deduplication scenarios involving massive text corpora.

Contents of ...

Application Information

IPC(8): G06F16/33, G06F16/31, G06F40/126, G06F40/194, G06F40/258, G06F40/279, G06F40/30, G06N3/04, G06N3/08
Inventor 石志林
Owner TENCENT TECH (SHENZHEN) CO LTD