Automatic short-text summarization method and system based on dual encoders
An automatic summarization technology based on dual encoding, applied in the field of information processing, that can solve problems such as insufficient summarization precision and insufficient use of semantic information.
Embodiment
[0090] To verify the effect of the present invention, an experiment was carried out according to the steps described above; the results are shown in Figure 4.
[0091] Step 1: The data set is the news corpus provided by Sogou Labs, which contains a total of 679,978 news-headline pairs covering entertainment, culture, education, military, society, finance, and other topics. Preprocessing removes texts with a length of less than 5 and replaces noisy characters such as English text, special characters, and emoticons. To select high-quality experimental data pairs, the data is divided into three levels according to the semantic similarity between the abstract and the original text, where 1 is the least relevant and 3 is the most relevant: level 1 for a similarity in the interval (0, 0.4), level 2 for [0.4, 0.65), and level 3 for [0.65, 1). The semantic correlation formula is designed as follows:
[0092] ...
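The filtering and three-level grading described in Step 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the semantic correlation formula itself is not reproduced in this text, so the sketch takes an already-computed similarity score as input. The function names `similarity_level` and `keep_text` are hypothetical, and measuring "length" in characters is an assumption, since the source does not state the unit.

```python
def similarity_level(score: float) -> int:
    """Map a text-abstract similarity score to a relevance level (1 to 3).

    Intervals follow Step 1: (0, 0.4) -> 1, [0.4, 0.65) -> 2, [0.65, 1) -> 3.
    The score is assumed to be computed elsewhere; the formula is not shown here.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie in the open interval (0, 1)")
    if score < 0.4:
        return 1  # least relevant
    if score < 0.65:
        return 2
    return 3  # most relevant


def keep_text(text: str, min_length: int = 5) -> bool:
    """Return True if the text survives the length filter.

    Assumption: length is counted in characters.
    """
    return len(text) >= min_length


if __name__ == "__main__":
    # Hypothetical scores, used only to show the interval boundaries.
    for s in (0.1, 0.4, 0.64, 0.65, 0.9):
        print(s, similarity_level(s))
```

Using half-open intervals in the comparisons keeps the boundary values 0.4 and 0.65 in the higher level, matching the brackets given in the text.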
Application Information
- IPC
- G06F17/27; G06F16/34; G06N3/04
- CPC
- G06F16/345; G06F40/30; G06N3/045
- Inventors
- δΈε»Ίη«; ζζ΄



