Chinese news story segmentation method based on flexible semantic similarity measurement

A similarity measurement and news story segmentation technique, applied in natural language data processing, special data processing applications, instruments, and related fields. It addresses the inaccurate measurement of semantic relationships and the resulting inaccurate segmentation of Chinese news stories, and thereby improves segmentation accuracy.

Active Publication Date: 2014-05-14
北京宏博知微科技有限公司

AI Technical Summary

Problems solved by technology

Obviously, this repetition-based rigid similarity measurement method ignores the potential semantic correlation between different wo


Detailed Description of the Embodiments

[0037] To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings.

[0038] Measuring semantic similarity is a challenging research topic in natural language processing. Existing methods fall into two main categories: supervised and unsupervised. Supervised methods mainly include WordNet [8][9] and DISCO. WordNet is used to measure the similarity between any two English words. It relies on a well-annotated corpus to organize nouns, verbs, adjectives, and adverbs into hierarchical levels, based on the semantic definitions that language experts have given these words. Owing to its simplicity and effectiveness, WordNet has been widely used in natural language processing tasks. Similar to WordNet, DISCO is another commonly used supervised method for retrieving the similarity between a...

Abstract

The invention discloses a Chinese news story segmentation method based on flexible semantic similarity measurement. The method includes the following steps: a target text set is input, and word segmentation is performed on each news story text Ti in the set; a context relation graph is built; the contextual relevance between words is iteratively propagated over the context relation graph using a rapid sorting algorithm to obtain a flexible semantic correlation matrix; the flexible semantic similarity between sentences is defined in terms of this matrix; and the Chinese news stories are segmented using the flexible semantic similarity. With this flexible measurement method, the semantic similarity between words and between word sets can be expressed more reasonably. Experiments show that, for Chinese news story segmentation under the same segmentation principles, the flexible semantic similarity measurement improves segmentation accuracy by 3% to 10% compared with traditional similarity measurement methods.
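The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the excerpt does not give the propagation or similarity formulas, so a SimRank-style update stands in for the iterative propagation, a soft-cosine form stands in for the sentence similarity, and a simple threshold on adjacent-sentence similarity stands in for the segmentation principle. All function names, the `decay` and `threshold` parameters, and the English toy tokens are hypothetical; real input would first be passed through a Chinese word segmenter.

```python
import math
from collections import Counter
from itertools import combinations


def build_context_graph(sentences):
    """Link two words if they co-occur in a sentence (assumed context relation)."""
    neighbors = {}
    for words in sentences:
        for w in words:
            neighbors.setdefault(w, set())
        for a, b in combinations(set(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def propagate_similarity(neighbors, decay=0.8, iterations=5):
    """SimRank-style propagation (a stand-in for the patent's iteration):
    two words are similar if their neighbours are similar."""
    words = sorted(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in words for b in words}
    for _ in range(iterations):
        new = {}
        for a in words:
            for b in words:
                na, nb = neighbors[a], neighbors[b]
                if a == b:
                    new[(a, b)] = 1.0
                elif not na or not nb:
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(x, y)] for x in na for y in nb)
                    new[(a, b)] = decay * total / (len(na) * len(nb))
        sim = new
    return sim


def soft_similarity(words_a, words_b, sim):
    """Soft-cosine similarity: different words contribute via sim[(x, y)]."""
    fa, fb = Counter(words_a), Counter(words_b)

    def s(x, y):
        # Unknown word pairs fall back to exact-match (rigid) similarity.
        return sim.get((x, y), 1.0 if x == y else 0.0)

    def form(f, g):
        return sum(f[x] * s(x, y) * g[y] for x in f for y in g)

    denom = math.sqrt(form(fa, fa) * form(fb, fb))
    return form(fa, fb) / denom if denom else 0.0


def find_boundaries(sentences, sim, threshold=0.1):
    """Return indices i where a story boundary falls before sentence i
    (assumed rule: adjacent-sentence similarity below the threshold)."""
    return [
        i
        for i in range(1, len(sentences))
        if soft_similarity(sentences[i - 1], sentences[i], sim) < threshold
    ]


if __name__ == "__main__":
    toy = [
        ["market", "stocks", "rise"],
        ["market", "shares", "rise"],
        ["rain", "storm", "coast"],
    ]
    sim = propagate_similarity(build_context_graph(toy))
    print(soft_similarity(["stocks"], ["shares"], sim))
    print(find_boundaries(toy, sim))
```

In the toy input, "stocks" and "shares" never appear in the same sentence, so a repetition-based measure scores them as unrelated. Because they share the contexts "market" and "rise", the propagated matrix gives them a positive similarity, which is the kind of latent relation the flexible measure is meant to capture.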

Description

Technical field

[0001] The invention relates to the field of Chinese news story segmentation, and in particular to a Chinese news story segmentation method based on flexible semantic similarity measurement.

Background art

[0002] With the popularization and development of the network, multimedia content such as broadcast news, meeting minutes, and online open courses is increasing rapidly, and there is an urgent need for an effective method to automatically organize such multimedia data for topic-based text retrieval and analysis. A multimedia document, such as a one-hour broadcast news program, usually consists of multiple stories. For efficient semantic retrieval, it is very important to guide users to the beginning and end of the topics they are interested in. At the same time, a segmented multimedia document is an important prerequisite for high-level semantic browsing tasks such as topic tracking [1] and classification and summarization [2]. T...

Application Information

IPC(8): G06F17/30
CPC: G06F40/131, G06F40/253
Inventor: 冯伟, 万亮, 聂学成, 高晓妮, 党建武
Owner 北京宏博知微科技有限公司