Sequence segmentation method and apparatus

A sequence segmentation technology based on the degree of dispersion, applied in the field of data processing. It addresses the limited applicability of existing approaches, their heavy manual labor, and the difficulty of obtaining manual annotations of sequences, thereby saving labor and improving efficiency.

Active Publication Date: 2017-11-24
ADVANCED NEW TECH CO LTD

AI Technical Summary

Problems solved by technology

Existing implementations rely on a large amount of manual labeling, which not only requires considerable human labor but is also difficult to obtain for many application scenarios and many types of sequences.



Embodiment Construction

[0017] In the embodiments of this application, the symbol is the basic unit of a sequence. In a given application scenario, each symbol can be extracted from the sequence unambiguously, and, for the purpose of segmenting the sequence in that scenario, a symbol needs no further splitting. Sequence segmentation divides the sequence to be segmented into several subsequences, each of which includes one or more symbols. Connecting all the subsequences in order reproduces the sequence to be segmented. For example, for a user access behavior sequence, the subsequence is a session and the symbol is an event; for a Chinese text sequence, the subsequence is a word and the symbol is a character. Two or more symbols belonging to the same subsequence are related in some way, and the specific relationship varies with the application scenario.
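The definitions in [0017] can be illustrated with a minimal Python sketch. The helper name `split_at` and the representation of boundaries as a list of cut positions are assumptions made for this illustration only; the application does not prescribe them. The sketch shows that each subsequence holds one or more symbols and that joining the subsequences in order gives back the original sequence.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_at(symbols: Sequence[T], cuts: Sequence[int]) -> List[List[T]]:
    """Split `symbols` into consecutive subsequences.

    `cuts` is a hypothetical representation of the boundaries: a cut at
    index i means that a new subsequence starts at symbols[i].
    """
    if len(symbols) == 0:
        return []
    # Ignore duplicate or out-of-range cuts so every piece is non-empty.
    inner = sorted(c for c in set(cuts) if 0 < c < len(symbols))
    bounds = [0] + inner + [len(symbols)]
    return [list(symbols[a:b]) for a, b in zip(bounds, bounds[1:])]


# Chinese text: the symbols are characters, the subsequences are words.
text = "我爱北京"
words = split_at(text, [1, 2])

# User access behavior: the symbols are events, the subsequences are sessions.
events = ["login", "view", "logout", "login", "pay"]
sessions = split_at(events, [3])

# Connecting the subsequences in order reproduces the original sequence.
rejoined = [event for session in sessions for event in session]
```

Here `words` holds three words (two single characters and one two-character word), `sessions` holds two sessions, and `rejoined` equals `events`.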

[0018] The neural network model can be used to predict the sy...


Abstract

The invention provides a sequence segmentation method for dividing a sequence to be segmented into subsequences, each comprising one or more symbols. The method comprises: using a neural network to obtain, for at least one of two adjacent symbols in the sequence to be segmented, a probability distribution over the class label set, where the neural network is trained by taking each symbol of a sample sequence as the input vector at one time step and the symbols adjacent to the current input as the target class labels; determining a boundary index for the adjacent symbols from a dispersion feature value of that probability distribution and from the probability value that the distribution assigns to the other symbol; and, when the boundary index meets a predetermined boundary condition, splitting the sequence into subsequences between the adjacent symbols. This technical scheme greatly reduces manual work, improves model training efficiency, and is applicable to a variety of application scenarios.
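The boundary decision described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed method: Shannon entropy stands in for the dispersion feature value, the boundary index is a hypothetical product `entropy * (1 - p)`, the boundary condition is a simple threshold, and hand-written distributions stand in for the output of the trained neural network. Only the forward direction (predicting the next symbol) is shown.

```python
import math
from typing import Dict, List, Sequence

# A predicted probability distribution over the class label set.
Dist = Dict[str, float]


def entropy(dist: Dist) -> float:
    """Shannon entropy, used here as an assumed dispersion feature value."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def boundary_index(dist: Dist, next_symbol: str) -> float:
    """Hypothetical boundary index: large when the prediction is dispersed
    and the symbol that actually follows was given a low probability."""
    return entropy(dist) * (1.0 - dist.get(next_symbol, 0.0))


def segment(symbols: Sequence[str], dists: Sequence[Dist],
            threshold: float) -> List[List[str]]:
    """Cut between symbols[i] and symbols[i + 1] when the index exceeds
    `threshold`. dists[i] is the distribution predicted after symbols[i]."""
    if not symbols:
        return []
    pieces = [[symbols[0]]]
    for i in range(len(symbols) - 1):
        if boundary_index(dists[i], symbols[i + 1]) > threshold:
            pieces.append([symbols[i + 1]])   # start a new subsequence
        else:
            pieces[-1].append(symbols[i + 1])  # extend the current one
    return pieces


# Toy stand-in for the network output: after "pay" the next event is hard
# to predict, which hints at a session boundary.
after_login = {"view": 0.9, "pay": 0.1}
after_view = {"pay": 0.8, "view": 0.2}
after_pay = {"login": 0.25, "view": 0.25, "pay": 0.25, "logout": 0.25}

events = ["login", "view", "pay", "login", "view"]
dists = [after_login, after_view, after_pay, after_login]
sessions = segment(events, dists, threshold=0.5)
```

With these toy values only the pair ("pay", "login") has an index above 0.5, so `sessions` contains two sessions of three and two events. A real system would take `dists` from the trained model and would need to choose the dispersion measure, the combining formula, and the threshold for its own scenario.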

Description

Technical field

[0001] The present application relates to the field of data processing, in particular to a sequence segmentation method and apparatus.

Background

[0002] In the era of digital information, as people increasingly use the network for all kinds of everyday and work activities, more and more data accumulates on the Internet, and the value of data analysis is becoming increasingly prominent. For example, by analyzing behavior data from users of a software product, the design of that product can be effectively improved; by analyzing users' consumption data, advertising can be targeted more accurately, and insight can be gained into market potential, future development directions, and so on.

[0003] In Internet data analysis, the segmentation of data sequences is an essential step. For example, when a user visits a website, the visit usually takes place within a session established with the website, and the session is composed of a series...


Application Information

IPC(8): G06N3/08
CPC: G06N3/08
Inventors: 燕鹏举, 李龙飞
Owner: ADVANCED NEW TECH CO LTD