Feature extraction method and device

A feature extraction and feature word technique, applied in the field of feature extraction methods and devices, that addresses poor address text mining results and improves mining effectiveness.

Active Publication Date: 2017-10-24
CAINIAO SMART LOGISTICS HLDG LTD
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0012] In summary, when address features are currently extracted from address text, the words in each extracted feature word string are all contiguous in the address text. Such strings may not include highly distinctive feature word strings, which makes address text mining less effective.
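To make the limitation concrete, here is a minimal sketch of contiguous extraction. It uses the segmented example address from the Description; the function name `contiguous_ngrams` and the choice of n=2 are illustrative assumptions, not taken from the patent.

```python
from typing import List, Tuple


def contiguous_ngrams(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive words from a segmented address."""
    # Hypothetical helper: each string covers adjacent positions only.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Segmented form of the example address given in the Description.
address = [
    "Zhejiang Province", "Hangzhou City", "Yuhang District",
    "Wuchang Street", "Jingfeng Community", "Wenyi West Road",
]

for gram in contiguous_ngrams(address, 2):
    print(gram)
```

Every pair produced this way joins neighbouring words, so a combination such as ("Zhejiang Province", "Yuhang District") can never appear.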



Examples


Example 1

[0096] Figure 3A and Figure 3B are schematic diagrams of the process of generating feature word strings in the case of s<n.

[0097] In this example, n=6 and s=3, and the word at the current position is the i-th word counted from the first word of the address text (that is, the current position is i). The gram generation process is as follows:

[0098] 1. Starting from the word at the current position, select n (n=6) consecutive words and put them into the buffer (buff). The words in the buffer together form the first word string (corresponding to step B1 above);

[0099] 2. Starting from the first word after the first word string (that is, position i+n), select s consecutive words as the second word string (corresponding to step B2 above);

[0100] 3. Depending on the position of the initial jump word in the first word string, there are two further cases: in the first word string, the word quantity q o...

Example 2

[0105] Figure 4 is a schematic diagram of another gram generation process, for the case of s≥n, provided by Example 2.

[0106] In this example, n=4 and s=5, and the word at the current position is the i-th word counted from the first word of the address text (that is, the current position is i). The gram generation process is as follows:

[0107] 1. Starting from the word at the current position, select n (n=4) consecutive words and put them into the buffer (buff). The words in the buffer together form the first word string (corresponding to step B1 above);

[0108] 2. Starting from the first word after the first word string (that is, position i+n), select s consecutive words as the second word string (corresponding to step B2 above);

[0109] 3. In the case of s≥n, no matter where the initial jump word is in the first word string, the word quantity q from the initial jump word to the n-th word of the first word string, in the first word s...
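Steps 1 and 2 are the same in both examples and can be sketched as follows. This is only an illustration under stated assumptions: the function name `first_and_second_strings`, the 0-based index `i`, and the placeholder tokens `w1`..`w12` are hypothetical, and step 3 is not modelled because its description is truncated in the text.

```python
from typing import List, Tuple


def first_and_second_strings(
    words: List[str], i: int, n: int, s: int
) -> Tuple[List[str], List[str]]:
    """Split off the two word strings described in steps 1 and 2."""
    # Step 1 (B1): n consecutive words from position i go into the buffer.
    first = words[i:i + n]
    # Step 2 (B2): the s consecutive words starting at position i+n.
    second = words[i + n:i + n + s]
    # Note: Python slicing silently shortens either list near the end of the text.
    return first, second


# Hypothetical placeholder tokens standing in for a segmented address.
words = [f"w{k}" for k in range(1, 13)]

print(first_and_second_strings(words, 0, 6, 3))  # Example 1 setting: n=6, s=3
print(first_and_second_strings(words, 0, 4, 5))  # Example 2 setting: n=4, s=5
```

With the Example 1 setting the first string is `w1`..`w6` and the second is `w7`..`w9`; with the Example 2 setting they are `w1`..`w4` and `w5`..`w9`.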


Abstract

The invention relates to the technical field of data mining, and in particular to a feature extraction method and device. The feature extraction method comprises the following steps: determining an address text that has undergone word segmentation; and extracting words from the segmented address text according to a preset word extraction number and a preset word skipping number, so as to form feature word strings of the segmented address text, wherein the number of extracted words in each feature word string is equal to the word extraction number, and, for two adjacent words of a feature word string that are separated in the address text, the number of words between them is equal to the word skipping number. With this method and device, word skipping can be applied to the address text, so that feature word strings with relatively strong distinguishing power can be obtained and the address text mining effect is improved.
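One possible reading of the abstract can be sketched in a few lines. The sketch assumes that each feature word string contains exactly one gap and that every split point is tried; the patent's full procedure (steps B1 onward) is not reproduced here, and the names `single_gap_strings`, `extract_n`, and `skip_s` are hypothetical.

```python
from typing import List, Tuple


def single_gap_strings(
    words: List[str], extract_n: int, skip_s: int
) -> List[Tuple[str, ...]]:
    """Build strings of extract_n words with one gap of skip_s skipped words."""
    result = []
    span = extract_n + skip_s  # text positions covered by one string
    for i in range(len(words) - span + 1):
        for k in range(1, extract_n):  # k words are taken before the gap
            head = words[i:i + k]
            tail = words[i + k + skip_s:i + span]  # remaining extract_n - k words
            result.append(tuple(head + tail))
    return result


# Segmented form of the example address given in the Description.
address = [
    "Zhejiang Province", "Hangzhou City", "Yuhang District",
    "Wuchang Street", "Jingfeng Community", "Wenyi West Road",
]

for gram in single_gap_strings(address, extract_n=2, skip_s=1):
    print(gram)
```

With a word extraction number of 2 and a word skipping number of 1, the first string pairs "Zhejiang Province" with "Yuhang District", a combination that contiguous extraction cannot produce.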

Description

Technical field

[0001] The present application relates to the technical field of data mining, and in particular to a feature extraction method and device.

Background

[0002] With the rapid growth of text information in data warehouses, text mining has become a research hotspot in the information field. Address information is stored in data warehouses as text. Because address information plays a very important role in big data analysis, address feature mining, as one type of text mining, is becoming increasingly important.

[0003] Word segmentation is the basis of mining Chinese address text, a consequence of the characteristics of the Chinese language. For example, segmenting the Chinese address text "Wenyi West Road, Jingfeng Community, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province" yields the words Zhejiang Province, Hangzhou City, Yuhang District, Wuchang Street, Jingfeng Community, and Weny...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/3329, G06F16/355, G06F16/36
Inventor: 王国印
Owner: CAINIAO SMART LOGISTICS HLDG LTD