A method for automatic generation of electronic commerce dictionary

An e-commerce and automatic generation technology, applied in the fields of electronic digital data processing, special data processing applications, instruments, etc. It addresses the low accuracy of automatic generation methods, the heavy manual processing workload, and the slow update speed of existing approaches, with the effects of improved generation efficiency and a more refined dictionary.

Inactive Publication Date: 2015-07-29
姚明东
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The traditional methods have three main disadvantages: first, the manual processing workload is heavy; second, because new products and applications appear constantly in the e-commerce field, traditional methods are slow to update; third, automatic generation methods have low accuracy and produce relatively rough results.


Examples

Embodiment Construction

[0016] The present invention will be described in detail below in conjunction with specific embodiments.

[0017] The detailed implementation steps of this method include:

[0018] Step 1: Data crawling. Raw data is crawled from sources such as e-commerce websites and search engines. The raw data generally consists of HTML web pages containing product information such as product names, models, and descriptions. After text extraction and classification, it is saved as rough text containing product information;

[0019] Step 2: Preprocessing. The HTML tags in the text are analyzed, and junk data in the product information from Step 1, such as image links, URLs, and HTML tags, is filtered out. The product information is then structured to obtain plain-text product descriptions free of punctuation marks and HTML tags;
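
The preprocessing in Step 2 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the patent's actual implementation: the function name `clean_product_html`, the URL pattern, and the rule that every non-word character counts as punctuation are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; ignores tags and the contents of script/style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Assumed patterns: bare URLs left in the text, and "punctuation" defined
# as anything that is neither a word character nor whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
PUNCT_RE = re.compile(r"[^\w\s]|_")


def clean_product_html(raw_html: str) -> str:
    """Return plain product text without HTML tags, URLs, or punctuation."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks)
    text = URL_RE.sub(" ", text)    # drop URLs that appear as text
    text = PUNCT_RE.sub(" ", text)  # drop ASCII and CJK punctuation
    return " ".join(text.split())   # collapse runs of whitespace
```

Image links inside tag attributes (for example `src="..."`) disappear with the tags themselves, so only URLs that appear in the visible text need the extra pattern.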

[0020] Step 3: Exhaustive segmentation, advancing one character at a time. The collected text is fully segmented; the initial position is the first character of the text str...
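
The description of Step 3 is truncated here, so the sketch below only illustrates the general idea of full segmentation that starts at the first character and advances one character at a time. The function name `enumerate_candidates` and the `max_len` cap on candidate length are assumptions for the example; the source does not state them.

```python
def enumerate_candidates(text: str, max_len: int = 4):
    """Yield every substring of length 1..max_len.

    The start position begins at the first character and advances one
    character at a time; max_len is an assumed cap on candidate length.
    """
    n = len(text)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            yield text[start:end]


if __name__ == "__main__":
    print(list(enumerate_candidates("abc", max_len=2)))
```

Because every position is tried as a start, no candidate entry up to the length cap is skipped, at the cost of producing many non-word fragments that later steps would have to filter out.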

Abstract

The invention discloses a method for automatically generating an e-commerce dictionary. The method comprises the following steps: (1) data crawling: crawling raw product data from e-commerce websites and search engines; (2) preprocessing; (3) exhaustive segmentation, advancing one character at a time; (4) word frequency statistics; (5) merging; (6) redundancy filtering; (7) regular-expression filtering; (8) potential word compensation; (9) low-frequency word rejection; and (10) feature word compensation. The method has three main advantages. First, dictionary generation is fast: the dictionary is generated automatically using algorithms such as machine learning, intelligent filtering, error correction, and compensation, which greatly improves generation efficiency. Second, the generated dictionary has a high inclusion rate: because the text is segmented exhaustively one character at a time, few entries are missed during word segmentation. Third, the generated dictionary is refined: processing such as error correction, redundancy filtering, and regular-expression filtering removes redundant and erroneous entries from the dictionary.
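
As a rough illustration of two of the listed steps, word frequency statistics (step 4) and low-frequency word rejection (step 9), the following sketch counts candidate strings and drops rare ones. The abstract gives no thresholds or data structures, so the function names and the `min_count` value are assumptions, and the intermediate steps 5 to 8 are not modeled.

```python
from collections import Counter
from typing import Dict, Iterable


def word_frequencies(candidates: Iterable[str]) -> Counter:
    """Count how often each candidate string occurs (step 4)."""
    return Counter(candidates)


def drop_low_frequency(counts: Dict[str, int], min_count: int = 2) -> Dict[str, int]:
    """Keep only candidates seen at least min_count times (step 9).

    min_count is an assumed threshold; the source does not give one.
    """
    return {word: c for word, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    counts = word_frequencies(["手机", "手机", "机壳", "手机壳", "手机壳"])
    print(drop_low_frequency(counts, min_count=2))
```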

Description

Technical field

[0001] The invention relates to a method for automatically generating an e-commerce dictionary. It mainly targets the e-commerce field, where the dictionary is the basis of website applications such as search, recommendation, semantic word segmentation, and ranking weight calculation.

Background technique

[0002] At present, dictionaries for e-commerce are rare. Mainstream applications such as Taobao mostly rely on manual generation or simple statistical generation, and some use machine learning methods to collect entries into dictionaries. However, the traditional methods have three main disadvantages: first, the manual processing workload is heavy; second, because new products and applications appear constantly in the e-commerce field, traditional methods are slow to update; third, automatic generation methods have low accuracy, and the result is relatively r...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 姚明东, 范英磊, 陈浩
Owner: 姚明东