
Approximating matching method for numerous character strings

The matching method and string technology are applied in fields such as electrical digital data processing, special-purpose data processing applications, and instruments. They address the lack of an algorithm for constructing a preferred list, low search efficiency, and reduced matching efficiency. The stated effects are good development and application prospects, stronger product competitiveness, and improved system performance.

Status: Inactive. Publication Date: 2010-05-05
NEWEGG INFORMATION TECH XIAN
Cites: 0; Cited by: 13

AI Technical Summary

Problems solved by technology

[0010] Disadvantage: Only the similarity between two strings can be calculated; the length of the substring or subsequence that one string shares with the other cannot be obtained.
[0020] Disadvantage: The computed longest common subsequence may be discontinuous, and backtracking is not supported.
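The "discontinuous" point in [0020] can be illustrated with a short Python sketch. It is not taken from the patent: the function names are hypothetical, and the dynamic-programming formulations are the standard textbook ones. The first function measures the longest common subsequence, whose characters may be scattered; the second measures the longest common substring, whose characters must be adjacent.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of "skip ch_a" or "skip ch_b".
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest run of adjacent characters shared by both strings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # A run continues only if the previous characters also matched.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Product numbers quoted in the source text.
    print(lcs_length("T096720", "T096820"))
    print(longest_common_substring_length("T096720", "T096820"))
```

For the two product numbers quoted in this document, the common subsequence skips over the one differing character, while the common substring stops at it, so the two measures give different lengths.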
[0023] Disadvantage: The ordered list of each participant's preferred objects must be constructed in advance. In applications such as e-commerce product matching, there is no algorithm for constructing this preferred list.
[0024] As a result, medium and large e-commerce websites currently use web page tracking or crawler tools to capture external product data, but this approach has prerequisites and limitations:
[0025] 1. Some products on external websites lack the key attributes or specifications needed as a direct basis for matching and capture. One example is the product "Creative ZEN 4GB BLACK Mp3Mp4Video Player with Expandable SD Card Slot", listed on a B2C website. Because each B2C website names products differently, matching accuracy drops when the product number is missing from the listing.
[0026] 2. Similar products in the same series have very similar attributes or specifications, which lowers the accuracy of the captured data and requires manual differentiation and matching, reducing efficiency.
For example, the product names "Epson Light Black Ink Cartridge T096720" and "Epson Matte Black Ink Cartridge T096820" have the product numbers "T096720" and "T096820" respectively, so the string similarity of the main attribute is very high. When only one of them is the correct match, the best match has to be selected manually, so search efficiency is low and real-time performance is poor.




Embodiment Construction

[0047] To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to specific illustrations.

[0048] The present invention is based on the asymmetric stable marriage algorithm and constructs a many-to-many matching model for a large number of character strings. The preferred list of matching items in this model is built with two approximate string matching algorithms: the edit distance algorithm and the longest common subsequence algorithm. The three algorithms are combined and applied between the components of one's own products and the other party's products: product number against product number, product name against product name, product number against product name, price against price, and sales type against sales type. A weight is set for each parameter, and the best match is then calculated automatically.
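One way to picture the weighted comparison in [0048] is the Python sketch below. It is an illustration under stated assumptions, not the patent's method: the weight values, the field names (`number`, `name`, `price`, `sales_type`), the prices, and the sales type are all made up for the example, and `difflib.SequenceMatcher` from the standard library stands in for the edit distance and longest common subsequence algorithms named in the text.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the source says each parameter gets a weight
# but does not give any values.
WEIGHTS = {
    "number_number": 0.4,
    "name_name": 0.3,
    "number_name": 0.1,
    "price_price": 0.1,
    "type_type": 0.1,
}


def string_sim(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (difflib, not the patent's algorithms)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def price_sim(p: float, q: float) -> float:
    """1.0 for equal prices, falling toward 0.0 as the relative gap grows."""
    hi = max(abs(p), abs(q))
    return 1.0 if hi == 0 else max(0.0, 1.0 - abs(p - q) / hi)


def composite_score(own: dict, other: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the five field comparisons listed in [0048]."""
    parts = {
        "number_number": string_sim(own["number"], other["number"]),
        "name_name": string_sim(own["name"], other["name"]),
        # Cross-field check: does our product number appear in their name?
        "number_name": 1.0 if own["number"].lower() in other["name"].lower() else 0.0,
        "price_price": price_sim(own["price"], other["price"]),
        "type_type": 1.0 if own["sales_type"] == other["sales_type"] else 0.0,
    }
    total = sum(weights.values())
    return sum(weights[k] * parts[k] for k in parts) / total


if __name__ == "__main__":
    # Names and numbers come from the source text; prices and sales type are invented.
    own = {"number": "T096720", "name": "Epson Light Black Ink Cartridge T096720",
           "price": 19.99, "sales_type": "new"}
    light = dict(own, price=18.99)
    matte = {"number": "T096820", "name": "Epson Matte Black Ink Cartridge T096820",
             "price": 18.99, "sales_type": "new"}
    print(composite_score(own, light) > composite_score(own, matte))
```

Giving the product-number comparison the largest weight is a design choice made for this example; it lets a single differing digit separate two otherwise near-identical listings.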

[0049] See Figure 3, the spe...



Abstract

The invention discloses an approximate matching method for a large number of character strings, comprising the steps of: (1) choosing the main matching parameters of the objects to be matched; (2) adjusting the weight of each parameter; (3) constructing a many-to-many matching model using the asymmetric stable marriage algorithm; and (4) creating a preferred list for the matching items in the many-to-many matching model using the edit distance algorithm or the longest common subsequence algorithm. Built on this core algorithm model, the invention adds attribute analysis so that the algorithm is chosen automatically. Once the model is constructed, the matching result is stable, the matching rate is high, and matching is fast and real-time. Depending on the application scenario, the preferred list can be created with different approximate matching algorithms for different character strings. The method has good application prospects: a targeted product strategy can be made in a short time, product competitiveness can be enhanced, the running efficiency of the website can be raised, and system performance can be improved.
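Step (3) can be sketched with the textbook Gale-Shapley procedure, shown here in Python. This is a simplified illustration and an assumption on my part: it is the classic one-to-one variant (allowing sides of unequal size), not the many-to-many asymmetric model the abstract describes, and the `scores` input, function name, and item labels are hypothetical. Preference lists are derived by sorting candidates by similarity score, which mirrors the idea in step (4) of building the preferred list from string similarity.

```python
def stable_match(scores: dict) -> dict:
    """Match own items to external items given scores[(own, other)] -> similarity.

    Returns {own: other}. Every (own, other) pair must have a score.
    """
    owns = sorted({o for o, _ in scores})
    others = sorted({t for _, t in scores})
    # Preferred list per own item: candidates in descending score order.
    prefs = {o: sorted(others, key=lambda t: -scores[(o, t)]) for o in owns}
    next_idx = {o: 0 for o in owns}   # next candidate each own item will try
    engaged = {}                      # other -> own currently holding it
    free = list(owns)
    while free:
        o = free.pop()
        if next_idx[o] >= len(prefs[o]):
            continue  # list exhausted: this item stays unmatched
        t = prefs[o][next_idx[o]]
        next_idx[o] += 1
        current = engaged.get(t)
        if current is None:
            engaged[t] = o
        elif scores[(o, t)] > scores[(current, t)]:
            # The external item prefers the new proposer; release the old one.
            engaged[t] = o
            free.append(current)
        else:
            free.append(o)  # rejected; will try its next candidate
    return {o: t for t, o in engaged.items()}


if __name__ == "__main__":
    # Illustrative similarity scores, not real data.
    scores = {("A", "x"): 0.9, ("A", "y"): 0.8,
              ("B", "x"): 0.85, ("B", "y"): 0.3}
    print(stable_match(scores))
```

In the usage example both A and B score highest against x, but x scores higher with A, so B falls back to y. The result is stable in the sense that no own item and external item would both score higher with each other than with their assigned partners.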

Description

Technical field:

[0001] The invention relates to a product data matching method, in particular to an approximate matching method for a large number of character strings applied in electronic commerce.

Background technique:

[0002] With the rapid development of e-commerce, competition among B2C e-commerce websites is intensifying, and it centers on differences in the prices, activities, and services of the products sold. It is therefore essential to track the data of products on external websites in real time every day, so that correct sales strategies can be made and competitiveness improved.

[0003] Two algorithms are currently used for approximate matching of products in the field of e-commerce:

[0004] 1. Edit distance algorithm:

[0005] It is used to judge the similarity between strings, and equals the minimum cost required to transform one string into another through basic transformations. Edit...
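The definition in [0005] corresponds to the standard dynamic-programming edit distance, sketched below in Python. The source does not state the cost of each basic transformation, so this sketch assumes the common Levenshtein convention: insertion, deletion, and substitution each cost 1. The function name is hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of unit-cost insertions, deletions, and substitutions
    needed to turn `source` into `target` (Levenshtein distance)."""
    # prev[j] = distance between the processed prefix of source and target[:j].
    prev = list(range(len(target) + 1))
    for i, ch_s in enumerate(source, start=1):
        curr = [i]  # turning source[:i] into "" takes i deletions
        for j, ch_t in enumerate(target, start=1):
            cost = 0 if ch_s == ch_t else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_s
                curr[j - 1] + 1,     # insert ch_t
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Product numbers quoted in the source text.
    print(edit_distance("T096720", "T096820"))
```

Applied to the two product numbers quoted earlier in this document, the distance is a single substitution, which shows why near-identical numbers are hard to tell apart by similarity alone.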


Application Information

IPC(8): G06F17/30, G06Q30/00
Inventor: 蒋以仁, 宋卫卫, 王皓伊
Owner NEWEGG INFORMATION TECH XIAN