Method for judging repetition of enterprise Chinese names on basis of core word similarity

A technique for enterprise names and core words, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as low efficiency and failure to consider the importance of individual words.

Active Publication Date: 2014-06-25
FOCUS TECH

AI Technical Summary

Problems solved by technology

The existing approach does not consider the importance of individual words, and it is inefficient when calculating similarity.



Examples


Detailed Description of Embodiments

[0058] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, some terms used in the method and system for judging duplicate Chinese enterprise names of the present invention are briefly explained below.

[0059] Duplicate enterprise Chinese name: a repetition of the text of an enterprise's Chinese name; the two names need not be exactly equal.

[0060] Similarity: a measure of how alike two texts are.

[0061] Region: refers to provinces, cities, counties and towns in China.

[0062] Core words: the keywords that best distinguish a company name, usually obtained by removing the region and the terms "shares", "limited" and "company" that follow the name.

[0063] Data mart: a subset of a data warehouse, mainly serving department-level business and covering only one specific topic.
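The definition of core words in [0062] can be illustrated with a minimal sketch. The patent's actual core word extraction method is not shown in this excerpt, so the code below is only an assumption about how such stripping might look: the `REGIONS` and `GENERIC_TERMS` lists, the function name `extract_core_word`, and the sample company names are all hypothetical.

```python
# Hypothetical, tiny vocabularies for illustration only; a real system would
# need a full list of Chinese regions and a longer list of generic terms.
REGIONS = ["南京市", "南京", "江苏省", "江苏", "上海市", "上海"]
GENERIC_TERMS = ["股份", "有限", "公司"]


def extract_core_word(name: str) -> str:
    """Strip region keywords and generic terms, returning what is left.

    This is an assumed simplification, not the extraction method of the patent.
    """
    core = name
    # Remove longer region strings first so "南京市" does not leave a stray "市".
    for region in sorted(REGIONS, key=len, reverse=True):
        core = core.replace(region, "")
    for term in GENERIC_TERMS:
        core = core.replace(term, "")
    return core


if __name__ == "__main__":
    print(extract_core_word("南京焦点科技股份有限公司"))
```

Plain substring removal is the simplest possible choice here. It would also strip a region string that happens to appear in the middle of a name, so a production system would probably need word segmentation rather than `str.replace`.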

[0064] With reference to Figure 1, the flow chart of the method for judging duplicate Chinese enterprise names based on core word similarity in the ...


Abstract

The invention discloses a method for judging duplicate enterprise Chinese names on the basis of core word similarity. The method comprises the following steps: the enterprise Chinese names in a B2B e-commerce platform database are loaded into an enterprise name data mart through ETL; the enterprise Chinese names stored in the data mart are preprocessed; the core words of each newly added enterprise Chinese name are extracted according to an enterprise Chinese name core word extraction method; the corresponding enterprise Chinese names with name region keywords are looked up according to the enterprise Chinese name sets corresponding to the core words; and the similarity between the newly added enterprise Chinese name and the corresponding enterprise Chinese names without name region keywords is calculated through a text similarity calculation method that takes weights into account. The method reduces the workload of the matching step and improves the processing efficiency of the whole method.
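The lookup-then-compare flow described in the abstract can be sketched as follows. The excerpt does not give the weighting scheme or the similarity formula, so this example substitutes a weighted Jaccard overlap on characters, with core-word characters weighted more heavily. The function names, the `core_weight` and `threshold` values, and the sample names are assumptions made for illustration. All names are assumed to have had their region keywords removed already.

```python
from collections import defaultdict


def weighted_similarity(a: str, b: str, core: str, core_weight: float = 3.0) -> float:
    """Weighted Jaccard overlap of character sets.

    Characters that belong to the core word count `core_weight` times as much
    as other characters. This formula is an assumed stand-in, not the patent's.
    """
    chars_a, chars_b = set(a), set(b)
    core_chars = set(core)

    def weight(ch: str) -> float:
        return core_weight if ch in core_chars else 1.0

    union = sum(weight(ch) for ch in chars_a | chars_b)
    if union == 0:
        return 1.0  # assumption: two empty names are treated as identical
    shared = sum(weight(ch) for ch in chars_a & chars_b)
    return shared / union


def build_index(names_with_core):
    """Map each core word to the set of stored names that share it."""
    index = defaultdict(set)
    for name, core in names_with_core:
        index[core].add(name)
    return index


def find_candidates(new_name, new_core, index, threshold=0.8):
    """Compare the new name only against stored names with the same core word."""
    return sorted(
        name
        for name in index.get(new_core, ())
        if weighted_similarity(new_name, name, new_core) >= threshold
    )


if __name__ == "__main__":
    index = build_index([
        ("焦点科技股份有限公司", "焦点科技"),
        ("焦点科技有限公司", "焦点科技"),
        ("环球资源有限公司", "环球资源"),
    ])
    print(find_candidates("焦点科技有限责任公司", "焦点科技", index, threshold=0.75))
```

Grouping stored names by core word means a new name is scored only against the names in its own bucket rather than against the whole database, which is one way to read the abstract's claim of a reduced matching workload.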

Description

Technical field

[0001] The invention belongs to the field of B2B e-commerce information review, and in particular relates to a method for judging duplicate Chinese enterprise names based on core word similarity.

Background

[0002] In China, as B2B e-commerce becomes increasingly popular among enterprises, a large number of visitors register on B2B e-commerce platforms every day. Intentions, etc., there will be repeated registrations, resulting in a large number of duplicate companies on the B2B e-commerce platform. This leaves a large amount of redundant information stored on the platform and lowers the platform's information quality. In addition, if some companies intend to expand publicity, repeated registration will produce not only a large amount of duplicate identity information but also a large amount of identical product information, which will re...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventors: 刘少武, 王婷
Owner: FOCUS TECH