Website text content-based online loan website entity recognition method and system

An entity recognition technology based on web page content, applied to the field of online loan website recognition. It addresses the poor timeliness, low accuracy, and high cost of existing approaches, and aims for high timeliness and high accuracy.

Pending Publication Date: 2020-04-28
SHANGHAI GUAN AN INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0005] Manual labeling is costly and inefficient. With a small sample it can classify websites accurately, but with a large volume of text it is difficult to determine whether a website is an online loan website by manual labeling alone.
Recognition methods such as simple dictionaries, rule matching, and machine learning models have low accuracy and poor timeliness, and depend on a more specialized and accurate thesaurus.


Examples

Embodiment 1

[0043] As shown in Figure 1, a method for entity recognition of online loan websites based on website text content comprises the following steps:

[0044] S01. Build the training set domain name table. Collect the domain name hosts of websites whose type is already known, obtain the web page content text for each host with a crawler, and label each website's type, where 1 indicates an online loan website and 0 indicates any other website. If the website is an online loan website, also label its entity name; otherwise leave the entity name blank. This produces the training set domain name table T_host, which contains the domain name host, the web page content content, the online loan website label, and the entity name entity;
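The layout of T_host described in S01 can be sketched as a small record type. This is a minimal illustration only, not the patent's implementation: the class name `HostRecord`, the helper `build_t_host`, and the attribute name `label` are assumptions, and the crawler that would supply `content` is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class HostRecord:
    """One row of the training set domain name table T_host (hypothetical type)."""
    host: str                     # domain name host
    content: str                  # web page content text fetched by the crawler
    label: int                    # 1 = online loan website, 0 = other website
    entity: Optional[str] = None  # entity name, blank unless label == 1

    def __post_init__(self) -> None:
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")
        # S01 leaves the entity name blank for non-loan websites.
        if self.label == 0 and self.entity:
            raise ValueError("entity must be blank when label is 0")


def build_t_host(rows: Iterable[Tuple[str, str, int, Optional[str]]]) -> List[HostRecord]:
    """Build T_host from (host, content, label, entity) tuples."""
    return [HostRecord(*row) for row in rows]
```

Validating the label and entity fields when each row is built keeps labeling mistakes out of the table before any model is trained on it.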

[0045] S02. Build the prediction set domain name table. Obtain the operator's DPI data, extract the host field of the doma...

Embodiment 2

[0060] Corresponding to Embodiment 1, this embodiment also provides an online loan website entity recognition system based on website text content, including:

[0061] A training set domain name table construction module, which collects the domain name hosts of websites whose type is already known, obtains the web page content text for each host with a crawler, and labels each website's type, where 1 indicates an online loan website and 0 indicates any other website. If the website is an online loan website, the module also labels its entity name; otherwise it leaves the entity name blank. This produces the training set domain name table T_host, which contains the domain name host, the web page content content, the online loan website label, and the entity name entity;

[0062] A prediction set domain name table construction module, which obtains the operator's DPI data, extracts the domain name host field from the data, and forms ...

Abstract

The invention provides a method and system for online loan website entity recognition based on website text content. The method comprises the steps of: S01, constructing a training set domain name table; S02, constructing a prediction set domain name table; S03, data cleaning and preprocessing; S04, training a text classification model to obtain a target text classification model; S05, online loan website recognition: inputting the target web page content field of each sample in the prediction set domain name table into the target text classification model, and outputting whether the field corresponding to each sample belongs to an online loan website; S06, training a named entity recognition model to obtain a target named entity recognition model; and S07, entity name labeling. The method is based on operator DPI data: the website domain name hosts accessed by users are obtained, the web page content is fetched, and online loan websites are identified. The entity names in the online loan websites are then extracted with named entity recognition, and, in combination with external blacklist data, bad websites are marked and an enterprise blacklist library is established. According to the applicant, the method offers high accuracy and high timeliness.
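The recognition stage (S05), the entity labeling stage (S07), and the blacklist check can be sketched as one pass over the prediction set. This is a rough sketch under stated assumptions, not the patented method: `recognize_and_flag`, `toy_classify`, and `toy_extract_entity` are hypothetical names, the two toy functions are rule-based placeholders standing in for the trained text classification and named entity recognition models, and matching the blacklist by exact entity name is an assumption.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Set


def recognize_and_flag(
    prediction_set: Iterable[Dict[str, str]],
    classify: Callable[[str], bool],
    extract_entity: Callable[[str], Optional[str]],
    external_blacklist: Set[str],
) -> List[Dict[str, object]]:
    """Classify each sample, extract an entity for loan sites, check the blacklist."""
    results: List[Dict[str, object]] = []
    for sample in prediction_set:
        is_loan = classify(sample["content"])                             # S05
        entity = extract_entity(sample["content"]) if is_loan else None  # S07
        results.append({
            "host": sample["host"],
            "label": 1 if is_loan else 0,
            "entity": entity,
            # Assumption: a site is flagged when its entity name is blacklisted.
            "blacklisted": entity is not None and entity in external_blacklist,
        })
    return results


def toy_classify(text: str) -> bool:
    """Placeholder for the trained text classification model."""
    return "loan" in text.lower()


# Placeholder pattern: capitalized words ending in a finance-like suffix.
_ENTITY_RE = re.compile(r"[A-Z]\w*(?: [A-Z]\w*)* (?:Finance|Loans|Credit)")


def toy_extract_entity(text: str) -> Optional[str]:
    """Placeholder for the trained named entity recognition model."""
    match = _ENTITY_RE.search(text)
    return match.group(0) if match else None
```

Passing the classifier and the entity extractor in as callables keeps the flow independent of whichever models S04 and S06 actually produce.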

Description

Technical field

[0001] The invention relates to the technical field of online loan website identification, and in particular to a method and system for identifying online loan website entities based on website text content.

Background

[0002] With the rapid development of Internet technology, setting up a website has become more convenient and the barrier to entry lower, which has led to the emergence of many bad and illegal websites such as illegal online loan websites, phishing websites, and gambling websites. In recent years, incidents such as P2P companies absconding, network fraud, and telecom fraud have occurred frequently, causing serious property losses to Internet users, in some cases endangering personal safety, and producing adverse social effects. Accurately and efficiently identifying online loan websites and promptly reminding users to proceed with caution can prevent users from losing property, and at the same time improve the company...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/958, G06F40/295, G06Q50/26
CPC: G06F16/951, G06F16/958, G06Q50/26
Inventor: 梁淑云, 刘胜, 马影, 陶景龙, 王启凡, 魏国富, 徐明, 殷钱安, 余贤喆, 周晓勇
Owner: SHANGHAI GUAN AN INFORMATION TECH