
Multilingual-based efficient data collection method and computer program

A data collection technology, applied in computing, natural language translation, electronic digital data processing, and related fields. It addresses the waste of collected information, heavy consumption of server resources, and system complexity, with the effect of reducing server requirements, improving collection efficiency, and speeding up deployment.

Active Publication Date: 2021-10-15
GLOBAL TONE COMM TECH
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] To sum up, the problems with the existing technology are: existing data collection methods increase procurement and operation and maintenance costs and make systems complex; content is highly repetitive across websites, so much of the collected information is wasted; error rates are high; system deployment and maintenance cycles are long; and a lot of server resources are consumed.
For example, news is usually reached through a home page, then a list page, then a content page (possibly spanning multiple pages), whereas forum content generally requires a registered user to log in and then collect the posts and replies under each topic. These sources differ greatly in their collection requirements and data structure requirements. Especially when the collection volume is large, a single system cannot complete such multiple complex tasks.



Examples


Detailed Description of Embodiments

[0087] To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0088] The application principle of the present invention is described in detail below with reference to the accompanying drawings.

[0089] As shown in Figure 1, the multilingual efficient data collection method provided by the embodiment of the present invention includes the following steps:

[0090] S101: Enter keywords through the text input box of the management platform;

[0091] S102: Identify the input language and translate the text into multiple languages through the translation engine. The system currently supports translation across 32 languages;

[0092] S103: Distribute the...
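Steps S101 and S102 can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: `identify_language` is a crude Unicode-script heuristic standing in for a real language identifier, and `toy_translate` with its glossary is a hypothetical stand-in for the translation engine, which the source does not specify.

```python
import unicodedata
from typing import Callable, Dict, List

# Hypothetical translator signature: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


def identify_language(text: str) -> str:
    """Guess a language code from the script of the characters.

    Assumption: a stand-in for a real identifier; it only separates a few
    scripts and falls back to "en" when no non-Latin script is found.
    """
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CJK"):
            return "zh"
        if name.startswith("CYRILLIC"):
            return "ru"
        if name.startswith("ARABIC"):
            return "ar"
    return "en"


def expand_keyword(keyword: str, targets: List[str],
                   translate: Translator) -> Dict[str, str]:
    """Return the keyword in its own language plus every target language."""
    source = identify_language(keyword)
    result = {source: keyword}
    for lang in targets:
        if lang != source:
            result[lang] = translate(keyword, source, lang)
    return result


# Toy glossary so the sketch runs without any external translation service.
_TOY_GLOSSARY = {
    ("en", "zh", "earthquake"): "地震",
    ("en", "ru", "earthquake"): "землетрясение",
}


def toy_translate(text: str, source: str, target: str) -> str:
    """Hypothetical engine: look the term up, else return it unchanged."""
    return _TOY_GLOSSARY.get((source, target, text.lower()), text)


keywords = expand_keyword("earthquake", ["en", "zh", "ru"], toy_translate)
```

Passing the translator in as a parameter keeps the fan-out logic independent of whichever engine is actually used, so a real service could replace `toy_translate` without touching `expand_keyword`.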


Abstract

The invention belongs to the technical field of computer software and discloses a multilingual-based efficient data collection method and a computer program. The method includes: entering keywords; identifying the input language and translating the text into multiple languages; distributing the multilingual keywords; acquiring system performance data, scheduling server tasks, and calling the relevant engines for crawling; classifying web pages into news content pages and news list pages and filtering out invalid information; for news content pages, obtaining the news directly from the link; for news list pages, enabling the secondary crawler subsystem to recursively perform secondary page analysis, obtaining further news list pages and news pages and, from them, the news; deduplicating the collected news content; and storing the valid data in structured form. The invention achieves fast news collection through search engines, automatic multilingual collection, fast deduplication, and fast load balancing with support for dynamic loading and removal of collection engines. It reduces IP consumption and improves collection efficiency.
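The page classification, recursive handling of list pages, and deduplication described in the abstract could look something like the sketch below. Everything specific here is an assumption made for illustration: the `Page` type, the length and link-count thresholds in `classify`, the SHA-256 fingerprint over normalized text, and the in-memory `SITE` that replaces real HTTP fetching. The source does not state how the invention classifies or deduplicates pages.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Page:
    url: str
    text: str = ""
    links: List[str] = field(default_factory=list)


def classify(page: Page) -> str:
    """Hypothetical heuristic: long text is a content page, several links a list page."""
    if len(page.text) >= 40:
        return "content"
    if len(page.links) >= 2:
        return "list"
    return "invalid"


def fingerprint(text: str) -> str:
    """Hash of case- and whitespace-normalized text (exact-duplicate detection only)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def crawl(url: str, fetch: Callable[[str], Optional[Page]],
          seen_urls: Set[str], seen_hashes: Set[str], depth: int = 2) -> List[Page]:
    """Collect unique content pages, recursing into list pages up to `depth`."""
    if url in seen_urls:
        return []
    seen_urls.add(url)
    page = fetch(url)
    if page is None:
        return []
    kind = classify(page)
    if kind == "content":
        digest = fingerprint(page.text)
        if digest in seen_hashes:  # same article already collected elsewhere
            return []
        seen_hashes.add(digest)
        return [page]
    if kind == "list" and depth > 0:
        collected: List[Page] = []
        for link in page.links:
            collected.extend(crawl(link, fetch, seen_urls, seen_hashes, depth - 1))
        return collected
    return []  # invalid page, or list page beyond the depth limit


# In-memory stand-in for the web; example.* hosts are placeholders.
ARTICLE = "Magnitude 6 earthquake reported near the coast early on Monday."
SITE = {
    "https://a.example/list": Page("https://a.example/list", links=[
        "https://a.example/1", "https://b.example/1", "https://a.example/ad"]),
    "https://a.example/1": Page("https://a.example/1", text=ARTICLE),
    "https://b.example/1": Page("https://b.example/1", text=ARTICLE.upper()),
    "https://a.example/ad": Page("https://a.example/ad", text="Buy now"),
}

results = crawl("https://a.example/list", SITE.get, set(), set())
print([p.url for p in results])  # prints ['https://a.example/1']
```

In this toy run the list page yields three links: one article, one republished copy of the same article on another host (dropped by the fingerprint check), and one short page that is filtered out as invalid. A production system would likely need near-duplicate detection rather than an exact hash, since republished articles rarely match character for character.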

Description

Technical field

[0001] The invention belongs to the technical field of computer software, and in particular relates to a multilingual-based efficient data collection method and a computer program.

Background

[0002] The demand for Internet data collection is increasing, and efficiency and accuracy are becoming the goals that collection systems pursue. At present, many collection systems use crawlers to collect website data directly. Collection systems in China and abroad mostly deploy multiple separate systems, use generic or template-based collection, and collect data from news websites. Deploying multiple systems increases procurement and operation and maintenance costs, and also makes the overall system complex, so that existing resources are difficult to allocate uniformly. The content repetition rate of different websites is high, and the waste of collected information is se...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/58, G06F16/9536
CPC: G06F16/9535, G06F40/58
Inventors: 詹咏松, 程国艮
Owner: GLOBAL TONE COMM TECH