Capital market public opinion monitoring method based on distributed web crawler and NLP

A distributed crawler technology, applied in the Internet field, that addresses problems such as the lack of proprietary and financial vocabulary, complex configuration, and sentiment judgment errors, with the effects of reducing system instability and time consumption, lowering operating cost, and improving accuracy.

Active Publication Date: 2020-12-22
朱彤

AI Technical Summary

Problems solved by technology

[0004] The technical problems solved by the present invention are as follows. A single-node information-capture system suffers from poor stability, while traditional distributed architectures, such as Docker, are relatively complicated to configure and costly. Existing lexicons are designed around everyday-life scenarios, so they relatively lack proprietary and financial vocabulary, and manual labeling is expensive. Traditional text sentiment analysis algorithms suffer from judgment bias, so synthetic indicators need to be produced to improve the accuracy of sentiment judgments.



Examples


Embodiment Construction

[0076] The present invention is further illustrated below with reference to specific examples. The examples are implemented on the premise of the technical solution of the present invention; it should be understood that they only illustrate the invention and are not intended to limit its scope.

[0077] As shown in Figure 1, the capital market public opinion monitoring method of the present application, based on a distributed web crawler and NLP, includes a cloud-server-based distributed crawler module and a financial text NLP analysis system.

[0078] The cloud-server-based distributed crawler carries out large-scale crawling tasks. Compared with traditional distributed architectures such as Docker, it is easier to configure and cheaper to operate, and the system scale can be adjusted according to actual needs.
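The idea of spreading crawl tasks over an adjustable number of workers, so that one failing task does not bring down the whole run, can be sketched as follows. This is only an illustration under stated assumptions: threads on one machine stand in for cloud server nodes, and `crawl_batch`, `fake_fetch`, and `n_workers` are hypothetical names that do not come from the patent text.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def crawl_batch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    n_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch every URL with a pool of workers and return (pages, errors).

    n_workers is the "system scale" knob: raising it adds workers
    without changing any other part of the code.
    """
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map each submitted task back to the URL it is fetching.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:
                # A failed task is recorded and the batch keeps going.
                errors[url] = repr(exc)
    return pages, errors


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request (assumption, no network)."""
    if "down" in url:
        raise ConnectionError(f"node unreachable for {url}")
    return f"<html>{url}</html>"


if __name__ == "__main__":
    pages, errors = crawl_batch(["site-a", "site-b", "site-down"], fake_fetch, n_workers=2)
    print(sorted(pages), sorted(errors))
```

In a real deployment the `fetch` callable would wrap an HTTP client and the pool would be replaced by worker processes on separate servers pulling from a shared queue; the excerpt does not say which mechanism the invention uses.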

[0079] The financial text NLP analysis system builds financial word segmentation and positive an...


Abstract

The invention discloses a capital market public opinion monitoring method based on a distributed web crawler and NLP. The method comprises a cloud-server-based distributed crawler module and a financial text NLP analysis system. The distributed crawler captures public information using multiple processes and periodic updates, and the system scale can be adjusted quickly according to usage requirements. The method constructs a financial word segmentation lexicon and a positive and negative sentiment lexicon; during corpus construction, algorithms such as mixed sample inspection and fuzzy clustering reduce the cost of manual annotation. The probability that a text's sentiment is positive or negative is calculated with a supervised learning algorithm, and sentiment indexes are synthesized with an auto-encoder algorithm, which improves the accuracy of sentiment judgment. The distributed architecture avoids system crashes caused by a single-node fault; the reconstructed financial sentiment lexicon and the text analysis algorithm improve the validity of the sentiment index, so that market sentiment is reflected dynamically and real-time capital market data is provided to the user.
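As a rough illustration of computing a positive or negative sentiment probability with a supervised learning algorithm, the sketch below trains a tiny multinomial naive Bayes classifier on pre-segmented tokens. The choice of naive Bayes, the toy training samples, and the names `train_naive_bayes` and `positive_probability` are assumptions made for this example; the abstract does not state which supervised algorithm the invention uses.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def train_naive_bayes(samples: List[Tuple[List[str], str]]) -> Model:
    """Count tokens per label ("pos" or "neg") from labeled, tokenized texts."""
    token_counts: Dict[str, Counter] = {"pos": Counter(), "neg": Counter()}
    doc_counts: Counter = Counter()
    for tokens, label in samples:
        token_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(token_counts["pos"]) | set(token_counts["neg"])
    return token_counts, doc_counts, vocab


def positive_probability(tokens: List[str], model: Model) -> float:
    """Return P(pos | tokens) under naive Bayes with add-one smoothing.

    Assumes both labels appeared at least once in training.
    """
    token_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores: Dict[str, float] = {}
    for label in ("pos", "neg"):
        total_tokens = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (token_counts[label][tok] + 1) / (total_tokens + len(vocab))
            )
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["pos"] / (weights["pos"] + weights["neg"])


# Toy, hand-made training data (illustrative only, not from the patent).
TOY_SAMPLES = [
    (["profit", "surge", "upgrade"], "pos"),
    (["record", "profit", "growth"], "pos"),
    (["loss", "default", "downgrade"], "neg"),
    (["fraud", "loss", "plunge"], "neg"),
]

if __name__ == "__main__":
    model = train_naive_bayes(TOY_SAMPLES)
    print(positive_probability(["profit", "surge"], model))
    print(positive_probability(["loss", "default"], model))
```

For Chinese financial text, the token lists would come from a segmenter driven by the financial lexicon described above; that step is outside this sketch.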

Description

Technical field

[0001] The invention belongs to the technical field of the Internet, and in particular relates to a method for monitoring public opinion in a capital market based on distributed web crawlers and NLP.

Background art

[0002] Public opinion analysis involves a large amount of data that is updated quickly. A real-time capture system faces difficulties such as high scalability requirements, high long-term operating costs, and high system stability requirements. In the existing technology, although system architectures for stock market sentiment and stock public opinion analysis exist, they are based on complex server cluster systems, which have high operating costs and long deployment times and are therefore not a basis for large-scale adoption.

[0003] On the other hand, the actual performance of algorithms for public opinion text analysis has always been a problem. First of all, there is a lack of corpora fo...


Application Information

IPC (8): G06F16/951, G06F16/35, G06F40/284, G06F40/289, G06F40/237, G06K9/62, G06N3/08
CPC: G06F16/951, G06F16/35, G06F40/284, G06F40/289, G06F40/237, G06N3/08, G06F18/24147, G06F18/24155, G06F18/2451, G06F18/2411
Inventor 朱彤
Owner 朱彤