NLP-based short text data processing method

A short text data processing technology, applied in the fields of electrical digital data processing, natural language data processing, instruments, etc. It addresses the inaccuracy and low efficiency of manually processing short text data, thereby reducing the manpower required.

Pending Publication Date: 2021-01-22
中电万维信息技术有限责任公司
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The purpose of the embodiments of the present invention is to propose an NLP-based method for processing short text data. It aims to solve the low efficiency and imprecision of manually processing short text data and the difficulty of processing large volumes of data, thereby reducing the large amount of manpower and material resources consumed.

Embodiment

[0038] Figure 1 is a flow chart of an NLP-based short text data processing method according to an embodiment of the present invention. The method comprises the following steps:

[0039] S101: obtain short text data:

[0040] The short text data in the business database is synchronized to a local TXT file using the DataX tool.

[0041] S102: jieba word segmentation:

[0042] The short text data obtained in step S101 is segmented line by line using jieba's precise mode, which cuts each sentence as accurately as possible.

[0043] S103: remove stop words (stopwords):

[0044] An accumulated list of Chinese stop words is loaded, and NLTK is used to delete the stop words from the words segmented by jieba in step S102.
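The stop-word filter in S103 might look like the sketch below. The source names NLTK but does not say which NLTK call is used, so this example deliberately uses a plain Python set lookup instead. The function names, the stop-word list and the token list are all hypothetical.

```python
def load_stopwords(lines):
    """Build a stop-word set from an iterable of lines, one word per line."""
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep tokens that are non-blank and not in the stop-word set."""
    return [tok for tok in tokens if tok.strip() and tok not in stopwords]


# Hypothetical stop-word list; in practice it would be read from a file.
stopwords = load_stopwords(["的", "了", "在"])

# Hypothetical segmented output of S102 for one line.
tokens = ["在", "餐厅", "的", "服务员"]
filtered = remove_stopwords(tokens, stopwords)
print(filtered)
```

A set is used so that each membership test is constant time, which matters when the same list is applied to every line of a large file.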

[0045] S104: obtain the bag of words:

[0046] Using the gensim library, a unique integer id is assigned to every word that appears in the corpus, for example (the repeated English keys presumably correspond to distinct Chinese words that share one translation): {'restaurant': 0, 'fried': 1, 'pull': 2, 'open': 3, 'member' : 4, 'restaurant': 5, 'Ramen': 6, 'hotel': 7, 'restaurant': 8, 'hot pot': 9, 'catering': 10...

Abstract

The invention relates to the technical field of natural language processing, in particular to an NLP-based short text data processing method. The method comprises the steps of obtaining short text data, performing jieba word segmentation, removing stop words, obtaining the bag of words, building the corpus, performing TF-IDF processing, calculating the cosine distance, and standardizing the short text data according to the calculated cosine distance. This solves the problems that manual processing of short text data is inefficient and inaccurate and that large volumes of data are difficult to process, and therefore reduces the consumption of a large amount of manpower and material resources.
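The TF-IDF, cosine and standardization steps are not detailed in the text available here, so the following is only a plain-Python sketch of one way they could fit together. The smoothed IDF formula, the idea of matching each raw entry to a list of canonical labels, and every name and sample value are assumptions. The sketch ranks by cosine similarity, which is equivalent to choosing the smallest cosine distance.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed weighting: tf = raw count, idf = log((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}
    return [{tok: c * idf[tok] for tok, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def standardize(raw_docs, canonical_docs):
    """Map each raw document to the index of its most similar canonical document."""
    vecs = tfidf_vectors(canonical_docs + raw_docs)
    canon = vecs[: len(canonical_docs)]
    raw = vecs[len(canonical_docs):]
    return [max(range(len(canon)), key=lambda i: cosine(r, canon[i])) for r in raw]


# Hypothetical tokenized inputs: canonical labels and raw survey answers.
canonical = [["餐厅", "服务员"], ["出租车", "司机"]]
raw = [["火锅", "餐厅", "服务员"], ["司机"]]
result = standardize(raw, canonical)
print(result)
```

In practice a similarity threshold would probably be needed as well, so that entries resembling no canonical label are flagged for review instead of being forced onto the nearest one.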

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to an NLP-based method for processing short text data.

Background technique

[0002] With the rapid development of network information technology and the gradual shift from traditional paper records to digital information, more and more information accumulates on the network, especially short text information. Most short text data is collected through information systems and stored in relational databases. The same meaning can be expressed in short text in many different forms. For example, an employment questionnaire contains a free-text (non-multiple-choice) item asking for the type of work the respondent is engaged in. Statistical analysis of this item is required, but the collected answers express the same meaning in various ways, such as: restaurant waiter, restaurant ...

Application Information

IPC(8): G06F40/284, G06F40/289, G06F40/216, G06F40/103, G06K9/62
CPC: G06F40/284, G06F40/289, G06F40/216, G06F40/103, G06F18/22
Inventor: 魏建军, 刘磊, 郭真, 王富
Owner: 中电万维信息技术有限责任公司