Method and system for core process knowledge intelligent pushing based on multi-model fusion

The multi-model fusion technique, applied in the computer field, addresses problems such as the heavy similarity computation of existing retrieval models, the difficulty of expressing documents and queries, and errors in the corpus, and it improves the classification effect.

Active Publication Date: 2018-11-20
CHONGQING WANGJIANG IND +1

AI Technical Summary

Problems solved by technology

[0004] Disadvantage of fastText: it requires a large amount of labeled data for training.
[0005] Disadvantages of Rocchio: it assumes that the training data is entirely correct, although a corpus inevitably contains errors; an unbalanced corpus biases the model, so classes with more samples are favored; and it assumes that the documents of a category cluster around a single centroid, which is often not the case (such data is called linearly inseparable).
[0006] Disadvantages of multi-classification SVM: because this scheme arranges its classifiers in a directed acyclic graph, a wrong answer from the first classifier (for example, an article that clearly belongs to category 1 is assigned to category 5) cannot be corrected by any later classifier, since the category label "1" no longer appears among the subsequent classifiers. The same downward accumulation of errors occurs in the classifiers of every lower layer.
[0007] Disadvantage of Jaccard coefficient-Knn: the value of an element can only be 0 or 1, so richer information cannot be used.
[0009] Disadvantages of the Boolean model: its retrieval strategy rests on a binary judgment, so documents are either relevant or irrelevant with no notion of grading them, which makes it hard to improve retrieval performance. Although Boolean expressions have exact semantics, it is usually difficult to convert a user's information needs into them, and many users find it hard to express their queries this way.
[0010] Disadvantages of the vector space model: similarity calculation is computationally expensive, and term weights must be recalculated whenever a new document is added.
[0011] Disadvantages of the probability model: first, it depends too strongly on the text collection, since documents must be divided into relevant and irrelevant sets, and the model does not consider the frequency of index terms within a document; second, its storage and computation overhead is very high, parameter estimation is difficult, and documents and queries are hard to express.
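
The recalculation cost noted in [0010] can be seen in a small sketch. It assumes the common ln(N/df) inverse document frequency as the term weight, which the source does not specify; the `idf` helper and the toy documents are hypothetical.

```python
import math


def idf(term, docs):
    """Inverse document frequency ln(N / df); 0.0 if the term never occurs.

    Assumption: this is one common IDF form, not necessarily the weighting
    used in the patent.
    """
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"lathe", "spindle"},
    {"lathe", "coolant"},
    {"weld", "torch"},
]
before = idf("lathe", docs)   # ln(3 / 2)

# Adding a document that does not even contain "lathe" still changes its
# weight, because the corpus size N is part of every term's IDF.
docs.append({"weld", "seam"})
after = idf("lathe", docs)    # ln(4 / 2)

print(before, after)
```

Since N appears in the weight of every term, each added document invalidates all previously computed weights, which is the overhead the paragraph describes.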



Examples


Embodiment Construction

[0126] Referring to Figure 1, an intelligent push method for core process knowledge based on multi-model fusion comprises the following steps:

[0127] 1) Text classification: all texts are preprocessed, and the processed text is input into a classifier for pre-classification to obtain text category information, which is a model vector representation of the category as a whole. The corpus data source in this embodiment is the Tsinghua public Chinese dataset, download address: https://ctwdataset.github.io/downloads.html. In step 1), the Adaboost algorithm is used to fuse several different types of basic classifiers into a final classifier, and the processed data is input into the final classifier for pre-classification to obtain category information. The basic classifiers include the Jaccard coefficient-Knn model, the fastText deep learning model, the Rocchio model, and the multi-classification SVM model. The steps of using Adaboost algorithm to fuse mul...
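
One way the fusion described in [0127] might work is sketched below. Because the source's own steps are cut off, this is not the patented procedure: it assumes the basic classifiers are already trained and exposed as plain callables, and it borrows the multi-class SAMME variant of AdaBoost to give each one a vote weight. The names `fuse_with_adaboost` and `predict_fused` are hypothetical.

```python
import math
from collections import defaultdict


def fuse_with_adaboost(classifiers, samples, labels, n_classes):
    """Return one SAMME-style vote weight per pre-trained classifier.

    Assumption: classifiers are callables mapping a sample to a label and
    are visited in the given order; the patent's actual steps are truncated
    in the source.
    """
    n = len(samples)
    weights = [1.0 / n] * n              # per-sample weights, uniform at first
    alphas = []
    for clf in classifiers:
        wrong = [clf(x) != y for x, y in zip(samples, labels)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        # SAMME weight; it turns negative when the classifier is worse than
        # random guessing among n_classes labels.
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
        alphas.append(alpha)
        # Emphasize the samples this classifier got wrong, then renormalize.
        weights = [w * math.exp(alpha) if bad else w
                   for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return alphas


def predict_fused(classifiers, alphas, x):
    """Weighted vote: each classifier adds its alpha to the label it predicts."""
    votes = defaultdict(float)
    for clf, alpha in zip(classifiers, alphas):
        votes[clf(x)] += alpha
    return max(votes, key=votes.get)
```

In this sketch a classifier that errs on heavily weighted samples receives a small or negative vote, so the more reliable models dominate the final category decision.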



Abstract

The invention discloses a method and a system for intelligently pushing core process knowledge based on multi-model fusion. The method preprocesses the existing corpus data and inputs it into a classification algorithm model for pre-classification, with model fusion improving the classification effect. When a user submits a query or feedback, the similarity between the user input and each text category is calculated to determine the categories to which the keywords belong; the k1 most similar categories are taken, and retrieval is performed only within them. Within each category, the input keywords are used for retrieval with several different models. All results are then combined and sorted by relevance using the BM25 algorithm, the first k2 results are taken, and Jaccard similarity is used to remove texts that are too similar to one another before the results are returned to the user. The method and the system can further adjust the user's keyword model according to user feedback to better fit the user's needs, improving the push effect and matching degree of the next push.
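
The final ranking and de-duplication stage of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a widely used BM25 form with hypothetical parameters `tf_saturation` and `length_norm` (usually written k1 and b, renamed here to avoid confusion with the abstract's k1), treats documents as token lists, and picks an arbitrary similarity threshold `max_sim`.

```python
import math
from collections import Counter


def bm25_scores(query, docs, tf_saturation=1.5, length_norm=0.75):
    """Score each tokenized document against the query with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((n - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1.0)
            norm = tf[t] + tf_saturation * (
                1 - length_norm + length_norm * len(d) / avg_len)
            score += idf * tf[t] * (tf_saturation + 1) / norm
        scores.append(score)
    return scores


def jaccard(a, b):
    """Jaccard similarity of two token collections, compared as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_and_dedupe(query, docs, k2=3, max_sim=0.8):
    """Sort by BM25, keep the first k2, then drop near-duplicates.

    Returns indices into docs. A result is dropped when its Jaccard
    similarity to an already kept result exceeds max_sim (assumed threshold).
    """
    scores = bm25_scores(query, docs)
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k2]
    kept = []
    for i in top:
        if all(jaccard(docs[i], docs[j]) <= max_sim for j in kept):
            kept.append(i)
    return kept
```

Truncating to k2 before removing near-duplicates follows the order given in the abstract, so fewer than k2 results may be returned.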

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and system for intelligently pushing core process knowledge based on multi-model fusion.

Background

[0002] Traditional information retrieval systems have many problems. For query and access, the main methods include the Boolean query, the vector space model, and the probability model. Each has its own advantages and disadvantages, but none reaches the highest accuracy when used alone, and a small number of results are inconsistent with the query. Among the results that match the keywords, semantic ambiguity means that the same keyword may not refer to the same thing: although the keyword of an entry matches the keyword searched for, that information is not what the user needs. The search results may be comprehensive, but they do not meet the needs of the searcher well, resulting in a de...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F18/214
Inventor: 周臣刚, 张国胜, 王科, 徐宁, 汪影, 王颂菊, 谢军, 魏大勇
Owner: CHONGQING WANGJIANG IND