Data traceability method and system based on word vector and machine learning

A data traceability technology based on machine learning, applied in the field of information security. It addresses the heavy engineering effort and the redundant, useless data produced by existing approaches, and achieves stable and reliable traceability and strong scalability while avoiding high system management costs.

Active Publication Date: 2022-04-08
SHANGHAI JIAO TONG UNIV
Cites: 7 | Cited by: 0

AI Technical Summary

Problems solved by technology

Existing approaches each have drawbacks. The document labeling method requires modifying every database or operating system in the environment, which takes a huge engineering effort and may add storage overhead; once an attacker learns the format of the label information, the labels are also easy to tamper with. The reverse query method is limited to tracing data within a database, and a reverse query function is very difficult to construct, so it is not suitable for data traceability in an enterprise environment. API Hook generates a large amount of redundant, useless data, and because it monitors at the application layer it cannot analyze file contents, so it cannot completely and reliably restore the propagation path and modification records of a specific file.


Examples

Embodiment Construction

[0045] The present invention will be described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that those skilled in the art can make several changes and improvements without departing from the concept of the present invention. These all belong to the protection scope of the present invention.

[0046] The present invention proposes a new data traceability algorithm that actively monitors the internal traffic data of the enterprise, restores document content through a protocol restoration algorithm, calculates similarity through word vectors and machine learning, and records the propagation path according to the results, thus breaking the bottleneck of traditional data traceability technology. It has extremely high scalability, does not depend on a specific sys...
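The path-recording step described in [0046] can be illustrated with a minimal sketch. It is not the patented algorithm: Jaccard token overlap stands in for the word-vector and machine-learning similarity, and the `Observation` fields, host labels, and `threshold` value are hypothetical names introduced only for this example. Protocol restoration is assumed to have already produced the document text.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One document transfer seen in monitored traffic (hypothetical record)."""
    time: int   # capture order or timestamp
    src: str    # sending host
    dst: str    # receiving host
    text: str   # document content already restored from the traffic


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simple stand-in for word-vector similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def propagation_path(observations, target_text, threshold=0.6):
    """Return (time, src, dst) hops whose content resembles target_text, oldest first."""
    hits = [o for o in observations if jaccard(o.text, target_text) >= threshold]
    return [(o.time, o.src, o.dst) for o in sorted(hits, key=lambda o: o.time)]


observations = [
    Observation(3, "hostB", "hostC", "project plan version two with budget"),
    Observation(1, "hostA", "hostB", "project plan version one with budget"),
    Observation(2, "hostA", "hostD", "lunch menu for friday"),
]
path = propagation_path(observations, "project plan version two with budget")
print(path)
```

With this sample data the unrelated transfer is filtered out and the two related transfers are returned in time order, which gives the propagation path of the target document from hostA to hostB to hostC.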

Abstract

The invention provides a data traceability method and system based on word vectors and machine learning. Traffic data is collected from the network, and file features are extracted from the traffic data to obtain text feature vectors. The document to be traced is matched against multiple cluster types to determine its type; within that type, text similarity is calculated from the text feature vectors to determine the source of the document to be traced. The invention is highly scalable, does not depend on a specific system, requires no database modification, places no requirements on data format, and traces sources stably, reliably, and efficiently. Documents are first clustered and the source is then traced by calculating the cosine value, which removes the dependence on a database and avoids high system management costs.
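The two-stage matching in the abstract (pick a cluster type, then compare by cosine value within it) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: term-frequency counts stand in for trained word vectors, the clusters are given by hand instead of being learned, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-frequency vector; a simple stand-in for a trained word-vector model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors) -> Counter:
    """Sum of member vectors; scaling does not matter for cosine comparison."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def trace_source(query_text, clusters):
    """clusters maps a cluster name to a list of (doc_id, text) pairs."""
    q = vectorize(query_text)
    # Stage 1: choose the cluster type whose centroid is most similar to the query.
    best = max(clusters, key=lambda name: cosine(q, centroid(vectorize(t) for _, t in clusters[name])))
    # Stage 2: within that cluster only, pick the document with the highest cosine value.
    doc_id, score = max(((d, cosine(q, vectorize(t))) for d, t in clusters[best]), key=lambda p: p[1])
    return best, doc_id, score


clusters = {
    "finance": [("f1", "quarterly revenue report for the finance team"),
                ("f2", "annual budget forecast and revenue targets")],
    "engineering": [("e1", "network protocol design for the traffic monitor"),
                    ("e2", "database schema migration plan")],
}
result = trace_source("revenue report for the finance team draft", clusters)
print(result)
```

Restricting the cosine comparison to one cluster keeps the number of pairwise comparisons small as the document collection grows, which is the efficiency argument the abstract makes for clustering first.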

Description

Technical field

[0001] The present invention relates to the technical field of information security, in particular to a data traceability method and system based on word vectors and machine learning.

Background art

[0002] In recent years, with the spread of the Internet and the growing number of Internet users, the amount of data generated by people's network activities has grown explosively. While big data provides assistance and value for the development of all walks of life, it also brings new challenges to information security. Especially in enterprises and other organizations, ensuring the security of the large volume of inbound and outbound traffic is a very important issue. Data traceability technology traces where data comes from and where it goes, which is very helpful for data protection and for controlling the flow of confidential information within organizations.

[0003] Data traceability is a relat...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/33
CPC: G06F16/353, G06F16/334
Inventors: 丁疏横, 范磊
Owner: SHANGHAI JIAO TONG UNIV