Word vector file loading method and device and storage medium

A file loading and word vector technology, applied to program control devices, program loading/starting, text database indexing, etc. It addresses the slow, time-consuming loading process and the high memory usage of word vector dictionaries, reducing GC pressure, improving loading speed, and avoiding memory fragmentation.

Active Publication Date: 2022-04-29
北京沃丰时代数据科技有限公司

AI Technical Summary

Problems solved by technology

[0003] In natural language processing, word vectors number in the millions, and the generated word vector dictionary takes up a lot of memory.

Embodiment Construction

[0051] First, some English terms and abbreviations that appear in the embodiments of the present application are explained:

[0052] NLP: Natural Language Processing.

[0053] FAQ: Frequently Asked Questions.

[0054] CGO: The mechanism by which Go code and C code call each other.

[0055] GC: Garbage Collection.

[0056] Key: word content.

[0057] Value: The index corresponding to the word vector.

[0058] Redis: Remote Dictionary Server. An open-source, log-type Key-Value database written in standard C. It supports networking, can be memory-based or persistent, and provides application programming interfaces in multiple languages.

[0059] Golang: Also known as Go. A statically and strongly typed, compiled language with support for parallel execution and garbage collection.

[0060] Syscall: A system call from the application layer...
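
The Key and Value entries above can be pictured with a small sketch: a map from word content to the index of that word's vector in one contiguous store. The type and method names (`Dictionary`, `NewDictionary`, `Add`, `Lookup`) and the fixed vector dimension are assumptions made for illustration; the source does not specify them.

```go
package main

import "fmt"

// Dictionary is a hypothetical layout, not taken from the patent text:
// Key is the word content, Value is the index of the word's vector.
type Dictionary struct {
	dim     int            // assumed fixed number of elements per vector
	index   map[string]int // Key: word content, Value: vector index
	vectors []float32      // all vectors stored back to back
}

// NewDictionary creates an empty dictionary for vectors of length dim.
func NewDictionary(dim int) *Dictionary {
	return &Dictionary{dim: dim, index: make(map[string]int)}
}

// Add appends a vector to the store and records its index under the word.
func (d *Dictionary) Add(word string, vec []float32) error {
	if len(vec) != d.dim {
		return fmt.Errorf("vector for %q has %d elements, want %d", word, len(vec), d.dim)
	}
	d.index[word] = len(d.vectors) / d.dim
	d.vectors = append(d.vectors, vec...)
	return nil
}

// Lookup returns the vector stored for a word, or false if the word is absent.
func (d *Dictionary) Lookup(word string) ([]float32, bool) {
	i, ok := d.index[word]
	if !ok {
		return nil, false
	}
	return d.vectors[i*d.dim : (i+1)*d.dim], true
}

func main() {
	d := NewDictionary(3)
	if err := d.Add("hello", []float32{0.1, 0.2, 0.3}); err != nil {
		panic(err)
	}
	if err := d.Add("world", []float32{0.4, 0.5, 0.6}); err != nil {
		panic(err)
	}
	if v, ok := d.Lookup("world"); ok {
		fmt.Println(d.index["world"], v)
	}
}
```

Keeping the vectors in one flat slice means a lookup returns a sub-slice of existing storage rather than a separately allocated object per word.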

Abstract

The invention provides a word vector file loading method, device, and storage medium. The method comprises: after a Golang program starts, mapping a formatted word vector file into memory through a Syscall instruction, where the formatted word vector file is a binary file containing a word length, a word, a word vector length, and a word vector; then loading the word vector file mapped into memory and constructing a word vector dictionary. According to the embodiments of the invention, formatting the original word vector file into a binary file saves memory; mapping the word vector file into memory through the Syscall instruction in the Golang environment improves the loading speed; and using the mapped memory file as the storage object for the word vectors improves loading efficiency, avoids memory fragmentation, and reduces GC pressure.

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, and in particular to a word vector file loading method, device, and storage medium.

Background

[0002] A computer cannot recognize the raw information in text. Word vectors were introduced to make computation and processing easier: a word, phrase, or text is represented as a numeric vector. Word vectors also make it possible to uncover the internal relationships between semantics.

[0003] In natural language processing, word vectors number in the millions, and the generated word vector dictionary takes up a lot of memory. Loading the word vectors to generate the word vector dictionary takes a long time, and the loading process is slow.

Contents of the invention

[0004] In view of the above-mentioned problems in the prior art, the present application provides a word vector file load...


Application Information

IPC(8): G06F9/445, G06F16/31, G06F40/242, G06F40/284
CPC: G06F9/44505, G06F9/44521, G06F40/242, G06F40/284, G06F16/31
Inventor: 马冰
Owner: 北京沃丰时代数据科技有限公司