33 results about How to "Improve word segmentation efficiency" patented technology

A query method for indefinite words and sentences of evaluation documents based on inverted index

The invention relates to a query method for indefinite words and phrases in evaluation documents based on an inverted index. It draws on an indexing method from the field of data science and a word segmentation method from the field of NLP, and solves the problem of querying indefinite words and phrases in evaluation documents. The invention comprises the following steps: 1, data preprocessing is carried out on the document to be queried, word segmentation is carried out using the jieba word segmentation method, and a word dictionary and word frequency information are obtained; 2, an adaptive inverted table is established based on the inverted index principle of the complete reconstruction strategy; 3, using the information of the indefinite words and phrases to be searched, the position of each constituent word is looked up in the adaptive inverted table, the positions of the indefinite words and phrases are identified, and the paragraphs where they are located are indexed, which completes the query function for indefinite words and phrases in evaluation documents. The basic idea of the invention is to segment the text data into words and establish an inverted index so that indefinite words and sentences can be searched quickly, thereby realizing the query function for evaluation documents. The invention has a wide range of application scenarios and therefore high socio-economic value.
Owner:HARBIN INST OF TECH
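
The three steps in this abstract can be illustrated with a small positional inverted index. The following is only a sketch under stated assumptions, not the patented method: the function names `build_positional_index` and `find_phrase` are hypothetical, the input is assumed to be already segmented (for example by jieba, which is not called here), and the "adaptive inverted table" and "complete reconstruction strategy" are not modelled.

```python
from collections import defaultdict


def build_positional_index(paragraphs):
    """Map each token to {paragraph_id: [positions]}.

    `paragraphs` is a list of token lists; the tokens are assumed to come
    from a word segmenter that has already been run.
    """
    index = defaultdict(lambda: defaultdict(list))
    for pid, tokens in enumerate(paragraphs):
        for pos, tok in enumerate(tokens):
            index[tok][pid].append(pos)
    return index


def find_phrase(index, phrase_tokens):
    """Return sorted ids of paragraphs that contain the tokens consecutively."""
    if not phrase_tokens:
        return []
    first, rest = phrase_tokens[0], phrase_tokens[1:]
    hits = []
    for pid, starts in index.get(first, {}).items():
        for start in starts:
            # Every following token must sit at the next position in the same paragraph.
            if all(start + k + 1 in index.get(tok, {}).get(pid, ())
                   for k, tok in enumerate(rest)):
                hits.append(pid)
                break
    return sorted(hits)


paragraphs = [
    ["服务", "态度", "很", "好"],
    ["态度", "服务", "一般"],
    ["服务", "态度", "一般"],
]
index = build_positional_index(paragraphs)
print(find_phrase(index, ["服务", "态度"]))
```

Because the phrase is matched token by token against stored positions, a query of any length costs one lookup per token rather than a scan of every paragraph.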

Chinese word segmentation method and apparatus

Embodiments of the invention disclose a Chinese word segmentation method and apparatus. The method comprises the steps of dividing a text set into a plurality of short sentences and numbering the short sentences; for each Chinese character in the text set, obtaining a first short sentence number list corresponding to a current Chinese character, obtaining a second short sentence number list corresponding to an adjacent Chinese character adjacent to the right side of the current Chinese character, and calculating a degree of co-occurrence according to the first short sentence number list and the second short sentence number list; obtaining an adjacent character set corresponding to the current Chinese character, and calculating a relevant degree of adjacency according to the adjacent character set; determining whether a word consisting of the current Chinese character and the adjacent Chinese character is added into a candidate word set or not according to the degree of co-occurrence and the relevant degree of adjacency; and performing word segmentation on the text set according to the candidate word set. The method is small in calculation amount and high in accuracy when calculating the candidate word set, can effectively improve the accuracy of a word segmentation result and improve the efficiency of word segmentation, does not depend on a corpus dictionary, and can realize unsupervised candidate vocabulary extraction.
Owner:RUN TECH CO LTD BEIJING
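
The candidate-word step described above can be sketched as follows. The abstract does not give the formulas, so both measures here are assumptions chosen for illustration: the degree of co-occurrence is taken as the Jaccard overlap of the two short-sentence number lists, and the degree of adjacency as the share of a character's right-hand neighbours that are the adjacent character. The function names and thresholds are hypothetical, and the final segmentation pass over the text set is omitted.

```python
import re
from collections import Counter, defaultdict


def split_short_sentences(text):
    """Split on common Chinese and ASCII punctuation and on whitespace."""
    return [s for s in re.split(r"[，。！？；、,.!?;\s]+", text) if s]


def candidate_words(text, min_cooc=0.5, min_adj=0.5):
    """Return two-character candidate words (thresholds are illustrative)."""
    sentences = split_short_sentences(text)
    sentence_ids = defaultdict(set)          # char -> numbers of sentences containing it
    right_neighbours = defaultdict(Counter)  # char -> counts of chars seen to its right
    for sid, sentence in enumerate(sentences):
        for i, ch in enumerate(sentence):
            sentence_ids[ch].add(sid)
            if i + 1 < len(sentence):
                right_neighbours[ch][sentence[i + 1]] += 1

    candidates = set()
    for left, neighbours in right_neighbours.items():
        total = sum(neighbours.values())
        for right, count in neighbours.items():
            a, b = sentence_ids[left], sentence_ids[right]
            cooc = len(a & b) / len(a | b)   # assumed co-occurrence measure
            adj = count / total              # assumed adjacency measure
            if cooc >= min_cooc and adj >= min_adj:
                candidates.add(left + right)
    return candidates


print(sorted(candidate_words("机器学习。机器翻译。机器人。")))
```

No dictionary or labelled corpus is consulted, which matches the unsupervised character of the method as described.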

Method and system for constructing emergency knowledge graph based on Chinese word segmentation technology

The invention discloses a method for constructing an emergency knowledge graph based on a Chinese word segmentation technology. The method specifically comprises the following steps: S1, inputting an emergency information text; S2, analyzing elements in the emergency information text in the step S1, extracting key data, and constructing an emergency knowledge base by utilizing the extracted key data; S3, performing word segmentation and judgment on the emergency information text input in the step S1 by adopting a multi-strategy combined Chinese word segmentation algorithm, and outputting a word segmentation result; S4, searching and matching the word segmentation result obtained in the step S3 in the emergency knowledge base by utilizing a retrieval engine, and outputting result data after the matching is successful; and S5, constructing an emergency knowledge graph according to the emergency service system in combination with the result data, and outputting graph result data. A scientific and comprehensive emergency knowledge graph is constructed according to an emergency business system, the data matching speed and the word segmentation precision are improved, problems such as low retrieval efficiency are solved, and the shared application service of emergency knowledge is realized.
Owner:SPEED SPACE TIME INFORMATION TECH CO LTD

Word segmentation method supporting large number of word banks, and computer readable storage medium and system

The invention provides a word segmentation method supporting a large number of word banks, and a computer readable storage medium and a system. The method comprises the following steps: constructing a domain dictionary; constructing an offline word segmentation model based on the domain dictionary; for the original text to be subjected to word segmentation, performing word segmentation through the offline word segmentation model to obtain a first word segmentation result; carrying out to-be-searched word extraction on the original text to be subjected to word segmentation, then carrying out first-level index search and second-level index search in the domain dictionary based on the to-be-searched words, and finally screening second-level index results to extract candidate words; and recombining the candidate words and the first word segmentation result, constructing a directed graph of the original text based on the recombining result, and calculating an optimal word segmentation result based on a shortest path method. According to the method, the word segmentation result in the single field is combined with the big word search result, the directed graph is constructed based on the combination result, the problem of solving the optimal word segmentation scheme is converted into the problem of the optimal path to be quickly solved, and the method is very suitable for segmenting the big words.
Owner:启业云大数据(南京)有限公司
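
The last step, turning segmentation into a shortest-path problem over a directed graph, can be sketched as below. This is an illustration under assumptions rather than the patented procedure: `words` stands for the merged set of first-pass segments and candidate words, every dictionary word costs 1, and a single character outside the set costs `unknown_cost` (a hypothetical penalty), so the cheapest path is the one with the fewest, most dictionary-backed pieces. The two-level index search is not modelled.

```python
def segment_shortest_path(text, words, unknown_cost=2.0):
    """Segment `text` by finding the cheapest path through its word graph.

    Nodes are character boundaries 0..len(text); an edge i -> j exists when
    text[i:j] is in `words`, or when j == i + 1 (single-character fallback).
    """
    n = len(text)
    max_len = max((len(w) for w in words), default=1)
    best = [float("inf")] * (n + 1)  # best[j]: cheapest cost to reach boundary j
    back = [0] * (n + 1)             # back[j]: start of the last piece on that path
    best[0] = 0.0
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            piece = text[i:j]
            if piece in words:
                cost = 1.0
            elif j == i + 1:
                cost = unknown_cost
            else:
                continue
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = i
    # Walk the back-pointers from the end of the text to recover the pieces.
    pieces, j = [], n
    while j > 0:
        i = back[j]
        pieces.append(text[i:j])
        j = i
    return pieces[::-1]


words = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
print(segment_shortest_path("南京市长江大桥", words))
```

Since the graph is acyclic and its nodes are already in topological order, a single left-to-right pass is enough and no general shortest-path routine is needed.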

Intelligent matching system

The invention provides an intelligent matching system which comprises: a data acquisition module used for acquiring user registration input data and user behavior logs published on a network and a system platform; a recommended object modeling module which is used for extracting keywords from each piece of announcement information in the user registration input data, obtaining, for each user, all keywords of interest to the user according to all announcement information followed by the user in the user's behavior log, and obtaining the interest degree of each such keyword according to the user's attention behavior toward each followed announcement in the behavior log; and a recommendation algorithm module which is used for calculating the interest degree of each user in a piece of announcement information according to the keywords extracted from the announcement information and the interest degree of each user in the extracted keywords, and recommending the announcement information to the users with the highest interest degree. The intelligent matching system is high in information recommendation precision and efficiency.
Owner:安徽省优质采科技发展有限责任公司
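
The keyword-interest scoring described above can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the behaviour names and their weights in `BEHAVIOUR_WEIGHT`, the additive scoring, and the function names are hypothetical and do not come from the abstract.

```python
from collections import defaultdict

# Hypothetical weights for attention behaviours found in the log.
BEHAVIOUR_WEIGHT = {"view": 1.0, "favorite": 2.0, "bid": 3.0}


def keyword_interest(behaviour_log, announcement_keywords):
    """Accumulate each user's interest degree per keyword.

    `behaviour_log` is an iterable of (user, announcement_id, behaviour);
    `announcement_keywords` maps an announcement id to its extracted keywords.
    """
    interest = defaultdict(lambda: defaultdict(float))
    for user, ann_id, behaviour in behaviour_log:
        for kw in announcement_keywords.get(ann_id, ()):
            interest[user][kw] += BEHAVIOUR_WEIGHT.get(behaviour, 0.0)
    return interest


def top_users(new_keywords, interest, n=2):
    """Rank users by summed interest in the keywords of a new announcement."""
    scores = {user: sum(kws.get(k, 0.0) for k in new_keywords)
              for user, kws in interest.items()}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [user for user, score in ranked[:n] if score > 0]


log = [("u1", "a1", "view"), ("u2", "a1", "bid"), ("u3", "a2", "favorite")]
keywords = {"a1": {"steel", "pipe"}, "a2": {"cement"}}
print(top_users({"steel"}, keyword_interest(log, keywords)))
```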

Chinese word segmentation method based on Hash algorithm

The invention discloses a Chinese word segmentation method based on a Hash algorithm, and relates to the field of natural language processing. The method comprises the following steps: S1, configuring a word segmentation device on a search engine and establishing a dictionary structure; S2, monitoring the return operation of the user, and obtaining the first character in an input box; S3, inputting the first character into a dictionary for primary searching and screening; S4, forming a tree from all words in the dictionary that share the same first character; S5, placing the second character of each word on the second layer of the tree, and creating a Hash index table; S6, carrying out Hash searching on the remaining characters; S7, after IK reads the new lexicon, notifying the search engine to update; and S8, updating the dictionary information in the memory by the search engine. According to the invention, a dictionary storage mechanism is created in which a Hash search is carried out on the first character, a dictionary structure and an algorithm for carrying out Hash search on the remaining characters via the tree are established, and the search engine is updated by using IK word segmentation, so that the Chinese word segmentation efficiency is improved, the system complexity is reduced, and the index redundancy degree is reduced.
Owner:合肥天毅网络传媒有限公司
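
The dictionary layout in steps S3 to S6 can be approximated with nested hash tables. This is a minimal sketch under assumptions: Python `dict` and `set` stand in for the hash index tables, forward maximum matching is swapped in as the lookup strategy because the abstract does not name one, and `max_len` is a hypothetical cap on word length. The search-engine and IK update steps (S1, S2, S7, S8) are not modelled.

```python
def build_dictionary(words):
    """Build first_char -> second_char -> set of remaining suffixes.

    Single-character words need no entry: the segmenter falls back to
    emitting one character when nothing longer matches.
    """
    table = {}
    for word in words:
        if len(word) >= 2:
            table.setdefault(word[0], {}).setdefault(word[1], set()).add(word[2:])
    return table


def longest_match_end(text, start, table, max_len=8):
    """Return the end index of the longest dictionary word starting at `start`."""
    second_layer = table.get(text[start])           # hash lookup on the first character
    if second_layer and start + 1 < len(text):
        suffixes = second_layer.get(text[start + 1])  # second layer of the tree
        if suffixes is not None:
            for end in range(min(len(text), start + max_len), start + 1, -1):
                if text[start + 2:end] in suffixes:   # hash lookup on the remainder
                    return end
    return start + 1


def segment(text, table):
    """Forward maximum matching over the two-layer hash dictionary."""
    pieces, i = [], 0
    while i < len(text):
        j = longest_match_end(text, i, table)
        pieces.append(text[i:j])
        i = j
    return pieces


table = build_dictionary(["中国", "中国人", "人民", "中华人民共和国"])
print(segment("中华人民共和国中国人", table))
```

Each lookup touches at most three hash tables regardless of dictionary size, which is the property the abstract credits for the efficiency gain.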

A Chinese word segmentation method based on bidirectional lstm, cnn and crf

The invention discloses a Chinese word segmentation method based on bidirectional LSTM, CNN and CRF, which is an improvement and optimization of traditional Chinese word segmentation based on a deep learning algorithm. The specific steps of the method are as follows: preprocessing the initial corpus and extracting the character feature information of the corpus and the corresponding pinyin feature information of each character; using a convolutional neural network to obtain the pinyin feature vector of each character; using the word2vec model to obtain the character feature vector of the text; splicing the pinyin feature vector and the character feature vector to obtain the context information vector, which is fed into the bidirectional LSTM neural network; using a linear-chain conditional random field to decode the output of the bidirectional LSTM into a word segmentation tag sequence; and decoding the word segmentation tag sequence to obtain the word segmentation result. The present invention uses a deep neural network to extract text character features and pinyin features and combines conditional random fields for decoding, which can effectively extract Chinese text features and achieve good results in Chinese word segmentation tasks.
Owner:NANJING UNIV OF POSTS & TELECOMM
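
The final step, decoding a tag sequence into words, is independent of the neural network and can be shown on its own. The abstract does not name the tag set, so this sketch assumes the common B/M/E/S scheme (begin, middle, end, single); the function name `tags_to_words` is hypothetical, and the CNN, word2vec, BiLSTM and CRF stages are not reproduced.

```python
def tags_to_words(chars, tags):
    """Convert per-character B/M/E/S tags into a list of words.

    Malformed sequences are tolerated: a new B or S closes any word that is
    still open, and a trailing unfinished word is emitted as-is.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # close a word left open by a missing E
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(tags_to_words("我爱北京天安门", "SSBEBME"))
```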

Word segmentation method, computer-readable storage medium and system supporting a large number of lexicons

The present invention proposes a word segmentation method, computer-readable storage medium and system supporting a large number of lexicons. The method includes the following steps: constructing a domain dictionary; constructing an offline word segmentation model based on the domain dictionary; for the original text to be segmented, performing word segmentation with the offline word segmentation model to obtain the first word segmentation result; extracting the words to be searched from the original text to be segmented, then performing the first-level index search and the second-level index search in the domain dictionary based on the words to be searched, and finally screening the second-level index results to extract the candidate words; reorganizing the candidate words and the first word segmentation result, constructing the directed graph of the original text based on the reorganization result, and calculating the optimal word segmentation result based on the shortest path method. The present invention combines word segmentation results in a single field with big word search results, constructs a directed graph based on the combined results, and converts the problem of finding the optimal word segmentation scheme into an optimal path problem that can be solved quickly, which is very suitable for segmenting big words.
Owner:启业云大数据(南京)有限公司

A Query Method of Indefinite Length Words and Sentences Based on Inverted Index

A query method for variable-length words and sentences in evaluation documents based on an inverted index, which involves an indexing method in the field of data science and a word segmentation method in the field of NLP, and solves the query problem of variable-length words and sentences in evaluation documents. The steps of the present invention are: 1. Carry out data preprocessing on the document to be queried, and use the jieba word segmentation method to carry out word segmentation processing to obtain word dictionary and word frequency information; 2. Establish an adaptive inverted table based on the inverted index principle of the complete reconstruction strategy; 3. Using the information of the variable-length words and sentences to be searched, look up the position information of each word of those words and sentences in the adaptive inverted table, identify the position information of the variable-length words and sentences, and index the paragraphs where they are located, to complete the query function of variable-length words and sentences in evaluation documents. The basic idea of the present invention is to segment the text data into words, establish an inverted index, and then realize fast searching for words and sentences of indefinite length, so as to realize the query function of evaluation documents. It has a wide range of application scenarios, so it has high socio-economic value.
Owner:HARBIN INST OF TECH

Coding method for clinical examination medical text

The invention provides a coding method for clinical laboratory medicine text, and relates to the field of clinical laboratory medicine. The method comprises the following steps: analyzing and processing a clinical examination medical text to obtain an internal content structure of the clinical examination medical text, and carrying out structured coding on each structure of the clinical examination medical text; and calculating the similarity with each clinical examination medical term in a clinical examination medical term library before carrying out structured coding. Therefore, repeated and similar clinical examination medical terms can be effectively reduced; when the clinical examination medical terms are stored, a source structure based on segmented words is adopted, the segmented words serve as basic units of coding, and then different segmented words in a segmented word library are coded in a combined mode, so that the corresponding clinical examination medical terms are formed; a great storage space can be saved; word segmentation is performed by combining a word segmentation dictionary and a machine learning word segmentation device, so that the workload of manual auditing is reduced, and the word segmentation efficiency is improved; and three mapping modes of full mapping, basic mapping and main segmented word mapping are added, so that the universality is better.
Owner:THE AFFILIATED HOSPITAL OF SOUTHWEST MEDICAL UNIV

A Chinese word segmentation method and device

Embodiments of the invention disclose a Chinese word segmentation method and apparatus. The method comprises the steps of dividing a text set into a plurality of short sentences and numbering the short sentences; for each Chinese character in the text set, obtaining a first short sentence number list corresponding to a current Chinese character, obtaining a second short sentence number list corresponding to an adjacent Chinese character adjacent to the right side of the current Chinese character, and calculating a degree of co-occurrence according to the first short sentence number list and the second short sentence number list; obtaining an adjacent character set corresponding to the current Chinese character, and calculating a relevant degree of adjacency according to the adjacent character set; determining whether a word consisting of the current Chinese character and the adjacent Chinese character is added into a candidate word set or not according to the degree of co-occurrence and the relevant degree of adjacency; and performing word segmentation on the text set according to the candidate word set. The method is small in calculation amount and high in accuracy when calculating the candidate word set, can effectively improve the accuracy of a word segmentation result and improve the efficiency of word segmentation, does not depend on a corpus dictionary, and can realize unsupervised candidate vocabulary extraction.
Owner:RUN TECH CO LTD BEIJING

A Personalized Parallel Word Segmentation Processing System and Processing Method

The invention relates to a personalized concurrent word segmentation processing system and a processing method of the processing system. The system comprises a word segmentation requesting module, a word segmentation module based on a personalized word segmentation dictionary, a word segmentation module based on a general word segmentation dictionary, a control module and a high speed word segmentation processing module. Word segmentation requests of a user are sent simultaneously to the word segmentation module based on the personalized word segmentation dictionary and the word segmentation module based on the general word segmentation dictionary. When the word segmentation module based on the personalized word segmentation dictionary is hit, the word segmentation processing result is sent back to the word segmentation requesting module through the control module, and the request sent to the word segmentation module based on the general word segmentation dictionary is interrupted; otherwise, the control module dynamically updates the personalized word segmentation dictionary according to an earliest and least used principle and the word segmentation processing result of the word segmentation module based on the personalized word segmentation dictionary. The system and method maintain the accuracy rate of the word segmentation while greatly improving the word segmentation efficiency of the system and satisfying the efficient query requirements of a mobile user.
Owner:XIAN UNIV OF POSTS & TELECOMM

Text word segmentation processing method and device, equipment and medium

The invention discloses a text word segmentation processing method and device, equipment and a medium. The method comprises the following steps: collecting a text to be subjected to word segmentation, wherein the text comprises a plurality of suspected words connected in series and the suspected words are composed of pronunciation characters; sequentially traversing all characters in the text to be subjected to word segmentation, ignoring redundant characters formed by continuous repetition in the suspected words during the traversal, converting the characters into words by means of a dictionary tree diagram, and sequentially adding the words into a result list, wherein the dictionary tree diagram comprises a plurality of paths starting from its root node and respectively reaching different tail end nodes, and the nodes through which each path passes store the characters of a single word in sequence; and outputting the words in the result list in sequence as word segmentation results. According to the word segmentation device, word segmentation processing is carried out according to the tree diagram, abnormal repeated characters can be processed in the word segmentation process, redundant characters in the text to be subjected to word segmentation are ignored, and words contained in the text are extracted accurately.
Owner:GUANGZHOU HUADUO NETWORK TECH
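
The traversal described above, a dictionary tree that skips abnormally repeated characters, can be sketched with a nested-dict trie. This is a greedy illustration under assumptions, not the patented device: the function names are hypothetical, a repeated character is skipped only when it cannot extend the current path, and characters that do not start any dictionary word are silently dropped.

```python
END = ""  # marker key for a complete word; cannot clash with a real character


def build_trie(words):
    """Store each word as a root-to-leaf path, one character per node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def segment_with_repeats(text, root):
    """Walk the trie over `text`, ignoring redundant repeated characters."""
    result, node, prev = [], root, None
    for ch in text:
        if ch in node:
            node = node[ch]              # the character extends the current word
        elif ch == prev:
            continue                     # redundant repeat such as the extra "a" in "haaao"
        else:
            if END in node:
                result.append(node[END])  # the current word is complete
            node = root.get(ch, root)     # start the next word (or drop unknown ch)
        prev = ch
    if END in node:
        result.append(node[END])
    return result


trie = build_trie(["ni", "hao", "de"])
print(segment_with_repeats("niiihaaaoodee", trie))
```

Checking `ch in node` before the repeat test keeps legitimate doubled letters intact, so a dictionary word such as "good" still matches. A word that ends with the same character the next word starts with remains ambiguous under this greedy rule.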

Word segmentation method and device, electronic equipment and storage medium

The invention discloses a word segmentation method and device, electronic equipment and a storage medium. The word segmentation method comprises the steps: inputting a word segmentation word stock into a pre-stored baseline word segmentation model, and determining a preliminary word segmentation result of the word segmentation word stock based on the baseline word segmentation model; inputting the preliminary word segmentation result into a pre-trained word segmentation model, and outputting a segmentation result of the preliminary word segmentation result based on the word segmentation model, the segmentation result comprising a segmentation unit, and the segmentation unit comprising a segmentation character and/or a segmentation character set; and combining the segmentation units according to a preset combination rule, and determining a final word segmentation result of the segmented word stock. According to the word segmentation method, the existing baseline word segmentation model is not changed, the convergence rate of the word segmentation model is ensured, the word segmentation efficiency is improved, and the word segmentation result of the baseline word segmentation model is corrected, so that the accuracy of the word segmentation result is improved.
Owner:CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD +1