34 results for "Improve word segmentation accuracy" patented technology

Chinese text parallel data mining method based on hierarchy

The invention relates to a hierarchy-based parallel data mining method for Chinese text, comprising the following steps. Step 1: establishing a vector space model of the Chinese texts: performing word segmentation on the entire Chinese text set to obtain the segmented form of each text and a feature term set containing all terms in the text set with duplicates removed, then using the feature term set to compute the term frequency-inverse document frequency (TF-IDF) of each text, and establishing the text vector space model from the TF-IDF values. Step 2: performing dimension reduction on the feature term vectors of the text vector space model. Step 3: clustering the texts using the hierarchy-based DCURE algorithm. The method segments Chinese text efficiently and with high accuracy, requires no input parameters such as neighborhood radius for the clustering process, can mine irregularly shaped clusters, is insensitive to noise, uses distributed computation, mines massive text collections efficiently, and speeds up the calculation of feature weights.
Owner:UESTC COMSYS INFORMATION
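
Step 1 of the abstract above (segmented texts, a deduplicated feature term set, then TF-IDF weights) can be illustrated with a minimal Python sketch. It assumes the texts are already segmented into token lists and uses the plain tf × log(N/df) weighting; the function name `tfidf_vectors` is hypothetical, and the patent's own weighting, dimension reduction, and DCURE clustering are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(segmented_docs):
    """Return (feature_terms, vectors) for a list of token lists."""
    # Feature term set: every distinct term in the text set, duplicates removed.
    vocab = sorted({term for doc in segmented_docs for term in doc})
    n_docs = len(segmented_docs)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for doc in segmented_docs for term in set(doc))
    vectors = []
    for doc in segmented_docs:
        tf = Counter(doc)
        # Plain tf * idf weighting (an assumption, not the patent's formula).
        vectors.append([
            (tf[term] / len(doc)) * math.log(n_docs / df[term]) if doc else 0.0
            for term in vocab
        ])
    return vocab, vectors


if __name__ == "__main__":
    docs = [["文本", "挖掘"], ["文本", "聚类"]]
    terms, vecs = tfidf_vectors(docs)
    for doc, vec in zip(docs, vecs):
        print(doc, dict(zip(terms, vec)))
```

A term that occurs in every text receives weight 0 under this weighting, which is why segmentation quality directly affects which terms end up discriminative.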

Human-computer interaction intelligent question answering method based on cloud platform

The invention discloses a human-computer interaction intelligent question answering method based on a cloud platform, which comprises the following steps: acquiring voice information input by a user; performing voice-to-text conversion on the voice information to obtain text information; performing word segmentation processing on the text information to obtain keyword extraction results; using a machine learning algorithm to classify the obtained keyword extraction results so as to obtain classification results; using a natural language processing algorithm to perform keyword expansion on the verbs and nouns in the keyword extraction results and, for each verb and noun, taking the expansion result with the largest similarity, wherein all the selected results form a keyword expansion sequence; and performing fuzzy matching in a local database according to the classification results and the keyword expansion sequence. The method can solve the technical problem of low interaction accuracy caused by inaccurate word segmentation, inaccurate keyword expansion, and inaccurate extraction of answers in existing human-computer interaction question answering systems.
Owner:CHANGSHA UNIVERSITY

Field encyclopedia establishment system based on general encyclopedia websites

The invention belongs to the technical field of open knowledge extraction and specifically relates to a field encyclopedia establishment system based on general encyclopedia websites. The system is divided into a plurality of modules, namely an encyclopedia data crawling module, an encyclopedia data preprocessing module, a related entity searching and ranking module and an entity clustering module. The field encyclopedia establishment system based on the general encyclopedia websites has the following beneficial effects: the field encyclopedia is mostly established manually at present, which takes time and labor, and as all related entities cannot be found out manually, the coverage rate is low; instead, the field encyclopedia is established on the basis of the field related entities found out by the field encyclopedia establishment system, and in this way, the labor of establishing the field encyclopedia can be greatly reduced and the coverage rate can be greatly increased; meanwhile, the field encyclopedia established by the field encyclopedia establishment system is greatly convenient for users to obtain the knowledge in specified fields; complex searching and screening processes are omitted, and the pattern that a user passively searches for information is changed into the pattern that the system initiatively provides information.
Owner:FUDAN UNIV

Method and system for blind people to read Chinese character

The invention provides a method and a system for blind people to read Chinese characters, and relates to the technical field of natural language processing and the technical field of disabled-oriented human-computer interaction. The method comprises the following steps: obtaining a Chinese language text, carrying out a word segmentation operation on the Chinese language text to generate a Chinese character string, converting each word in the Chinese character string into corresponding Pinyin by referring to part-of-speech tagging obtained by word segmentation through a pronouncing dictionary, a polyphone dictionary and word frequency information, and connecting the Pinyin into a Pinyin string; looking up a Pinyin and blind character contrast dictionary, converting the Pinyin string into a blind character string, carrying out braille word segmentation on the blind character string through a word segmentation model to generate initial braille segmentation words, fusing the Chinese character string with the initial braille segmentation words to generate new braille segmentation words, and regulating the new braille segmentation words according to a braille segmentation word ligature rule; and carrying out braille tone marking on the new braille segmentation words regulated according to the braille segmentation word ligature rule to generate final braille segmentation words, and displaying the final braille segmentation words.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Address similarity calculation method and device, equipment and storage medium

The invention discloses an address similarity calculation method and device, equipment and a storage medium. A solution is proposed for the problems that address matching has complex rules, that existing matching algorithms have low retrieval speed and accuracy, and that address matching efficiency is low. Input address information is expressed as a suitable initial vector; an address similarity calculation model based on a twin (Siamese) neural network, combined with a gradient descent algorithm on a ternary (triplet) loss function, is used to obtain a feature vector from the initial address vector; finally, the cosine distance or L2 distance between the feature vector and each address vector in a standard address data set is calculated, and the known address vector closest to the input address vector is obtained, so that the address matching rule is simplified, the preferred accuracy of the same address is improved, and the retrieval speed and accuracy of the matching algorithm are further improved.
Owner:SHANGHAI DONGPU INFORMATION TECH CO LTD
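
The final retrieval step described above (compare a feature vector with the vectors of a standard address set and return the closest one) can be sketched in a few lines of Python. The sketch assumes the feature vectors have already been produced by some model; the neural network itself is not shown, and the names `cosine_similarity` and `nearest_address` as well as the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_address(query_vec, standard_vectors):
    """Return the key of the standard address whose vector is most similar."""
    return max(
        standard_vectors,
        key=lambda name: cosine_similarity(query_vec, standard_vectors[name]),
    )


if __name__ == "__main__":
    # Toy 2-dimensional feature vectors standing in for model output.
    standard = {"address A": [1.0, 0.0], "address B": [0.0, 1.0]}
    print(nearest_address([0.9, 0.1], standard))
```

A linear scan like this is only reasonable for small standard sets; a large address set would normally call for an approximate nearest-neighbor index, which the abstract does not discuss.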

Method and device for segmentation on basis of webpage content classification

The embodiment of the invention provides a method and a device for segmentation on the basis of webpage content classification. The method comprises the following steps: extracting the text information of webpage contents in search resources; dividing the text information into classes according to the classes of the webpage contents; and segmenting the text information according to the segmentation dictionaries corresponding to the classes of the text information. According to the embodiment of the invention, the text information of the webpage contents in the search resources is divided into classes, and the text information is segmented on the basis of the segmentation dictionaries corresponding to those classes, so as to better adapt to the language characteristics of different classes; meanwhile, the segmentation accuracy for different classes is also improved, and optimal processing of local segmentation is realized. Moreover, the improved segmentation accuracy brings results closer to the user's intention and improves the user experience, which reduces operations such as re-entering and re-searching by the user and makes operation simpler; meanwhile, the device has to respond to fewer user operations, and the consumption of the device's system resources is reduced.
Owner:BEIJING QIHOO TECH CO LTD +1
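
The idea of choosing a segmentation dictionary by content class can be illustrated as follows. The abstract does not say which matching algorithm is applied to each dictionary, so this sketch assumes simple forward maximum matching; the category names, dictionary contents, and function names are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical per-category segmentation dictionaries.
CATEGORY_DICTS = {
    "tech": {"苹果手机"},
    "food": {"苹果"},
}


def segment_by_category(text, category):
    """Segment text with the dictionary that belongs to its content class."""
    return forward_max_match(text, CATEGORY_DICTS[category])


if __name__ == "__main__":
    print(segment_by_category("苹果手机", "tech"))
    print(segment_by_category("苹果手机", "food"))
```

The same string is segmented differently depending on the class of the page it came from, which is the effect the abstract attributes to class-specific dictionaries.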

Multi-strategy integration standard terminology processing method for oil and gas pipeline field

The invention discloses a multi-strategy integration standard terminology processing method for the oil and gas pipeline field and relates to the technical field of linguistic analysis and pipeline systems. The method is characterized by mainly comprising three modules as follows: 1) corpus preprocessing in the oil and gas pipeline field and text segmentation result optimization are performed; 2) term construction is realized in forms of a single algorithm and combination of multiple algorithms respectively; 3) obtained terms are filtered according to summarized rules, junk terms and conventional terms are rejected, and term processing results are optimized. The overall process is as follows: 1) corpus preprocessing in the oil and gas pipeline field and text segmentation result optimization; 2) a term construction method in the oil and gas pipeline field; 3) term construction optimization in the oil and gas pipeline field. With the adoption of the method, the segmentation accuracy is improved, and the term extraction precision ratio and the technical field correlation of final relative terms are improved.
Owner:PIPECHINA SOUTH CHINA CO

Power grid equipment word segmentation dictionary and fault case library construction method

The invention discloses a power grid equipment word segmentation dictionary and a fault case library construction method. The method comprises the steps of: constructing a word segmentation dictionary for the power grid field; carrying out format conversion and word segmentation of fault case data; analyzing the text data and generating structured power grid equipment fault cases, feature tags, keyword clouds and association rules by employing a plurality of technical means; and designing a relational database schema for the information, taking a report as the primary key, and storing the text information together with information retained during preprocessing, such as pictures and authors, in the library to form a power grid equipment fault case library. According to the method, the word segmentation accuracy for power grid field text is improved, the structured case database makes retrieval by case content more accurate, the feature tags in the fault case database serve as item sets from which effective association rules of faults are sorted and mined, the method can be used for fault early warning, and a gap in the application of text analysis technology to the power grid field is filled. The application value of corpora in the power grid field is improved, and the consulting cost is reduced.
Owner:ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2

Participle processing method and system for customer address information

The invention provides a word segmentation processing method and system for customer address information. In the word segmentation processing method, an administrative region matching list used for defining the codes of all administrative regions is saved in advance, and the method comprises the following steps: determining current customer address information to be processed; processing the current customer address information to be processed, so as to obtain customer address information meeting processing standards; matching each piece of sub-address information in the customer address information meeting the processing standards with administrative regions in the administrative region matching list according to the longest matching principle; when the first piece of sub-address information in the customer address information meeting the processing standards is matched with the first administrative region in the administrative region matching list, and the matching result is unique, determining the first code of the first piece of sub-address information; and acquiring the codes of all pieces of sub-address information in the customer address information meeting the processing standards, and generating normalized customer address information. According to the invention, manually recorded customer address information is normalized, so that the word segmentation accuracy of a banking system is improved.
Owner:CHINA CONSTRUCTION BANK
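
The longest matching principle against a pre-saved administrative region list can be sketched as below. This is only an illustration under stated assumptions: the region codes are made up for the example, the uniqueness check and the preprocessing step from the abstract are omitted, and the function name `encode_address` is hypothetical.

```python
def encode_address(address, region_codes):
    """Match leading address parts to region codes, longest candidate first.

    Returns (codes, remainder), where remainder is the unmatched tail
    (for example street-level detail).
    """
    codes, i = [], 0
    max_len = max(map(len, region_codes), default=0)
    while i < len(address):
        for size in range(min(max_len, len(address) - i), 0, -1):
            code = region_codes.get(address[i:i + size])
            if code is not None:
                codes.append(code)
                i += size
                break
        else:
            # No administrative region starts here: stop matching.
            break
    return codes, address[i:]


if __name__ == "__main__":
    # Made-up codes, not real administrative division codes.
    regions = {"北京": "R01", "北京市": "R01", "朝阳区": "R0101"}
    print(encode_address("北京市朝阳区建国路1号", regions))
```

Trying the longest candidate first is what lets "北京市" win over the shorter alias "北京" when both are listed.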

Text word segmentation analysis method and system for medical record text data structuring

The invention discloses a text word segmentation analysis method and system for medical record text data structuring, belongs to the technical field of medical record data mining, and aims to overcome the low mining efficiency, poor accuracy and inability to satisfy medical record entity mapping relationships found in traditional medical record data processing. The method comprises the following steps: constructing a medical lexicon based on medical text data; generating all possible words of the medical text data to be segmented based on the lexicon dictionary, and constructing a directed acyclic graph from all of these words; based on the medical lexicon and the directed acyclic graph, searching for the segmentation combination with the maximum sentence word frequency by finding a maximum return-to-zero path through dynamic programming, to obtain a word set with preamble sequence and part of speech; analyzing the word set through a ternary relation model to obtain a ternary mapping relation data set; and carrying out standardization processing on the ternary mapping relation data set to obtain a binary mapping relation data set.
Owner:山东健康医疗大数据有限公司
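
The combination of a word directed acyclic graph with dynamic programming can be illustrated with a small Python sketch. It assumes a frequency lexicon and scores a path by the sum of log relative frequencies, which is one common way to realize "maximum word frequency" segmentation; the lexicon entries, counts, and function names are hypothetical, and the patent's part-of-speech output and relation models are not covered.

```python
import math


def build_dag(sentence, freq):
    """Map each start index to the end indices of lexicon words starting there."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i + 1, len(sentence) + 1) if sentence[i:j] in freq]
        # Fall back to a single character when no lexicon word starts at i.
        dag[i] = ends or [i + 1]
    return dag


def best_segmentation(sentence, freq):
    """Pick the path through the DAG with the highest total log frequency."""
    total = sum(freq.values()) or 1
    dag = build_dag(sentence, freq)
    n = len(sentence)
    best = {n: (0.0, n)}  # best[i] = (score of best path from i, next index)
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(freq.get(sentence[i:j], 1) / total) + best[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words


if __name__ == "__main__":
    lexicon = {"高血压": 50, "高": 10, "血压": 30, "患者": 40}  # toy counts
    print(best_segmentation("高血压患者", lexicon))
```

Filling the table from the end of the sentence backwards means each position is scored once, so the search stays linear in the number of DAG edges.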

Animal product safety event text classification method based on multi-level structure dictionary

The invention relates to an animal product safety event text classification method based on a multi-level structure dictionary. The method comprises the steps of: performing word segmentation and stop word removal on a text to be processed; assigning a counter to each remaining word; matching the remaining segmented words of each text against the vocabulary of a constructed multi-level structure dictionary of animal product safety events, and incrementing by 1 the count of each successfully matched word; and finally, sorting the words in descending order of count, and classifying the text into the level and category of the dictionary in which the most frequent word is located. The method can assist a word segmentation tool in segmenting texts so as to improve the accuracy of entity recognition, can classify Chinese texts according to the hierarchical structure of the animal product safety event dictionary, and can also realize hierarchical classification under different requirements to obtain the level and category relationships among the texts. In addition, a large amount of manpower and time is saved, and the accuracy is significantly improved.
Owner:CHINA AGRI UNIV
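
The counting scheme in this abstract (drop stop words, count dictionary hits, classify by the most frequent matched word) is simple enough to sketch directly. The sketch assumes the text is already segmented; the dictionary entries, the level and category labels, and the function name `classify` are hypothetical.

```python
from collections import Counter


def classify(tokens, stopwords, dictionary):
    """Return the (level, category) of the most frequent dictionary word, or None.

    `dictionary` maps each vocabulary word to its (level, category) position
    in the multi-level structure.
    """
    counts = Counter(
        tok for tok in tokens if tok not in stopwords and tok in dictionary
    )
    if not counts:
        return None
    top_word, _ = counts.most_common(1)[0]
    return dictionary[top_word]


if __name__ == "__main__":
    dictionary = {
        "猪肉": ("level 1", "meat"),
        "瘦肉精": ("level 2", "additive"),
    }
    tokens = ["猪肉", "的", "瘦肉精", "猪肉"]
    print(classify(tokens, {"的"}, dictionary))
```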

Method for extracting key elements from natural language input of user

The invention relates to a method for extracting key elements from natural language input of a user. The method includes following steps: performing semantic matching on first natural language input of the user according to a first semantic knowledge library to recognize overall semantic meaning; selecting a limiting knowledge library; shrinking the limiting knowledge library by determining entries correlated with the total semantic meaning in the limiting knowledge library and removing other entries; performing mechanical word segmentation on the first natural language input to generate a word segmentation result set of the first natural language input; using the limiting knowledge library after being shrunk to match word segmentation results to determine a word segmentation result in the word segmentation result set; selecting one or multiple words from the word segmentation result as the key elements. By the method, word segmentation efficiency and correctness can be improved greatly, so that determination of correct key elements is guaranteed.
Owner:上海对岸信息科技有限公司

A method and system for blind people to read Chinese characters

The invention provides a method and a system for blind people to read Chinese characters, and relates to the technical field of natural language processing and the technical field of disabled-oriented human-computer interaction. The method comprises the following steps: obtaining a Chinese language text, carrying out a word segmentation operation on the Chinese language text to generate a Chinese character string, converting each word in the Chinese character string into corresponding Pinyin by referring to part-of-speech tagging obtained by word segmentation through a pronouncing dictionary, a polyphone dictionary and word frequency information, and connecting the Pinyin into a Pinyin string; looking up a Pinyin and blind character contrast dictionary, converting the Pinyin string into a blind character string, carrying out braille word segmentation on the blind character string through a word segmentation model to generate initial braille segmentation words, fusing the Chinese character string with the initial braille segmentation words to generate new braille segmentation words, and regulating the new braille segmentation words according to a braille segmentation word ligature rule; and carrying out braille tone marking on the new braille segmentation words regulated according to the braille segmentation word ligature rule to generate final braille segmentation words, and displaying the final braille segmentation words.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Word segmentation model building method and apparatus

Embodiments of the invention provide a word segmentation model building method and apparatus. The method comprises the steps of performing alignment on characters in a first corpus and words in a second corpus to obtain an alignment relationship between the first corpus and the second corpus, wherein the first corpus is a corpus without a space division boundary between words; determining boundary information of the words in the first corpus according to the alignment relationship between the first corpus and the second corpus; and performing training according to the boundary information of the words in the first corpus to generate a word segmentation model. According to the word segmentation model building method and apparatus provided by the embodiments of the invention, the word segmentation accuracy, especially the word segmentation accuracy of the corpus without the space division boundary between the words can be improved.
Owner:新译信息科技(深圳)有限公司

Standard word library word segmentation method, device and equipment and computer readable storage medium

The invention discloses a standard word library word segmentation method, device and equipment and a computer readable storage medium, and the method comprises the steps of: splitting the standard words in a standard word library to be segmented into single Chinese characters to form a Chinese character library, and generating the adjacent frequency between every two Chinese characters in the Chinese character library; carrying out a merging operation on the Chinese characters in the Chinese character library according to the adjacent frequencies to generate Chinese character groups, and updating the adjacent frequencies among the Chinese characters in the Chinese character library after the merging operation; judging whether the maximum of the adjacent frequencies among the Chinese characters in the updated Chinese character library is smaller than a preset threshold value; if not, executing again the step of carrying out the merging operation on the Chinese characters in the Chinese character library according to the adjacent frequencies; and if yes, forming the standard segmented words of the standard word library to be segmented from the Chinese character groups. According to the scheme, the standard words in the standard word library to be segmented are segmented through the adjacent frequencies among the Chinese characters, and the word segmentation accuracy of the standard word library to be segmented can be effectively improved.
Owner:PING AN TECH (SHENZHEN) CO LTD
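
The merge loop described above resembles a byte-pair-style procedure: count adjacent pairs, merge the most frequent pair, recount, and stop once the highest adjacent frequency falls below the threshold. The following Python sketch is written under that reading; merging exactly one pair per round is an assumption, and the function name `segment_lexicon` and the sample words are hypothetical.

```python
from collections import Counter


def segment_lexicon(standard_words, threshold):
    """Merge adjacent units until the top adjacent frequency is below threshold."""
    # Every standard word starts as a sequence of single characters.
    seqs = [list(word) for word in standard_words]
    while True:
        pairs = Counter(
            (seq[i], seq[i + 1]) for seq in seqs for i in range(len(seq) - 1)
        )
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < threshold:
            break
        merged_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                    out.append(left + right)  # merge the pair into one group
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged_seqs.append(out)
        seqs = merged_seqs
    return seqs


if __name__ == "__main__":
    print(segment_lexicon(["血压计", "血压", "高血压"], threshold=2))
```

Each round shortens at least one sequence, so the loop terminates even when the threshold is set very low.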

Standard lexicon word segmentation method, device and equipment and computer readable storage medium

The invention provides a standard word library word segmentation method, device and equipment and a computer readable storage medium. The method comprises: scattering the standard words in a standard word library to be segmented into single Chinese characters to form an original Chinese character library, and calculating a first adjacent probability and a first Bayesian probability between every two Chinese characters in the original Chinese character library; performing a Chinese character merging operation on the original Chinese character library according to the first adjacent probability and the first Bayesian probability to obtain a to-be-adjusted Chinese character library; judging whether the minimum adjacent probability among the second adjacent probabilities between every two Chinese characters in the to-be-adjusted Chinese character library is greater than a preset threshold value; if yes, executing, according to a second adjacent probability and a second Bayesian probability, a Chinese character combination operation on the to-be-adjusted Chinese character library until the minimum adjacent probability among the adjacent probabilities between every two Chinese characters in the obtained target Chinese character library is smaller than or equal to the preset threshold value; otherwise, outputting the combined Chinese character groups as standard words. According to the method, the word segmentation accuracy of the standard lexicon and the universality of the standard lexicon are improved.
Owner:PING AN TECH (SHENZHEN) CO LTD

Word segmentation method, device and equipment based on multilevel dictionary and readable storage medium

The invention discloses a word segmentation method based on multi-level dictionaries, and the method comprises the steps of: employing at least two dictionaries to assist a word segmentation model in word segmentation; when representing a character, generating both a conventional vector representation and a feature representation of the character in the at least two dictionaries; and finally, determining a word-forming label of the character according to the vector representation and the feature representation. According to the method, by distinguishing the status and importance of different words, the word segmentation performance of the whole scheme is improved, and the domain adaptability and the word segmentation accuracy are improved. In addition, the invention further provides a word segmentation device and equipment based on the multi-level dictionary and a readable storage medium, whose technical effects correspond to those of the method.
Owner:SUZHOU UNIV

Efficient intelligent question-answering system for knowledge in artificial intelligence field

The invention relates to an efficient intelligent question-answering system for knowledge in the field of artificial intelligence. The system comprises a preparation module and a question-answering module, wherein the preparation module comprises a data collection module, a model training module and a question and answer system knowledge structure construction module; the question-answering module comprises an input preprocessing module, a question-answering module based on a knowledge base, a question-answering module based on a text library and a question recommendation module based on the knowledge base. Through the preparation module and the question-answering module, the word segmentation accuracy of user questions, knowledge base questions and text base questions is greatly enhanced, and the overall accuracy of the full question-answering system is greatly improved, so that the user experience is greatly improved, and the knowledge question-answering service with low cost, high efficiency and high user experience is realized.
Owner:SOUTH CHINA UNIV OF TECH

Sensitive information query method based on deep learning

The invention discloses a sensitive information query method based on deep learning, which comprises the following steps: step 1, carrying out word segmentation processing on a text to be queried, and then converting the text to be queried into a feature vector; and step 2, inputting the feature vector obtained in step 1 into a neural network model, outputting the similarity with a sensitive word library, and, if the similarity is higher than a threshold value, judging that the text to be queried contains sensitive words and outputting the corresponding sensitive word result. According to the sensitive information query method based on deep learning, on one hand, by setting a word segmentation rule and training and updating the word segmentation rule, accurate word segmentation processing can be flexibly carried out on the text, and the word segmentation accuracy is improved; and on the other hand, by introducing artificial intelligence technology, adopting a deep learning method and constructing a neural network model, the text is accurately and effectively recognized, and the query accuracy and query efficiency are improved.
Owner:盐城数智科技有限公司

A word segmentation processing method and system for customer address information

The invention provides a word segmentation processing method and system for customer address information. In the word segmentation processing method, an administrative region matching list used for defining the codes of all administrative regions is saved in advance, and the method comprises the following steps: determining current customer address information to be processed; processing the current customer address information to be processed, so as to obtain customer address information meeting processing standards; matching each piece of sub-address information in the customer address information meeting the processing standards with administrative regions in the administrative region matching list according to the longest matching principle; when the first piece of sub-address information in the customer address information meeting the processing standards is matched with the first administrative region in the administrative region matching list, and the matching result is unique, determining the first code of the first piece of sub-address information; and acquiring the codes of all pieces of sub-address information in the customer address information meeting the processing standards, and generating normalized customer address information. According to the invention, manually recorded customer address information is normalized, so that the word segmentation accuracy of a banking system is improved.
Owner:CHINA CONSTRUCTION BANK

Standard terminology processing method for multi-strategy fusion in the field of oil and gas pipelines

The invention discloses a multi-strategy integration standard terminology processing method for the oil and gas pipeline field and relates to the technical field of linguistic analysis and pipeline systems. The method is characterized by mainly comprising three modules as follows: 1) corpus preprocessing in the oil and gas pipeline field and text segmentation result optimization are performed; 2) term construction is realized in forms of a single algorithm and combination of multiple algorithms respectively; 3) obtained terms are filtered according to summarized rules, junk terms and conventional terms are rejected, and term processing results are optimized. The overall process is as follows: 1) corpus preprocessing in the oil and gas pipeline field and text segmentation result optimization; 2) a term construction method in the oil and gas pipeline field; 3) term construction optimization in the oil and gas pipeline field. With the adoption of the method, the segmentation accuracy is improved, and the term extraction precision ratio and the technical field correlation of final relative terms are improved.
Owner:PIPECHINA SOUTH CHINA CO

Chinese word segmentation method and device

The invention discloses a Chinese word segmentation method and device. The method comprises the steps of: receiving first target text information sent by a user; carrying out data mapping on the first target text information through a first classifier to obtain corresponding first target category information; and performing a preset inquiry operation according to the first target category information and returning an inquiry result to the user. By subjecting the first target text information sent by the user to data mapping through the first classifier, the corresponding first target category information is obtained, so that the purpose of performing the preset inquiry operation according to the first target category information is achieved, the technical effect of improving the word segmentation accuracy is achieved, and the problem of low accuracy of Chinese word segmentation in related technologies is solved.
Owner:DATAGRAND TECH INC

Power grid power failure address matching method based on word bank two-way maximum matching method

The invention provides a power grid power failure address matching method based on a word bank two-way maximum matching method, which comprises the following steps: 1, constructing a power failure address element library which comprises an address element word bank, a stop word bank and a synonym bank; 2, preprocessing a to-be-matched address text by utilizing the stop word bank and the synonym bank; 3, performing word segmentation on the to-be-matched address text by utilizing a bidirectional maximum matching word segmentation method, and segmenting out an address element sequence of a to-be-matched system; and 4, comparing the address element sequences of the to-be-matched systems according to an address element matching rule, judging whether the address element sequences match, and listing the differing items if they do not. By dynamically maintaining the power grid power failure address element library and performing abbreviation filling processing on the address elements, the word segmentation recognition rate for address text can be improved, address element matching of a single address item can be processed, the address element matching problem of multiple address items can also be processed, and the address element matching accuracy can be effectively improved.
Owner:STATE GRID HUBEI ELECTRIC POWER RES INST +1
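
Step 3 above relies on bidirectional maximum matching, which runs a forward and a backward greedy pass over the same word bank and keeps the better result. The abstract does not state how the two passes are reconciled, so this sketch assumes a common heuristic (fewer tokens first, then fewer single-character tokens, with ties going to the backward pass). The word bank and function names are hypothetical, and a generic sentence is used instead of real address data.

```python
def forward_mm(text, words, max_len):
    """Greedy longest match scanning left to right."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in words:
                tokens.append(text[i:i + size])
                i += size
                break
    return tokens


def backward_mm(text, words, max_len):
    """Greedy longest match scanning right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            if size == 1 or text[j - size:j] in words:
                tokens.append(text[j - size:j])
                j -= size
                break
    return tokens[::-1]


def bidirectional_mm(text, words):
    """Keep the pass with fewer tokens, then fewer single-character tokens."""
    max_len = max(map(len, words), default=1)
    fwd = forward_mm(text, words, max_len)
    bwd = backward_mm(text, words, max_len)

    def score(tokens):
        return (len(tokens), sum(len(tok) == 1 for tok in tokens))

    return fwd if score(fwd) < score(bwd) else bwd


if __name__ == "__main__":
    bank = {"研究", "研究生", "生命", "起源"}
    print(bidirectional_mm("研究生命起源", bank))
```

In this example the forward pass greedily takes "研究生" and strands a single character, while the backward pass does not, so the backward result is kept.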

Intelligent Chinese word segmentation method based on statistics and deep learning

Publication number: CN110414002A (Active). Tags: Accurate word segmentation; Word segmentation is fast; Energy efficient computing; Special data processing applications; Chinese word; A domain
The invention discloses an intelligent Chinese word segmentation method based on statistics and deep learning. The method comprises the following steps: constructing a domain term set; selecting a word segmentation method; and making a word segmentation decision. The method has the advantages that a word segmentation model combining a statistics-based word segmentation method with deep learning technology is adopted, the application range is wide, accurate word segmentation can be conducted on professional words in a professional field, the algorithm is simple, and the word segmentation speed is high.
Owner:SHANDONG UNIV OF SCI & TECH

Word segmentation method, device, equipment and readable storage medium based on multi-level dictionary

This application discloses a word segmentation method based on multi-level dictionaries. The method uses at least two dictionaries to assist the word segmentation model. When representing a character, it generates not only a conventional vector representation but also the character's feature representation in each of the at least two dictionaries, and finally determines the word-forming label of the character from the vector representation and the feature representations. By distinguishing the status and importance of different words, the method improves the overall word segmentation performance, domain adaptability and word segmentation accuracy. In addition, the present application also provides a word segmentation apparatus, device and readable storage medium based on a multi-level dictionary, whose technical effects correspond to those of the above method.
Owner:SUZHOU UNIV
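
One way to picture the per-dictionary feature representation described above is a small 0/1 vector per character per dictionary. The sketch below is an assumption made for illustration only: the abstract does not specify the encoding, so the B/M/E/S (begin, middle, end, single) layout, the function name `dictionary_features`, and the two sample dictionaries are all hypothetical. In a full model these features would be combined with the character's learned vector before predicting the word-forming label.

```python
def dictionary_features(text, dictionaries, max_len=4):
    """Return, for each character, a concatenation of one [B, M, E, S]
    0/1 vector per dictionary.

    B/M/E mark that the character begins, is inside, or ends some
    dictionary word found in the text; S marks a one-character word.
    """
    feats = [[0] * (4 * len(dictionaries)) for _ in text]
    for d, lexicon in enumerate(dictionaries):
        base = 4 * d  # offset of this dictionary's slot in the vector
        for i in range(len(text)):
            for size in range(1, min(max_len, len(text) - i) + 1):
                if text[i:i + size] not in lexicon:
                    continue
                if size == 1:
                    feats[i][base + 3] = 1             # S
                else:
                    feats[i][base + 0] = 1             # B
                    feats[i + size - 1][base + 2] = 1  # E
                    for k in range(i + 1, i + size - 1):
                        feats[k][base + 1] = 1         # M
    return feats


# Hypothetical dictionaries: a general one and a domain-specific one.
general = {"大学"}
domain = {"苏州", "苏州大学"}
features = dictionary_features("苏州大学", [general, domain])
```

Keeping a separate slot for each dictionary lets a downstream model weight a match in the domain dictionary differently from a match in the general one, which is the kind of distinction the abstract attributes to the multi-level design.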

Word segmentation method and system for omnimedia science popularization window

The invention discloses a word segmentation method and system for an omnimedia science popularization window. The word segmentation method comprises the following steps: obtaining a character sequence; inputting the character sequence into an estimation module for estimation and determining a goodness value; inputting the goodness value and the character sequence into a selection module for screening and determining the segmentation form with the maximum goodness value, wherein this segmentation form is a hierarchical structure used to segment the character sequence into words; judging, according to the segmentation form with the maximum goodness value, whether iterative processing is needed; and, if so, inputting that segmentation form and the character sequence into an adjustment module for adjustment, determining an adjusted character sequence, and updating the statistical information in the evaluation module with the adjusted character sequence. The word segmentation method and system provided by the invention improve word segmentation precision.
Owner:北京千松科技发展有限公司

Information extraction method and system for the International Telecommunication Union (ITU) Radio Regulations

The invention discloses an information extraction method and system for the ITU Radio Regulations. The method comprises the steps of: preprocessing the text of the ITU Radio Regulations and building a database that records all clauses; identifying from the database all clauses that may have a relationship with other clauses; for each such clause, analyzing the specific type of relationship between it and each clause associated with it; and, using a natural language processing method based on rule matching, extracting the numbers of the associated clauses and writing those numbers and the specific relationship types into the corresponding positions in the database. The method extracts named entities and entity relationships from the ITU Radio Regulations using a knowledge-engineering-based NLP technique; the relationships among the clauses are sorted out automatically, helping satellite network operators quickly track and master the radio rules.
Owner:NAT SPACE SCI CENT CAS +1
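
The rule-matching step described above can be illustrated with a few regular expressions. This is a hedged sketch rather than the patented rule set: it assumes clause numbers are written in the form `No. 5.340`, and the cue phrases, the relationship type names, the 40-character look-behind window, and the sample sentence are all invented for illustration.

```python
import re

# Assumed reference format: "No. 9.21" or "Nos. 5.43A".
REF = re.compile(r"\bNos?\.\s*(\d+\.\d+[A-Z]*)")

# Hypothetical cue phrases mapped to hypothetical relationship types.
CUES = [
    ("exception", re.compile(r"\bexcept as provided (?:in|under)\b", re.I)),
    ("condition", re.compile(r"\bsubject to\b", re.I)),
    ("procedure", re.compile(r"\bin accordance with\b", re.I)),
]


def extract_relations(clause_id, text, window=40):
    """Return (source clause, relationship type, target clause) triples."""
    relations = []
    for match in REF.finditer(text):
        target = match.group(1)
        if target == clause_id:
            continue  # ignore self-references
        # Look only at the text shortly before the reference for a cue.
        context = text[max(0, match.start() - window):match.start()]
        rel_type = "reference"  # default when no cue phrase is found
        for name, cue in CUES:
            if cue.search(context):
                rel_type = name
                break
        relations.append((clause_id, rel_type, target))
    return relations


# Invented sample sentence, not quoted from any real clause.
sample = ("Subject to agreement obtained under No. 9.21, stations may "
          "operate in accordance with No. 5.43A.")
triples = extract_relations("5.100", sample)
```

Each triple corresponds to one row that the described method would write back into the clause database: the clause being analyzed, the relationship type, and the number of the associated clause.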