
34 results about How to "Improve word segmentation accuracy" patented technology

Human-computer interaction intelligent question answering method based on cloud platform

The invention discloses a human-computer interaction intelligent question answering method based on a cloud platform, which comprises the following steps: acquiring voice information input by a user; performing voice-to-text conversion on the voice information to obtain text information; performing word segmentation on the text information to obtain keyword extraction results; using a machine learning algorithm to classify the keyword extraction results to obtain classification results; using a natural language processing algorithm to perform keyword expansion on the verbs and nouns in the keyword extraction results and, for each verb and noun, taking the expansion result with the largest similarity, wherein the selected results together form a keyword expansion sequence; and performing fuzzy matching in a local database according to the classification results and the keyword expansion sequence. The method addresses the low interaction accuracy caused by inaccurate word segmentation, inaccurate keyword expansion, and inaccurate answer extraction in existing human-computer interaction question answering systems.
Owner:CHANGSHA UNIVERSITY

Field encyclopedia establishment system based on general encyclopedia websites

The invention belongs to the technical field of open knowledge extraction and specifically relates to a field encyclopedia establishment system based on general encyclopedia websites. The system is divided into four modules: an encyclopedia data crawling module, an encyclopedia data preprocessing module, a related entity searching and ranking module, and an entity clustering module. At present, field encyclopedias are mostly established manually, which costs time and labor, and because not all related entities can be found manually, the coverage rate is low. With this system, the field encyclopedia is instead established from the field-related entities that the system finds, which greatly reduces the labor of establishing the encyclopedia and greatly increases the coverage rate. The resulting field encyclopedia also makes it much more convenient for users to obtain knowledge in a specified field: complex searching and screening processes are omitted, and the pattern in which a user passively searches for information is changed into one in which the system proactively provides information.
Owner:FUDAN UNIV

Method and system for blind people to read Chinese character

The invention provides a method and a system for blind people to read Chinese characters, and relates to the technical field of natural language processing and the technical field of disabled-oriented human-computer interaction. The method comprises the following steps: obtaining a Chinese language text, carrying out a word segmentation operation on the Chinese language text to generate a Chinese character string, converting each word in the Chinese character string into corresponding Pinyin by referring to part-of-speech tagging obtained by word segmentation through a pronouncing dictionary, a polyphone dictionary and word frequency information, and connecting the Pinyin into a Pinyin string; looking up a Pinyin and blind character contrast dictionary, converting the Pinyin string into a blind character string, carrying out braille word segmentation on the blind character string through a word segmentation model to generate initial braille segmentation words, fusing the Chinese character string with the initial braille segmentation words to generate new braille segmentation words, and regulating the new braille segmentation words according to a braille segmentation word ligature rule; and carrying out braille tone marking on the new braille segmentation words regulated according to the braille segmentation word ligature rule to generate final braille segmentation words, and displaying the final braille segmentation words.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Method and device for segmentation on basis of webpage content classification

The embodiment of the invention provides a method and a device for segmentation on the basis of webpage content classification. The method comprises the following steps: extracting the text information of webpage contents in search resources; dividing the text information into classes according to the classes of the webpage contents; and segmenting the text information according to the segmentation dictionaries corresponding to the classes of the text information. Because the text information is classified and then segmented with the dictionary for its class, segmentation adapts better to the language characteristics of different classes, the segmentation accuracy for each class is improved, and local segmentation is optimized. Moreover, more accurate segmentation brings results closer to the user's intent and improves the user experience, which reduces user operations such as re-entering queries and searching again and makes operation simpler; at the same time, the device has fewer user operations to respond to, so its consumption of system resources is reduced.
Owner:BEIJING QIHOO TECH CO LTD +1

Power grid equipment word segmentation dictionary and fault case library construction method

The invention discloses a power grid equipment word segmentation dictionary and a fault case library construction method. The method comprises the steps of: constructing a power grid field word segmentation dictionary; carrying out format conversion and word segmentation of fault case data; analyzing the text data by a plurality of technical means to generate structured power grid equipment fault cases, feature tags, keyword clouds and association rules; and designing a relational database schema for this information, taking the report as the primary key, and storing the text information together with information retained during preprocessing, such as pictures and authors, to form a power grid equipment fault case library. The method improves the word segmentation accuracy of power grid field text; the structured case database makes retrieval by case content more accurate; and the feature tags in the fault case database serve as item sets from which effective association rules of faults are sorted and mined, which can be used for fault early warning. The method fills a gap in the application of text analysis technology in the power grid field, improves the application value of power grid field corpora, and reduces the consulting cost.
Owner:ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2

Participle processing method and system for customer address information

The invention provides a word segmentation processing method and system for customer address information. In the method, an administrative region matching list that defines the codes of all administrative regions is saved in advance, and the method comprises the following steps: determining the current customer address information to be processed; processing the current customer address information to obtain customer address information meeting processing standards; matching each piece of sub-address information in the customer address information meeting the processing standards with the administrative regions in the administrative region matching list according to the longest matching principle; when the first sub-address information in the customer address information meeting the processing standards matches the first administrative region in the administrative region matching list and the matching result is unique, determining the first code of the first sub-address information; and acquiring the codes of all sub-address information in the customer address information meeting the processing standards, and generating normalized customer address information. According to the invention, manually recorded customer address information is normalized, so that the word segmentation accuracy of a banking system is improved.
Owner:CHINA CONSTRUCTION BANK
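
The longest-matching step described in the abstract above can be illustrated with a minimal Python sketch. This is not the patented implementation: the region table `REGION_CODES`, its codes, and the function name `match_regions` are hypothetical, and the uniqueness check is simplified away by using a dictionary in which each name maps to exactly one code.

```python
# Hypothetical administrative region matching list: name -> code.
# The names and codes are illustrative only, not an official coding scheme.
REGION_CODES = {
    "浙江省": "33",
    "杭州市": "3301",
    "杭州": "3301",
    "西湖区": "330106",
}


def match_regions(address, region_codes):
    """Scan left to right, taking the longest region name at each position.

    Returns the matched (name, code) pairs and the unmatched remainder,
    which is treated here as street-level detail.
    """
    max_len = max(len(name) for name in region_codes)
    matches = []
    i = 0
    while i < len(address):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(address) - i), 0, -1):
            candidate = address[i:i + size]
            if candidate in region_codes:
                matches.append((candidate, region_codes[candidate]))
                i += size
                break
        else:
            # No region name starts at this position: stop matching.
            return matches, address[i:]
    return matches, ""


regions, detail = match_regions("浙江省杭州市西湖区文三路1号", REGION_CODES)
```

Because longer candidates are tried first, "杭州市" is preferred over the shorter entry "杭州" when both would match at the same position.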

Animal product safety event text classification method based on multi-level structure dictionary

The invention relates to an animal product safety event text classification method based on a multi-level structure dictionary. The method comprises the steps of: performing word segmentation and stop-word removal on a to-be-processed text; allocating a counter to each remaining word; matching the remaining words of each text against the vocabulary in a constructed multi-level structure dictionary of animal product safety events, and adding 1 to the count of each successfully matched word; and finally, sorting the words in descending order of their counts and classifying the text into the hierarchy level and category of the dictionary in which the most frequent word is located. The method can assist a word segmentation tool in segmenting texts so as to improve the accuracy of entity recognition, can classify Chinese texts according to the hierarchical structure of the animal product safety event dictionary, and can also perform hierarchical classification under different requirements to obtain the hierarchy and category relationships among texts. In addition, a large amount of manpower and time is saved, and the accuracy is significantly improved.
Owner:CHINA AGRI UNIV
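
As a rough illustration of the counting scheme described in the abstract above, the following Python sketch classifies an already-segmented text by its most frequent dictionary word. The dictionary contents, the stop-word list, and the function name `classify` are hypothetical, and English tokens are used only for readability; the abstract does not specify how ties or unmatched texts are handled, so returning `None` for unmatched texts is an assumption.

```python
from collections import Counter

# Hypothetical multi-level dictionary: (level, category) -> vocabulary.
DICTIONARY = {
    ("disease", "avian influenza"): {"H5N1", "poultry", "influenza"},
    ("contamination", "veterinary drug residue"): {"residue", "antibiotic", "clenbuterol"},
}
STOPWORDS = {"the", "in", "of", "was", "a", "and"}


def classify(tokens, dictionary, stopwords):
    """Return the (level, category) of the most frequent dictionary word."""
    remaining = [t for t in tokens if t not in stopwords]
    # Count only the words that appear somewhere in the dictionary.
    counts = Counter(
        t for t in remaining
        if any(t in vocab for vocab in dictionary.values())
    )
    if not counts:
        return None  # assumption: no dictionary word means no class
    top_word, _ = counts.most_common(1)[0]
    for key, vocab in dictionary.items():
        if top_word in vocab:
            return key
    return None


tokens = ["clenbuterol", "residue", "was", "found", "in", "pork", "and", "clenbuterol"]
label = classify(tokens, DICTIONARY, STOPWORDS)
```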

A method and system for blind people to read Chinese characters

The invention provides a method and a system for blind people to read Chinese characters, and relates to the technical field of natural language processing and the technical field of disabled-oriented human-computer interaction. The method comprises the following steps: obtaining a Chinese language text, carrying out a word segmentation operation on the Chinese language text to generate a Chinese character string, converting each word in the Chinese character string into corresponding Pinyin by referring to part-of-speech tagging obtained by word segmentation through a pronouncing dictionary, a polyphone dictionary and word frequency information, and connecting the Pinyin into a Pinyin string; looking up a Pinyin and blind character contrast dictionary, converting the Pinyin string into a blind character string, carrying out braille word segmentation on the blind character string through a word segmentation model to generate initial braille segmentation words, fusing the Chinese character string with the initial braille segmentation words to generate new braille segmentation words, and regulating the new braille segmentation words according to a braille segmentation word ligature rule; and carrying out braille tone marking on the new braille segmentation words regulated according to the braille segmentation word ligature rule to generate final braille segmentation words, and displaying the final braille segmentation words.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Standard word library word segmentation method, device and equipment and computer readable storage medium

The invention discloses a standard word library word segmentation method, device and equipment and a computer readable storage medium. The method comprises the steps of: splitting the standard words in a standard word library to be segmented into single Chinese characters to form a Chinese character library, and generating the adjacent frequency between every two Chinese characters in the Chinese character library; carrying out a merging operation on the Chinese characters in the Chinese character library according to the adjacent frequencies to generate Chinese character groups, and updating the adjacent frequencies among the Chinese characters in the Chinese character library after the merging operation; judging whether the maximum of the adjacent frequencies among the Chinese characters in the updated Chinese character library is smaller than a preset threshold value; if not, executing again the step of carrying out the merging operation on the Chinese characters in the Chinese character library according to the adjacent frequencies; and if yes, forming the standard segmented words of the standard word library to be segmented from the Chinese character groups. According to the scheme, the standard words in the standard word library to be segmented are segmented through the adjacent frequencies among the Chinese characters, and the word segmentation accuracy of the standard word library to be segmented can be effectively improved.
Owner:PING AN TECH (SHENZHEN) CO LTD
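
The merge loop described in the abstract above resembles pair-merging schemes such as byte-pair encoding. The Python sketch below is one possible reading of it, not the patented procedure: it assumes that "adjacent frequency" means the number of times two neighbouring units occur side by side across the word library, that only the single most frequent pair is merged per round, and that the function names and sample words are hypothetical.

```python
from collections import Counter


def pair_counts(sequences):
    """Count how often each pair of neighbouring units occurs."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with one merged unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def segment_lexicon(words, threshold):
    """Merge the most frequent adjacent pair until its count drops below threshold."""
    sequences = [list(word) for word in words]  # split into single characters
    while True:
        counts = pair_counts(sequences)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < threshold:
            break
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return sequences


segments = segment_lexicon(["血压计", "血压", "高血压", "血糖"], threshold=2)
```

In this sample the pair "血" + "压" occurs three times and is merged into one unit, after which no remaining pair reaches the threshold, so the loop stops. Each merge shortens at least one sequence, so the loop always terminates.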

Standard lexicon word segmentation method, device and equipment and computer readable storage medium

The invention provides a standard word library word segmentation method, device and equipment and a computer readable storage medium. The method comprises: splitting the standard words in a standard word library to be segmented into single Chinese characters to form an original Chinese character library, and calculating a first adjacent probability and a first Bayesian probability between every two Chinese characters in the original Chinese character library; performing a Chinese character merging operation on the original Chinese character library according to the first adjacent probability and the first Bayesian probability to obtain a to-be-adjusted Chinese character library; judging whether the minimum of the second adjacent probabilities between every two Chinese characters in the to-be-adjusted Chinese character library is greater than a preset threshold value; if yes, executing a Chinese character merging operation on the to-be-adjusted Chinese character library according to a second adjacent probability and a second Bayesian probability until the minimum of the adjacent probabilities between every two Chinese characters in the obtained target Chinese character library is smaller than or equal to the preset threshold value; otherwise, outputting the merged Chinese character groups as standard words. According to the method, the word segmentation accuracy of the standard lexicon and the universality of the standard lexicon are improved.
Owner:PING AN TECH (SHENZHEN) CO LTD

A word segmentation processing method and system for customer address information

The invention provides a word segmentation processing method and system for customer address information. In the method, an administrative region matching list that defines the codes of all administrative regions is saved in advance, and the method comprises the following steps: determining the current customer address information to be processed; processing the current customer address information to obtain customer address information meeting processing standards; matching each piece of sub-address information in the customer address information meeting the processing standards with the administrative regions in the administrative region matching list according to the longest matching principle; when the first sub-address information in the customer address information meeting the processing standards matches the first administrative region in the administrative region matching list and the matching result is unique, determining the first code of the first sub-address information; and acquiring the codes of all sub-address information in the customer address information meeting the processing standards, and generating normalized customer address information. According to the invention, manually recorded customer address information is normalized, so that the word segmentation accuracy of a banking system is improved.
Owner:CHINA CONSTRUCTION BANK

Power grid power failure address matching method based on word bank two-way maximum matching method

The invention provides a power grid power failure address matching method based on a word bank two-way maximum matching method, which comprises the following steps: 1, constructing a power failure address element library which comprises an address element word bank, a stop word bank and a synonym bank; 2, preprocessing a to-be-matched address text by utilizing the stop word bank and the synonym bank; 3, performing word segmentation on the to-be-matched address text by utilizing a bidirectional maximum matching word segmentation method, and segmenting out an address element sequence of a to-be-matched system; and 4, comparing the address element sequences of the to-be-matched systems according to an address element matching rule, judging whether the address element sequences match, and listing the difference items if they do not. By dynamically maintaining the power grid power failure address element library and performing abbreviation filling processing on the address elements, the address text word segmentation recognition rate can be improved; address element matching can be handled both for a single address item and for multiple address items; and the address element matching accuracy can be effectively improved.
Owner:STATE GRID HUBEI ELECTRIC POWER RES INST +1
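
Step 3 above relies on bidirectional maximum matching. The following Python sketch shows the general technique under stated assumptions: the abstract does not say how the forward and backward results are reconciled, so the rule used here (fewer tokens wins, then fewer single-character tokens, then the backward result) is a commonly used heuristic rather than the patent's rule. The vocabulary is hypothetical and uses a general-text example rather than real address elements.

```python
def forward_mm(text, vocab, max_len):
    """Forward maximum matching: longest dictionary word from the left."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_mm(text, vocab, max_len):
    """Backward maximum matching: longest dictionary word from the right."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def count_singles(tokens):
    return sum(1 for t in tokens if len(t) == 1)


def bidirectional_mm(text, vocab):
    """Run both directions and keep the result preferred by the heuristic."""
    if not text:
        return []
    max_len = max((len(word) for word in vocab), default=1)
    fwd = forward_mm(text, vocab, max_len)
    bwd = backward_mm(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    return fwd if count_singles(fwd) < count_singles(bwd) else bwd


# Hypothetical vocabulary for illustration.
VOCAB = {"研究", "研究生", "生命", "的", "起源"}
result = bidirectional_mm("研究生命的起源", VOCAB)
```

On this sample the forward pass greedily takes "研究生" and strands "命" as a single character, while the backward pass yields "研究 / 生命 / 的 / 起源"; both have four tokens, so the result with fewer single-character tokens is kept.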

Information extraction method and system oriented to international electric connection radio rules

The invention discloses an information extraction method and system for an international electric connection radio rule. The method comprises the steps of: preprocessing the text of the international electric connection radio rule, and building a database recording all clauses; identifying from the database all clauses which may have a relationship; for each clause possibly having a relationship, analyzing the specific relationship type between that clause and the clauses associated with it; and, by using a natural language processing method based on rule matching, extracting the numbers of the clauses that have an association relationship with the clause possibly having a relationship, and writing those clause numbers and the specific relationship types into the corresponding positions in the database. According to the method, named entity and entity relationship extraction is carried out on the international electric connection radio rule by adopting an NLP technology based on knowledge engineering; the mutual relations among the clauses are sorted out automatically, which helps satellite network operators quickly track and master the radio rules.
Owner:NAT SPACE SCI CENT CAS +1