34 results for how to "Improve word segmentation accuracy" in patented technology

Chinese text parallel data mining method based on hierarchy

Active | CN102662952A | Improve word segmentation efficiency; Improve word segmentation accuracy; Special data processing applications; Data dredging; Text mining

The invention relates to a hierarchy-based parallel data mining method for Chinese texts, comprising the following steps. Step 1: establishing a vector space model of the Chinese texts: performing word segmentation on the entire Chinese text set to obtain the segmented form of each text and a feature term set containing all distinct (deduplicated) terms in the text set, then using the feature term set to compute the term frequency-inverse document frequency (TF-IDF) of each text, and building the text vector space model from the TF-IDF values. Step 2: performing dimension reduction on the feature term vectors of the text vector space model. Step 3: clustering the texts with the hierarchy-based DCURE algorithm. The method segments Chinese texts efficiently and with high accuracy; requires no input parameters, such as a neighborhood radius, for the clustering process; can mine irregularly shaped clusters; is insensitive to noise; uses distributed computation, which makes it efficient at mining massive text collections; and speeds up the computation of feature weights.

Owner:UESTC COMSYS INFORMATION
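
A minimal sketch of the TF-IDF weighting in step 1 of CN102662952A, assuming the texts are already word-segmented into token lists. The function name and the unsmoothed IDF formula log(N / df) are assumptions, since the abstract does not state which TF-IDF variant is used; the distributed computation, dimension reduction, and DCURE clustering are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(segmented_docs):
    """Build TF-IDF vectors for documents that are already word-segmented.

    segmented_docs: list of token lists, one per document.
    Returns (vocabulary, vectors) where vectors[i][j] is the weight of
    vocabulary[j] in document i.
    """
    # Feature term set: every distinct term in the corpus, in a fixed order.
    vocabulary = sorted({term for doc in segmented_docs for term in doc})
    n_docs = len(segmented_docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in segmented_docs for term in set(doc))
    # Unsmoothed IDF (an assumption; the abstract gives no formula).
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}

    vectors = []
    for doc in segmented_docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append([counts[term] / length * idf[term] for term in vocabulary])
    return vocabulary, vectors


docs = [["数据", "挖掘", "方法"], ["数据", "聚类"], ["文本", "聚类", "聚类"]]
vocab, vecs = tfidf_vectors(docs)
```

A term that occurs in every document (IDF of zero) contributes nothing to any vector, which is why TF-IDF down-weights corpus-wide common words before clustering.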

Human-computer interaction intelligent question answering method based on cloud platform

Inactive | CN108595696A | Word segmentation results are accurate; Improve word segmentation accuracy; Natural language data processing; Speech recognition; Textual information; Keyword extraction

The invention discloses a human-computer interaction intelligent question answering method based on a cloud platform, which comprises the following steps: acquiring voice information input by a user; converting the voice information to text to obtain text information; performing word segmentation on the text information to obtain keyword extraction results; classifying the keyword extraction results with a machine learning algorithm to obtain classification results; using a natural language processing algorithm to expand the verbs and nouns in the keyword extraction results, taking from the expansions of each verb and noun the result with the highest similarity, and assembling all of these results into a keyword expansion sequence; and performing fuzzy matching in a local database according to the classification results and the keyword expansion sequence. The method addresses the low interaction accuracy of existing human-computer question answering systems, which is caused by inaccurate word segmentation, inaccurate keyword expansion, and inaccurate answer extraction.

Owner:CHANGSHA UNIVERSITY
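
The keyword-expansion and fuzzy-matching steps of CN108595696A could look roughly like the sketch below. It assumes a character-level similarity from Python's difflib in place of the unspecified NLP algorithm; the expansion table and the (category, question, answer) database layout are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Character-level similarity in [0, 1]; a stand-in for the unspecified NLP model.
    return SequenceMatcher(None, a, b).ratio()


def expand_keywords(keywords, expansions):
    """Replace each keyword by its most similar expansion candidate.

    expansions: hypothetical mapping keyword -> candidate words (e.g. synonyms).
    Keywords without candidates are kept as they are.
    """
    sequence = []
    for kw in keywords:
        options = expansions.get(kw)
        sequence.append(max(options, key=lambda w: similarity(kw, w)) if options else kw)
    return sequence


def fuzzy_match(category, expansion_sequence, database):
    """Return the answer whose stored question best matches the expansion sequence.

    database: hypothetical list of (category, question, answer) tuples;
    only entries in the predicted category are compared.
    """
    query = "".join(expansion_sequence)
    scored = [(similarity(query, q), a) for c, q, a in database if c == category]
    return max(scored)[1] if scored else None


DATABASE = [
    ("bank", "如何办理银行卡", "A1"),
    ("bank", "银行卡挂失", "A2"),
    ("other", "办理银行卡手续", "A3"),
]
```

Filtering by the predicted category first keeps the fuzzy comparison small and prevents a close string match from another topic from winning.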

Field encyclopedia establishment system based on general encyclopedia websites

Inactive | CN104408148A | Improve word segmentation accuracy; Improve the efficiency of knowledge acquisition; Web data indexing; Character and pattern recognition; Encyclopedia; Computer science

The invention belongs to the technical field of open knowledge extraction and relates to a system for building field (domain-specific) encyclopedias from general encyclopedia websites. The system is divided into several modules: an encyclopedia data crawling module, an encyclopedia data preprocessing module, a related-entity search and ranking module, and an entity clustering module. Its beneficial effects are as follows. Field encyclopedias are currently built mostly by hand, which is time-consuming and labor-intensive, and because manual work cannot find all related entities, coverage is low. Building the field encyclopedia from the field-related entities found by the system instead greatly reduces the labor involved and greatly increases coverage. The resulting field encyclopedia also makes it much easier for users to obtain knowledge in a specified field: complex searching and screening are no longer needed, and instead of the user passively searching for information, the system proactively provides it.

Owner:FUDAN UNIV

Method and system for blind people to read Chinese character

Active | CN105404621A | Avoid low accuracy; Improve accuracy; Special data processing applications; Human language; Braille

The invention provides a method and a system for blind people to read Chinese characters, and relates to the technical fields of natural language processing and human-computer interaction for people with disabilities. The method comprises the following steps: obtaining a Chinese text and performing word segmentation on it to generate a Chinese character string; converting each word in the string into the corresponding Pinyin, using a pronunciation dictionary, a polyphone dictionary, and word frequency information together with the part-of-speech tags produced during segmentation, and joining the Pinyin into a Pinyin string; converting the Pinyin string into a braille string by looking up a Pinyin-to-braille correspondence dictionary; performing braille word segmentation on the braille string with a word segmentation model to generate initial braille segmentation words; fusing the Chinese character string with the initial braille segmentation words to generate new braille segmentation words, and adjusting them according to the braille word-joining (ligature) rules; and adding braille tone marks to the adjusted braille segmentation words to generate the final braille segmentation words, which are then displayed.

Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI
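
Why segmentation comes before Pinyin conversion in CN105404621A can be illustrated with a toy lookup: a whole-word dictionary entry fixes the reading of a polyphonic character such as 行 (hang2 in 银行 and 行业, xing2 on its own). The two dictionaries below are hypothetical and tiny, and the part-of-speech and word-frequency information used by the patent is omitted.

```python
# Hypothetical toy dictionaries; a real system would load full pronunciation data.
WORD_PINYIN = {"银行": ["yin2", "hang2"], "行业": ["hang2", "ye4"]}
CHAR_PINYIN = {"我": "wo3", "去": "qu4", "银": "yin2", "行": "xing2", "业": "ye4", "走": "zou3"}


def words_to_pinyin(words):
    """Convert segmented words to a space-separated Pinyin string.

    Whole-word entries are tried first so that a polyphonic character takes
    the reading fixed by its word; otherwise each character falls back to a
    default reading (unknown characters are passed through unchanged).
    """
    syllables = []
    for word in words:
        if word in WORD_PINYIN:
            syllables.extend(WORD_PINYIN[word])
        else:
            syllables.extend(CHAR_PINYIN.get(ch, ch) for ch in word)
    return " ".join(syllables)


print(words_to_pinyin(["我", "去", "银行"]))
```

Without segmentation, 行 in 银行 would fall back to the per-character default xing2, which is wrong in this word.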

Address similarity calculation method and device, equipment and storage medium

Pending | CN111783419A | Improve the matching success rate; Improve word segmentation accuracy; Natural language data processing; Neural architectures; Feature vector; Data set

The invention discloses an address similarity calculation method and device, equipment, and a storage medium. It targets three problems: address matching involves complex rules, existing matching algorithms have low retrieval speed and accuracy, and address matching is therefore inefficient. The input address information is represented by a suitable initial vector; an address similarity calculation model based on a Siamese (twin) neural network, trained by gradient descent on a triplet loss function, is used to obtain a feature vector from the initial address vector; finally, the cosine distance or L2 distance between this feature vector and the address vectors in a standard address data set is calculated to find the known address vector closest to the input address vector. This simplifies the address matching rules, improves the accuracy of identifying the same address, and further improves the retrieval speed and accuracy of the matching algorithm.

Owner:SHANGHAI DONGPU INFORMATION TECH CO LTD
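
The triplet loss and the final nearest-address lookup in CN111783419A can be sketched with plain Python vectors. This assumes the Siamese encoder has already produced the feature vectors (it is not shown), uses one common hinge formulation of the triplet loss with Euclidean distance, and picks the standard address by cosine similarity; the margin value, function names, and sample addresses are assumptions.

```python
import math


def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the positive is closer than the negative by `margin`."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)


def closest_standard_address(query_vec, standard):
    """standard: list of (address, feature_vector); return the most cosine-similar address."""
    return max(standard, key=lambda item: cosine_similarity(query_vec, item[1]))[0]


STANDARD = [("Address A", [0.9, 0.1]), ("Address B", [0.0, 1.0])]
```

During training, gradient descent on this loss pulls encodings of the same address together and pushes different addresses apart, so that at query time a simple nearest-neighbor search over the standard set is enough.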

Method and device for segmentation on basis of webpage content classification

Inactive | CN104008126A | Improve word segmentation accuracy; Word segmentation accuracy improved; Web data indexing; Special data processing applications; Information retrieval; Web page

The embodiments of the invention provide a method and a device for word segmentation based on webpage content classification. The method comprises the following steps: extracting the text information of webpage contents in search resources; classifying the text information according to the categories of the webpage contents; and segmenting the text information with the segmentation dictionary that corresponds to its category. Because each text is segmented with a category-specific dictionary, segmentation adapts better to the language characteristics of each category, segmentation accuracy improves for every category, and segmentation is locally optimized. More accurate segmentation also comes closer to the user's intent and improves the user experience: users need fewer re-inputs and repeated searches, operation becomes simpler, the device has to respond to fewer user operations, and consumption of the device's system resources is reduced.

Owner:BEIJING QIHOO TECH CO LTD +1
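
A minimal sketch of category-specific segmentation for CN104008126A, assuming forward maximum matching as the segmenter (the abstract does not name one) and two hypothetical category dictionaries. The classic ambiguous string 乒乓球拍卖完了 ("the ping-pong paddles are sold out" vs. "the ping-pong balls have been auctioned") shows how the dictionary choice changes the result.

```python
# Hypothetical category-specific segmentation dictionaries.
CATEGORY_DICTS = {
    "sports": {"乒乓球拍", "乒乓球", "完了"},
    "auction": {"乒乓球", "拍卖", "完了"},
}


def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation that always takes the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            # A single character is always accepted so the loop always advances.
            if length == 1 or piece in dictionary:
                words.append(piece)
                i += length
                break
    return words


def segment_by_category(text, category):
    """Segment `text` with the dictionary of its page category (empty if unknown)."""
    return forward_max_match(text, CATEGORY_DICTS.get(category, set()))


print(segment_by_category("乒乓球拍卖完了", "sports"))
print(segment_by_category("乒乓球拍卖完了", "auction"))
```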

Multi-strategy integration standard terminology processing method for oil and gas pipeline field

Active | CN104063382A | Fine granularity; Improve word segmentation accuracy; Special data processing applications; Computer module; Pretreatment

The invention discloses a multi-strategy integrated method for processing standard terminology in the oil and gas pipeline field, and relates to the technical fields of linguistic analysis and pipeline systems. The method mainly comprises three modules: 1) corpus preprocessing for the oil and gas pipeline field and optimization of the text segmentation results; 2) term construction, using both single algorithms and combinations of multiple algorithms; and 3) filtering of the obtained terms according to summarized rules, rejecting junk terms and common general-language terms to optimize the term processing results. The overall process thus runs from corpus preprocessing and segmentation optimization, through term construction for the oil and gas pipeline field, to optimization of the constructed terms. The method improves segmentation accuracy, the precision of term extraction, and the relevance of the final terms to the technical field.

Owner:PIPECHINA SOUTH CHINA CO

Power grid equipment word segmentation dictionary and fault case library construction method

Active | CN112732934A | Improve word segmentation accuracy; Achieve intuitive knowledge; Natural language data processing; Neural architectures; Power grid; Data mining

The invention discloses a method for constructing a power grid equipment word segmentation dictionary and a fault case library. The method comprises: constructing a word segmentation dictionary for the power grid field; converting the format of fault case data and segmenting it; using several technical means to analyze the text data and generate structured power grid equipment fault cases, feature tags, keyword clouds, and association rules; and designing a relational database schema for this information, with the report as the primary key, in which the text information and other information retained during preprocessing, such as pictures and authors, is stored to form the power grid equipment fault case library. The method improves word segmentation accuracy for power grid texts; the structured case database makes retrieval by case content more accurate; and with the feature tags of the fault case library used as itemsets, effective fault association rules are sorted and mined, which can be used for fault early warning. The method fills a gap in the application of text analysis technology to the power grid field, increases the application value of power grid corpora, and reduces consultation costs.

Owner:ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2
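
Using feature tags as itemsets, as CN112732934A describes, association rules can be scored by support and confidence. The sketch below only considers two-tag rules for brevity (a full Apriori pass would also grow larger itemsets); the tag names and thresholds are hypothetical.

```python
from itertools import combinations


def pairwise_rules(cases, min_support, min_confidence):
    """Mine rules "tag A -> tag B" from fault cases described by feature tags.

    cases: list of sets of feature tags, one set per fault case.
    Returns (antecedent, consequent, support, confidence) tuples.
    """
    n = len(cases)

    def support(items):
        # Fraction of cases that contain every tag in `items`.
        return sum(1 for case in cases if items <= case) / n

    rules = []
    for a, b in combinations(sorted(set().union(*cases)), 2):
        s = support({a, b})
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s, confidence))
    return rules


CASES = [
    {"oil leak", "overheating"},
    {"oil leak", "overheating", "abnormal noise"},
    {"abnormal noise"},
    {"oil leak"},
]
```

Here every overheating case also shows an oil leak (confidence 1.0), but not the reverse, which is the kind of directional pattern that could feed fault early warning.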

Participle processing method and system for customer address information

Active | CN105426351A | Improve recognition rate; Improve word segmentation accuracy; Natural language data processing; Special data processing applications; Data mining; Participle

The invention provides a word segmentation processing method and system for customer address information. An administrative region matching list that defines the codes of all administrative regions is stored in advance. The method comprises the following steps: determining the current customer address information to be processed; processing it to obtain customer address information that meets the processing standards; matching each piece of sub-address information in the standardized customer address information against the administrative regions in the matching list according to the longest-match principle; when the first sub-address matches the first administrative region in the list and the match is unique, determining the first code for that sub-address; and obtaining the codes of all sub-addresses in the standardized customer address information and generating normalized customer address information. Normalizing manually recorded customer addresses in this way improves the word segmentation accuracy of the banking system.

Owner:CHINA CONSTRUCTION BANK
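
The longest-match step against the administrative region matching list in CN105426351A might be sketched as follows. The region list, its codes (shown only for illustration), and the choice to leave ambiguous names uncoded are assumptions rather than details from the patent.

```python
# Hypothetical administrative-region matching list: name -> codes.
# The codes are for illustration only; a name with several codes is ambiguous.
REGIONS = {
    "广东省": ["440000"],
    "深圳市": ["440300"],
    "南山区": ["440305"],
    "朝阳区": ["110105", "220104"],
}


def normalize_address(address, regions):
    """Split an address into region names by longest match and attach their codes.

    Returns (sub_address, code) pairs. The code is None when a name is ambiguous
    or unmatched; the unmatched remainder (street, building, ...) is kept whole.
    """
    max_len = max(len(name) for name in regions)
    parts, i = [], 0
    while i < len(address):
        # Longest-match principle: try the longest candidate name first.
        for length in range(min(max_len, len(address) - i), 0, -1):
            name = address[i:i + length]
            if name in regions:
                codes = regions[name]
                parts.append((name, codes[0] if len(codes) == 1 else None))
                i += length
                break
        else:
            # No region name starts here: keep the rest of the address as one part.
            parts.append((address[i:], None))
            break
    return parts


print(normalize_address("广东省深圳市南山区科技园", REGIONS))
```

An ambiguous name such as 朝阳区 is left uncoded here; the uniqueness check in the abstract suggests that a real implementation would resolve it from the already-coded parent region.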

Text word segmentation analysis method and system for medical record text data structuring

Pending | CN112949303A | Realize the structure; Implement lexical parsing; Natural language data processing; Patient-specific data; Medical record; Data set

The invention discloses a text word segmentation and analysis method and system for structuring medical record text data. It belongs to the technical field of medical record data mining and aims to overcome the shortcomings of traditional medical record data processing: low mining efficiency, poor accuracy, and the inability to capture the entity mapping relationships in medical records. The method comprises the following steps: constructing a medical lexicon from medical text data; generating, based on the lexicon, all candidate words that can be formed from the medical text to be segmented, and constructing a directed acyclic graph (DAG) from these candidate words; based on the medical lexicon and the DAG, using dynamic programming to search for the maximum-probability path, i.e. the segmentation combination with the highest word frequency for the sentence, to obtain an ordered word set with part-of-speech tags; analyzing the word set with a ternary relation model to obtain a ternary mapping relation data set; and standardizing the ternary mapping relation data set to obtain a binary mapping relation data set.

Owner:山东健康医疗大数据有限公司 (Shandong Health and Medical Big Data Co., Ltd.)
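
The DAG and dynamic-programming step in CN112949303A follows the well-known maximum-probability segmentation scheme. A minimal sketch, assuming a hypothetical word-frequency lexicon and a frequency of 1 for characters that are not in it; part-of-speech tagging and the ternary relation model are not shown.

```python
import math


def build_dag(sentence, lexicon):
    """Map each start index to the end indices of lexicon words starting there."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i + 1, n + 1) if sentence[i:j] in lexicon]
        dag[i] = ends or [i + 1]  # unknown character: treat it as a one-character word
    return dag


def max_probability_segment(sentence, lexicon):
    """Segment by dynamic programming over the DAG, maximizing the summed log probability.

    lexicon: hypothetical mapping word -> corpus frequency. Words missing from
    it (single-character fallbacks) are given a frequency of 1.
    """
    total = sum(lexicon.values())
    n = len(sentence)
    dag = build_dag(sentence, lexicon)
    # route[i] = (best log probability of sentence[i:], end index of its first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(lexicon.get(sentence[i:j], 1) / total) + route[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j])
        i = j
    return words


LEXICON = {"急性": 50, "阑尾": 30, "阑尾炎": 40, "炎": 10}
print(max_probability_segment("急性阑尾炎", LEXICON))
```

Solving right to left means each route[i] only depends on routes that are already computed, so the whole search is a single pass over the sentence after the DAG is built.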

Animal product safety event text classification method based on multi-level structure dictionary

Inactive | CN110659365A | Improve word segmentation accuracy; Improve classification accuracy; Special data processing applications; Text database clustering/classification; Text categorization; Classification methods

The invention relates to a text classification method for animal product safety events based on a multi-level structured dictionary. The method comprises: performing word segmentation and stop-word removal on the text to be processed; assigning a counter to each remaining word; matching the remaining segmentation results of each text against the words in a pre-built multi-level dictionary of animal product safety events, and incrementing the counter of each successfully matched word by 1; and finally sorting the words in descending order of count and assigning the text to the level and category of the dictionary that contains the most frequent word. The method can help a word segmentation tool segment texts and thereby improve entity recognition accuracy; it can classify Chinese texts according to the hierarchical structure of the animal product safety event dictionary, and can also perform hierarchical classification for different requirements to obtain the level and category relationships among texts. In addition, it saves a great deal of labor and time and markedly improves accuracy.

Owner:CHINA AGRI UNIV
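
The count-and-vote classification in CN110659365A reduces to a few lines. The sketch assumes the text is already segmented and represents the multi-level dictionary as a hypothetical mapping from each word to its (level-1, level-2) category path.

```python
from collections import Counter

# Hypothetical slice of a multi-level animal product safety event dictionary.
DICTIONARY = {
    "瘦肉精": ("meat", "veterinary drug residue"),
    "猪肉": ("meat", "livestock meat"),
}


def classify_text(tokens, stopwords, dictionary):
    """Assign a segmented text to the category path of its most frequent dictionary word.

    Returns None when no remaining word appears in the dictionary.
    """
    # Only words that survive stop-word removal and match the dictionary are counted.
    counts = Counter(t for t in tokens if t not in stopwords and t in dictionary)
    if not counts:
        return None
    # most_common(1) yields the word with the highest count.
    word, _ = counts.most_common(1)[0]
    return dictionary[word]
```

Returning the whole category path lets the same result be read at level 1 or level 2, matching the hierarchical classification the abstract describes.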

Method for extracting key elements from natural language input of user

Active | CN107203512A | Easy extraction; Improve efficiency; Semantic analysis; Special data processing applications; Semantic matching; Result set

The invention relates to a method for extracting key elements from a user's natural language input. The method comprises: performing semantic matching on the user's first natural language input against a first semantic knowledge base to recognize its overall meaning; selecting a restricting knowledge base; shrinking the restricting knowledge base by identifying the entries related to the overall meaning and removing the others; performing mechanical word segmentation on the first natural language input to generate a set of candidate segmentation results; matching these results against the shrunk restricting knowledge base to select one segmentation result from the set; and selecting one or more words from that result as the key elements. The method can greatly improve the efficiency and correctness of word segmentation, ensuring that the correct key elements are determined.

Owner:上海对岸信息科技有限公司 (Shanghai Duian Information Technology Co., Ltd.)
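
The idea in CN107203512A, using a shrunk knowledge base to choose among mechanical segmentations, can be illustrated on the classic ambiguous string 南京市长江大桥 (Nanjing Yangtze River Bridge, or "Nanjing mayor Jiang Daqiao"). The vocabulary, the restricted library, and the scoring rule (most restricted-library words first, then fewest pieces) are assumptions for this sketch; exhaustive enumeration is only practical for short inputs.

```python
def all_segmentations(text, vocabulary, max_len=4):
    """Enumerate every split of text into vocabulary words (single characters always allowed)."""
    if not text:
        return [[]]
    results = []
    for length in range(1, min(max_len, len(text)) + 1):
        head = text[:length]
        if length == 1 or head in vocabulary:
            for rest in all_segmentations(text[length:], vocabulary, max_len):
                results.append([head] + rest)
    return results


def pick_key_elements(text, vocabulary, restricted_library):
    """Choose the segmentation best supported by the shrunk (restricted) library."""
    candidates = all_segmentations(text, vocabulary)
    # Score: number of words found in the restricted library, then fewer pieces.
    best = max(candidates, key=lambda seg: (sum(w in restricted_library for w in seg), -len(seg)))
    return [w for w in best if w in restricted_library]


# Hypothetical general vocabulary used for mechanical segmentation.
VOCABULARY = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江大桥"}
# Hypothetical restricted library after recognizing a place-related intent.
PLACES = {"南京市", "长江大桥"}
print(pick_key_elements("南京市长江大桥", VOCABULARY, PLACES))
```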

A method and system for blind people to read Chinese characters

Active | CN105404621B | Avoid low accuracy; Improve word segmentation accuracy; Special data processing applications; Orthodontic ligature; Human–robot interaction

The invention provides a method and a system for blind people to read Chinese characters, and relates to the technical fields of natural language processing and human-computer interaction for people with disabilities. The method comprises the following steps: obtaining a Chinese text and performing word segmentation on it to generate a Chinese character string; converting each word in the string into the corresponding Pinyin, using a pronunciation dictionary, a polyphone dictionary, and word frequency information together with the part-of-speech tags produced during segmentation, and joining the Pinyin into a Pinyin string; converting the Pinyin string into a braille string by looking up a Pinyin-to-braille correspondence dictionary; performing braille word segmentation on the braille string with a word segmentation model to generate initial braille segmentation words; fusing the Chinese character string with the initial braille segmentation words to generate new braille segmentation words, and adjusting them according to the braille word-joining (ligature) rules; and adding braille tone marks to the adjusted braille segmentation words to generate the final braille segmentation words, which are then displayed.

Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Word segmentation model building method and apparatus

Active | CN106407186A | Improve word segmentation accuracy; Improve accuracy; Natural language translation; Special data processing applications; Speech recognition

Embodiments of the invention provide a method and apparatus for building a word segmentation model. The method comprises: aligning the characters in a first corpus with the words in a second corpus to obtain an alignment relationship between the two corpora, where the first corpus is written without spaces between words; determining the boundaries of the words in the first corpus from this alignment relationship; and training on the word boundary information of the first corpus to generate a word segmentation model. The method and apparatus can improve word segmentation accuracy, particularly for corpora that do not mark word boundaries with spaces.

Owner:新译信息科技(深圳)有限公司 (Xinyi Information Technology (Shenzhen) Co., Ltd.)
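
Once CN106407186A has aligned characters with words in the second corpus, the word boundaries can be turned into character-level training labels. This sketch assumes a simplified alignment format (one target-word index per character) and the common B/M/E/S tagging scheme; the actual alignment and training procedures are not specified in the abstract.

```python
def words_from_alignment(chars, alignment):
    """Group consecutive characters that align to the same second-corpus word.

    alignment[k] is the index of the word aligned with chars[k]; this simplified,
    one-index-per-character format is a hypothetical stand-in for real alignments.
    """
    if not chars:
        return []
    words, current = [], chars[0]
    for k in range(1, len(chars)):
        if alignment[k] == alignment[k - 1]:
            current += chars[k]
        else:
            words.append(current)
            current = chars[k]
    words.append(current)
    return words


def bmes_labels(words):
    """Character-level B/M/E/S tags (begin, middle, end, single) for training a tagger."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# 我喜欢机器学习 aligned with the English words "I like machine learning".
words = words_from_alignment("我喜欢机器学习", [0, 1, 1, 2, 2, 3, 3])
print(words, bmes_labels(words))
```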

Standard word library word segmentation method, device and equipment and computer readable storage medium

Active | CN109766539A | Participle realization; Improve word segmentation accuracy; Special data processing applications; Speech recognition

The invention discloses a word segmentation method, device, equipment, and computer-readable storage medium for a standard word library. The method comprises: splitting the standard words in the standard word library to be segmented into single Chinese characters to form a Chinese character library, and computing the adjacency frequency between every two Chinese characters in that library; merging Chinese characters in the library according to the adjacency frequencies to generate Chinese character groups, and updating the adjacency frequencies in the library after the merge; judging whether the maximum adjacency frequency in the updated library is smaller than a preset threshold; if not, repeating the merging step; and if so, taking the Chinese character groups as the standard segmented words of the standard word library. Segmenting standard words by the adjacency frequencies between Chinese characters in this way can effectively improve the word segmentation accuracy of the standard word library.

Owner:PING AN TECH (SHENZHEN) CO LTD
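
The adjacency-frequency merging loop of CN109766539A resembles byte-pair-encoding merges. A minimal sketch, assuming that each round merges only the single most frequent adjacent pair and that ties go to the pair counted first; the abstract does not say how many pairs are merged per round, and the sample standard words are hypothetical.

```python
from collections import Counter


def adjacent_counts(library):
    """Count adjacent unit pairs across all entries (each entry is a list of units)."""
    counts = Counter()
    for units in library:
        counts.update(zip(units, units[1:]))
    return counts


def merge_pair(units, pair):
    """Join every non-overlapping occurrence of `pair` in one entry, left to right."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def segment_standard_words(standard_words, threshold):
    # Start from single Chinese characters.
    library = [list(word) for word in standard_words]
    while True:
        counts = adjacent_counts(library)
        if not counts:
            break
        # Assumption: merge one pair per round; ties go to the pair counted first.
        pair, freq = counts.most_common(1)[0]
        if freq < threshold:  # stop once the maximum adjacency frequency is below the threshold
            break
        library = [merge_pair(units, pair) for units in library]
    return library


print(segment_standard_words(["心电图检查", "心电图报告", "血常规检查"], 2))
```

In this tiny library 报告 stays split at threshold 2 because the pair occurs only once, which shows that the threshold trades recall of rare words for precision.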

Standard lexicon word segmentation method, device and equipment and computer readable storage medium

Active | CN109858011A | Participle realization; Improve word segmentation accuracy; Special data processing applications; Chinese characters; Bayesian probability

The invention provides a word segmentation method, device, equipment, and computer-readable storage medium for a standard word library. The method comprises: splitting the standard words in the standard word library to be segmented into single Chinese characters to form an original Chinese character library, and calculating a first adjacency probability and a first Bayesian probability between every two Chinese characters in it; merging Chinese characters in the original library according to the first adjacency probability and the first Bayesian probability to obtain a library to be adjusted; judging whether the minimum of the second adjacency probabilities between every two Chinese characters in the library to be adjusted is greater than a preset threshold; if so, merging further according to the second adjacency probability and the second Bayesian probability until the minimum adjacency probability between every two Chinese characters in the resulting target library is smaller than or equal to the preset threshold; otherwise, outputting the merged Chinese character groups as standard words. The method improves the word segmentation accuracy of the standard lexicon and its universality.

Owner:PING AN TECH (SHENZHEN) CO LTD

Word segmentation method, device and equipment based on multilevel dictionary and readable storage medium

Active | CN112214994A | Improve domain adaptability; Improve word segmentation accuracy; Natural language data processing; Machine learning; Engineering; Domain adaptation

The invention discloses a word segmentation method based on multi-level dictionaries. The method uses at least two dictionaries to assist a word segmentation model: when representing a character, it generates both the character's conventional vector representation and its feature representation in each of the dictionaries, and then determines the character's word-formation label from the vector representation and the feature representations. By distinguishing the status and importance of different words, the method improves the segmentation performance of the overall scheme, including its domain adaptability and word segmentation accuracy. The invention further provides a word segmentation device, equipment, and readable storage medium based on multi-level dictionaries, whose technical effects correspond to those of the method.

Owner:SUZHOU UNIV
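
The per-dictionary character features described for CN112214994A could be computed as below. The B/M/E/S flag design and the maximum word length are assumptions (the abstract does not define the feature representation), and the character embeddings and the labeling model that would consume these features are not shown.

```python
def dictionary_features(sentence, dictionaries, max_len=6):
    """Per-character binary features from several dictionaries.

    For every dictionary, each character gets four flags [B, M, E, S]:
    it begins / is inside / ends a dictionary word found in the sentence,
    or is itself a single-character dictionary word. Flags from all
    dictionaries are concatenated, so a model can weigh each dictionary
    (e.g. general vs. domain) differently.
    """
    n = len(sentence)
    features = [[] for _ in range(n)]
    for dictionary in dictionaries:
        flags = [[0, 0, 0, 0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                if sentence[i:j] not in dictionary:
                    continue
                if j - i == 1:
                    flags[i][3] = 1
                else:
                    flags[i][0] = 1
                    flags[j - 1][2] = 1
                    for k in range(i + 1, j - 1):
                        flags[k][1] = 1
        for pos in range(n):
            features[pos].extend(flags[pos])
    return features


# Hypothetical general and domain dictionaries that disagree on granularity.
GENERAL = {"苏州", "大学"}
DOMAIN = {"苏州大学"}
print(dictionary_features("苏州大学", [GENERAL, DOMAIN]))
```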

Efficient intelligent question-answering system for knowledge in artificial intelligence field

Active | CN113157885A | Improve experience; Improve word segmentation accuracy; Semantic analysis; Energy efficient computing; Knowledge structure; Engineering

The invention relates to an efficient intelligent question-answering system for knowledge in the field of artificial intelligence. The system comprises a preparation module and a question-answering module. The preparation module comprises a data collection module, a model training module, and a module for building the knowledge structure of the question-answering system; the question-answering module comprises an input preprocessing module, a knowledge-base question-answering module, a text-library question-answering module, and a knowledge-base question recommendation module. Together, these modules greatly improve the word segmentation accuracy of user questions, knowledge-base questions, and text-library questions, and thereby the overall accuracy of the question-answering system and the user experience, delivering a low-cost, highly efficient knowledge question-answering service.

Owner:SOUTH CHINA UNIV OF TECH

Sensitive information query method based on deep learning

Pending | CN112597770A | Accurate Word Segmentation Processing | Improve word segmentation accuracy | Natural language data processing | Neural architectures | Feature vector | Network model

The invention discloses a sensitive information query method based on deep learning, comprising the following steps: step 1, performing word segmentation on the text to be queried and then converting it into a feature vector; step 2, inputting the feature vector obtained in step 1 into a neural network model, which outputs its similarity to a sensitive word library; if the similarity exceeds a threshold, the text is judged to contain sensitive words and the corresponding sensitive words are output. On the one hand, by setting word segmentation rules and training and updating them, the method can flexibly and accurately segment the text, improving word segmentation accuracy; on the other hand, by constructing a neural network model with deep learning, it recognizes the text accurately and effectively, improving both query accuracy and query efficiency.
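
The threshold test in step 2 can be illustrated with a minimal sketch, assuming the text has already been segmented into tokens. The patent scores similarity with a trained neural network; this sketch swaps in plain cosine similarity over term-count vectors so that the query-and-threshold logic is visible. The function names, the sample library and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_sensitive(tokens, sensitive_library, threshold=0.5):
    """Return (entry, score) pairs whose similarity to the text exceeds threshold.

    Cosine similarity stands in for the patent's neural similarity model
    (an assumption made for illustration).
    """
    text_vec = Counter(tokens)
    hits = []
    for entry in sensitive_library:
        score = cosine(text_vec, Counter(entry))
        if score > threshold:
            hits.append((" ".join(entry), score))
    # Most similar library entries first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    library = [["fake", "id"], ["weather", "report"]]
    print(query_sensitive(["buy", "fake", "id", "online"], library))
```

A text is flagged as containing sensitive words whenever the returned list is non-empty.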

Owner:盐城数智科技有限公司

Chinese text parallel data mining method based on hierarchy

Active | CN102662952B | Improve word segmentation efficiency | Improve word segmentation accuracy | Special data processing applications | Data dredging | Text mining

The invention relates to a hierarchy-based parallel data mining method for Chinese text, comprising the following steps: step 1, establishing a vector space model of the Chinese texts: performing word segmentation on the entire Chinese text set to obtain the segmented form of each text and a feature term set containing all terms in the text set with duplicates removed, then using the feature term set to compute the term frequency-inverse document frequency (TFIDF) of each text, and establishing the text vector space model from the TFIDF values; step 2, performing dimension reduction on the feature term vectors of the text vector space model; and step 3, clustering the texts with the hierarchy-based DCURE algorithm. The method segments Chinese text efficiently and accurately, requires no input parameters such as neighborhood radius for clustering, can find irregularly shaped clusters, is insensitive to noise, and uses distributed computation, so it mines large text collections efficiently and computes feature weights faster.
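
Step 1 can be sketched as follows, assuming each text has already been segmented into a list of terms. The abstract does not specify which TF-IDF variant is used; this sketch assumes the common form tf(t, d) * log(N / df(t)), with term frequency normalized by document length. Dimension reduction and the DCURE clustering of steps 2 and 3 are not shown, and `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_vectors(segmented_docs):
    """Build a TF-IDF vector space model from already-segmented documents.

    Returns the feature term set (sorted, duplicates removed) and one
    weight vector per document, aligned with that term set.
    """
    n_docs = len(segmented_docs)
    # Feature term set: every distinct term in the collection.
    vocabulary = sorted({term for doc in segmented_docs for term in doc})
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in segmented_docs for term in set(doc))
    vectors = []
    for doc in segmented_docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            [counts[t] / length * math.log(n_docs / df[t]) for t in vocabulary]
        )
    return vocabulary, vectors


if __name__ == "__main__":
    terms, weights = tfidf_vectors([["数据", "挖掘"], ["数据", "聚类"]])
    print(terms, weights)
```

In this example 数据 occurs in every document, so its weight is 0 in both vectors; this is how TF-IDF suppresses uninformative terms before clustering.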

Owner:UESTC COMSYS INFORMATION

A word segmentation processing method and system for customer address information

Active | CN105426351B | Improve recognition rate | Improve word segmentation accuracy | Natural language data processing | Special data processing applications | Data mining

The invention provides a word segmentation processing method and system for customer address information. An administrative region matching list defining the codes of all administrative regions is stored in advance, and the method comprises the following steps: determining the current customer address information to be processed; processing it to obtain customer address information that meets the processing standards; matching each piece of sub-address information in that address against the administrative regions in the matching list according to the longest-match principle; when the first piece of sub-address information matches a first administrative region in the list and the match is unique, determining the first code of that sub-address information; and, once the codes of all sub-address information have been obtained, generating normalized customer address information. By normalizing manually entered customer address information, the invention improves the word segmentation accuracy of the banking system.
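
The longest-match and uniqueness checks can be sketched as below. This is a simplified, hypothetical version: it matches region names greedily from the start of an already-standardized address string, accepts a match only when the name maps to exactly one code, and stops at the first position where no unique region matches. The region table and the `R…` codes are invented for illustration; the abstract does not describe the bank's actual matching list or how it handles the region hierarchy.

```python
def match_regions(address, region_table):
    """Greedy longest-match of administrative region names in an address.

    region_table maps a region name to the list of codes it may denote
    (hypothetical structure). A match counts only if it is unique.
    Returns the matched (name, code) pairs and the unmatched remainder.
    """
    max_len = max(len(name) for name in region_table)
    matched, pos = [], 0
    while pos < len(address):
        # Try the longest candidate first (longest-match principle).
        for size in range(min(max_len, len(address) - pos), 0, -1):
            piece = address[pos:pos + size]
            codes = region_table.get(piece, [])
            if len(codes) == 1:
                matched.append((piece, codes[0]))
                pos += size
                break
        else:
            # No unique region starts here; treat the rest as street detail.
            break
    return matched, address[pos:]


if __name__ == "__main__":
    table = {"江苏省": ["R32"], "苏州": ["R3205"], "苏州市": ["R3205"], "姑苏区": ["R320508"]}
    print(match_regions("江苏省苏州市姑苏区人民路100号", table))
```

Because 苏州市 is tried before 苏州, the longer name wins; an ambiguous name that maps to several codes is left unmatched rather than guessed.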

Owner:CHINA CONSTRUCTION BANK

Standard terminology processing method for multi-strategy fusion in the field of oil and gas pipelines

Active | CN104063382B | Improve word segmentation accuracy | Improve accuracy | Special data processing applications | Treatment strategy | Computer science

The invention discloses a multi-strategy fusion method for processing standard terminology in the oil and gas pipeline field, relating to the technical fields of linguistic analysis and pipeline systems. The method mainly comprises three modules, which also form its overall process: 1) corpus preprocessing for the oil and gas pipeline field and optimization of the text segmentation results; 2) term construction, using either a single algorithm or a combination of multiple algorithms; 3) term construction optimization, in which the obtained terms are filtered according to summarized rules, junk terms and common terms are rejected, and the term processing results are optimized. The method improves segmentation accuracy, the precision of term extraction, and the domain relevance of the final terms.

Owner:PIPECHINA SOUTH CHINA CO

Chinese word segmentation method and device

Inactive | CN108763200A | Improve word segmentation accuracy | Low resolution accuracy | Natural language data processing | Special data processing applications | Machine learning | Chinese word

The invention discloses a Chinese word segmentation method and device. The method comprises: receiving first target text information sent by a user; mapping the first target text information through a first classifier to obtain the corresponding first target category information; and performing a preset query operation according to the first target category information and returning the query result to the user. Because the first classifier maps the user's text to category information on which the preset query is then based, the method improves word segmentation accuracy and addresses the low accuracy of Chinese word segmentation in related technologies.

Owner:DATAGRAND TECH INC

A Method for Structural Extraction of Image Report

Active | CN114328938B | Enhance expressive ability | Improve accuracy | Natural language data processing | Medical reports | Misclassification error | Relationship extraction

The invention discloses a method for the structured extraction of medical imaging reports, comprising the following steps: acquiring unstructured radiology report texts and preprocessing them; performing word segmentation on the preprocessed texts and normalizing them; optimizing a BERT model with an attention focal loss as the objective function, and performing entity recognition on the normalized text with the optimized BERT model; and extracting structured entity relationships with an entity-extent BERT model to form a structured report. The attention focal loss increases the penalty for incorrectly predicting the labels of individual tokens within the same entity: it increases the loss on misclassified labels and reduces the loss on easily classified ones, which accelerates model convergence and improves accuracy.
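
The abstract does not give the exact form of its attention focal loss. For reference, a minimal sketch of the standard focal loss, FL(p_t) = -α(1 - p_t)^γ log(p_t), shows the behaviour described: the (1 - p_t)^γ factor shrinks the loss on tokens the model already labels confidently, so misclassified tokens carry relatively more of the total loss. The γ = 2 default, α = 1 and the per-token probability inputs are assumptions.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one token: -alpha * (1 - p_t)**gamma * log(p_t).

    probs: predicted probability for each label; target: index of the true label.
    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_probs, targets, gamma=2.0, alpha=1.0):
    """Average focal loss over a sequence of tokens."""
    losses = [focal_loss(p, t, gamma, alpha) for p, t in zip(batch_probs, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = focal_loss([0.1, 0.9], target=1)  # confidently correct token
    hard = focal_loss([0.8, 0.2], target=1)  # misclassified token
    print(f"easy={easy:.4f} hard={hard:.4f}")
```

Compared with cross-entropy, the easy token's loss is scaled by (1 - 0.9)^2 = 0.01 while the misclassified token's is scaled by only (1 - 0.2)^2 = 0.64.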

Owner:浙江卡易智慧医疗科技有限公司

Power grid power failure address matching method based on word bank two-way maximum matching method

Pending | CN112084773A | Improve matching accuracy | Improve word segmentation recognition rate | Data processing applications | Natural language data processing | Algorithm | Power grid

The invention provides a power grid power failure address matching method based on a word bank two-way (bidirectional) maximum matching method, comprising the following steps: 1, constructing a power failure address element library comprising an address element word bank, a stop word bank and a synonym bank; 2, preprocessing the address text to be matched with the stop word bank and the synonym bank; 3, segmenting the address text to be matched with the bidirectional maximum matching method to obtain the address element sequence of each system to be matched; and 4, comparing the address element sequences of the systems to be matched according to address element matching rules, judging whether they match, and listing the differing items if they do not. By dynamically maintaining the power grid power failure address element library and expanding abbreviated address elements, the method improves the word segmentation recognition rate of address texts, handles address element matching both for a single address item and for multiple address items, and effectively improves address element matching accuracy.
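
Step 3 relies on bidirectional maximum matching. A minimal sketch is below, assuming a plain set of address elements as the word bank. The tie-breaking rule (prefer fewer words, then fewer single-character words, then the backward result) is a common heuristic and an assumption here; the abstract does not state which rule the patent uses. The classic ambiguous string 研究生命起源 shows why both directions are run.

```python
def forward_max_match(text, lexicon, max_len):
    """Left-to-right maximum matching; unknown characters become single words."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Right-to-left maximum matching."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def count_single_chars(words):
    return sum(1 for w in words if len(w) == 1)


def bidirectional_max_match(text, lexicon):
    """Run both directions and keep the more plausible segmentation."""
    max_len = max(len(w) for w in lexicon)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    if count_single_chars(fwd) < count_single_chars(bwd):
        return fwd
    return bwd


if __name__ == "__main__":
    print(bidirectional_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}))
    print(bidirectional_max_match("武汉市洪山区珞喻路", {"武汉市", "洪山区", "珞喻路"}))
```

On 研究生命起源 the forward pass yields 研究生 / 命 / 起源 and the backward pass yields 研究 / 生命 / 起源; both have three words, so the result with no single-character word is kept.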

Owner:STATE GRID HUBEI ELECTRIC POWER RES INST +1

Test data generation and test case management method

Active | CN114637692A | Improve word segmentation accuracy | Increase granularity | Character and pattern recognition | Natural language data processing | Test data generation | Software testing

The invention discloses a test data generation and test case management method in the technical field of software testing. The method obtains values for unknown parameters by computing the similarity between parameter names, which solves a key difficulty in test data construction: obtaining parameter values. In addition, through assertion data enhancement, test case deduplication and test function merging, it effectively manages different versions of the same unit test case when multiple people are writing code, improving the delivery quality and delivery efficiency of the developed software.
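
The parameter-name similarity step might look like the sketch below. The abstract does not say which similarity measure is used; this version assumes the ratio from Python's standard `difflib.SequenceMatcher` on lower-cased names, reuses the value of the most similar known parameter, and returns `None` below a hypothetical cut-off of 0.6. The function name and sample values are illustrative.

```python
from difflib import SequenceMatcher


def infer_parameter_value(name, known_values, min_ratio=0.6):
    """Guess a value for an unknown parameter from similarly named ones.

    known_values maps parameter names already seen (e.g. in existing tests)
    to values. Returns None if no name is similar enough.
    """
    best_name, best_ratio = None, 0.0
    for known in known_values:
        ratio = SequenceMatcher(None, name.lower(), known.lower()).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = known, ratio
    if best_name is None or best_ratio < min_ratio:
        return None
    return known_values[best_name]


if __name__ == "__main__":
    known = {"user_id": 42, "order_count": 3}
    print(infer_parameter_value("userId", known))
```

The cut-off keeps unrelated names such as `timeout` from inheriting an arbitrary value.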

Owner:杭州优诗科技有限公司

Intelligent Chinese word segmentation method based on statistics and deep learning

Active | CN110414002A | Accurate word segmentation | Fast word segmentation | Energy efficient computing | Special data processing applications | Chinese word | A domain

The invention discloses an intelligent Chinese word segmentation method based on statistics and deep learning. The method comprises the following steps: constructing a domain term set; selecting a word segmentation method; and making the word segmentation decision. Because its word segmentation model combines statistics-based word segmentation with deep learning, the method has a wide application range, segments professional terms in specialized fields accurately, uses a simple algorithm, and segments quickly.

Owner:SHANDONG UNIV OF SCI & TECH

Word segmentation method, device, equipment and readable storage medium based on multi-level dictionary

Active | CN112214994B | Improve domain adaptability | Improve word segmentation accuracy | Natural language data processing | Machine learning | Domain adaptation | Speech recognition

This application discloses a word segmentation method based on multi-level dictionaries. The method uses at least two dictionaries to assist a word segmentation model. When representing a character, it generates not only a conventional vector representation but also a feature representation of the character in each of the at least two dictionaries, and then determines the character's word-formation label from the vector representation and the feature representations. By distinguishing the status and importance of different words, the method improves the overall segmentation performance, domain adaptability and word segmentation accuracy. The application also provides a word segmentation device, equipment and readable storage medium based on multi-level dictionaries, whose technical effects correspond to those of the method.
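
One way to build a character's feature representation in each dictionary is sketched below. This is an assumption, not the encoding disclosed in the application: for every dictionary it records whether the character begins (B), is inside (M), ends (E) or alone forms (S) a word found in that dictionary, and flattens those tags into 0/1 flags that a model could concatenate with the character's ordinary vector. The learned weighting that distinguishes word importance is not shown, and all names and sample dictionaries are hypothetical.

```python
TAGS = ("B", "M", "E", "S")


def dictionary_features(sentence, dictionaries, max_len=4):
    """Per-character positional tags from each dictionary.

    Returns a list with one entry per character; each entry is a list with
    one set of tags per dictionary (e.g. a general and a domain dictionary).
    """
    features = [[set() for _ in dictionaries] for _ in sentence]
    for d, lexicon in enumerate(dictionaries):
        for start in range(len(sentence)):
            for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
                word = sentence[start:end]
                if word not in lexicon:
                    continue
                if len(word) == 1:
                    features[start][d].add("S")
                    continue
                features[start][d].add("B")
                for k in range(start + 1, end - 1):
                    features[k][d].add("M")
                features[end - 1][d].add("E")
    return features


def feature_vector(tag_sets):
    """Flatten one character's per-dictionary tag sets into 0/1 flags."""
    return [int(tag in tags) for tags in tag_sets for tag in TAGS]


if __name__ == "__main__":
    general = {"长江", "大桥"}
    domain = {"长江大桥"}
    feats = dictionary_features("长江大桥", [general, domain])
    for char, tag_sets in zip("长江大桥", feats):
        print(char, feature_vector(tag_sets))
```

Here 江 is a word end in the general dictionary but word-internal in the domain dictionary; keeping the dictionaries as separate feature groups is what lets a model weigh them differently.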

Owner:SUZHOU UNIV

Word segmentation method and system for omnimedia science popularization window

Inactive | CN108874781A | Reduce the intensity of manual interference | Improve versatility | Character and pattern recognition | Natural language data processing | Speech recognition

The invention discloses a word segmentation method and system for an omnimedia science popularization window. The method comprises the following steps: obtaining a character sequence; inputting the character sequence into an evaluation module to determine goodness values; inputting the goodness values and the character sequence into a selection module for screening, and determining the segmentation form with the maximum goodness value, which is a hierarchical structure used to segment the character sequence; and judging, based on that segmentation form, whether iterative processing is needed; if so, inputting the segmentation form and the character sequence into an adjustment module to obtain an adjusted character sequence, and using the adjusted character sequence to update the statistical information in the evaluation module. The method and system improve word segmentation precision.
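
The selection step, choosing the segmentation form with the maximum goodness value, can be illustrated with dynamic programming. This sketch assumes a hypothetical table of per-word goodness scores and takes a segmentation's goodness to be the sum over its words; the abstract does not define the score, the hierarchical structure or the adjustment module, so those parts are omitted.

```python
def best_segmentation(chars, goodness, max_len=4):
    """Split chars to maximize the summed goodness of the resulting words.

    goodness maps candidate words to scores (hypothetical values). Any single
    character is allowed with score 0 so that a segmentation always exists;
    multi-character strings must appear in the table.
    """
    n = len(chars)
    best = [float("-inf")] * (n + 1)  # best[i]: max goodness of chars[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = chars[start:end]
            if word in goodness:
                score = goodness[word]
            elif len(word) == 1:
                score = 0.0
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers to recover the chosen words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(chars[start:end])
        end = start
    return words[::-1], best[n]


if __name__ == "__main__":
    scores = {"科普": 2.0, "窗口": 1.5, "普窗": 0.5}
    print(best_segmentation("科普窗口", scores))
```

Dynamic programming evaluates every admissible split in O(n * max_len) steps instead of enumerating all segmentations explicitly.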

Owner:北京千松科技发展有限公司

Information extraction method and system oriented to international electric connection radio rules

Pending | CN114861648A | Fast track | Master quickly | Natural language data processing | Special data processing applications | Engineering | Knowledge engineering

The invention discloses an information extraction method and system for the Radio Regulations of the International Telecommunication Union (ITU). The method comprises: preprocessing the text of the Radio Regulations and building a database that records all clauses; identifying from the database all clauses that may have a relationship; for each such clause, analyzing the specific type of relationship between it and its associated clauses; and, using rule-matching natural language processing, extracting the numbers of the associated clauses and writing those numbers and the specific relationship types into the corresponding positions in the database. The method applies knowledge-engineering-based NLP to perform named entity and entity relationship extraction on the Radio Regulations, automatically organizes the interrelations among clauses, and helps satellite network operators quickly track and master the Radio Regulations.
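
The rule-matching step can be sketched with regular expressions. Everything here is hypothetical: the cross-reference phrasings, the relationship type names, the clause numbers and the sample sentence are invented for illustration and are not taken from the Radio Regulations or from the patent, and writing the results into the database is not shown. Each rule maps a phrase pattern to a relationship type and captures the referenced clause number.

```python
import re

# Hypothetical rules: (pattern capturing a referenced clause number, relation type).
CLAUSE_NO = r"No\.\s*(\d+\.\d+[A-Z]*)"
RELATION_RULES = [
    (re.compile(r"in accordance with " + CLAUSE_NO), "complies_with"),
    (re.compile(r"subject to (?:agreement obtained under )?" + CLAUSE_NO), "subject_to"),
    (re.compile(r"see " + CLAUSE_NO, re.IGNORECASE), "refers_to"),
]


def extract_relations(clause_number, clause_text):
    """Return (source clause, referenced clause, relation type) triples."""
    relations = []
    for pattern, relation in RELATION_RULES:
        for match in pattern.finditer(clause_text):
            relations.append((clause_number, match.group(1), relation))
    return relations


if __name__ == "__main__":
    text = "Use of this band is subject to agreement obtained under No. 1.23; see No. 4.56A."
    for triple in extract_relations("X.1", text):
        print(triple)
```

Each triple corresponds to one row the method would write back: the source clause, the associated clause number and the relationship type.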

Owner:NAT SPACE SCI CENT CAS +1
