35 results about "Longest common substring problem" patented technology

In computer science, the longest common substring problem is to find the longest string (or strings) that is a substring (or are substrings) of two or more strings.
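The following is a minimal Python sketch of the textbook dynamic-programming solution for two strings. It is included only to make the definition concrete and is not taken from any patent listed below. When several substrings tie for longest, it returns the one that ends earliest in `a`.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest string that occurs as a substring of both a and b.

    cur[j] holds the length of the longest common suffix of a[:i] and b[:j];
    the largest value seen marks the end of a longest common substring.
    Time O(len(a) * len(b)); only two rows are kept, so memory is O(len(b)).
    """
    best_len, best_end = 0, 0      # length of best match and its end index in a
    prev = [0] * (len(b) + 1)      # row i-1 of the DP table
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)   # row i of the DP table
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the common suffix by one character
                if cur[j] > best_len:      # strict '>' keeps the first maximum found
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("ABABC", "BABCA"))  # BABC
```

Extending this to three or more strings, or returning every tied substring, needs more machinery, such as a generalized suffix tree.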

Systems and methods to control work progress for content transformation based on natural language processing and/or machine learning

Systems and methods are provided to compute indicators of completeness of the work output of a transformation of text-based content, worker capacity in performing the transformation, and / or the degree of matching between a unit of work and a worker, based on information collected about complexity of works, times and throughput of workers, rating of work outputs and using natural language processing techniques and machine learning techniques, such as language detection, longest common substring, length ratio, document similarity, etc. The indicators are utilized to optimize job pickup and output submission for online crowdsourcing tasks related to transformation of text-based content, such as transcription, translation, proofreading, etc.
Owner:GENGO

Electronic official document trace reserving method based on file comparison

The invention relates to the technical field of e-government, in particular to an electronic official document trace reserving method based on file comparison, which retains revision traces by comparing texts with longest common substring matching. The method effectively solves the problem of excessive marking, uses a simple algorithm that is relatively easy to implement in various programming languages, and is applicable to various operating systems and software environments. It first compares the original text with the modified text to determine, relative to the original, which character strings of the modified text were inserted and which were deleted, and finally marks the inserted and deleted strings respectively, thereby retaining the revision traces. The method is mainly applied to the modification of electronic texts.
Owner:NO 33 RES INST OF CHINA ELECTRONICS TECHNOLOGY GRP
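The abstract does not spell out its comparison procedure. One common way to turn longest common substring matching into insert/delete traces is to keep the longest match and then recurse on the text to its left and right, which is the idea behind Ratcliff/Obershelp-style diffs. The Python sketch below assumes that scheme. The function names and the `[-deleted-]`/`{+inserted+}` markers are hypothetical.

```python
from typing import List, Tuple


def lcs_span(a: str, b: str) -> Tuple[int, int, int]:
    """Return (i, j, k) such that a[i:i+k] == b[j:j+k] is a longest common substring."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def mark_changes(original: str, modified: str) -> List[Tuple[str, str]]:
    """Split the two texts into ('eq' | 'del' | 'ins', text) segments.

    The longest common substring is kept unchanged; the text to its left and
    right is compared recursively. Text that never matches is reported as
    deleted (from the original) or inserted (into the modified text).
    """
    i, j, k = lcs_span(original, modified)
    if k == 0:  # nothing in common: the remainder is a deletion and/or an insertion
        segments = []
        if original:
            segments.append(("del", original))
        if modified:
            segments.append(("ins", modified))
        return segments
    return (mark_changes(original[:i], modified[:j])
            + [("eq", original[i:i + k])]
            + mark_changes(original[i + k:], modified[j + k:]))


def render(segments: List[Tuple[str, str]]) -> str:
    """Render traces with hypothetical [-deleted-] and {+inserted+} markers."""
    marks = {"eq": "{}", "del": "[-{}-]", "ins": "{{+{}+}}"}
    return "".join(marks[kind].format(text) for kind, text in segments)


if __name__ == "__main__":
    print(render(mark_changes("the cat sat", "the dog sat")))  # the [-cat-]{+dog+} sat
```

Only changed fragments receive a mark. This is consistent with the abstract's goal of avoiding excessive marking, but the patent's actual marking rules may differ.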

Text similarity detection method

The invention relates to a text similarity detection method in the technical field of natural language processing. The method comprises the following steps: first, calculating the similarity of texts with the conventional Simhash algorithm; second, introducing an N-Gram language model to combine text keywords so that the keywords carry contextual relationships, and calculating the similarity again with the Simhash algorithm; third, introducing the longest common substring as an additional similarity criterion; and finally, assigning a weight to each calculated similarity and superposing them into the final similarity. Compared with the prior art, the method mainly overcomes drawbacks such as the Simhash algorithm's poor support for short texts and the loss of useful information during fingerprint generation, and improves the accuracy and reliability of text similarity detection.
Owner:KUNMING UNIV OF SCI & TECH

Method and system for obtaining word pair translation from bilingual sentence

Status: Inactive; Publication: CN101187924A; Tags: Reduce workload, Improve the efficiency of obtaining translations, Special data processing applications, Resource pool, Longest common substring problem
The invention provides a method for obtaining word-pair translations from bilingual sentence pairs. The method includes the following steps: A. receiving a lemma to be processed; B. searching an indexed resource pool for candidate bilingual sentence pairs according to the lemma; C. selecting two bilingual sentence pairs from the search results and obtaining the longest common substring of their two sentences that are in the same language as the lemma; D. judging whether this substring is consistent with the lemma; if not, selecting another two bilingual sentence pairs from the search results and repeating step C; if so, E. obtaining the longest common substring of the corresponding sentences in the other language of the two bilingual sentence pairs. Using an index reduces the data-processing workload and improves the efficiency of obtaining translations. The invention also provides a system for obtaining word-pair translations from bilingual sentence pairs.
Owner:BEIJING KINGSOFT SOFTWARE +2
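Steps C to E can be sketched in Python as follows. The sketch assumes that sentences are already tokenized into word lists (whitespace splitting here; Chinese text would need word segmentation first) and that the longest common substring is taken over tokens. The index search of steps A and B is omitted. All names and the sample sentences are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

# (tokens in the lemma's language, tokens in the other language)
Pair = Tuple[List[str], List[str]]


def lcs_tokens(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest run of consecutive tokens shared by a and b (token-level longest common substring)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return list(a[best_end - best_len:best_end])


def translate_from_pairs(lemma: List[str], first: Pair, second: Pair) -> Optional[List[str]]:
    """Steps C to E for one choice of two sentence pairs."""
    # Step C: longest common substring of the two sentences in the lemma's language.
    if lcs_tokens(first[0], second[0]) != lemma:
        return None  # Step D: not the lemma, so the caller should try another two pairs
    # Step E: the longest common substring on the other side is the candidate translation.
    return lcs_tokens(first[1], second[1])


if __name__ == "__main__":
    pair1 = ("machine translation is hard".split(), "maschinelle Übersetzung ist schwer".split())
    pair2 = ("we study machine translation".split(), "wir untersuchen maschinelle Übersetzung".split())
    print(translate_from_pairs(["machine", "translation"], pair1, pair2))  # ['maschinelle', 'Übersetzung']
```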

Log event extraction method and system based on log tree and parse tree

The invention discloses a log event extraction method and system based on a log tree and a parse tree. The method is divided into two stages, preprocessing and log content parsing, and specifically comprises: providing and maintaining a rule base composed of regular expressions and heuristic rules, and sampling a small portion of the logs to automatically generate a log format; splitting each log online into a log header and log content based on the log format; searching the parse tree and calculating the similarity of static fields and of dynamic parameters between the log tree and the event tree using the longest common substring and the longest common subvector, respectively; and matching the log tree with the event tree using a clustering technique to extract events and their parameters. To cope with the complexity of log content, both the preprocessing and the log content parsing steps of the online event extraction method are improved. The method reduces the manual effort of identifying log formats, solves the difficulty existing methods have in identifying events with a variable number of parameters, and extracts log events more accurately.
Owner:NANJING UNIV OF SCI & TECH

Text similarity calculation method and device and electronic device

The embodiment of the invention discloses a text similarity calculation method and device and an electronic device. The method comprises the following steps: obtaining an original text and a target text; calculating the edit distance between the original text and the target text; determining the longest common substring of the original text and the target text and obtaining its starting position in the original text; and calculating the text similarity between the original text and the target text based on that starting position. By combining the edit distance with the longest common substring, the embodiment produces similarity values closer to actual similarity and improves the accuracy of text similarity calculation.
Owner:广西三方大供应链技术服务有限公司 (Guangxi Sanfangda Supply Chain Technology Service Co., Ltd.)

Semantic similarity calculation method and device based on CTW and KM algorithms

The invention provides a semantic similarity calculation method and device based on CTW and KM algorithms. It aims to overcome the defect that prior-art semantic similarity methods ignore the important influence of word-segmentation order on semantics, and it takes the effect of word order on sentences into account while retaining a single semantic judgment rule. The method comprises: using the Word2Vec deep learning platform to convert a text into word-segment vectors in a multi-dimensional space; obtaining a plurality of text similarity values, mapping them into the multi-dimensional vector space, and connecting the vectors to form a curve in that space; comparing the similarity of multiple texts through their word-vector curves using a relatively new time-warping distance borrowed from curve similarity measures in image processing; and adopting the KM algorithm to reduce the computation scale. Compared with traditional methods such as longest common substring and word-frequency statistics, the method is more robust, clearly handles sentences that contain the same segmented words in different orders (which traditional methods cannot), and improves calculation accuracy.
Owner:HUBEI UNIV OF TECH

Method for obtaining longest common substring of alphabetic strings

The invention relates to a method for obtaining the longest common substring among alphabetic strings. To improve the efficiency of this task, the method comprises the following steps: first, performing a bidirectional comparison on both sides of a matching byte to obtain initial common substrings and calculate their lengths; second, starting from the current longest common substring, repeatedly trying to find a longer common substring by combining multiple trans-mechanisms, until all alphabetic strings have been processed. The invention improves the efficiency of obtaining the longest common substring and reduces resource overhead.
Owner:COMP APPL RES INST CHINA ACAD OF ENG PHYSICS

Method for detecting similarity of string matching codes

The invention discloses a method for detecting the similarity of code by string matching. The method includes: preprocessing the program code and normalizing the source code; comparing the obtained feature vectors against the code to be compared line by line and generating binary feature values, where a feature value of 0 indicates that the line does not contain the feature vector and 1 indicates that it does; dynamically generating code structure fingerprints; and extracting identical feature vectors from the code to be compared, looking up the corresponding generated structure fingerprints according to those feature vectors, and forming structure fingerprints of the code features. The code can then be compared: its structural similarity is obtained by matching the longest common substrings of the structure-feature fingerprints of the code being compared. The method adds structural similarity detection on top of the original string-matching code similarity detection and can improve the accuracy of code similarity detection.
Owner:NANJING UNIV OF POSTS & TELECOMM

Military equipment knowledge graph-oriented key information query method

The invention discloses a key information query method for a military equipment knowledge graph, comprising the following steps: acquiring a natural language query statement, and linking the entities mentioned in it to existing entities in the military equipment knowledge graph based on longest common substring matching to obtain a key information query statement, which yields a plurality of entity linking results; sorting the entity linking results based on longest prefix matching and selecting the optimal one; creating a query template library according to the element types in natural language query statements; matching the elements of the natural language query statement against the query template library by template matching, filling the corresponding elements into the matching graph query statement template to form a complete graph query statement, and obtaining the query result by running the graph query. The method can answer four typical types of military equipment queries without requiring a training dataset or algorithm training.
Owner:中国人民解放军军事科学院战争研究院 (War Research Institute, Academy of Military Sciences, Chinese People's Liberation Army)
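The two matching stages described above might look roughly like this in Python. The minimum-overlap threshold `min_overlap`, the use of common-substring length as a secondary sort key, and the entity names are assumptions made for illustration. The abstract states only that candidates come from longest common substring matching and are ranked by longest prefix matching.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (two-row DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def link_entity(mention: str, entities: List[str], min_overlap: int = 2) -> List[str]:
    """Return candidate entities for a mention, best first.

    Candidates share a common substring of at least min_overlap characters
    with the mention (hypothetical threshold). They are ranked by longest
    common prefix, with common-substring length as a tie-breaker.
    """
    candidates = [e for e in entities if lcs_length(mention, e) >= min_overlap]
    return sorted(
        candidates,
        key=lambda e: (common_prefix_length(mention, e), lcs_length(mention, e)),
        reverse=True,
    )


if __name__ == "__main__":
    entities = ["F-15 Eagle", "F-16 Fighting Falcon", "F-35 Lightning II", "Su-27"]
    print(link_entity("F-16", entities))
    # ['F-16 Fighting Falcon', 'F-15 Eagle', 'F-35 Lightning II']
```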

Method and device of analyzing search keyword frequency

Status: Active; Publication: CN107203570A; Tags: Fix and make up for errors, Improve the efficiency of similarity calculation, Special data processing applications, Text database clustering/classification, Longest common substring problem, Neighbor algorithm
The invention provides a method and device for analyzing search keyword frequency based on HLSA. The method aggregates keywords by introducing an LSA space model that incorporates topics. This overcomes the shortcoming that the Euclidean distance model based on VSM vectors ignores the semantic information of the words themselves, and remedies the errors that an edit distance model makes when keyword order changes. The method further incorporates the Hamming distance to compute the similarity of feature vectors between keywords, forming a new HLSA algorithm that improves the efficiency of similarity computation. The K-nearest neighbor algorithm is then used to classify keywords and count their frequency, aggregating keywords at different granularities and effectively avoiding the misjudgments caused by the overly fine granularity of the longest common substring model.
Owner:BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

Classic track similar track identification method

The invention discloses a method for identifying tracks similar to classic tracks. It aims to provide a classic track identification method that has a high similar-track recognition rate and can also handle unstable tracks. The method comprises the following steps: reading classic tracks from a classic track knowledge base and real-time tracks from a real-time track database; compressing the real-time track with the Douglas-Peucker algorithm; preliminarily judging track similarity according to track features; if the preliminary judgment succeeds, using the distance between points of the classic track and line segments of the real-time track to calculate a many-to-one longest common substring distance, and taking it as the point-to-line many-to-one longest common substring distance between the classic track and the real-time track; using the ratio of this distance to the length of the classic track as the track similarity; and precisely judging track similarity based on the obtained value, outputting a result if the precise judgment succeeds.
Owner:10TH RES INST OF CETC

Method and device of data deduplication

The invention belongs to the technical field of data statistics, and particularly relates to a method and device for data deduplication. The method comprises: constructing a longest-common-sub-string table from the acquired target data; extracting the longest common sub-string of two pieces of data that require a deduplication decision, and comparing it with the sub-strings in the table; and deduplicating the two pieces of data if no sub-string identical to their longest common sub-string exists in the table. With this method and device, the data in the table does not need frequent updates, the amount of stored data is reduced, and data comparison during deduplication is more efficient.
Owner:CHINA ACADEMY OF INFORMATION & COMM
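The deduplication rule can be sketched in Python as below. How the table is built from the target data is not described, so `lcs_table` is simply passed in. Treating it as a set of substrings that distinct records are expected to share, and skipping pairs with no common substring at all, are assumptions. The helper uses the standard-library `difflib.SequenceMatcher.find_longest_match`, which, with no junk filtering, returns a longest block common to both strings.

```python
from difflib import SequenceMatcher
from typing import Set


def longest_common_substring(a: str, b: str) -> str:
    # With isjunk=None and autojunk=False, find_longest_match returns a longest
    # block that occurs in both strings, i.e. a longest common substring.
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def should_deduplicate(record_a: str, record_b: str, lcs_table: Set[str]) -> bool:
    """Treat the pair as duplicates if their longest common substring is not in the table.

    lcs_table is assumed to hold substrings that distinct records are expected
    to share. The empty-string guard is an added assumption, not part of the
    abstract.
    """
    common = longest_common_substring(record_a, record_b)
    return bool(common) and common not in lcs_table


if __name__ == "__main__":
    table = {"Order ", " shipped"}  # hypothetical table contents
    print(should_deduplicate("Order 1001 shipped", "Order 1001 shipped!", table))  # True
    print(should_deduplicate("Order 1001 shipped", "Order 2002 shipped", table))   # False
```

In the second call, the only shared run is " shipped", which the table marks as expected boilerplate, so the records are kept apart.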

Similar track recognition method for classic tracks

The invention discloses a similar track recognition method for classic tracks. It aims to provide a classic track recognition method that has a high similar-track recognition rate and can handle unstable tracks. The technical scheme comprises: reading a classic track from a classic track knowledge base and a real-time track from a real-time track base; compressing the real-time track with the Douglas-Peucker algorithm; making an initial similarity judgment using track features; if the initial judgment succeeds, using the distance between a point of the classic track and a line segment of the real-time track to calculate the many-to-one longest common substring distance, which serves as the point-to-line many-to-one longest common substring distance between the classic track and the real-time track; using the ratio of this distance to the length of the classic track as the track similarity; and then carrying out a precise similarity judgment based on the track similarity, outputting a result if it succeeds.
Owner:10TH RES INST OF CETC

Method and device for generating traffic detection rule

Embodiments of the invention provide a method and device for generating a traffic detection rule, applied to electronic equipment. The method comprises the following steps: obtaining traffic files of at least two attack traffics targeting a preset vulnerability, the traffic files including at least the payload data of the attack traffic; determining the requester and the responder of each attack traffic according to its protocol type; taking the vulnerability information of the preset vulnerability as an information guide item; extracting the first payload data of all requesters from all traffic files; using all the first payload data and the information guide item as a first input source and computing the first longest common substring of all the first payload data; determining the first longest common substring as a first feature; and generating a first traffic detection rule according to the first feature. The embodiments reduce the time needed to generate traffic detection rules.
Owner:NEW H3C TECH CO LTD
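Computing the longest common substring of all the requesters' payloads means finding the longest byte string present in every payload. A brute-force Python sketch follows. It ignores the information guide item, whose role the abstract does not detail, and the sample payloads are made up for illustration.

```python
from typing import Sequence


def common_substring_of_all(payloads: Sequence[bytes]) -> bytes:
    """Longest byte string contained in every payload.

    Tries substrings of the shortest payload from longest to shortest and
    returns the first one that every payload contains. The cost grows roughly
    cubically with payload length, which is fine for a sketch but too slow for
    large captures, where a generalized suffix structure would be used instead.
    """
    if not payloads:
        return b""
    shortest = min(payloads, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in p for p in payloads):
                return candidate
    return b""


if __name__ == "__main__":
    requests = [
        b"GET /vuln.php?id=1' OR '1'='1",
        b"GET /vuln.php?id=7' OR 'x'='x",
    ]
    print(common_substring_of_all(requests))  # b'GET /vuln.php?id='
```

As the sample shows, the plain longest common substring can settle on a generic URL prefix instead of the attack-specific part. This is one reason extra guidance, such as the vulnerability information, could matter in practice.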

Candidate word evaluation method and device, computer device and storage medium

The invention relates to a candidate word evaluation method and device, a computer device and a storage medium, applied to the field of data processing. The method comprises the following steps: upon detecting an error word, obtaining a plurality of candidate words corresponding to it; determining the similarity between each candidate word and the error word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the two; obtaining error information of the error word relative to each candidate word; and determining the evaluation score of each candidate word according to the similarity and the error information. The method, device, computer device or storage medium of the embodiment helps improve the reliability of candidate word evaluation results.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD
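The candidate-word abstracts in this list base similarity on the longest common subsequence and/or the longest common substring, but they do not say how the two are combined. The Python sketch below assumes an equal-weight average normalized by the length of the longer word. The weights `w_seq` and `w_sub` and the normalization are hypothetical, and the error information is not modeled.

```python
def lcs_subsequence_length(a: str, b: str) -> int:
    """Longest common subsequence length (matched characters need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


def lcs_substring_length(a: str, b: str) -> int:
    """Longest common substring length (matched characters must be contiguous)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def similarity(candidate: str, error_word: str, w_seq: float = 0.5, w_sub: float = 0.5) -> float:
    """Hypothetical weighted mix of both measures, normalized by the longer word's length."""
    longest = max(len(candidate), len(error_word))
    if longest == 0:
        return 1.0
    return (w_seq * lcs_subsequence_length(candidate, error_word)
            + w_sub * lcs_substring_length(candidate, error_word)) / longest


if __name__ == "__main__":
    for cand in ["language", "luggage"]:
        print(cand, similarity(cand, "langauge"))
    # language 0.6875
    # luggage 0.4375
```

The transposed letters in "langauge" cost the substring measure (4 of 8) more than the subsequence measure (7 of 8), which is one reason to keep both.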

Candidate word evaluation method and device, computer equipment and storage medium

The invention relates to a candidate word evaluation method and device, computer equipment and a storage medium, applied to the field of data processing. The method comprises the following steps: detecting an error word and obtaining a plurality of candidate words corresponding to it; determining the edit distance between each candidate word and the error word; determining the similarity between each candidate word and the error word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the two; replacing the error word with each candidate word to obtain a candidate sentence, and determining the evaluation probability of the corresponding candidate word according to that sentence; obtaining error information of the error word relative to each candidate word; and determining the evaluation score of each candidate word according to the edit distance, similarity, evaluation probability and error information. The embodiment addresses the low reliability of candidate word evaluation and helps improve the reliability of the evaluation results.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

Candidate word assessment method and apparatus, and candidate word sorting method and apparatus

The invention relates to a candidate word assessment method and apparatus and a candidate word sorting method and apparatus, applied to the field of data processing. The method comprises the steps of: detecting a wrong word and obtaining multiple candidate words corresponding to it; determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the two; replacing the wrong word with each candidate word to obtain candidate sentences, and determining the assessment probability of each candidate word from its candidate sentence, the assessment probability being obtained from the language environment probability of the candidate word in the candidate sentence and the language environment probabilities of its neighboring words; and determining the assessment score of each candidate word according to the similarity and the assessment probability. The reliability of candidate word assessment results can be improved.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

A Method for Retaining Traces of Electronic Official Documents Based on File Comparison

The invention relates to the technical field of e-government, in particular to an electronic official document trace reserving method based on file comparison, which retains revision traces by comparing texts with longest common substring matching. The method effectively solves the problem of excessive marking, uses a simple algorithm that is relatively easy to implement in various programming languages, and is applicable to various operating systems and software environments. It first compares the original text with the modified text to determine, relative to the original, which character strings of the modified text were inserted and which were deleted, and finally marks the inserted and deleted strings respectively, thereby retaining the revision traces. The method is mainly applied to the modification of electronic texts.
Owner:NO 33 RES INST OF CHINA ELECTRONICS TECHNOLOGY GRP

Candidate word evaluation method and apparatus, computer device and storage medium

The present invention relates to a candidate word evaluation method and apparatus, a computer device and a storage medium, applied to the field of data processing. The method comprises: when a wrong word is detected, acquiring a plurality of candidate words corresponding to it; determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the two; replacing the wrong word with each candidate word in turn to obtain a candidate sentence, and determining the evaluation probability of the corresponding candidate word according to that sentence, the evaluation probability being obtained from the locale probability of the candidate word in the candidate sentence and the locale probability of its adjacent words; acquiring error information of the wrong word with respect to each candidate word; and determining the evaluation score of each candidate word according to the similarity, the evaluation probability and the error information. The embodiments address the low reliability of candidate word evaluation and help improve the reliability of the evaluation results.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

Method for determining malicious software characteristics and malicious software detection method and device

The embodiment of the invention discloses a method for determining malicious software characteristics, as well as a malicious software detection method and device. One of the methods comprises: using a longest common substring algorithm to determine the longest common substring of each of one or more character-string pairs, and determining the characteristics of the malicious software from the resulting one or more longest common substrings. Malware features can thus be extracted automatically, which greatly improves working efficiency.
Owner:BEIJING VENUS INFORMATION SECURITY TECH +1

Candidate word evaluation method and device, computer equipment and storage medium

The invention relates to a candidate word evaluation method and device, computer equipment and a storage medium, applied to the field of data processing. The method comprises the following steps: detecting an error word and obtaining a plurality of candidate words corresponding to it; determining the edit distance between each candidate word and the error word; determining the similarity between each candidate word and the error word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the two; obtaining error information of the error word relative to each candidate word; and determining the evaluation score of each candidate word according to the edit distance, similarity and error information. The embodiment addresses the low reliability of candidate word evaluation and helps improve the reliability of the evaluation results.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

Candidate word evaluation method, apparatus, computer equipment and storage medium

The invention relates to a candidate word evaluation method and device, computer equipment and a storage medium, applied to the field of data processing. The method includes: detecting a wrong word and obtaining a plurality of candidate words corresponding to it; determining the similarity between each candidate word and the wrong word, the similarity being based on the longest common subsequence and/or the longest common substring of the two; replacing the wrong word with each candidate word in turn to obtain a candidate sentence, and determining the evaluation probability of the corresponding candidate word according to that sentence, the evaluation probability being based on the language environment probability of the candidate word in the candidate sentence and the language environment probability of its adjacent words; obtaining error information of the wrong word relative to each candidate word; and determining the evaluation score of each candidate word according to the similarity, the evaluation probability and the error information. The embodiment addresses the low reliability of candidate word evaluation and helps improve the reliability of the evaluation results.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

Method and system for obtaining word pair translation from bilingual sentence

Status: Inactive; Publication: CN100524293C; Tags: Reduce workload, Improve the efficiency of obtaining translations, Special data processing applications, Resource pool, Longest common substring problem
The invention provides a method for obtaining word-pair translations from bilingual sentence pairs. The method includes the following steps: A. receiving a lemma to be processed; B. searching an indexed resource pool for candidate bilingual sentence pairs according to the lemma; C. selecting two bilingual sentence pairs from the search results and obtaining the longest common substring of their two sentences that are in the same language as the lemma; D. judging whether this substring is consistent with the lemma; if not, selecting another two bilingual sentence pairs from the search results and repeating step C; if so, E. obtaining the longest common substring of the corresponding sentences in the other language of the two bilingual sentence pairs. Using an index reduces the data-processing workload and improves the efficiency of obtaining translations. The invention also provides a system for obtaining word-pair translations from bilingual sentence pairs.
Owner:BEIJING KINGSOFT SOFTWARE +2

Candidate word evaluation method and apparatus, computer equipment and storage medium

The invention relates to a candidate word evaluation method and apparatus, computer equipment and a storage medium, applied to the field of data processing. The method comprises the steps of: detecting a wrong word and acquiring a plurality of candidate words corresponding to it; determining the edit distance between each candidate word and the wrong word; determining the similarity between each candidate word and the wrong word, the similarity being obtained from the longest common subsequence and/or the longest common substring of the two; determining the language environment probability of each candidate word at the position of the wrong word; acquiring error information of the wrong word relative to each candidate word; and determining the evaluation score of each candidate word according to the edit distance, the similarity, the language environment probability and the error information. The method and apparatus address the relatively low reliability of candidate word evaluation and thereby improve the reliability of the evaluation results.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD