Code analysis method based on regular expression
A code analysis technique based on regular expressions that addresses the inability to quickly analyze code fragments and enables rapid analysis.
Pending Publication Date: 2021-12-10
SOUTH UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA
AI-Extracted Technical Summary
Problems solved by technology
[0004] The technical problem to be solved by the present invention is to provide a code analysis method based on regular expressions for the above-mentioned defects of the prior art, aiming at solving the need...
Method used
In summary, the present invention discloses a code analysis method based on regular expressions, the method comprising: obtaining code modification text and performing extraction preprocessing on it to obtain several program statements; performing code analysis and screening based on regular expression rules on the program statements to obtain several abnormal program statements; and generating GitHub messages according to the abnormal program statements. By extracting and preprocessing the code modification text, the present invention can match each error pattern line by line; by performing code analysis and screening based on regular expression rules on the program statements, it can quickly analyze incomplete code fragments without compiling and parsing the entire code base.
Abstract
The invention discloses a code analysis method based on a regular expression, and the method comprises the steps: obtaining a code change text, and carrying out the extraction preprocessing of the code change text to obtain a plurality of program statements; performing regular expression rule-based code analysis and screening on the plurality of program statements to obtain a plurality of abnormal program statements; and according to the plurality of abnormal program statements, generating a Github message. Each error pattern can be matched line by line by performing extraction preprocessing on the code change text, incomplete code snippets can be quickly analyzed by performing regular expression rule-based code analysis and screening on a plurality of program statements, and the whole code library does not need to be compiled and analyzed.
Application Domain
Digital data information retrieval; Software testing/debugging
Technology Topic
Codebase; Theoretical computer science
Examples
Example Embodiment
[0052] The present invention discloses a code analysis method based on regular expressions. To make the purpose, technical solution and effect of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.
[0053] Those skilled in the art will understand that, unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the description of the present invention refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when the invention refers to an element as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Additionally, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The expression "and/or" used herein includes all or any elements and all combinations of one or more associated listed items.
[0054] Those skilled in the art can understand that, unless otherwise defined, all terms (including technical terms and scientific terms) used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in commonly used dictionaries should be understood to have meanings consistent with their meaning in the context of the prior art and, unless specifically defined herein, are not to be interpreted in an idealized or overly formal sense.
[0055] In the prior art, in the field of automatic code review, a common practice is to check code through static analysis. However, existing work suffers from the following drawbacks: (1) some developers believe that running static analysis tools such as FindBugs may reduce productivity because these tools take a long time to run; (2) existing work focuses more on how to present the output of static analysis tools, or on using microservices to apply static analysis to large code bases, than on adapting static analysis tools to make them more effective for code review; (3) static analysis tools such as FindBugs analyze Java bytecode, which means that running them requires a prerequisite: all dependencies must be available so that the bytecode can be generated successfully. With the popularity of pull-based platforms such as GitHub, static analysis tools also face the problem of how to quickly analyze incomplete code fragments, because they usually need to fetch all the code and compile it to generate Java bytecode.
[0056] Other automated code review techniques rely either on deep learning to model code changes and the corresponding review comments, or on code reviewer recommendation. Although these techniques may discover new problems in a given code change, they are better suited to mature projects with many PRs (pull requests, i.e., changed code content) and code review comments. Applied to code snippets, they can make the automated review process too cumbersome and complicated.
[0057] To solve these problems of the prior art, this embodiment provides a code analysis method based on regular expressions. By extracting and preprocessing the code change text, each error pattern can be matched line by line; by performing code analysis and screening based on regular expression rules on the resulting program statements, incomplete code fragments can be analyzed quickly without compiling and parsing the entire code base. In a specific implementation, the code change text is first obtained, then extracted and preprocessed to obtain several program statements; next, code analysis and screening based on regular expression rules is performed on the program statements to obtain several abnormal program statements, where an abnormal program statement is a program statement that contains error information; finally, a GitHub message is generated according to the abnormal program statements, where the GitHub message includes a code reference and a message text.
[0058] Exemplary method
[0059] This embodiment provides a code analysis method based on regular expressions, which can be applied to an intelligent terminal such as a computer. Specifically, as shown in Figure 1, the method includes:
[0060] Step S100, obtaining the code modification text, and performing extraction and preprocessing on the code modification text to obtain several program statements;
[0061] As shown in the method flow chart of Figure 2, the code modification text is a code patch, that is, a small piece of code written to fix a program or software that has bugs. In this embodiment of the present invention, each code patch is extracted and preprocessed to obtain several program statements. Correspondingly, extracting and preprocessing the code modification text to obtain several program statements includes the following steps:
[0062] Step S101, extracting the context object and the new line object of the code modification text;
[0063] Step S102, changing and splitting the context object and the new line object to obtain several program statements.
[0064] Specifically, a code patch usually corrects a piece of a program or software, and it consists of context objects (contexts), new line objects (additions) and deleted line objects (deletions). A context object is the code immediately above and below a changed piece of code. Additions are the new lines of code added by the author of the patch, and deletions are the lines of code the author removed. Since deleted lines do not exist in later versions of the code, this embodiment ignores deleted line objects. The context objects and new line objects are then changed and split to obtain several program statements. For example, given a PR whose change C is in unified diff format, Codegex treats C as text and uses Java statement terminators (i.e., semicolons, "{" and "}") to split the text into program statements. Since most regex libraries have good support for single-line matching, this preprocessing step enables Codegex to match each error pattern line by line rather than matching multiple lines of code at once.
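The preprocessing described in steps S101 and S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names `extract_lines` and `split_statements` are hypothetical, header detection is simplified, and terminators inside string literals, comments or `for (;;)` headers are not treated specially.

```python
import re
from typing import List

# Prefixes of unified-diff header lines (simplified; an assumption of this sketch).
HEADER_PREFIXES = ("+++", "---", "@@", "diff ", "index ")


def extract_lines(patch: str) -> List[str]:
    """Keep context lines and added lines of a unified diff; drop deletions."""
    kept = []
    for line in patch.splitlines():
        if line.startswith(HEADER_PREFIXES):
            continue  # file or hunk header
        if line.startswith("-"):
            continue  # deleted line: absent from the later code version
        if line.startswith("+") or line.startswith(" "):
            kept.append(line[1:])  # strip the diff marker
    return kept


def split_statements(lines: List[str]) -> List[str]:
    """Join the kept lines and split after each ';', '{' or '}'."""
    text = " ".join(s.strip() for s in lines)
    # Zero-width split right after a terminator keeps the terminator attached.
    parts = re.split(r"(?<=[;{}])", text)
    return [p.strip() for p in parts if p.strip()]
```

Joining the lines before splitting means a statement that spans two diff lines ends up as one single-line string, which is what allows the later rules to match line by line.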
[0065] After the program statements are obtained, the following step shown in Figure 1 can be executed: S200. Perform code analysis and screening based on regular expression rules on the program statements to obtain several abnormal program statements, where an abnormal program statement is a program statement containing error information;
[0066] Specifically, for the statements extracted from the changed content, the regular-expression-based analyzer checks for problems in the statements through bug pattern detection. The main technical challenge faced by the analyzer is to design regular expression rules (regex rules) that represent the patterns selected from SpotBugs, rather than relying on off-the-shelf program analysis techniques. To solve this problem, the present invention uses several strategies to ensure the effectiveness of error detection. Correspondingly, performing code analysis and screening based on regular expression rules on the program statements to obtain several abnormal program statements includes the following steps:
[0067] S201. Based on regular expression rules, perform syntax-guided matching detection on several of the program statements to obtain several first abnormal program statements;
[0068] S202. Based on regular expression rules, perform type-driven matching detection on several first abnormal program statements to obtain several second abnormal program statements;
[0069] S203. Perform word boundary matching detection on several of the second abnormal program statements; when the words in a second abnormal program statement have boundaries, match only the standalone character strings formed by those words, to obtain several third abnormal program statements;
[0070] S204. Perform background information matching detection on several third abnormal program statements to obtain several fourth abnormal program statements;
[0071] S205. Perform operator precedence matching detection on several fourth abnormal program statements to obtain several fifth abnormal program statements;
[0072] S206. Perform anti-pattern matching detection on several fifth abnormal program statements to obtain several abnormal program statements.
[0073] In step S201, performing syntax-guided matching detection on several program statements based on regular expression rules to obtain several first abnormal program statements includes the following steps: obtaining signature information, where the signature information represents class names, method names, variable names, field names, modifiers, Java keywords and operators in program statements; adding the signature information to the regular expression rules to obtain a first fusion regular expression rule; performing keyword matching detection on the program statements and, when a program statement is detected to contain a keyword representing a condition of one or several error patterns, recording the source code information, file path, line number, matching pattern name, pattern description and priority of the line where the keyword is located, to obtain several keyword abnormal program statements; and, based on the first fusion regular expression rule, performing pattern-based matching detection on the keyword abnormal program statements to obtain several first abnormal program statements.
[0074] Specifically, in the survey of SpotBugs patterns conducted for the present invention, most of the patterns in SpotBugs are detected using the information of class signature (3.65%) or method signature (13.70%). First, the signature information is added to the regular expression rules to obtain the first fusion regular expression rule. Because the signature information is added to the regex rules, the present invention uses representative class, method, variable and field names, modifiers (e.g., "static"), Java keywords (e.g., "if") and operators (e.g., "&&") for detection. To support syntax-guided matching, the present invention uses a hierarchical analysis approach that checks each pattern in two stages. The first stage is keyword matching detection: keyword matching detection is performed on the program statements and, when a program statement is detected to contain a keyword representing a condition of one or several error patterns, the source code information, file path, line number, matching pattern name, pattern description and priority of the line where the keyword is located are recorded, yielding several keyword abnormal program statements. Keyword matching is the faster analysis and is designed to filter out statements that cannot match any error pattern. That is, keyword matching checks whether a program statement contains a keyword representing a condition of an error pattern or a group of error patterns. For example, when checking whether the special field serialVersionUID of a serializable class is declared as static (SE_NONSTATIC_SERIALVERSIONID), the present invention uses keyword matching to skip statements that do not contain the keyword "serialVersionUID". After keyword matching detection, in order to obtain more accurate matching results, the second stage performs pattern-based matching detection on the keyword abnormal program statements based on the first fusion regular expression rule, to obtain several first abnormal program statements.
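The two-stage check can be illustrated with the serialVersionUID example. The sketch below assumes a hypothetical detailed rule (`DECL`) written for this illustration; the actual rule used by Codegex is not given in the text.

```python
import re

KEYWORD = "serialVersionUID"

# Hypothetical stage-2 rule: optional modifiers, then "long serialVersionUID".
DECL = re.compile(r"^(?P<mods>(?:\w+\s+)*)long\s+serialVersionUID\b")


def is_nonstatic_serial_version_uid(stmt: str) -> bool:
    """Return True when serialVersionUID is declared without 'static'."""
    # Stage 1: cheap keyword filter, skips almost every statement.
    if KEYWORD not in stmt:
        return False
    # Stage 2: detailed regex match on the few remaining statements.
    m = DECL.search(stmt.strip())
    if m is None:
        return False
    return "static" not in m.group("mods").split()
```

The substring test in stage 1 is far cheaper than a regex search, so most statements never reach the detailed rule.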
[0075] In step S202, performing type-driven matching detection on several of the first abnormal program statements based on regular expression rules to obtain several second abnormal program statements includes the following steps: obtaining data type information, where the data type information characterizes the type information of the first abnormal program statements, and the type information includes the byte, short integer, integer, long integer, single-precision floating-point, double-precision floating-point, Boolean and character types; adding the type information to the regular expression rule to obtain a second fusion regular expression rule; performing type matching detection on the first abnormal program statements based on the second fusion regular expression rule; and, when the type information pattern in a first abnormal program statement is detected to be erroneous, recording the source code information, file path, line number, matching pattern name, pattern description and priority of the line where the type information pattern is located, to obtain several second abnormal program statements.
[0076] In the survey conducted for the present invention, about 44.29% of the patterns require data type information. Although Codegex essentially treats a code modification as plain text to be matched, the present invention uses data types as keywords for analysis in order to incorporate type information into the error patterns. That is, type information is added to the regular expression rule to obtain the second fusion regular expression rule; based on that rule, type matching detection is performed on the first abnormal program statements; and when the type information pattern in a first abnormal program statement is detected to be erroneous, the source code information, file path, line number, matching pattern name, pattern description and priority of the line where the type information pattern is located are recorded, yielding several second abnormal program statements. For example, the pattern RV_01_TO_INT generates a warning when a random value from 0 to 1 is coerced to an integer value. The present invention detects this pattern with the regular expression "\(\s*int\s*\)\s*(\w+)\.(?:random|nextDouble|nextFloat)\(\s*\)", in which "\(\s*int\s*\)" detects the cast. By taking the data type information into account, the present invention can deterministically report this pattern with the highest priority (the same priority used in SpotBugs).
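The RV_01_TO_INT rule quoted above can be tried directly with Python's `re` module. The regular expression is the one given in the text; the wrapper function `find_rv_01_to_int` is a hypothetical helper added for illustration.

```python
import re
from typing import Optional

# Regular expression quoted in the description for RV_01_TO_INT.
RV_01_TO_INT = re.compile(
    r"\(\s*int\s*\)\s*(\w+)\.(?:random|nextDouble|nextFloat)\(\s*\)"
)


def find_rv_01_to_int(stmt: str) -> Optional[str]:
    """Return the receiver whose 0-to-1 random value is cast to int, if any."""
    m = RV_01_TO_INT.search(stmt)
    return m.group(1) if m else None
```

A statement such as `(int) (Math.random() * 10)` is not reported, because the cast applies to the parenthesized product rather than directly to the random value.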
[0077] In step S203, the present invention optimizes regular expression performance by using word boundaries to match whole words. In regular expression syntax, a statement is composed of words, and each word is composed of letters, digits or underscores. A word boundary is defined as the edge between a sequence of alphanumeric or underscore (_) characters and any other character, and '\b' matches a word boundary. For example, the expression "\bif\b" matches the standalone string "if" but not the string "ifa", because there is no word boundary to the right of the "if". Since source code text is usually a sequence of words, Codegex's regular expression rules restrict each error pattern to a word search so that unmatched input can be skipped quickly.
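The word-boundary behaviour can be checked with a short sketch. The helper name `contains_word` is an assumption made for this illustration.

```python
import re


def contains_word(stmt: str, word: str) -> bool:
    """True when `word` occurs in `stmt` as a standalone word."""
    # \b on both sides rejects matches inside longer identifiers.
    return re.search(r"\b" + re.escape(word) + r"\b", stmt) is not None
```

For instance, `contains_word("int diff = 3;", "if")` is false even though the characters "if" appear inside "diff", since neither side of that occurrence is a word boundary.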
[0078] In step S204, performing background information matching detection on several of the third abnormal program statements to obtain several fourth abnormal program statements includes the following steps: based on a preset search strategy, performing background information matching detection on the third abnormal program statements; when the background information of some third abnormal program statements is detected to contain potential safety hazards, only those third abnormal program statements are retained, which raises warning priority or excludes false positives; and finally obtaining several fourth abnormal program statements, where the search strategy includes searching in all code modification texts and searching on the code hosting platform.
[0079] Specifically, the preset search strategies include "diff" search (searching in all code changes) and online search (searching code on GitHub). GitHub is a hosting platform for open source and private software projects, and GitHub messages are comments left on GitHub. Existing work shows that some error patterns in SpotBugs require more contextual information to ensure accurate error detection. Therefore, background information matching detection is performed on the third abnormal program statements based on a preset search strategy to obtain several fourth abnormal program statements. In practice, Codegex uses two search strategies to add background information to the analysis: diff search and online search. When these search strategies successfully find relevant context, Codegex adjusts the priority of the given error pattern, because the probability of correctly identifying an error pattern increases when more context is available. For most of the implemented error patterns, Codegex uses a regular expression to match single-line statements. When the diff search strategy is activated for an error pattern, Codegex uses the contextual information around a statement by searching all code changes in the incoming PR (the changed code content, also called a pull request). For example, the pattern UI_INHERITANCE_UNSAFE_GETRESOURCE checks for calls to this.getClass().getResource(), because such a call may be unsafe if the class calling the method is extended by a class in another package. Detecting this pattern requires checking (1) whether the program statement contains a call to the getClass().getResource() method (which can be matched with a regex), and (2) whether the class is extended (if this condition is met, SpotBugs raises the warning priority). To check the second condition, Codegex uses the diff search strategy to search for the keyword "extends ClassA" (where ClassA is the name of the class that calls the getResource() method) in the code changes ("diff") of the given PR. If the diff search fails, Codegex uses an online search to check the second condition further. Specifically, online search uses the GitHub Search API to perform a code search over the entire repository of the given PR. For example, to detect the pattern UI_INHERITANCE_UNSAFE_GETRESOURCE, Codegex searches the repository for the keyword "extends ClassA". If the query is found in the repository of the code modification, Codegex increases the priority of this error pattern because the second condition is met. Currently, Codegex uses online search for only one pattern, because (1) it is expensive and depends heavily on the speed of the GitHub Search API, and (2) it requires defining an exact-match search query (for example, if the query is changed to "extends Class", the search may return many irrelevant results).
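The diff search for UI_INHERITANCE_UNSAFE_GETRESOURCE might look like the sketch below. The regular expressions, the function name `getresource_priority` and the priority labels "normal" and "raised" are assumptions of this illustration; the online search fallback through the GitHub Search API is deliberately left out so that the example needs no network access.

```python
import re
from typing import Optional

# Condition (1): a call to getClass().getResource(...), with or without "this.".
GETRESOURCE_CALL = re.compile(r"\b(?:this\.)?getClass\(\s*\)\.getResource\(")


def getresource_priority(stmt: str, class_name: str, diff_text: str) -> Optional[str]:
    """Return None when condition (1) fails, else a priority label."""
    if not GETRESOURCE_CALL.search(stmt):
        return None
    # Condition (2), diff search: is the calling class extended anywhere
    # in the changed code of the pull request?
    extends = re.compile(r"\bextends\s+" + re.escape(class_name) + r"\b")
    if extends.search(diff_text):
        return "raised"
    # A fuller version would fall back to an online code search here.
    return "normal"
```

The trailing `\b` in the `extends` rule plays the role of the exact-match query mentioned above: a diff containing "extends ClassAB" does not raise the priority for ClassA.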
[0080] In step S205, the present invention encodes Java operator precedence (which determines the evaluation order of operators) in the analyzer to improve the accuracy of analyzing arithmetic and bitwise operations. For example, the pattern SA_LOCAL_SELF_COMPUTATION checks for meaningless self-computation. In the statement "return i|i&j;", a simple regular expression that extracts bitwise operations would match the first sub-expression "i|i" and report it, because the operation "i|i" is meaningless. For i|i&j, however, this is a false positive: the operator '&' has higher precedence than '|', so the expression actually means "i|(i&j)". In this example, encoding operator precedence into the pattern helps reduce the false positive rate of Codegex. In this way, operator precedence matching detection is performed on the fourth abnormal program statements to obtain several fifth abnormal program statements.
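The effect of operator precedence on this example can be sketched as follows. This is one possible way to encode precedence, not the encoding used by Codegex, which the text does not specify. The sketch assumes a bare expression (no "return" and no ";"), handles only the bitwise operators `|`, `^` and `&`, and ignores parentheses.

```python
import re
from typing import List

# Naive rule for comparison: the same word on both sides of a bitwise operator.
NAIVE = re.compile(r"\b(\w+)\s*[|^&]\s*\1\b")

# Java bitwise operators ordered from lowest to highest precedence.
BITWISE_OPS = ["|", "^", "&"]
TOKEN = re.compile(r"\w+|[|^&]")


def _check(tokens: List[str], level: int) -> bool:
    if level == len(BITWISE_OPS):
        return False
    op = BITWISE_OPS[level]
    # Split the token list into operands of the current operator.
    operands, current = [], []
    for tok in tokens:
        if tok == op:
            operands.append(current)
            current = []
        else:
            current.append(tok)
    operands.append(current)
    # Two adjacent identical operands form a self-computation.
    for left, right in zip(operands, operands[1:]):
        if left and left == right:
            return True
    # Otherwise look inside each operand at the next precedence level.
    return any(_check(o, level + 1) for o in operands)


def has_self_computation(expr: str) -> bool:
    """Precedence-aware check for x|x, x^x or x&x in a bare expression."""
    return _check(TOKEN.findall(expr), 0)
```

Splitting on the lowest-precedence operator first means that in `i|i&j` the operands compared are `i` and `i&j`, so no warning is raised, whereas the naive rule matches `i|i`.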
[0081] In step S206, performing anti-pattern matching detection on several fifth abnormal program statements to obtain several abnormal program statements includes the following steps: performing matching detection on the fifth abnormal program statements based on anti-patterns encoded by keyword filtering, to obtain several sixth abnormal program statements; and performing matching detection on the sixth abnormal program statements based on anti-patterns encoded by negative lookahead assertions, to obtain several abnormal program statements.
[0082] Specifically, regarding the design of the regular expression rules, most of the error patterns in SpotBugs have a set of rules that forbid matching certain program elements in order to prevent false positives. The present invention refers to these rules as anti-patterns. To ensure the accuracy of error detection, the present invention encodes anti-patterns using two strategies: keyword filtering and negative lookahead. Keyword filtering: when designing each pattern, the present invention refers to several sources: (1) the error description, (2) the source code, and (3) the test cases in SpotBugs; anti-patterns are extracted from these test cases to improve the accuracy of the analysis. For example, the pattern NM_CLASS_NAMING_CONVENTION checks whether the name of a Java class conforms to upper camel case (which is recommended in Java programs). To prevent false positives when analyzing special classes, SpotBugs adds a filter rule for class names with an underscore character '_' at the beginning of the word. To reuse this filter in Codegex, the present invention skips the naming convention check for class names with underscore characters. Negative lookahead: for certain patterns, Codegex uses negative lookaheads (the regex construct "q(?!u)", which matches a q that is not followed by the regular expression u) to filter out negative or corner cases. For example, to detect the pattern NM_METHOD_NAMING_CONVENTION, which checks whether a Java method name is in lower camel case, the present invention includes the expression "(?!new)" to avoid matching constructor calls (such as "new Object()"), whose names may start with an uppercase letter.
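Both anti-pattern strategies can be sketched in a few lines. The two regular expressions below are simplified rules written for this illustration, not the rules used by Codegex; in particular, `METHOD_DECL` does not distinguish a constructor declaration such as `public Foo() {` from a method declaration.

```python
import re

# NM_CLASS_NAMING_CONVENTION (simplified): capture the declared class name.
CLASS_DECL = re.compile(r"\bclass\s+(\w+)")

# NM_METHOD_NAMING_CONVENTION (simplified): a word that is not "new",
# followed by a name starting with an uppercase letter and then "(".
METHOD_DECL = re.compile(r"\b(?!new\b)\w+\s+([A-Z]\w*)\s*\(")


def class_name_warning(stmt: str) -> bool:
    m = CLASS_DECL.search(stmt)
    if m is None:
        return False
    name = m.group(1)
    # Keyword-filter anti-pattern: skip class names containing '_'.
    if "_" in name:
        return False
    return not name[0].isupper()


def method_name_warning(stmt: str) -> bool:
    # The negative lookahead keeps "new Object()" from being reported.
    return METHOD_DECL.search(stmt) is not None
```

The keyword filter is an ordinary string test applied before the naming check, while the lookahead lives inside the regular expression itself, so each anti-pattern stays next to the rule it guards.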
[0083] In another implementation of the present invention, the method is implemented using the built-in regular expression library re of the Python language and its extended library regex. When developing in another language, the corresponding regular expression library can be used instead.
[0084] After the abnormal program statements are obtained, the following step can be performed: S300. Generate GitHub messages according to the abnormal program statements. Correspondingly, generating GitHub messages according to the abnormal program statements includes the following steps:
[0085] S301. According to each of the abnormal program statements, determine the source code information, file path, line number, matching pattern name, pattern description and priority corresponding to each of the abnormal program statements;
[0086] S302. Input the source code information, the file path, the line number, the matching pattern name, the pattern description, and the priority corresponding to each of the abnormal program statements into a message generator to generate a Github message.
[0087] Specifically, the abnormal program statements carry warnings: for each of them, the analyzer based on regular expression rules produces the relevant pattern type, matching pattern name, file path, error pattern description (bug description), source information (file name, line number), the priority of the warning, and so on. Based on the source code information, file path, line number, matching pattern name, pattern description and priority of the abnormal program statements produced by the analyzer, the PR comment generator automatically generates GitHub comments containing code references. For each code fragment in the changed code content (PR) for which the analyzer of the present invention generates a warning, the PR comment generator produces a GitHub comment that cites the offending code. Formally, a line that violates a bug pattern pat has (1) the bug category cat to which it belongs (e.g., BAD_PRACTICE), (2) the short description sd, and (3) the long description ld. Codegex uses the following template to generate a message: "I detect that this code is problematic. According to the cat, sd (pat). ld." Figure 3 shows an example of a Codegex-generated comment in which Codegex reports a warning for the NM_METHOD_NAMING_CONVENTION pattern under the BAD_PRACTICE category and cites the offending line of code.
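The message template can be expressed as a small formatting function. The record type `BugWarning`, its field names and the function `format_comment` are hypothetical, and `pat` is assumed here to be the matching pattern name; posting the comment to GitHub is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class BugWarning:
    """Fields the analyzer records for one abnormal program statement."""
    category: str           # cat, e.g. "BAD_PRACTICE"
    pattern: str            # pat, assumed to be the matching pattern name
    short_description: str  # sd
    long_description: str   # ld
    file_path: str
    line_number: int


def format_comment(w: BugWarning) -> str:
    """Fill the template: 'According to the cat, sd (pat). ld.'"""
    return (
        "I detect that this code is problematic. "
        f"According to the {w.category}, {w.short_description} ({w.pattern}). "
        f"{w.long_description}"
    )
```

The file path and line number are not part of the message text; in this sketch they would be used to attach the comment to the cited line.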
[0088] As an advantage of the present invention, Codegex has the same accuracy as SpotBugs and is more accurate in detecting certain patterns. First, the present invention selected the top 100 open source projects on GitHub that (1) have the largest number of stars and (2) are built with Maven (the present invention uses the SpotBugs Maven plug-in as a benchmark). While Codegex does not require compilation, SpotBugs can only run on compiled code, so the present invention excludes 48 Java projects that cannot be compiled with the default Maven compile command, which skips tests and other optional build steps (command: mvn clean install -DskipTests=true -Dgpg.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true -fn -B dependency:purge-local-repository). Finally, the present invention evaluates Codegex and SpotBugs on 52 projects. The statistics show that these projects vary in scale, with the number of source code lines ranging from 0.01K to 1279.49K.
[0089] Figure 4 shows the comparison results of the two tools. The present invention focuses mainly on the warnings on which the two tools disagree; the warnings on which they agree (Overlaps) have the same validity for both tools, so this part is not expanded here to save space. The present invention calculates the accuracy, precision, recall and F1 score of the two tools respectively. Codegex outperforms SpotBugs in precision, recall and F1 score for 6 out of 10 patterns. In addition, Codegex outperforms SpotBugs in overall precision, recall and F1 score. The inventors also observe that (1) Codegex finds one more TP than SpotBugs for most error patterns, and (2) Codegex performs particularly well in detecting the DMI_RANDOM_USED_ONLY_ONCE pattern, finding more TPs and fewer FNs.
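For reference, precision, recall and F1 score follow from the counts of true positives (TP), false positives (FP) and false negatives (FN) by their standard definitions. The sketch below uses a hypothetical function name; the counts in any call are supplied by the caller and are not taken from Figure 4. Accuracy is omitted because it also needs the number of true negatives.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; each value is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```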
[0090] Second, when the initial compilation time required by SpotBugs is counted together with its analysis time, Codegex runs 88k times faster than SpotBugs. If only the time SpotBugs takes to generate the analysis report is considered, Codegex runs up to 877 times faster than SpotBugs (average speedup = 76.87).
[0091] Finally, because the present invention treats code as text and does not compile it, it applies not only to complete project code but also to incomplete code fragments, and therefore has wider application scenarios.
[0092] Exemplary device
[0093] As shown in Figure 5, an embodiment of the present invention provides a code review device based on regular expressions, which includes a program statement acquisition unit 401, an abnormal program statement acquisition unit 402, and a message text generation unit 403, wherein:
[0094] The program statement acquisition unit 401 is configured to acquire code modification text, and to extract and preprocess the code modification text to obtain several program statements;
[0095] The abnormal program statement acquisition unit 402 is configured to perform code analysis and screening based on regular expression rules on the program statements to obtain several abnormal program statements, wherein the abnormal program statements are program statements containing error information;
[0096] The message text generation unit 403 is configured to generate Github messages according to the abnormal program statements, wherein the Github messages include code references and message texts.
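The division of work among units 401, 402, and 403 can be sketched as three small functions. This is only an illustration under stated assumptions: the code modification text is assumed to be a unified diff, preprocessing is reduced to stripping whitespace, and the rule EXAMPLE_UPPERCASE_METHOD_NAME with its regular expression is a hypothetical stand-in invented here, not one of the actual Codegex rules. All function and type names are likewise assumptions.

```python
import re
from typing import List, NamedTuple


class Statement(NamedTuple):
    file_path: str
    line_number: int
    text: str


class Finding(NamedTuple):
    pattern: str
    statement: Statement


# Captures the starting line number of the new file side of a diff hunk.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

# Hypothetical rule: a Java method whose name starts with an upper-case letter.
RULES = {
    "EXAMPLE_UPPERCASE_METHOD_NAME": re.compile(
        r"\b(?:public|protected|private)\s+(?:(?:static|final)\s+)*"
        r"[\w<>\[\]]+\s+([A-Z]\w*)\s*\("
    ),
}


def extract_added_statements(diff_text: str) -> List[Statement]:
    """Unit 401 (sketch): collect the lines a unified diff adds."""
    statements, path, new_line = [], "", 0
    for raw in diff_text.splitlines():
        header = HUNK_HEADER.match(raw)
        if raw.startswith("+++ "):
            path = raw[4:]
            if path.startswith("b/"):
                path = path[2:]
        elif header:
            new_line = int(header.group(1))
        elif raw.startswith("+"):
            statements.append(Statement(path, new_line, raw[1:].strip()))
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context line: present in the new file
    return statements


def screen(statements: List[Statement]) -> List[Finding]:
    """Unit 402 (sketch): match every rule against every statement."""
    return [
        Finding(name, s)
        for s in statements
        for name, rule in RULES.items()
        if rule.search(s.text)
    ]


def to_message(f: Finding) -> str:
    """Unit 403 (sketch): a code reference followed by the message text."""
    s = f.statement
    return f"{s.file_path}:{s.line_number}: {f.pattern}\n> {s.text}"


DIFF = """\
--- a/src/Foo.java
+++ b/src/Foo.java
@@ -10,3 +10,4 @@ class Foo {
     private int x;
+    public int GetX() { return x; }
     public int getY() { return y; }
 }
"""

for finding in screen(extract_added_statements(DIFF)):
    print(to_message(finding))
```

Because each statement is matched as plain text, the same functions work on a fragment of a file as well as on a whole project, which is the property the description relies on.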
[0097] Based on the above embodiments, the present invention also provides an intelligent terminal, whose functional block diagram may be as shown in Figure 6. The intelligent terminal includes a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The network interface of the intelligent terminal is used to communicate with external terminals through a network connection. When the computer program is executed by the processor, a code analysis method based on regular expressions is implemented. The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor is installed in advance inside the intelligent terminal to detect the operating temperature of the internal components.
[0098] Those skilled in the art can understand that the schematic diagram in Figure 6 is only a block diagram of the part of the structure related to the solution of the present invention, and does not limit the intelligent terminal to which the solution of the present invention is applied. A specific intelligent terminal may include more or fewer components than shown, combine some components, or have a different arrangement of components.
[0099] In one embodiment, an intelligent terminal is provided, including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors. The one or more programs contain instructions for:
[0100] Obtaining code modification text, and extracting and preprocessing the code modification text to obtain several program statements;
[0101] Performing code analysis and screening based on regular expression rules on the program statements to obtain several abnormal program statements, wherein the abnormal program statements are program statements containing error information;
[0102] Generating a Github message according to the abnormal program statements, wherein the Github message includes a code reference and a message text.
[0103] Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0104] To sum up, the present invention discloses a code analysis method based on regular expressions. The method includes: obtaining code modification text, and extracting and preprocessing the code modification text to obtain several program statements; performing code analysis and screening based on regular expression rules on the program statements to obtain several abnormal program statements; and generating a Github message according to the abnormal program statements. By extracting and preprocessing the code modification text, the present invention can match each error pattern line by line; by performing code analysis and screening based on regular expression rules on the program statements, it can quickly analyze incomplete code fragments without compiling and parsing the entire code base.
[0105] Based on the above embodiments, the present invention discloses a code analysis method based on regular expressions. It should be understood that the application of the present invention is not limited to the above examples; those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims of the present invention.