302 results about "Unicode" patented technology

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium, and as of May 2019 the most recent version, Unicode 12.1, contains a repertoire of 137,994 characters covering 150 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and both are code-for-code identical.

Multi-language domain name service

A multilingual Domain Name System allows users to use Domain Names in non-Unicode or ASCII encodings. An international DNS server (or iDNS server) receives multilingual DNS requests and converts them to a format that can be used in the conventional Domain Name System. When the iDNS server first receives a DNS request, it determines the encoding type of that request. It may do this by considering the bit string in the top-level domain (or other portion) of the Domain Name and matching that string against a list of known bit strings for known top-level domains of various encoding types. One entry in the list may be the bit string for ".com" in Chinese BIG5, for example. After the iDNS server identifies the encoding type of the Domain Name, it converts the encoding of the Domain Name to Unicode. It then translates the Unicode representation to an ASCII representation conforming to the universal DNS standard. This is then passed into a conventional Domain Name System, which recognizes the ASCII format Domain Name and returns the associated IP address.
Owner:I DNS NET INT
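The detect-then-convert pipeline above can be sketched in Python. This is a minimal sketch under stated assumptions: the suffix table `KNOWN_SUFFIXES` and the function names are hypothetical, the Chinese suffix `.公司` stands in for the abstract's BIG5 ".com" example, and Python's built-in `idna` codec (IDNA 2003 with Punycode) stands in for the ASCII form required by the DNS standard.

```python
# Hypothetical lookup table: raw bytes of a known top-level-domain suffix
# in each legacy encoding, paired with the name of that encoding.
KNOWN_SUFFIXES = [
    (".公司".encode("big5"), "big5"),
    (".公司".encode("gb2312"), "gb2312"),
]


def detect_encoding(raw: bytes) -> str:
    """Guess the encoding of a raw domain name by matching its suffix bytes."""
    for suffix, encoding in KNOWN_SUFFIXES:
        if raw.endswith(suffix):
            return encoding
    return "ascii"  # assumption: unmatched names are treated as plain ASCII


def to_dns_ascii(raw: bytes) -> str:
    """Legacy bytes -> Unicode -> ASCII-compatible (Punycode) domain name."""
    unicode_name = raw.decode(detect_encoding(raw))
    return unicode_name.encode("idna").decode("ascii")
```

Because BIG5 and GB2312 input are both normalized to the same Unicode string first, they yield the same ASCII name, which a conventional resolver can then look up.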

Methods and apparatus related to pruning for concatenative text-to-speech synthesis

The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, determining which units are distinctive enough to keep and which are sufficiently redundant to discard. According to one aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g., a word or characters expressed as Unicode strings) are mapped onto the feature space and clustered using a suitable similarity measure. Because all units in a given cluster are, by construction, closely related with respect to that measure, they are redundant and can be replaced by a single instance. The disclosed method detects near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy. Each unit can be processed in parallel, and the algorithm is fully scalable, with a pruning factor that the user can set through the near-redundancy criterion. In an exemplary implementation, a matrix-style modal analysis via Singular Value Decomposition (SVD) is performed on the matrix of observed instances of a given word unit, associating each row of the matrix with a feature vector; these vectors are then clustered using an appropriate closeness measure. Pruning is achieved by mapping each instance to the centroid of its cluster.
Owner:APPLE INC
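A minimal NumPy sketch of the SVD-plus-clustering idea, assuming each row of `X` is a numeric feature rendering of one observed instance of a unit. The greedy cosine-threshold ("leader") clustering, the `rank` and `threshold` parameters, and the choice to keep the real instance nearest each centroid (rather than the centroid itself) are illustrative assumptions, not the patent's specified method.

```python
import numpy as np


def prune_instances(X: np.ndarray, rank: int, threshold: float) -> list[int]:
    """Return indices of instances kept after near-redundancy pruning.

    X: (n_instances, n_features) matrix of observed instances of one unit.
    rank: number of singular dimensions kept (hypothetical parameter).
    threshold: cosine similarity at or above which instances are redundant.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    feats = U[:, :rank] * s[:rank]  # one feature vector per row (instance)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.where(norms == 0, 1, norms)  # unit length for cosine

    clusters: list[list[int]] = []
    for i, f in enumerate(feats):  # greedy leader clustering
        for members in clusters:
            if float(f @ feats[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])

    kept = []
    for members in clusters:
        centroid = feats[members].mean(axis=0)
        # Keep the real instance closest to the cluster centroid.
        best = min(members, key=lambda j: float(np.linalg.norm(feats[j] - centroid)))
        kept.append(best)
    return sorted(kept)
```

Raising `threshold` keeps more instances; lowering it prunes more aggressively, which is one way a user-controlled pruning factor could be exposed.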

Methods, systems, and computer program products for securely transforming an audio stream to encoded text

A method, system, computer program product, and method of doing business by providing improved audio compression wherein an audio stream is securely transformed to an encoded text stream (such as an ASCII, EBCDIC, or Unicode text stream). One or more components which are involved in the transformation process are authenticated. A unique identifier of each such component is included within cryptographically-protected information that is provided for the encoded text stream. A digital signature is preferably used for the cryptographic protection, thereby digitally notarizing the encoded text stream. The authenticity and integrity of the encoded text stream can therefore be verified. In preferred embodiments, the authenticated identities of components performing the transformation can also be determined from the cryptographically-protected information. The encoded text stream will typically require much less storage space than the audio stream, and providing the digital notarization along with the encoded text stream serves to reliably establish evidence of the contents of the audio stream (even though a perfect speech-to-text transformation might not be achieved).
Owner:MICROSOFT TECH LICENSING LLC
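The notarization step can be sketched with Python's standard library. Assumption: an HMAC-SHA256 tag with a shared key stands in for the public-key digital signature the abstract prefers, and the record layout and component identifiers are hypothetical.

```python
import hashlib
import hmac
import json


def _payload(text: str, component_ids: list[str]) -> bytes:
    # Canonical serialization so signer and verifier hash identical bytes.
    return json.dumps(
        {"text": text, "components": component_ids},
        sort_keys=True, ensure_ascii=False,
    ).encode("utf-8")


def notarize(text: str, component_ids: list[str], key: bytes) -> dict:
    """Attach a tag covering the text and the IDs of the components that produced it."""
    tag = hmac.new(key, _payload(text, component_ids), hashlib.sha256).hexdigest()
    return {"text": text, "components": component_ids, "tag": tag}


def verify(record: dict, key: bytes) -> bool:
    """True only if neither the text nor the component list was altered."""
    expected = hmac.new(key, _payload(record["text"], record["components"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])
```

A public-key signature would let third parties verify the transcript without holding the secret key; the HMAC version only shows where the component identities enter the protected data.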

Method for extracting, interpreting and standardizing tabular data from unstructured documents

A system, method, and computer program are provided for automatically identifying, parsing, and interpreting tabular data from unstructured documents stored in various formats, such as ASCII text, Unicode text, HTML, PDF text, and PDF images. A set of table identification, parsing / tokenizing, and interpreting / mapping rules is developed with grammar descriptors. These rules are then applied to a set of documents to identify a table, parse its content, and, if required, interpret the parsed content, thereby standardizing the tabular data.
Owner:RAGE FRAMEWORKS +1
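A toy version of the identify / parse / interpret rule chain for plain-text input. The specific rules are assumptions for illustration: a table is the longest run of consecutive lines that split into the same number (at least two) of cells on a tab or two or more spaces, the first row is the header, and the hypothetical `header_map` standardizes header names.

```python
import re

# Tokenizing rule (assumed): cells are separated by 2+ whitespace chars or a tab.
CELL_SPLIT = re.compile(r"\s{2,}|\t")


def extract_table(text: str, header_map: dict[str, str]) -> list[dict[str, str]]:
    rows = [CELL_SPLIT.split(line.strip()) for line in text.splitlines()]
    best: list[list[str]] = []
    run: list[list[str]] = []
    for cells in rows + [[]]:  # empty sentinel flushes the final run
        if len(cells) >= 2 and (not run or len(cells) == len(run[0])):
            run.append(cells)  # identification rule: consistent column count
            continue
        if len(run) > len(best):
            best = run
        run = [cells] if len(cells) >= 2 else []
    if len(best) < 2:
        return []  # no header plus data rows found
    # Interpretation rule: map raw header text onto standardized field names.
    header = [header_map.get(h.lower(), h.lower()) for h in best[0]]
    return [dict(zip(header, row)) for row in best[1:]]
```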

Character-level font linking

A “Character-Level Font Linker” provides character-level linking of fonts via a mapping from Unicode code points to fonts. A lookup table identifies, on a per-code-point basis, which fonts in a set of available fonts have glyphs for particular runs of characters. This lookup table enables automatic selection of one or more specific fonts for rendering one or more runs of characters in a text string. The lookup table is constructed offline by automatically evaluating the glyphs of a set of common or default fonts, and is then used to select fonts automatically when rendering text strings. Alternatively, the lookup table is generated (or updated) locally to include some or all locally installed fonts. Finally, in another embodiment, if the table identifies no supporting font for a particular character, the system automatically downloads the necessary glyph from one or more remote servers.
Owner:MICROSOFT TECH LICENSING LLC
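A minimal sketch of a code-point-to-font lookup table and the run segmentation it enables. Font names, coverage sets, and the `preferred` ordering are hypothetical; a real implementation would read coverage from each font's character map, and a `None` run marks where the abstract's remote glyph download would apply.

```python
def build_lookup(coverage: dict[str, set[int]]) -> dict[int, list[str]]:
    """Invert per-font coverage into: code point -> fonts that have a glyph."""
    table: dict[int, list[str]] = {}
    for font, code_points in coverage.items():
        for cp in code_points:
            table.setdefault(cp, []).append(font)
    return table


def font_runs(text: str, table: dict[int, list[str]], preferred: list[str]) -> list[tuple]:
    """Split text into (font, run) pairs; font is None when no font covers a character."""
    runs: list[tuple] = []
    for ch in text:
        candidates = table.get(ord(ch), [])
        font = next((f for f in preferred if f in candidates),
                    candidates[0] if candidates else None)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))  # start a new run
    return runs
```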

Data structure for creating, scoping, and converting to unicode data from single byte character sets, double byte character sets, or mixed character sets comprising both single byte and double byte character sets

A data structure for specifying the types of constants whose character values are to be converted to Unicode; for specifying which code page or pages define the character encodings used in the source program for the character strings to be converted; and for performing conversions from SBCS, mixed SBCS / DBCS, and pure DBCS character strings to Unicode. A syntax for specifying such conversions extends the conventional constant subtype notation. When the nominal value data is converted to Unicode, the applicable SBCS and DBCS code pages are determined by three levels, or scopes: global options, local AOPTIONS statement specifications, and constant-specific modifiers. Global code page specifications apply to the entire source program, allowing a programmer to declare the source-program code page or code pages just once; they then apply to all constants that request conversion to Unicode. Local code page specifications apply to all subsequent source-program statements, allowing the programmer to create groups of statements whose Unicode conversion requests all use the same code page or code pages for their source-character encodings. Code page specifications that apply to individual constants allow detailed control over the source data encodings used for Unicode conversion. The conversion may be implemented inherently in the translator (assembler, compiler, or interpreter), which recognizes and parses the complete syntax of the statement in which the constant or constants are specified and performs the requested conversion.
Alternatively, an external function, invoked through any of a variety of source-language syntaxes, may parse as little or as much of the source statement as its implementation provides and return the converted value for inclusion in the generated machine language of the object program. The conversion may also be provided by the translator's macro instruction definition facility.
Owner:IBM CORP
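The three-level scoping rule (constant-specific over local over global) can be sketched as follows. The class and attribute names are hypothetical, and Python codec names (`cp037` for an EBCDIC single-byte page, `cp1252` for a Windows single-byte page, `shift_jis` for a mixed SBCS / DBCS page) stand in for the translator's code page identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodePageScopes:
    global_cp: str                  # declared once for the whole program
    local_cp: Optional[str] = None  # set by a local (AOPTIONS-like) statement

    def resolve(self, constant_cp: Optional[str] = None) -> str:
        """Most specific scope wins: constant > local > global."""
        return constant_cp or self.local_cp or self.global_cp

    def to_unicode(self, raw: bytes, constant_cp: Optional[str] = None) -> str:
        """Decode a constant's nominal value using the resolved code page."""
        return raw.decode(self.resolve(constant_cp))
```

Assigning `local_cp` affects every later conversion until it is changed again, mirroring how a local specification applies to all subsequent statements.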

Technique for improved audio compression

A method, system, computer program product, and method of doing business by providing improved audio compression wherein an audio stream is securely transformed to an encoded text stream (such as an ASCII, EBCDIC, or Unicode text stream). One or more components which are involved in the transformation process are authenticated. A unique identifier of each such component is included within cryptographically-protected information that is provided for the encoded text stream. A digital signature is preferably used for the cryptographic protection, thereby digitally notarizing the encoded text stream. The authenticity and integrity of the encoded text stream can therefore be verified. In preferred embodiments, the authenticated identities of components performing the transformation can also be determined from the cryptographically-protected information. The encoded text stream will typically require much less storage space than the audio stream, and providing the digital notarization along with the encoded text stream serves to reliably establish evidence of the contents of the audio stream (even though a perfect speech-to-text transformation might not be achieved).
Owner:NUANCE COMM INC

Authentication password storage method and generation method, user authentication method, and computer

Protection of an authentication password stored in the database held by the SAM (Security Accounts Manager) of Windows® is strengthened. A GINA, which is part of the OS, receives the authentication password as ASCII codes. The password is converted to first UNICODES, which are salted with a random number and converted to second UNICODES. The random number used for salting is associated with the user account and password and is stored either in read / write-protected non-volatile memory or in non-volatile memory that only the BIOS can access. Because the result is still in UNICODES, the OS's LSA (Local Security Authority) can process it without itself being modified.
Owner:LENOVO PC INT
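A rough sketch of the password pipeline. Assumptions: "UNICODES" is taken to mean a UTF-16LE string (the form Windows APIs use), and "converted to second UNICODES" is taken to mean the salted hash is re-expressed as a Unicode (hex) string so downstream code still receives Unicode text. SHA-256 and the 16-byte salt are illustrative choices, and the BIOS-protected salt storage is out of scope.

```python
import hashlib
import secrets


def first_unicodes(password_ascii: bytes) -> bytes:
    """ASCII password -> UTF-16LE code units (the "first UNICODES")."""
    return password_ascii.decode("ascii").encode("utf-16-le")


def second_unicodes(first: bytes, salt: bytes) -> str:
    """Salt the first UNICODES and re-encode the digest as a Unicode string."""
    return hashlib.sha256(salt + first).hexdigest()


salt = secrets.token_bytes(16)  # would live in protected non-volatile memory
stored = second_unicodes(first_unicodes(b"Passw0rd"), salt)
```

For production password storage, a deliberately slow function such as `hashlib.pbkdf2_hmac` is the usual choice over a single SHA-256 pass.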

Method and apparatus for layout of text and image documents

A mixed text and image layout algorithm that supports Unicode text and arbitrary content definitions for geometric layout, with a placement procedure that needs at most two passes. Laying out Unicode text requires several distinct processing steps: classifying input characters into contiguous groups of identical directionality, writing system, and possibly script (and language); mapping those character groups to glyphs for display; and performing a layout that takes into account font display characteristics, embedded directionality level, and the shape of the container for the layout contents. In the best case, layout is completed in a single pass; in the worst case, two passes are needed. During layout, information is cached to facilitate incremental changes to an existing layout, minimizing refresh operations during editing. An optional two-pass operation on the layout result may be used to generate ordered rendering operations that support so-called Z-index display. An optimized Unicode character classification method that uses less memory is also disclosed, as is a method for selectively displaying the caret location in mixed-font and / or directional text.
Owner:DANILO ALEXANDER VINCENT
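The first layout step, grouping input characters into contiguous runs of the same directionality, can be approximated with Python's `unicodedata.bidirectional`. This is a simplification, not the full Unicode Bidirectional Algorithm (UAX #9): strong classes are collapsed to `L` / `R`, everything else is treated as neutral, and there are no embedding levels or script detection.

```python
import unicodedata


def direction(ch: str) -> str:
    """Collapse Unicode bidi classes into L (left-to-right), R (right-to-left), or N."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):  # Hebrew-type and Arabic-letter-type strong RTL
        return "R"
    return "N"  # digits, spaces, punctuation, etc. (simplification)


def direction_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share a collapsed direction class."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        d = direction(ch)
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs
```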

Unicode input method editor

A method for converting to Unicode, in a Java Input Method Editor ("IME"), the encoding formats of a character code unit, including selecting an encoding format, receiving, through a computer user interface, in an IME, at least one character code unit having the encoding format and an encoding base, and displaying the character code unit through the computer user interface. Embodiments also include converting the encoding format of the character code unit to Unicode, thereby creating a Unicode code point, displaying, through the computer user interface, a glyph corresponding to the Unicode code point, and transferring the Unicode code point to an application.
Owner:GOOGLE LLC
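A minimal sketch of the conversion step: code units typed in a chosen base and encoding format are assembled into bytes, decoded, and returned as a single Unicode code point. The function name and the two supported formats (`utf-8`, `utf-16-be`) are assumptions; the abstract does not list specific formats, and this is Python rather than the Java IME it describes.

```python
def code_units_to_code_point(units: list[str], base: int, encoding: str) -> int:
    """Turn typed code units (e.g. ["E4", "B8", "AD"] in base 16) into a code point."""
    values = [int(u, base) for u in units]
    if encoding == "utf-8":
        raw = bytes(values)  # one byte per code unit
    elif encoding == "utf-16-be":
        raw = b"".join(v.to_bytes(2, "big") for v in values)  # 16-bit code units
    else:
        raise ValueError(f"unsupported encoding format: {encoding}")
    text = raw.decode(encoding)
    if len(text) != 1:
        raise ValueError("code units do not form exactly one character")
    return ord(text)
```

The same character can therefore be entered as UTF-8 bytes in hexadecimal or decimal, or as a UTF-16 surrogate pair, and the application receives one code point in every case.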

Translating a non-unicode string stored in a constant into unicode, and storing the unicode into the constant

Provided are a method, system, and program for creating a string of Unicode characters stored in a memory of a computer. A constant is created whose data type is a non-Unicode data type, wherein the constant specifies non-Unicode data to convert to Unicode. A string of non-Unicode characters is stored in the constant, which is stored in the memory of the computer. A specification of the code page in which the non-Unicode character string is encoded is retrieved. The non-Unicode character string stored in the constant is translated into a Unicode character string according to the specified code page, and the Unicode character string is stored in the constant in the memory of the computer.
Owner:GOOGLE LLC

Cross-platform Mongolian display and intelligent input method based on Unicode

The invention relates to a method for displaying Mongolian on the GNOME desktop platform of a Linux system. The method comprises: building a Mongolian processing engine within Pango, the GNOME desktop's text and language processing system; registering the name of the Mongolian processing engine with Pango; forming an interface between the Mongolian processing engine and the operating system's language processing module; generating, within the engine, a Mongolian processing module based on the rules and structures of OpenType fonts; constructing a font selection engine to select and substitute the OpenType Mongolian font; and finally obtaining correct Mongolian display results after font selection and substitution. The method realizes Unicode-based Mongolian display and intelligent input on the Linux operating system, and the Mongolian display and intelligent input method can be used alongside loaded Chinese or other language input methods without affecting their original functions and applications.
Owner:MINZU UNIVERSITY OF CHINA

AP connecting method and system of Wi-Fi of Android

The invention discloses a method and system for connecting to an AP over Wi-Fi on Android. The method comprises: starting Wi-Fi, initiating an AP scan, and storing the scanned AP information in a storage unit; extracting the BSSID and SSID from the AP information and storing them in a List container; detecting the encoding format of each AP's SSID in the storage unit and, if it is not a Unicode encoding, converting the SSID to a Unicode encoding before proceeding; and selecting any AP from the Unicode-encoded Wi-Fi AP list in the UI and connecting to it. The method and system overcome Android's lack of support for Chinese Wi-Fi SSIDs: Chinese-named APs scanned by an Android terminal are displayed correctly and can be connected to.
Owner:TCL CORPORATION
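A sketch of the SSID detection-and-conversion step, written in Python rather than Android Java for consistency with the other examples here. Assumption: a raw SSID that is valid UTF-8 is kept as is, and anything else is treated as GBK, a common legacy encoding for Chinese SSIDs; the abstract does not say which non-Unicode formats are detected.

```python
def decode_ssid(raw: bytes) -> tuple[str, str]:
    """Return (display name, detected encoding) for a raw SSID byte string."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a GBK-encoded Chinese SSID (assumption).
        return raw.decode("gbk", errors="replace"), "gbk"
```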

Method and apparatus for displaying electronic book on mobile phone

The invention discloses a method for displaying an electronic book on a mobile phone, characterized by reading and displaying the data in fixed-length segments each time and recording bookmark information at the end of the document. The invention also discloses a corresponding device comprising a document header reader that reads the document header information and determines the document format; a data reading and converting unit that reads the data in fixed-length segments and converts it into Unicode; a display unit that displays the read and converted data; and a bookmark operator that adds bookmark information at the end of the document. By reading in segments on a resource-limited device such as a mobile terminal, the invention reduces memory consumption, supports very large documents, and enables paging, jumping, and similar operations. Because bookmark data is appended to the end of the document, bookmarks are directly associated with the document, which simplifies bookmark management; for example, deleting a document also removes its bookmark data.
Owner:SHANGHAI CHENXING ELECTRONICS SCI & TECH CO LTD
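A sketch of fixed-length segmented reading plus a bookmark stored at the end of the file. The 12-byte trailer layout (`MAGIC` plus a big-endian byte offset) is a hypothetical format. An incremental decoder is used so that a multibyte character split across two segments is still converted to Unicode correctly.

```python
import codecs
import io
import struct

MAGIC = b"BMK1"                  # hypothetical trailer marker
TRAILER = struct.Struct(">4sQ")  # marker + 8-byte bookmark offset = 12 bytes


def content_length(f) -> int:
    """Length of the document proper, excluding any bookmark trailer."""
    size = f.seek(0, io.SEEK_END)
    if size >= TRAILER.size:
        f.seek(size - TRAILER.size)
        magic, _ = TRAILER.unpack(f.read(TRAILER.size))
        if magic == MAGIC:
            return size - TRAILER.size
    return size


def set_bookmark(f, offset: int) -> None:
    f.seek(content_length(f))
    f.truncate()  # drop any previous trailer
    f.write(TRAILER.pack(MAGIC, offset))


def get_bookmark(f):
    end = content_length(f)
    if end == f.seek(0, io.SEEK_END):
        return None  # no trailer present
    f.seek(end)
    return TRAILER.unpack(f.read(TRAILER.size))[1]


def read_pages(f, encoding: str, page_bytes: int):
    """Yield the document as Unicode text, reading page_bytes bytes at a time."""
    end = content_length(f)
    decoder = codecs.getincrementaldecoder(encoding)()
    f.seek(0)
    pos = 0
    while pos < end:
        chunk = f.read(min(page_bytes, end - pos))
        pos += len(chunk)
        yield decoder.decode(chunk, final=pos >= end)
```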

Embedded equipment and method for displaying language word on OSD interface

The invention discloses embedded equipment and a method for displaying characters of various languages on an on-screen display (OSD) interface. The method comprises: reading the UNICODE character data to be displayed; determining, from the UNICODE code value, which font library (word stock) contains the character; comparing the character's UNICODE code value one by one with the codes in that library's UNICODE code vector table to obtain the index of the matching entry; calculating from the index the offset address of the character's dot-matrix data; retrieving the dot-matrix data and the character's actual width and height at that offset; and calling an OSD display function to display the character using the dot-matrix data and its actual width and height. The method enables embedded equipment to display text in various languages on its OSD interface efficiently, conveniently, and at low cost.
Owner:SHENZHEN SKYWORTH DIGITAL TECH CO LTD
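The lookup-and-offset step can be sketched as below. The font-library selection ranges, the record layout (one width byte, one height byte, then a 32-byte 16x16 one-bit bitmap), and all names are hypothetical. The linear comparison mirrors the abstract's one-by-one search, although a sorted vector table would also allow a binary search.

```python
RECORD_SIZE = 2 + 32  # hypothetical: width, height, 16x16 1-bit bitmap


def pick_library(code: int, libraries: dict) -> dict:
    """Choose a font library ("word stock") by the range its code value falls in."""
    for (lo, hi), lib in libraries.items():
        if lo <= code <= hi:
            return lib
    raise KeyError(f"no library covers U+{code:04X}")


def glyph_for(code: int, libraries: dict):
    """Return (width, height, bitmap) for a code value, or None if absent."""
    lib = pick_library(code, libraries)
    for index, entry in enumerate(lib["codes"]):  # one-by-one comparison
        if entry == code:
            offset = index * RECORD_SIZE  # index -> dot-matrix offset address
            blob = lib["data"]
            width, height = blob[offset], blob[offset + 1]
            return width, height, blob[offset + 2:offset + RECORD_SIZE]
    return None
```

The returned width, height, and bitmap are what would then be handed to the platform's OSD drawing routine.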

System and method for linguistic collation

A system and method are provided for handling the collation of linguistic symbols of different languages that may have various types of compressions (e.g., from 2-to-1 to 8-to-1). A symbol table of the symbols, identified as Unicode code points, is generated by sorting the compression tables of the various languages, with each symbol tagged with the highest compression type of that symbol. During a sorting operation on a given string, the tag of a symbol in the string is checked to identify the highest compression type among compressions beginning with that symbol, and the language's compression tables with compression types equal to or lower than that highest type are searched with a binary search to find a matching compression for the symbols in the string. A common search module performs the binary searches through compression tables of different compression types.
Owner:MICROSOFT TECH LICENSING LLC
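A sketch of tag-guided compression lookup during sort-key generation. The weights and the two 2-to-1 compressions ("ch" and "ll", as in traditional Spanish collation) are illustrative sample data; `bisect_left` provides the binary search, and the loop tries compression types from the symbol's tagged maximum downward.

```python
from bisect import bisect_left


def build_index(compressions: dict[str, int]):
    """Group compressions by length into sorted tables and tag each start symbol."""
    tables: dict[int, list[str]] = {}
    tags: dict[str, int] = {}
    for seq in compressions:
        tables.setdefault(len(seq), []).append(seq)
        tags[seq[0]] = max(tags.get(seq[0], 1), len(seq))  # highest compression type
    return {n: sorted(t) for n, t in tables.items()}, tags


def _contains(table: list[str], seq: str) -> bool:
    i = bisect_left(table, seq)  # binary search in one sorted compression table
    return i < len(table) and table[i] == seq


def sort_key(s, weights, compressions, tables, tags):
    key, i = [], 0
    while i < len(s):
        longest = min(tags.get(s[i], 1), len(s) - i)
        for n in range(longest, 1, -1):  # try the longest compression type first
            if n in tables and _contains(tables[n], s[i:i + n]):
                key.append(compressions[s[i:i + n]])
                i += n
                break
        else:  # no compression starts here: weigh the single symbol
            key.append(weights.get(s[i], 1000 + ord(s[i])))
            i += 1
    return key
```

Symbols with no tag skip the table search entirely, which is the saving the per-symbol tag is meant to provide.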

Chinese / Pin Yin / english dictionary

A method for translating between a Simplified Chinese character, a Traditional Chinese character, a Pin Yin word, and an English word is disclosed. The present invention comprises a Dictionary Program (DP). The DP accepts a character or word in Big 5, GB2312, ASCII, or any Unicode encoding scheme and translates the character or word into Unicode. The DP translates the user input, as required, into the Traditional Chinese character, the Simplified Chinese character, the accented Pin Yin word, and the English word. The user may designate whether the user input is the entire desired word, the beginning of the desired word, or appears anywhere in the desired word. The user may also configure the display size of the Chinese characters.
Owner:IBM CORP
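A small sketch of the dictionary's three match modes. The entry layout (Simplified Chinese, Traditional Chinese, accented Pin Yin, English) follows the abstract, but the sample entries and function names are illustrative, and input is assumed to have already been converted to Unicode.

```python
ENTRIES = [
    # (simplified, traditional, accented Pin Yin, English): sample data
    ("中国", "中國", "Zhōngguó", "China"),
    ("汉字", "漢字", "hànzì", "Chinese character"),
    ("学生", "學生", "xuéshēng", "student"),
]


def lookup(query: str, mode: str = "entire") -> list[tuple[str, str, str, str]]:
    """Return entries where any field matches as 'entire', 'begins', or 'anywhere'."""
    tests = {
        "entire": lambda field: field == query,
        "begins": lambda field: field.startswith(query),
        "anywhere": lambda field: query in field,
    }
    match = tests[mode]
    return [entry for entry in ENTRIES if any(match(field) for field in entry)]
```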

System And Method For Improved Font Substitution With Character Variant Replacement

Text is presented at a computer system in a font that lacks a visual representation for a character by substituting the visual representation of a variant of the character in the text. For example, a character having a Unicode code point is associated with variants in a character variant table, each variant having a code point different from the character. In one embodiment, if text calls for presentation of the character in a font not supported by a computer system, a variant supported by the font is selected, and a graphical representation of the variant is substituted for the character.
Owner:IBM CORP
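A sketch of variant substitution. The variant table and font coverage are hypothetical sample data (U+8AAA 說 and U+8AAC 説 are CJK variants of one character). The function substitutes the first listed variant that the font can render and otherwise leaves the character unchanged.

```python
# Hypothetical character variant table: code point -> variant code points.
VARIANTS = {0x8AAA: [0x8AAC], 0x8AAC: [0x8AAA]}


def substitute_variants(text: str, font_coverage: set[int]) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp not in font_coverage:
            # Use the first variant the font can render, if any.
            cp = next((v for v in VARIANTS.get(cp, []) if v in font_coverage), cp)
        out.append(chr(cp))
    return "".join(out)
```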

Color character encoding method and decoding method

The invention discloses a color character encoding method and decoding method that address two problems of existing anti-counterfeiting code technology: low security and an adverse effect on product appearance. The color character encoding method comprises the following steps: (1) N different colors are selected; (2) a base-N encoding library is set up, with each digit of the base matched to one of the selected colors; (3) the source information is input and converted into a base-M code; (4) the base-M code is converted into a base-N code to obtain the corresponding data; (5) each digit of the data is replaced by the color matched to it in the base-N encoding library, and the colors are arranged and combined into a color character; and (6) the color character is output. The color character decoding method comprises the following steps: (1) terminal equipment identifies the color character and converts it back into the base-M code; (2) the base-M code is converted into Unicode or ASCII code; and (3) the terminal equipment converts the Unicode or ASCII code into the source information and outputs it. The methods provide a novel way of producing anti-counterfeiting codes with high security.
Owner:曾芝渝 +3
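A round-trip sketch of the color encoding and decoding steps. Assumptions: the source text is first converted to UTF-8 bytes (the base-M code, with M = 256), a leading marker byte preserves any leading zero bytes, and a four-color palette (N = 4) is used; the abstract leaves M, N, and the colors open.

```python
PALETTE = ["red", "green", "blue", "black"]  # N = 4 (hypothetical choice)


def encode_colors(text: str, palette=PALETTE) -> list[str]:
    n = len(palette)
    # Base-256 digits (UTF-8 bytes) with a 0x01 marker so leading zeros survive.
    value = int.from_bytes(b"\x01" + text.encode("utf-8"), "big")
    digits = []
    while value:
        value, d = divmod(value, n)  # convert to base-N digits
        digits.append(palette[d])    # substitute each digit with its color
    return digits[::-1]


def decode_colors(colors: list[str], palette=PALETTE) -> str:
    index = {color: i for i, color in enumerate(palette)}
    value = 0
    for color in colors:
        value = value * len(palette) + index[color]
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the marker byte
```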

Method of, system for, and computer program product for scoping the conversion of unicode data from single byte character sets, double byte character sets, or mixed character sets comprising both single byte and double byte character sets

Provided are a method, system and program for translating a source character string in a first character encoding into a target character string in a second character encoding. A plurality of specifications are maintained. Each specification has one of a plurality of scopes identifying at least one code page providing a mapping for source character strings in the first character encoding. The scopes specify different portions of the program to which the code page identified by the specification applies. The source character string for which translation is requested in the program is processed and a determination is made of one specification having one scope that is applicable to the processed source character string. The code page identified by the determined specification is used to translate the processed source character string in the first character encoding into the target character string in the second character encoding.
Owner:IBM CORP

System and a process for searching massive amounts of time-series performance data using regular expressions

A system collects and analyzes performance metric data recorded as time-series measurements, converted into Unicode, and arranged into a special data structure. The performance metric data is collected by one or more probes running on the machines being monitored. At the server where analysis is done, the data structure has a directory for every day of collected performance metric data, with a subdirectory for every resource type. Each subdirectory contains text files of performance metric data values measured for the attributes in the group of attributes to which each text file is dedicated. Each attribute has its own section, and its performance metric data values are recorded in time series as Unicode hex numbers in a comma-delimited list. The performance metric data is analyzed using regular expressions.
Owner:CUMULUS SYST INC
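A sketch of the storage-and-query idea: each attribute's samples are written as a comma-delimited list of lowercase hex numbers, and a regular expression then finds samples at or above a threshold without converting the whole list back to integers. The function names and the 0x100 threshold are illustrative assumptions; the pattern relies on the encoder never emitting leading zeros.

```python
import re


def encode_series(values: list[int]) -> str:
    """Time-ordered samples -> comma-delimited lowercase hex, no leading zeros."""
    return ",".join(format(v, "x") for v in values)


# Three or more hex digits with a non-zero first digit means value >= 0x100.
AT_LEAST_0X100 = re.compile(r"(?:^|,)([1-9a-f][0-9a-f]{2,})(?=,|$)")


def indices_at_least_256(line: str) -> list[int]:
    """Sample positions whose value is >= 256, found purely by regex."""
    return [line.count(",", 0, m.start(1)) for m in AT_LEAST_0X100.finditer(line)]
```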

Unicode-based drivers, device configuration interface and methodology for configuring similar but potentially incompatible peripheral devices

A computing system employs a unicode driver to access and control peripheral devices by abstracting commands and status data to a level above the register sets of similar but potentially incompatible peripheral devices. A unicode may be generated by the operating system or by the unicode driver. Unicodes are routed by a device configuration interface that passes them between the unicode driver and the peripheral devices. The peripheral devices include command decoders that convert between unicodes and device-specific instructions. Using unicode drivers eliminates duplicate driver code and simplifies device configuration for the computing system.
Owner:GLOBALFOUNDRIES INC
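A sketch of the abstraction: one driver emits device-independent command codes (the patent's "unicodes", which are not Unicode text), and each peripheral's command decoder maps them to its own register writes. The command codes, decoder classes, and register addresses are all hypothetical.

```python
from enum import Enum


class Cmd(Enum):
    """Device-independent command codes issued by the single driver."""
    RESET = 1
    SET_SPEED = 2


class VendorADecoder:
    def decode(self, cmd: Cmd, arg: int = 0) -> list[tuple[int, int]]:
        if cmd is Cmd.RESET:
            return [(0x00, 0x1)]  # (register, value) writes
        return [(0x04, arg & 0xFF)]


class VendorBDecoder:
    def decode(self, cmd: Cmd, arg: int = 0) -> list[tuple[int, int]]:
        if cmd is Cmd.RESET:
            return [(0x10, 0x80), (0x10, 0x00)]  # pulse a different reset bit
        return [(0x14, arg >> 8), (0x15, arg & 0xFF)]


def route(cmd: Cmd, arg: int, decoders: list) -> list[list[tuple[int, int]]]:
    """Configuration interface: pass one command to every attached device's decoder."""
    return [d.decode(cmd, arg) for d in decoders]
```

Supporting a new device then means writing one decoder rather than a new driver.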

Method of determining Unicode values corresponding to the text in digital documents

A method of determining Unicode values corresponding to the text in digital documents includes: providing a digital document containing information related to the text in the document, the information including at least one set of data selected from the group consisting of: the numerical character code comprised by a single byte value or a sequence of multiple bytes, the glyph name corresponding to the character code for simple fonts, the code-to-Unicode mapping provided by a ToUnicode CMap, and font outline data embedded in the document; obtaining the information related to the text from the document; and determining the Unicode values corresponding to a specific code of a specific font on a per-glyph basis by executing a cascade of determination steps for each code separately, the cascade being executed in a predetermined sequence using different sources of information.
Owner:PDFLIB
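A sketch of a per-code cascade over two of the information sources the abstract names. The order used here (ToUnicode CMap first, then glyph name), the `font` dictionary layout, the tiny glyph-name table (a few genuine Adobe Glyph List names), and the U+FFFD fallback are assumptions; the font-outline analysis step is omitted.

```python
# A few genuine Adobe Glyph List names; a real implementation would load the full list.
AGL_SUBSET = {"A": "A", "germandbls": "\u00df", "Euro": "\u20ac", "space": " "}


def from_tounicode(code: int, font: dict):
    return font.get("tounicode", {}).get(code)


def from_glyph_name(code: int, font: dict):
    name = font.get("glyph_names", {}).get(code)
    if name is None:
        return None
    if name.startswith("uni") and len(name) == 7:  # "uniXXXX" naming convention
        try:
            return chr(int(name[3:], 16))
        except ValueError:
            pass
    return AGL_SUBSET.get(name)


CASCADE = [from_tounicode, from_glyph_name]  # tried in this fixed order


def unicode_for(code: int, font: dict) -> str:
    for step in CASCADE:
        result = step(code, font)
        if result:
            return result
    return "\ufffd"  # REPLACEMENT CHARACTER when every source fails
```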

Process for gathering and special data structure for storing performance metric data

A system to collect performance data and store it in a special data structure which records the metadata in the structure itself. The performance data for each day is recorded in time-series, converted into Unicode, and stored in a single directory. The performance metric data is compressed prior to transmission to a server over any data path. The data structure at the server where analysis is done has a subdirectory for every resource type. Each subdirectory contains text files each of which stores performance metric data values for a group of attributes with one section per attribute. The performance metric data values are recorded as a comma delimited list. Analysis of the performance metric data is done using regular expressions.
Owner:CUMULUS SYST INC

Copyright protection oriented database watermarking method

Publication: CN103646195A (inactive). Tags: low perception; solves the watermark positioning problem; digital data protection; program/content distribution protection; watermark synchronization; data mining.
The invention discloses a copyright-protection-oriented database watermarking method. The method embeds and detects a digital watermark in Unicode text data in a database, and comprises the following steps: 1, selecting a seed number to generate a meaningless binary watermark sequence, and storing the copyright information and the watermark sequence; 2, establishing a mapping between an invisible character set and the binary watermark sequence; 3, mapping the watermark sequence into a combination of invisible characters, embedding that combination into the database, and updating the data; 4, during detection, extracting the characters that belong to the invisible character set from the database; 5, mapping the invisible characters back into watermark information based on the mapping; 6, recovering the binary watermark sequence from the watermark information, calculating correlation coefficients against the original sequence, and thereby determining the copyright information. The method makes full use of the properties of invisible characters and solves the problems of lossless watermark embedding and watermark synchronization for text data in a database.
Owner:NANJING NORMAL UNIVERSITY
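A compact sketch of the embed / detect cycle. Assumptions: zero-width space (U+200B) encodes bit 0 and zero-width non-joiner (U+200C) encodes bit 1, the full watermark is appended to every text field so detection survives row reordering, bits are recovered by majority vote, and the "correlation coefficient" is the mean product of the ±1 forms of the two sequences.

```python
import random

BIT_TO_CHAR = {0: "\u200b", 1: "\u200c"}  # assumed invisible-character mapping
CHAR_TO_BIT = {c: b for b, c in BIT_TO_CHAR.items()}


def watermark_bits(seed: int, length: int) -> list[int]:
    rng = random.Random(seed)  # the seed number regenerates the same sequence
    return [rng.randint(0, 1) for _ in range(length)]


def embed(values: list[str], bits: list[int]) -> list[str]:
    mark = "".join(BIT_TO_CHAR[b] for b in bits)
    return [v + mark for v in values]


def extract(values: list[str], length: int) -> list[int]:
    votes = [0] * length
    counted = 0
    for v in values:
        found = [CHAR_TO_BIT[c] for c in v if c in CHAR_TO_BIT]
        if len(found) == length:  # ignore fields whose mark was damaged
            counted += 1
            votes = [x + y for x, y in zip(votes, found)]
    return [1 if 2 * n > counted else 0 for n in votes]  # majority vote per bit


def correlation(a: list[int], b: list[int]) -> float:
    return sum((2 * x - 1) * (2 * y - 1) for x, y in zip(a, b)) / len(a)
```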

System and method for performing unicode matching

A system and method perform Unicode matching to compare and merge similar data objects whose Unicode strings are equivalent but not exact matches. Unicode characters are characterized by number of strokes, stroke order, radicals, geometry, and phonemes, together with input method editor (IME) and keyboard characteristics such as a character's location on an IME or keyboard (or the number of GUI interactions used to enter it, e.g., by tapping, where “a” on a mobile device keyboard takes 1 tap of a key and “b” takes 2 taps). These characteristics, associated with code points and with IMEs / keyboards, are used to create subdomains for matching and for determining the “distance” to other Unicode code points (e.g., the number of keyboard keys away). This makes it possible to determine whether close but incorrect data entry may have taken place, and to merge duplicate data objects into a master data object when objects that differ only by minor differences or spelling errors actually represent duplicate data.
Owner:SAP AG
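A sketch of two ideas from the abstract: Unicode normalization, so canonically equivalent strings compare equal, and a keyboard-aware edit distance in which substituting a neighboring key costs less. The QWERTY grid (ignoring row stagger), the 0.5 cost for adjacent keys, and the function names are illustrative assumptions; stroke, radical, and phoneme features are not modeled.

```python
import unicodedata

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a.lower()), KEY_POS.get(b.lower())
    if pa is not None and pb is not None and max(abs(pa[0] - pb[0]), abs(pa[1] - pb[1])) == 1:
        return 0.5  # neighboring keys: a likely slip of the finger
    return 1.0


def keyboard_distance(s: str, t: str) -> float:
    """Weighted Levenshtein distance over NFC-normalized strings."""
    s, t = unicodedata.normalize("NFC", s), unicodedata.normalize("NFC", t)
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + sub_cost(a, b)))
        prev = cur
    return prev[-1]
```

Pairs whose distance falls under a small threshold could then be flagged as candidate duplicates for merging into a master record.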