[0005]Various embodiments of the present invention generally relate to data as a concept and related systems and methods. In particular, systems and methods are disclosed that allow for the aggregation, reconciliation, analysis, and visualization of data in a unitary end-to-end tool. In some embodiments, the tool, which may also be referred to herein as a platform, may be called Datavore. The tool may be used to automatically learn concepts and relationships existing within big data sets. For example, the tool may be used to aggregate data by ingesting and combining raw data from multiple sources with different file types. As another example, the tool may be used to reconcile the data by scrubbing “messy” (e.g., noisy) data to produce high quality data that permits better aggregation and analysis. As yet another example, the tool may be used to analyze the data by using multiple data manipulation techniques (e.g., “Excel-like” data manipulation techniques) and statistical analysis to allow for better data discovery. As yet another example, the tool may be used to visualize the data by using dynamic graphs and charts to illustrate key relationships and trends within the data. As yet a further example, the tool may be used to export data, meta-data, or visualizations to other tools, programs, and/or modules. In some embodiments, the tool may be an end-to-end Software as a Service (SaaS) solution that allows data analysis experts to easily conduct complex analysis on big data sets. In some embodiments, the tool may act as a Master Data Management tool that includes reference data and analytical data so as to serve as an authoritative source of master data. In such embodiments, the tool may operate to reconcile data by removing duplicate and/or incorrect data and by automatically generating rules to prevent such data from entering the system or any data analysis step.
[0006]Such a tool may be streamlined and may offer several advantages due to its ability to learn concepts and relationships within big data sets. For example, the tool may allow for a user-defined world in which domain expertise is captured to make appropriate “apples-to-apples” comparisons between similar types of data, from the user's perspective. As another example, the tool may allow for superior analysis of data by conducting customized statistical and predictive analysis of financial and market data. As yet another example, the tool may allow for data curation by cleaning and integrating disparate, messy, or syntactically different data sets. As yet another example, the tool may allow for “smart” visualizations of the data by automatically creating graphs and charts that show the most important relationships between similar and/or different data, including magnitudes and relations, and that allow for trend and outlier detection. As a further example, the tool may have an intuitive interface that is simple and seamless to the user because it does not involve computer programming, creation of macros, or cryptic database queries.
[0007]A particular example of the use of the tool disclosed herein includes industry comparables analysis on financial data. The tool may be used to aggregate multiple financial statements that may be siloed in, e.g., Bloomberg, CapIQ, and/or other data sources. The tool may be used to reconcile the data by quickly creating “apples-to-apples” comparisons of related or similar companies' financials. The comparisons may include industry Key Performance Indicators (KPIs) from the relevant industries. The tool may be used to analyze the data by comparing the performance of a company against specifications and metrics defined by a data analysis expert. The work data flow (the step-by-step procedure by which the data is manipulated in order to analyze the data) may be used by the data analysis expert to analyze the data. The analysis may include filtering and grouping the data (e.g., in accordance with the industries in which the company operates). The work data flow may be stored by the tool for later use. For example, the stored work data flow may be used for automation of analysis on different data, portability of data analysis techniques, or as one or more building blocks for additional data analysis. This work data flow and other work data flows created by a user of the tool, in conjunction with learned concepts, may be considered the user's lens with respect to viewing or analyzing particular types of data. The tool may be used to visualize the data by allowing company financials and KPIs to be viewed simultaneously over time and across the industries in which the company operates. During visualization, outliers and trends may be recognized by the tool. For example, for the use case of multi-strategy and long/short equities hedge funds, the tool may allow for a holistic industry review, industry comparables analysis, simulated portfolio performance, and macro data correlations. As another example, for the use case of fixed income and real estate hedge funds, the tool may allow for bond data cleaning, capital structure assessments, complex financial instrument analysis, and merger ramifications.
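By way of illustration only, a stored work data flow of the kind described above might be sketched as an ordered list of steps that can be saved and later replayed on different data. The step names ("filter", "group_sum"), the field names, and the sample rows below are hypothetical and are not taken from any particular embodiment; the sketch merely assumes that each step can be described as plain data.

```python
import json
from collections import defaultdict

def apply_step(rows, step):
    """Apply one hypothetical work-data-flow step to a list of row dicts."""
    if step["op"] == "filter":
        # Keep only rows whose field equals the requested value.
        return [r for r in rows if r[step["field"]] == step["equals"]]
    if step["op"] == "group_sum":
        # Group rows by one field and sum another field within each group.
        totals = defaultdict(float)
        for r in rows:
            totals[r[step["by"]]] += r[step["value"]]
        return [{step["by"]: key, step["value"]: total} for key, total in totals.items()]
    raise ValueError(f"unknown step: {step['op']}")

def run_flow(rows, flow):
    """Replay every step of a work data flow, in order, on the given rows."""
    for step in flow:
        rows = apply_step(rows, step)
    return rows

# A work data flow expressed as data, so it can be stored and reused.
flow = [
    {"op": "filter", "field": "year", "equals": 2020},
    {"op": "group_sum", "by": "industry", "value": "revenue"},
]
stored_flow = json.dumps(flow)  # e.g., persisted by the tool for later use

# Illustrative rows only; not real financial data.
sample_rows = [
    {"company": "A", "industry": "retail", "year": 2020, "revenue": 10.0},
    {"company": "B", "industry": "retail", "year": 2020, "revenue": 20.0},
    {"company": "C", "industry": "energy", "year": 2020, "revenue": 5.0},
    {"company": "A", "industry": "retail", "year": 2019, "revenue": 7.0},
]

# Replaying the stored flow on a data set.
result = run_flow(sample_rows, json.loads(stored_flow))
```

Because the flow in this sketch is plain data rather than code, the same stored flow could be replayed on a different data set or used as a building block within a longer flow.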
[0008]The Datavore tool described herein may be able to learn concepts and relationships associated with different data types to simplify the analysis of the data and to allow data analysis experts to work with the data more efficiently. The tool may be able to combine the learned concepts and relationships with user-defined concepts and relationships associated with the different data types.
[0014]In some embodiments, a user context mapping module may be used, e.g., by a user of the tool, to specify a particular “context” to which a data concept belongs. A syntax and semantic reconciliation module may automatically attempt to correct spelling errors such as small typos (e.g., using string distance or n-gram comparison), may apply sounds-like corrections, and may perform language normalization. The syntax and semantic reconciliation module may automatically attempt to map labels in the data (associated with the data concepts in, e.g., a data set) to universal identifiers stored, e.g., in one or more remote data stores. The user may be presented with the matches that result from the operation of the syntax and semantic reconciliation module. The user can disambiguate items based on, e.g., a confidence score associated with a particular one of the matches. The user's actions may be stored in a context mapping memory module to assist in future mapping of the same data concept.
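By way of illustration only, the label-matching behavior described above might be sketched as follows. The sketch assumes a small in-memory table standing in for the remote data store, hypothetical identifiers ("ID-0001" and so on), and a confidence score formed by averaging a string-similarity ratio with a character n-gram overlap. None of these choices is specified by the embodiments; other scoring schemes could be used, and sounds-like matching is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical universal identifiers; stands in for a remote data store.
UNIVERSAL_IDS = {
    "revenue": "ID-0001",
    "net income": "ID-0002",
    "total assets": "ID-0003",
}

# Stands in for the context mapping memory module: label -> confirmed identifier.
context_memory = {}

def normalize(label):
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(label.lower().split())

def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with one space on each side."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def confidence(a, b):
    """Average a string-similarity ratio and an n-gram Jaccard overlap (both in [0, 1])."""
    ratio = SequenceMatcher(None, a, b).ratio()
    grams_a, grams_b = ngrams(a), ngrams(b)
    union = grams_a | grams_b
    jaccard = len(grams_a & grams_b) / len(union) if union else 0.0
    return (ratio + jaccard) / 2

def match_label(label, top_k=3):
    """Return up to top_k (concept, identifier, confidence) candidates, best first."""
    norm = normalize(label)
    scored = [(name, uid, confidence(norm, name)) for name, uid in UNIVERSAL_IDS.items()]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]

def confirm(label, uid):
    """Record the user's choice so the same label can be mapped directly next time."""
    context_memory[normalize(label)] = uid

def resolve(label):
    """Return a previously confirmed identifier, or None if the user has not chosen one."""
    return context_memory.get(normalize(label))

# Usage: present candidates for a misspelled label, then store the user's selection.
candidates = match_label("Net  Incme")
confirm("Net  Incme", candidates[0][1])
```

In this sketch, the candidates returned by match_label correspond to the matches presented to the user, and confirm records the user's disambiguation so that resolve can reuse it for the same label in the future.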