Concept-oriented model: Modeling and processing data using functions


Added by Alexandr Savinov

Publication date 2019

Fields Informatics Engineering

Language English

Authors Alexandr Savinov

Databases


We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing, as opposed to the exclusive use of sets in conventional set-oriented models. Functions and function composition are used as the primary semantic units for describing data connectivity, instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed by using (accumulate) functions, providing an alternative to group-by and reduce operations. This model was implemented in an open source data processing toolkit, examples of which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when using functions in comparison to adopting sets and set operations.
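The ideas in the abstract can be illustrated with a minimal sketch in plain Python. It does not use the toolkit mentioned above; the sample data and the helper names `compose` and `accumulate` are assumptions made for illustration only. Sets are modeled as Python sets and functions as dictionaries.

```python
# Sets are collections of element identifiers (hypothetical sample data).
products = {"p1", "p2"}
items = {"i1", "i2", "i3"}

# Functions map each element of a set to a value or to an element of another set.
price = {"p1": 10.0, "p2": 4.0}                     # Products -> number
product_of = {"i1": "p1", "i2": "p1", "i3": "p2"}   # Items -> Products (link)
quantity = {"i1": 2, "i2": 1, "i3": 5}              # Items -> number


def compose(f, g):
    """Return the function x -> g(f(x)); used here in place of a join."""
    return {x: g[y] for x, y in f.items()}


def accumulate(facts, group_of, targets, update, initial):
    """Define a function on `targets` by updating its value once per fact."""
    result = {t: initial for t in targets}
    for x in facts:
        t = group_of[x]
        result[t] = update(result[t], x)
    return result


# Items -> number, obtained by following the link instead of joining tables.
item_price = compose(product_of, price)

# A calculated function on Items built from two other functions.
amount = {i: item_price[i] * quantity[i] for i in items}

# Products -> number, an alternative to group-by followed by reduce.
total = accumulate(items, product_of, products,
                   lambda acc, i: acc + amount[i], 0.0)
```

In this sketch `total` maps each product to the summed amount of its items, and no intermediate grouped set is ever materialized.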


Principles of the Concept-Oriented Data Model

Alexandr Savinov, 2007

This paper describes a new approach to data representation and manipulation, called the concept-oriented data model (CODM). Items represent data units, which are stored in concepts. A concept is a combination of superconcepts, which determine the concept's dimensionality or properties. An item is a combination of superitems, one taken from each of the superconcepts. An item stores a combination of references to its superitems. The references implement an inclusion relation or an attribute-value relation among items. A concept-oriented database is defined by its concept structure, called syntax or schema, and its item structure, called semantics. The model defines formal transformations of syntax and semantics, including the canonical semantics, where all concepts are merged and the data semantics is represented by one set of items. The concept-oriented data model treats relations as subconcepts whose items are instances of the relations. Multi-valued attributes are defined via subconcepts as a view on the database semantics rather than as a built-in mechanism. The model includes a concept-oriented query language, which is based on collection manipulations. It also has mechanisms such as aggregation and inference based on semantics propagation through the database schema.
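The relationship between concepts, superconcepts, items and superitems can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the paper's formal definition; the class names `Concept` and `Item` and the sample data are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A concept is described by the combination of its superconcepts."""
    name: str
    superconcepts: tuple = ()  # tuple of Concept objects


@dataclass(frozen=True)
class Item:
    """An item holds references to one superitem per superconcept."""
    concept: Concept
    superitems: tuple = ()  # tuple of Item objects, in superconcept order

    def __post_init__(self):
        # Check that exactly one superitem is taken from each superconcept.
        expected = self.concept.superconcepts
        actual = tuple(s.concept for s in self.superitems)
        if actual != expected:
            raise ValueError("need exactly one superitem per superconcept")


# Hypothetical schema: Order is a subconcept of Product and Customer,
# so it plays the role of a relation between them.
product = Concept("Product")
customer = Concept("Customer")
order = Concept("Order", (product, customer))

laptop = Item(product)
ann = Item(customer)
order_1 = Item(order, (laptop, ann))  # an instance of the relation
```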

Databases

On the importance of functions in data modeling

Alexandr Savinov, 2020

In this paper we argue that representing entity properties by tuple attributes, as evangelized in most set-oriented data models, is a controversial method conflicting with the principle of tuple immutability. As a principled solution to this problem of tuple immutability on one hand and the need to modify tuple attributes on the other hand, we propose to use mathematical functions for representing entity properties. In this approach, immutable tuples are intended for representing the existence of entities while mutable functions (mappings between sets) are used for representing entity properties. In this model, called the concept-oriented model (COM), functions are made first-class elements along with sets, and both functions and sets are used to represent and process data in a simpler and more natural way in comparison to purely set-oriented models.
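The separation argued for in this abstract can be sketched in a few lines of Python, assuming a hypothetical `Person` entity: an immutable tuple only records that the entity exists, while properties live in separate mutable mappings defined on the set of tuples.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """Immutable tuple: it only identifies an entity that exists."""
    person_id: int


# Membership in the set represents existence of the entity.
people = {Person(1), Person(2)}

# Properties are mutable functions (mappings) from the set to values.
age = {Person(1): 30, Person(2): 41}
city = {Person(1): "Berlin", Person(2): "Oslo"}

p = Person(1)
age[p] = 31  # changes the value of the function; the tuple is untouched

# Trying to modify the tuple itself is rejected by Python.
try:
    p.person_id = 99
    tuple_was_mutated = True
except AttributeError:
    tuple_was_mutated = False
```

Updating `age` never creates a new tuple, so the identity of the entity and the values of its properties change independently.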

Databases

Integrated Data Acquisition, Storage, Retrieval and Processing Using the COMPASS DataBase (CDB)

J. Urban, J. Pipek, M. Hron, 2014

We present a complex data handling system for the COMPASS tokamak, operated by IPP ASCR Prague, Czech Republic [1]. The system, called CDB (Compass DataBase), integrates different data sources, as an assortment of data acquisition hardware and software from different vendors is used. Based on widely available open source technologies wherever possible, CDB is vendor and platform independent, and it can be easily scaled and distributed. The data is directly stored and retrieved using a standard NAS (Network Attached Storage), hence independent of the particular technology; the description of the data (the metadata) is recorded in a relational database. The database structure is general and enables the inclusion of multi-dimensional data signals in multiple revisions (no data is overwritten). This design is inherently distributed, as the work is off-loaded to the clients. Both the NAS and the database can be implemented and optimized for fast local access as well as secure remote access. CDB is implemented in Python; bindings for Java, C/C++, IDL and Matlab are provided. Independent data acquisition systems as well as nodes managed by FireSignal [2] are all integrated using CDB. An automated data post-processing server is a part of CDB. Based on dependency rules, the server executes prescribed post-processing tasks, in parallel if possible.
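The abstract does not say how the post-processing server is built. As a rough sketch of the general technique (dependency-driven execution with parallelism where the rules allow it), the following uses only the Python standard library; the task names, the `rules` mapping and `run_task` are hypothetical and are not part of CDB.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical dependency rules: task -> tasks whose output it needs.
rules = {
    "calibrate": set(),
    "filter": {"calibrate"},
    "spectrum": {"calibrate"},
    "report": {"filter", "spectrum"},
}


def run_task(name):
    """Placeholder for a real post-processing step; returns the task name."""
    return name


def run_all(rules, max_workers=4):
    """Run every task once all its dependencies have finished."""
    sorter = TopologicalSorter(rules)
    sorter.prepare()  # raises graphlib.CycleError on circular rules
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = set()
        while sorter.is_active():
            # Submit every task whose dependencies are already done.
            for name in sorter.get_ready():
                pending.add(pool.submit(run_task, name))
            # Block until at least one running task completes.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = future.result()
                finished.append(name)
                sorter.done(name)  # may unlock dependent tasks
    return finished


order = run_all(rules)
```

Here `filter` and `spectrum` can run concurrently because both depend only on `calibrate`, while `report` waits for both.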

Databases

Knowledge Graphs for Processing Scientific Data: Challenges and Prospects

Masoud Salehpour, Joseph G. Davis, 2020

There is growing interest in the use of Knowledge Graphs (KGs) for the representation, exchange, and reuse of scientific data. While KGs offer the prospect of improving the infrastructure for working with scalable and reusable scholarly data consistent with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles, the state-of-the-art Data Management Systems (DMSs) for processing large KGs leave something to be desired. In this paper, we studied the performance of some of the major DMSs in the context of querying KGs, with the goal of providing a fine-grained, comparative analysis of DMSs representing each of the four major DMS types. We experimented with four well-known scientific KGs, namely Allie, Cellcycle, DrugBank, and LinkedSPL, against Virtuoso, Blazegraph, RDF-3X, and MongoDB as the representative DMSs. Our results suggest that the DMSs display limitations in processing complex queries on the KG datasets. Depending on the query type, the performance differentials can be several orders of magnitude. Also, no single DMS appears to offer consistently superior performance. We present an analysis of the underlying issues and outline two integrated approaches and proposals for resolving the problem.

Databases

Concept-Oriented Programming

Alexandr Savinov, 2010

Object-oriented programming (OOP) is aimed at describing the structure and behaviour of objects by hiding the mechanism of their representation and access in primitive references. In this article we describe an approach, called concept-oriented programming (COP), which focuses on modelling references, assuming that they also possess application-specific structure and behaviour accounting for a great deal or even most of the overall program complexity. References in COP are completely legalized and get the same status as objects, while the functions are distributed among both objects and references. In order to support this design we introduce a new programming construct, called a concept, which generalizes conventional classes, and a concept inclusion relation, which generalizes class inheritance. The main advantage of COP is that it allows programmers to describe two sides of any program: the explicitly used functions of objects and the intermediate functionality of references, which has a cross-cutting nature and is executed implicitly behind the scenes during object access.

Programming Languages