MADlib is a free, open-source library of in-database analytic methods. It provides an evolving suite of SQL-based algorithms for machine learning, data mining, and statistics that run at scale within a database engine, with no need to import or export data to other tools. The goal is for MADlib to eventually serve a role for scalable database systems similar to that of the CRAN library for R: a community repository of statistical methods, this time written with scale and parallelism in mind. In this paper we introduce the MADlib project, including the background that led to its beginnings and the motivation for its open-source nature. We provide an overview of the library's architecture and design patterns, and describe various statistical methods in that context. We include performance and speedup results for a core design pattern from one of those methods on the Greenplum parallel DBMS, running on a modest-sized test cluster. We then report on two initial efforts at incorporating academic research into MADlib, which is one of the project's goals. MADlib is freely available at http://madlib.net, and the project is open to contributions of both new methods and ports to additional database platforms.
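As a rough illustration of what "in-database" analytics means here, the sketch below fits a simple linear regression using only SQL aggregates, so the raw rows never leave the database engine and only five summary numbers reach the client. This is not MADlib's API or implementation: it assumes Python's built-in `sqlite3` module as a stand-in engine, and the table name `points`, its columns `x` and `y`, and the helper `fit_line_in_database` are hypothetical names chosen for the example.

```python
import sqlite3


def fit_line_in_database(conn, table="points"):
    """Fit y = intercept + slope * x from SQL aggregates only.

    Hypothetical helper for illustration; `table` must be a trusted
    identifier with numeric columns x and y.
    """
    # The database engine scans the rows; the client sees five numbers.
    n, sx, sy, sxx, sxy = conn.execute(
        f"SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM {table}"
    ).fetchone()

    if n < 2:
        raise ValueError("need at least two rows to fit a line")

    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")

    # Closed-form ordinary least squares for a single predictor.
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Small in-memory example: the points lie exactly on y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
)
slope, intercept = fit_line_in_database(conn)
print(slope, intercept)
```

Because the sufficient statistics are plain sums, a parallel engine could in principle compute them per data partition and combine the partial results, which is the kind of scaling the abstract alludes to; the sketch itself runs on a single node.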
We propose client-side AES256 encryption for a cloud SQL DB. A column ciphertext is deterministic or probabilistic. We trust the cloud DBMS for security of its run-time values, e.g., through a moving target defense. The client may send AES key(s)
In this project we present a grammar that unifies the design and development of spatial databases. To do so, we combine nominal and spatial information; the former is represented by the relational model and the latter by a modification o
The objective of this work was to utilize BigBench [1] as a Big Data benchmark and evaluate and compare two processing engines: MapReduce [2] and Spark [3]. MapReduce is the established engine for processing data on Hadoop. Spark is a popular alterna
In recent years, there has been a substantial amount of work on large-scale data analytics using Hadoop-based platforms running on large clusters of commodity machines. A less-explored topic is how those data, dominated by application logs, are colle
Geo-replication poses an inherent trade-off between low latency, high availability and strong consistency. While NoSQL databases favor low latency and high availability, relaxing consistency, more recent cloud databases favor strong consistency and e