بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

fplyr: the split-apply-combine strategy for big data in R

48 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Federico Marotta

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Federico Marotta

حساب

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present fplyr, a new package for the R language to deal with big files. It allows users to easily implement the split-apply-combine strategy for files that are too big to fit into the available memory, without relying on data bases nor introducing non-native R classes. A custom function can be applied independently to each group of observations, and the results may be either returned or directly printed to one or more output files.

قيم البحث

120 - HaiYing Wang , Yanyuan Ma 2020

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive tw

حساب المنهجية

Gaussian Process for Functional Data Analysis: The GPFDA Package for R

179 - Evandro Konzen , Yafeng Cheng , Jian Qing Shi 2021

We present and describe the GPFDA package for R. The package provides flexible functionalities for dealing with Gaussian process regression (GPR) models for functional data. Multivariate functional data, functional data with multidimensional inputs, and nonseparable and/or nonstationary covariance structures can be modeled. In addition, the package fits functional regression models where the mean function depends on scalar and/or functional covariates and the covariance structure is modeled by a GPR model. In this paper, we present the versatility of GPFDA with respect to mean function and covariance function specifications and illustrate the implementation of estimation and prediction of some models through reproducible numerical examples.

حساب المنهجية

ProcData: An R Package for Process Data Analysis

167 - Xueying Tang , Susu Zhang , Zhi Wang 2020

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents response processes of solving the items. Process data analysis aims at enhancing educatio nal assessment accuracy and serving other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article is designed to provide tools for processing, describing, and analyzing process data. We define an S3 class proc for organizing process data and extend generic methods summary and print for class proc. Two feature extraction methods for process data are implemented in the package for compressing information in the irregular response processes into regular numeric vectors. ProcData also provides functions for fitting and making predictions from a neural-network-based sequence model. These functions call relevant functions in package keras for constructing and training neural networks. In addition, several response process generators and a real dataset of response processes of the climate control item in the 2012 Programme for International Student Assessment are included in the package.

حساب التعلم الآلي

Collaborative Nested Sampling: Big Data vs. complex physical models

68 - Johannes Buchner 2017

The data torrent unleashed by current and upcoming astronomical surveys demands scalable analysis methods. Many machine learning approaches scale well, but separating the instrument measurement from the physical effects of interest, dealing with vari able errors, and deriving parameter uncertainties is often an after-thought. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require independent runs for each data set, implying an unfeasible number of model evaluations in the Big Data regime. Here I present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, the number of physical model evaluations scales sub-linearly with the number of data sets, and no assumptions about homogeneous errors, Gaussianity, the form of the model or heterogeneity/completeness of the observations need to be made. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations.

حساب الأجهزة والأساليب للزيئات الفيزياء الفلكية تحليل البيانات والإحصاءات والاحتمال

Efficient Bayesian Modeling of Binary and Categorical Data in R: The UPG Package

263 - Gregor Zens , Sylvia Fruhwirth-Schnatter , Helga Wagner 2021

We introduce the UPG package for highly efficient Bayesian inference in probit, logit, multinomial logit and binomial logit models. UPG offers a convenient estimation framework for balanced and imbalanced data settings where sampling efficiency is en sured through Markov chain Monte Carlo boosting methods. All sampling algorithms are implemented in C++, allowing for rapid parameter estimation. In addition, UPG provides several methods for fast production of output tables and summary plots that are easily accessible to a broad range of users.

حساب المنهجية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد الوطني الجزائري للبحث الزراعي

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

fplyr: the split-apply-combine strategy for big data in R

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً