REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit

Added by Daniel Fischer

Publication date 2016

Field Mathematical Statistics

Language English

Authors Daniel Fischer - Alain Berro - Klaus Nordhausen

Computation

The R package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful in practice as a preprocessing step to find clusters or as an outlier detection tool for multivariate numerical data. Apart from the package tourr, which implements smooth sequences of projection matrices, and rggobi, which provides an interface to a dynamic graphics package called GGobi, there is no implementation of exploratory projection pursuit tools available in R, especially in the context of outlier detection. REPPlab is an R interface for the Java program EPPlab, which implements four projection indices and three biologically inspired optimization algorithms. The implemented indices are adapted either to cluster detection or to outlier detection, and the optimization algorithms have at most one parameter to tune. Following the original software EPPlab, the exploration strategy in REPPlab is divided into two steps: many potentially interesting projections are calculated in the first step and examined in the second. For this second step, different tools for plotting and combining the results are proposed, with specific tools for outlier detection. Compared to EPPlab, some of these tools are new, and their performance is illustrated through simulations and real data sets in a clustering context. The functionalities of the package are also illustrated for outlier detection on a new data set that is provided with the package.
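The two-step strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the REPPlab or EPPlab implementation: it assumes a kurtosis-based projection index (maximizing kurtosis tends to favor directions that isolate outliers), swaps the biologically inspired optimizers for plain random search, and uses a median/MAD rule for the examination step. All function names (`find_projections`, `flag_outliers`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def kurtosis(values):
    """Fourth standardized moment of a list of numbers (about 3 for Gaussian data)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return 0.0
    return sum((v - mean) ** 4 for v in values) / (n * var ** 2)


def random_direction(dim, rng):
    """Draw a unit vector uniformly on the sphere."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def project(data, direction):
    """Project each row of the data onto a single direction."""
    return [sum(x * d for x, d in zip(row, direction)) for row in data]


def find_projections(data, n_runs=10, n_candidates=300, seed=0):
    """Step 1: run several random searches, keeping the best direction of each run."""
    rng = random.Random(seed)
    dim = len(data[0])
    found = []
    for _ in range(n_runs):
        candidates = [random_direction(dim, rng) for _ in range(n_candidates)]
        best = max(candidates, key=lambda d: kurtosis(project(data, d)))
        found.append((kurtosis(project(data, best)), best))
    found.sort(key=lambda pair: pair[0], reverse=True)
    return found


def flag_outliers(data, direction, threshold=3.0):
    """Step 2: flag points whose projection lies far from the median in MAD units."""
    proj = project(data, direction)
    center = statistics.median(proj)
    scale = 1.4826 * statistics.median([abs(p - center) for p in proj])
    if scale == 0.0:
        return []
    return [i for i, p in enumerate(proj) if abs(p - center) / scale > threshold]


# Simulated data (an assumption for the demo): 100 Gaussian points in 3D
# plus 3 points shifted far along the first coordinate.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
data += [[10.0 + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(3)]

projections = find_projections(data)
best_index, best_direction = projections[0]
flagged = flag_outliers(data, best_direction)
print("index value:", round(best_index, 2), "flagged rows:", flagged)
```

Keeping the best direction from every run, rather than only the overall best, mirrors the idea of computing many candidate projections first and inspecting them afterwards; here only the top-ranked direction is examined.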

ProcData: An R Package for Process Data Analysis

167 - Xueying Tang, Susu Zhang, Zhi Wang 2020

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents' response processes when solving the items. Process data analysis aims at enhancing educational assessment accuracy and serving other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article is designed to provide tools for processing, describing, and analyzing process data. We define an S3 class proc for organizing process data and extend the generic methods summary and print for class proc. Two feature extraction methods for process data are implemented in the package for compressing information in the irregular response processes into regular numeric vectors. ProcData also provides functions for fitting and making predictions from a neural-network-based sequence model. These functions call relevant functions in the package keras for constructing and training neural networks. In addition, several response process generators and a real dataset of response processes of the climate control item in the 2012 Programme for International Student Assessment are included in the package.

Computation Machine Learning

Rbox: an integrated R package for ATOM Editor

125 - Saeid Amiri 2017

R is a programming language and environment that is a central tool in the applied sciences for writing programs. Its impact on the development of modern statistics is undeniable. Current research, especially on big data, may not be done solely in R and will likely involve other programming languages; hence, having a modern integrated development environment (IDE) is very important. Atom Editor is a modern IDE developed by GitHub and described as "A hackable text editor for the 21st Century". This report presents a package entitled Rbox that allows Atom Editor to write and run R code professionally.

Computation

Regression Modeling for Recurrent Events Using R Package reReg

154 - Sy Han Chiou, Gongjun Xu, Jun Yan 2021

Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among others, where study subjects may experience a sequence of events of interest during follow-up. The R package reReg (Chiou and Huang 2021) offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly in the presence of an informative terminal event. The regression framework is a general scale-change model which encompasses the popular Cox-type model, the accelerated rate model, and the accelerated mean model as special cases. Informative censoring is accommodated through a subject-specific frailty without the need for parametric specification. Different regression models are allowed for the recurrent event process and the terminal event. Also included are visualization and simulation tools.

Computation

quantreg.nonpar: An R Package for Performing Nonparametric Series Quantile Regression

406 - Michael Lipsitz, Alexandre Belloni, Victor Chernozhukov 2016

The R package quantreg.nonpar implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models. quantreg.nonpar obtains point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. It also provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods. This paper serves as an introduction to the package and displays basic functionality of the functions contained within.

Computation Econometrics

sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

121 - Jack Baker, Paul Fearnhead, Emily B. Fox 2017

This paper introduces the R package sgmcmc, which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by using only a subset of the data at each iteration. SGMCMC requires calculating gradients of the log likelihood and log priors, which can be time-consuming and error-prone to perform by hand. The sgmcmc package calculates these gradients itself using automatic differentiation, making the implementation of these methods much easier. To do this, the package uses the software library TensorFlow, which has a variety of statistical distributions and mathematical operations as standard, meaning a wide class of models can be built using this framework. SGMCMC has become widely adopted in the machine learning literature, but less so in the statistics community. We believe this may be partly due to a lack of software; this package aims to bridge this gap.
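The idea of using only a subset of the data at each iteration can be sketched with stochastic gradient Langevin dynamics (SGLD), one well-known SGMCMC method. The Python sketch below does not use the sgmcmc package or TensorFlow. It assumes a toy model (Gaussian data with known unit variance and a Gaussian prior on the mean) whose gradients are simple enough to code by hand, which is exactly the manual step the package is meant to automate. The function name `sgld_gaussian_mean` and all parameter values are hypothetical.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=2500, batch_size=50, step_size=1e-4,
                       prior_sd=10.0, seed=0):
    """Sample the posterior of a Gaussian mean (unit variance) with SGLD."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Use only a random minibatch of the data at each iteration.
        batch = rng.sample(data, batch_size)
        # Hand-coded gradients for this toy model (an assumption of the sketch).
        grad_log_prior = -theta / prior_sd ** 2
        # Rescale the minibatch gradient so it estimates the full-data gradient.
        grad_log_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        drift = 0.5 * step_size * (grad_log_prior + grad_log_lik)
        # Langevin update: gradient step plus Gaussian noise with variance step_size.
        theta += drift + rng.gauss(0.0, math.sqrt(step_size))
        samples.append(theta)
    return samples


# Simulated data (an assumption for the demo): 1000 draws from N(2, 1).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]

samples = sgld_gaussian_mean(data)
kept = samples[500:]  # discard the first 500 draws as burn-in
posterior_mean = sum(kept) / len(kept)
sample_mean = sum(data) / len(data)
print("SGLD posterior mean:", round(posterior_mean, 3),
      "sample mean:", round(sample_mean, 3))
```

With a fixed step size the chain only approximates the posterior, so the step size here was chosen small relative to the posterior variance; a decreasing schedule is a common alternative.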

Computation Applications Machine Learning