Research papers and master's and doctoral theses published by Rohit Pandey

A simple proof of Blackwell's renewal theorem by mapping to integers

35 - Rohit Pandey 2021

This paper presents a new proof of the renewal theorem by bijecting a general point process to a deterministic one (where the time between events is always fixed). It also provides insight into the workings of the renewal theorem.

Probability

GeLaTO: Generative Latent Textured Objects

87 - Ricardo Martin-Brualla , Rohit Pandey , Sofien Bouaziz 2020

Accurate modeling of 3D objects exhibiting transparency, reflections and thin structures is an extremely challenging problem. Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance. To generate the proxies' textures, we learn a joint latent space allowing category-level appearance and geometry interpolation. The proxies are independently rasterized with their corresponding neural texture and composited using a U-Net, which generates an output photorealistic image including an alpha map. We demonstrate the effectiveness of our approach by reconstructing complex objects from a sparse set of views. We show results on a dataset of real images of eyeglasses frames, which are particularly challenging to reconstruct using classical methods. We also demonstrate that these coarse proxies can be handcrafted when the underlying object geometry is easy to model, like eyeglasses, or generated using a neural network for more complex categories, such as cars.

Computer Vision and Pattern Recognition, Computer Graphics, Machine Learning

Learning Illumination from Diverse Portraits

144 - Chloe LeGendre , Wan-Chun Ma , Rohit Pandey 2020

We present a learning-based technique for estimating high dynamic range (HDR), omnidirectional illumination from a single low dynamic range (LDR) portrait image captured under arbitrary indoor or outdoor lighting conditions. We train our model using portrait photos paired with their ground truth environmental illumination. We generate a rich set of such photos by using a light stage to record the reflectance field and alpha matte of 70 diverse subjects in various expressions. We then relight the subjects using image-based relighting with a database of one million HDR lighting environments, compositing the relit subjects onto paired high-resolution background imagery recorded during the lighting acquisition. We train the lighting estimation model using rendering-based loss functions and add a multi-scale adversarial loss to estimate plausible high frequency lighting detail. We show that our technique outperforms the state-of-the-art technique for portrait-based lighting estimation, and we also show that our method reliably handles the inherent ambiguity between overall lighting strength and surface albedo, recovering a similar scale of illumination for subjects with diverse skin tones. We demonstrate that our method allows virtual objects and digital characters to be added to a portrait photograph with consistent illumination. Our lighting inference runs in real-time on a smartphone, enabling realistic rendering and compositing of virtual objects into live video for augmented reality applications.

Computer Vision and Pattern Recognition, Computer Graphics

The mean and variance in coupons required to complete a collection

37 - Rohit Pandey 2020

This paper is about the coupon collector's problem. There are some coupons, or baseball cards, or other plastic knick-knacks that are put into bags of chips or under soda bottles, etc. A collector starts collecting these trinkets and wants to form a complete collection of all possible ones. Every time they buy the product, however, they don't know which coupon they will collect until they open the product. How many coupons do they need to collect before they complete the collection? In this paper, we explore the mean and variance of this random variable, $N$, using various methods. Some of them work only for the special case with the coupons having equal probabilities of being collected, while others generalize to the case where the coupons are collected with unequal probabilities (which is closer to a real world scenario).
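For the equal-probability special case mentioned above, the mean and variance of $N$ can be sketched with the textbook decomposition of $N$ into independent geometric waiting times. This is a minimal illustration of that standard argument, not necessarily any of the methods the paper uses; the function name `coupon_collector_moments` is hypothetical.

```python
from fractions import Fraction


def coupon_collector_moments(n):
    """Exact mean and variance of the number of draws N needed to collect
    all n coupons, assuming each draw is uniform over the n coupons.

    Assumption: the standard decomposition N = T_0 + ... + T_{n-1}, where
    T_i is the geometric waiting time for a new coupon when i distinct
    coupons are already held.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    mean = Fraction(0)
    var = Fraction(0)
    for i in range(n):
        p = Fraction(n - i, n)      # chance the next draw is a new coupon
        mean += 1 / p               # mean of a geometric waiting time
        var += (1 - p) / p ** 2     # variance of a geometric waiting time
    return mean, var


if __name__ == "__main__":
    m, v = coupon_collector_moments(3)
    print(m, v)  # 11/2 27/4
```

The unequal-probability case does not decompose this simply, which is presumably why the abstract distinguishes the two settings.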

Probability

Breaking hypothesis testing for failure rates

89 - Rohit Pandey , Yingnong Dang , Gil Lapid Shafriri 2020

We describe the utility of point processes and failure rates and the most common point process for modeling failure rates, the Poisson point process. Next, we describe the uniformly most powerful test for comparing the rates of two Poisson point processes for a one-sided test (henceforth referred to as the rate test). A common argument against using this test is that real world data rarely follows the Poisson point process. We thus investigate what happens when the distributional assumptions of tests like these are violated and the test is still applied. We find a non-pathological example (using the rate test on a Compound Poisson distribution with Binomial compounding) where violating the distributional assumptions of the rate test makes it perform better (lower error rates). We also find that if we replace the distribution of the test statistic under the null hypothesis with any other arbitrary distribution, the performance of the test (described in terms of the false negative rate to false positive rate trade-off) remains exactly the same. Next, we compare the performance of the rate test to a version of the Wald test customized to the Negative Binomial point process and find it to perform very similarly while being much more general and versatile. Finally, we discuss the applications to Microsoft Azure. The code for all experiments performed is open source and linked in the introduction.
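A one-sided comparison of two Poisson rates is commonly carried out by conditioning on the total event count, which reduces the problem to a binomial tail probability. The sketch below shows that standard conditional construction. It is an assumption that this matches the paper's rate test in every detail, and the function name and argument names are hypothetical.

```python
from math import comb


def rate_test_p_value(n1, t1, n2, t2):
    """One-sided p-value for the alternative that process 2 has a higher
    rate than process 1.

    n1, n2: event counts observed in the two groups.
    t1, t2: observation times (exposure) of the two groups.

    Assumption: both processes are Poisson. Under equal rates, and given
    the total n1 + n2, the count n2 is Binomial(n1 + n2, t2 / (t1 + t2)).
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("observation times must be positive")
    n = n1 + n2
    p = t2 / (t1 + t2)
    # Upper tail: probability of seeing at least n2 events in group 2.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n2, n + 1))


if __name__ == "__main__":
    # Hypothetical numbers: 3 failures in 100 machine-days vs 9 in 100.
    print(rate_test_p_value(3, 100.0, 9, 100.0))
```

A small p-value is evidence that the second group's failure rate is higher; the abstract's point is that this procedure can still behave well when the Poisson assumption is violated.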

Statistics Applications, Machine Learning

Annual Interruption Rate as a KPI, its measurement and comparison

45 - Rohit Pandey , Yingnong Dang , Ali Vira 2019

This article is divided into two chapters. The first chapter describes the failure rate as a KPI and studies its properties. The second one goes over ways to compare this KPI across two groups using the concepts of statistical hypothesis testing. In section 1., we will motivate the failure rate as a KPI (in Azure, it is dubbed 'Annual Interruption Rate' or AIR). In section 3, we will discuss measuring failure rate from logs machines typically generate. In section 1.2, we will discuss the problem of measuring it from real-world data. In section 2.1, we will discuss the general concepts of hypothesis testing. In section 2.2, we will go over some general count distributions for modeling Azure reboots. In section 2.3, we will go over some experiments on applying various hypothesis tests to simulated data. In section 2.4, we will discuss some applications of this work, like using these statistical methods to catch regressions in failure rate and how long we need to let changes to the system 'bake' before we are reasonably sure they didn't regress failure rate.

Distributed, Parallel, and Cluster Computing

Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

191 - Rohit Pandey , Anastasia Tkach , Shuoran Yang 2019

Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD camera. The key insight is to leverage previously seen calibration images of a given user to extrapolate what should be rendered in a novel viewpoint from the data available in the sensor. Given these past observations from multiple viewpoints, and the current RGBD image from a fixed view, we propose an end-to-end framework that fuses both these data sources to generate novel renderings of the performer. We demonstrate that the method can produce high fidelity images, and handle extreme changes in subject pose and camera viewpoints. We also show that the system generalizes to performers not seen in the training data. We run exhaustive experiments demonstrating the effectiveness of the proposed semi-parametric model (i.e. calibration images available to the neural network) compared to other state of the art machine learned solutions. Further, we compare the method with more traditional pipelines that employ multi-view capture. We show that our framework is able to achieve compelling results, with substantially less infrastructure than previously required.

Computer Vision and Pattern Recognition

LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

148 - Ricardo Martin-Brualla , Rohit Pandey , Shuoran Yang 2018

Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach to augment such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint, and jointly performs completion, super resolution, and denoising of the imagery in real-time. We call this approach neural (re-)rendering, and our live system LookinGood. Our deep architecture is trained to produce high resolution and high quality images from a coarse rendering in real-time. First, we propose a self-supervised training method that does not require manual ground-truth annotation. We contribute a specialized reconstruction error that uses semantic information to focus on relevant parts of the subject, e.g. the face. We also introduce a salient reweighing scheme of the loss function that is able to discard outliers. We specifically design the system for virtual and augmented reality headsets where the consistency between the left and right eye plays a crucial role in the final user experience. Finally, we generate temporally stable results by explicitly minimizing the difference between two consecutive frames. We tested the proposed system in two different scenarios: one involving a single RGB-D sensor, and upper body reconstruction of an actor, the second consisting of full body 360 degree capture. Through extensive experimentation, we demonstrate how our system generalizes across unseen sequences and subjects. The supplementary video is available at http://youtu.be/Md3tdAKoLGU.

Computer Vision and Pattern Recognition

Egocentric 6-DoF Tracking of Small Handheld Objects

112 - Rohit Pandey , Pavel Pidlypenskyi , Shuoran Yang 2018

Virtual and augmented reality technologies have seen significant growth in the past few years. A key component of such systems is the ability to track the pose of head mounted displays and controllers in 3D space. We tackle the problem of efficient 6-DoF tracking of a handheld controller from egocentric camera perspectives. We collected the HMD Controller dataset, which consists of over 540,000 stereo image pairs labelled with the full 6-DoF pose of the handheld controller. Our proposed SSD-AF-Stereo3D model achieves a mean average error of 33.5 millimeters in 3D keypoint prediction and is used in conjunction with an IMU sensor on the controller to enable 6-DoF tracking. We also present results on approaches for model based full 6-DoF tracking. All our models operate under the strict constraints of real time mobile CPU inference.

Computer Vision and Pattern Recognition

Real-time Egocentric Gesture Recognition on Mobile Head Mounted Displays

72 - Rohit Pandey , Marie White , Pavel Pidlypenskyi 2017

Mobile virtual reality (VR) head mounted displays (HMD) have become popular among consumers in recent years. In this work, we demonstrate real-time egocentric hand gesture detection and localization on mobile HMDs. Our main contributions are: 1) A novel mixed-reality data collection tool to automatically annotate bounding boxes and gesture labels; 2) The largest-to-date egocentric hand gesture and bounding box dataset with more than 400,000 annotated frames; 3) A neural network that runs real time on modern mobile CPUs, and achieves higher than 76% precision on gesture recognition across 8 classes.

Computer Vision and Pattern Recognition
