We propose 3DETR, an end-to-end Transformer based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks beyond detection, and can serve as a building block for future research.
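The Fourier positional embeddings mentioned above can be sketched in a few lines. The following is a minimal illustration only; the function name, frequency schedule, and number of bands are assumptions, not the released 3DETR code:

```python
import math
import torch

def fourier_pos_embed(xyz: torch.Tensor, num_bands: int = 8) -> torch.Tensor:
    """Map 3D coordinates to sin/cos Fourier features.

    xyz: (N, 3) point coordinates, assumed normalized to roughly [-1, 1].
    Returns (N, 3 * 2 * num_bands) positional features.
    """
    freqs = (2.0 ** torch.arange(num_bands)) * math.pi   # geometric frequency ladder
    angles = xyz.unsqueeze(-1) * freqs                   # (N, 3, num_bands)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)

points = torch.rand(1024, 3) * 2 - 1                     # toy point cloud
print(fourier_pos_embed(points).shape)                   # torch.Size([1024, 48])
```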
We consider adversarial machine learning based attacks on power allocation where the base station (BS) allocates its transmit power to multiple orthogonal subcarriers by using a deep neural network (DNN) to serve multiple user equipments (UEs). The DNN that corresponds to a regression model is trained with channel gains as the input and allocated transmit powers as the output. While the BS allocates the transmit power to the UEs to maximize rates for all UEs, there is an adversary that aims to minimize these rates. The adversary may be an external transmitter that aims to manipulate the inputs to the DNN by interfering with the pilot signals that are transmitted to measure the channel gain. Alternatively, the adversary may be a rogue UE that transmits fabricated channel estimates to the BS. In both cases, the adversary carefully crafts adversarial perturbations to manipulate the inputs to the DNN of the BS, subject to an upper bound on the strengths of these perturbations. We consider attacks targeted at a single UE or at all UEs. We compare these attacks with a benchmark in which the adversary simply scales down the input to the DNN. We show that adversarial attacks are much more effective than the benchmark attack in terms of reducing the rate of communications. We also show that adversarial attacks are robust to uncertainty at the adversary, including erroneous knowledge of channel gains and potential errors in exercising the attacks exactly as specified.
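As a rough sketch of how such a bounded perturbation could be crafted, the FGSM-style step below nudges the channel-gain input against the rate gradient under an L-infinity budget. The names model, sum_rate, and eps are placeholders, and this one-step attack is an illustrative stand-in, not the paper's exact construction:

```python
import torch

def craft_perturbation(model, gains, sum_rate, eps=0.01):
    """One-step bounded attack on the power-allocation DNN's input.

    model:    DNN regression, channel gains -> transmit powers
    gains:    (K,) measured channel gains (the DNN input)
    sum_rate: callable(powers, gains) -> total rate across UEs
    eps:      L-infinity bound on the perturbation strength
    """
    g = gains.clone().detach().requires_grad_(True)
    rate = sum_rate(model(g), g)
    rate.backward()
    delta = -eps * g.grad.sign()   # step that *decreases* the achieved rate
    return (gains + delta).detach()
```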
In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly-complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework -- a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
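A minimal sketch of this common form is below. The per-pair weighting scheme is where the three unified methods differ, so the uniform prior and the count-based weights here are illustrative assumptions only:

```python
import numpy as np

def weighted_transition(counts, prior, weight):
    """Weighted average of the empirical transition model and a prior.

    counts: (S, A, S) next-state visit counts from the batch data set
    prior:  (S,) prior distribution over next states (e.g., uniform)
    weight: (S, A) mixing weight in [0, 1]; sparsely visited pairs
            should receive larger weight, pulling them toward the prior.
    """
    totals = counts.sum(axis=-1, keepdims=True)
    mle = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / counts.shape[-1])
    w = weight[..., None]
    return (1 - w) * mle + w * prior  # prior broadcasts over (S, A, S)

S, A = 4, 2
counts = np.random.randint(0, 5, size=(S, A, S)).astype(float)
weight = 1.0 / (1.0 + counts.sum(axis=-1))   # less data -> more regularization
P = weighted_transition(counts, np.full(S, 1.0 / S), weight)
```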
Research in machine learning (ML) has primarily argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to study up ML datasets. This means accounting for historical inequities, labor conditions, and epistemological standpoints inscribed in data. We draw on HCI and CSCW work to support our argument, critically analyze previous research, and point at two co-existing lines of work within our community -- one bias-oriented, the other power-aware. This way, we highlight the need for dialogue and cooperation in three areas: data quality, data work, and data documentation. In the first area, we argue that reducing societal problems to bias misses the context-based nature of data. In the second one, we highlight the corporate forces and market imperatives involved in the labor of data workers that subsequently shape ML datasets. Finally, we propose expanding current transparency-oriented efforts in dataset documentation to reflect the social contexts of data design and production.
Offline reinforcement learning (RL) algorithms have shown promising results in domains where abundant pre-collected data is available. However, prior methods focus on solving individual problems from scratch with an offline dataset without considering how an offline RL agent can acquire multiple skills. We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks, and utilize all of this data to learn behaviors for all the tasks more effectively rather than training each one in isolation. However, sharing data across all tasks in multi-task offline RL performs surprisingly poorly in practice. Through thorough empirical analysis, we find that sharing data can actually exacerbate the distributional shift between the learned policy and the dataset, which in turn can lead to divergence of the learned policy and poor performance. To address this challenge, we develop a simple technique for data sharing in multi-task offline RL that routes data based on the improvement over the task-specific data. We call this approach conservative data sharing (CDS), and it can be applied with multiple single-task offline RL methods. On a range of challenging multi-task locomotion, navigation, and vision-based robotic manipulation problems, CDS achieves the best or comparable performance relative to prior offline multi-task RL methods and previous data sharing approaches.
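In simplified form, the routing rule can be pictured as a filter on relabeled transitions. The sketch below uses a percentile of conservative Q-values on task-specific data as the threshold; the dict-based transition format and the percentile cutoff are illustrative stand-ins, not the paper's exact criterion:

```python
import numpy as np

def cds_route(shared_batch, task_batch, q_conservative, pct=90):
    """Keep a relabeled transition for the target task only if its
    conservative Q-value improves over the task-specific data."""
    task_q = [q_conservative(t["obs"], t["act"]) for t in task_batch]
    threshold = np.percentile(task_q, pct)
    return [t for t in shared_batch
            if q_conservative(t["obs"], t["act"]) >= threshold]
```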
Personalized federated learning (FL) aims to train model(s) that can perform well for individual clients with highly heterogeneous data and system capabilities. Most work in personalized FL, however, assumes the same model architecture at all clients and increases the communication cost by sending/receiving models. This may not be feasible in realistic FL scenarios: in practice, clients have highly heterogeneous system capabilities and limited communication resources. In our work, we propose a personalized FL framework, PerFed-CKT, where clients can use heterogeneous model architectures and do not directly communicate their model parameters. PerFed-CKT uses clustered co-distillation, where clients use logits to transfer their knowledge to other clients that have similar data distributions. We theoretically show the convergence and generalization properties of PerFed-CKT and empirically show that PerFed-CKT achieves high test accuracy with several orders of magnitude lower communication cost compared to state-of-the-art personalized FL schemes.
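The clustered co-distillation step can be pictured as a standard logit-distillation loss against same-cluster peers. The averaging of peer logits, the temperature, and the mixing weight below are assumptions for illustration, not necessarily PerFed-CKT's exact objective:

```python
import torch
import torch.nn.functional as F

def co_distill_loss(local_logits, peer_logits, labels, alpha=0.5, T=2.0):
    """Cross-entropy on local data plus KL distillation toward the mean
    logits of same-cluster clients (similar data distributions)."""
    ce = F.cross_entropy(local_logits, labels)
    teacher = torch.stack(peer_logits).mean(dim=0)   # average peer logits
    kd = F.kl_div(F.log_softmax(local_logits / T, dim=-1),
                  F.softmax(teacher / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kd
```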
We report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal-organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabilities. We obtain over 2,000 solvent removal stability measures from text mining and 3,000 thermal decomposition temperatures from thermogravimetric analysis data. We assess the validity of our NLP methods and the accuracy of our extracted data by comparing to a hand-labeled subset. Machine learning (ML, i.e. artificial neural network) models trained on this data using graph- and pore-geometry-based representations enable prediction of stability on new MOFs with quantified uncertainty. Our web interface, MOFSimplify, provides users access to our curated data and enables them to harness that data for predictions on new MOFs. MOFSimplify also encourages community feedback on existing data and on ML model predictions for community-based active learning for improved MOF stability models.
Disentangling data into interpretable and independent factors is critical for controllable generation tasks. With the availability of labeled data, supervision can help enforce the separation of specific factors as expected. However, it is often expensive or even impossible to label every single factor to achieve fully-supervised disentanglement. In this paper, we adopt a general setting where all factors that are hard to label or identify are encapsulated as a single unknown factor. Under this setting, we propose a flexible weakly-supervised multi-factor disentanglement framework DisUnknown, which Distills Unknown factors for enabling multi-conditional generation regarding both labeled and unknown factors. Specifically, a two-stage training approach is adopted to first disentangle the unknown factor with an effective and robust training method, and then train the final generator with the proper disentanglement of all labeled factors utilizing the unknown distillation. To demonstrate the generalization capacity and scalability of our method, we evaluate it on multiple benchmark datasets qualitatively and quantitatively and further apply it to various real-world applications on complicated datasets.
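A compressed skeleton of the two-stage recipe follows. Every architecture and dimension here is a toy placeholder, and the adversarial terms that squeeze labeled-factor information out of the unknown code are omitted entirely:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc_unknown = nn.Linear(64, 16)      # stage 1: encoder for the unknown factor
generator   = nn.Linear(16 + 8, 64)  # stage 2: generator conditioned on all factors

x = torch.randn(32, 64)   # data samples
y = torch.randn(32, 8)    # codes for the labeled factors

# Stage 1: distill the unknown factor into a code u
u = enc_unknown(x)

# Stage 2: generate from labeled factors plus the (frozen) unknown code
x_hat = generator(torch.cat([u.detach(), y], dim=1))
loss = F.mse_loss(x_hat, x)
loss.backward()
```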
Typically, Open Information Extraction (OpenIE) focuses on extracting triples, representing a subject, a relation, and the object of the relation. However, most existing techniques are based on a predefined set of relations in each domain, which limits their applicability to newer domains where these relations may be unknown, such as financial documents. This paper presents a zero-shot open information extraction technique that extracts the entities (value) and their descriptions (key) from a sentence using an off-the-shelf machine reading comprehension (MRC) model. The input questions to this model are created using a novel noun phrase generation method. This method takes the context of the sentence into account and can create a wide variety of questions, making our technique domain independent. Given the questions and the sentence, our technique uses the MRC model to extract the entities (value). The noun phrase corresponding to the question with the highest confidence is taken as the description (key). This paper also introduces the EDGAR10-Q dataset, which is based on publicly available financial documents from corporations listed with the US Securities and Exchange Commission (SEC). The dataset consists of paragraphs, tagged values (entities), and their keys (descriptions), and is one of the largest entity extraction datasets. This dataset will be a valuable addition to the research community, especially in the financial domain. Finally, the paper demonstrates the efficacy of the proposed technique on the EDGAR10-Q and ADE corpus drug dosage datasets, where it obtained 86.84% and 97% accuracy, respectively.
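The value-extraction step can be reproduced with any off-the-shelf extractive QA model. The sketch below hand-writes one noun-phrase question (the paper generates them automatically from context) and uses a public SQuAD-style checkpoint as a stand-in; the example sentence is invented for illustration:

```python
from transformers import pipeline

mrc = pipeline("question-answering", model="deepset/roberta-base-squad2")

sentence = ("Revenue from the cloud segment increased to $4.1 billion "
            "for the quarter ended March 31.")

# Key candidate (noun phrase) -> question built from the sentence context
questions = {"revenue from the cloud segment":
             "What is the revenue from the cloud segment?"}

for key, q in questions.items():
    ans = mrc(question=q, context=sentence)
    print(f"{key} -> {ans['answer']} (score={ans['score']:.2f})")
```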
Extracting spatial-temporal knowledge from data is useful in many applications. It is important that the obtained knowledge is human-interpretable and amenable to formal analysis. In this paper, we propose a method that trains neural networks to learn spatial-temporal properties in the form of weighted graph-based signal temporal logic (wGSTL) formulas. For learning wGSTL formulas, we introduce a flexible wGSTL formula structure in which the user's preference can be applied in the inferred wGSTL formulas. In the proposed framework, each neuron of the neural networks corresponds to a subformula in a flexible wGSTL formula structure. We initially train a neural network to learn the wGSTL operators and then train a second neural network to learn the parameters in a flexible wGSTL formula structure. We use a COVID-19 dataset and a rain prediction dataset to evaluate the performance of the proposed framework and algorithms. We compare the performance of the proposed framework with three baseline classification methods: K-nearest neighbors, decision trees, and artificial neural networks. The classification accuracy obtained by the proposed framework is comparable with that of the baseline classification methods.
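Gradient-based learning of logic operators typically relies on smooth surrogates for min/max. The weighted soft-minimum below is one common choice, shown only as an illustration and not claimed to be the paper's exact wGSTL formulation:

```python
import torch

def weighted_soft_min(x, w, beta=10.0):
    """Differentiable weighted minimum over the last dimension.

    x: robustness values of subformulas; w: nonnegative importance weights.
    Larger beta -> closer to the hard weighted minimum.
    """
    s = torch.softmax(-beta * x, dim=-1) * w
    return (s * x).sum(-1) / s.sum(-1)

r = torch.tensor([[0.2, -0.5, 1.0]])
w = torch.tensor([1.0, 2.0, 0.5])
print(weighted_soft_min(r, w))   # close to the most-violated (weighted) term
```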