We present nbodykit, an open-source, massively parallel Python toolkit for analyzing large-scale structure (LSS) data. Using Python bindings of the Message Passing Interface (MPI), we provide parallel implementations of many commonly used algorithms in LSS. nbodykit is both an interactive and scalable piece of scientific software, performing well in a supercomputing environment while still taking advantage of the interactive tools provided by the Python ecosystem. Existing functionality includes estimators of the power spectrum, 2- and 3-point correlation functions, a Friends-of-Friends grouping algorithm, mock catalog creation via the halo occupation distribution technique, and approximate N-body simulations via the FastPM scheme. The package also provides a set of distributed data containers, insulated from the algorithms themselves, that enable nbodykit to provide a unified treatment of both simulation and observational data sets. nbodykit can be easily deployed in a high-performance computing environment, overcoming some of the traditional difficulties of using Python on supercomputers. We provide performance benchmarks illustrating the scalability of the software. The modular, component-based approach of nbodykit allows researchers to easily build complex applications using its tools. The package is extensively documented at http://nbodykit.readthedocs.io, which also includes an interactive set of example recipes for new users to explore. We hope that nbodykit, as open-source software, provides a common framework for the community to use and develop in confronting the analysis challenges of future LSS surveys.
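To make the Friends-of-Friends grouping mentioned above concrete, here is a minimal serial sketch in plain Python. It is not nbodykit's implementation or API: the function name `friends_of_friends`, its parameters, and the brute-force pair search are assumptions made for illustration only. It links every pair of points separated by at most a linking length and labels the resulting connected groups with a union-find structure.

```python
from itertools import combinations
from math import dist


def friends_of_friends(points, linking_length):
    """Return one group label per point (hypothetical helper, not nbodykit API).

    Two points are "friends" if they lie within linking_length of each other;
    groups are the connected components of the friendship graph.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(N^2) pair search; fine for a sketch, too slow for real catalogs.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as consecutive group ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
labels = friends_of_friends(points, linking_length=0.6)
```

In this example the first three points form a chain of friends and share one label, while the distant fourth point forms its own group. A production code would replace the pair loop with a spatial tree or grid search and distribute the work across processes.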
Existing works, including ELMo and BERT, have revealed the importance of pre-training for NLP tasks. Because no single pre-training model works best in all cases, a framework that can deploy various pre-training models efficiently is needed. For this purpose, we propose an assemble-on-demand pre-training toolkit, namely Universal Encoder Representations (UER). UER is loosely coupled and encapsulates a rich set of modules. By assembling modules on demand, users can either reproduce a state-of-the-art pre-training model or develop a pre-training model that remains unexplored. With UER, we have built a model zoo, which contains pre-trained models based on different corpora, encoders, and targets (objectives). With proper pre-trained models, we could achieve new state-of-the-art results on a range of downstream datasets.
Familia is an open-source toolkit for pragmatic topic modeling in industry. Familia abstracts the industrial uses of topic modeling into two paradigms: semantic representation and semantic matching. Efficient implementations of the two paradigms are made publicly available for the first time. Furthermore, we provide off-the-shelf topic models trained on large-scale industrial corpora, including Latent Dirichlet Allocation (LDA), SentenceLDA and Topical Word Embedding (TWE). We further describe typical applications that are successfully powered by topic modeling, in order to ease the confusion and difficulty that software engineers face when selecting and using topic models.
We present an open-source toolkit of flight-proven electronic devices which can be used to track, terminate and recover high-altitude balloon flights and payloads. Comprising a beacon, pyrotechnic and non-pyrotechnic cut-down devices plus associated software, the toolkit can be used to: (i) track the location of a flight via Iridium satellite communication; (ii) release lift and/or float balloons manually or at pre-defined altitudes; (iii) locate the payload after descent. The size and mass of the toolkit make it suitable for use on weather or sounding balloon flights. We describe the technology readiness level of the toolkit, based on over 20 successful flights to altitudes of typically 32,000 m.
Textual adversarial attacks have received wide and increasing attention in recent years. Various attack models have been proposed, which differ substantially from one another and are implemented with different programming frameworks and settings. These facts hinder quick use and proper comparison of attack models. In this paper, we present an open-source textual adversarial attack toolkit named OpenAttack. It currently includes 12 built-in typical attack models that cover all the attack types. Its highly inclusive modular design not only supports quick use of existing attack models, but also enables great flexibility and extensibility. OpenAttack has broad uses including comparing and evaluating attack models, measuring robustness of a victim model, assisting in developing new attack models, and adversarial training. Source code, built-in models and documentation can be obtained at https://github.com/thunlp/OpenAttack.
We present a highly frequency-multiplexed readout for large-format superconducting detector arrays intended for use in the next generation of balloon-borne and space-based sub-millimeter and far-infrared missions. We will demonstrate this technology on the upcoming NASA Next Generation Balloon-borne Large Aperture Sub-millimeter Telescope (BLAST-TNG) to measure the polarized emission of Galactic dust at wavelengths of 250, 350 and 500 microns. The BLAST-TNG receiver incorporates the first arrays of Lumped Element Kinetic Inductance Detectors (LeKID) along with the first microwave multiplexing readout electronics to fly in a space-like environment and will significantly advance the technology readiness level (TRL) for these technologies. After the flight of BLAST-TNG, we will continue to improve the performance of the detectors and readout electronics for the next generation of balloon-borne instruments and for use in a future FIR Surveyor.