Research papers, master's and doctoral theses about accelerated large-scale inference

An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models

785 - Association for Computational Linguistics 2021 Article

This work demonstrates the development process of a machine learning architecture for inference that can scale to a large volume of requests. We used a BERT model that was fine-tuned for emotion analysis, returning a probability distribution of emotions given a paragraph. The model was deployed as a gRPC service on Kubernetes. Apache Spark was used to perform inference in batches by calling the service. We encountered some performance and concurrency challenges and created solutions to reduce the running time. Starting with 200 successful inference requests per minute, we were able to achieve as high as 18 thousand successful requests per minute with the same batch job resource allocation. As a result, we successfully stored emotion probabilities for 95 million paragraphs within 96 hours.
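The batch-calling pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_emotions` is a hypothetical stand-in for the remote gRPC call, `EMOTIONS` is an invented label set, and `infer_partition` is a hypothetical helper of the kind a Spark job might run once per partition. The `hours_needed` helper only restates the throughput figures from the abstract as arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["joy", "sadness", "anger", "fear"]


def predict_emotions(paragraph: str) -> Dict[str, float]:
    """Stand-in for the remote call to the model service (hypothetical).

    A real client would send the paragraph to the service over gRPC.
    Here a uniform distribution is returned so the sketch runs offline.
    """
    p = 1.0 / len(EMOTIONS)
    return {label: p for label in EMOTIONS}


def infer_partition(paragraphs: List[str], max_workers: int = 8) -> List[Dict[str, float]]:
    """Score one partition of paragraphs with a bounded number of in-flight calls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results align with paragraphs.
        return list(pool.map(predict_emotions, paragraphs))


def hours_needed(total_requests: int, requests_per_minute: float) -> float:
    """Wall-clock hours to finish a job at a constant request rate."""
    return total_requests / requests_per_minute / 60.0


if __name__ == "__main__":
    scores = infer_partition(["First paragraph.", "Second paragraph."], max_workers=2)
    print(len(scores), "paragraphs scored")
    print(round(hours_needed(95_000_000, 200)), "hours at 200 requests per minute")
    print(round(hours_needed(95_000_000, 18_000)), "hours at 18,000 requests per minute")
```

At the initial rate of 200 requests per minute, 95 million paragraphs would take roughly 7,900 hours (about 330 days). At the peak rate of 18 thousand per minute the same job needs about 88 hours, which is consistent with the reported 96-hour run. Sustaining 96 hours overall corresponds to an average of about 16,500 requests per minute.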

Keywords: accelerated large-scale inference, architecture for accelerated