Memory-Efficient Differentiable Transformer Architecture Search

نشر في Yuekai Zhao بتاريخ 2021 في مجال الهندسة المعلوماتية والبحث باللغة English تحميل البحث

الملخص بالإنكليزية

Differentiable architecture search (DARTS) is successfully applied in many vision tasks. However, directly using DARTS for Transformers is memory-intensive, which renders the search process infeasible. To this end, we propose a multi-split reversible network and combine it with DARTS. Specifically, we devise a backpropagation-with-reconstruction algorithm so that we only need to store the last layers outputs. By relieving the memory burden for DARTS, it allows us to search with larger hidden size and more candidate operations. We evaluate the searched architecture on three sequence-to-sequence datasets, i.e., WMT14 English-German, WMT14 English-French, and WMT14 English-Czech. Experimental results show that our network consistently outperforms standard Transformers across the tasks. Moreover, our method compares favorably with big-size Evolved Transformers, reducing search computation by an order of magnitude.

تحميل البحث