Both image registration and label fusion in the multi-atlas segmentation (MAS) rely on the intensity similarity between target and atlas images. However, such similarity can be problematic when target and atlas images are acquired using different imaging protocols. High-level structure information can provide reliable similarity measurement for cross-modality images when cooperating with deep neural networks (DNNs). This work presents a new MAS framework for cross-modality images, where both image registration and label fusion are achieved by DNNs. For image registration, we propose a consistent registration network, which can jointly estimate forward and backward dense displacement fields (DDFs). Additionally, an invertible constraint is employed in the network to reduce the correspondence ambiguity of the estimated DDFs. For label fusion, we adapt a few-shot learning network to measure the similarity of atlas and target patches. Moreover, the network can be seamlessly integrated into the patch-based label fusion. The proposed framework is evaluated on the MM-WHS dataset of MICCAI 2017. Results show that the framework is effective in both cross-modality registration and segmentation.