In a typical fusion experiment, the plasma can be in one of several confinement modes. At the TCV tokamak, in addition to the Low (L) and High (H) confinement modes, a third mode, dithering (D), is frequently observed. Automatic detection of these modes is considered important for future tokamak operation. Previous work with deep learning methods, particularly convolutional recurrent neural networks (Conv-RNNs), indicates that they are a suitable approach. Nevertheless, such models are sensitive to noise in the temporal alignment of labels, and the Conv-RNN in particular is limited to making an individual decision at each time step, taking into account only its own hidden state and its current input. In this work, we propose a sequence-to-sequence neural network architecture with attention that solves both of these issues. Using a carefully calibrated dataset, we compare the performance of a Conv-RNN with that of our proposed sequence-to-sequence model and show two results: first, that the Conv-RNN can be improved upon with new data; second, that the sequence-to-sequence model can improve the results even further, achieving excellent scores on both training and test data.