Facial action unit (AU) detection has been actively studied, and has been the subject of competitions, owing to its wide range of applications. In this paper, we propose a novel framework for AU detection from a single input image that captures the co-occurrence and mutual exclusion (COMEX) relationships, as well as the intensity distribution, among AUs. Our algorithm uses facial landmarks to extract local AU features. These features are fed to a bidirectional long short-term memory (BiLSTM) layer to learn the intensity distribution. The resulting AU features are then passed successively through a self-attention encoding layer and a continuous-state modern Hopfield layer to learn the COMEX relationships. Our experiments on the challenging BP4D and DISFA benchmarks, conducted without any external data or pre-trained models, yield F1-scores of 63.7% and 61.8%, respectively, which shows that the proposed networks can improve performance on the AU detection task.
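
To make the role of a continuous-state modern Hopfield layer more concrete, the following is a minimal sketch of the generic retrieval update, in which a state vector is replaced by a softmax-weighted combination of stored patterns. It is not the layer used in this work: the function name `hopfield_retrieve`, the inverse temperature `beta`, and the toy four-dimensional patterns are all illustrative assumptions, and a trained layer would normally add learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(patterns, query, beta=1.0, steps=1):
    """Generic continuous-state Hopfield update (illustrative, hypothetical helper).

    patterns: list of stored pattern vectors (lists of floats, equal length)
    query:    initial state vector
    beta:     inverse temperature; larger values sharpen retrieval
    """
    state = list(query)
    for _ in range(steps):
        # Similarity of the current state to every stored pattern.
        scores = [beta * sum(p_d * s_d for p_d, s_d in zip(p, state))
                  for p in patterns]
        weights = softmax(scores)
        # New state: attention-weighted average of the stored patterns.
        state = [sum(w * p[d] for w, p in zip(weights, patterns))
                 for d in range(len(state))]
    return state


# Toy data (assumed, not from any dataset): two mutually exclusive
# activation patterns over four hypothetical AUs, encoded as +1 / -1.
stored = [[1.0, 1.0, -1.0, -1.0],
          [-1.0, -1.0, 1.0, 1.0]]
noisy = [1.0, 0.2, -0.5, -1.0]

retrieved = hopfield_retrieve(stored, noisy, beta=4.0)
print([round(v, 3) for v in retrieved])  # close to the first stored pattern
```

In this toy setting, the noisy query is pulled toward the stored pattern it most resembles, which loosely illustrates how such a layer could favor AU combinations that co-occur and suppress those that exclude each other.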