A novel non-orthogonal multiple access (NOMA) based cache-aided mobile edge computing (MEC) framework is proposed. To efficiently allocate communication and computation resources to users' computation task requests, we propose a long short-term memory (LSTM) network to predict the task popularity. Based on the predicted task popularity, a long-term reward maximization problem is formulated that jointly optimizes the task offloading decisions, computation resource allocation, and caching decisions. To tackle this challenging problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy. Furthermore, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for the task offloading decisions. More specifically, a BLA-based action selection scheme is proposed for the agents in MAQ-learning to select the optimal action in every state. We prove that the BLA-based action selection scheme is instantaneously self-correcting and that the selected action is optimal for each state. Extensive simulation results demonstrate that: 1) the prediction error of the proposed LSTM-based task popularity prediction decreases as the learning rate increases; 2) the proposed framework significantly outperforms benchmarks such as all-local computing, all-offloading computing, and non-caching computing; 3) the proposed BLA-based MAQ-learning achieves improved performance compared to conventional reinforcement learning algorithms.
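Since the abstract does not reproduce the algorithm itself, the following is a minimal illustrative Python sketch of how a BLA-style action selector could drive an agent's binary offloading decision in MAQ-learning: each action keeps a Beta posterior over its success probability, an action is chosen by sampling from the posteriors, and binary feedback updates the chosen action's belief. The class and variable names, the binary reward, and the two-action setup are assumptions for illustration, not the paper's exact formulation.

```python
import random

class BLAActionSelector:
    """Bayesian learning automaton: one Beta(alpha, beta) belief per action.

    Illustrative sketch only: each agent would keep one selector per state;
    the actions here model the binary offloading decision
    (0 = compute locally, 1 = offload to the MEC server).
    """

    def __init__(self, num_actions=2):
        # Beta(1, 1) is a uniform prior over each action's reward probability.
        self.alpha = [1.0] * num_actions
        self.beta = [1.0] * num_actions

    def select(self):
        # Thompson-style draw: sample a reward estimate for every action,
        # then act greedily on the samples.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        # Binary feedback: a success sharpens alpha, a failure sharpens beta,
        # so the posterior self-corrects toward the better action.
        if reward > 0:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0
```

Under these assumptions, such a selector would replace epsilon-greedy exploration in conventional Q-learning: as evidence accumulates, samples from the Beta posteriors concentrate on the action with the higher reward probability, which is the self-correcting behavior the abstract attributes to the BLA-based scheme.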