Grant-free non-orthogonal multiple access (GF-NOMA) is a potential technique to support massive Ultra-Reliable and Low-Latency Communication (mURLLC) service. However, dynamic resource configuration in GF-NOMA systems is challenging because the random traffic and the resulting collisions are unknown at the base station (BS). Moreover, jointly considering the latency and reliability requirements makes the resource configuration of GF-NOMA for mURLLC more complex. To address this problem, we develop a general learning framework for signature-based GF-NOMA in mURLLC service that takes into account multiple access signature collision, user equipment (UE) detection, and the data decoding procedures for the K-repetition GF and the Proactive GF schemes. The goal of our learning framework is to maximize the long-term average number of successfully served UEs under the latency constraint. We first perform real-time repetition value configuration based on a double deep Q-Network (DDQN), and then propose a Cooperative Multi-Agent learning technique based on the DQN (CMA-DQN) to jointly optimize the configuration of the repetition values and the contention-transmission unit (CTU) numbers. Our results show that, under the same latency constraint, the number of successfully served UEs achieved by the proposed learning framework is up to ten times that obtained with fixed repetition values and CTU numbers for the K-repetition scheme, and two times for the Proactive scheme. In addition, the superior performance of CMA-DQN over the conventional load estimation-based approach (LE-URC) demonstrates its capability for dynamic configuration over the long term. Importantly, our general learning framework can be used to optimize resource configuration in all signature-based GF-NOMA schemes.
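As a rough illustration of the double deep Q-Network idea mentioned above, the sketch below computes a single Double DQN learning target: the online network selects the next action and the target network evaluates it. It is a minimal sketch under stated assumptions, not the framework itself. The function name `double_dqn_target`, the candidate repetition values, the Q-value lists, and the use of the number of served UEs as the reward are all hypothetical choices made for illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool = False,
) -> float:
    """Double DQN target for one transition (hypothetical helper)."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection: index of the largest online-network estimate.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation: the target network scores the selected action.
    return reward + gamma * q_target_next[best_action]


# Assumed action set: each action index maps to a candidate repetition value.
REPETITION_VALUES = [1, 2, 4, 8]

# Made-up Q-value estimates for the next state, one per action.
q_online_next = [3.0, 5.0, 4.5, 2.0]
q_target_next = [2.5, 4.0, 6.0, 1.5]

# Assumed reward: number of UEs served in the current step.
target = double_dqn_target(
    reward=7.0,
    gamma=0.5,
    q_online_next=q_online_next,
    q_target_next=q_target_next,
)
print(target)  # 9.0
```

With these made-up numbers the online network prefers action index 1, so the target is 7.0 + 0.5 × 4.0 = 9.0. A standard DQN target would instead use the target network's own maximum (6.0) and give 10.0. Decoupling selection from evaluation in this way is what reduces overestimation in Double DQN.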