Future wireless access networks need to support the diverse quality of service (QoS) metrics required by various types of Internet-of-Things (IoT) devices, e.g., age of information (AoI) for status-generating sources and ultra-low latency for safety information in vehicular networks. In this paper, a novel inner-state driven random access (ISDA) framework is proposed based on distributed policy learning, in particular a cross-entropy method. Conventional random access schemes, e.g., $p$-CSMA, assume stateless terminals and therefore assign equal priority to all of them. In ISDA, the inner state of each terminal is described by a time-varying state vector, and the transmission probability of each terminal in the contention period is determined by its own inner state. Neural networks are leveraged to approximate the mappings from inner states to transmission probabilities, and an iterative approach is adopted to improve these mappings in a distributed manner. Experimental results show that, compared with conventional CSMA schemes, ISDA can simultaneously improve the QoS of heterogeneous terminals.
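To make the idea of a state-to-probability mapping concrete, the following is a minimal sketch and not the ISDA algorithm itself. It assumes a toy slotted channel in which a slot succeeds only when exactly one terminal transmits, uses each terminal's AoI as its only inner state, replaces the neural network with a hypothetical two-parameter sigmoid, and runs a centralized cross-entropy search over one policy shared by all terminals rather than the distributed learning described above. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def tx_probability(theta, state):
    """Hypothetical stand-in for the policy network: sigmoid(w * state + b)."""
    w, b = theta
    z = np.clip(w * state + b, -30.0, 30.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def average_aoi(theta, n_terminals=5, n_slots=300, seed=0):
    """Simulate the toy slotted channel and return the time-averaged mean AoI."""
    rng = np.random.default_rng(seed)
    aoi = np.ones(n_terminals)
    total = 0.0
    for _ in range(n_slots):
        # Each terminal decides independently from its own (scaled) AoI.
        p = tx_probability(theta, aoi / n_terminals)
        transmit = rng.random(n_terminals) < p
        aoi += 1.0
        if transmit.sum() == 1:      # success only when there is no collision
            aoi[transmit] = 1.0      # the successful terminal's AoI resets
        total += aoi.mean()
    return total / n_slots

def cross_entropy_search(n_iters=10, pop=20, n_elite=5, seed=0):
    """Cross-entropy method: sample policies, keep the elites, refit the sampler."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(2), np.full(2, 2.0)
    best_theta, best_cost = mean.copy(), np.inf
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(pop, 2))
        costs = np.array([average_aoi(s) for s in samples])
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:
            best_theta, best_cost = samples[order[0]].copy(), costs[order[0]]
        elite = samples[order[:n_elite]]
        # Refit the Gaussian sampling distribution to the elite samples.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return best_theta, best_cost

if __name__ == "__main__":
    theta, cost = cross_entropy_search()
    print("best (w, b):", theta, "average AoI:", cost)
    print("stateless p = 0.5 baseline:", average_aoi(np.array([0.0, 0.0])))
```

Setting `w = 0` makes the transmission probability independent of the state, which corresponds to the stateless behaviour attributed to $p$-CSMA above; a nonzero `w` lets terminals with different AoI contend with different probabilities.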