Advanced persistent threats (APTs) are organized prolonged cyberattacks by sophisticated attackers. Although APT activities are stealthy, they interact with the system components and these interactions lead to information flows. Dynamic Information Flow Tracking (DIFT) has been proposed as one of the effective ways to detect APTs using the information flows. However, wide range security analysis using DIFT results in a significant increase in performance overhead and high rates of false-positives and false-negatives generated by DIFT. In this paper, we model the strategic interaction between APT and DIFT as a non-cooperative stochastic game. The game unfolds on a state space constructed from an information flow graph (IFG) that is extracted from the system log. The objective of the APT in the game is to choose transitions in the IFG to find an optimal path in the IFG from an entry point of the attack to an attack target. On the other hand, the objective of DIFT is to dynamically select nodes in the IFG to perform security analysis for detecting APT. Our game model has imperfect information as the players do not have information about the actions of the opponent. We consider two scenarios of the game (i) when the false-positive and false-negative rates are known to both players and (ii) when the false-positive and false-negative rates are unknown to both players. Case (i) translates to a game model with complete information and we propose a value iteration-based algorithm and prove the convergence. Case (ii) translates to a game with unknown transition probabilities. In this case, we propose Hierarchical Supervised Learning (HSL) algorithm that integrates a neural network, to predict the value vector of the game, with a policy iteration algorithm to compute an approximate equilibrium. We implemented our algorithms on real attack datasets and validated the performance of our approach.