This paper presents a framework of imitating the price behavior of the underlying stock for reinforcement learning option price. We use accessible features of the equities pricing data to construct a non-deterministic Markov decision process for modeling stock price behavior driven by principal investors decision making. However, low signal-to-noise ratio and instability that appear immanent in equity markets pose challenges to determine the state transition (price change) after executing an action (principal investors decision) as well as decide an action based on current state (spot price). In order to conquer these challenges, we resort to a Bayesian deep neural network for computing the predictive distribution of the state transition led by an action. Additionally, instead of exploring a state-action relationship to formulate a policy, we seek for an episode based visible-hidden state-action relationship to probabilistically imitate principal investors successive decision making. Our algorithm then maps imitative principal investors decisions to simulated stock price paths by a Bayesian deep neural network. Eventually the optimal option price is reinforcement learned through maximizing the cumulative risk-adjusted return of a dynamically hedged portfolio over simulated price paths of the underlying.