Inverse treatment planning in radiation therapy is formulated as an optimization problem. The objective function and constraints consist of multiple terms designed for different clinical and practical considerations, and weighting factors for these terms are needed to define the problem. While a treatment planning system can solve the optimization problem for a given set of weights, adjusting the weights to achieve high plan quality is performed by a human planner. This weight tuning task is labor intensive and time consuming, and it critically affects the final plan quality. An automatic weight tuning approach is therefore strongly desired. The weight tuning procedure is essentially a decision-making problem. Motivated by the tremendous success of deep learning in decision making with human-level intelligence, we propose a novel framework to tune the weights in a human-like manner. Using treatment planning in high-dose-rate brachytherapy as an example, we develop a weight tuning policy network (WTPN) that observes the dose volume histograms of a plan and outputs an action to adjust organ weights, similar to the behavior of a human planner. We train the WTPN via end-to-end deep reinforcement learning, using experience replay with an epsilon-greedy algorithm. We then apply the trained WTPN to guide treatment planning for testing patient cases. The trained WTPN successfully learns the treatment planning goals and guides the weight tuning process. On average, the quality score of plans generated under the WTPN's guidance is improved by ~8.5% compared to the initial plans with arbitrary weights, and by 10.7% compared to the plans generated by human planners. To our knowledge, this is the first tool to adjust weights for treatment planning in a human-like fashion based on learnt intelligence. The study demonstrates the potential feasibility of developing an intelligent treatment planning system via deep reinforcement learning.
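To make the training ingredients named above more concrete, the following is a minimal Python sketch of epsilon-greedy action selection, a weight-adjusting action, and an experience replay buffer. It is an illustration under stated assumptions and not the implementation used in this work: the organ names, the adjustment factors, the flat action list, and all function and class names (`epsilon_greedy`, `apply_action`, `ReplayBuffer`) are hypothetical, and the network that would map dose volume histograms to action values is omitted.

```python
import random
from collections import deque

# Hypothetical action set (assumption): each action scales one organ weight.
ORGANS = ["bladder", "rectum", "sigmoid"]  # assumed organ names, illustration only
FACTORS = [1.5, 1.0, 0.5]                  # assumed multiplicative adjustments
ACTIONS = [(organ, factor) for organ in ORGANS for factor in FACTORS]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def apply_action(weights, action_index):
    """Return a new weight dict with one organ weight scaled by the chosen factor."""
    organ, factor = ACTIONS[action_index]
    updated = dict(weights)
    updated[organ] = weights[organ] * factor
    return updated


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at the number of stored transitions.
        return rng.sample(list(self.memory), min(batch_size, len(self.memory)))

    def __len__(self):
        return len(self.memory)
```

In a full training loop, the action values passed to `epsilon_greedy` would come from the policy network evaluated on the current plan's dose volume histograms, and mini-batches drawn with `sample` would be used to update that network. Both steps are left out here because the abstract does not specify them.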