Purpose: Several inverse planning algorithms have been developed for Gamma Knife (GK) radiosurgery to determine a large number of plan parameters by solving an optimization problem, which typically consists of multiple objectives. The priorities among these objectives need to be repetitively adjusted to achieve a clinically good plan for each patient. This study aimed to achieve automatic and intelligent priority tuning by developing a deep reinforcement learning (DRL) based method to model the tuning behaviors of human planners. Methods: We built a priority-tuning policy network using deep convolutional neural networks. Its input was a vector composed of the plan metrics used in our institution for GK plan evaluation. The network determines which tuning action to take based on the observed quality of the intermediate plan. We trained the network using an end-to-end DRL framework to approximate the optimal action-value function. A scoring function was designed to measure plan quality. Results: Vestibular schwannoma was chosen as the test bed in this study. The numbers of training, validation, and testing cases were 5, 5, and 16, respectively. For these three datasets, the average plan scores with initial priorities were 3.63 ± 1.34, 3.83 ± 0.86, and 4.20 ± 0.78, respectively, and could be improved to 5.28 ± 0.23, 4.97 ± 0.44, and 5.22 ± 0.26 through manual priority tuning by expert human planners. Our network achieved competitive results of 5.42 ± 0.11, 5.10 ± 0.42, and 5.28 ± 0.20, respectively. Conclusions: Our network can generate GK plans of comparable or slightly higher quality compared with the plans generated by human planners via manual priority tuning. The network can potentially be incorporated into the clinical workflow to improve GK planning efficiency.
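The core mechanism above, a network mapping plan metrics to a discrete priority-tuning action via an approximated action-value function, can be sketched minimally as follows. This is not the authors' implementation: the layer sizes, the metric vector, and the action set (raise or lower each objective's priority) are illustrative assumptions, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_METRICS = 6                  # e.g. coverage, selectivity, dose spillage, ... (assumed)
N_OBJECTIVES = 3               # number of tunable priorities (assumed)
N_ACTIONS = 2 * N_OBJECTIVES   # raise or lower each priority

# Two-layer network with random (untrained) weights, for illustration only.
W1 = rng.normal(size=(N_METRICS, 32))
W2 = rng.normal(size=(32, N_ACTIONS))

def q_values(plan_metrics):
    """Forward pass: plan-metric vector -> one Q-value per tuning action."""
    h = np.maximum(plan_metrics @ W1, 0.0)  # ReLU hidden layer
    return h @ W2

def select_action(plan_metrics, epsilon=0.1):
    """Epsilon-greedy selection as used during DRL training."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))        # explore
    return int(np.argmax(q_values(plan_metrics)))  # exploit

metrics = np.array([0.95, 0.80, 2.9, 12.0, 4.0, 45.0])  # toy metric vector
action = select_action(metrics, epsilon=0.0)
```

In training, the chosen action adjusts one objective's priority, the optimizer is rerun, and the resulting change in the plan score serves as the reward signal for updating the Q-network.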
With many variables to adjust, conventional manual forward planning for Gamma Knife (GK) radiosurgery is complicated and cumbersome. The resulting plan quality heavily depends on the planner's skill, experience, and devoted effort, and varies significantly among cases, planners, and institutions. Quality control for GK planning is desired to consistently provide a high-quality plan to each patient. In this study, we proposed a quality control method for GK planning by building a database of high-quality GK plans. Patient anatomy was described by target volume, target shape complexity, and the spatial relationship between the target and nearby organs, which together determine the difficulty level of GK planning. Plan quality was evaluated using target coverage, selectivity, intermediate dose spillage, maximum dose to 0.1 cc of the brainstem, mean dose to the ipsilateral cochlea, and beam-on time. When a new plan is created, the high-quality plan with the most similar target volume size and shape complexity is identified from the database. A model was also built to predict the dose to the brainstem and cochlea based on their overlap volume histograms. The identified reference plan and the predicted organ doses help planners make quality control decisions accordingly. To validate this method, we built a database for vestibular schwannoma cases, which are considered challenging for GK planning due to the irregularly shaped target and its proximity to the brainstem and cochlea. Five cases were tested, among which one was considered to be of high quality and four had lower plan quality than predicted. These four cases were replanned and substantially improved. Our results demonstrate the efficacy of the proposed quality control method. This method may also be used as a plan quality prediction method to facilitate the development of automatic treatment planning for GK radiosurgery.
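The retrieval step described above, finding the stored high-quality plan whose target volume and shape complexity best match a new case, can be sketched as a nearest-neighbor lookup. The feature normalization and distance metric here are assumptions for illustration, not the authors' exact implementation, and the database entries are toy values.

```python
import math

database = [
    # (case_id, target_volume_cc, shape_complexity) -- toy entries
    ("case_A", 1.2, 0.35),
    ("case_B", 3.8, 0.62),
    ("case_C", 0.7, 0.48),
]

def most_similar_plan(volume_cc, complexity, db=database):
    """Return the database case minimizing a normalized feature distance."""
    def dist(entry):
        _, v, c = entry
        # Volume difference normalized by the query volume (an assumption);
        # shape complexity is already on a comparable scale here.
        return math.hypot((v - volume_cc) / volume_cc, c - complexity)
    return min(db, key=dist)[0]

reference = most_similar_plan(volume_cc=1.0, complexity=0.40)
```

The retrieved reference plan's metric values then serve as achievable benchmarks against which the new plan can be judged.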
We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) was built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS). Despite its success, training the VTPN via DRL was time consuming. Moreover, the training time is expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study, we proposed a knowledge-guided DRL (KgDRL) approach that incorporates knowledge from human planners to guide the training process and improve training efficiency. Using prostate cancer intensity-modulated radiation therapy as a testbed, we first summarized a number of rules for operating our in-house TPS. During training, in addition to randomly navigating the state-action space, as in standard DRL with the epsilon-greedy algorithm, we also sampled actions defined by the rules. The priority of sampling actions from the rules decreased over the training process to encourage the VTPN to explore new policies not covered by the rules. We trained a VTPN using KgDRL and compared its performance with another VTPN trained using DRL. Both VTPNs spontaneously learned to operate the TPS to generate high-quality plans, achieving plan quality scores of 8.82 and 8.43, respectively. Both VTPNs outperformed treatment planning based purely on the rules, which achieved a plan score of 7.81. A VTPN trained with 8 episodes using KgDRL performed similarly to one trained using DRL with 100 episodes, and the training time was reduced from more than a week to 13 hours. The proposed KgDRL framework accelerated the training process by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.
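The knowledge-guided exploration step described above can be sketched as follows: during epsilon-greedy exploration, actions are drawn from the human-derived rules with a probability that decays over training, so the network gradually explores beyond the rules. The linear decay schedule and the stand-in rule function are illustrative assumptions.

```python
import random

N_ACTIONS = 8  # illustrative discrete action set

def rule_action(state):
    """Stand-in for a human planning rule mapping a state to an action."""
    return state % N_ACTIONS

def kg_explore(state, episode, n_episodes, rng=random):
    """Exploration step: rule-guided early in training, uniform random later."""
    p_rule = 1.0 - episode / n_episodes  # assumed linear decay of rule priority
    if rng.random() < p_rule:
        return rule_action(state)        # sample an action defined by the rules
    return rng.randrange(N_ACTIONS)      # fall back to random exploration
```

Early in training (`episode = 0`, `p_rule = 1.0`) every exploratory action comes from the rules; by the final episode the exploration is purely random, matching standard epsilon-greedy DRL.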
Inverse treatment planning in radiation therapy is formulated as an optimization problem. The objective function and constraints consist of multiple terms designed for different clinical and practical considerations. Weighting factors of these terms are needed to define the optimization problem. While a treatment planning system can solve the optimization problem with given weights, adjusting the weights to achieve high plan quality is performed by humans. The weight tuning task is labor intensive and time consuming, and it critically affects the final plan quality. An automatic weight-tuning approach is therefore strongly desired. The weight tuning procedure is essentially a decision-making problem. Motivated by the tremendous success of deep learning in decision making with human-level intelligence, we propose a novel framework to tune the weights in a human-like manner. Using treatment planning in high-dose-rate brachytherapy as an example, we develop a weight tuning policy network (WTPN) that observes the dose volume histograms of a plan and outputs an action to adjust organ weights, similar to the behavior of a human planner. We train the WTPN via end-to-end deep reinforcement learning; experience replay is performed with the epsilon-greedy algorithm. We then apply the trained WTPN to guide treatment planning of testing patient cases. The trained WTPN successfully learns the treatment planning goals needed to guide the weight tuning process. On average, the quality score of plans generated under the WTPN's guidance is improved by ~8.5% compared to the initial plans with arbitrary weights, and by 10.7% compared to the plans generated by human planners. To our knowledge, this is the first tool to adjust weights for treatment planning in a human-like fashion based on learned intelligence. The study demonstrates the feasibility of developing an intelligent treatment planning system via deep reinforcement learning.
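The experience-replay mechanism mentioned above stores transitions (state, action, reward, next state) in a fixed-size buffer and samples random minibatches for training, which decorrelates consecutive experiences. A minimal sketch, with illustrative capacity and batch size:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old transitions evicted automatically

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random minibatch without replacement for a training step."""
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):  # toy transitions standing in for weight-tuning steps
    buf.push(state=t, action=t % 4, reward=0.1 * t, next_state=t + 1)
batch = buf.sample(8)
```

In the WTPN setting, the state would be derived from the plan's dose volume histograms and the reward from the change in plan quality score after a weight adjustment.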
Due to the complexity and cumbersomeness of Gamma Knife (GK) manual forward planning, the quality of the resulting treatment plans heavily depends on the planner's skill, experience, and the amount of effort devoted to plan development. Hence, GK plan quality may vary significantly among institutions and planners, and even for the same planner across different cases. This is a particular concern for challenging cases with complicated geometry, such as vestibular schwannoma cases. The purpose of this retrospective study is to investigate the plan quality and its variation in the manually forward-planned, clinically acceptable GK treatment plans of 22 previous vestibular schwannoma cases. Considering the impacts of different patient geometries and different trade-offs among the planning objectives in GK planning, it is difficult to objectively assess plan quality across cases. To reduce these confounding factors in plan quality assessment, we employed our recently developed multiresolution-level inverse planning algorithm to generate a golden plan for each case, which is expected to be on or close to the Pareto surface with a trade-off similar to that used in the manual plan. The quality of each manual plan is then quantified in terms of its deviation from the golden plan. A scoring criterion ranging from 0 to 100 was designed to calculate a final score for each manual plan to simplify our analysis. Large quality variation was observed in these 22 cases: two cases scored lower than 75, three cases between 80 and 85, two cases between 85 and 90, eight cases between 90 and 95, and seven cases higher than 95. Inter- and intra-planner variability was also observed in our study. This large variation in GK manual planning deserves high attention and merits further investigation into how to reduce the variation in GK treatment plan quality.
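The idea of scoring a manual plan by its deviation from the golden plan can be sketched as below: each metric's relative deviation is penalized, and the weighted penalties are combined into a score on a 0 to 100 scale, where 100 means no deviation. The metric names, penalty weights, and linear penalty form are assumptions for illustration, not the published scoring criterion.

```python
def plan_score(manual, golden, weights):
    """Score in [0, 100]; 100 means the manual plan matches the golden plan."""
    penalty = 0.0
    for metric, w in weights.items():
        ref = golden[metric]
        # Relative deviation from the golden plan's value, weighted per metric.
        penalty += w * abs(manual[metric] - ref) / abs(ref)
    return max(0.0, 100.0 * (1.0 - penalty))

# Toy metric values and weights (assumed, not from the study).
golden = {"coverage": 0.98, "selectivity": 0.85, "beam_on_min": 60.0}
manual = {"coverage": 0.96, "selectivity": 0.80, "beam_on_min": 75.0}
weights = {"coverage": 0.5, "selectivity": 0.3, "beam_on_min": 0.2}
score = plan_score(manual, golden, weights)
```

Because the golden plan is generated with a similar trade-off to the manual plan, a low score isolates avoidable quality loss rather than differences in clinical intent.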
We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures with few demonstrations. Our model stacks multiple latent GP layers to learn abstract representations of the state feature space, which are linked to the demonstrations through the maximum entropy learning framework. Incorporating the IRL engine into the nonlinear latent structure renders existing deep GP inference approaches intractable. To tackle this, we develop a non-standard variational approximation framework that extends previous inference schemes. This allows for an approximate Bayesian treatment of the feature space and guards against overfitting. Carrying out representation and inverse reinforcement learning simultaneously within our model outperforms state-of-the-art approaches, as we demonstrate with experiments on standard benchmarks (object world, highway driving) and a new benchmark (binary world).
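The maximum entropy link between demonstrations and the learned reward can be illustrated with its classic gradient: the difference between the expert's and the learner's expected state features. This sketch uses a simple linear feature map where the paper's deep GP layers would provide a learned nonlinear representation; all quantities are toy illustrations, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

N_STATES, N_FEATURES = 10, 4
features = rng.random((N_STATES, N_FEATURES))  # per-state feature matrix (toy)

def feature_expectation(state_visits):
    """Expected feature vector under a state-visitation distribution."""
    return state_visits @ features

def maxent_gradient(expert_visits, learner_visits):
    """MaxEnt IRL gradient: expert minus learner feature expectations."""
    return feature_expectation(expert_visits) - feature_expectation(learner_visits)

theta = np.zeros(N_FEATURES)                    # linear reward weights
expert = np.full(N_STATES, 1.0 / N_STATES)      # toy expert visitation
learner = rng.dirichlet(np.ones(N_STATES))      # toy learner visitation
theta += 0.1 * maxent_gradient(expert, learner)  # one gradient ascent step
```

When the learner's visitation distribution matches the expert's, the gradient vanishes and the reward weights stop changing, which is the fixed point the maximum entropy framework drives toward.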