No Arabic abstract
We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) was built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS). Despite the success, the training of VTPN via DRL was time consuming. Also the training time is expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study we proposed a knowledge-guided DRL (KgDRL) that incorporated knowledge from human planners to guide the training process to improve the training efficiency. Using prostate cancer intensity modulated radiation therapy as a testbed, we first summarized a number of rules of operating our in-house TPS. In training, in addition to randomly navigating the state-action space, as in the DRL using the epsilon-greedy algorithm, we also sampled actions defined by the rules. The priority of sampling actions from rules decreased over the training process to encourage VTPN to explore new policy that was not covered by the rules. We trained a VTPN using KgDRL and compared its performance with another VTPN trained using DRL. Both VTPNs trained via KgDRL and DRL spontaneously learned to operate the TPS to generate high-quality plans, achieving plan quality scores of 8.82 and 8.43, respectively. Both VTPNs outperformed treatment planning purely based on the rules, which had a plan score of 7.81. VTPN trained with 8 episodes using KgDRL was able to perform similarly to that trained using DRL with 100 episodes. The training time was reduced from more than a week to 13 hours. The proposed KgDRL framework accelerated the training process by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.
Purpose: Several inverse planning algorithms have been developed for Gamma Knife (GK) radiosurgery to determine a large number of plan parameters via solving an optimization problem, which typically consists of multiple objectives. The priorities among these objectives need to be repetitively adjusted to achieve a clinically good plan for each patient. This study aimed to achieve automatic and intelligent priority-tuning, by developing a deep reinforcement learning (DRL) based method to model the tuning behaviors of human planners. Methods: We built a priority-tuning policy network using deep convolutional neural networks. Its input was a vector composed of the plan metrics that were used in our institution for GK plan evaluation. The network can determine which tuning action to take, based on the observed quality of the intermediate plan. We trained the network using an end-to-end DRL framework to approximate the optimal action-value function. A scoring function was designed to measure the plan quality. Results: Vestibular schwannoma was chosen as the test bed in this study. The number of training, validation and testing cases were 5, 5, and 16, respectively. For these three datasets, the average plan scores with initial priorities were 3.63 $pm$ 1.34, 3.83 $pm$ 0.86 and 4.20 $pm$ 0.78, respectively, while can be improved to 5.28 $pm$ 0.23, 4.97 $pm$ 0.44 and 5.22 $pm$ 0.26 through manual priority tuning by human expert planners. Our network achieved competitive results with 5.42 $pm$ 0.11, 5.10 $pm$ 0. 42, 5.28 $pm$ 0.20, respectively. Conclusions: Our network can generate GK plans of comparable or slightly higher quality comparing with the plans generated by human planners via manual priority tuning. The network can potentially be incorporated into the clinical workflow to improve GK planning efficiency.
Inverse treatment planning in radiation therapy is formulated as optimization problems. The objective function and constraints consist of multiple terms designed for different clinical and practical considerations. Weighting factors of these terms are needed to define the optimization problem. While a treatment planning system can solve the optimization problem with given weights, adjusting the weights for high plan quality is performed by human. The weight tuning task is labor intensive, time consuming, and it critically affects the final plan quality. An automatic weight-tuning approach is strongly desired. The weight tuning procedure is essentially a decision making problem. Motivated by the tremendous success in deep learning for decision making with human-level intelligence, we propose a novel framework to tune the weights in a human-like manner. Using treatment planning in high-dose-rate brachytherapy as an example, we develop a weight tuning policy network (WTPN) that observes dose volume histograms of a plan and outputs an action to adjust organ weights, similar to the behaviors of a human planner. We train the WTPN via end-to-end deep reinforcement learning. Experience replay is performed with the epsilon greedy algorithm. Then we apply the trained WTPN to guide treatment planning of testing patient cases. The trained WTPN successfully learns the treatment planning goals to guide the weight tuning process. On average, the quality score of plans generated under the WTPNs guidance is improved by ~8.5% compared to the initial plan with arbitrary weights, and by 10.7% compared to the plans generated by human planners. To our knowledge, this is the first tool to adjust weights for the treatment planning in a human-like fashion based on learnt intelligence. The study demonstrates potential feasibility to develop intelligent treatment planning system via deep reinforcement learning.
Purpose: A Monte Carlo (MC) beam model and its implementation in a clinical treatment planning system (TPS, Varian Eclipse) are presented for a modified ultra-high dose-rate electron FLASH radiotherapy (eFLASH-RT) LINAC. Methods: The gantry head without scattering foils or targets, representative of the LINAC modifications, was modelled in Geant4. The energy spectrum ({sigma}E) and beam source emittance cone angle ({theta}cone) were varied to match the calculated and Gafchromic film measured central-axis percent depth dose (PDD) and lateral profiles. Its Eclipse configuration was validated with measured profiles of the open field and nominal fields for clinical applicators. eFLASH-RT plans were MC forward calculated in Geant4 for a mouse brain treatment and compared to a conventional (Conv-RT) plan in Eclipse for a human patient with metastatic renal cell carcinoma. Results: The beam model and its Eclipse configuration agreed best with measurements at {sigma}E=0.5 MeV and {theta}cone=3.9+/-0.2 degrees to clinically acceptable accuracy (the absolute average error was within 1.5% for in-water lateral, 3% for in-air lateral, and 2% for PDD). The forward dose calculation showed dose was delivered to the entire mouse brain with adequate conformality. The human patient case demonstrated the planning capability with routine accessories in relatively complex geometry to achieve an acceptable plan (90% of the tumor volume receiving 95% and 90% of the prescribed dose for eFLASH and Conv-RT, respectively). Conclusion: To the best of our knowledge, this is the first functional beam model commissioned in a clinical TPS for eFLASH-RT, enabling planning and evaluation with minimal deviation from Conv-RT workflow. It facilitates the clinical translation as eFLASH-RT and Conv-RT plan quality were comparable for a human patient. The methods can be expanded to model other eFLASH irradiators.
Sepsis is a leading cause of mortality in intensive care units and costs hospitals billions annually. Treating a septic patient is highly challenging, because individual patients respond very differently to medical interventions and there is no universally agreed-upon treatment for sepsis. In this work, we propose an approach to deduce treatment policies for septic patients by using continuous state-space models and deep reinforcement learning. Our model learns clinically interpretable treatment policies, similar in important aspects to the treatment policies of physicians. The learned policies could be used to aid intensive care clinicians in medical decision making and improve the likelihood of patient survival.
The Radiotherapy treatment planning optimization process based on a quasi-Newton algorithm with an object function containing dose-volume constraints is not guaranteed to converge when the dose value in the dose-volume constraint is a critical value of the dose distribution. This is caused by finite differentiability of the dose-volume histogram at such values. A closer look near such values reveals that convergence is most likely not at stake, but it might be slowed down.