Importance: Lung cancer is the leading cause of cancer mortality in the US, responsible for more deaths than breast, prostate, colon, and pancreatic cancer combined. Low-dose computed tomography (CT) screening of the chest has recently been shown to significantly reduce this death rate. Objective: To compare the performance of a deep learning model with that of state-of-the-art automated algorithms and radiologists, and to assess the robustness of the algorithm on heterogeneous datasets. Design, Setting, and Participants: Three low-dose CT lung cancer screening datasets from heterogeneous sources were used: the National Lung Screening Trial (NLST, n=3410), Lahey Hospital and Medical Center (LHMC, n=3174), and the Kaggle competition (both stages, n=1595+505), together with University of Chicago data (UCM, a radiologist-annotated subset of NLST, n=197). Prior work on automated lung cancer malignancy estimation has used substantially smaller and less diverse datasets. In the first stage, our framework employs a nodule detector; in the second stage, both the image region around each nodule and nodule features are used as inputs to a neural network that estimates the malignancy risk for the entire CT scan. We trained our two-stage algorithm on part of the NLST dataset and validated it on the other datasets. Results, Conclusions, and Relevance: The proposed deep learning model (a) generalizes well across all three datasets, achieving an area under the receiver operating characteristic curve (AUC) between 86% and 94%; (b) outperforms the widely accepted PanCan Risk Model, achieving an 11% better AUC score; (c) improves on the state of the art represented by the winners of the Kaggle Data Science Bowl 2017 competition on lung cancer screening; and (d) performs comparably to radiologists in estimating cancer risk at the patient level.
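
Because the results are reported as AUC at the scan (patient) level, a minimal sketch of how such a score can be computed may help readers interpret the figures. The function below uses the standard pairwise (Mann-Whitney) formulation of AUC. It is an illustration only, not the evaluation code used in the study; the function name `auc` and the toy labels and scores are assumptions introduced here and are not data from any of the datasets above.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive scan receives a
    higher risk score than a randomly chosen negative scan; ties count half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive scan ranked above negative scan
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy, made-up values (NOT from the study): 1 = cancer, 0 = no cancer.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

In this toy example, one scan-level risk score per CT scan is compared against the ground-truth label, which mirrors the patient-level evaluation described above. The quadratic pairwise loop is chosen for clarity; a rank-based implementation would be preferable for datasets with thousands of scans.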