The prediction of solar flares, eruptions, and high-energy particle storms is of great societal importance. The data mining approach to forecasting has shown considerable promise, and benchmark datasets are a key element in the further development of data-driven forecasting. Once one or more benchmark datasets are established, judicious use of the data and careful selection of prediction algorithms are both key to developing a high-quality, robust method for predicting geo-effective solar activity. Here we briefly review the process of generating benchmark datasets and developing prediction algorithms.