K-means is a popular but computation-intensive algorithm for unsupervised learning. To address this issue, we present KPynq, a work-efficient triangle-inequality based K-means on FPGA for handling large-size, high-dimension datasets. KPynq leverages an algorithm-level optimization to balance the performance and computation irregularity, and a hardware architecture design to fully exploit the pipeline and parallel processing capability of various FPGAs. In the experiment, KPynq consistently outperforms the CPU-based standard K-means in terms of its speedup (up to 4.2x) and significant energy-efficiency (up to 218x).