Data-driven fault diagnosis methods often require abundant labeled examples for each fault type. In contrast, real-world data is often unlabeled and consists mostly of healthy observations, with only a few samples of faulty conditions. The lack of labels and fault samples poses a significant challenge for existing data-driven fault diagnosis methods. In this paper, we aim to overcome this limitation by integrating expert knowledge with domain adaptation in a synthetic-to-real framework for unsupervised fault diagnosis. Motivated by the fact that domain experts often have a relatively good understanding of how different fault types affect healthy signals, in the first step of the proposed framework, a synthetic fault dataset is generated by augmenting real vibration samples of healthy bearings. This synthetic dataset integrates expert knowledge and encodes class information about the fault types. However, models trained solely on the synthetic data often do not perform well because of the distribution difference between the synthetically generated and real faults. To overcome this domain gap between the synthetic and real data, in the second step of the proposed framework, an imbalance-robust domain adaptation~(DA) approach is proposed to adapt the model from synthetic faults~(source) to the unlabeled real faults~(target), which suffer from severe class imbalance. The framework is evaluated on two unsupervised fault diagnosis cases for bearings: the CWRU laboratory dataset and a real-world wind-turbine dataset. Experimental results demonstrate that the generated faults are effective for encoding fault type information and that the domain adaptation is robust against different levels of class imbalance between faults.
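
To make the first step more concrete, the following is a minimal sketch of how a labeled synthetic fault dataset might be built from healthy vibration segments. It is not the generation procedure of this paper: it assumes a common simplified model in which a localized bearing defect adds exponentially decaying resonance bursts that repeat at a fault-characteristic frequency. The function names, the sampling rate, the resonance and decay parameters, and the per-class frequencies are all hypothetical values chosen for illustration.

```python
import math
import random


def inject_synthetic_fault(healthy, fs, fault_freq, resonance_freq=3000.0,
                           decay=800.0, amplitude=1.0):
    """Return a copy of `healthy` with decaying bursts repeating at `fault_freq`.

    Assumed simplified impact model; all defaults are illustrative only.
    """
    n = len(healthy)
    period = int(round(fs / fault_freq))  # samples between successive impacts
    if period < 1:
        raise ValueError("fault_freq must not exceed the sampling rate fs")
    out = list(healthy)  # copy so the healthy segment is left untouched
    for start in range(0, n, period):
        for k in range(start, n):
            t = (k - start) / fs  # time since this impact, in seconds
            envelope = math.exp(-decay * t)
            if envelope < 1e-4:  # burst has effectively died out
                break
            out[k] += amplitude * envelope * math.sin(
                2.0 * math.pi * resonance_freq * t)
    return out


def make_synthetic_dataset(healthy_segments, fs, fault_freqs):
    """Pair every healthy segment with every fault class in `fault_freqs`.

    `fault_freqs` maps a class label to its characteristic frequency in Hz.
    """
    dataset = []
    for segment in healthy_segments:
        for label, freq in fault_freqs.items():
            dataset.append((inject_synthetic_fault(segment, fs, freq), label))
    return dataset


# Usage with a stand-in "healthy" segment (Gaussian noise, for illustration).
rng = random.Random(0)
fs = 12_000  # assumed sampling rate in Hz
healthy = [rng.gauss(0.0, 0.1) for _ in range(2048)]
# Hypothetical characteristic frequencies; real values depend on bearing
# geometry and shaft speed and would be supplied by a domain expert.
fault_freqs = {"inner_race": 162.0, "outer_race": 107.0, "ball": 141.0}
synthetic = make_synthetic_dataset([healthy], fs, fault_freqs)
```

In this sketch the class label comes from the expert-specified frequency rather than from any annotation of real faults, which is the sense in which a synthetic source dataset can encode fault type information while the real target data remains unlabeled.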