Videos are important data carriers, and their rapidly growing number often places many duplicate and near-duplicate videos among the top search results. Near-duplicate video retrieval (NDVR) can cluster and filter out this redundant content. The NDVR approach proposed in this paper extracts frame-level video representations from convolutional neural network (CNN) features taken from a fully-connected layer and from aggregated intermediate convolutional layers. Unsupervised metric learning is used for similarity measurement and feature matching. An efficient re-ranking algorithm based on k-nearest neighbors fuses the retrieval results from the two levels of features and further improves retrieval performance. Extensive experiments on the widely used CC_WEB_VIDEO dataset show that the proposed approach outperforms state-of-the-art methods.
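
To make the fusion and re-ranking idea concrete, the following is a minimal sketch of a generic k-nearest-neighbor re-ranking scheme, not the algorithm evaluated in this paper. It assumes each video has already been reduced to one vector per feature level, uses plain cosine similarity in place of the learned metric, and fuses the two levels by simple averaging. The function names (`cosine`, `top_k`, `knn_rerank`) and the parameters `k` and `weight` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(scores, k):
    """Indices of the k highest scores, best first; ties are broken by index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def fused_similarity(fc_a, conv_a, fc_b, conv_b):
    """Average the similarities of the two feature levels (an assumed fusion rule)."""
    return 0.5 * cosine(fc_a, fc_b) + 0.5 * cosine(conv_a, conv_b)


def knn_rerank(query_fc, query_conv, db_fc, db_conv, k=2, weight=0.5):
    """Re-rank database videos using the query's k nearest neighbors.

    Each candidate's final score mixes its direct similarity to the query
    with its mean similarity to the query's k nearest neighbors, so that
    candidates close to the query's neighborhood are promoted.
    """
    n = len(db_fc)
    # Initial ranking from the fused query-to-database similarity.
    initial = [
        fused_similarity(query_fc, query_conv, db_fc[i], db_conv[i])
        for i in range(n)
    ]
    neighbors = top_k(initial, k)

    reranked = []
    for i in range(n):
        # Mean similarity between candidate i and the query's neighbors.
        support = sum(
            fused_similarity(db_fc[i], db_conv[i], db_fc[j], db_conv[j])
            for j in neighbors
        ) / max(len(neighbors), 1)
        reranked.append((1.0 - weight) * initial[i] + weight * support)

    return top_k(reranked, n), reranked


if __name__ == "__main__":
    # Toy vectors: videos 0 and 1 are near-duplicates, video 2 is unrelated.
    db_fc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    db_conv = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    order, scores = knn_rerank([1.0, 0.0], [1.0, 0.0], db_fc, db_conv, k=2)
    print(order, scores)
```

In this sketch, `weight` controls how strongly the neighborhood evidence influences the final score; setting it to 0 recovers the initial fused ranking.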