Pixel-wise crack detection is challenging because cracks exhibit poor continuity and low contrast. Existing frameworks usually employ complex models that achieve good accuracy but low inference efficiency. In this paper, we present CarNet, a lightweight encoder-decoder architecture for efficient and high-quality crack detection. To this end, we first propose that the ideal encoder should follow an olive-type distribution in the number of convolutional layers across its stages. Specifically, after the model input is compressed in the initial network stage, the number of convolutional layers shows a downward trend as the encoder stages deepen. In the decoder, we introduce a lightweight up-sampling feature pyramid module to learn rich hierarchical features for crack detection. In particular, we compress the feature maps of the last three network stages to the same number of channels and then up-sample them by different factors so that they share the same resolution for information fusion. Finally, extensive experiments on four public datasets, i.e., Sun520, Rain365, BJN260, and Crack360, demonstrate that CarNet achieves a good trade-off between inference efficiency and test accuracy compared with existing state-of-the-art methods.
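To make the shape bookkeeping of the up-sampling feature pyramid module concrete, the following is a minimal, shape-level sketch in Python; it is not the CarNet implementation. The function name `plan_upsampling_fusion`, the stage shapes, the channel width of 32, the choice of the earliest of the last three stages as the target resolution, and the two fusion options (concatenation or summation) are all illustrative assumptions, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def plan_upsampling_fusion(
    stage_shapes: List[Shape],
    fused_channels: int = 32,   # hypothetical common channel width
    fuse: str = "concat",       # assumption: "concat" or "sum"
) -> Tuple[List[Dict], Shape]:
    """Compute, for the last three encoder stages, the compressed shape,
    the up-sampling factor, and the resulting fused feature-map shape."""
    if len(stage_shapes) < 3:
        raise ValueError("need at least three encoder stages")
    if fuse not in ("concat", "sum"):
        raise ValueError("fuse must be 'concat' or 'sum'")

    last_three = stage_shapes[-3:]
    # Assumption: fuse at the resolution of the earliest of the three stages.
    _, target_h, target_w = last_three[0]

    plan = []
    for channels, h, w in last_three:
        # Each stage must reach the target size by one integer factor.
        if target_h % h or target_w % w or target_h // h != target_w // w:
            raise ValueError(f"stage {(channels, h, w)} cannot be up-sampled evenly")
        factor = target_h // h
        plan.append({
            "in_shape": (channels, h, w),
            # Channel compression (e.g. a 1x1 convolution) keeps H and W.
            "compressed": (fused_channels, h, w),
            "factor": factor,
            "out_shape": (fused_channels, h * factor, w * factor),
        })

    out_channels = fused_channels * len(plan) if fuse == "concat" else fused_channels
    return plan, (out_channels, target_h, target_w)


# Hypothetical four-stage encoder whose resolution halves at every stage.
stages = [(16, 128, 128), (32, 64, 64), (64, 32, 32), (128, 16, 16)]
plan, fused_shape = plan_upsampling_fusion(stages)
for step in plan:
    print(step["in_shape"], "-> x%d ->" % step["factor"], step["out_shape"])
print("fused:", fused_shape)
```

With these assumed shapes, the three stages are up-sampled by factors of 1, 2, and 4 respectively, which is what "up-sampling by different factors" amounts to when each stage halves the spatial resolution.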