The linear part of transient-evoked (TE) otoacoustic emissions (OAEs) is thought to be generated via coherent reflection near the characteristic place of each constituent wave component. Because of the tonotopic organization of the cochlea, high-frequency emissions return earlier than low-frequency ones; however, due to the random nature of coherent reflection, the instantaneous frequency (IF) and amplitude envelope of TEOAEs both fluctuate. Multiple reflection components and synchronized spontaneous emissions can make it even more difficult to extract the IF with linear transforms. In this paper, we propose to model TEOAEs as a sum of intrinsic mode-type functions and to analyze them with a nonlinear-type time-frequency analysis technique called concentration of frequency and time (ConceFT). When tested on synthetic OAE signals with possibly multiple oscillatory components, the present method produces clearly visualized traces of the individual components on the time-frequency plane. Further, for noisy signals, the proposed method is compared with existing linear and bilinear methods in terms of its accuracy in estimating the fluctuating IF. Results suggest that ConceFT outperforms the best of these methods in terms of optimal transport distance, reducing the error by 10% to 21% when the signal-to-noise ratio is 10 dB or below.