Submodular function minimization (SFM) is a fundamental discrete optimization problem which generalizes many well known problems, has applications in various fields, and can be solved in polynomial time. Owing to applications in computer vision and machine learning, fast SFM algorithms are highly desirable. The current fastest algorithms [Lee, Sidford, Wong, FOCS 2015] run in $O(n^{2}\log nM\cdot\textrm{EO} + n^{3}\log^{O(1)}nM)$ time and $O(n^{3}\log^{2}n\cdot\textrm{EO} + n^{4}\log^{O(1)}n)$ time respectively, where $M$ is the largest absolute value of the function (assuming the range is integers) and $\textrm{EO}$ is the time taken to evaluate the function on any set. Although the best known lower bound on the query complexity is only $\Omega(n)$, the current shortest non-deterministic proof certifying the optimum value of a function requires $\Theta(n^{2})$ function evaluations.

The main contributions of this paper are subquadratic SFM algorithms. For integer-valued submodular functions, we give an SFM algorithm which runs in $O(nM^{3}\log n\cdot\textrm{EO})$ time, giving the first nearly linear time algorithm in any known regime. For real-valued submodular functions with range in $[-1,1]$, we give an algorithm which in $\tilde{O}(n^{5/3}\cdot\textrm{EO}/\varepsilon^{2})$ time returns an $\varepsilon$-additive approximate solution. At their core, our algorithms are projected stochastic subgradient descent methods on the Lovász extension of submodular functions, where we crucially exploit submodularity and data structures to obtain fast, i.e., sublinear-time, subgradient updates. The latter is crucial for beating the $n^{2}$ bound, as we show that algorithms which access only subgradients of the Lovász extension, which include the theoretically best algorithms mentioned above, must make $\Omega(n)$ subgradient calls (even for functions whose range is $\{-1,0,1\}$).
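As context for the approach described above, the sketch below illustrates the standard (non-stochastic) projected subgradient baseline on the Lovász extension over $[0,1]^{n}$: each subgradient is computed by the classical greedy rule (sort the coordinates, take marginal gains), which costs $\Theta(n)\cdot\textrm{EO}$ per iteration. This is only a minimal illustration of the starting point, not the paper's sublinear-update stochastic method; the function names, oracle interface (`f` applied to a `frozenset`), step size, and iteration count are all illustrative assumptions.

```python
import numpy as np

def lovasz_subgradient(f, x):
    """Greedy (Edmonds) subgradient of the Lovasz extension at x in [0,1]^n.

    f is an evaluation oracle taking a frozenset of indices; computing one
    subgradient this way costs n+1 oracle (EO) calls -- the baseline the
    paper improves upon via sublinear-time subgradient updates.
    """
    n = len(x)
    order = np.argsort(-x)                  # coordinates in decreasing order
    g = np.empty(n)
    S, prev = set(), f(frozenset())
    for i in order:
        S.add(int(i))
        cur = f(frozenset(S))
        g[i] = cur - prev                   # marginal value of element i
        prev = cur
    return g

def projected_subgradient_sfm(f, n, iters=500, step=0.05):
    """Projected subgradient descent on the Lovasz extension over the box
    [0,1]^n, followed by threshold rounding of the final point."""
    x = np.full(n, 0.5)
    for _ in range(iters):
        g = lovasz_subgradient(f, x)
        x = np.clip(x - step * g, 0.0, 1.0)  # Euclidean projection onto the box
    # Every superlevel set of x (and the empty set) is a candidate minimizer.
    candidates = [frozenset()] + [frozenset(np.flatnonzero(x >= t))
                                  for t in np.unique(x)]
    return min(candidates, key=f)
```

For example, calling `projected_subgradient_sfm` with `f` set to the cut function of a small graph (which is submodular) returns a minimum-cut-style set; the point of the paper's algorithms is to avoid the $\Theta(n)\cdot\textrm{EO}$ cost that `lovasz_subgradient` pays on every iteration.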