In this paper we introduce the algorithm and the fixed point hardware to calculate the normalized singular value decomposition of a non-symmetric matrices using Givens fast (approximate) rotations. This algorithm only uses the basic combinational logic modules such as adders, multiplexers, encoders, Barrel shifters (B-shifters), and comparators and does not use any lookup table. This method in fact combines the iterative properties of singular value decomposition method and CORDIC method in one single iteration. The introduced architecture is a systolic architecture that uses two different types of processors, diagonal and non-diagonal processors. The diagonal processor calculates, transmits and applies the horizontal and vertical rotations, while the non-diagonal processor uses a fully combinational architecture to receive, and apply the rotations. The diagonal processor uses priority encoders, Barrel shifters, and comparators to calculate the rotation angles. Both processors use a series of adders to apply the rotation angles. The design presented in this work provides $2.83sim649$ times better energy per matrix performance compared to the state of the art designs. This performance achieved without the employment of pipelining; a better performance advantage is expected to be achieved employing pipelining.