The upgraded CERN LHCb detector, due to start data taking in 2021, will have to reconstruct 4 TB/s of raw detector data in real time using commodity processors. This is one of the biggest real-time data processing challenges in any scientific domain. We present an intrinsically parallel reconstruction algorithm for the vertex detector of the LHCb experiment, designed to optimally exploit multi-core general-purpose architectures. We compare it to previous state-of-the-art scalar pattern recognition algorithms and show significantly faster processing and, in some cases, increased physics performance over all current alternatives. We evaluate the algorithm on two high-end architectures from two different vendors and discuss in detail the impact of different SIMD instruction set architecture extensions on performance.