Many virtual machines exist for sensor nodes with only a few KB RAM and tens to a few hundred KB flash memory. They pack an impressive set of features, but suffer from a slowdown of one to two orders of magnitude compared to optimised native code, reducing throughput and increasing power consumption. Compiling bytecode to native code to improve performance has been studied extensively for larger devices, but the restricted resources on sensor nodes mean most modern techniques cannot be applied. Simply replacing bytecode instructions with predefined sequences of native instructions is known to improve performance, but produces code several times larger than the optimised C equivalent, limiting the size of programmes that can fit onto a device. This paper identifies the major sources of overhead resulting from this basic approach, and presents optimisations to remove most of the remaining performance overhead, and over half the size overhead, reducing them to 69% and 91% respectively. While this increases the size of the VM, the break-even point at which this fixed cost is compensated for is well within the range of memory available on a sensor device, allowing us to both improve performance and load more code on a device.