Add SSSE3 implementation of block copy function, gives us ~30% kernel scanning...