performance test
copy size: 405504 [bytes], 16-byte aligned, repeated 1000 times
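The setup line does not say how the timing was done. Below is a minimal C++17 sketch of one way to reproduce it. It assumes "cycles" means 1000 back-to-back repetitions of the same copy and that each figure is the total wall-clock time. `time_copy`, `copy_memcpy` and the `CopyFn` signature are hypothetical names, and `std::chrono::steady_clock` stands in for whatever timer the original test used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical common signature for every copy variant under test.
using CopyFn = void (*)(void* dst, const void* src, std::size_t n);

constexpr std::size_t kCopySize = 405504;  // bytes, as in the original test
constexpr std::size_t kIterations = 1000;  // "cycles" read as repetitions (assumption)
constexpr std::size_t kAlign = 16;         // 16-byte alignment

// Wraps std::memcpy so it matches CopyFn.
void copy_memcpy(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Runs `fn` kIterations times over 16-byte-aligned buffers and returns the
// total elapsed time in milliseconds, or -1.0 if the copy was wrong.
double time_copy(CopyFn fn) {
    auto* src = static_cast<unsigned char*>(
        ::operator new(kCopySize, std::align_val_t{kAlign}));
    auto* dst = static_cast<unsigned char*>(
        ::operator new(kCopySize, std::align_val_t{kAlign}));
    assert(reinterpret_cast<std::uintptr_t>(src) % kAlign == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % kAlign == 0);

    // Fill the source and touch the destination before timing starts.
    for (std::size_t i = 0; i < kCopySize; ++i)
        src[i] = static_cast<unsigned char>(i);
    std::memset(dst, 0, kCopySize);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t it = 0; it < kIterations; ++it)
        fn(dst, src, kCopySize);
    const auto stop = std::chrono::steady_clock::now();

    const bool ok = std::memcmp(dst, src, kCopySize) == 0;
    ::operator delete(src, std::align_val_t{kAlign});
    ::operator delete(dst, std::align_val_t{kAlign});
    if (!ok) return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_copy(copy_memcpy)` returns the elapsed milliseconds. Because the same buffers are copied repeatedly, an optimizing compiler could in principle drop redundant iterations once it inlines the call. A more careful benchmark would prevent that, for example by reading the destination between iterations.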
method                    time [ms]
memcpy                          5.1
rep movsd                      10.6
FPU 8 bytes                    11.4
MMX movntq pre 16 bytes        10.2
MMX movntq 16 bytes            10.3
MMX movntq 8 bytes             10.4
SSE movntps pre 32 bytes       10.5
SSE movntps 16 bytes           10.7
SSE 16 bytes                    9.4
MMX 16 bytes                    6.5
MMX 8 bytes                    11.6
asm 8 bytes                    10.2
asm 4 bytes                    11.4
C++ 4 bytes                    41.9
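The post does not include the source for the individual variants. As an illustration only, here is how two of them might look. The first is a plain loop that copies one 32-bit word at a time, which is one possible reading of "C++ 4bytes". The second copies 16 bytes per iteration using `_mm_stream_ps`, the SSE intrinsic that compiles to `movntps`. The function names are hypothetical. The SSE version assumes 16-byte-aligned pointers and a size that is a multiple of 16, as in the test setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_stream_ps, _mm_sfence

// "C++ 4bytes" (hypothetical reading): copy one 32-bit word per iteration.
void copy_cpp_4bytes(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<std::uint32_t*>(dst);
    const auto* s = static_cast<const std::uint32_t*>(src);
    for (std::size_t i = 0; i < n / 4; ++i)
        d[i] = s[i];
}

// "SSE movntps 16bytes" (hypothetical reading): an aligned 16-byte load into
// an XMM register, then a non-temporal store that bypasses the cache.
// Requires 16-byte-aligned pointers and n % 16 == 0.
void copy_sse_stream16(void* dst, const void* src, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    assert(n % 16 == 0);
    auto* d = static_cast<float*>(dst);
    const auto* s = static_cast<const float*>(src);
    for (std::size_t i = 0; i < n / sizeof(float); i += 4) {
        __m128 v = _mm_load_ps(s + i);  // movaps: aligned 16-byte load
        _mm_stream_ps(d + i, v);        // movntps: non-temporal 16-byte store
    }
    _mm_sfence();  // make the weakly ordered streaming stores globally visible
}
```

`movntps` writes around the cache using weakly ordered stores, so the loop ends with `_mm_sfence`. Whether bypassing the cache pays off depends on the buffer size relative to the cache. These variants are therefore not guaranteed to beat a library `memcpy`.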
It turns out nothing beats memcpy as shipped with MSVC++ 2005...