128-bit Values: From XMM Registers to General-Purpose Registers
Answer:
You cannot move the upper bits of an XMM register into a general purpose register directly.
You'll have to follow a two-step process, which may or may not involve a roundtrip to memory or the destruction of a register.
in registers (SSE2)
movq    rax,xmm0     ;lower 64 bits
movhlps xmm0,xmm0    ;move high 64 bits to low 64 bits.
movq    rbx,xmm0     ;high 64 bits.
punpckhqdq xmm0,xmm0 is the SSE2 integer equivalent of movhlps xmm0,xmm0. Some CPUs may avoid a cycle or two of bypass latency if xmm0 was last written by an integer instruction, not FP.
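Spelled out, the integer-domain version of the in-register extraction would look like this (a minimal sketch implied by the note above, not quoted from the original answer):

movq       rax,xmm0      ;lower 64 bits
punpckhqdq xmm0,xmm0     ;move high 64 bits to low 64 bits (integer domain)
movq       rbx,xmm0      ;high 64 bits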
via memory (SSE2)
movdqu [mem],xmm0
mov    rax,[mem]
mov    rbx,[mem+8]
slow, but does not destroy the xmm register (SSE4.1)
movq   rax,xmm0
pextrq rbx,xmm0,1    ;3 cycle latency on Ryzen! (and 2 uops)
A hybrid strategy is possible, e.g. store to memory, use movd/movq to get the low element into eax/rax so it's ready quickly, then reload the higher elements. (Store-forwarding latency is not much worse than ALU latency, though.) That gives you a balance of uops for the different back-end execution units. Store/reload is especially good when you want lots of small elements: mov / movzx loads into 32-bit registers are cheap and have 2/clock throughput.
For 32 bits, the code is similar:
in registers
movd   eax,xmm0
psrldq xmm0,4        ;shift 4 bytes to the right
movd   ebx,xmm0
psrldq xmm0,4        ; pshufd could copy-and-shuffle the original reg
movd   ecx,xmm0      ; not destroying the XMM and maybe creating some ILP
psrldq xmm0,4
movd   edx,xmm0
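As the comments above suggest, pshufd can copy-and-shuffle instead, so xmm0 survives and the shuffles are independent of each other. A sketch of that variant (my instruction choice, assuming SSE2 and a free scratch register xmm1):

movd   eax,xmm0          ;element 0 from the low dword
pshufd xmm1,xmm0,0x55    ;broadcast element 1 into xmm1
movd   ebx,xmm1
pshufd xmm1,xmm0,0xAA    ;broadcast element 2
movd   ecx,xmm1
pshufd xmm1,xmm0,0xFF    ;broadcast element 3
movd   edx,xmm1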
via memory
movdqu [mem],xmm0
mov    eax,[mem]
mov    ebx,[mem+4]
mov    ecx,[mem+8]
mov    edx,[mem+12]
not destroying the xmm register (SSE4.1), slow like the psrldq / pshufd version
movd   eax,xmm0
pextrd ebx,xmm0,1    ;3 cycle latency on Skylake!
pextrd ecx,xmm0,2    ;also 2 uops: like a shuffle(port5) + movd(port0)
pextrd edx,xmm0,3
The 64-bit shift variant can run in 2 cycles. The pextrq version takes 4 minimum. For 32-bit, the numbers are 4 and 10 cycles, respectively.