Posts

Showing posts with the label Assembly

Assembly: Why Some X86 Opcodes Are Invalid In X64?

Answer : The 06 and 07 opcodes in 32-bit mode are the instructions PUSH ES and POP ES . In 64-bit mode, the segment registers CS, DS, ES, and SS are no longer used to determine memory addresses: the processor assumes a base address of 0 and no size limits. As there's now usually no reason for applications (other than the operating system itself) to access these registers, the push/pop opcodes for changing and accessing them were removed, leaving only mov to/from Sreg (which is just 2 total opcodes; the register number goes in the ModRM byte instead of part of the 1-byte opcode). That's totally sufficient for something that's almost never needed. The FS and GS segment registers can still set the base address in 64-bit mode, so push and pop opcodes related to them have not been removed. (These 2-byte 0F xx opcodes were added in 386, and are a less valuable part of opcode space than the old 1-byte opcodes for 8086 segment registers). Push/pop or mov of segment reg...
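To make the encoding difference concrete, here is a minimal sketch in Python (not an assembler). The helper names `legacy_push_pop_opcodes` and `mov_from_sreg` are hypothetical. It assumes the standard segment-register numbering (ES=0, CS=1, SS=2, DS=3, FS=4, GS=5), and shows how the old 1-byte push/pop opcodes embed the register number in the opcode itself, while `mov r/m, Sreg` (opcode 8C /r) carries it in the ModRM byte.

```python
# Segment-register numbers as used in x86 instruction encodings.
SREG = {"es": 0, "cs": 1, "ss": 2, "ds": 3, "fs": 4, "gs": 5}

def legacy_push_pop_opcodes(sreg):
    """1-byte PUSH/POP opcodes for ES, CS, SS, DS (not valid in 64-bit mode).

    Returns (push_opcode, pop_opcode); pop is None for CS, because
    byte 0F is the two-byte opcode escape rather than POP CS.
    """
    n = SREG[sreg]
    if n > 3:
        raise ValueError("FS/GS use 2-byte 0F xx opcodes instead")
    push = 0x06 | (n << 3)
    pop = None if sreg == "cs" else 0x07 | (n << 3)
    return push, pop

def mov_from_sreg(sreg, gpr_number=0):
    """Encode MOV reg, Sreg (opcode 8C /r) with a register destination."""
    # ModRM: mod=11 (register), reg field = Sreg number, rm field = GPR number
    modrm = 0xC0 | (SREG[sreg] << 3) | (gpr_number & 7)
    return bytes([0x8C, modrm])

print([hex(x) for x in legacy_push_pop_opcodes("es")])  # ['0x6', '0x7']
print(mov_from_sreg("es").hex())                        # 8cc0
```

One opcode byte covers every segment register in the `mov` form, which is why it costs so little opcode space compared with a dedicated push/pop pair per register.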

ASM X86_64 AVX: Xmm And Ymm Registers Differences

Answer : xmm0 is the low half of ymm0 , exactly like eax is the low half of rax . Writing to xmm0 (with a VEX-coded instruction, not legacy SSE) zeros the upper lane of ymm0 , just like writing to eax zeros the upper half of rax to avoid false dependencies. Lack of zeroing the upper bytes for legacy SSE instructions is why there's a penalty for mixing AVX and legacy SSE instructions. Most AVX instructions are available with either 128-bit or 256-bit size. e.g. vaddps xmm0, xmm1, xmm2 or vaddps ymm0, ymm1, ymm2 . (The 256-bit versions of most integer instructions are only available in AVX2, with AVX only providing the 128-bit version. There are a couple exceptions, like vptest ymm, ymm in AVX1. And vmovdqu if you count that as an "integer" instruction). Scalar instructions like vmovd , vcvtss2si , and vcvtsi2ss are only available with XMM registers. Reading a YMM register is not logically different from reading an XMM register, but writing the low ...
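The difference between VEX-coded and legacy-SSE writes can be modeled in a few lines. This is a rough sketch only: it treats a YMM register as a 256-bit Python integer, and the function names are made up for illustration.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1

def read_xmm(ymm):
    """xmm0 is simply the low 128 bits of ymm0."""
    return ymm & MASK128

def write_xmm_vex(ymm, value):
    """VEX-coded write to xmm: the upper 128 bits of ymm become zero."""
    return value & MASK128

def write_xmm_legacy_sse(ymm, value):
    """Legacy SSE write to xmm: the upper 128 bits of ymm are preserved."""
    upper = ymm & (MASK256 ^ MASK128)
    return upper | (value & MASK128)

ymm0 = (0xAAAA << 128) | 0x1111
print(hex(write_xmm_vex(ymm0, 0x2222)))         # 0x2222
print(hex(write_xmm_legacy_sse(ymm0, 0x2222)))  # upper 0xaaaa kept, low 0x2222
```

The legacy form has to merge with the old upper half, which is the dependency the VEX form avoids.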

128-bit Values - From XMM Registers To General Purpose

Answer : You cannot move the upper bits of an XMM register into a general purpose register directly. You'll have to follow a two-step process, which may or may not involve a roundtrip to memory or the destruction of a register.

In registers (SSE2):

    movq rax,xmm0       ;lower 64 bits
    movhlps xmm0,xmm0   ;move high 64 bits to low 64 bits.
    movq rbx,xmm0       ;high 64 bits.

punpckhqdq xmm0,xmm0 is the SSE2 integer equivalent of movhlps xmm0,xmm0 . Some CPUs may avoid a cycle or two of bypass latency if xmm0 was last written by an integer instruction, not FP.

Via memory (SSE2):

    movdqu [mem],xmm0
    mov rax,[mem]
    mov rbx,[mem+8]

Slow, but does not destroy xmm register (SSE4.1):

    movq rax,xmm0
    pextrq rbx,xmm0,1   ;3 cycle latency on Ryzen! (and 2 uops)

A hybrid strategy is possible, e.g. store to memory, movd/q e/rax,xmm0 so it's ready quickly, then reload the higher elements. (Store-forwarding latency is not much worse than ALU, though.) That gives you a bal...
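As a sanity check on what each instruction in the register-only sequence does to the data, here is a small Python model. It represents an XMM register as a 128-bit integer and is not real SIMD code; the function names mirror the mnemonics purely for illustration.

```python
MASK64 = (1 << 64) - 1

def movq_to_gpr(xmm):
    """movq rax, xmm0: copy the low 64 bits to a general purpose register."""
    return xmm & MASK64

def movhlps_self(xmm):
    """movhlps xmm0, xmm0: high qword is copied into the low qword."""
    high = (xmm >> 64) & MASK64
    return (high << 64) | high

def pextrq(xmm, index):
    """pextrq r64, xmm, imm8: extract qword 0 or 1 without modifying xmm."""
    return (xmm >> (64 * (index & 1))) & MASK64

def split_in_registers(xmm):
    low = movq_to_gpr(xmm)     # lower 64 bits
    xmm = movhlps_self(xmm)    # destroys the original low half
    high = movq_to_gpr(xmm)    # former upper 64 bits
    return low, high

x = (0x1122334455667788 << 64) | 0x99AABBCCDDEEFF00
low, high = split_in_registers(x)
print(hex(low), hex(high))  # 0x99aabbccddeeff00 0x1122334455667788
```

The `pextrq` model returns the same high half while leaving its input untouched, which is the trade-off the SSE4.1 variant offers.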

Assembly - Carry Flag VS Overflow Flag

Answer : Overflow occurs when the result of adding two positive numbers is negative, or the result of adding two negative numbers is positive. For instance, +127 + 1 = ?

    +127 = 0111 1111
      +1 = 0000 0001
           ---------
           1000 0000

Looking at the sign bits of the two operands and the sign bit of the result, we find that overflow occurred and the answer is incorrect.

In unsigned arithmetic, you have added 0xFB to 0x84 , i.e. 251 + 132, which indeed is larger than 8-bit, and so the carry flag is set. In the second case, you are adding +127 to 1, which indeed exceeds a signed 8-bit range, and so the overflow flag is set.
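The two cases can be reproduced with a small model of an 8-bit add. This is a sketch of the flag rules described above, with `add8` as a hypothetical helper name, not a description of any particular CPU's circuitry.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, carry_flag, overflow_flag)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    carry = total > 0xFF                      # carry out of bit 7 (unsigned wrap)
    sign_a, sign_b, sign_r = a & 0x80, b & 0x80, result & 0x80
    # Signed overflow: operands share a sign, but the result's sign differs.
    overflow = (sign_a == sign_b) and (sign_r != sign_a)
    return result, carry, overflow

result, carry, overflow = add8(0x7F, 0x01)
print(hex(result), carry, overflow)   # 0x80 False True

result, carry, _ = add8(0xFB, 0x84)
print(hex(result), carry)             # 0x7f True
```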

Carry Flag, Auxiliary Flag And Overflow Flag In Assembly

Answer : Carry Flag

The rules for turning on the carry flag in binary/integer math are two:

1. The carry flag is set if the addition of two numbers causes a carry out of the most significant (leftmost) bits added.
   1111 + 0001 = 0000 (carry flag is turned on)
2. The carry (borrow) flag is also set if the subtraction of two numbers requires a borrow into the most significant (leftmost) bits subtracted.
   0000 - 0001 = 1111 (carry flag is turned on)

Otherwise, the carry flag is turned off (zero).
   0111 + 0001 = 1000 (carry flag is turned off [zero])
   1000 - 0001 = 0111 (carry flag is turned off [zero])

In unsigned arithmetic, watch the carry flag to detect errors. In signed arithmetic, the carry flag tells you nothing interesting.

Overflow Flag

The rules for turning on the overflow flag in binary/integer math are two:

1. If the sum of two numbers with the sign bits off yields a result number with the sign bit on, the "overflow" flag is turned on.
   0100 + 0100 = 1000 ...
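The 4-bit carry and borrow examples above can be checked mechanically. The following Python sketch assumes a configurable word width (`bits`, defaulting to 4 to match the examples); the helper names are illustrative only.

```python
def add_carry(a, b, bits=4):
    """Return (result, carry_flag) for an unsigned add of the given width."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total > mask   # carry out of the top bit

def sub_borrow(a, b, bits=4):
    """Return (result, carry_flag) for a subtract; carry acts as borrow."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    return (a - b) & mask, a < b        # borrow needed when a < b (unsigned)

print(add_carry(0b1111, 0b0001))   # (0, True)
print(sub_borrow(0b0000, 0b0001))  # (15, True)   i.e. 1111
print(add_carry(0b0111, 0b0001))   # (8, False)   i.e. 1000
print(sub_borrow(0b1000, 0b0001))  # (7, False)   i.e. 0111
```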

Assembly Code Vs Machine Code Vs Object Code?

Answer : Machine code is binary (1's and 0's) code that can be executed directly by the CPU. If you open a machine code file in a text editor you would see garbage, including unprintable characters (no, not those unprintable characters ;) ). Object code is a portion of machine code not yet linked into a complete program. It's the machine code for one particular library or module that will make up the completed product. It may also contain placeholders or offsets not found in the machine code of a completed program. The linker will use these placeholders and offsets to connect everything together. Assembly code is plain-text and (somewhat) human-readable source code that mostly has a direct 1:1 analog with machine instructions. This is accomplished using mnemonics for the actual instructions, registers, or other resources. Examples include JMP and MULT for the CPU's jump and multiplication instructions. Unlike machine code, the CPU does not understand assemb...
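To illustrate the roughly 1:1 relationship between mnemonics and machine code, here is a deliberately tiny toy "assembler" in Python. It only knows four operand-less x86 instructions with well-known single-byte encodings. Real assemblers handle operands, labels, and relocation placeholders, none of which is attempted here.

```python
# Single-byte x86 instructions that take no operands.
OPCODES = {"nop": 0x90, "ret": 0xC3, "int3": 0xCC, "hlt": 0xF4}

def assemble(source):
    """Translate one mnemonic per line into machine-code bytes."""
    out = bytearray()
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip().lower()   # drop comments
        if line:
            out.append(OPCODES[line])
    return bytes(out)

def disassemble(code):
    """Map machine-code bytes back to mnemonics (toy subset only)."""
    names = {byte: name for name, byte in OPCODES.items()}
    return [names[b] for b in code]

code = assemble("nop ; padding\nret")
print(code.hex())          # 90c3
print(disassemble(code))   # ['nop', 'ret']
```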

Assembly: REP MOVS Mechanism

Answer : For questions about particular instructions always consult the instruction set reference. In this case, you will need to look up rep and movs . In short, rep repeats the following string operation ecx times. movs copies data from ds:esi to es:edi and increments or decrements the pointers based on the setting of the direction flag. As such, repeating it will move a range of memory to somewhere else. PS: usually the operation size is encoded as an instruction suffix, so people use movsb and movsd to indicate byte or dword operation. Some assemblers however allow specifying the size as in your example, by byte ptr or dword ptr . Also, the operands are implicit in the instruction, and you cannot modify them.

The short explanation about syntax: At the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form allows the source and the destination address of the memory to be ...
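The mechanism described above can be sketched as a loop. This Python model is a simplification: it uses a flat `bytearray` as memory, ignores segment bases and address wrap-around, and takes the registers as plain arguments. The `rep_movs` name and its signature are hypothetical.

```python
def rep_movs(mem, esi, edi, ecx, size=1, df=0):
    """Model of REP MOVS: copy ecx elements of `size` bytes from esi to edi.

    df=0 walks the pointers upward, df=1 downward (direction flag set).
    Returns the final (esi, edi, ecx), as the CPU would leave them.
    """
    step = -size if df else size
    while ecx != 0:
        mem[edi:edi + size] = mem[esi:esi + size]   # one MOVS step
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx

mem = bytearray(b"hello___")
print(rep_movs(mem, esi=0, edi=5, ecx=3))  # (3, 8, 0)
print(mem)                                 # bytearray(b'hellohel')
```

With `size=1` this corresponds to `rep movsb`, and with `size=4` to `rep movsd`.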

Assembly JLE Jmp Instruction Example

Answer : The jump itself checks the flags in the EFLAGS register. These are usually set with TEST or CMP (or as a side effect of many other instructions).

    CMP ebx,10
    JLE there

CMP corresponds to calculating the difference of the operands, updating the flags and discarding the result. Typically used for greater/smaller checks. TEST corresponds to calculating the binary AND of the operands, updating the flags and discarding the result. Typically used to check whether a value is zero or whether particular bits are set. See also: The art of assembly language on CMP.

As a sidenote: You should get the Intel reference manuals. In particular the two-part "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2: Instruction Set Reference" which describes all x86 instructions.

The JLE instruction conducts two tests: Sign Flag ( SF ) != Overflow Flag ( OF ), and Zero Flag ( ZF ) == 1. If the Zero Flag is 1, or the Sign Flag and Overflow Flag are not equal, the jump is taken. Maybe...
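The JLE condition (jump if ZF = 1 or SF != OF) can be checked against ordinary signed comparison with a short Python model. The sketch assumes 32-bit operands and computes only the three flags JLE looks at; `cmp_flags` and `jle_taken` are illustrative names.

```python
def cmp_flags(a, b, bits=32):
    """Model CMP a, b: compute a - b, return (ZF, SF, OF), discard the result."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    zf = result == 0
    sf = bool(result & sign)
    # Subtraction overflows when the operands have different signs and
    # the result's sign differs from the first operand's sign.
    of = bool((a ^ b) & (a ^ result) & sign)
    return zf, sf, of

def jle_taken(a, b):
    """True if JLE would jump after CMP a, b (signed a <= b)."""
    zf, sf, of = cmp_flags(a, b)
    return zf or (sf != of)

print(jle_taken(5, 10))    # True
print(jle_taken(10, 10))   # True
print(jle_taken(11, 10))   # False
print(jle_taken(-1, 10))   # True  (0xFFFFFFFF is -1 when treated as signed)
```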