VPERMB
Permute Packed Bytes Elements
| Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description | 
|---|---|---|---|---|
| EVEX.128.66.0F38.W0 8D /r VPERMB xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in xmm3/m128 using byte indexes in xmm2 and store the result in xmm1 using writemask k1. | 
| EVEX.256.66.0F38.W0 8D /r VPERMB ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in ymm3/m256 using byte indexes in ymm2 and store the result in ymm1 using writemask k1. | 
| EVEX.512.66.0F38.W0 8D /r VPERMB zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512_VBMI | Permute bytes in zmm3/m512 using byte indexes in zmm2 and store the result in zmm1 using writemask k1. | 
Instruction Operand Encoding
| Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4 | 
|---|---|---|---|---|---|
| A | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | N/A | 
Description
Copies bytes from the second source operand (the third operand) to the destination operand (the first operand) according to the byte indices in the first source operand (the second operand). Note that this instruction permits a byte in the source operand to be copied to more than one location in the destination operand.
Only the low 6(EVEX.512)/5(EVEX.256)/4(EVEX.128) bits of each byte index is used to select the location of the source byte from the second source operand.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register updated at byte granularity by the writemask k1.
Operation
VPERMB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
IF VL = 128:
    n := 3;
ELSE IF VL = 256:
    n := 4;
ELSE IF VL = 512:
    n := 5;
FI;
FOR j := 0 TO KL-1:
    id := SRC1[j*8 + n : j*8] ; // location of the source byte
    IF k1[j] OR *no writemask* THEN
        DEST[j*8 + 7: j*8] := SRC2[id*8 +7: id*8];
    ELSE IF zeroing-masking THEN
        DEST[j*8 + 7: j*8] := 0;
    *ELSE
        DEST[j*8 + 7: j*8] remains unchanged*
    FI
ENDFOR
DEST[MAX_VL-1:VL] := 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPERMB __m512i _mm512_permutexvar_epi8( __m512i idx, __m512i a);
VPERMB __m512i _mm512_mask_permutexvar_epi8(__m512i s, __mmask64 k, __m512i idx, __m512i a);
VPERMB __m512i _mm512_maskz_permutexvar_epi8( __mmask64 k, __m512i idx, __m512i a);
VPERMB __m256i _mm256_permutexvar_epi8( __m256i idx, __m256i a);
VPERMB __m256i _mm256_mask_permutexvar_epi8(__m256i s, __mmask32 k, __m256i idx, __m256i a);
VPERMB __m256i _mm256_maskz_permutexvar_epi8( __mmask32 k, __m256i idx, __m256i a);
VPERMB __m128i _mm_permutexvar_epi8( __m128i idx, __m128i a);
VPERMB __m128i _mm_mask_permutexvar_epi8(__m128i s, __mmask16 k, __m128i idx, __m128i a);
VPERMB __m128i _mm_maskz_permutexvar_epi8( __mmask16 k, __m128i idx, __m128i a);
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type E4NF.nb in Table 2-50, “Type E4NF Class Exception Conditions.”
This UNOFFICIAL, mechanically-separated, non-verified reference is provided for convenience, but it may be incomplete or broken in various obvious or non-obvious ways. Refer to Intel® 64 and IA-32 Architectures Software Developer’s Manual for anything serious.