VRSQRTSH
Compute Approximate Reciprocal of Square Root of Scalar FP
| Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
|---|---|---|---|---|
| EVEX.LLIG.66.MAP6.W0 4F /r VRSQRTSH xmm1{k1}{z}, xmm2, xmm3/m16 | A | V/V | AVX512-FP16 | Compute the approximate reciprocal square root of the FP16 value in xmm3/m16 and store the result in the low word element of xmm1 subject to writemask k1. Bits 127:16 of xmm2 are copied to xmm1[127:16]. | 
Instruction Operand Encoding
| Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 | Operand 4 | 
|---|---|---|---|---|---|
| A | Scalar | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | N/A | 
Description
This instruction computes the approximate reciprocal square root of the low FP16 value in the second source operand (the third operand) and stores the result in the low word element of the destination operand (the first operand), subject to the writemask k1.
The maximum relative error for this approximation is less than 2^-11 + 2^-14.
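The stated bound can be checked in software. The following is a minimal sketch, not part of the instruction definition: the macro `VRSQRTSH_MAX_REL_ERR` and the helper `within_rsqrt_bound` are hypothetical names introduced here for illustration. It assumes a positive input `x` and a candidate result `y` that have already been widened to `double`, and it uses the identity |y·sqrt(x) − 1| < e ⇔ (1 − e)² < y²·x < (1 + e)² for y > 0, which avoids calling `sqrt`.

```c
#include <assert.h>

/* Documented bound: relative error < 2^-11 + 2^-14.
   (Hypothetical macro name, introduced for this sketch.) */
#define VRSQRTSH_MAX_REL_ERR (1.0 / 2048.0 + 1.0 / 16384.0)

/* Returns 1 if y approximates 1/sqrt(x) with relative error strictly
   below the documented bound, 0 otherwise.
   Hypothetical helper; x must be positive and finite. */
static int within_rsqrt_bound(double x, double y)
{
    assert(x > 0.0);
    if (y <= 0.0)
        return 0;                       /* 1/sqrt(x) is positive for x > 0 */

    const double e = VRSQRTSH_MAX_REL_ERR;
    const double t = y * y * x;         /* equals (y * sqrt(x))^2 */

    /* |y*sqrt(x) - 1| < e  <=>  (1-e)^2 < t < (1+e)^2 when y > 0 */
    return t > (1.0 - e) * (1.0 - e) && t < (1.0 + e) * (1.0 + e);
}
```

For example, with `x = 4.0` the exact result is 0.5; a candidate of 0.5002 passes the check, while 0.501 (relative error 2^-9) does not.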
Bits 127:16 of the destination operand are copied from the corresponding bits of the first source operand. Bits MAXVL−1:128 of the destination operand are zeroed.
For special cases, see Table 5-38.
Operation
VRSQRTSH dest{k1}, src1, src2
VL = 128, 256 or 512
KL := VL/16
IF k1[0] or *no writemask*:
    DEST.fp16[0] := APPROXIMATE(1.0 / SQRT(src2.fp16[0]))
ELSE IF *zeroing*:
    DEST.fp16[0] := 0
//else DEST.fp16[0] remains unchanged
DEST[127:16] := src1[127:16]
DEST[MAXVL-1:128] := 0
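The masking and merge behavior in the pseudocode above can be modeled in portable C. This is a sketch under stated assumptions, not Intel's definition: the type `xmm_words` and the functions `xmm_fill` and `vrsqrtsh_model` are hypothetical names, only the low 128 bits are modeled (bits MAXVL-1:128 are simply absent), and the approximation itself is not computed. The caller passes the FP16 bit pattern of the approximate result as `approx_lo`. Pass `k0 = 1` to model the unmasked form.

```c
#include <stdint.h>

/* Hypothetical container: the low 128 bits of a vector register,
   viewed as eight 16-bit words (FP16 bit patterns). */
typedef struct { uint16_t w[8]; } xmm_words;

/* Hypothetical helper: build a register with every word set to v. */
static xmm_words xmm_fill(uint16_t v)
{
    xmm_words r;
    for (int i = 0; i < 8; i++)
        r.w[i] = v;
    return r;
}

/* Model of the destination update only.
   dest      : previous destination contents (used for merge masking)
   src1      : first source; supplies bits 127:16 of the result
   approx_lo : FP16 bit pattern of APPROXIMATE(1.0 / SQRT(src2.fp16[0])),
               computed elsewhere (not modeled here)
   k0        : writemask bit 0 (use 1 for "no writemask")
   zeroing   : nonzero for zeroing masking ({z}), 0 for merge masking */
static xmm_words vrsqrtsh_model(xmm_words dest, xmm_words src1,
                                uint16_t approx_lo, int k0, int zeroing)
{
    xmm_words out;

    if (k0)
        out.w[0] = approx_lo;   /* element is written */
    else if (zeroing)
        out.w[0] = 0;           /* masked off, zeroing */
    else
        out.w[0] = dest.w[0];   /* masked off, merge: unchanged */

    /* DEST[127:16] := src1[127:16], regardless of the mask */
    for (int i = 1; i < 8; i++)
        out.w[i] = src1.w[i];

    return out;
}
```

Note that the upper seven words always come from `src1`, even when the mask bit is clear; only the low word is subject to the writemask.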
Intel C/C++ Compiler Intrinsic Equivalent
VRSQRTSH __m128h _mm_mask_rsqrt_sh (__m128h src, __mmask8 k, __m128h a, __m128h b);
VRSQRTSH __m128h _mm_maskz_rsqrt_sh (__mmask8 k, __m128h a, __m128h b);
VRSQRTSH __m128h _mm_rsqrt_sh (__m128h a, __m128h b);
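A short usage sketch of the three intrinsics follows. It assumes a compiler and CPU with AVX512-FP16 support (for GCC and Clang, typically enabled with `-mavx512fp16`); when the `__AVX512FP16__` macro is not defined, the function compiles to a stub that returns -1. The function name `rsqrt_sh_demo` and the test values are illustrative only. The check on the low element uses a loose tolerance around 0.5 rather than an exact comparison, since the result is an approximation.

```c
#include <string.h>
#if defined(__AVX512FP16__)
#include <immintrin.h>
#endif

/* Hypothetical demo function.
   Returns 1 if all checks pass, 0 if any check fails,
   and -1 if the build does not target AVX512-FP16. */
int rsqrt_sh_demo(void)
{
#if defined(__AVX512FP16__)
    const __m128h a   = _mm_set1_ph((_Float16)7.0f);  /* supplies elements 1..7 */
    const __m128h b   = _mm_set_sh((_Float16)4.0f);   /* low element is the input */
    const __m128h src = _mm_set1_ph((_Float16)9.0f);  /* merge-masking source */

    const __m128h vr = _mm_rsqrt_sh(a, b);             /* unmasked */
    const __m128h vm = _mm_mask_rsqrt_sh(src, 0, a, b);/* mask bit 0 clear: merge */
    const __m128h vz = _mm_maskz_rsqrt_sh(0, a, b);    /* mask bit 0 clear: zero */

    _Float16 r[8], m[8], z[8];
    memcpy(r, &vr, sizeof r);
    memcpy(m, &vm, sizeof m);
    memcpy(z, &vz, sizeof z);

    /* Unmasked: low element is approximately 1/sqrt(4) = 0.5. */
    int ok = (float)r[0] > 0.499f && (float)r[0] < 0.501f;
    /* Merge masking keeps the low element of src; zeroing clears it. */
    ok = ok && (float)m[0] == 9.0f && (float)z[0] == 0.0f;
    /* Element 1 always comes from a. */
    ok = ok && (float)r[1] == 7.0f && (float)m[1] == 7.0f && (float)z[1] == 7.0f;
    return ok ? 1 : 0;
#else
    return -1;  /* not built with AVX512-FP16 support */
#endif
}
```

In the masked forms only bit 0 of `k` is relevant, because the instruction writes a single scalar element.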
SIMD Floating-Point Exceptions
None.
Other Exceptions
EVEX-encoded instruction, see Table 2-58, “Type E10 Class Exception Conditions.”