❓ [Question] Running LayerNorm in fp16 #2730

Open
@Tomiinek

Description

❓ Question

What you have already tried

I am trying to convert a transformer model to TRT in fp16 (fp32 works fine 🙂). It includes a bunch of LayerNorms, all of which explicitly cast their inputs to fp32, i.e.:

import torch.nn as nn

class LayerNormFP32(nn.LayerNorm):
    def forward(self, x):
        # Run the normalization in fp32, then cast back to the original dtype.
        return super().forward(x.float()).type(x.dtype)
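
For context, the compilation itself looks roughly like the sketch below; the input shape and the traced module name are placeholders, and enabling only torch.half is what produces the warnings further down:

import torch
import torch_tensorrt

# Rough sketch of the compile call; the shape and traced_model are placeholders.
trt_model = torch_tensorrt.compile(
    traced_model,  # torch.jit.trace(...) of the transformer
    inputs=[torch_tensorrt.Input((1, 128, 512), dtype=torch.half)],
    enabled_precisions={torch.half},  # fp32-only compilation works fine
)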

I am getting warnings about the precision of these layers:

WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected layernorm nodes in FP16: %126 : Tensor = aten::layer_norm(%input.9, %127, %self.decoder.layers.0.attn_ln.weight.1, %370, %129, %130), scope: __module.decoder/__module.decoder.layers.0/__module.decoder.layers.0.attn_ln
...
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Check verbose logs for the list of affected weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 2 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 27 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 3 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.

I checked the dtype of the mentioned weights in the trace that I pass to torch_tensorrt.compile, and they are correctly in fp32, even though the warnings state the opposite.

The warning suggests two solutions (use INormalizationLayer, or force the layernorm layers to run in FP32 precision), but I have no idea how to achieve either of them through torch_tensorrt; sketches of what I have in mind are below.
This might be related: #2509 (or NVIDIA/TensorRT#3101)
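
If I understand the first suggestion correctly, it means leaving the TorchScript path and exporting through ONNX with opset 17 or later, so that TensorRT can map aten::layer_norm to INormalizationLayer. A minimal sketch of what I imagine; the model constructor, shapes, and file names are placeholders, and I have not verified this on my model:

import torch

# Placeholder model and input shape; opset 17 is the first ONNX opset with a
# native LayerNormalization op.
model = build_transformer().eval().cuda()  # build_transformer() is a placeholder
dummy = torch.randn(1, 128, 512, device="cuda")
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    opset_version=17,
    input_names=["x"],
    output_names=["y"],
    dynamic_axes={"x": {0: "batch", 1: "seq"}, "y": {0: "batch", 1: "seq"}},
)
# The engine would then be built outside torch_tensorrt, e.g.:
#   trtexec --onnx=model.onnx --fp16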

Any ideas how to resolve or debug this issue?
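
For reference, the second suggestion (pinning the layernorm layers to FP32) is something I only know how to express against the raw TensorRT Python API, roughly as in the sketch below; the name matching is a guess, and I do not know whether torch_tensorrt exposes an equivalent knob for the network it builds:

import tensorrt as trt

def pin_layernorms_to_fp32(network: trt.INetworkDefinition, config: trt.IBuilderConfig) -> None:
    # Ask TensorRT to honor per-layer precision requests instead of treating them as hints.
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Guess: match dedicated normalization layers or layers whose names
        # come from aten::layer_norm; the real names may differ.
        if layer.type == trt.LayerType.NORMALIZATION or "layer_norm" in layer.name:
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)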

Environment

  • Python 3.11.8
  • torch 2.2.1
  • torch_tensorrt 2.2.0
  • NVIDIA A100 GPU

Labels

question (Further information is requested)
