❓ [Question] Running LayerNorm in fp16 #2730

Open
@Tomiinek

Description

❓ Question

What you have already tried

I am trying to convert a transformer model to TRT in fp16 (fp32 works fine 🙂). It includes a bunch of LayerNorms, all of which explicitly cast their inputs to fp32, i.e.:

import torch.nn as nn

class LayerNormFP32(nn.LayerNorm):
    def forward(self, x):
        # Run the normalization in fp32, then cast back to the original dtype.
        return super().forward(x.float()).type(x.dtype)
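
For context, the compilation itself looks roughly like the sketch below; the input shape and the traced module name are placeholders, and enabling only torch.half is what produces the warnings further down:

import torch
import torch_tensorrt

# Rough sketch of the compile call; the shape and traced_model are placeholders.
trt_model = torch_tensorrt.compile(
    traced_model,  # torch.jit.trace(...) of the transformer
    inputs=[torch_tensorrt.Input((1, 128, 512), dtype=torch.half)],
    enabled_precisions={torch.half},  # fp32-only compilation works fine
)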

I am getting warnings about the precision of these layers:

WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected layernorm nodes in FP16: %126 : Tensor = aten::layer_norm(%input.9, %127, %self.decoder.layers.0.attn_ln.weight.1, %370, %129, %130), scope: __module.decoder/__module.decoder.layers.0/__module.decoder.layers.0.attn_ln
...
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Check verbose logs for the list of affected weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 2 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 27 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 3 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.

I checked the dtype of the mentioned weights in the trace that I pass to torch_tensorrt.compile, and they are correctly in fp32, even though the warnings state the opposite.

The warning suggests two solutions (use INormalizationLayer, or force the layernorm layers to run in FP32 precision), but I have no idea how to achieve either of them through torch_tensorrt; sketches of what I have in mind are below.
This might be related: #2509 (or NVIDIA/TensorRT#3101)
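
If I understand the first suggestion correctly, it means leaving the TorchScript path and exporting through ONNX with opset 17 or later, so that TensorRT can map aten::layer_norm to INormalizationLayer. A minimal sketch of what I imagine; the model constructor, shapes, and file names are placeholders, and I have not verified this on my model:

import torch

# Placeholder model and input shape; opset 17 is the first ONNX opset with a
# native LayerNormalization op.
model = build_transformer().eval().cuda()  # build_transformer() is a placeholder
dummy = torch.randn(1, 128, 512, device="cuda")
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    opset_version=17,
    input_names=["x"],
    output_names=["y"],
    dynamic_axes={"x": {0: "batch", 1: "seq"}, "y": {0: "batch", 1: "seq"}},
)
# The engine would then be built outside torch_tensorrt, e.g.:
#   trtexec --onnx=model.onnx --fp16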

Any ideas how to resolve or debug this issue?
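
For reference, the second suggestion (pinning the layernorm layers to FP32) is something I only know how to express against the raw TensorRT Python API, roughly as in the sketch below; the name matching is a guess, and I do not know whether torch_tensorrt exposes an equivalent knob for the network it builds:

import tensorrt as trt

def pin_layernorms_to_fp32(network: trt.INetworkDefinition, config: trt.IBuilderConfig) -> None:
    # Ask TensorRT to honor per-layer precision requests instead of treating them as hints.
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Guess: match dedicated normalization layers or layers whose names
        # come from aten::layer_norm; the real names may differ.
        if layer.type == trt.LayerType.NORMALIZATION or "layer_norm" in layer.name:
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)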

Environment

  • Python 3.11.8
  • torch 2.2.1
  • torch_tensorrt 2.2.0
  • NVIDIA A100 GPU

Labels

question (Further information is requested)
