Open-source tool to compute the Hessian of the Perplexity function for Large Language Models (LLMs) using PyTorch autograd
Technical Report on arXiv
This repository provides an accurate and efficient implementation for computing the Hessian of the Perplexity function in LLMs such as OPT-125M using PyTorch's native autograd engine. Results include full Hessian matrices and their diagonals across different layers and configurations.
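As a minimal illustration of the underlying mechanic (a toy sketch, not the repository's actual code), `torch.autograd.functional.hessian` returns the exact Hessian of any scalar function of a small parameter vector:

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
x = torch.randn(8, 5)                        # fixed toy "inputs"

# Toy stand-in for a perplexity-style scalar objective. In the repository the
# scalar is the perplexity of an LLM and w is a chosen subset of t parameters
# from a linear layer; here we only sketch the autograd mechanics.
def loss_fn(w):
    return torch.logsumexp(x @ w, dim=0)     # scalar output

w = torch.randn(5)                           # t = 5 parameters
H = hessian(loss_fn, w)                      # exact t x t Hessian via double backward
print(H.shape)                               # torch.Size([5, 5])
```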
If you find our work helpful, please cite us:
@article{ilin2025hessian,
  title   = {Hessian of Perplexity for Large Language Models by PyTorch autograd (Open Source)},
  author  = {Ilin, Ivan},
  journal = {arXiv preprint arXiv:2504.04520},
  year    = {2025}
}
This repository is compatible with:
- 🧠 OPT models (e.g. `facebook/opt-125m`)
- 🐑 LLaMA 2/3/4 models (e.g. `meta-llama/Llama-3.2-1B`)
- 🐣 TinyLlama (e.g. `TinyLlama/TinyLlama-1.1B-Chat-v1.0`)
These models are supported via the Hugging Face Transformers interface.
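For instance, a supported checkpoint can be loaded with the standard Transformers API (a sketch; the `cache_dir` below matches the default used by the scripts in this repository):

```python
# Load one of the supported checkpoints through Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"               # or meta-llama/Llama-3.2-1B, TinyLlama/...
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="llm_weights")
model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir="llm_weights")
model.eval()
```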
Left: Hessian for q_proj in block 0 (768 params). Right: Hessian for all 6 linear layers in block 0 (300 params each).

Left: Hessian for q_proj across 12 blocks (150 params each). Right: Hessian for all layers in 12 blocks (25 params each × 6 layers/block).
Saved as PyTorch tensors:
- hessian_q_proj_t_768.pt
- hessian_q_proj_all_blocks_t_150.pt
- hessian_all_layers_first_block_t_300.pt
- hessian_all_layers_all_blocks_t_25.pt
- hessian_diag_q_proj_vhp_samples_5000.pt
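The saved tensors can be loaded back and re-plotted with standard tools, e.g. (a sketch; the path assumes the default `/data` output folder mentioned below):

```python
import torch
import matplotlib.pyplot as plt

# Inspect a saved Hessian and render a quick heatmap.
H = torch.load("data/hessian_q_proj_t_768.pt")
print(H.shape, H.dtype)

plt.imshow(H.float().cpu().numpy(), cmap="viridis")
plt.colorbar()
plt.title("Hessian of q_proj (t = 768)")
plt.savefig("hessian_q_proj_heatmap.pdf")
```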
Experiments with varying batch size
Varying number of VHP samples
- Python version: 3.12.4 🐍
- Install dependencies: `pip install -r requirements.txt`
| Argument | Description |
|---|---|
| `--model` | Hugging Face model identifier. |
| `--layer_name` | Name of the linear layer to evaluate. |
| `--t` | Number of parameters to consider per layer. |
| `--block_index` | Index of a single block (used in some scripts). |
| `--num_blocks` | Number of blocks to include. |
| `--num_layers` | Number of linear layers per block. |
| `--b` | Total number of samples for perplexity. |
| `--vhp_samples` | Number of VHP samples for Hessian diagonal estimation. |
| `--model_input_bs` | Number of samples per batch. |
| `--seqlen` | Sequence length for the model. Default: 2048. |
| `--cache_dir` | Where to load/store weights. Default: `llm_weights`. |
| `--seed` | Random seed. |
💡 Tips:
- Use a larger `--model_input_bs` or `--seqlen` on GPUs with more memory to speed up runtime.
- A higher `--b` × `--seqlen` (the total number of tokens used for perplexity; see the sketch below) and more `--vhp_samples` give more accurate results, but increase compute time.
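For reference, the objective whose Hessian is computed is the perplexity of a causal LM, i.e. the exponentiated average cross-entropy over `--b` samples of length `--seqlen` (a sketch in terms of the Hugging Face API; not necessarily the repository's exact implementation):

```python
import torch

def perplexity(model, input_ids):
    # input_ids: (b, seqlen) tensor of token ids; HF causal LMs return the
    # mean next-token cross-entropy when labels are provided.
    loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss)
```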
> **Note:** After running any of the scripts, a `.pt` Hessian tensor and a `.pdf` heatmap of the Hessian will be saved in the `/data` folder.
python src/single_layer_single_block.py \
--model meta-llama/Llama-3.2-1B \
--layer_name self_attn.q_proj \
--block_index 0 \
--t 5 \
--b 30 \
--model_input_bs 1 \
--seed 0 \
--cache_dir llm_weights
python src/single_layer_several_blocks.py \
--model meta-llama/Llama-3.2-1B \
--layer_name self_attn.q_proj \
--t 5 \
--num_blocks 3 \
--b 30 \
--model_input_bs 1 \
--seed 0 \
--cache_dir llm_weights
python src/several_layers_several_blocks.py \
--model meta-llama/Llama-3.2-1B \
--t 5 \
--num_layers 3 \
--num_blocks 3 \
--b 30 \
--model_input_bs 1 \
--seed 0 \
--cache_dir llm_weights
python src/hessian_diag_single_layer.py \
--model meta-llama/Llama-3.2-1B \
--layer_name self_attn.q_proj \
--vhp_samples 10 \
--block_index 0 \
--b 30 \
--model_input_bs 1 \
--seed 0 \
--cache_dir llm_weights
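This script estimates the Hessian diagonal from vector-Hessian products (VHP). A Hutchinson-style estimator with random ±1 probes is one common way to do this, sketched below with `torch.autograd.functional.vhp` (the details of `hessian_diag_single_layer.py` may differ):

```python
# Hutchinson-style diagonal estimate: diag(H) ≈ mean over probes of v ⊙ (H v),
# with v a random ±1 (Rademacher) vector. Sketch only; details may differ.
import torch
from torch.autograd.functional import vhp

def estimate_hessian_diag(loss_fn, params, num_samples):
    diag = torch.zeros_like(params)
    for _ in range(num_samples):
        v = torch.randint_like(params, 2) * 2 - 1   # ±1 Rademacher probe
        _, hv = vhp(loss_fn, params, v)             # one vector-Hessian product via autograd
        diag += v * hv
    return diag / num_samples
```

Increasing `--vhp_samples` reduces the variance of this estimate, which is why more samples give a more accurate diagonal at the cost of compute time.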
> **Warning:** If your computations are too slow or you run out of GPU memory, use `facebook/opt-125m` for the `--model` parameter instead of the larger Llama models.
> **Note:** To consider a custom subset of parameters (for example, a random subset), adapt the `custom_forward(self, inpt)` method, where you define how the desired subset of parameters forms the full weight matrix (a hypothetical sketch follows below).
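A hypothetical sketch of such a method (only the name `custom_forward` comes from the repository; the attribute names `full_weight`, `subset_idx`, `subset_params`, and `bias` are placeholders):

```python
import torch

def custom_forward(self, inpt):
    # Rebuild the full weight matrix from a frozen copy plus the differentiable
    # parameter subset, then run the usual linear layer on top of it.
    weight = self.full_weight.detach().flatten().clone()    # frozen base weights
    weight[self.subset_idx] = self.subset_params             # splice in the chosen subset
    weight = weight.view_as(self.full_weight)
    return torch.nn.functional.linear(inpt, weight, self.bias)
```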
MIT License. See LICENSE for details.
We welcome issues, feature requests, and contributions! Feel free to open an issue or a pull request.