To download the MF2 dataset into the `data/` folder, follow these steps:

- Make sure Git LFS is installed, then run:

  ```bash
  git lfs install
  (cd data && git clone https://huggingface.co/datasets/sardinelab/MF2)
  ```
- It is recommended to use a virtual environment to manage dependencies. You can create one using `venv`:

  ```bash
  python -m venv mf2-env
  source mf2-env/bin/activate
  pip install -r requirements.txt
  ```
- Make sure FFmpeg is installed and the module is loaded:

  ```bash
  module load ffmpeg
  ```

  If your system does not use modules, you can install FFmpeg via a package manager, e.g.:

  ```bash
  sudo apt install ffmpeg
  ```
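After completing the steps above, you can optionally sanity-check the setup before running anything. This is a generic sketch, not part of the MF2 tooling; the `data/MF2` path simply follows from the clone command above:

```bash
# Confirm the dataset clone pulled real LFS objects rather than pointer stubs
git -C data/MF2 lfs ls-files | head

# Confirm the virtual environment's Python is the one on PATH
which python            # should point inside mf2-env/

# Confirm FFmpeg is available for the video modalities
ffmpeg -version | head -n 1
```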
To run inference with an open-weight model, use the following command:

```bash
bash run_open_inference.sh <model> <output_dir> <modality> <prompt_template>
```

- `<model>`: Name of the model as defined in `models/registry.py`.
- `<output_dir>`: Directory where results will be saved.
- `<modality>`: Type of input given to the model during evaluation. Options:
  - `video_only`
  - `transcripts_only`
  - `video_and_transcripts`
  - `video_and_synopsis`
  - `video_transcripts_and_synopsis`
  - `statement_only`
  - `synopsis_only`
- `<prompt_template>`: Prompt style used during evaluation. Options:
  - `explanation`
  - `explanation_free`
  - `direct`
  - `direct_free` (recommended for best results with open-weight models)
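For example, an open-weight run over video and transcripts with the recommended prompt template might look like the following; the model name and output path are placeholders, and the model name must match an entry in `models/registry.py`:

```bash
# "my-open-model" is hypothetical; replace it with a name defined in models/registry.py
bash run_open_inference.sh my-open-model outputs/my-open-model video_and_transcripts direct_free
```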
To run inference with a closed model, use the following command:

```bash
bash run_closed_inference.sh <model> <output_dir> <modality> <prompt_template> <api_key>
```

- `<model>`: Name of the model, e.g. `gpt-4o` or `gemini-2.5-pro-preview-03-25`.
- `<output_dir>`: Directory where results will be saved.
- `<modality>`: Type of input given to the model during evaluation. Options:
  - `video_only`
  - `transcripts_only`
  - `video_and_transcripts`
  - `video_and_synopsis`
  - `video_transcripts_and_synopsis`
  - `statement_only`
  - `synopsis_only`
- `<prompt_template>`: Prompt style used during evaluation. Options:
  - `explanation`
  - `explanation_free` (recommended for best results with closed models)
  - `direct`
  - `direct_free`
- `<api_key>`: The API key used to access the closed model.
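For example, using `gpt-4o` (one of the models mentioned above); reading the key from an environment variable is just one convenient way to avoid hard-coding it, and the variable name here is arbitrary:

```bash
# The environment variable name is arbitrary; any way of supplying the key works
export MF2_API_KEY="..."   # your provider API key
bash run_closed_inference.sh gpt-4o outputs/gpt-4o video_and_transcripts explanation_free "$MF2_API_KEY"
```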
After running the model, parse the outputs using the following command:

```bash
python parse_model_outputs.py --output_dir <output_dir> --strategy <strategy>
```

- `<output_dir>`: Directory containing the results from running the model.
- `<strategy>`: Parsing strategy to apply. Options:
  - `strict`
  - `first-occurrence`
  - `last-occurrence`
  - `strict-w-fallback-first-occurrence`

Notes:

- For open-weight models, `first-occurrence` and `last-occurrence` can be used largely interchangeably, as most models output just the answer with the `direct_free` prompt.
- For closed models, `last-occurrence` is recommended, as the `explanation_free` prompt produces an explanation before the final answer.
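Continuing the closed-model example above, the parsing step could look like:

```bash
# Parse the closed-model outputs with the recommended strategy
python parse_model_outputs.py --output_dir outputs/gpt-4o --strategy last-occurrence
```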
For questions or support, contact [email protected].
This dataset is released under the CC BY-NC-SA 4.0 license and is provided for non-commercial use only.