Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding

Download the MF2 Dataset

To download the MF2 dataset into the data/ folder, follow these steps:

Make sure Git LFS is installed. Then run:

git lfs install 
(cd data && git clone https://huggingface.co/datasets/sardinelab/MF2)

Setup Instructions

1. Create and Activate a Virtual Environment (Optional but Recommended)

It is recommended to use a virtual environment to manage dependencies. You can create one using venv:

python -m venv mf2-env
source mf2-env/bin/activate

2. Install Dependencies

pip install -r requirements.txt

3. FFmpeg

Make sure ffmpeg is installed and the module is loaded. If your system does not use modules, you can install FFmpeg via a package manager.

sudo apt install ffmpeg
module load ffmpeg

Example of Evaluating Models

1. Running an Open-Weight Model

To run inference with an open-weight model, use the following command:

bash run_open_inference.sh <model> <output_dir> <modality> <prompt_template>

Arguments:

<model>: Name of the model as defined in models/registry.py.
<output_dir>: Directory path where results will be saved.
<modality>: Input type to be received during evaluation, with options:
- video_only
- transcripts_only
- video_and_transcripts
- video_and_synopsis
- video_transcripts_and_synopsis
- statement_only
- synopsis_only
<prompt_template>: Prompt style to be used during evaluation, with options:
- explanation
- explanation_free
- direct
- direct_free — recommended for best results with open-weight models.

2. Running a Closed Model

To run inference with a closed model, use the following command:

bash run_closed_inference.sh <model> <output_dir> <modality> <prompt_template> <api_key>

Arguments:

<model>: e.g. gpt-4o or gemini-2.5-pro-preview-03-25.
<output_dir>: Directory path where results will be saved.
<modality>: Input type to be received during evaluation, with options:
- video_only
- transcripts_only
- video_and_transcripts
- video_and_synopsis
- video_transcripts_and_synopsis
- statement_only
- synopsis_only
<prompt_template>: Prompt style to be used during evaluation, with options:
- explanation
- explanation_free — recommended for best results with closed models.
- direct
- direct_free
<api_key>: the key to access the closed model.

3. Parsing Model Outputs

After running the model, parse the outputs using the following command:

python parse_model_outputs.py --output_dir <output_dir> --strategy <strategy>

Arguments:

<output_dir>: Directory containing the results from running the model.
<strategy>: Parsing strategy to apply, with options:
- strict
- first-occurrence
- last-occurrence
- strict-w-fallback-first-occurrence

Notes on Strategy Usage:

For open-weight models, first-occurrence and last-occurrence can be pretty much interchangeably used, as most models will output just the answer due to the selection of the direct_free prompt.
For closed models, last-occurrence is recommended since the output includes an explanation first before the final result, as the prompt used is explanation_free.

Contact

For questions or support, contact [email protected].

License

CC-BY-NC-SA 4.0 license. This dataset is provided for non-commercial use only.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
models		models
templates		templates
README.md		README.md
parse_model_outputs.py		parse_model_outputs.py
requirements.txt		requirements.txt
run_closed_inference.sh		run_closed_inference.sh
run_gemini_2.5.py		run_gemini_2.5.py
run_gpt4o.py		run_gpt4o.py
run_open_inference.sh		run_open_inference.sh
run_open_vlm.py		run_open_vlm.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding

Download the MF2 Dataset

Setup Instructions

1. Create and Activate a Virtual Environment (Optional but Recommended)

2. Install Dependencies

3. FFmpeg

Example of Evaluating Models

1. Running an Open-Weight Model

Arguments:

2. Running a Closed Model

Arguments:

3. Parsing Model Outputs

Arguments:

Notes on Strategy Usage:

Contact

License

About

Releases

Packages

Languages

deep-spin/MF2

Folders and files

Latest commit

History

Repository files navigation

Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding

Download the MF2 Dataset

Setup Instructions

1. Create and Activate a Virtual Environment (Optional but Recommended)

2. Install Dependencies

3. FFmpeg

Example of Evaluating Models

1. Running an Open-Weight Model

Arguments:

2. Running a Closed Model

Arguments:

3. Parsing Model Outputs

Arguments:

Notes on Strategy Usage:

Contact

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages