[QEff Finetune]: Adding steps about how to fine tune on any custom dataset. #381
Conversation
quic-swatia commented Apr 28, 2025
- Added steps on how to create custom_dataset.py, to run fine-tuning through the QEfficient pipeline on any custom dataset. Also added a detailed template for the user covering how to create custom_dataset.py.
- Added the argument 'context_length' to the existing APIs, which enables fine-tuning with padding for custom datasets (see the sketch below).
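A minimal sketch of how the new 'context_length' argument might be consumed during tokenization; the helper name, signature, and surrounding usage are illustrative assumptions, not the exact QEfficient API:

```python
# Hypothetical helper showing how `context_length` could drive padding.
# The name and signature are illustrative, not the exact QEfficient API.
def tokenize_with_padding(tokenizer, text, context_length=None):
    if context_length is not None:
        # Pad (and truncate) every sample to `context_length` tokens so all
        # batches share a static shape.
        return tokenizer(
            text,
            max_length=context_length,
            padding="max_length",
            truncation=True,
        )
    # Without a context length, fall back to dynamic-length tokenization.
    return tokenizer(text)
```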
Signed-off-by: Swati Allabadi <[email protected]>
Good work listing down the detailed steps for the custom dataset, Swati! Please check the comments. :)
To run fine tuning for any user specific dataset, prepare the dataset using the following steps:

1) Create a  directory named 'dataset' inside efficient-transformers.
double space between "a" and "directory"
def tokenize():
Add a comment such as "Implement tokenization and prepare inputs for the training."
# load dataset
# based on split, retrieve only the specific portion of the dataset (train or eval) either here or at the end
Add one more comment, such as "Define a prompt template".
def apply_prompt_template():
Add a comment such as "Convert the raw input into the format specified by the template defined earlier."
5) Inside get_custom_dataset(), the dataset needs to be prepared for fine tuning. The user should apply the prompt template and tokenize the dataset accordingly. Please refer to the template below on how to define get_custom_dataset().
6) For examples, please refer to the python files present in efficient-transformers/QEfficient/finetune/dataset. In the case of the Samsum dataset, get_preprocessed_samsum() from efficient-transformers/QEfficient/finetune/dataset/samsum_dataset.py is called.
7) In efficient-transformers/QEfficient/finetune/configs/dataset_config.py, for the custom_dataset class, pass the appropriate values for train_split and test_split according to the dataset keys corresponding to the train and test data points.
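Putting steps 2) through 5) together, here is a hedged sketch of what a custom_dataset.py could look like, using the Samsum dataset from step 6) as a stand-in. The column names, prompt wording, and the exact get_custom_dataset() signature (including the new context_length argument) are assumptions to adapt to your own data:

```python
# Hypothetical custom_dataset.py sketch. The dataset, column names, prompt
# wording, and signature below are placeholders; adapt them to your own data.
from datasets import load_dataset


def get_custom_dataset(dataset_config, tokenizer, split, context_length=None):
    # load dataset
    # based on split, retrieve only the specific portion of the dataset
    # (train or eval) either here or at the end
    dataset = load_dataset("samsum", split=split)

    # Define a prompt template
    prompt = "Summarize this dialog:\n{dialog}\n---\nSummary:\n"

    def apply_prompt_template(sample):
        # Convert the raw input into the format specified by the template above.
        return {
            "input": prompt.format(dialog=sample["dialogue"]),
            "label": sample["summary"],
        }

    def tokenize(sample):
        # Implement tokenization and prepare inputs for the training.
        input_ids = tokenizer.encode(
            tokenizer.bos_token + sample["input"], add_special_tokens=False
        )
        label_ids = tokenizer.encode(
            sample["label"] + tokenizer.eos_token, add_special_tokens=False
        )
        # Mask the prompt tokens with -100 so the loss is computed only on
        # the label tokens. `context_length`, when given, would additionally
        # pad each sample to a fixed length (see the padding sketch above).
        return {
            "input_ids": input_ids + label_ids,
            "attention_mask": [1] * (len(input_ids) + len(label_ids)),
            "labels": [-100] * len(input_ids) + label_ids,
        }

    dataset = dataset.map(apply_prompt_template, remove_columns=list(dataset.features))
    dataset = dataset.map(tokenize, remove_columns=list(dataset.features))
    return dataset
```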
I think this is no longer needed after PR#289. We can directly pass --train_split and --test_split from the CLI.
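For reference, a hypothetical invocation along those lines; the module path and the --dataset flag are assumptions about the current CLI, only --train_split/--test_split come from this thread:

```sh
# Hypothetical CLI usage after PR#289; flags other than --train_split and
# --test_split are illustrative assumptions, not verified syntax.
python -m QEfficient.cloud.finetune \
    --dataset custom_dataset \
    --train_split train \
    --test_split test
```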