[usability] Accelerate Support #936

Status: Open. Wants to merge 57 commits into base branch `main`.

Commits (57):
beba6ef
[usability] accelerate support initial commit
wheresmyhair Feb 22, 2025
ef083b6
[usability] accelerate support - scripts update
wheresmyhair Feb 27, 2025
f7ab42b
change ruff target ver
wheresmyhair Mar 4, 2025
19586d1
separate lisa and custom optims
wheresmyhair Mar 4, 2025
e12e25a
add decorator for base methods
wheresmyhair Mar 4, 2025
edc9236
encoder_decoder model temporarily disabled
wheresmyhair Mar 4, 2025
05e26a9
future support for trainer
wheresmyhair Mar 4, 2025
593f44b
fix fsdp cfg warp policy
wheresmyhair Mar 4, 2025
16eefe0
add note for single gpu
wheresmyhair Mar 4, 2025
29f10d8
add accelerate singlegpu config
wheresmyhair Mar 4, 2025
8194d36
fix qlora script
wheresmyhair Mar 6, 2025
0faa453
[usability] simplify shell scripts
wheresmyhair Mar 6, 2025
43ff307
[usability] add hint for flash_attn
wheresmyhair Mar 6, 2025
0ba9b60
[fix] fix create_customized_optimizer arg
wheresmyhair Mar 6, 2025
16d7af6
[usability] simplify dpov2 script
wheresmyhair Mar 6, 2025
c78ac26
change fsdp default wrap policy to transformer based
wheresmyhair Mar 11, 2025
e7712f1
streamlining
wheresmyhair Mar 11, 2025
71c6664
[fix] qlora+fsdp
wheresmyhair Mar 11, 2025
cdc6702
[usability] scripts default log dir update
wheresmyhair Mar 11, 2025
9d9ea68
temporarily remove raft from autopipeline test
wheresmyhair Mar 12, 2025
21e7156
fix broken test
wheresmyhair Mar 12, 2025
b65b80e
[fix] add condition for tests
wheresmyhair Mar 12, 2025
496eca5
[dev] add test for finetuner
wheresmyhair Mar 12, 2025
54e43fe
[usability] `lora_rank` args are now consistent with peft package `r`
wheresmyhair Mar 14, 2025
fb0e7de
[dev] add tests for finetuner
wheresmyhair Mar 14, 2025
afccd0b
[usability] accelerate support initial commit
wheresmyhair Feb 22, 2025
92e9182
[usability] accelerate support - scripts update
wheresmyhair Feb 27, 2025
ea6dda7
change ruff target ver
wheresmyhair Mar 4, 2025
e11b429
separate lisa and custom optims
wheresmyhair Mar 4, 2025
f68e6e5
add decorator for base methods
wheresmyhair Mar 4, 2025
d28af83
encoder_decoder model temporarily disabled
wheresmyhair Mar 4, 2025
0bff86a
future support for trainer
wheresmyhair Mar 4, 2025
4b6b9ef
fix fsdp cfg warp policy
wheresmyhair Mar 4, 2025
03f8999
add note for single gpu
wheresmyhair Mar 4, 2025
e71c93c
add accelerate singlegpu config
wheresmyhair Mar 4, 2025
c824761
fix qlora script
wheresmyhair Mar 6, 2025
4293034
[usability] simplify shell scripts
wheresmyhair Mar 6, 2025
02dc4ef
[usability] add hint for flash_attn
wheresmyhair Mar 6, 2025
7a55591
[fix] fix create_customized_optimizer arg
wheresmyhair Mar 6, 2025
4331f46
[usability] simplify dpov2 script
wheresmyhair Mar 6, 2025
d560851
change fsdp default wrap policy to transformer based
wheresmyhair Mar 11, 2025
13c013f
streamlining
wheresmyhair Mar 11, 2025
83f7ad8
[fix] qlora+fsdp
wheresmyhair Mar 11, 2025
79b32e6
[usability] scripts default log dir update
wheresmyhair Mar 11, 2025
8d7b4ef
temporarily remove raft from autopipeline test
wheresmyhair Mar 12, 2025
c8c71f2
fix broken test
wheresmyhair Mar 12, 2025
f49672e
[fix] add condition for tests
wheresmyhair Mar 12, 2025
6f4740f
[dev] add test for finetuner
wheresmyhair Mar 12, 2025
82e0467
[usability] `lora_rank` args are now consistent with peft package `r`
wheresmyhair Mar 14, 2025
a80d291
[dev] add tests for finetuner
wheresmyhair Mar 14, 2025
324ca09
Merge branch 'lmflow-nightly' of https://github.com/OptimalScale/LMFl…
wheresmyhair Mar 18, 2025
d7b2f55
[doc] add diagram for finetuner
wheresmyhair Mar 28, 2025
ce21c23
Merge branch 'main' into lmflow-nightly
wheresmyhair Apr 16, 2025
19c9ef5
qwen3 gemma3 support and estimated gpu mem
wheresmyhair May 13, 2025
402ea07
support load from lora checkpoint
wheresmyhair May 13, 2025
956c0fd
[feature] support count number of tokens
wheresmyhair May 13, 2025
c3f1524
[doc] readme update
wheresmyhair May 15, 2025

Files changed:

1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ log/
regression_test/*/new_output_models
regression_test/*/new_log
output_dir/
tests_out

# data files
data/
26 changes: 10 additions & 16 deletions README.md
@@ -69,7 +69,6 @@ An extensible, convenient, and efficient toolbox for finetuning large machine le
- [LMFlow](#lmflow)
- [Latest News](#latest-news)
- [Table of Contents](#table-of-contents)
- [Supported Models](#supported-models)
- [Quick Start](#quick-start)
- [Setup](#setup)
- [Prepare Dataset](#prepare-dataset)
@@ -85,21 +84,6 @@ An extensible, convenient, and efficient toolbox for finetuning large machine le
- [License](#license)
- [Citation](#citation)

## Supported Models

See all conversation template details [here](https://optimalscale.github.io/LMFlow/examples/supported_conversation_template.html).

| Model | Conversation Template |
| :---: | :-------------------: |
| DeepSeek | `deepseek` <br> `deepseek_v2` <br> `deepseek_r1` <br> `deepseek_r1_distill` <br> `deepseek_v3` |
| Gemma | `gemma` |
| Hymba | `hymba` |
| InternLM2 | `internlm2` |
| LLaMA | `llama2` <br> `llama3` <br> `llama3_for_tool`|
| Phi | `phi3` |
| Qwen | `qwen2` <br> `qwen2_for_tool` <br> `qwen2_5` <br> `qwen2_5_1m` <br> `qwen2_5_math` <br> `qwen_qwq` |
| Yi | `yi` <br> `yi1_5` |
| Zephyr | `zephyr` |

## Quick Start

@@ -162,6 +146,16 @@ Please refer to our [doc](https://optimalscale.github.io/LMFlow/examples/DATASET

### Finetuning

#### Estimated Hardware Requirements

| Method              | 0.5B  | 3B    | 7B    | 14B   | 30B   | 70B    | `x`B    |
| ------------------- | ----- | ----- | ----- | ----- | ----- | ------ | ------- |
| Full `bf16`/`fp16`  | 9GB   | 55GB  | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
| LoRA                | 1GB   | 6GB   | 16GB  | 32GB  | 64GB  | 160GB  | `2x`GB  |
| QLoRA `quant_bit=8` | 0.7GB | 3GB   | 10GB  | 20GB  | 40GB  | 80GB   | `x`GB   |
| QLoRA `quant_bit=4` | 0.4GB | 1.5GB | 6GB   | 12GB  | 24GB  | 48GB   | `x/2`GB |
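
The `x`B column is a rule of thumb rather than a measurement. A small helper like the sketch below (illustrative only, not an LMFlow API; real usage also depends on sequence length, batch size, optimizer states, and sharding) turns it into a quick estimate for an arbitrary model size:

```python
# Rough GPU-memory estimate derived from the table's `x`B rules of thumb.
# Illustrative sketch only; the table's per-size entries also fold in
# activation and runtime overhead, so they can differ from this asymptotic rule.
def estimate_gpu_memory_gb(params_in_billions: float, method: str) -> float:
    factors = {
        "full_bf16": 18.0,   # Full bf16/fp16 finetuning ~ 18x GB
        "lora": 2.0,         # LoRA ~ 2x GB
        "qlora_8bit": 1.0,   # QLoRA (quant_bit=8) ~ x GB
        "qlora_4bit": 0.5,   # QLoRA (quant_bit=4) ~ x/2 GB
    }
    return factors[method] * params_in_billions

print(estimate_gpu_memory_gb(70, "full_bf16"))  # 1260.0, close to the table's 1200GB entry for 70B
```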


#### Full Finetuning

Full training updates all the parameters to finetune a language model.
29 changes: 29 additions & 0 deletions configs/accelerate_fsdp_config.yaml
@@ -0,0 +1,29 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP

fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_min_num_params: 1000000
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_forward_prefetch: false
fsdp_cpu_ram_efficient_loading: true
fsdp_offload_params: false
fsdp_sharding_strategy: FULL_SHARD
fsdp_state_dict_type: FULL_STATE_DICT
fsdp_sync_module_states: true
fsdp_use_orig_params: true

downcast_bf16: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8 # NOTE: distributed_type should be `NO` if you're training on a single GPU
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
main_process_port: 1204
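
This file is intended to be passed to `accelerate launch` via `--config_file`. A minimal sketch of how a launched process sees these settings (the script name below is a placeholder, not an LMFlow entry point):

```python
# Minimal sketch. When a script is started with
#   accelerate launch --config_file configs/accelerate_fsdp_config.yaml train.py
# the Accelerator object picks up the FSDP sharding, bf16 mixed precision,
# and the 8-process layout defined above. `train.py` is a placeholder name.
from accelerate import Accelerator

accelerator = Accelerator()
print(accelerator.distributed_type)  # DistributedType.FSDP
print(accelerator.mixed_precision)   # "bf16"
print(accelerator.num_processes)     # 8
```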
29 changes: 29 additions & 0 deletions configs/accelerate_singlegpu_config.yaml
@@ -0,0 +1,29 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'

fsdp_config:
fsdp_auto_wrap_policy: SIZE_BASED_WRAP
fsdp_min_num_params: 1000000
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_forward_prefetch: false
fsdp_cpu_ram_efficient_loading: true
fsdp_offload_params: false
fsdp_sharding_strategy: 'NO_SHARD'
fsdp_state_dict_type: FULL_STATE_DICT
fsdp_sync_module_states: true
fsdp_use_orig_params: true

downcast_bf16: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
main_process_port: 1204
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 0 additions & 1 deletion configs/iterative_dpo.yaml
@@ -17,7 +17,6 @@ preprocessing_num_workers: 16
output_dir: ./output_models/iterative_dpo
run_name: iterative_dpo
random_seed: 42
use_accelerator: True
enable_distributed_inference: True
distributed_inference_num_instances: 8
initial_iter_idx: 0 # 0 refers to the first dataset in dataset_path_list
2 changes: 1 addition & 1 deletion contrib/rlhflow/run_reward_modeling.sh
@@ -30,7 +30,7 @@ deepspeed ${deepspeed_args} \
--block_size 512 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1\
--deepspeed configs/ds_config_zero2.json \
--deepspeed configs/archive/ds_config_zero2.json \
--bf16 \
--run_name rm_test \
--validation_split_percentage 10 \
2 changes: 1 addition & 1 deletion contrib/tool-finetune/run_function_call_finetune.sh
@@ -65,7 +65,7 @@ deepspeed ${deepspeed_args} \
--disable_group_texts 1 \
--block_size 1024 \
--per_device_train_batch_size 1 \
--deepspeed configs/ds_config_zero3.json \
--deepspeed configs/archive/ds_config_zero3.json \
--fp16 \
--run_name finetune \
--validation_split_percentage 0 \
21 changes: 0 additions & 21 deletions docker/Dockerfile

This file was deleted.

74 changes: 0 additions & 74 deletions docker/README.md

This file was deleted.

62 changes: 62 additions & 0 deletions docs/dev_notes/finetuning.mmd
@@ -0,0 +1,62 @@
sequenceDiagram
participant User
participant Finetuner as LMFlow Finetuner
participant Model as LMFlow Model
participant Dataset as LMFlow Dataset
participant Trainer as Trainer

User->>Finetuner: tune(model, dataset)

%% Tokenization
Finetuner->>Model: tokenize(dataset)
Model->>Dataset: Apply tokenization to dataset

alt if not disable_group_texts
Finetuner->>Finetuner: group_text(tokenized_dataset, model_max_length)
end

%% Prepare for training
Finetuner->>Finetuner: Prepare dataset for trainer

%% Create appropriate trainer based on configuration
alt if model_args.use_lora
Finetuner->>Finetuner: Initialize PeftTrainer
else
Finetuner->>Finetuner: Initialize standard Trainer
end

alt if training_args.use_customized_optim
Finetuner->>Finetuner: create_customized_optimizer()
end

alt if training_args.use_lisa
Finetuner->>Finetuner: Create DynamicLayerActivationCallback
end

%% Start training
Finetuner->>Trainer: train(resume_from_checkpoint)

%% Training loop (simplified)
loop Training iterations (Trainer._inner_training_loop simplified)
Trainer->>Model: Forward pass
Model-->>Trainer: Return predictions
Trainer->>Trainer: Compute loss
Trainer->>Model: Backward pass
Model->>Model: Compute Gradient
Trainer->>Trainer: Optimizer step
end

%% Save the model
alt if not model_args.use_lora
Trainer->>Trainer: save_model()
else
alt if model_args.save_aggregated_lora
Finetuner->>Model: merge_lora_weights()
end
Finetuner->>Model: save(output_dir, save_aggregated_lora)
end

%% Finish and return
Trainer-->>Finetuner: Return train result
Finetuner->>Finetuner: Log metrics
Finetuner-->>User: Return fine-tuned model
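
The same flow, condensed into Python-flavored pseudocode. Names follow the diagram and are not guaranteed to match LMFlow's actual `Finetuner` internals; treat this as an illustrative sketch, not the real implementation:

```python
# Pseudocode mirroring the sequence diagram above; attribute and method names
# are taken from the diagram, not from LMFlow's source.
def tune(finetuner, model, dataset):
    tokenized = model.tokenize(dataset)
    if not finetuner.data_args.disable_group_texts:
        tokenized = finetuner.group_text(tokenized, model_max_length=model.get_max_length())

    # LoRA runs go through a PEFT-aware trainer; everything else uses the standard Trainer.
    trainer = (finetuner.build_peft_trainer(model, tokenized)
               if finetuner.model_args.use_lora
               else finetuner.build_trainer(model, tokenized))

    if finetuner.training_args.use_customized_optim:
        trainer.optimizer = finetuner.create_customized_optimizer()
    if finetuner.training_args.use_lisa:
        trainer.add_callback(finetuner.build_lisa_callback())  # DynamicLayerActivationCallback

    train_result = trainer.train(resume_from_checkpoint=finetuner.training_args.resume_from_checkpoint)

    if not finetuner.model_args.use_lora:
        trainer.save_model()
    else:
        if finetuner.model_args.save_aggregated_lora:
            model.merge_lora_weights()
        model.save(finetuner.training_args.output_dir, finetuner.model_args.save_aggregated_lora)

    finetuner.log_metrics(train_result)
    return model
```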
2 changes: 1 addition & 1 deletion examples/benchmarking.py
@@ -214,7 +214,7 @@ def main():
dataset_name = benchmarking_args.dataset_name
# metric = pipeline_args.metric
if is_lmflow_local_benchmarking(dataset_name): # TODO (@Jipeng)
model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
model = AutoModel.get_model(model_args, do_train=False, ds_config=ds_config)
run_lmflow_local_benchmarking(dataset_name,pipeline_name,model_args,pipeline_args,model) # Pass args TODO (@Jipeng)
elif is_lm_evaluation_benchmarking(dataset_name):
model = model_args.model_name_or_path
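
The diff above and the example-script diffs below replace `tune_strategy='none'` with a boolean `do_train` flag. A hedged sketch of the new calling pattern (the import path follows LMFlow's example scripts; `model_args` and `ds_config` are placeholders from the usual argument parsing):

```python
# Illustrative sketch of the API change in this PR: `tune_strategy` is gone and
# a boolean `do_train` selects between inference-only and trainable models.
from lmflow.models.auto_model import AutoModel

def load_models(model_args, ds_config):
    # Old call (pre-PR): AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
    eval_model = AutoModel.get_model(model_args, do_train=False, ds_config=ds_config)   # evaluation / chat
    train_model = AutoModel.get_model(model_args, do_train=True, ds_config=ds_config)   # finetuning
    return eval_model, train_model
```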
3 changes: 1 addition & 2 deletions examples/chatbot.py
@@ -64,10 +64,9 @@ def main():

model = AutoModel.get_model(
model_args,
tune_strategy='none',
do_train=False,
ds_config=ds_config,
device=pipeline_args.device,
use_accelerator=True,
)

# We don't need input data, we will read interactively from stdin
2 changes: 1 addition & 1 deletion examples/chatbot_gradio.py
@@ -110,7 +110,7 @@ class ChatbotArguments:

model = AutoModel.get_model(
model_args,
tune_strategy='none',
do_train=False,
ds_config=ds_config,
device=pipeline_args.device,
torch_dtype=torch.float16
6 changes: 3 additions & 3 deletions examples/detail_memory.py
@@ -13,7 +13,7 @@

LISA = True if sys.argv[3] == "1" else False
LORA = True if sys.argv[4] == "1" else False
lora_rank = int(sys.argv[5])
lora_r = int(sys.argv[5])
# Check if the model name is provided as a command-line argument
if len(sys.argv) < 6:
print("Usage: python script_name.py <model_name>")
@@ -26,7 +26,7 @@
print("token_length : ", sys.argv[2])
print("LISA : ", LISA)
print("LORA : ", LORA)
print("lora_rank : ", lora_rank)
print("lora_r : ", lora_r)
# Model initialization
model_name = sys.argv[1]
token_length = sys.argv[2]
@@ -48,7 +48,7 @@
peft_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
inference_mode=False,
r=lora_rank,
r=lora_r,
lora_alpha=32,
lora_dropout=0.1,
target_modules=[ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"],
11 changes: 0 additions & 11 deletions examples/ds_config.json

This file was deleted.

3 changes: 1 addition & 2 deletions examples/evaluation.py
@@ -36,9 +36,8 @@

model = AutoModel.get_model(
model_args,
tune_strategy='none',
do_train=False,
ds_config=ds_config,
use_accelerator=pipeline_args.use_accelerator_for_evaluator
)
dataset = Dataset(data_args)

2 changes: 1 addition & 1 deletion examples/finetune_multi_modal.py
@@ -59,7 +59,7 @@ def main():
# do not register deepspeed in the model.
# with_deepspeed flag may be removed
# by modifying the tune strategy in the future.
model = AutoModel.get_model(model_args, tune_strategy='none',
model = AutoModel.get_model(model_args, do_train=True,
ds_config=pipeline_args.deepspeed,
custom_model=True,
with_deepspeed=False,