I think the docs are insufficient. See my questions here: Using Transformers with DistributedDataParallel — any examples?
brando