
nn.dataparallel has issue for mac (mps device) #2601

Open
rnb007 opened this issue Jun 1, 2024 · 0 comments


rnb007 commented Jun 1, 2024

This is the error I get when I use the function below:

def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if no GPU exists."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

      trainer = torch.optim.Adam(net.parameters(), lr=lr)
      3 loss = nn.CrossEntropyLoss(reduction="none")
----> 4 d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs)

File ~/anaconda3/envs/dl_env/lib/python3.10/site-packages/d2l/torch.py:1507, in train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
   1504 timer, num_batches = d2l.Timer(), len(train_iter)
   1505 animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
   1506                         legend=['train loss', 'train acc', 'test acc'])
-> 1507 net = nn.DataParallel(net, device_ids=devices).to(devices[0])
   1508 for epoch in range(num_epochs):
   1509     # Sum of training loss, sum of training accuracy, no. of examples,
   1510     # no. of predictions
   1511     metric = d2l.Accumulator(4)

IndexError: list index out of range

This happens because the Mac has no CUDA support: the default device list that train_ch13 builds is empty, so devices[0] raises the IndexError.
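A quick check in the same environment makes this visible (the expected output assumes an Apple-silicon Mac with a recent PyTorch build):

import torch

print(torch.cuda.is_available())          # False: no CUDA on macOS
print(torch.cuda.device_count())          # 0, so the CUDA device list is empty
print(torch.backends.mps.is_available())  # True on Apple silicon with PyTorch >= 1.12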

If I tweak the function above by just changing cpu to mps, the kernel always dies:
def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if no GPU exists."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('mps')]

How can I run nn.DataParallel, or for that matter the d2l.train_ch13 function from Section 16.2 of Chapter 16, on a Mac?
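For reference, here is a minimal sketch of the kind of workaround I am considering (untested with d2l.train_ch13; prepare_net is a hypothetical helper of my own, not part of d2l): only wrap the model in nn.DataParallel when multiple CUDA devices exist, and otherwise just move it to the single available device (mps or cpu).

import torch
from torch import nn

def try_all_gpus():
    """Return all CUDA devices, else [mps] if available, else [cpu]."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    if devices:
        return devices
    if torch.backends.mps.is_available():
        return [torch.device('mps')]
    return [torch.device('cpu')]

def prepare_net(net, devices):
    """Hypothetical helper: wrap in nn.DataParallel only when there are
    multiple CUDA devices; otherwise just move the model to the single
    available device (mps or cpu)."""
    if len(devices) > 1 and devices[0].type == 'cuda':
        return nn.DataParallel(net, device_ids=devices).to(devices[0])
    return net.to(devices[0])

Even with this, the line net = nn.DataParallel(net, device_ids=devices).to(devices[0]) inside train_ch13 would still need to be replaced by something like net = prepare_net(net, devices) in a local copy of d2l/torch.py, since the packaged function hard-codes the DataParallel wrapper.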
