[Scala] Training differences between macOS and Ubuntu? #12553
Comments
Thanks for submitting this issue @mariussoutier
Hi @mariussoutier, thanks for your issue. Could you please provide a minimal reproducible example? This looks weird to me too.
@mariussoutier A Clojure user had a similar problem. Maybe something in that issue can help diagnose this: gigasquid/clojure-mxnet#5 (the Clojure package has since joined the main project). A final solution wasn't found for his laptop, but he could run on the 18.04 server.
Interesting. I've also noticed that MXNet-CPU is slower on my Ubuntu laptop than on my MacBook. The MacBook is from 2013 and the Dell from 2017, so the Dell has a newer CPU, twice the RAM, and a much faster SSD. I just don't know where I should investigate. I'm pretty new to Ubuntu and it already took me a day to set this all up. Would building MXNet from source on the laptop help?
I would try using the Scala jars and comparing your dependencies against these Clojure Dockerfiles. The Dockerfiles in this project's CI are for 16.04, so they might not be as relevant to you.
I found this and thought it might be helpful: https://mc.ai/install-mxnet-on-ubuntu-18-04/
I thought it was just the Scala API that was problematic.
@mariussoutier Maybe this can be helpful: #11303. We will try to provide instructions for 18.04, since you are not the only one who has asked for this. About the performance issue, could you please provide some code that reproduces it? I will test it to see where the issue comes from.
@lanking520 Ah thanks, then I'll stop trying to compile it on Ubuntu. About the training performance, I'm seeing this with the MLP from the tutorials.
@mariussoutier Are you seeing differences between the Python API and the Scala API as well in terms of training?
@piyushghai I gave up trying to train in Scala, am only using it for inference now.
@mariussoutier Currently we do support
Tested this with both MXNet 1.2.1 and 1.3.0 Staging. I use identical code and an identical dataset to train an MLP and a CNN on image data. On my Mac (MBP Late 2013) it converges easily within 5-10 epochs to an accuracy of 80% using a learning rate of 0.00001. On my Dell laptop, running Ubuntu 18, both with and without GPU, it essentially doesn't converge at all (accuracy around 1.6%). How is this possible?