This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Scala] Training differences between macOS and Ubuntu? #12553

Closed
mariussoutier opened this issue Sep 13, 2018 · 12 comments

Comments

@mariussoutier

I tested this with both MXNet 1.2.1 and 1.3.0 Staging. I use identical code and the same dataset to train an MLP and a CNN on image data. On my Mac (MBP Late 2013) it converges easily within 5-10 epochs to an accuracy of 80% using a learning rate of 0.00001. On my Dell laptop running Ubuntu 18, both with and without GPU, it essentially doesn't converge at all (accuracy around 1.6%).

How is this possible?
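
One quick sanity check on the low accuracy reported on Ubuntu is to compare it with chance level. For a balanced classification problem with k classes, random guessing yields an accuracy of about 1/k, so an accuracy stuck near that value would suggest the model is not learning at all rather than learning slowly. The thread does not state the number of classes, so the class count below is purely hypothetical, as are the function names. This is a sketch of the check, not a diagnosis.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return 1.0 / num_classes


def is_near_chance(accuracy, num_classes, tolerance=0.01):
    """Return True if `accuracy` is within `tolerance` of random guessing."""
    return abs(accuracy - chance_accuracy(num_classes)) <= tolerance


# Hypothetical example: 60 classes is an assumption, not a value from the thread.
HYPOTHETICAL_NUM_CLASSES = 60
print(is_near_chance(0.016, HYPOTHETICAL_NUM_CLASSES))
print(is_near_chance(0.80, HYPOTHETICAL_NUM_CLASSES))
```

If the observed accuracy sits at chance level, the likely causes are things that break learning outright (for example, labels or input data being read differently) rather than hardware speed.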

@kalyc
Contributor

kalyc commented Sep 13, 2018

Thanks for submitting this issue @mariussoutier
@mxnet-label-bot[Scala, OSX, Ubuntu]

@lanking520
Member

Hi @mariussoutier, thanks for raising this issue. Could you please provide a minimal reproducible example? This looks weird to me too.

@gigasquid
Member

gigasquid commented Sep 13, 2018

@mariussoutier A Clojure user had a similar problem. Maybe something in this issue can help diagnose it (the Clojure package has since joined the main project): gigasquid/clojure-mxnet#5

A final solution wasn't found for his laptop but he could run on the 18.04 server

@mariussoutier
Author

Interesting. I've also noticed that MXNet-CPU is slower on my Ubuntu laptop than on my MacBook. The MacBook is from 2013 and the Dell from 2017, so the Dell has a newer CPU, twice the RAM, and a much faster SSD.

I just don't know where I should investigate, I'm pretty new to Ubuntu and it already took me a day to set this all up. Would building MXNet from source on the laptop help?
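
A common first step when the same code behaves differently on two machines is to record the environment on each and diff the results. The following is a minimal sketch using only Python's standard library. The function names are hypothetical and not part of MXNet, and the choice of tools to query (gcc and java) is an assumption based on the compiler and JVM being relevant to a Scala setup.

```python
import json
import platform
import shutil
import subprocess
import sys


def tool_version(command, flag="--version"):
    """Return the first line printed by `<command> <flag>`, or None if unavailable."""
    path = shutil.which(command)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version to stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def environment_fingerprint():
    """Collect basic facts about this machine so two setups can be compared."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "gcc": tool_version("gcc"),
        # `-version` (single dash) is used because it works on older JVMs too.
        "java": tool_version("java", "-version"),
    }


if __name__ == "__main__":
    print(json.dumps(environment_fingerprint(), indent=2, sort_keys=True))
```

Running the script on both the Mac and the Ubuntu laptop and comparing the two JSON outputs gives a concrete list of differences to attach to a bug report.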

@gigasquid
Member

gigasquid commented Sep 13, 2018

I would try using the Scala jars and comparing your dependencies against these Clojure Dockerfiles for 18.04:
https://hub.docker.com/r/magnetcoop/mxnet-clj-cpu/
https://hub.docker.com/r/magnetcoop/mxnet-clj-gpu/

The Dockerfiles in this project's ci are 16.04 so might not be as relevant to you

@gigasquid
Member

gigasquid commented Sep 15, 2018

I found this and thought it might be helpful: https://mc.ai/install-mxnet-on-ubuntu-18-04/
Especially the part about gcc7 vs gcc6.

@mariussoutier
Author

I thought it was just the Scala API that was problematic. pip install mxnet-cu90mkl installs fine, so I'll have to rewrite my code in Python to verify this assumption.

@lanking520
Member

lanking520 commented Sep 15, 2018

@mariussoutier Maybe this can be helpful: #11303. We will try to provide instructions for 18.04, since you are not the only one who has asked for this. About the performance issue, could you please provide some code that reproduces it? I will test it to see where the issue comes from.

@mariussoutier
Author

@lanking520 Ah thanks, then I'll stop trying to compile it on Ubuntu. About the training performance, I'm seeing this with the MLP from the tutorials.

@piyushghai
Contributor

@mariussoutier Are you seeing differences in training between the Python API and the Scala API as well?

@mariussoutier
Author

@piyushghai I gave up trying to train in Scala, am only using it for inference now.

@lanking520
Member

@mariussoutier We now support 18.04, since we successfully got it statically linked. Please feel free to try it again. Closing this issue for now.


6 participants