Submission for issue #85 #145
base: master
Conversation
Hi, please find below a review submitted by one of the reviewers: Score: 7

In general, they certify the original paper's results as "viable" and "robust". They communicate that they were able to reproduce the key results of the paper, although they found a few discrepancies with the implementation details provided in the original paper; these do not seem to radically affect any conclusion. Through no fault of their own, but because of difficulties encountered in running the original code and the unavailability of adequate computing resources, the authors admit that the report falls short of providing an exhaustive investigation of the validity of the results in the original paper and cannot guarantee that no error is present in the original results. Nonetheless, I find this report extremely useful for testing some of the main, basic findings of the original work and for highlighting the specific practical issues encountered in that effort. Continuing to document these experiences is extremely valuable for the community to better understand and overcome the obstacles that prevent machine learning from enjoying the status of a reproducible, scientific field.

The authors report that code from the original submission was shared publicly, but because of a lack of dependency version specifications, they were unable to run the original code without errors and therefore resorted to reimplementing the code themselves. This useful, actionable feedback points to a shortcoming in the original paper's reproducibility that could easily be addressed by the original authors. The new code is implemented in TensorFlow, which complements the original PyTorch implementation.

The authors make useful observations about their practical experience reproducing the original work. For example, they inform us that certain experimental choices, such as the size of the perturbation Delta x in the "Gaussian Balls" experiment, were not documented by the original authors, so they contribute to the reproducibility of the paper by sharing their observations and insight on what reasonable choices for its magnitude might be. I find the willingness of the authors to share the difficulties they encountered in this reproducibility effort, and their decision-making process around the technology to use (in the "Machine Learning Stack" section), extremely useful in painting a clear and powerful picture of many researchers' daily struggles with reproducing promising results with limited resources. They explicitly state that the absence of key ingredients for reproducibility (which forced them to reimplement the code from scratch -- which does, however, carry some added benefits, as they also point out) prevented them from carrying out the tasks they intended to run and put a damper on their efforts. Among other useful suggestions, they point to containerization as a possible solution to the issues that arise from differences in environments, package versions, etc. Perhaps some of the reproducibility and versioning issues could have been solved by engaging the original authors in a conversation on OpenReview. It is unclear whether this happened in a non-public setting (one of the official reviewers wrote: "I have started a thread which is only visible by the authors, in order to keep the page limited to discussion about the content of the paper rather than technical help"). Prior to that comment, the authors had explicitly posted more information about their setup and library versions in this comment: https://openreview.net/forum?id=SJekyhCctQ&noteId=Hyx_7PKyT7.

In more detail, the authors confirm the observation that adding the fingerprint portion of the loss causes the decision boundary to become highly non-linear, as discussed in the original paper. In discussing their results, the authors of the report provide sufficient detail about the experimental setup they use. Although I thank the authors for their transparency in sharing the hyperparameters and experimental choices they made, it would be useful to accompany these with a description of whether the choices were made to match the original implementation or whether they represent the authors' desire to test the original method under different conditions. I am unsure how the "BIM" attack in the report maps to the BIM-a and BIM-b methods described in the first paper. For some of the results and plots, the authors invite the reader to consult supplementary material in a Jupyter notebook. This contribution reports that the original authors had found reasonable choices for the alpha and beta parameter values in Delta y to be 0.25 and 0.75, and that the proposed method is robust to randomization of the sign of Delta y. The validity of these choices and statements is challenged in this reproducibility work by showing evidence of delayed convergence in one of their experiments using the original choice of parameters. They conclude that a different choice for Delta y could be superior, but no experimental evidence for that is shown in the report. In checking other hyperparameters, such as the magnitude of the epsilon parameter in the "Gaussian Balls" experiment, the authors make clear, insightful suggestions on how to pick a sensible value and why smaller values might make the optimization process harder.

The unavailability of compute justifies the attempt to replicate only a few selected results from the paper, as well as the absence of ablation studies or extensive hyperparameter sweeps. Algorithm 1 appears abruptly in the paper without being referenced in the text. The axes in Figure 1 are unlabeled, and the tick values are very hard to read. Figure 3 is not referenced in the text. Please be consistent with your capitalization and italicization of the term "Neural Fingerprinting". The report contains a few typos that can be easily fixed.
Confidence: 4
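For readers less familiar with the quantities the reviewer refers to, here is a minimal sketch of how fingerprint perturbations Delta x (of magnitude epsilon) and fingerprint targets Delta y (with the alpha/beta values of 0.25/0.75 quoted above, and optional sign randomization) might be constructed. The function name, shapes, and the exact layout of Delta y are illustrative assumptions and are not taken from the original paper or from either codebase.

```python
import numpy as np

# Hypothetical sketch only: the construction below is an assumption made for
# illustration, not the original authors' or the report's implementation.
def make_fingerprints(num_fingerprints, input_dim, num_classes,
                      epsilon=0.1, alpha=0.25, beta=0.75,
                      randomize_sign=True, seed=0):
    rng = np.random.default_rng(seed)

    # Delta x: random directions scaled to magnitude epsilon. The review notes
    # this magnitude was undocumented and that very small values can make the
    # optimization harder.
    directions = rng.normal(size=(num_fingerprints, input_dim))
    delta_x = epsilon * directions / np.linalg.norm(directions, axis=1, keepdims=True)

    # Delta y: beta on one (randomly chosen) class, -alpha spread over the
    # remaining classes, using the (0.25, 0.75) values quoted in the review.
    delta_y = np.full((num_fingerprints, num_classes), -alpha / (num_classes - 1))
    target_class = rng.integers(num_classes, size=num_fingerprints)
    delta_y[np.arange(num_fingerprints), target_class] = beta

    # The review states the method is reported to be robust to randomizing
    # the sign of Delta y.
    if randomize_sign:
        delta_y *= rng.choice([-1.0, 1.0], size=(num_fingerprints, 1))

    return delta_x, delta_y

# Example usage (illustrative dimensions):
# delta_x, delta_y = make_fingerprints(10, 784, 10, epsilon=0.05)
```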
Hi, please find below a review submitted by one of the reviewers: Score: 6
Hi, please find below a review submitted by one of the reviewers: Score: 4 [Code] [Communication with original authors] [Hyperparameter Search] [Discussion on results] [Overall organization and clarity]
This report can be improved in many ways, both in terms of visuals and in terms of content.
#85