30 marks:
- 8, 8, 8 for generating the reports for homeworks 7, 8, 9
- 6 marks for a presentation on the last homework to the class
For each of the following tasks, create a sub-directory (CODE7, CODE8, CODE9, respectively) that includes:
- All related python code
- A README.md file containing your report.
- Sub-directories CODEx/img for images; CODEx/data for data.
Using some URL shortener (e.g. goo.gl), shorten the URL to hw/code/x
and paste it into the usual submission page.
For each of the following, write a Markdown README.md file (1500+ words). Note that there will be some overlap in the content of these reports. Feel free to "borrow" text from your earlier reports for your later reports, ideally improving the borrowed bits each time.
Your report must:
- Explain all the algorithms it discusses (so I know that you know what is going on in them).
- Succinctly describe the results: not 1000 tables, but key insights relating to a small number of key displayed results.
- Use no figure unless it is discussed in the text.
- Claim no result in the text unless there is an associated figure.
Your report should be a technical report. For a list of sections of those reports, plus some tips, see
Two key parts of the reports are "threats to validity" and "future work". For notes on those sections, please see:
- threats to validity
- future work. Note that the better your future work section, the better your mark. In the future work section, we can really see what you learned from the experience of this work: what strange quirks you observed and what insights those quirks may bring.
Can you learn when enough is enough? If the code runs in "eras" of, say, 100 evals per era, how can you test, at era X, that no future improvement is expected?
- Type1: just compare median performance scores (design challenge: what performance score to use?)
- Type2: use effect size tests to judge if the performance scores are not changing (design challenge: what threshold should you use for a "small effect"?)
- Type3: use effect size + a hypothesis test (bootstrap) to judge that there is no improvement (design challenge: how many bootstraps to use?)
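For concreteness, here is a minimal sketch of what the three checks might look like, assuming each era is summarized as a list of performance scores and that smaller scores are better; the thresholds, the pooled bootstrap, and all function names here are illustrative assumptions, not a prescribed implementation.

```python
import random
import statistics

def type1_no_change(old, new, tiny=0.01):
    # Type1: the median score has stopped improving (smaller is assumed better;
    # flip the comparison if bigger scores are better for your measure).
    return statistics.median(new) >= statistics.median(old) - tiny

def a12(xs, ys):
    # Vargha-Delaney A12: probability that a value from xs is larger than one from ys.
    more = same = 0
    for x in xs:
        for y in ys:
            if x > y:
                more += 1
            elif x == y:
                same += 1
    return (more + 0.5 * same) / (len(xs) * len(ys))

def type2_no_change(old, new, small=0.56):
    # Type2: the effect size between eras is "small" (0.56 is one common threshold;
    # choosing it is part of the design challenge).
    return abs(a12(new, old) - 0.5) < small - 0.5

def bootstrap_same(old, new, b=500, conf=0.05):
    # Type3 helper: pooled bootstrap test of "no difference in means".
    obs = abs(statistics.mean(old) - statistics.mean(new))
    pool = list(old) + list(new)
    hits = 0
    for _ in range(b):
        x = [random.choice(pool) for _ in old]
        y = [random.choice(pool) for _ in new]
        if abs(statistics.mean(x) - statistics.mean(y)) >= obs:
            hits += 1
    return hits / b >= conf          # True: cannot reject "no difference"

def type3_no_change(old, new):
    # Type3: stop only if the effect is small AND the bootstrap sees no difference.
    return type2_no_change(old, new) and bootstrap_same(old, new)
```

Note that Type3 costs roughly b times more comparisons than Type1; that trade-off feeds directly into the runtime question below.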
For DE, MWS, and SA, code up the Type1, Type2, Type3 comparison operators and use them to:
- Find the final era computed by DE, MWS, SA (with early termination).
- Compute the cdom loss numbers between era0 and the final era (see the cdom sketch below).
- Important implementation note: repeat the above with 20 different baseline populations. For each baseline, run DE, MWS, SA.
Apply the above to DTLZ7 with 2 objectives and 10 decisions.
Also comment on the runtime implications (if any) of using bootstrap.
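For reference, here is a minimal sketch of a Zitzler-style continuous domination (cdom) loss that could drive the era0 vs final-era comparison; the 0..1 normalization, the minimize-everything assumption, and the era_improvement summary are illustrative choices, not the required design.

```python
import math

def cdom_loss(x, y):
    # Continuous-domination loss of candidate x relative to candidate y
    # (objectives assumed normalized to 0..1 and all minimized).
    n = len(x)
    return sum(-math.exp((yj - xj) / n) for xj, yj in zip(x, y)) / n

def cdom_better(x, y):
    # x cdom-dominates y if x loses less to y than y loses to x.
    return cdom_loss(x, y) < cdom_loss(y, x)

def era_improvement(era0, era_final):
    # One (hypothetical) way to summarize the gain: the fraction of final-era
    # candidates that cdom-dominate their era0 counterparts.
    wins = sum(cdom_better(b, a) for a, b in zip(era0, era_final))
    return wins / len(era0)
```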
Extend the GA given to you in class such that it implements two variants of the NSGA-II algorithm described in class (coming next week); a sketch of both appears after this task:
- Bdom with cuboid sorting
- Cdom
On 20 repeats of DTLZ1,3,5,7, with 2,4,6,8 objectives and 10,20,40 decisions, use the stats.py file given to you in class to comment on the value of cdom vs bdom+cuboid.
The usual design challenges are left to you (but explain how you addressed them in your report):
- what performance measure?
- what values for cross-over, mutation, pop size, #generations?
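As noted above, here is a minimal sketch of the two comparison schemes (cdom itself was sketched in the previous task), assuming candidates are lists of objective scores and all objectives are minimized; treat it as a starting point rather than the required implementation.

```python
def bdom(x, y):
    # Binary domination (minimizing): x is no worse on every objective
    # and strictly better on at least one.
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def cuboid_sort(frontier):
    # NSGA-II cuboid (crowding distance) sorting within one non-dominated frontier:
    # candidates bounding the largest empty cuboid are kept first.
    n, m = len(frontier), len(frontier[0])
    dist = [0.0] * n
    for j in range(m):
        order = sorted(range(n), key=lambda i: frontier[i][j])
        lo, hi = frontier[order[0]][j], frontier[order[-1]][j]
        dist[order[0]] = dist[order[-1]] = float("inf")     # always keep the boundaries
        for k in range(1, n - 1):
            dist[order[k]] += (frontier[order[k + 1]][j] -
                               frontier[order[k - 1]][j]) / (hi - lo + 1e-32)
    return sorted(range(n), key=lambda i: -dist[i])          # most isolated first
```

Cuboid sorting only breaks ties inside one non-dominated frontier; the bdom variant still needs the usual non-dominated sorting pass around it.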
Use DE to tune your defaults for GAs (a DE sketch appears at the end of this task). Create at least three options each for mutation, crossover, selection, number of candidates, and number of generations. See if standard DE (with default control settings) can improve on the scores seen in Code9.
Use the statistical machinery discussed in class (Scott-Knott, a12, bootstrap) to decide if any of the tuned or untuned GAs are best for DTLZ1,3,5,7 with 2,4,6,8 objectives and 10,20,40 decisions. Remember to repeat your runs of tuned vs untuned using the same baseline populations. Use the stats.py file given to you in class to test if your tunings are useful. Also comment on the computational cost of tuning.
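To make the tuning loop concrete, here is a minimal sketch of standard DE/rand/1/bin driving the search over GA control parameters; PARAMS, the ranges, and run_ga are hypothetical placeholders to be replaced by calls into your own GA and models.

```python
import random

# (name, low, high) for each GA control parameter being tuned (example ranges only;
# in a real run, cast pop_size and generations to int before handing them to the GA)
PARAMS = [("crossover", 0.1, 0.9), ("mutation", 0.01, 0.5),
          ("pop_size", 20, 200), ("generations", 20, 200)]

def run_ga(setting):
    # Hypothetical objective: replace with a call that runs your GA with these
    # settings on a DTLZ model and returns a score to minimize (e.g. a cdom loss).
    return sum((v - lo) / (hi - lo) for v, (_, lo, hi) in zip(setting, PARAMS))

def de_tune(np=20, f=0.75, cr=0.3, eras=10):
    # Standard DE/rand/1/bin over the parameter ranges above, minimizing run_ga.
    lo = [p[1] for p in PARAMS]
    hi = [p[2] for p in PARAMS]
    frontier = [[random.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(np)]
    scores = [run_ga(x) for x in frontier]
    for _ in range(eras):
        for i in range(np):
            a, b, c = random.sample([x for k, x in enumerate(frontier) if k != i], 3)
            new = list(frontier[i])
            for j in range(len(PARAMS)):
                if random.random() < cr:
                    new[j] = max(lo[j], min(hi[j], a[j] + f * (b[j] - c[j])))
            score = run_ga(new)
            if score < scores[i]:                            # greedy replacement
                frontier[i], scores[i] = new, score
    best = min(range(np), key=lambda i: scores[i])
    return dict(zip([p[0] for p in PARAMS], frontier[best])), scores[best]

if __name__ == "__main__":
    settings, score = de_tune()
    print(settings, score)
```

Counting the run_ga calls made by this loop (np initial evaluations plus np per era) is one simple way to report the computational cost of tuning.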