Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List datasets #10

Merged
merged 8 commits into from
Jul 19, 2021
Merged

List datasets #10

merged 8 commits into from
Jul 19, 2021

Conversation

jbrea
Copy link
Collaborator

@jbrea jbrea commented Jul 9, 2021

This is just a helper to browse the datasets easily.

@ablaom
Copy link
Member

ablaom commented Jul 13, 2021

@jbrea Thanks for looking into this. Please see my comments re DataFrames as dependency in your other PR #9 which I looked at first.

Does this PR branch off the other?

This is just a helper to browse the datasets easily.

Is this to browse the available data sets at OpenML, or to inspect a particular data set? Could you explain a bit more. Maybe give an example?

@jbrea
Copy link
Collaborator Author

jbrea commented Jul 13, 2021

Thanks for your feedback, @ablaom!

Things should be fixed now. Dependency on DataFrames is dropped and the PR doesn't branch off the other one anymore.
Additionally I added the function describe_dataset. See doc strings.

```
"""
describe_dataset(id) = Text(load_Dataset_Description(id)["data_set_description"]["description"])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of wrapping the string (?) in Text(...), how about applying Markdown.parse(...) to get proper markdown display? You will need to add using Markdown.

Screen Shot 2021-07-14 at 11 02 57 AM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks. I changed this in the latest commit.

Copy link
Member

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool. Thanks for this extension!

@ablaom
Copy link
Member

ablaom commented Jul 13, 2021

Now that you reference it in the "public" docstring for list_datasets, I think the docstring for load_List_And_Filter needs improving (I realize now that you did not write this). For example, I am struggling a bit. I tried

MLJOpenML.list_datasets("/tag/OpenML100/number of instances/<1000")

before realising "number of instances" should have no spaces. If I remove the spaces, I get a different error, a bug perhaps?

julia> MLJOpenML.list_datasets("/tag/OpenML100/numberinstances/<1000")
Error occurred : HTTP.ExceptionRequest.StatusError(400, "GET", "/api/v1/json/data/list//tag/OpenML100/numberinstances/<1000", HTTP.Messages.Response:
"""
HTTP/1.1 400 Bad Request
Date: Tue, 13 Jul 2021 23:38:38 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Length: 1134
Connection: close
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
<style type="text/css">

::selection { background-color: #E13300; color: white; }
::-moz-selection { background-color: #E13300; color: white; }

body {
        background-color: #fff;
        margin: 40px;
        font: 13px/20px normal Helvetica, Arial, sans-serif;
        color: #4F5155;
}

a {
        color: #003399;
        background-color: transparent;
        font-weight: normal;
}

h1 {
        color: #444;
        background-color: transparent;
        border-bottom: 1px solid #D0D0D0;
        font-size: 19px;
        font-weight: normal;
        margin: 0 0 14px 0;
        padding: 14px 15px 10px 15px;
}

code {
        font-family: Consolas, Monaco, Courier New, Courier, monospace;
        font-size: 12px;
        background-color: #f9f9f9;
        border: 1px solid #D0D0D0;
        color: #002166;
        display: block;
        margin: 14px 0 14px 0;
        padding: 12px 10px 12px 10px;
}

#container {
        margin: 10px;
        border: 1px solid #D0D0D0;
        box-shadow: 0 0 8px #D0D0D0;
}

p {
        margin: 12px 15px 12px 15px;
}
</style>
</head>
<body>
        <di

1134-byte body
""")
ERROR: MethodError: no method matching getindex(::Nothing, ::String)
Stacktrace:
 [1] list_datasets(filter::String; api_key::String, output_format::Type{NamedTuple})
   @ MLJOpenML ~/Dropbox/Julia7/MLJ/MLJOpenML/src/openml.jl:280
 [2] list_datasets(filter::String)
   @ MLJOpenML ~/Dropbox/Julia7/MLJ/MLJOpenML/src/openml.jl:279
 [3] top-level scope
   @ REPL[71]:1

Also, what is the meaning of "data/list" in "'/data/list/{filter}/{value}/{filter}/{value}/..." .

@jbrea Would you be willing to give this doc-string some attention?

@jbrea
Copy link
Collaborator Author

jbrea commented Jul 14, 2021

I think that docstring is taken pretty much as is from the API description, which, in turn, I find rather cryptic.
So I'm not sure I can really improve this docstring.
Maybe the best would be to always return the full list and have the user filter the data set using query tools in Julia?
Otherwise we could try to mimic the python API.

@codecov-commenter
Copy link

codecov-commenter commented Jul 14, 2021

Codecov Report

Merging #10 (e0f6ba7) into dev (dd720c8) will decrease coverage by 11.80%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##              dev      #10       +/-   ##
===========================================
- Coverage   54.46%   42.65%   -11.81%     
===========================================
  Files           1        1               
  Lines         112      143       +31     
===========================================
  Hits           61       61               
- Misses         51       82       +31     
Impacted Files Coverage Δ
src/openml.jl 42.65% <0.00%> (-11.81%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd720c8...e0f6ba7. Read the comment docs.

@ablaom
Copy link
Member

ablaom commented Jul 15, 2021

I think that docstring is taken pretty much as is from the API description, which, in turn, I find rather cryptic.

Ahh. In that case, rather than duplicating poor documentation, let's provide a link to the poor documentation. Ie, have the list_datasets doc-string refer to the API description not the load_List_And_Filterdoc-string .

Maybe add a more detailed example that you know works.

@jbrea
Copy link
Collaborator Author

jbrea commented Jul 17, 2021

OK, this should be ready to be merged.

@ablaom ablaom mentioned this pull request Jul 18, 2021
@ablaom ablaom merged commit e0f6ba7 into JuliaAI:dev Jul 19, 2021
This was referenced Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants