New lesson improvement #173

Merged
merged 6 commits into from
Dec 11, 2017

@pitviper6 pitviper6 commented Nov 26, 2017

I've broken up Episode 7 into more granular episodes and have renumbered the episodes accordingly (there are now 13 episodes in the OR lesson)

I've kept the old Using Transformations doc in case we need the original - don't know where we should put it?

I've added a bit more time to the Transformation lesson estimates. It was originally 20 minutes of teaching and 40 minutes of exercises (or checklists). As I've rewritten them, I added 5 minutes to each total.

I've pulled out the lesson on Exports so it stands alone but it needs a checklist or exercise.

I've transformed most of the exercises into checklists for all of the OpenRefine episodes.

I've also created grayed boxes for any text that is a button or menu location for all of the OpenRefine episodes.

ccronje commented Nov 28, 2017

@pitviper6 thanks for all your work on these issues! Happy to merge.

@ostephens ostephens left a comment

Overall this is really good - I love the split of the transformations episode into the finer grained episodes - it works really well.

I've made a number of comments - I've been a bit picky (sorry) - but overall I want to say this is brilliant. Thanks for putting in all this effort.

>5. Ensure the first row is used to create the column headings by checking the box `Parse next 1 line(s) as column headers`
>6. Make sure the `Parse cell text into numbers, dates, ...` box is not checked, so OpenRefine doesn't try to automatically detect numbers
>7. Once you are happy click the `Create Project >>` button at the top right of the screen. This will create the project and open it for you. Projects are saved as you work on them, there is no need to save copies as you go along.
>1. Locate the file which you have downloaded called `doaj-article-sample.csv`
Contributor

What's the reasoning for changing the steps here?

Contributor Author

That... is super weird. It looked fine in Jekyll before I did the pull request. I'll fix this lesson. It's been a while, what's the best procedure for making changes in a pull request? Do the changes locally and then do a re-pull?

Contributor

I worked on this episode a few months ago but wasn't worried about it reverting back, it's easy enough to update ;-) I've updated a PR before by opening the file in a new tab and editing directly in GitHub, which allows you to commit at the bottom of the page - it then appears in the PR. To do it locally try this https://stackoverflow.com/questions/9790448/how-to-update-a-pull-request-from-forked-repo

Contributor

@pitviper6 I did wonder if something had gone askew with the version somewhere. To update the PR you just commit more changes to the branch in your fork, and those commits will automatically appear in this PR.

@ccronje if you think we can revert any changes in this PR that have occurred by accident, then I'm OK with merging this then tidying - I just don't want to miss anything.

### Going Further
Contributor

What's the reasoning behind removing the exercise here?

Contributor Author

It shouldn't have been removed (see above - things somehow got weird between the last time I checked the layout in Jekyll and the pull request). When I fix this I'll re-add the exercise.

Contributor Author

I don't even know what I thought I was correcting or changing in lesson #2. Let's not merge that one!

@@ -29,23 +29,23 @@ OpenRefine only displays a limited number of rows of data at one time. You can a
Most options to work with data in OpenRefine are accessed from drop down menus at the top of the data columns. When you select an option in a particular column (e.g. to make a change to the data), it will affect all the cells in that column. If you want to make changes across several columns, you will need to do this one column at a time.

## Rows and Records
-OpenRefine has two modes of viewing data 'Rows' and 'Records'. At the moment we are in Rows mode, where each row represents a single record in the data set - in this case, an article. In Records mode, OpenRefine can link together multiple rows as belonging to the same Record.
+OpenRefine has two modes of viewing data `Rows` and `Records`. At the moment we are in Rows mode, where each row represents a single record in the data set - in this case, an article. In Records mode, OpenRefine can link together multiple rows as belonging to the same Record.
Contributor

I don't think the back-tick formatting should be used here because this is about the concepts of Row and Record, whereas the back-tick formatting is usually used to indicate some link or function in the UI

Contributor Author

Agreed - I kept going back and forth with some of these and I don't think I was quite consistent.
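
For anyone reading the Rows/Records passage above, here is a minimal Python sketch of the concept. It assumes that follow-on rows with a blank first column belong to the record started above them. The sample rows, the column names, and the `group_into_records` helper are hypothetical illustrations, not OpenRefine code.

```python
# Hypothetical sample: one article whose two authors sit on separate rows.
rows = [
    {"Title": "Article A", "Author": "Jones, Andrew"},
    {"Title": "", "Author": "Davis, S."},
    {"Title": "Article B", "Author": "Smith, J."},
]


def group_into_records(rows, key_column):
    """Link rows with a blank key column to the record started above them."""
    records = []
    for row in rows:
        if row[key_column] or not records:
            records.append([row])    # a non-blank key starts a new record
        else:
            records[-1].append(row)  # a blank key continues the current record
    return records


records = group_into_records(rows, "Title")
print(len(rows), "rows grouped into", len(records), "records")
```

In Rows mode each of the three rows would be treated on its own; in this sketch of Records mode the first two rows travel together as one article.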


### Choosing a good separator

The value that separates multi-valued cells is called a separator or delimiter. Choosing a good
-separator is important. In the examples, we've seen the pipe character (\|) has been used.
+separator is important. In the examples, we've seen the pipe character ("\|") has been used.
Contributor

I don't think we need parentheses and inverted commas here, happy with either one or the other

Contributor Author

I think I like the parens, since there are so many inverted commas and apostrophes lurking around, so I'll change them all to parens across the lessons.


Choosing the wrong separator can lead to problems. Consider the following multi-valued Author example, with a pipe as a separator.
```
Jones, Andrew | Davis, S.
```

-When we tell OpenRefine to split this cell on the pipe (\|), we will get the following two authors each in their own cell since there is a single pipe character separating them.
+When we tell OpenRefine to split this cell on the pipe ("\|"), we will get the following two authors each in their own cell since there is a single pipe character separating them.
Contributor

I don't think we need parentheses and inverted commas here, happy with either one or the other
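
To make the separator point in the lesson text concrete, here is a small Python sketch that splits the example cell on a pipe and, for contrast, on a comma. The `split_cell` helper is hypothetical, and trimming the surrounding whitespace is an assumption of this sketch rather than a statement about what OpenRefine does.

```python
cell = "Jones, Andrew | Davis, S."


def split_cell(cell, separator):
    """Split a multi-valued cell and trim whitespace around each value."""
    return [part.strip() for part in cell.split(separator)]


by_pipe = split_cell(cell, "|")    # separator that never occurs inside a name
by_comma = split_cell(cell, ",")   # separator that also occurs inside each name

print(by_pipe)
print(by_comma)
```

Splitting on the pipe yields the two authors; splitting on the comma breaks each name apart, which is the kind of problem a poorly chosen separator causes.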


The 'Clusters' are created automatically according to an algorithm. OpenRefine supports a number of different clustering algorithms - some experimentation may be required to see which clustering algorithm works best with any particular set of data, and you may find that using different algorithms highlights different clusters.

For more information on the methods used to create Clusters, see [https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth)

-For each cluster, you have the option of 'merging' the values together - that is, replace the various inconsistent values with a single consistent value. By default, OpenRefine uses the most common value in the cluster as the new value, but you can select another value by clicking the value itself, or you can simply type the desired value into the 'New Cell Value' box.
+For each cluster, you have the option of 'merging' the values together - that is, replace them with a single consistent value. By default, OpenRefine uses the most common value in the cluster as the new value, but you can select another value by clicking the value itself, or you can simply type the desired value into the 'New Cell Value' box.
Contributor

This sentence was changed for clarity following tutor feedback, I think we need to leave it as it is

Contributor Author

Weird, I didn't edit any of the language beyond fixing typos and I did do an upstream grab of the repo to my local before editing. It seems like I ended up working with earlier files? Eeep, I hope not.
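
As a rough illustration of the clustering and merging behaviour described in the lesson text, here is a Python sketch of a simplified key-collision approach. It is a stand-in for illustration only, not OpenRefine's actual implementation. The `fingerprint`, `cluster`, and `merge_value` helpers and the sample publisher names are hypothetical.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Build a simplified key: ASCII-fold, lowercase, drop punctuation, sort unique tokens."""
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    cleaned = folded.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a key; keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


def merge_value(group):
    """Pick the most common value in the cluster as the default new value."""
    return Counter(group).most_common(1)[0][0]


# Hypothetical sample data with inconsistent spellings of one publisher.
publishers = ["MDPI AG", "MDPI  AG", "mdpi ag.", "MDPI AG", "Hindawi"]
clusters = cluster(publishers)
for group in clusters:
    print(sorted(set(group)), "->", merge_value(group))
```

A different key function would group the values differently, which is why trying more than one clustering method can surface different clusters.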

>* Try changing the clustering method being used - which ones work well?
{: .challenge}

>1. Split out the author names into individual cells using `Edit cells -> Split multi-valued cells`, using the pipe "\|" character as the separator
Contributor

Another mention of the pipe character - we should be consistent in how we present this across all episodes

>4. On the Date column dropdown select ```Edit column->Add column based on this column```. Using this function you can create a new column, while preserving the old column
>5. In the 'New column name' type "Formatted Date"
>6. In the 'Expression' box type the GREL expression ```value.toString("dd MMMM yyyy")```
{: .checklist}
Contributor

Why has the final bit of this exercise been removed? Has it gone somewhere else?

Contributor Author

Yes - I split this exercise between the boolean lesson and the arrays lesson. So the missing bit is in the next lesson. Does that make sense to you guys?
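
For reference, the pattern in the GREL expression `value.toString("dd MMMM yyyy")` asks for a two-digit day, the full month name, and a four-digit year. A rough Python analogue is sketched below. The sample date is hypothetical, and the month name assumes Python's default (English) locale settings.

```python
from datetime import date

# Hypothetical sample value standing in for a cell in the Date column.
sample = date(2017, 11, 26)

# %d = two-digit day, %B = full month name, %Y = four-digit year,
# roughly mirroring "dd MMMM yyyy" in the GREL expression.
formatted = sample.strftime("%d %B %Y")
print(formatted)
```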


Reconciliation services can be more sophisticated and often quicker than using the method described above to retrieve data from a URL. However, to use the ‘Reconciliation’ function in OpenRefine requires the external resource to support the necessary service for OpenRefine to work with, which means unless the service you wish to use supports such a service you cannot use the ‘Reconciliation’ approach.

-There are a few services where you can find an OpenRefine Reconciliation option available. For example Wikidata has a reconciliation service at [https://tools.wmflabs.org/openrefine-wikidata/en/api](https://tools.wmflabs.org/openrefine-wikidata/en/api).
+There are a few services where you can find an OpenRefine Reconciliation option available. For example WikiData has a (fledgling) reconciliation service at [https://tools.wmflabs.org/wikidata-reconcile/](https://tools.wmflabs.org/wikidata-reconcile/).
Contributor

I'd prefer to not say 'fledgling' here - this will change over time and I think it would be better to avoid something we need to update or make a judgement call on

Contributor Author

Not my edit! I agree, I'd remove fledgling and will do so.

Contributor

The other thing is the URL - I think the best URL to use is now https://tools.wmflabs.org/openrefine-wikidata/ (this is a new change - I'm just flagging here because I just noticed it!)
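
For context on what a reconciliation endpoint receives, here is a hedged Python sketch that only builds a request URL and sends nothing. It assumes the service accepts a `queries` parameter holding a JSON object of keyed queries, as reconciliation services generally do. The endpoint is the one quoted in the lesson text and may have moved. The helper names and sample terms are hypothetical.

```python
import json
from urllib.parse import urlencode

# Endpoint as quoted in the lesson text; treat it as an assumption that may be out of date.
ENDPOINT = "https://tools.wmflabs.org/openrefine-wikidata/en/api"


def build_queries(terms, limit=3):
    """Create one keyed query per term (q0, q1, ...)."""
    return {f"q{i}": {"query": term, "limit": limit} for i, term in enumerate(terms)}


def build_reconciliation_url(endpoint, terms, limit=3):
    """Encode the keyed queries as JSON in the `queries` parameter."""
    return endpoint + "?" + urlencode({"queries": json.dumps(build_queries(terms, limit))})


# Hypothetical sample terms, e.g. values from a publisher column.
url = build_reconciliation_url(ENDPOINT, ["MDPI AG", "Hindawi"])
print(url)
```

OpenRefine builds and sends these requests itself once a service is added, so the sketch is only meant to show why the external resource has to support the service.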

@@ -71,11 +71,11 @@ The next exercise demonstrates this two stage process in full.
{: .challenge}

## Reconciliation services
-Reconciliation services allow you to lookup terms from your data in OpenRefine against external services, and use values from the external services in your data. The official wiki provides [detailed information about this feature](https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation).
+Reconciliation services allow you to lookup terms from your data in OpenRefine against external services, and use values from the external services in your data.
Contributor

What's the reasoning behind removing this link here?

Contributor Author

Hmm, I didn't remove anything consciously. Again, I wonder if the upstream I did didn't work, although I got all the right messages.

@pitviper6
Contributor Author

@ccronje @ostephens AAAARGH. Years ago, I created a GitHub user account for a class and promptly forgot about it (Juliane666). Now it seems to be messing me up completely. It's popping up and I really don't understand why, and I'm afraid to delete it... So that's why there are these rogue commits, and I don't know why my terminal is suddenly using that account instead of the pitviper6 name I've been using forever.

re-added the word 'publisher'
replaced text that was accidentally deleted.
Removed 'fledgling', updated URLs.
@pitviper6
Contributor Author

Ok, I've made all the changes - should I commence merging or do you guys want to take one more look?

@ostephens
Contributor

I'll have a quick look

@ostephens
Contributor

In terms of commits on the wrong username - possibly this? https://help.github.com/articles/why-are-my-commits-linked-to-the-wrong-user/

@pitviper6
Contributor Author

pitviper6 commented Dec 10, 2017 via email

@ccronje
Contributor

ccronje commented Dec 11, 2017

Sorry I'm a bit lost in terms of what will merge from the two GitHub accounts. If you both are happy to merge let's do it and review.

@ostephens
Contributor

OK - let's merge, then sort out any issues

@ostephens ostephens merged commit c37ea53 into data-lessons:gh-pages Dec 11, 2017

3 participants