fixed links
Tixierae committed Dec 5, 2019
1 parent 4964c15 commit bfeda04
Showing 3 changed files with 165 additions and 63 deletions.
106 changes: 79 additions & 27 deletions CNN_IMDB/cnn_imdb.ipynb
@@ -16,7 +16,9 @@
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -60,10 +62,9 @@
"\n",
"print('functions defined')\n",
"\n",
"# make sure you replace the root path with your own!\n",
"path_root = 'C:\\\\Users\\\\mvazirg\\\\Desktop\\\\M2 graph & text\\\\Lab4_deep_learning\\\\final\\\\final\\\\'\n",
"path_root = # fill me \n",
"path_to_IMDB = path_root + 'new_data\\\\'\n",
"path_to_pretrained_wv = 'G:\\\\'\n",
"path_to_pretrained_wv = # fill me\n",
"path_to_plot = path_root\n",
"path_to_save = path_root"
]
@@ -89,7 +90,9 @@
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -130,13 +133,15 @@
"source": [
"## Data loading and further preprocessing\n",
"### Reading and inspection\n",
"Let's read the data. The raw data have already been pre-processed by me in this [script](https://github.com/Tixierae/deep_learning_NLP/blob/master/imdb_preprocess.py). `word_to_index` is a dictionary of word indexes sorted by decreasing frequency across the corpus. It is a 1-based index, as 0 is reserved for zero-padding."
"Let's read the data. The raw data have already been pre-processed by me in this [script](https://github.com/Tixierae/deep_learning_NLP/blob/master/CNN_IMDB/imdb_preprocess_new.py). `word_to_index` is a dictionary of word indexes sorted by decreasing frequency across the corpus. It is a 1-based index, as 0 is reserved for zero-padding."
]
},
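[Editor's note: the 1-based, frequency-ranked `word_to_index` described in the cell above can be sketched in a few lines. This is a hypothetical minimal construction for illustration, not the repository's `imdb_preprocess` code:]

```python
from collections import Counter

def build_word_to_index(docs):
    # Count word occurrences across the whole corpus.
    counts = Counter(w for doc in docs for w in doc.split())
    # Rank words by decreasing frequency; indices start at 1,
    # since 0 is reserved for zero-padding.
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

docs = ["the movie was great", "the plot was thin"]
word_to_index = build_word_to_index(docs)
# "the" and "was" occur twice, so they receive the smallest indices.
```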
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -167,7 +172,9 @@
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -206,13 +213,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can verify what our data look like by inverting the `word_to_index` dictionary and comparing a random review to the real one: [http://www.imdb.com/title/tt0118703/reviews](http://www.imdb.com/title/tt0110695/reviews):"
"We can verify what our data look like by inverting the `word_to_index` dictionary and comparing a random review to the real one: [https://www.imdb.com/review/rw1958804/?ref_=tt_urv](https://www.imdb.com/review/rw1958804/?ref_=tt_urv):"
]
},
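[Editor's note: the inversion-and-decode step the cell above describes can be sketched as follows; the tiny `word_to_index` here is made up for the example:]

```python
def decode(index_seq, word_to_index):
    # Invert the word -> index mapping; index 0 is the padding token.
    index_to_word = {i: w for w, i in word_to_index.items()}
    return ' '.join(index_to_word.get(i, '<UNK>') for i in index_seq if i != 0)

word_to_index = {'the': 1, 'movie': 2, 'was': 3, 'great': 4}
print(decode([1, 2, 3, 4, 0, 0], word_to_index))  # padding is dropped
```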
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -238,7 +247,9 @@
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -270,7 +281,9 @@
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -320,7 +333,9 @@
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -384,15 +399,17 @@
"## Defining CNN\n",
"For efficiency reasons, we will only implement two branches of the following architecture: \n",
"\n",
"<img src=\"https://github.com/Tixierae/deep_learning_NLP/raw/master/cnn_illustration.png\" alt=\"Drawing\" style=\"width: 400px;\"/>\n",
"<img src=\"https://github.com/Tixierae/deep_learning_NLP/raw/master/CNN_IMDB/cnn_illustration.png\" alt=\"Drawing\" style=\"width: 400px;\"/>\n",
"\n",
"By branch, I mean the part of the architecture that corresponds to a given filter size (e.g., the upper red part is one branch)."
]
},
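[Editor's note: a "branch" in the sense of the cell above — one filter size, valid convolution over the word sequence, ReLU, then global max-pooling, with branch outputs concatenated — can be sketched framework-free with NumPy. All shapes, filter sizes, and random values below are made up for illustration; this is not the notebook's Keras model:]

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(X, filters, size):
    # X: (seq_len, emb_dim) embedded document; filters: (n_filt, size * emb_dim).
    seq_len, emb_dim = X.shape
    # Slide a window of `size` words and flatten it (valid convolution).
    windows = np.stack([X[i:i + size].ravel() for i in range(seq_len - size + 1)])
    feature_maps = windows @ filters.T              # (n_windows, n_filt)
    return np.maximum(feature_maps, 0).max(axis=0)  # ReLU + global max-pooling

X = rng.normal(size=(50, 8))  # 50 words, 8-dim embeddings
# Two branches, filter sizes 3 and 4, 16 filters each; concatenate their outputs.
out = np.concatenate([branch(X, rng.normal(size=(16, s * 8)), s) for s in (3, 4)])
print(out.shape)  # one 16-dim vector per branch -> (32,)
```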
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -475,7 +492,9 @@
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -544,7 +563,9 @@
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -570,6 +591,7 @@
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
@@ -640,7 +662,9 @@
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -718,7 +742,9 @@
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -781,7 +807,9 @@
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@@ -888,13 +916,15 @@
"\n",
"The idea is to rank the elements of the input document based on their influence on the prediction. An approximation can be given by the magnitudes of the first-order partial derivatives of the output of the model with respect to each word in the input document. The interpretation is that we identify which words in the document need to be *changed the least to change the class score the most*. The derivatives can be obtained by performing a single back-propagation pass. Note that here, we backpropagate the **class score** and not the loss (like we do during training).\n",
"\n",
"You can view the saliency maps as vector graphics PDFs in the root of the repo: https://github.com/Tixierae/deep_learning_NLP"
"You can view the saliency maps as vector graphics PDFs here: https://github.com/Tixierae/deep_learning_NLP/tree/master/CNN_IMDB"
]
},
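[Editor's note: the gradient-magnitude saliency the cell above describes can be sketched with a toy scoring function. A real model would obtain the gradients with one back-propagation pass of the class score; the finite-difference approximation and the scoring function here are stand-ins chosen only so the example is self-contained:]

```python
import numpy as np

def saliency(score_fn, E, eps=1e-5):
    # Magnitude of d(class score)/d(embedding) for each word, estimated
    # by finite differences (backprop would give this in one pass).
    grads = np.zeros_like(E)
    base = score_fn(E)
    for idx in np.ndindex(E.shape):
        E_plus = E.copy()
        E_plus[idx] += eps
        grads[idx] = (score_fn(E_plus) - base) / eps
    return np.linalg.norm(grads, axis=1)  # one magnitude per word

rng = np.random.default_rng(1)
w = rng.normal(size=4)
E = rng.normal(size=(6, 4))  # document of 6 words, 4-dim embeddings
score = lambda E: float(np.tanh(E @ w).sum())  # toy class score
ranking = np.argsort(-saliency(score, E))      # most influential words first
```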
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
@@ -958,25 +988,47 @@
" fig.savefig(path_to_plot+'saliency_'+str(idx)+'.pdf',bbox_inches='tight')\n",
" fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Thank you for your interest!\n",
"If you found any error or have any suggestion for improvement, please file an issue on GitHub. You can also contact me at `[email protected]`.\n",
"If you use some of the code in this repository in your own work, please cite:\n",
"* bibtex:\n",
"```\n",
"@article{tixier2018notes,\n",
" title={Notes on Deep Learning for NLP},\n",
" author={Tixier, Antoine J-P},\n",
" journal={arXiv preprint arXiv:1808.09772},\n",
" year={2018}\n",
"}\n",
"```\n",
"* plain text:\n",
"```\n",
"Tixier, Antoine J-P. \"Notes on Deep Learning for NLP.\" arXiv preprint arXiv:1808.09772 (2018).\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 2",
"language": "python",
"name": "python3"
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,