forked from udacity/deep-learning

Commit e191c3a (1 parent: 7199964): 13 changed files with 3,349 additions and 0 deletions.
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyzing IMDB Data in Keras"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import numpy as np\n",
"import keras\n",
"from keras.datasets import imdb\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout, Activation\n",
"from keras.preprocessing.text import Tokenizer\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"np.random.seed(42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Loading the data\n", | ||
"This dataset comes preloaded with Keras, so one simple command will get us training and testing data. There is a parameter for how many words we want to look at. We've set it at 1000, but feel free to experiment." | ||
] | ||
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Loading the data (it's preloaded in Keras)\n",
"(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=1000)\n",
"\n",
"print(x_train.shape)\n",
"print(x_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Examining the data\n",
"Notice that the data has been already pre-processed, where all the words have numbers, and the reviews come in as a vector with the words that the review contains. For example, if the word 'the' is the first one in our dictionary, and a review contains the word 'the', then there is a 1 in the corresponding vector.\n", | ||
"\n", | ||
"The output comes as a vector of 1's and 0's, where 1 is a positive sentiment for the review, and 0 is negative." | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x_train[0])\n",
"print(y_train[0])"
]
},
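{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, we can map the integer indices back to words to sanity-check the encoding. This is a minimal sketch assuming the default `imdb.load_data` settings, which reserve the first few indices for special markers, so the lookup below is shifted by 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: decode the first review back into (approximate) words.\n",
"# Assumes imdb.load_data defaults, where indices 0-2 are reserved markers\n",
"# and each word appears in the data as its word_index rank plus 3.\n",
"word_index = imdb.get_word_index()\n",
"index_to_word = {index + 3: word for word, index in word_index.items()}\n",
"print(' '.join(index_to_word.get(i, '?') for i in x_train[0]))"
]
},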
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. One-hot encoding the input and output\n",
"Here, we'll turn the input sequences into (0,1)-vectors of length 1000. For example, if the pre-processed sequence contains the number 14, then the 14th entry of the processed vector will be 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One-hot encoding the input into binary vectors, each of length 1000\n",
"tokenizer = Tokenizer(num_words=1000)\n",
"x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')\n",
"x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')\n",
"print(x_train[0])"
]
},
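{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the encoding concrete, here is a tiny sketch on a made-up sequence: a sequence containing the indices 3 and 14 becomes a length-20 vector with 1's in positions 3 and 14."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration (made-up indices): sequences_to_matrix builds binary vectors\n",
"toy = Tokenizer(num_words=20).sequences_to_matrix([[3, 14, 14]], mode='binary')\n",
"print(toy[0])  # 1.0 at positions 3 and 14, 0.0 everywhere else"
]
},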
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we'll also one-hot encode the output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One-hot encoding the output\n",
"num_classes = 2\n",
"y_train = keras.utils.to_categorical(y_train, num_classes)\n",
"y_test = keras.utils.to_categorical(y_test, num_classes)\n",
"print(y_train.shape)\n",
"print(y_test.shape)"
]
},
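{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a tiny sketch (with made-up labels) of what `to_categorical` does: each label becomes a length-2 indicator vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy example: labels 0, 1, 1 become one-hot rows [1, 0], [0, 1], [0, 1]\n",
"print(keras.utils.to_categorical([0, 1, 1], 2))"
]
},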
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building the model architecture\n", | ||
"Build a model here using sequential. Feel free to experiment with different layers and sizes! Also, experiment adding dropout to reduce overfitting." | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Build the model architecture\n",
"\n",
"# TODO: Compile the model using a loss function and an optimizer.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Training the model\n", | ||
"Run the model here. Experiment with different batch_size, and number of epochs!" | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Run the model. Feel free to experiment with different batch sizes and number of epochs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Evaluating the model\n",
"This will give you the accuracy of the model, as evaluated on the testing set. Can you get something over 85%?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"score = model.evaluate(x_test, y_test, verbose=0)\n",
"print(\"Accuracy: \", score[1])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,81 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyzing IMDB Data in Keras - Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building the model architecture\n",
"Build a model here using Keras's Sequential API. Feel free to experiment with different layers and sizes! Also, experiment with adding dropout to reduce overfitting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Building the model architecture with one hidden layer of 512 units\n",
"model = Sequential()\n", | ||
"model.add(Dense(512, activation='relu', input_dim=1000))\n", | ||
"model.add(Dropout(0.5))\n", | ||
"model.add(Dense(num_classes, activation='softmax'))\n", | ||
"model.summary()\n", | ||
"\n", | ||
"# Compiling the model using categorical_crossentropy loss, and rmsprop optimizer.\n", | ||
"model.compile(loss='categorical_crossentropy',\n", | ||
" optimizer='rmsprop',\n", | ||
" metrics=['accuracy'])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 5. Training the model\n", | ||
"Run the model here. Experiment with different batch_size, and number of epochs!" | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Running and evaluating the model\n",
"hist = model.fit(x_train, y_train,\n",
" batch_size=32,\n",
" epochs=10,\n",
" validation_data=(x_test, y_test),\n",
" verbose=2)"
]
},
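{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Evaluating the model\n",
"Finally, evaluate the trained model on the testing set, mirroring step 6 of the exercise notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Evaluating the model on the testing set\n",
"score = model.evaluate(x_test, y_test, verbose=0)\n",
"print(\"Accuracy: \", score[1])"
]
}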
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}