forked from udacity/deep-learning

Commit e191c3a (1 parent: 7199964): 13 changed files with 3,349 additions and 0 deletions.
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyzing IMDB Data in Keras"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import numpy as np\n",
"import keras\n",
"from keras.datasets import imdb\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout, Activation\n",
"from keras.preprocessing.text import Tokenizer\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"np.random.seed(42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Loading the data\n", | ||
"This dataset comes preloaded with Keras, so one simple command will get us training and testing data. There is a parameter for how many words we want to look at. We've set it at 1000, but feel free to experiment." | ||
] | ||
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Loading the data (it's preloaded in Keras)\n",
"(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=1000)\n",
"\n",
"print(x_train.shape)\n",
"print(x_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Examining the data\n",
"Notice that the data has been already pre-processed, where all the words have numbers, and the reviews come in as a vector with the words that the review contains. For example, if the word 'the' is the first one in our dictionary, and a review contains the word 'the', then there is a 1 in the corresponding vector.\n", | ||
"\n", | ||
"The output comes as a vector of 1's and 0's, where 1 is a positive sentiment for the review, and 0 is negative." | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x_train[0])\n",
"print(y_train[0])"
]
},
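{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, we can map the integer indices back to words to sanity-check the encoding. This is a minimal sketch assuming the default `imdb.load_data` settings, which reserve the first few indices for special markers, so the lookup below is shifted by 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: decode the first review back into (approximate) words.\n",
"# Assumes imdb.load_data defaults, where indices 0-2 are reserved markers\n",
"# and each word appears in the data as its word_index rank plus 3.\n",
"word_index = imdb.get_word_index()\n",
"index_to_word = {index + 3: word for word, index in word_index.items()}\n",
"print(' '.join(index_to_word.get(i, '?') for i in x_train[0]))"
]
},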
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. One-hot encoding the input and output\n",
"Here, we'll turn the input sequences into (0,1)-vectors of length 1000. For example, if the pre-processed sequence contains the number 14, then the 14th entry of the processed vector will be 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One-hot encoding the input into binary vectors, each of length 1000\n",
"tokenizer = Tokenizer(num_words=1000)\n",
"x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')\n",
"x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')\n",
"print(x_train[0])"
]
},
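{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the encoding concrete, here is a tiny sketch on a made-up sequence: a sequence containing the indices 3 and 14 becomes a length-20 vector with 1's in positions 3 and 14."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration (made-up indices): sequences_to_matrix builds binary vectors\n",
"toy = Tokenizer(num_words=20).sequences_to_matrix([[3, 14, 14]], mode='binary')\n",
"print(toy[0])  # 1.0 at positions 3 and 14, 0.0 everywhere else"
]
},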
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we'll also one-hot encode the output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One-hot encoding the output\n",
"num_classes = 2\n",
"y_train = keras.utils.to_categorical(y_train, num_classes)\n",
"y_test = keras.utils.to_categorical(y_test, num_classes)\n",
"print(y_train.shape)\n",
"print(y_test.shape)"
]
},
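{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a tiny sketch (with made-up labels) of what `to_categorical` does: each label becomes a length-2 indicator vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy example: labels 0, 1, 1 become one-hot rows [1, 0], [0, 1], [0, 1]\n",
"print(keras.utils.to_categorical([0, 1, 1], 2))"
]
},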
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building the model architecture\n", | ||
"Build a model here using sequential. Feel free to experiment with different layers and sizes! Also, experiment adding dropout to reduce overfitting." | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Build the model architecture\n",
"\n",
"# TODO: Compile the model using a loss function and an optimizer.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Training the model\n", | ||
"Run the model here. Experiment with different batch_size, and number of epochs!" | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Run the model. Feel free to experiment with different batch sizes and number of epochs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Evaluating the model\n",
"This will give you the accuracy of the model, as evaluated on the testing set. Can you get something over 85%?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"score = model.evaluate(x_test, y_test, verbose=0)\n",
"print(\"Accuracy: \", score[1])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,81 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyzing IMDB Data in Keras - Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building the model architecture\n",
"Build a model here using Keras's Sequential API. Feel free to experiment with different layers and sizes! Also, experiment with adding dropout to reduce overfitting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Building the model architecture with one hidden layer of 512 units\n",
"model = Sequential()\n", | ||
"model.add(Dense(512, activation='relu', input_dim=1000))\n", | ||
"model.add(Dropout(0.5))\n", | ||
"model.add(Dense(num_classes, activation='softmax'))\n", | ||
"model.summary()\n", | ||
"\n", | ||
"# Compiling the model using categorical_crossentropy loss, and rmsprop optimizer.\n", | ||
"model.compile(loss='categorical_crossentropy',\n", | ||
" optimizer='rmsprop',\n", | ||
" metrics=['accuracy'])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 5. Training the model\n", | ||
"Run the model here. Experiment with different batch_size, and number of epochs!" | ||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Running and evaluating the model\n",
"hist = model.fit(x_train, y_train,\n",
" batch_size=32,\n",
" epochs=10,\n",
" validation_data=(x_test, y_test),\n",
" verbose=2)"
]
},
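{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Evaluating the model\n",
"Finally, evaluate the trained model on the testing set, mirroring step 6 of the exercise notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Evaluating the model on the testing set\n",
"score = model.evaluate(x_test, y_test, verbose=0)\n",
"print(\"Accuracy: \", score[1])"
]
}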
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}