What is a name? Intro to Long Short-Term Memory Networks
Today I want to talk about names. Mine has been around since the Old Testament, but recently celebrities have been getting pretty creative with what they call their kids. A few weeks ago, my team was assigned a project parsing resumes for new hires, and when I realized that some students actually don't put their name at the top of the page, I thought this would be an interesting problem to solve with a neural net. That is today's mission: to distinguish names from regular words, and eventually, to distinguish male names from female names.
Long Short-Term Memory networks (LSTMs) are a type of recurrent neural network that are great for this kind of application; they use context from previous input features to predict what the next output should be. For example, if you wanted to predict the end of the sentence "in France they speak ___" and used each (tokenized) word of the sentence as a feature, the LSTM would use "France" and "speak" to deduce that the sentence ends in "French". This assumes you have a network that's already been trained on an extensive corpus of the English language, such as the GloVe word vectors. That's exactly what spaCy is: a Natural Language Processing library that comes pre-trained on a general English-language corpus and uses sentence context to do things like classify words as adjectives, proper nouns, and so on. Unfortunately, without sentence context, spaCy comes up short. Take this resume header:
john smith
JohnSmith68@gmail.com
My mission is to better the world and get paid
Feel free to install spaCy (easiest through pip) and test out some sentences yourself. Disclaimer: be extra careful with word capitalization, because it makes a big difference in what spaCy recognizes.
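If you want to see the problem for yourself, here is a minimal sketch of that experiment (my own illustration, not code from this post); it assumes the small English model en_core_web_sm is installed:

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# The same header, with and without capitalization
for text in ["john smith", "John Smith"]:
    doc = nlp(text)
    ents = [(ent.text, ent.label_) for ent in doc.ents]
    print(text, "->", ents)

# Lower-cased names often come back with no PERSON entity at all,
# which is exactly the problem with resume headers typed in lowercase.
```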
I started with a few datasets found on Kaggle, a really great platform that's responsible for getting a bunch of people into the space of competitive data science. The datasets I used were a list of male names, a list of female names, and a list of the most common internet words. The first thing I did was load them into Python as DataFrame objects and do a little bit of cleanup.
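Roughly, the loading and cleanup step might look like the sketch below; the file names are placeholders for the Kaggle datasets, and the original notebook's exact cleanup may differ:

```python
import pandas as pd

# Placeholder file names for the Kaggle datasets
male = pd.read_csv("male_names.csv", names=["word"])
female = pd.read_csv("female_names.csv", names=["word"])
words = pd.read_csv("common_internet_words.csv", names=["word"])

# Lowercase and de-duplicate so the model can't lean on capitalization
names = pd.concat([male, female], ignore_index=True)
names["word"] = names["word"].str.lower().str.strip()
words["word"] = words["word"].str.lower().str.strip()
names = names.drop_duplicates()
words = words.drop_duplicates()

# Target: 1 means "name", 0 means "internet word"
names["target"] = 1
words["target"] = 0

df = (pd.concat([names, words], ignore_index=True)
        .sample(frac=1, random_state=0)
        .reset_index(drop=True))
print(df.head())
```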
Here a target value of 1 means "name" and 0 means "internet word".
Neural networks are bad at working with sentences but great at working with vectors, which is why we'll continue our preprocessing by tokenizing each character and converting each word into a vector of integers. For example, if we create the mapping 'd' = 1, 'o' = 2, 'g' = 3, the word "dog" will (almost) show up to our network as <1,2,3>. I say almost for two reasons: first, because it's important to give our LSTM vectors of constant size, we pad each vector with 0's until they're all the same length. Second, before we reach the LSTM layer of our network, we pass our input through an Embedding layer, which turns our wasteful vectors (so many wasteful 0's) into dense vectors of higher dimensionality. Without wasting more time, I give you Deep Learning:
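Here is a rough sketch of the preprocessing and the model described below. The variable names (max_features, max_word_len, batch_size) follow the prose, but the LSTM width and the exact layer ordering are my assumptions rather than the post's exact code:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, Activation
from keras.preprocessing.sequence import pad_sequences

# Character-level tokenization: map every character in the data to an integer,
# reserving 0 for padding
chars = sorted({c for w in df["word"] for c in w})
char_to_int = {c: i + 1 for i, c in enumerate(chars)}

max_features = len(char_to_int) + 1              # number of distinct token values
max_word_len = int(df["word"].str.len().max())   # longest word sets the padded length

X = pad_sequences([[char_to_int[c] for c in w] for w in df["word"]],
                  maxlen=max_word_len)
y = df["target"].values

batch_size = 32

model = Sequential()
model.add(Embedding(max_features, 128, input_length=max_word_len))
model.add(Dropout(0.2))            # drop a fraction of activations to fight over-fitting
model.add(LSTM(128))               # 128 units is an assumption
model.add(Dense(1))                # collapse the LSTM outputs to a single scalar
model.add(Activation("sigmoid"))   # squash that scalar into a 0-1 score
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```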
So that was a lot, let's walk it back a little bit.

batch_size refers to how many training rows (not epochs) you process before you update the parameter weights. An epoch is a complete pass through all of the training data.

Our Embedding layer takes in vectors of size max_word_len, with max_features distinct values, and outputs dense vectors of 128 dimensions each.

Our Dropout layer dictates that during training we randomly drop a percentage of the neurons in our LSTM layer before backpropagation (the part where a neural net goes back and updates its connection weights). The values output by these dropped neurons do not affect the weight changes computed by backpropagation, nor are the dropped neurons updated by those changes. LSTMs are notorious over-fitters, so this helps keep classifications unbiased.

Our Dense layer takes the multiple outputs of our LSTM layer and collapses them into a single scalar value.

Our Sigmoid Activation layer takes that single value and squashes it into a score between 0 and 1, which is great for binary classification.
Now we split our data into train and test sets and fit the model, specifying a total of 10 epochs and designating 33% of our training data to be used as a holdout/validation set. Holdout cross-validation isn't ideal, but we'll talk about that later. Below we've plotted the loss vs. epoch.
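As a sketch, the split and fit could look like this (the 80/20 test split and the plotting details are my assumptions; the 10 epochs and 33% validation split come from the text):

```python
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Keep the raw word strings alongside the padded vectors so we can inspect
# individual predictions later
X_train, X_test, y_train, y_test, words_train, words_test = train_test_split(
    X, y, df["word"].values, test_size=0.2, random_state=0)

history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=10,
                    validation_split=0.33)

# Training vs. validation loss per epoch
plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="validation")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```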
As you can see, after each epoch our training and validation loss both go down, which is what we love to see. Here you can see the sweet spot for training and validation loss, where validation error is at a global minimum.

Our graph hasn't leveled out at the end, which means our model is still relatively under-fitted. We'll talk more about that later and what we can do to correct it.
Let's use common metrics and visualizations to determine how well our model performed. We create a DataFrame with all our info and add a few new columns. rounded_y_pred takes the sigmoid score and rounds it to either 1 or 0, since our y_true values are always exactly 1 or 0 and never in between.
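A minimal sketch of that evaluation, using scikit-learn's metrics (the DataFrame column names follow the prose; words_test is the held-out word strings from the split above):

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, log_loss

y_pred = model.predict(X_test).ravel()      # raw sigmoid scores in [0, 1]

results = pd.DataFrame({"word": words_test,
                        "y_true": y_test,
                        "y_pred": y_pred})
results["rounded_y_pred"] = results["y_pred"].round().astype(int)

print("accuracy score:", accuracy_score(results["y_true"], results["rounded_y_pred"]))
print("f1 score:", f1_score(results["y_true"], results["rounded_y_pred"]))
print("log loss score:", log_loss(results["y_true"], results["y_pred"]))
```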
accuracy score: 0.706134564644
f1 score: 0.693076128143
log loss score: 0.652990719739
F1 gives us a score based on the counts of true/false positives and true/false negatives, which is very informative for binary classification.

Accuracy gives us the fraction of classifications that were correct.

Log Loss goes a bit deeper. If our sigmoid score is 0.51 for the word "Annabelle", that means our model is 51% sure that word is a name: a hesitant yes, but a yes at the end of the day. This shows up as a perfect classification as far as F1 and Accuracy are concerned. Log Loss incorporates the uncertainty, since ideally we want our model to give a score as close to 1 as possible.
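To make that concrete, here's a tiny illustration (the numbers are chosen for the example, not taken from the model):

```python
import numpy as np

# Log loss for a single positive example is -ln(p): a hesitant 0.51 and a
# confident 0.99 are both "correct" to accuracy, but not to log loss.
for p in (0.51, 0.99):
    print(p, "->", round(-np.log(p), 3))
# 0.51 -> 0.673
# 0.99 -> 0.01
```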
You might be confused about the difference between accuracy and F1 score, especially considering we got similar values for both. Accuracy works well when you assign the same "penalty" to false positives (FP) and false negatives (FN), but this is not always the case. Imagine you work for NASA: you'd much rather get a false alarm that oxygen levels are dangerously low than get no alarm at all when they really are. For a more in-depth explanation, look up Precision and Recall and how they relate to F1.

While our scores aren't horrible, I think we can do better.

One of the hardest parts of getting a well-performing neural net is tuning the hyperparameters just right. While parameters are the weights used to favor certain connections over others, hyperparameters dictate how you're going to fit your model. Tuning the hyperparameters results in more accurate parameters, which gives us better accuracy. Here we wrap our neural network in a KerasClassifier object and hand it to GridSearchCV, which tries every possible combination of the hyperparameters we pass in. There are many more hyperparameters, but these are some of the most common ones to tune. As an added benefit, GridSearchCV employs k-fold cross-validation instead of holdout validation, which means every one of our training points eventually gets used both for training and for validation. If k = 3, we perform training 3 times, each time using a different two-thirds of our data for training and the remaining third for validation.
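A sketch of that setup using the standard KerasClassifier + GridSearchCV pattern; the grid values below are placeholders (the post's actual grid isn't shown), but they cover the hyperparameters discussed here:

```python
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(optimizer="adam", dropout=0.2, embed_dimensions=128):
    m = Sequential()
    m.add(Embedding(max_features, embed_dimensions, input_length=max_word_len))
    m.add(Dropout(dropout))
    m.add(LSTM(128))
    m.add(Dense(1, activation="sigmoid"))
    m.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return m

# Placeholder grid; every combination below gets trained and cross-validated
param_grid = {
    "optimizer": ["adam", "rmsprop"],
    "dropout": [0.2, 0.5],
    "embed_dimensions": [128, 256],
    "batch_size": [32, 64],
    "epochs": [10, 18],
}

estimator = KerasClassifier(build_fn=create_model, verbose=0)
grid = GridSearchCV(estimator, param_grid=param_grid, cv=3)  # 3-fold CV
grid_result = grid.fit(X_train, y_train)
print(grid_result.best_score_, grid_result.best_params_)
```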
Sure enough, grid_result tells us we get our best accuracy with:

Optimizer: Adam
Dropout: 0.2
embed_dimensions: 256
batch_size: 32
Epochs: 18
Disclaimer: have your computer plugged in if you're going to try this. I ran this overnight and it was still running when I woke up.

Now let's take a look at our updated loss graph.
As you can see, we're much closer to that sweet spot that's circled above. Thanks to GridSearchCV, our loss after the very first epoch is already lower than the loss our initial hyperparameters reached by their final epoch. Now let's check out our accuracy.
accuracy score: 0.822559366755
f1 score: 0.821381142098
log loss score: 0.315463500042

As you can see, we improved accuracy and F1 by more than 10 points each and cut our log loss roughly in half.
Exploratory Data Analysis (EDA) is an important part of any data science problem. Rarely do you get perfect classification on your first try. Once you understand what types of problems your model is bad at, you can attack the problem at the source.
Below I've visualized each prediction's sigmoid score, where green dots are points that are correctly classified and red points are incorrectly classified. Points on the left side of the graph received a sigmoid score under 0.5, meaning our model predicted they are words, while points on the right side are predicted as names. The worst thing we could see is a bunch of red points at either extreme. Here I'm graphing the sigmoid score against word length and vowel:consonant ratio to see if either of those features affects classification.
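One of these plots might be produced along these lines (the vowel:consonant helper is my own illustration, not the post's code):

```python
def vowel_consonant_ratio(word):
    vowels = sum(c in "aeiou" for c in word)
    consonants = sum(c.isalpha() and c not in "aeiou" for c in word)
    return vowels / max(consonants, 1)

results["correct"] = results["y_true"] == results["rounded_y_pred"]
results["vc_ratio"] = results["word"].apply(vowel_consonant_ratio)

# Green = correctly classified, red = incorrectly classified
colors = results["correct"].map({True: "green", False: "red"})
plt.scatter(results["y_pred"], results["vc_ratio"], c=colors, s=8)
plt.axvline(0.5, linestyle="--")      # the decision boundary
plt.xlabel("sigmoid score")
plt.ylabel("vowel:consonant ratio")
plt.show()
```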
Unfortunately there doesn't appear to be any correlation between these features and the score. However, I think this is a great opportunity to point something out. Looking at the graph above, we have a bunch of green points located at both X-axis extremes; this is great, because it means that not only did we correctly classify both words and names, but often we were VERY sure it was a name (sigmoid score near 1) or a word (sigmoid score near 0). Additionally, most of our red dots (incorrectly classified) are located near the 0.5 barrier, which means that although we got them wrong, our model never claimed to be sure about its classification. Let's compare this to the same graph we got before we tuned our hyperparameters.
As you can see, our original model had plenty of correctly classified words and names, but all our green points gravitated towards the middle barrier. Out of the thousands of words we passed in, there wasn't a single one that our model was even 80% sure about. I know we already mentioned that our accuracy went up a few percentage points after hyperparameter tuning, but I think this does a better job of showing exactly how much better a model we ended up with.
We ended up with a model that can distinguish between names and non-names with an accuracy of about 82% and a log loss bordering 0.3. Ideally it would have performed a little better, but I have to cut it some slack. Names are a tough domain, and most of the time I'm not even sure how I feel about specific ones. Liz Lemon has plenty of opinions, though:

https://www.youtube.com/watch?v=-2hvM-FNOEA
What if we wanted to go one step further and determine not just whether it's a name, but whether it's a boy's name or a girl's name? A few things change. First off, we modify our pandas data wrangling. Here we use pandas' get_dummies to one-hot encode our targets. If there are X possible values, one-hot encoding creates X new features, where each row is populated by all 0's except for a single 1 designating its class.
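For instance, a toy illustration with made-up label names (not the post's actual code):

```python
import pandas as pd

labels = pd.Series(["boy_name", "girl_name", "internet_word", "girl_name"])

# One column per distinct label; each row has exactly one "hot" value
# (shown as 0/1 or False/True depending on your pandas version)
print(pd.get_dummies(labels))
```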
We can no longer assign a target of 1 or 0 because it's no longer a yes/no question. Similarly, a single sigmoid score in the range 0-1 doesn't tell us much. Why can't we just split the sigmoid range into [0, .33), [.33, .66), and [.66, 1]? Because final scores are highly dependent on the gradient at your exact point on the activation function. Sigmoid has a steep curve for X values in [-2, 2], which means that in that region small changes in X result in huge changes in Y. This means the function naturally gravitates towards either end of the curve.
Instead we will use a softmax function, the multi-class generalization of sigmoid: it assigns a probability score to every class, and all the probabilities add up to 1. Here is the code for our new three-way classification problem.
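Here is a hedged sketch of what that three-way model can look like; it mirrors the binary model except for the final layer and loss, and the layer sizes remain assumptions:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout

model3 = Sequential()
model3.add(Embedding(max_features, 256, input_length=max_word_len))
model3.add(Dropout(0.2))
model3.add(LSTM(128))
model3.add(Dense(3, activation="softmax"))   # one probability per class
model3.compile(loss="categorical_crossentropy", optimizer="adam",
               metrics=["accuracy"])

# y_train_onehot: the pd.get_dummies targets (one row per word, one column
# per class), split into train/test the same way as before
model3.fit(X_train, y_train_onehot, batch_size=32, epochs=18, validation_split=0.33)
```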
Notice that we changed our Dense(1) layer to Dense(3). That's because before we wanted to pass a single scalar to our sigmoid function, but now we want to pass 3 scores to our softmax function to be converted into probabilities of certainty. Anything different would cause an error complaining about mismatched dimensions. We fit our model and use it to predict values on test data, indicated by the y_true columns. Below are a few examples of how our predictions match up against our true test values.
Woah. That's both impressive and hard to interpret, so let me explain. When we changed our y-values to an array built by pd.get_dummies, we passed in targets where each target is an array of size 3 with exactly one non-zero value. If the non-zero value is in the first, second, or third column, the row represents a boy's name, a girl's name, or an internet word, respectively. Also remember that our softmax function turns the three scores output by our Dense(3) layer into probabilities of certainty that add up to 100%. Taking the first row as an example, our neural network is 18% sure that the word "usually" is a boy's name, 20% sure it's a girl's name, and 62% sure it's an internet word. In softmax world, we go with whichever classification has the highest confidence, which here is resoundingly class 3: an internet word. The "flat y_true" column holds the index of the non-zero value in y_true, which identifies the true class, while "flat y_pred" holds the index of the maximum element in y_pred, which is the predicted classification.
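Those "flat" columns can be derived with a simple argmax, as in this small illustration (the numbers are the example values above):

```python
import numpy as np

y_true = np.array([[0, 0, 1]])            # one-hot truth: internet word
y_pred = np.array([[0.18, 0.20, 0.62]])   # softmax output for "usually"

flat_y_true = y_true.argmax(axis=1)       # -> [2]
flat_y_pred = y_pred.argmax(axis=1)       # -> [2]
print(flat_y_true, flat_y_pred)
```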
Here are some more examples in case you want to continue to pit yourself against a machine.
Pshh… I knew Grissel was a girl's name. Teutonic baby name meaning gray-haired heroine. That name is so hot right now.
Now let's check out our new overall accuracy.
While our overall accuracy went down after switching from two-way to three-way classification, that makes sense. Initially a blind guess had a 50% chance of being correct; now it has a 33% chance, and there are many names that are a genuine toss-up (Taylor, Charlie, Alex, etc.). In fact, let's try some of those out.
First, it's important to ensure that none of these "testing" words appear in our training dataset. Otherwise it's cheating, because our model has already seen them along with the correct label.
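A quick sanity check along these lines works (my own sketch; words_train is the list of training word strings from the earlier split):

```python
candidates = ["charlie", "taylor", "alexander", "charli", "charly",
              "bronx", "journey", "racer", "java"]

train_vocab = set(words_train)
for w in candidates:
    print(w, "in training set:", w in train_vocab)
```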
Now we know that we can't test our model on Charlie, Taylor, or Alexander. Moving forward, let's see what our model (which correctly called Grissel, by the way) tells us about these other names.
You love to see that. Our model correctly identified Charli as a girl's name and Charly as (just barely) a boy's name. Think you could have done better than the model? Two of these are celebrity baby names, two are regular words. Let's see how many you get.

Bronx
Journey
Racer
Java
Referenced below are all the notebooks used in this blog post. Want to try your own words? Check out plug-n-chug-softmax.
And there you have it! Today we learned what it takes to reformat and preprocess your data, pass it through an LSTM network, perform grid-search cross-validation to tune hyperparameters, and evaluate your results. Thank you for reading; if you've gotten this far, you should know I lied: all four of those names belong to real celebrity babies. A special thanks to Domenic Puzio (https://www.linkedin.com/in/domenicpuzio/) for helping me get started, and to machinelearningmastery for covering every topic under the sun.