Playground project to learn about LLMs.
- Python Jupyter Notebook
- PyTorch
- Matplotlib
- Bigram LLM
- The main idea is to count how often each character pair occurs in the text.
- Arrange the counts in a matrix: each row is the first character of a pair and each column is the second character (the character in the column becomes the first character, i.e. the row, of the next pair)
- The probability of a letter following another letter is the count for that character pair divided by the total count of all pairs starting with that letter (row-wise normalization)
- To generate text, repeat in a loop: the column sampled at each step lines up with the starting character of the next pair, so the loop continues on that character's row (see the sketch below)
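A minimal sketch of the count-based bigram model in PyTorch. The toy word list and the `.` start/end-of-word marker are assumptions for illustration; the real notebook builds its vocabulary from its own dataset.

```python
import torch

words = ["hello", "world"]                      # hypothetical toy corpus
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0                                   # '.' marks the start/end of a word
itos = {i: ch for ch, i in stoi.items()}
V = len(stoi)

# Count matrix: rows = first char of the pair, columns = second char
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Row-normalize the counts into probabilities: P[i, j] = P(next char j | current char i)
P = (N + 1).float()                             # +1 smoothing so no row is all zeros
P /= P.sum(dim=1, keepdim=True)

# Sample: the column picked at each step becomes the row index for the next step
g = torch.Generator().manual_seed(42)
ix = 0                                          # start at the '.' token
out = []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:                                 # sampled '.' again -> end of the word
        break
    out.append(itos[ix])
print("".join(out))
```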
- Bigram LLM built with a Neural Network (see the sketch below)
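A rough sketch of the same bigram model as a single linear layer trained with gradient descent. The tiny placeholder tensors `xs`/`ys` (first and second character index of each pair), the vocabulary size, and the learning rate are all assumptions, not values from the notebooks.

```python
import torch
import torch.nn.functional as F

# Placeholder bigram data: xs = first char of each pair, ys = second char
xs = torch.tensor([0, 1, 2, 1])
ys = torch.tensor([1, 2, 0, 1])
V = 3                                            # assumed vocabulary size

W = torch.randn((V, V), requires_grad=True)      # a single linear layer, no bias

for _ in range(100):
    # Forward pass: one-hot encode the inputs, multiply by W, softmax into probabilities
    xenc = F.one_hot(xs, num_classes=V).float()
    logits = xenc @ W
    probs = logits.exp() / logits.exp().sum(1, keepdim=True)
    loss = -probs[torch.arange(len(ys)), ys].log().mean()   # negative log likelihood

    # Backward pass and gradient descent step
    W.grad = None
    loss.backward()
    W.data -= 10.0 * W.grad                      # learning rate chosen arbitrarily here
print(loss.item())
```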
- Neural Networks
- Following Andrej Karpathy's "building micrograd" lecture
- Derivatives
- Notebook
- Back Propagation using the Chain Rule
- A network is made up of inputs, weights, and biases that feed into layers of Neurons
- Loss is calculated after data passes through the layers
- Mean squared error, Max-margin, Cross Entropy Loss, Negative Log Likelihood
- For regression use Mean Squared Error; for classification use Negative Log Likelihood
- A back propagation pass determines the weight/bias adjustments needed to bring the output closer to the target
- Gradient Descent: loop back to running predictions with the updated weights, then repeat the loss calculation, back propagation, and parameter adjustments to continually lower the loss (see the sketch below)
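A minimal training-loop sketch in PyTorch showing forward pass, loss, back propagation, and a gradient descent step. The random regression data, the layer sizes, and the 0.05 learning rate are made up for illustration.

```python
import torch

# Hypothetical tiny regression dataset: 32 samples, 3 features -> 1 target
X = torch.randn(32, 3)
y = torch.randn(32, 1)

model = torch.nn.Sequential(            # layers feed into each other in order
    torch.nn.Linear(3, 4),
    torch.nn.Tanh(),
    torch.nn.Linear(4, 1),
)
loss_fn = torch.nn.MSELoss()            # regression -> Mean Squared Error
                                        # (classification would use e.g. torch.nn.NLLLoss)

for step in range(100):
    pred = model(X)                     # forward pass through the layers
    loss = loss_fn(pred, y)             # how far the predictions are from the targets

    model.zero_grad()
    loss.backward()                     # back propagation via the chain rule fills p.grad

    with torch.no_grad():
        for p in model.parameters():    # gradient descent: nudge weights/biases against the gradient
            p -= 0.05 * p.grad
```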
- $x_n$ : Inputs to the neuron
- $w_n$ : Weights (on the synapses)
- Processing in the Neuron: the set of weights multiplied by their corresponding inputs, plus a bias
- What flows to the neuron are the inputs multiplied by the weights: $w_1 \times x_1, w_2 \times x_2, \ldots, w_n \times x_n$
- Added to this is some bias $b$, which can be used to adjust the sensitivity or "trigger happiness" of the neuron regardless of the input: $$\sum_n w_n x_n + b$$
- The product of the inputs and weights, plus the bias, is piped to an Activation Function
- The Activation Function is usually a squashing function of some kind (Sigmoid, ReLU or Tanh)
- The squashing function squashes the output so that it plateaus and caps smoothly at 1 or -1 (as the inputs are increased or decreased from zero)
- The output of the neuron is the Activation Function applied to the dot product of the weights and inputs plus the bias: $$f\left(\sum_n x_n w_n + b\right)$$
- A set of Neurons evaluated independently forms a Layer
- A network with multiple Layers of Neurons
- The Layers feed into each other sequentially (in order); see the sketch below
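A plain-Python sketch of this Neuron/Layer/MLP structure, mirroring micrograd's classes but using raw floats (forward pass only, no gradient tracking), so the names and sizes here are illustrative.

```python
import math
import random

class Neuron:
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]  # one weight per input
        self.b = random.uniform(-1, 1)                        # bias: the neuron's "trigger happiness"

    def __call__(self, x):
        # f(sum_n w_n * x_n + b), with tanh as the squashing activation
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return math.tanh(act)

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]     # a set of Neurons evaluated independently

    def __call__(self, x):
        return [n(x) for n in self.neurons]

class MLP:
    def __init__(self, nin, nouts):
        sizes = [nin] + nouts
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]

    def __call__(self, x):
        for layer in self.layers:                             # layers feed into each other in order
            x = layer(x)
        return x

net = MLP(3, [4, 4, 1])
print(net([2.0, 3.0, -1.0]))
```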
- In practice, for very large datasets, batching is used: a smaller random subset of the data is taken and used for each forward and backward pass
- See Andrej Karpathy's micrograd demo for example code:
```python
import numpy as np

# X and y are the full dataset (inputs and targets), defined earlier in the demo
def loss(batch_size=None):
    if batch_size is None:
        Xb, yb = X, y                                        # use the whole dataset
    else:
        ri = np.random.permutation(X.shape[0])[:batch_size]  # random row indices
        Xb, yb = X[ri], y[ri]                                # mini-batch for this pass
    # ...the demo then runs the forward pass on Xb and scores it against yb
```