
Learning about LLMs

Playground project to learn about LLMs.

Sources:

Prerequisites:

  • Python (Jupyter Notebook)
  • PyTorch
  • Matplotlib

Table of Contents

LLMs

  • Bigram LLM
    • The main idea is to count the character pairs (bigrams) that occur in the text.
    • Arrange the counts in a matrix indexed by character pairs: each row is the first character of a pair and each column is the second character.
    • The probability of a character following another character comes from the counts in that character's row (count of the pair / total count of pairs starting with that character).
    • To generate text, repeat the process: the column sampled for the current pair becomes the row index for the next character, so the loop continues on that row (see the code sketch after this list).
  • Bigram LLM built with a Neural Network
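
A minimal sketch of the counting approach described above (a sketch only: it assumes the training text is a small list of lowercase words in a hypothetical `words` variable; with real data you would load a corpus instead):

```python
import torch

# Hypothetical tiny dataset of lowercase words.
words = ["emma", "olivia", "ava"]

# Vocabulary: the characters in the data plus '.' as a start/end token.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

# Count matrix: row = first char of the pair, column = second char of the pair.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    padded = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(padded, padded[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Probability of a letter following another letter:
# count of the pair in the row / total count of pairs starting with that letter.
P = N.float()
P /= P.sum(dim=1, keepdim=True)

# Sampling loop: the sampled column becomes the row index of the next step.
g = torch.Generator().manual_seed(2147483647)
ix = 0  # start token '.'
out = []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break  # end token '.'
    out.append(itos[ix])
print("".join(out))
```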

Neural Networks

Architecture of a Neural Network

  • Made up of inputs, weights, and biases that feed into layers of Neurons
  • Loss is calculated after data passes through the layers
    • Mean squared error, Max-margin, Cross Entropy Loss, Negative Log Likelihood
    • For regression, use Mean squared error; for classification, use Negative Log Likelihood
  • A backpropagation pass determines the weight/bias adjustments needed to move the output closer to the target
  • Gradient Descent: loop back to running predictions with the updated weights, then repeat the loss calculation, backpropagation, and parameter adjustments to continually lower the loss (see the sketch after this list)
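
A minimal sketch of that loop in PyTorch (a sketch under assumed conditions: a tiny synthetic regression dataset and a single linear layer; the names and sizes are illustrative, not part of this repo):

```python
import torch
import torch.nn as nn

# Illustrative synthetic regression data.
X = torch.randn(100, 3)
y = X @ torch.tensor([1.5, -2.0, 0.5]) + 0.1 * torch.randn(100)

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()  # mean squared error, since this is regression
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(200):
    pred = model(X).squeeze(1)   # forward pass through the layer(s)
    loss = loss_fn(pred, y)      # loss calculated after data passes through
    optimizer.zero_grad()
    loss.backward()              # backpropagation: gradients for every weight/bias
    optimizer.step()             # gradient descent step: adjust parameters to lower the loss

print(f"final loss: {loss.item():.4f}")
```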

Primary Components of a Neuron:

Visual Model of a Neuron

  • $x_n$: Inputs to the neuron

  • $w_n$: Weights (on the synapses)

  • Processing in the Neuron: each weight multiplied by its corresponding input, summed together with a bias

    • What flows into the neuron are the inputs multiplied by their weights: $w_1 \times x_1, w_2 \times x_2, \ldots, w_n \times x_n$

    • Added to this is some bias $b$, which can be used to adjust the sensitivity or "trigger happiness" of the neuron regardless of the input. $$\sum_n w_n x_n + b$$

    • This weighted sum of the inputs plus the bias is piped to an Activation Function

      • The Activation Function is usually a squashing function of some kind (Sigmoid, ReLU or Tanh)

      • The squashing function squashes the output so that it plateaus and caps smoothly at 1 or -1 as the input grows far above or below zero:

        tanh function

    • The output of the neuron is the Activation function applied to the dot product of the weights and inputs plus the bias (see the sketch below): $$f\left(\sum_n x_n w_n + b\right)$$
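
A minimal sketch of a single neuron with tanh as the squashing function (the input, weight, and bias values below are made up for illustration):

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])    # inputs x_n
w = torch.tensor([0.5, 0.25, -0.1])   # weights w_n (one per synapse)
b = torch.tensor(0.3)                 # bias b: shifts the neuron's "trigger happiness"

pre_activation = (w * x).sum() + b    # sum_n w_n * x_n + b
out = torch.tanh(pre_activation)      # activation squashes the result into (-1, 1)
print(out.item())
```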

Layer of Neurons

Python Notebook

  • A set of Neurons evaluated independently

    • Neurons within a layer are not connected to each other, but each is connected to all neurons or inputs in the adjacent layers.

      Neuron Layers
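
A minimal sketch of a layer, assuming `torch.nn.Linear` followed by tanh stands in for a set of neurons that all see the same inputs but are evaluated independently:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 4)      # 4 neurons, each with its own weights over the same 3 inputs
x = torch.randn(3)
out = torch.tanh(layer(x))   # each of the 4 outputs is computed independently
print(out.shape)             # torch.Size([4])
```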

Multi-Layer Perceptron (MLP)

Python Notebook

  • A network with multiple Layers of Neurons
  • The Layers feed into each other sequentially (in order)
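
A minimal MLP sketch in PyTorch; the layer sizes (3 → 4 → 4 → 1) are arbitrary and only meant to show the layers feeding into each other in order:

```python
import torch
import torch.nn as nn

# Layers are applied sequentially: 3 inputs -> two hidden layers of 4 -> 1 output.
mlp = nn.Sequential(
    nn.Linear(3, 4), nn.Tanh(),
    nn.Linear(4, 4), nn.Tanh(),
    nn.Linear(4, 1),
)

x = torch.randn(5, 3)    # a batch of 5 examples with 3 features each
print(mlp(x).shape)      # torch.Size([5, 1])
```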

Note on Large Datasets

  • In practice, for very large datasets, batching is used: a smaller random subset of the data is taken and used for each forward and backward pass
  • See Andrej Karpathy's micrograd demo for example code:
import numpy as np

# From the micrograd demo: optionally sample a random mini-batch of the data
def loss(batch_size=None):
    if batch_size is None:
        Xb, yb = X, y  # use the full dataset
    else:
        # pick `batch_size` random rows for this forward/backward pass
        ri = np.random.permutation(X.shape[0])[:batch_size]
        Xb, yb = X[ri], y[ri]
    # ... the demo then runs the forward pass and computes the loss on (Xb, yb)
