
A playground project for learning about and experimenting with LLMs


Learning about LLMs

Playground project to learn about LLMs.



  • Python Jupyter Notebook
  • PyTorch
  • Matplotlib

Table of Contents


  • Bigram LLM
    • The main idea is to count how often each pair of consecutive characters occurs in the text.
    • Arrange the pair counts in a 2D table: the row is the first character of the pair and the column is the character that follows it.
    • Turn each row into probabilities of one letter following another: divide each pair count in the row by the total number of pairs that start with that row's letter.
    • To generate text, sample a column from the current row's probabilities; the sampled character is also the row index of the next pair, so the loop repeats on that row.
  • Bigram LLM built with a Neural Network
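
The count-based steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: the notebook presumably works with PyTorch tensors, whereas this uses dictionaries, and the toy word list, the `.` start/end marker, and the helper names `build_bigram_probs` and `sample_word` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_bigram_probs(words):
    """Count character pairs, then normalize each row into probabilities."""
    counts = defaultdict(Counter)  # counts[first][second] = number of times the pair occurs
    for w in words:
        chars = ["."] + list(w) + ["."]  # hypothetical convention: "." marks word start/end
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())  # total number of pairs that start with `a`
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

def sample_word(probs, rng):
    """Sample the next char from the current row until the end marker is drawn."""
    out, cur = [], "."
    while True:
        row = probs[cur]
        cur = rng.choices(list(row), weights=list(row.values()))[0]
        if cur == ".":
            return "".join(out)
        out.append(cur)  # the sampled column becomes the next row

probs = build_bigram_probs(["emma", "olivia", "ava"])
print(probs["a"])  # row for "a": how likely each character is to follow "a"
print(sample_word(probs, random.Random(0)))
```

Each row of `probs` sums to 1, which is what lets the sampler treat it as a probability distribution over the next character.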

Neural Networks

Architecture of a Neural Network

  • Made up of inputs, weights, and biases that feed into layers of neurons
  • A loss is computed after the data passes forward through the layers, measuring how far the outputs are from the targets
    • Common choices: mean squared error, max-margin, cross-entropy loss, negative log likelihood
    • Typically, use mean squared error for regression and negative log likelihood (cross-entropy) for classification
  • A backpropagation pass computes the gradient of the loss with respect to every weight and bias, which shows how each one should be adjusted to move the output closer to the target
  • Gradient descent: nudge each parameter a small step opposite to its gradient, then loop back to the forward pass with the updated weights and repeat the loss, backpropagation, and update steps to keep lowering the loss
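
A minimal sketch of the forward pass, loss, backward pass, and update loop described above. It assumes a toy one-weight linear regression so the gradients can be written out by hand (the notebooks presumably rely on autograd instead); `mse`, `nll`, `train_linear`, and the toy data are hypothetical.

```python
import math

def mse(preds, targets):
    """Mean squared error: a typical regression loss."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def nll(probs_of_correct_class):
    """Average negative log likelihood of the probabilities assigned to the correct classes."""
    return -sum(math.log(p) for p in probs_of_correct_class) / len(probs_of_correct_class)

def train_linear(xs, ys, lr=0.05, steps=1000):
    """Fit y = w*x + b with full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # forward pass
        preds = [w * x + b for x in xs]
        # backward pass: hand-derived gradients of the MSE with respect to w and b
        dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        db = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # gradient descent update: step against the gradient
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = train_linear(xs, ys)
print(w, b, mse([w * x + b for x in xs], ys))
```

The learning rate is kept small on purpose: too large a step makes the updates overshoot and the loss grows instead of shrinking.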

Primary Components of a Neuron:

Visual Model of a Neuron

  • $x_n$: Inputs to the neuron

  • $w_n$: Weights (on the synapses)

  • Processing in the neuron: each input is multiplied by its weight, the products are summed, and a bias is added

    • What flows into the neuron are the inputs multiplied by their corresponding weights: $w_1 \times x_1, w_2 \times x_2, \ldots, w_n \times x_n$

    • These products are summed and a bias $b$ is added, which can be used to adjust the sensitivity or "trigger happiness" of the neuron regardless of the input. $$\sum_n w_n x_n + b$$

    • This weighted sum plus the bias is passed to an activation function

      • Common activation functions are sigmoid, tanh, and ReLU; sigmoid and tanh are "squashing" functions

      • A squashing function makes the output plateau smoothly as the input moves away from zero: tanh approaches 1 for large positive inputs and -1 for large negative inputs (sigmoid approaches 1 and 0):

        tanh function

    • The output of the neuron is the activation function applied to the dot product of the weights and inputs, plus the bias: $$f\left(\sum_n x_n w_n + b\right)$$
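
The formula above can be checked with a few lines of Python. This is a minimal sketch that assumes tanh as the activation $f$; the function name `neuron` and the sample inputs are hypothetical.

```python
import math

def neuron(xs, ws, b):
    """Output of a single neuron: f(sum_n x_n * w_n + b) with f = tanh."""
    z = sum(x * w for x, w in zip(xs, ws)) + b  # weighted sum of inputs plus bias
    return math.tanh(z)                          # squash the sum into the range [-1, 1]

# z = 1.0*0.5 + (-2.0)*0.25 + 0.1 = 0.1
print(neuron([1.0, -2.0], [0.5, 0.25], 0.1))
# large positive/negative sums saturate near 1 and -1
print(neuron([100.0], [1.0], 0.0), neuron([-100.0], [1.0], 0.0))
```

Changing only `b` shifts where the tanh curve crosses zero, which is the "trigger happiness" adjustment described above.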

Layer of Neurons

Python Notebook

  • A set of neurons that are evaluated independently on the same inputs

    • Neurons within a layer are not connected to each other, but each one is connected to every neuron (or input) in the adjacent layers.

      Neuron Layers
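
A layer can be sketched as a list of independent neurons that all receive the same inputs. This is a minimal plain-float sketch without autograd, assuming tanh activations and uniform random initialization in [-1, 1]; the `Neuron` and `Layer` class names are hypothetical here and are not taken from the notebook.

```python
import math
import random

class Neuron:
    def __init__(self, n_in, rng):
        # assumed initialization: weights and bias drawn uniformly from [-1, 1]
        self.w = [rng.uniform(-1, 1) for _ in range(n_in)]
        self.b = rng.uniform(-1, 1)

    def __call__(self, xs):
        return math.tanh(sum(w * x for w, x in zip(self.w, xs)) + self.b)

class Layer:
    """n_out independent neurons, each connected to all n_in inputs."""
    def __init__(self, n_in, n_out, rng):
        self.neurons = [Neuron(n_in, rng) for _ in range(n_out)]

    def __call__(self, xs):
        # every neuron sees the same inputs; neurons in a layer never feed each other
        return [n(xs) for n in self.neurons]

layer = Layer(3, 4, random.Random(42))
print(layer([1.0, -0.5, 2.0]))  # four outputs, one per neuron
```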

Multi-Layer Perceptron (MLP)

Python Notebook

  • A network with multiple layers of neurons
  • The layers feed into each other sequentially: the outputs of one layer become the inputs to the next
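
A minimal MLP forward pass, assuming tanh on every layer and uniform random initialization (micrograd's own MLP, for comparison, leaves its last layer linear). The `MLP(n_in, n_outs)` shape loosely mirrors the micrograd convention; `make_layer`, `layer_forward`, and the layer sizes are hypothetical.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Weights W[j][i] and biases b[j] for n_out neurons, each fed by all n_in inputs."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return W, b

def layer_forward(layer, xs):
    W, b = layer
    return [math.tanh(sum(w * x for w, x in zip(row, xs)) + bj) for row, bj in zip(W, b)]

class MLP:
    def __init__(self, n_in, n_outs, rng):
        sizes = [n_in] + n_outs  # e.g. 3 -> 4 -> 4 -> 1
        self.layers = [make_layer(sizes[i], sizes[i + 1], rng) for i in range(len(n_outs))]

    def __call__(self, xs):
        for layer in self.layers:
            xs = layer_forward(layer, xs)  # this layer's outputs are the next layer's inputs
        return xs

mlp = MLP(3, [4, 4, 1], random.Random(0))
print(mlp([2.0, 3.0, -1.0]))  # a single output from the final one-neuron layer
```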

Note on Large Datasets

  • In practice, for very large datasets, training uses mini-batches: each step takes a smaller random subset of the data and uses it for the forward and backward pass
  • See Andrej Karpathy's micrograd demo for example code
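```python
import numpy as np

def loss(batch_size=None):
    # X, y: the full dataset arrays, defined elsewhere in the demo
    if batch_size is None:
        Xb, yb = X, y  # no batching: use every example
    else:
        # random mini-batch: shuffle the row indices and keep the first batch_size
        ri = np.random.permutation(X.shape[0])[:batch_size]
        Xb, yb = X[ri], y[ri]
    # ... the forward pass and loss on (Xb, yb) continue in the full demo
```
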
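
The same mini-batching idea in a self-contained form: a minimal sketch of mini-batch gradient descent, assuming a toy linear model and MSE loss rather than the demo's own model and loss. `minibatch_sgd` and its defaults are hypothetical; `rng.sample` plays the role of `np.random.permutation(...)[:batch_size]` in the demo snippet.

```python
import random

def minibatch_sgd(xs, ys, batch_size=2, lr=0.05, steps=3000, seed=0):
    """Fit y = w*x + b by gradient descent on random mini-batches (MSE loss)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(steps):
        # pick batch_size distinct examples for this step
        batch = rng.sample(idx, batch_size)
        # gradients of the MSE over the batch only
        dw = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / batch_size
        db = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / batch_size
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
print(minibatch_sgd(xs, ys))
```

Each step only touches `batch_size` examples, so a step is cheaper than a full pass; the trade-off is a noisier gradient, which is why more steps are used than in full-batch training.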