Home

Hi, my name is Luke and you've found my website.

I am fascinated by language and enjoy working on things that make it faster and easier for others to disseminate and understand technical information.

Here are some things you can look at.

The picture on the right is from when I went crazy one night with marshmallows and sticks in order to study the associahedron in \(\mathbb{R}^3\) for my undergrad thesis.

Transformer From Scratch In PyTorch: Model

The Transformer architecture, first introduced in (Vaswani et al., 2017), is an encoder-decoder model that can be used in many supervised sequence learning scenarios. The Transformer's success is primarily due to its performance, simple architecture, and its ability to process input in parallel, which drastically speeds up training. This contrasts with earlier sequence learning models, such as recurrent neural networks, which process elements of a sequence one at a time.

In this post, we'll build the Transformer model from scratch in PyTorch with an emphasis on modularity and performance. Note that in our implementation, we will be following the Pre-Layer Normalization version of the Transformer.
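As a taste of the pre-LN residual pattern the post follows, here is a minimal sketch of one encoder block; the dimensions and names are illustrative, not the post's actual code:

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """One encoder block in the Pre-Layer Normalization style:
    LayerNorm is applied *before* each sublayer, inside the residual branch."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        a = self.ln1(x)
        x = x + self.attn(a, a, a)[0]   # residual around self-attention
        x = x + self.ff(self.ln2(x))    # residual around feed-forward
        return x

x = torch.randn(2, 10, 64)            # (batch, seq, d_model)
print(PreLNBlock()(x).shape)          # torch.Size([2, 10, 64])
```

Normalizing before each sublayer (rather than after, as in the original paper) tends to make training more stable, which is why we follow it in the post.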

Testing Cheatsheet for Python

To successfully manage complex software systems, engineers need to write tests, because every change you make to an existing, working software system carries some probability of introducing a bug. Unit tests are a very simple line of defense against this, and good tests exercise all possible code paths, relieving developers of the mental load of guarding against trivial errors.
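As a tiny illustration of tests covering every code path, here is a made-up function with three paths and pytest-style tests (plain `assert` statements in `test_*` functions) that hit each one:

```python
def clamp(x, lo, hi):
    """Clamp x into the inclusive range [lo, hi]."""
    if x < lo:       # path 1: below the range
        return lo
    if x > hi:       # path 2: above the range
        return hi
    return x         # path 3: already inside

# With pytest, plain assert statements in test_* functions are enough.
def test_clamp_below():
    assert clamp(-5, 0, 10) == 0

def test_clamp_above():
    assert clamp(15, 0, 10) == 10

def test_clamp_inside():
    assert clamp(7, 0, 10) == 7
```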

Vanishing Gradients in RNNs and LSTMs

Recurrent neural networks (RNNs), which can be thought of as feedforward neural networks with self-connections, perform extremely well for supervised sequence learning and are capable of solving many problems that feedforward neural networks typically cannot solve. However, in the past, RNN training procedures suffered from the vanishing gradient problem. This problem led to the invention of the Long Short-Term Memory (LSTM) model. In this work, we review the vanishing gradient problem for vanilla RNNs, and show how the LSTM is able to address this problem. To do this, we offer closed-form gradient update formulae which allow us to mathematically analyze network loss.
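In symbols (a sketch, using a generic vanilla RNN with hidden state \(h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t)\)), the gradient of the loss with respect to an early hidden state is a product of Jacobians:

\[
\frac{\partial \mathcal{L}}{\partial h_1} = \frac{\partial \mathcal{L}}{\partial h_T} \prod_{t=2}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}} = \mathrm{diag}\!\left(\sigma'(z_t)\right) W_{hh},
\]

where \(z_t\) is the pre-activation. When the norm of each factor is consistently below 1, the product shrinks exponentially with the sequence length \(T\), which is the vanishing gradient problem in a nutshell.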

Recurrent Neural Networks and the Reber Grammar in PyTorch

Recurrent neural networks are a special type of neural network that has been heavily studied for decades for problems involving sequence prediction, which standard feed-forward neural networks tend not to handle well. What makes RNNs different is that they process input sequentially and generate hidden states which are then used in future computations.

Here, we'll offer an overview of RNNs, present an explicit RNN, and then implement an RNN in PyTorch to learn an artificial grammar known as the Reber Grammar. You can find the complete PyTorch code, which we'll also introduce here, in this GitHub gist.
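That recurrence can be sketched in a few lines; this hand-rolled cell is for illustration only (the weights and dimensions are made up), not the post's implementation:

```python
import torch

# A minimal vanilla RNN cell, written out by hand:
# h_t = tanh(x_t W_xh + h_{t-1} W_hh + b).
torch.manual_seed(0)
d_in, d_hidden = 3, 5
W_xh = torch.randn(d_in, d_hidden)
W_hh = torch.randn(d_hidden, d_hidden)
b = torch.zeros(d_hidden)

def rnn_step(x_t, h_prev):
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b)

# Process a sequence one element at a time, carrying the hidden state forward.
xs = torch.randn(7, d_in)      # a sequence of 7 input vectors
h = torch.zeros(d_hidden)
for x_t in xs:
    h = rnn_step(x_t, h)
print(h.shape)                 # torch.Size([5])
```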

Phrase-Based Translation Model and Decoder

While the IBM models were effective at word alignment, they don't make for the best translation models, since using them for translation requires a word-by-word approach, and when it comes to translation, that isn't the best approach.

For starters, a word-by-word approach does not take into account the context that each word appears in. For example, suppose we are a machine that has seen a translation of "Tienes hambre?" and we are now given "Tienes tarea?" Following a naive maximum-likelihood, word-by-word approach, since we know that "Tienes hambre?" -> "Are you hungry?", we might end up with a translation "Tienes tarea?" -> "Are you homework?" when it should be "Do you have homework?". Conversely, if we flipped our examples we might end up translating "Tienes hambre?" -> "Do you have hungry?"
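The failure mode above can be sketched with a toy word-by-word translator; the lookup table and function here are entirely hypothetical:

```python
# A naive translator that maps each source word to its single most likely
# translation, ignoring context. The table below is made up, as if learned
# only from "Tienes hambre?" -> "Are you hungry?".
word_table = {
    "tienes": "are you",
    "hambre": "hungry",
    "tarea": "homework",
}

def naive_translate(sentence):
    words = sentence.lower().rstrip("?").split()
    return " ".join(word_table.get(w, w) for w in words) + "?"

print(naive_translate("Tienes tarea?"))  # are you homework?
```

No amount of per-word probability fixes this; the translation of "tienes" depends on the word next to it, which is exactly what phrase-based models capture.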

IBM Model 1: Statistical Machine Translation and Alignment

In the area of statistical machine translation, the IBM models were a series of models proposed by IBM in the 1990s to perform (1) machine translation and (2) word alignment from one source language into another. While these models are now superseded by more modern neural translation models, the ideas and algorithms introduced by these models were later utilized in other statistical machine translation models and they're of historical importance. Here, we'll discuss the probability theory behind the first model, IBM Model 1, and implement the model in Python.
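To give a flavor of the model, here is a minimal sketch of the EM updates for the Model 1 translation probabilities \(t(f \mid e)\) on a made-up two-sentence corpus; the variable names and data are illustrative:

```python
from collections import defaultdict

# Toy parallel corpus: (foreign sentence, English sentence).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# Uniform initialization of t(f|e).
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # normalizers c(e)
    for fs, es in corpus:        # E-step: collect expected alignment counts
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / z
                count[(f, e)] += delta
                total[e] += delta
    for (f, e) in t:             # M-step: renormalize
        t[(f, e)] = count[(f, e)] / total[e] if total[e] else 0.0

# EM concentrates probability mass: "das" aligns with "the".
print(round(t[("das", "the")], 2))
```

Even on two sentences, the co-occurrence of "das" with "the" in both pairs pushes \(t(\text{das} \mid \text{the})\) toward 1, which is the core intuition behind Model 1.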

DES

I have been reading Applied Cryptography by Bruce Schneier, although not too seriously because it is very outdated. It is still useful because there really aren't many books on cryptography that actually explain implementations. While reading about block ciphers, I found it helpful to actually implement DES in order to understand what Schneier is saying when it comes to design, security, and attacks on encryption algorithms.
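To show the structure DES is built on, here is a toy Feistel network; the round function, keys, and constants below are made up for illustration, and this is not real DES (which uses expansion, S-boxes, permutations, and 16 rounds):

```python
def round_fn(half, key):
    """Hypothetical round function mixing a 32-bit half with a round key."""
    return ((half * 0x9E3779B1) ^ key) & 0xFFFFFFFF

def feistel_encrypt(block, round_keys):
    L, R = block >> 32, block & 0xFFFFFFFF
    for k in round_keys:
        L, R = R, L ^ round_fn(R, k)   # the Feistel swap-and-mix step
    return (R << 32) | L               # final swap, as in DES

def feistel_decrypt(block, round_keys):
    # Decryption is the same procedure with the round keys reversed;
    # this is the defining property of Feistel ciphers, and it holds
    # even when round_fn is not invertible.
    return feistel_encrypt(block, list(reversed(round_keys)))

keys = [0x0F1571C9, 0x47D9E859, 0x0CB7ADD6]
pt = 0x0123456789ABCDEF
ct = feistel_encrypt(pt, keys)
assert feistel_decrypt(ct, keys) == pt
```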

Rational interpolation

I recently uncovered a hidden gem in the literature on rational interpolation: a PhD thesis by Antonio Cosmin Ionita titled Lagrange rational interpolation and its applications to approximation of large-scale dynamical systems. Good resources on rational interpolation are hard to come by, so this one is well worth knowing about.