Training a next-word prediction model from scratch

In the last post we played around with BERT and saw that it predicted words pretty well. To show how much BERT improves word prediction over previous state-of-the-art models, we'll train our own word-prediction model using an older architecture. The model we'll use is an LSTM - we're not going to delve into what these models are or how they work in this post (you can read more about them here).
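
To give a rough sense of what we'll be building, here's a minimal sketch of an LSTM next-word predictor in PyTorch - a sketch only, with placeholder sizes and names rather than the ones used later in this post:

```python
# A minimal sketch of an LSTM next-word predictor (illustrative; the
# vocabulary size and hyperparameters are placeholders, not this post's).
import torch
import torch.nn as nn

class LSTMWordPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embed(token_ids)
        hidden_states, _ = self.lstm(embedded)
        # Score every vocabulary word at each position; the score at
        # position t is the model's prediction for the word at t + 1.
        return self.out(hidden_states)

model = LSTMWordPredictor(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (1, 5)))  # shape: (1, 5, 10000)
```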

Read More

Using pre-trained BERT to predict words in a sentence

BERT is a language model that has shown state-of-the-art results on many natural language tasks (see here for a more in-depth explanation). BERT works by masking certain words in a text, then trying to ‘recover’ those masked words. For example, in the sentence “The cat ate the mouse”, BERT might mask the word ‘mouse’, then try to predict it in the sentence “The cat ate the [MASK]”. We’ll go into more depth on what BERT is and how it works in later posts - in this post we’ll play around with BERT and see how we can use it to predict words in a sentence. To do this, we’ll follow these steps:
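
As a preview before the step-by-step walkthrough, here's a minimal sketch of masked-word prediction with a pre-trained BERT. It uses the Hugging Face transformers library, which is one common way to load BERT; the exact setup in the full post may differ:

```python
# A minimal sketch of masked-word prediction with a pre-trained BERT.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The cat ate the [MASK]", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take the highest-scoring vocabulary word.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_index].argmax().item()
print(tokenizer.decode([predicted_id]))  # the model's top guess for the mask
```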

Read More

Change-Point Models! (with an application in R)

Change-point models are a useful statistical tool for detecting structural changes in time series data. They attempt to answer the question, “are there points in time when my data changes?” The model will not only estimate how the process changed (for example, whether the mean went from 3 to 5), but also when those changes occurred. This is what makes these models so interesting - they quantify both how and when a generative process changes over time.
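
The application later in this post is in R, but the core idea is easy to sketch in a few lines of Python (illustrative only - the simulated data and the `split_cost` helper are made up for this example): simulate a series whose mean jumps, then estimate the change-point as the split that minimizes total within-segment squared error.

```python
# A minimal sketch of single change-point detection: simulate data whose
# mean jumps from 3 to 5 at t = 60, then estimate where the jump occurred.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(3, 1, 60), rng.normal(5, 1, 40)])

def split_cost(x, t):
    # Sum of squared deviations around each segment's own mean.
    left, right = x[:t], x[t:]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

costs = [split_cost(data, t) for t in range(2, len(data) - 2)]
t_hat = int(np.argmin(costs)) + 2

print(f"estimated change-point: t = {t_hat}")  # should land near the true t = 60
print(f"segment means: {data[:t_hat].mean():.2f}, {data[t_hat:].mean():.2f}")
```

Real change-point methods generalize this idea to multiple change-points and to changes in variance or other parameters, which is what we'll explore in R.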

Read More