Predicting Masked Words with BERT (Refreshed for 2025)
Note (May 2025): This post is an updated version of my original 2020 walkthrough of using BERT for masked word prediction. I’ve refreshed the code to use the latest version of Hugging Face Transformers with PyTorch instead of TensorFlow, and clarified the explanation and examples.