Text generation is one of the most fascinating applications of deep learning. With the advent of large language models like GPT-2, we can now generate human-like text that’s coherent, contextually relevant, and surprisingly creative. In this tutorial, you’ll discover how to implement text generation using GPT-2. You’ll learn through hands-on examples that you can run right away, and by the end of this guide, you’ll understand both the theory and practical implementation details.
After completing this tutorial, you will know:
- How GPT-2’s transformer architecture enables sophisticated text generation
- How to implement text generation with different sampling strategies
- How to optimize generation parameters for different use cases
Let’s get started.
Text Generation with GPT-2 Model
Photo by Peter Herrmann. Some rights reserved.
Overview
This tutorial is in four parts; they are:
- The Core Text Generation Implementation
- Contrastive Search: What are the Parameters in Text Generation?
- Batch Processing and Padding
- Tips for Better Generation Results
The Core Text Generation Implementation
Let’s start with a basic implementation that demonstrates the fundamental concept. Below, you are going to create a class that generates text from a given prompt using a pre-trained GPT-2 model. You will extend this class in the subsequent sections of this tutorial.
```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class TextGenerator:
    def __init__(self, model_name='gpt2'):
        """Initialize the text generator with a pre-trained model.

        Args:
            model_name (str): Name of the pre-trained model to use.
                Any of: 'gpt2', 'gpt2-medium', 'gpt2-large'
        """
        self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
        self.model = GPT2LMHeadModel.from_pretrained(model_name)
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.model.to(self.device)

    def generate_text(self, prompt, max_length=100, temperature=0.7, top_k=50, top_p=0.95):
        """Generate text based on the input prompt.

        Args:
            prompt (str): Input text to continue from
            max_length (int): Maximum length of generated text
            temperature (float): Controls randomness in generation
            top_k (int): Number of highest probability tokens to consider
            top_p (float): Cumulative probability threshold for token filtering

        Returns:
            str: Generated text including the prompt
        """
        try:
            # Encode the input prompt
            inputs = self.tokenizer(prompt, return_tensors="pt")
            input_ids = inputs["input_ids"].to(self.device)
            attention_mask = inputs["attention_mask"].to(self.device)

            # Configure generation parameters
            gen_kwargs = {
                "max_length": max_length,
                "temperature": temperature,
                "top_k": top_k,
                "top_p": top_p,
                "pad_token_id": self.tokenizer.eos_token_id,
                "no_repeat_ngram_size": 2,
                "do_sample": True,
            }

            # Generate text
            with torch.no_grad():
                output_sequences = self.model.generate(
                    input_ids,
                    attention_mask=attention_mask,
                    **gen_kwargs
                )

            # Decode and return the generated text
            generated_text = self.tokenizer.decode(
                output_sequences[0],
                skip_special_tokens=True
            )
            return generated_text
        except Exception as e:
            print(f"Error during text generation: {str(e)}")
            return prompt
```
Let’s break down this implementation.
In this code, you use the GPT2LMHeadModel and GPT2Tokenizer classes from the transformers library to load a pre-trained GPT-2 model and tokenizer. As a user, you don’t even need to understand how GPT-2 works internally. The TextGenerator class wraps both and runs the model on a GPU if one is available. If you haven’t installed the library, you can do so with the pip command:
```
pip install transformers torch
```
In the generate_text method, you handle the core generation process with several important parameters:

- max_length: Controls the maximum length of the generated text
- temperature: Adjusts randomness (higher values = more creative)
- top_k: Limits the vocabulary to the $k$ highest probability tokens
- top_p: Uses nucleus sampling to dynamically limit tokens
Here’s how to use this implementation to generate text:
```python
...
# Create a text generator instance
generator = TextGenerator()

# Example 1: Basic text generation
prompt = "The future of artificial intelligence will"
generated_text = generator.generate_text(prompt)
print(f"Generated text:\n{generated_text}\n")

# Example 2: More creative generation with higher temperature
creative_text = generator.generate_text(
    prompt="Once upon a time",
    temperature=0.9,
    max_length=200
)
print(f"Creative generation:\n{creative_text}\n")

# Example 3: More focused generation with lower temperature
focused_text = generator.generate_text(
    prompt="The benefits of machine learning include",
    temperature=0.5,
    max_length=150
)
print(f"Focused generation:\n{focused_text}\n")
```
The output may be:
Generated text: The future of artificial intelligence will be determined by how much it learns and how it adapts to new situations.
It’s also possible that the future will not be as good as we think. As it stands, we are dealing with AI that is more complex than the human brain. It will do things we have no control over, such as play with computers to find a clue that will allow you to stop a car from moving, for example. But if we can figure out how to use
Creative generation: Once upon a time this has been the case. I was in a similar situation when I took this picture. This has also happened in other situations, as well.
And a note for your reader who has experienced this problem: ‘I would imagine that you have experienced the same problem,’ I don’t know how much longer you’ll continue to take this. Just try to get it to pass through you as often as possible. There is a large amount of negative energy that goes around this and you can try it with your friends, family members and your colleagues. Try to understand it the best you possibly can. You do not have to be super good at it. ‘ -John L. Gossett, A former CIA officer .
Focused generation: The benefits of machine learning include:
Improved accuracy of predictions. . Improved accuracy in predicting the future. Increased understanding of the natural world. More accurate predictions and better prediction of future events. Higher probability of predicting future outcomes. Low risk of error in prediction. Lower risk for error. Optimization of prediction based on data. Inference of data from previous years. Predictions of past years based upon past experience. Better prediction accuracy. A more accurate prediction can be made using a more powerful machine. The benefits include:- – Improved prediction in estimating future changes in the environment. This can reduce the risk that future actions will be wrong. – Improved predictability in forecasting future trends. If you are not able to predict future developments, you
You used three different prompts here, and three strings of text were generated. The model is trivial to use: you pass a prompt string to the generate_text method, which encodes it with the tokenizer and hands the token IDs to the model along with the attention mask. The attention mask is produced by the tokenizer, and for a single, unpadded prompt it is simply a tensor of ones with the same shape as the input.
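If you want to see what the tokenizer actually produces, here is a minimal standalone check (a sketch using the same gpt2 tokenizer as above; the prompt is just an example):

```python
from transformers import GPT2Tokenizer

# Inspect what the tokenizer returns for a single prompt
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
enc = tokenizer("The future of artificial intelligence will", return_tensors="pt")

print(enc["input_ids"])       # token IDs, shape (1, number_of_tokens)
print(enc["attention_mask"])  # all ones, same shape as input_ids
```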
Contrastive Search: What are the Parameters in Text Generation?
If you look at the generate_text method, you will see that there are several parameters passed via gen_kwargs. Some of the most important parameters are top_k, top_p, and temperature. You can see the effect of top_k and top_p by experimenting with different values:
```python
...
generator = TextGenerator()

# Example of sampling effects
prompt = "The scientist discovered"

# Using top-k sampling
top_k_text = generator.generate_text(
    prompt,
    top_k=10,
    top_p=1.0,
    max_length=50
)
print(f"Top-k sampling (k=10):\n{top_k_text}\n")

# Using nucleus (top-p) sampling
nucleus_text = generator.generate_text(
    prompt,
    top_k=0,
    top_p=0.9,
    max_length=50
)
print(f"Nucleus sampling (p=0.9):\n{nucleus_text}\n")

# Combining both
combined_text = generator.generate_text(
    prompt,
    top_k=50,
    top_p=0.95,
    max_length=50
)
print(f"Combined sampling:\n{combined_text}\n")
```
The sample output may be:
Top-k sampling (k=10): The scientist discovered that the protein is able to bind to the receptor, as long as the molecules are not in contact with each other. The scientists then used this to study the effects of protein synthesis on the body’s natural immune system.
The
Nucleus sampling (p=0.9): The scientist discovered that the air’s nitrogen, carbon and oxygen are all carbon atoms.
“We know that nitrogen and carbon are very small and very little in the atmosphere. But we didn’t know what that means for the whole planet,” said
Combined sampling: The scientist discovered that the first and only way to prevent the growth of a virus from spreading was to introduce a small amount of bacteria into the body.
“We wanted to develop a vaccine that would prevent viruses from getting into our blood,” said
The top_k and top_p parameters fine-tune the sampling strategy. To understand them, remember that the model outputs a probability distribution over the vocabulary for each generated token, and the vocabulary is large. You can always pick the token with the highest probability, but you can also sample a random token according to the distribution, so that the same prompt can produce different output. This is the sampling strategy enabled by setting do_sample=True in gen_kwargs.

The top_k parameter limits the choice to the $k>0$ most likely tokens. Instead of considering tens of thousands of tokens in the vocabulary, setting top_k shortlists the consideration to a more tractable subset.

The top_p parameter further shortlists the choice: it keeps only the smallest set of most likely tokens whose cumulative probability reaches the threshold $P$. The generated token is then sampled from this reduced set according to the renormalized probabilities.
The code above demonstrates three different sampling approaches:

- The first example sets top_k to a small value, limiting the choice. The output is focused but potentially repetitive.
- The second example turns off top_k by setting it to 0 and relies on top_p for nucleus sampling. The sampling pool has the low-probability tokens removed, offering more natural variation.
- The third example combines both strategies. A larger top_k allows better diversity, so a larger top_p can still provide high-quality, natural generation.
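To make the filtering concrete, here is a small standalone sketch (an illustration with a made-up six-token distribution, not the transformers implementation) of how top-k and top-p narrow the pool of candidate tokens before one is sampled:

```python
import torch

def sample_top_k_top_p(probs, top_k=3, top_p=0.9):
    """Sample one token index after top-k and top-p (nucleus) filtering."""
    # Sort probabilities in descending order
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)

    # Top-k: keep only the k most likely tokens
    sorted_probs, sorted_idx = sorted_probs[:top_k], sorted_idx[:top_k]

    # Top-p: keep the smallest set of tokens whose cumulative probability
    # reaches top_p (the token that crosses the threshold is kept)
    cumulative = torch.cumsum(sorted_probs, dim=0)
    keep = (cumulative - sorted_probs) < top_p
    sorted_probs, sorted_idx = sorted_probs[keep], sorted_idx[keep]

    # Renormalize what is left and sample one token from it
    sorted_probs = sorted_probs / sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()

# A made-up distribution over a tiny six-token "vocabulary"
probs = torch.tensor([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
print(sample_top_k_top_p(probs, top_k=3, top_p=0.9))  # only the top 3 tokens can appear
```

The generate() method applies the same idea to the model’s full vocabulary at every decoding step, working on the logits rather than on a small probability vector.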
But how sharp or flat is the probability distribution over tokens? That is controlled by the temperature parameter. Let’s look at another example:
```python
...
generator = TextGenerator()

# Example of temperature effects
prompt = "The robot carefully"

# Low temperature (more focused)
focused = generator.generate_text(
    prompt,
    temperature=0.3,
    max_length=50
)
print(f"Low temperature (0.3):\n{focused}\n")

# Medium temperature (balanced)
balanced = generator.generate_text(
    prompt,
    temperature=0.7,
    max_length=50
)
print(f"Medium temperature (0.7):\n{balanced}\n")

# High temperature (more creative)
creative = generator.generate_text(
    prompt,
    temperature=1.0,
    max_length=50
)
print(f"High temperature (1.0):\n{creative}\n")
```
Note that the same prompt is used for all three examples. The output may be:
Low temperature (0.3): The robot carefully moves its head to the left, and the robot’s head moves to right. The robot then moves back to its normal position.
The next time you see the robots, you’ll see them moving in a different direction. They
Medium temperature (0.7): The robot carefully moved the arms and legs of the person holding the object in its hands. The robot, however, was still motionless, and the robot could not make an attempt to move the arm or legs.
The person’s body was
High temperature (1.0): The robot carefully moves through the robot and the next moment, it appears back at the control room. He gets up to walk from the floor, a second later, he’s hit and wounded. We then see the third part of the same robot:
So what is the effect of temperature? You can see that:

- A low temperature of 0.3 produces more focused and deterministic output. It can read as dull, but that makes it suitable for tasks requiring accuracy.
- The medium temperature of 0.7 strikes a balance between creativity and coherence.
- The high temperature of 1.0 generates more diverse and creative text.

Each example uses the same max_length for a fair comparison.
Behind the scenes, temperature is a parameter in the softmax function, which is applied to the output of the model to determine the output token. The softmax function is:
$$
s(x_j) = \frac{e^{x_j/T}}{\sum_{i=1}^{V} e^{x_i/T}}
$$
where $T$ is the temperature parameter and $V$ is the vocabulary size. Scaling the model outputs $x_1,\dots,x_V$ by $T$ changes the relative probabilities of the tokens. A high temperature makes the probabilities more uniform, making improbable tokens more likely to be chosen. A low temperature concentrates the probabilities on the highest-probability tokens, so the output is more deterministic.
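A small numeric sketch (with made-up logits rather than real model outputs) shows how dividing by $T$ before the softmax flattens or sharpens the distribution:

```python
import torch

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax over a vector of logits."""
    return torch.softmax(logits / temperature, dim=-1)

# Made-up logits for a tiny four-token vocabulary
logits = torch.tensor([2.0, 1.0, 0.5, -1.0])

for t in (0.3, 0.7, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs.tolist()]}")

# Low T pushes almost all probability onto the top token (more deterministic);
# high T spreads it out (more diverse sampling).
```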
Batch Processing and Padding
The code above works well for a single prompt. In practice, however, you may need to generate text for multiple prompts. The following code shows how to handle a batch of prompts efficiently:
```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class BatchGenerator:
    def __init__(self, model_name="gpt2"):
        """Initialize the text generator with a pre-trained model.

        Args:
            model_name (str): Name of the pre-trained model to use.
                Any of: "gpt2", "gpt2-medium", "gpt2-large"
        """
        self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
        self.tokenizer.add_special_tokens({"pad_token": self.tokenizer.eos_token})
        self.model = GPT2LMHeadModel.from_pretrained(model_name)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device)

    def generate_batch(self, prompts, **kwargs):
        """Generate text for multiple prompts efficiently.

        Args:
            prompts (list): List of input prompts
            **kwargs: Additional generation parameters

        Returns:
            list: Generated texts for each prompt
        """
        inputs = self.tokenizer(prompts, padding=True, padding_side="left", return_tensors="pt")
        outputs = self.model.generate(
            inputs["input_ids"].to(self.device),
            attention_mask=inputs["attention_mask"].to(self.device),
            **kwargs
        )
        results = self.tokenizer.batch_decode(outputs, skip_special_tokens=True)
        return results

# Example usage of batch generation
batch_generator = BatchGenerator()
prompts = [
    "The future of AI",
    "Space exploration will",
    "In the next decade",
    "Climate change has"
]

generated_texts = batch_generator.generate_batch(
    prompts,
    max_length=100,
    temperature=0.7,
    do_sample=True,
)

for prompt, text in zip(prompts, generated_texts):
    print(f"\nPrompt: {prompt}")
    print(f"Generated: {text}")
```
The output may be:
Prompt: The future of AI Generated: The future of AI is uncertain, and it is difficult to predict how it will play out,” says Professor Yuki Matsuo, director of the Centre for Artificial Intelligence and Machine Learning at Tokyo’s Tohoku University.
“But even if AI is not the only threat to the security of humans, it will be one of the most important, and that will change the way we think about the future of robotics.”
This article is reproduced with permission and was first published on May
Prompt: Space exploration will Generated: Space exploration will be a challenge as well, with the space agency’s space shuttle fleet approaching its ultimate goal of achieving a capacity of 1.5 billion people by 2030.
While the shuttle is capable of carrying astronauts to and from the International Space Station, NASA’s new shuttle, the first ever to have a manned mission to the moon, is currently under contract for six years. The agency is also developing a $10 billion satellite-orbital propulsion system that will enable a manned spacecraft to
Prompt: In the next decade Generated: In the next decade, the average salary of the top 10% of Americans rose from $12.50 to $16.50 an hour, according to the American Council of Economic Advisers. By the same time, the top 20% of Americans earned nearly $16.9 billion in annual income.
The top 1% is the major source of income for most Americans, with the middle and upper-income groups earning almost twice as much in income as the bottom 40%.
The
Prompt: Climate change has Generated: Climate change has reduced the chances of developing natural climate change.
In fact, the odds of climate change becoming more frequent and severe are extremely high. As a result, any policy that is designed to promote or reduce the occurrence of extreme weather events has a very high chance of causing severe weather, including extreme weather events in the United States.
The risk of extreme weather events, such as hurricanes, floods, and snowfalls, is more than twice as high as the risk for developing
The BatchGenerator implementation makes a few changes. The generate_batch method takes a list of prompts and passes the other parameters on to the model’s generate method. Most importantly, it pads the prompts to the same length and then generates text for each prompt in the batch. The results are returned in the same order as the prompts.

The GPT-2 model is trained to handle batched input, but to present the input as a single tensor, all prompts need to be padded to the same length. The tokenizer can readily handle batched input, but the GPT-2 tokenizer does not define a padding token, so you need to specify one using add_special_tokens(). The code above reuses the EOS token for padding and pads on the left, so each generated continuation follows its prompt directly. Indeed, you can use any token for padding, since the attention mask forces the model to ignore it.
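You can see the padding and the attention mask with a quick standalone check like the following (a sketch using the same tokenizer setup as BatchGenerator; the two prompts are taken from the example above):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"pad_token": tokenizer.eos_token})

batch = tokenizer(
    ["Space exploration will", "In the next decade"],
    padding=True,
    padding_side="left",
    return_tensors="pt",
)
# If the prompts tokenize to different lengths, the shorter one is
# left-padded with the EOS token id, and its attention mask is 0 there
print(batch["input_ids"])
print(batch["attention_mask"])
```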
Tips for Better Generation Results
You know how to use the GPT-2 model to generate text, but what should you expect from the output? That depends on the task. Still, here are some tips that can help you get better results.
First is prompt engineering. Be specific and clear in your prompts: ambiguous words or phrases lead to ambiguous output, so aim for prompts that are specific, concise, and precise. You may also include relevant context to help the model understand the task.
Besides, you can tune the generation parameters. Depending on the task, you may want the output to be more focused or more creative: adjust temperature to control the randomness of the output, and adjust top_k and top_p to control its diversity. Generation is auto-regressive, so longer outputs take longer to produce; the max_length parameter lets you trade off output length against speed.
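As an illustration only (these presets are plausible starting points, not values prescribed by this tutorial), you could keep a few named parameter sets and reuse them with the TextGenerator class defined earlier:

```python
...
# Hypothetical presets; tune them for your own task
PRESETS = {
    "factual":  {"temperature": 0.3, "top_k": 40,  "top_p": 0.90, "max_length": 150},
    "balanced": {"temperature": 0.7, "top_k": 50,  "top_p": 0.95, "max_length": 150},
    "creative": {"temperature": 0.9, "top_k": 100, "top_p": 0.98, "max_length": 200},
}

generator = TextGenerator()
text = generator.generate_text("The benefits of machine learning include", **PRESETS["factual"])
print(text)
```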
Finally, the code in this tutorial is not fault-tolerant. In production, you need to implement proper error handling, set reasonable timeouts, monitor memory usage, and implement rate limiting.
Further Reading
Below are some further readings that can help you understand text generation with the GPT-2 model better.
Summary
In this tutorial, you learned how to generate text with GPT-2 and use the transformers library to build real-world applications with a few lines of code. In particular, you learned:
- How to implement text generation using GPT-2
- How to control generation parameters for different use cases
- How to implement batch processing for efficiency
- Best practices and common pitfalls to avoid