The Beginner’s Guide to Language Models with Python
Introduction
Language models, often referred to by the acronym LLM for Large Language Models (their large-scale version), power AI applications like conversational chatbots, AI assistants, and other intelligent text and content generation apps. This article provides a concise, basic understanding of LLMs, followed by three introductory code examples that illustrate their use through well-known frameworks: Hugging Face, Ollama, and LangChain. Don’t worry if some of these terms sound unfamiliar at this point: by the end of this article, you’ll be acquainted with all of them.
What are Language Models?
At their essence, language models are natural language processing (NLP) systems capable of predicting the next word in a sequence, having learned complex human language patterns from vast datasets of text, typically thousands, millions, or even billions of documents. Of course, language models, and LLMs in particular, have evolved well beyond next-word generation to handle many complex language understanding tasks: they can answer questions, summarize or translate texts, extract insights, and classify documents.
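To make the idea of next-word prediction concrete, here is a minimal optional sketch using the small GPT-2 model and the Hugging Face transformers library (both introduced in the next section); it prints the model’s most likely continuations of a short sentence:

# Minimal sketch of next-word prediction with a small pre-trained model.
# Requires the transformers library, which we install in the next section.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The capital of Japan is", return_tensors="pt")
with torch.no_grad():
    next_token_scores = model(**inputs).logits[0, -1]  # scores for the next token only

top_ids = torch.topk(next_token_scores, 5).indices.tolist()
print([tokenizer.decode(i) for i in top_ids])  # the five most likely next tokens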
Applications based on LLMs exist in different forms:
- API-based models like OpenAI’s GPT-4 (the model behind the popular ChatGPT) and Anthropic’s Claude, accessible worldwide via their websites or downloadable apps.
- Local models like LLaMA, Mistral, and Qwen, which are normally run on personal or on-premises hardware.
- Orchestration frameworks like LangChain, which are not models themselves but make it easy to integrate language models with other tools and build applications around them.
In the remainder of this article, we will stick to free and open-source models you can try at no cost. The examples are configured to run in a Google Colab or Jupyter notebook, easing or even bypassing the local configuration steps otherwise needed on your machine. Feel free to adapt them for use in a Python IDE if you are comfortable with one.
Using Hugging Face’s Transformers Library
Hugging Face is a hub of open-source pre-trained language models ready to load and use for NLP tasks like text generation, translation, and sentiment analysis. It is powered by its centerpiece library, Transformers, which offers seamless integration with popular Python frameworks like PyTorch, JAX, and TensorFlow. Best of all, the models are free to use and require minimal setup, making AI development accessible to everyone.
Let’s start this practical tour by installing the transformers library in a new notebook of your own:
!pip install transformers
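If you want a quick optional smoke test before moving on to text generation, the library’s pipeline helper can run a ready-made sentiment-analysis model in a couple of lines (a small default model is downloaded automatically):

# Optional smoke test: a ready-made sentiment-analysis pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I love building language model apps!"))
# expected: something like [{'label': 'POSITIVE', 'score': 0.99...}]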
We will now load a model from Hugging Face, specifically GPT-2 for text generation. When loading pre-trained models, we normally need to load not only the model itself but also a compatible tokenizer, responsible for splitting text inputs into logical language units called tokens (roughly equivalent to words in most cases) before they are passed to the language model, which processes them and generates follow-up text.
The following code imports the necessary packages and initializes the language model and tokenizer.
from transformers import GPT2LMHeadModel, GPT2Tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
Next, we define a prompt for the model, which is typically a query, question, or request to which the language model will try to generate a response. Some less sophisticated models limit themselves to generating follow-up words that continue the input sequence or prompt, for instance continuing a tale that starts with “Once upon a time”. In our case, we will try asking GPT-2 what the capital of Japan is and … fingers crossed.
prompt = "What is the capital of Japan?"
inputs = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(inputs, max_length=50, num_return_sequences=1)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
Getting technical, in the above code you may have noticed two processing steps taking place: encoding and decoding. The input sequence (the prompt) must be encoded into a numerical representation the model can understand, and after the model generates its raw (numerical) response, that response is decoded back into text to make it understandable by us.
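To make this concrete, here is a small optional peek (reusing the tokenizer we loaded above) at the numerical form the prompt takes before and after those two steps:

# Optional peek: encode() turns the prompt into a tensor of integer token IDs,
# and decode() maps those IDs back into readable text.
ids = tokenizer.encode("What is the capital of Japan?", return_tensors="pt")
print(ids)                        # a tensor of token IDs, one per token
print(tokenizer.decode(ids[0]))   # back to the original text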
This is the decoded response:
The capital of Japan is Tokyo. It is the capital of Japan. It is the capital of Japan. It is the capital of Japan. It is the capital of Japan. It is the capital of Japan
Well, at least it gave the right answer! These pre-trained models are comparatively small, more manageable for testing environments, and nowhere near as powerful as models like ChatGPT, so it is not surprising that their text generation behavior is more limited. In particular, since we specified a maximum response length of 50 tokens, the model seems to prioritize filling that length rather than keeping the answer short, simple, and logical.
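Repetition like this can often be reduced by passing extra arguments to generate(). Here is a hedged retry; the specific values are illustrative guesses, not tuned settings:

# A hedged retry: cap the number of new tokens and penalize repeated n-grams.
output = model.generate(
    inputs,
    max_new_tokens=30,          # limit only the newly generated tokens
    no_repeat_ngram_size=3,     # discourage the model from repeating itself
    pad_token_id=tokenizer.eos_token_id,  # silence the padding warning for GPT-2
)
print(tokenizer.decode(output[0], skip_special_tokens=True))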
Time to look at another example and introduce another framework.
Running Language Models Locally with Ollama
Ollama is a framework that enables running language models locally in a streamlined and efficient way. For example, one of its available models is Qwen, which is a versatile open-source LLM capable of generating human-like text.
In contrast with Hugging Face models accessed via the Transformers library, Ollama models allow offline inference, reducing dependency on external APIs and internet connectivity (when run locally). The downside is the significant disk space needed to download the framework and its models to your machine. While the usual procedure for running one of Ollama’s language models locally involves running our Python code and performing the necessary configuration on our own machine, there is a workaround that lets us emulate the process in the comfort of a Google Colab notebook.
Let’s see how.
We first install Colab-xterm, a Google Colab extension to be able to run a command line terminal inside our notebook:
!pip install colab-xterm
%load_ext colabxterm
Next, the following simple instruction will open a terminal inside the notebook:

%xterm
Inside the command line terminal (not in a standard notebook cell), write and run the following command to install Ollama in the notebook’s environment:
curl https://ollama.ai/install.sh | sh |
Once the command has been executed and Ollama is installed, let’s return to our next notebook cell where we will load models “locally”, namely the Mistral and Qwen models.
!ollama pull mistral
!ollama pull qwen
If the models were correctly loaded into our environment, they should be listed after running:

!ollama list
The next instruction will install langchain-ollama, a package that enables seamless integration between Ollama and LangChain, a framework for streamlining the development of AI-powered LLM applications through tools for model interaction, prompt management, and the chaining of multiple components (typically language tasks or processing steps) into workflows:
%pip install -U langchain-ollama
The joint use of LangChain and Ollama in the example below allows us to formulate slightly more sophisticated prompts than those typically used in simpler models like Hugging Face’s GPT-2.
The ChatPromptTemplate class helps craft dynamic prompts by defining templates with placeholders (see curly braces) for user input, thereby making interactions more consistent and flexible.
Meanwhile, the OllamaLLM class encapsulates the connection to an Ollama-powered local language model like our downloaded Qwen model, handling request formatting, model invocation, and response processing in an easy-to-use interface thanks to LangChain capabilities.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
template = """Question: {question}

Answer: Let's think logically."""
prompt = ChatPromptTemplate.from_template(template)
model = OllamaLLM(model="qwen")
chain = prompt | model
chain.invoke({"question": "What is the result of dividing 36 by 9?"})
Response:
When dividing a number by another number, you are essentially finding how many times the second number fits into the first number.
So for the question of dividing 36 by 9, we can solve it as follows:
36 ÷ 9 = 4
Therefore, the result of dividing 36 by 9 is 4.
Wow! This looks much more impressive, doesn’t it?
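By the way, the workflow above went through LangChain, but once Ollama is installed and a model is pulled, you could also query it directly from Python with the official ollama client library. The following is a hedged sketch that goes beyond the original walkthrough; it assumes you have run pip install ollama and that the Ollama server is running with the qwen model available:

# Hypothetical direct use of the ollama Python client (pip install ollama);
# assumes a running Ollama server and a previously pulled qwen model.
import ollama

reply = ollama.chat(
    model="qwen",
    messages=[{"role": "user", "content": "What is the result of dividing 36 by 9?"}],
)
print(reply["message"]["content"])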
In our last example, we will dive slightly deeper into LangChain.
Building a Simple LLM App with LangChain
Now we will illustrate the use of the LangChain framework with Hugging Face models, namely the GPT-2 model we used earlier.
The main differences between using Hugging Face pre-trained models with or without LangChain are:
- Flexibility: LangChain makes it possible to chain multiple prompts, handle memory, and integrate with other AI tools like Hugging Face models, as well as APIs, databases, etc., making it a versatile choice for app development.
- Prompt management: LangChain supports structured prompt templates and dynamic inputs, allowing more flexible and interactive prompts.
First, we install the community version of LangChain in our notebook as follows:
!pip install langchain[community] transformers huggingface_hub
The next step is to log in to Hugging Face. To do this, you’ll need your own Hugging Face API token. Obtaining it is pretty easy—you simply need to register at https://huggingface.co/join and create an account. Once registered, you can generate your API token by browsing https://huggingface.co/settings/tokens. Copy your token and save it in a secure place: you’ll need it to authenticate and access models.
from huggingface_hub import login
login(token="YOUR TOKEN HERE")
Make sure you replace the “YOUR TOKEN HERE” string with your actual token. Otherwise, the connection to the required Hugging Face components will fail.
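A small optional alternative, if you prefer not to hardcode the token in the notebook, is to type it interactively at runtime so it never appears in your code:

# Optional: prompt for the token at runtime instead of pasting it into the code.
from getpass import getpass
from huggingface_hub import login

login(token=getpass("Hugging Face token: "))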
Time to import the necessary components (notice that we assume both langchain and transformers have already been installed, as done in the previous examples).
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from transformers import pipeline
The core code for this example loads the GPT-2 model into a pipeline, which is the simplest abstraction level offered by Hugging Face for loading and utilizing LLMs. We then initialize and use a HuggingFacePipeline object to create a LangChain LLM.
Unlike conventional Hugging Face LLMs, a LangChain LLM provides an extra layer of composability, enabling seamless integration into structured workflows, chaining multiple prompts, and interacting with other AI tools.
The next steps are fairly familiar from other examples, with subtle syntax differences: defining a prompt template, creating a chain to process it, and running the app by passing the input prompt as an argument to obtain the response.
generator = pipeline("text-generation", model="gpt2")
llm = HuggingFacePipeline(pipeline=generator)

template = "What is the capital of {country}?"
prompt = PromptTemplate(input_variables=["country"], template=template)

llm_chain = LLMChain(prompt=prompt, llm=llm)

response = llm_chain.run(country="Germany")
print(response)
Response:
What is the capital of Germany? It is the greatest of the world‘s great cities. Its magnificent cathedral and its splendid gardens make it the world’s largest city, yet its citizens are only able to see the sight of the whole, or to go
Everything worked well… except that the model didn’t really answer our question about the capital of Germany. In fact, the text becomes nonsensical at some point! But, to our reassurance, that was partly beyond our control and mainly related to the chosen model’s performance. Ah, the beauty of imperfection 😉
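As an optional extension (a sketch that goes beyond the original walkthrough), the composability mentioned earlier can be made tangible by chaining two prompts, so the output of the first chain becomes the input of the second. With GPT-2 the wording will stay rough, but the plumbing is what matters here:

# A hedged sketch: chain two prompts with SimpleSequentialChain (classic LangChain API).
# The first chain names a capital; its output is fed into the second chain as {city}.
from langchain.chains import SimpleSequentialChain

capital_prompt = PromptTemplate(
    input_variables=["country"],
    template="What is the capital of {country}?",
)
fact_prompt = PromptTemplate(
    input_variables=["city"],
    template="Tell me one fact about {city}.",
)

capital_chain = LLMChain(prompt=capital_prompt, llm=llm)
fact_chain = LLMChain(prompt=fact_prompt, llm=llm)

two_step_chain = SimpleSequentialChain(chains=[capital_chain, fact_chain])
print(two_step_chain.run("Germany"))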
Wrapping Up
This article is a practical beginner’s guide for readers interested in the realm of language models. After a concise and gentle introduction to language models (or LLMs for short), it walks through three hands-on Python examples where you get familiar with common frameworks for using existing LLMs both in the cloud and locally, as well as for building LLM applications: Hugging Face, Ollama, and LangChain. Through example prompts, several types of LLMs were loaded and tried out on straightforward language tasks like question answering.