Introduction
Logarithms are a cornerstone of mathematics, statistics, and data science, and they appear throughout machine learning. They underpin fundamental concepts such as exponential growth, p-values, and log-likelihood. In machine learning and data science, we frequently work with data spanning several orders of magnitude: gene expression levels, financial time series, and population counts, for instance, can vary vastly in scale. A logarithmic transformation can help manage these differences, revealing underlying trends that might otherwise remain hidden.
A logarithm answers a simple question: to what exponent must we raise a given base to obtain a particular number? Formally, for a base \( b > 0 \) (and \( b \neq 1 \)):
\[
\log_b(a) = x \quad \text{if and only if} \quad b^x = a, \, (a > 0).
\]
For example, \(\log_2(8) = 3\) because \(2^3 = 8\). The most common bases, each computed in the short snippet after this list, are:
- Base 10 (common logarithm): \(\log_{10}(a)\)
- Base \( e \) (natural logarithm): \(\ln(a)\) or \(\log_e(a)\)
- Base 2 (binary logarithm): \(\log_2(a)\)
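As a quick illustration, here is a minimal sketch (using Python's built-in math module) that evaluates the logarithm of the same number under each of these bases:
import math

a = 1000.0

print("log10(1000) =", math.log10(a))  # common logarithm: 3.0
print("ln(1000)    =", math.log(a))    # natural logarithm: about 6.91
print("log2(1000)  =", math.log2(a))   # binary logarithm: about 9.97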
In practical data-related work, the natural logarithm is the most common choice, particularly in continuous mathematics and statistics (e.g., linear regressions with log-transformed outcomes or log-likelihood functions). It is so widely used because it turns multiplicative relationships into additive ones, making exponential processes and statistical models easier to analyze and interpret.
This tutorial provides a clear introduction to logarithms, their properties, and their common applications in machine learning. By the end of this tutorial, you will understand:
- What logarithms are and why they are useful
- Key properties that make them particularly powerful
- Applications in machine learning
- A brief demonstration in Python using SymPy
- A practical code example of logarithms in Python using PyTorch
Key Properties of Logarithms
Logarithms simplify multiplication and exponentiation into more manageable operations. Some fundamental properties of logarithms are:
Product Rule
This rule states that the logarithm of a product is equal to the sum of the logarithms of its individual factors. It allows complex multiplication operations to be broken down into simpler addition tasks.
\[
\log_b(xy) = \log_b(x) + \log_b(y)
\]
Quotient Rule
According to the quotient rule, the logarithm of a division is the difference between the logarithms of the numerator and the denominator. This property simplifies the process of handling division within logarithmic expressions.
\[
\log_b\left(\frac{x}{y}\right) = \log_b(x) - \log_b(y)
\]
Power Rule
The power rule states that the logarithm of a value raised to a power is equal to the exponent multiplied by the logarithm of the original value. This simplifies the manipulation of expressions where variables are raised to powers.
\[
\log_b(x^r) = r \, \log_b(x)
\]
Change of Base
The change of base formula allows logarithms to be converted from one base to another, facilitating easier computation or comparison. By expressing a logarithm in terms of a different base, calculations can be performed using more convenient or standardized logarithmic tables or calculators.
\[
\log_b(x) = \frac{\log_k(x)}{\log_k(b)}
\]
These properties drastically reduce the complexity of dealing with products, quotients, and powers. Rather than working with large or unwieldy numbers, it becomes much simpler to add or subtract logs or multiply by exponents.
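As a quick sanity check, here is a minimal sketch that verifies each property numerically with Python's math module, using the arbitrary values x = 8, y = 32, base b = 2, and exponent r = 3:
import math

x, y, b, r = 8.0, 32.0, 2.0, 3

# Product rule: log_b(x*y) == log_b(x) + log_b(y)
print(math.log(x * y, b), math.log(x, b) + math.log(y, b))  # both ~8.0

# Quotient rule: log_b(x/y) == log_b(x) - log_b(y)
print(math.log(x / y, b), math.log(x, b) - math.log(y, b))  # both ~-2.0

# Power rule: log_b(x**r) == r * log_b(x)
print(math.log(x**r, b), r * math.log(x, b))  # both ~9.0

# Change of base: log_b(x) == log_k(x) / log_k(b), here with k = 10
print(math.log(x, b), math.log10(x) / math.log10(b))  # both ~3.0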
Simple Logarithm Demonstration Using SymPy
Below is a brief demonstration of how you can use Python and the SymPy library to work with logarithms symbolically. SymPy allows exact manipulations of mathematical expressions.
import sympy as sp

# Define symbolic variables
x, y = sp.symbols('x y', positive=True)

# Define an expression using logs
expr = sp.log(x) + sp.log(y)
print("Original Expression:", expr)

# SymPy can simplify the sum of logs into a single log
simplified_expr = sp.simplify(expr)
print("Simplified Expression:", simplified_expr)

# Demonstrate the power rule
expr_power = sp.log(x**2)
print("Power Expression:", expr_power)

simplified_power_expr = sp.simplify(expr_power)
print("Simplified Power Expression:", simplified_power_expr)

# Demonstrate numerical evaluation
# Let's define x=10, y=100 and evaluate log(x*y)
value = expr.subs({x: 10, y: 100})
print("Value of log(x) + log(y) for x=10, y=100:", value.evalf())
Here’s an explanation of the above code:
- Symbolic Variables: We declare x and y as positive to avoid domain issues with the logarithm
- Expression Creation: We create expr = sp.log(x) + sp.log(y)
- Expression Simplification: sp.simplify(expr) uses the product rule of logs to simplify the sum into log(x*y)
- Power Rule: sp.log(x**2) is automatically recognized, and further simplification yields 2*log(x)
- Numerical Evaluation: We substitute numerical values into the expression and evaluate the result with .evalf()
Output:
Original Expression: log(x) + log(y)
Simplified Expression: log(x*y)
Power Expression: log(x**2)
Simplified Power Expression: 2*log(x)
Value of log(x) + log(y) for x=10, y=100: 6.90775527898214
Logarithms in Machine Learning
Logarithms are crucial in machine learning because they tame large numbers, stabilize calculations, and simplify the exponential relationships that often arise in models. Whether you’re dealing with enormous feature values, tiny probabilities, or exponential growth processes, log transformations can mean the difference between a model that converges smoothly and one that struggles with numerical overflow or underflow. Below, we’ll take a tour of the major reasons why logs are so essential, show how different log bases can be used, and illustrate some common scenarios in machine learning workflows.
Compressing and Transforming Data
Machine learning often involves data spanning multiple orders of magnitude — for example, pixel values in images, text word counts, or large price ranges in financial data. A log transform can compress these wide ranges into a more manageable scale. In practice, this often:
- Reduces skew: Highly skewed distributions become more symmetric
- Emphasizes relative changes: Instead of focusing on absolute differences, a log transform highlights ratios (e.g., going from 100 to 200 is the same ratio as going from 10 to 20)
For numeric features, using \(\log(x+1)\) (to avoid issues with zero or negative values) can help models learn relationships more easily.
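As a small sketch of this idea (assuming NumPy is available), the snippet below applies \(\log(x+1)\) via np.log1p to a made-up, heavily skewed feature:
import numpy as np

# A skewed feature spanning several orders of magnitude
x = np.array([1.0, 3.0, 10.0, 250.0, 4000.0, 90000.0])

# log(x + 1): safe at x = 0 and compresses the large values
x_log = np.log1p(x)

print("Raw values:       ", x)
print("Log-scaled values:", np.round(x_log, 2))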
Linearizing Exponential Relationships
Many processes in machine learning (such as growth, decay, or repeated multiplication) are naturally exponential. By taking the log, an exponential trend turns into a linear one, which certain algorithms handle more gracefully:
- Linear Regression: If the target variable grows exponentially with respect to features, using \(\log(y)\) as the target may stabilize variance and improve model performance (see the sketch after this list)
- Simpler Patterns: Multiplicative interactions can appear additively once you apply a log transform, making patterns clearer for algorithms that assume additive relationships
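Here is a minimal sketch of the linear-regression point above, using NumPy and synthetic exponential data: fitting a straight line to \(\log(y)\) recovers the underlying growth rate directly from the slope.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic exponential growth: y = 2 * exp(0.5 * t), with mild multiplicative noise
t = np.linspace(0, 10, 50)
y = 2.0 * np.exp(0.5 * t) * rng.normal(1.0, 0.05, size=t.shape)

# Fit a straight line to log(y): log(y) ~ 0.5 * t + log(2)
slope, intercept = np.polyfit(t, np.log(y), deg=1)

print(f"Estimated growth rate:   {slope:.3f}")              # close to the true 0.5
print(f"Estimated initial value: {np.exp(intercept):.3f}")  # close to the true 2.0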
Different Bases, Similar Shapes
Although base \(e\) (the natural logarithm) is most common in machine learning — especially in continuous math and neural network frameworks — other bases can be useful in specific contexts:
- Log Base 10: Often used for interpretability or charting “orders of magnitude”
- Log Base 2: Common in computational contexts (e.g., algorithmic complexity) or fields like genomics where “fold changes” are often in powers of 2
Mathematically, switching between bases is straightforward (\(\log_b(x) = \log_k(x) / \log_k(b)\)), so choosing a base often comes down to convention or readability.
Logarithms as the Backbone of Cross-Entropy Loss
A prime example of logs in action is the cross-entropy loss function, widely used in classification tasks:
\[
\text{CrossEntropy}(y, \hat{p}) = -\log(\hat{p}(y)),
\]
where \(\hat{p}(y)\) is the predicted probability of the correct class \(y\). In multi-class scenarios, this becomes:
\[
-\sum_{i=1}^{k} \mathbf{1}_{i=y} \cdot \log(\hat{p}_i)
\]
When a model assigns a tiny probability to the correct class, taking the negative log of that probability results in a large penalty. This guides the model to shift probability mass toward the correct class across training steps. Most deep learning frameworks integrate this log transformation under the hood, automatically converting raw outputs (logits) into probabilities and then computing the log-likelihood.
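To make the size of that penalty concrete, here is a tiny sketch comparing the loss contribution of a confident correct prediction with that of an overconfident mistake:
import math

# Predicted probability assigned to the correct class
for p_correct in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p_correct:>4}: cross-entropy = {-math.log(p_correct):.3f}")

# p = 0.9 gives ~0.105, while p = 0.01 gives ~4.605: a much larger penalty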
Logistic Regression and Log-Odds
In logistic regression, the log function is used in the logit (log-odds) transform:
\[
\log\bigl(\tfrac{p}{1-p}\bigr) = \mathbf{w}^\mathsf{T}\mathbf{x} + b.
\]
This approach neatly confines predicted probabilities to the \([0, 1]\) interval while allowing the log-odds to range over all real numbers. Training relies on maximizing the log-likelihood, which again converts products into sums and avoids extreme numerical values.
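The snippet below is a small sketch of this relationship, using hypothetical weights and Python's math module: the linear score \(\mathbf{w}^\mathsf{T}\mathbf{x} + b\) is a log-odds value, and the sigmoid (the inverse of the logit) maps it back to a probability in \((0, 1)\).
import math

def sigmoid(z):
    # Inverse of the logit: maps log-odds back to a probability
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, bias, and a single feature vector
w = [0.8, -1.2]
b = 0.3
x = [2.0, 0.5]

log_odds = sum(wi * xi for wi, xi in zip(w, x)) + b
p = sigmoid(log_odds)

print("Log-odds (w.x + b):", round(log_odds, 3))              # 1.3
print("Probability p:     ", round(p, 3))                     # ~0.786
print("Recovered log-odds:", round(math.log(p / (1 - p)), 3)) # back to 1.3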
Numerical Stability in Neural Networks
Neural networks regularly rely on the log-sum-exp trick to prevent overflow or underflow when dealing with exponentials. For instance, a softmax output layer computes:
\[
\hat{p}_i = \frac{\exp(z_i)}{\sum_{j=1}^{k} \exp(z_j)},
\]
and many frameworks rewrite \(\log(\sum_j \exp(z_j))\) as:
\[
\alpha + \log\Bigl(\sum_j \exp(z_j - \alpha)\Bigr),
\]
where \(\alpha\) is the maximum logit. This keeps values within stable numerical ranges. Anytime you see exponentials or probabilities in machine learning, logs are usually in the background ensuring computations remain robust.
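Here is a short PyTorch sketch of the trick with deliberately large logits: naive exponentiation overflows to infinity, while subtracting the maximum logit first gives a finite result that matches torch.logsumexp.
import torch

z = torch.tensor([1000.0, 1001.0, 1002.0])  # deliberately large logits

# Naive: exp() overflows to inf for values this large
naive = torch.log(torch.exp(z).sum())

# Stable: subtract the maximum logit (alpha) before exponentiating
alpha = z.max()
stable = alpha + torch.log(torch.exp(z - alpha).sum())

print("Naive log-sum-exp: ", naive.item())   # inf
print("Stable log-sum-exp:", stable.item())  # ~1002.41
print("torch.logsumexp:   ", torch.logsumexp(z, dim=0).item())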
Practical Example in PyTorch
Below is a short snippet showing how PyTorch’s CrossEntropyLoss internally applies logs for classification:
import torch
import torch.nn as nn

# Suppose we have a batch of 3 samples and 4 classes
logits = torch.tensor([
    [ 2.0, -1.0, 0.0, 4.0],
    [ 1.0, 2.0, 3.0, -1.0],
    [-2.0, 0.0, 2.0, 1.0]
])

# True labels (class indices)
labels = torch.tensor([3, 2, 2])

# CrossEntropyLoss = softmax + negative log-likelihood
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

print(f"Cross-Entropy Loss: {loss.item():.4f}")

# For demonstration, compute softmax and log manually
softmax_vals = torch.softmax(logits, dim=1)
correct_probs = softmax_vals[range(logits.size(0)), labels]
nll = -torch.log(correct_probs)

print("Softmax probabilities:\n", softmax_vals)
print("Negative log-likelihood per sample:", nll)
print("Average loss:", nll.mean().item())
And the output of the script:
Cross-Entropy Loss: 0.3294
Softmax probabilities:
 tensor([[0.1166, 0.0058, 0.0158, 0.8618],
        [0.0889, 0.2418, 0.6572, 0.0120],
        [0.0120, 0.0889, 0.6572, 0.2418]])
Negative log-likelihood per sample: tensor([0.1488, 0.4197, 0.4197])
Average loss: 0.32939615845680237
Here are a few additional key points regarding the above script:
- Logits: The raw outputs, which can be any real number
- CrossEntropyLoss: Internally applies softmax and then takes the negative log of the correct class probability
- Numerical Stability: Instead of computing these steps naively, PyTorch uses optimized and stable implementations that avoid underflow for very small probabilities and overflow for very large exponentials (see the short comparison below)
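As a quick check of that last point, here is a short sketch comparing nn.CrossEntropyLoss with the equivalent explicit log-softmax plus negative log-likelihood pipeline, reusing the logits and labels from the script above:
import torch
import torch.nn as nn

logits = torch.tensor([
    [ 2.0, -1.0, 0.0, 4.0],
    [ 1.0, 2.0, 3.0, -1.0],
    [-2.0, 0.0, 2.0, 1.0]
])
labels = torch.tensor([3, 2, 2])

# CrossEntropyLoss fuses log-softmax and NLL into one stable operation
fused = nn.CrossEntropyLoss()(logits, labels)

# The same result from the two explicit steps
log_probs = torch.log_softmax(logits, dim=1)  # numerically stable log of softmax
stepwise = nn.NLLLoss()(log_probs, labels)

print("Fused CrossEntropyLoss:", fused.item())     # ~0.3294
print("LogSoftmax + NLLLoss:  ", stepwise.item())  # same value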
Conclusion
Logarithms offer an elegant way to simplify large-scale or exponential relationships, making them essential in many areas of stats, data science, and even machine learning. Some key takeaways from our discussion are:
- Logs are a universal scaling trick: Whether used as a transform for input data or woven into a loss function, logarithms simplify computations by turning products into sums
- They handle wide numerical ranges: Critical for keeping computations stable in neural networks and other machine learning methods
- They linearize exponential processes: Making patterns more approachable for linear methods and helping with interpretability
- They’re everywhere in machine learning: From basic logistic regression to the cross-entropy loss that powers deep networks
Whether you are handling skewed data, stabilizing variance, or working with likelihoods, the log function can transform complex multiplicative patterns into more tractable additive ones. By mastering the fundamental log properties, you can apply them judiciously in your analyses — resulting in cleaner models, reduced numerical instability, and clearer interpretation of results.
In short, a solid grasp of logarithms — and when to apply them — can boost both the accuracy and stability of many machine learning models, making them a foundational concept for practitioners at any level.