In this article, you will learn how to use Python’s itertools module to simplify common feature engineering tasks with clean, efficient patterns.
Topics we will cover include:
- Generating interaction, polynomial, and cumulative features with itertools.
- Building lookup grids, lag windows, and grouped aggregates for structured data workflows.
- Using iterator-based tools to write cleaner, more composable feature engineering code.
On we go.
7 Essential Python Itertools for Feature Engineering
Image by Editor
Introduction
Feature engineering is where most of the real work in machine learning happens. A good feature often improves a model more than switching algorithms. Yet this step usually leads to messy code with nested loops, manual indexing, hand-built combinations, and the like.
Python’s itertools module is a standard library toolkit that most data scientists know exists but rarely reach for when building features. That’s a missed opportunity, as itertools is designed for working with iterators efficiently. A lot of feature engineering, at its core, is structured iteration over pairs of variables, sliding windows, grouped sequences, or every possible subset of a feature set.
In this article, you’ll work through seven itertools functions that solve common feature engineering problems. We’ll spin up sample e-commerce data and cover interaction features, lag windows, category combinations, and more. By the end, you’ll have a set of patterns you can drop directly into your own feature engineering pipelines.
You can get the code on GitHub.
1. Generating Interaction Features with combinations
Interaction features capture the relationship between two variables — something neither variable expresses alone. Manually listing every pair from a multi-column dataset is tedious. combinations in the itertools module does it in one line.
Let’s code an example to create interaction features using combinations:
import itertools

import pandas as pd

df = pd.DataFrame({
    "avg_order_value": [142.5, 89.0, 210.3, 67.8, 185.0],
    "discount_rate": [0.10, 0.25, 0.05, 0.30, 0.15],
    "days_since_signup": [120, 45, 380, 12, 200],
    "items_per_order": [3.2, 1.8, 5.1, 1.2, 4.0],
    "return_rate": [0.05, 0.18, 0.02, 0.22, 0.08],
})

numeric_cols = df.columns.tolist()

# Multiply every unique pair of columns to create interaction features
for col_a, col_b in itertools.combinations(numeric_cols, 2):
    feature_name = f"{col_a}_x_{col_b}"
    df[feature_name] = df[col_a] * df[col_b]

interaction_cols = [c for c in df.columns if "_x_" in c]
print(df[interaction_cols].head())
Truncated output:

   avg_order_value_x_discount_rate  avg_order_value_x_days_since_signup  \
0                           14.250                              17100.0
1                           22.250                               4005.0
2                           10.515                              79914.0
3                           20.340                                813.6
4                           27.750                              37000.0

   avg_order_value_x_items_per_order  avg_order_value_x_return_rate  \
0                             456.00                          7.125
1                             160.20                         16.020
2                            1072.53                          4.206
3                              81.36                         14.916
4                             740.00                         14.800
...
   days_since_signup_x_return_rate  items_per_order_x_return_rate
0                             6.00                          0.160
1                             8.10                          0.324
2                             7.60                          0.102
3                             2.64                          0.264
4                            16.00                          0.320
combinations(numeric_cols, 2) generates every unique pair exactly once without duplicates. With 5 columns, that is 10 pairs; with 10 columns, it is 45. This approach scales as you add columns.
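The pair counts above are just binomial coefficients, so you can sanity-check how many interaction features an expansion will create before running it. A quick sketch using math.comb:

```python
import itertools
import math

# 10 hypothetical feature columns
cols = [f"col_{i}" for i in range(10)]

pairs = list(itertools.combinations(cols, 2))

# "10 choose 2" = 45 unique pairs
print(len(pairs), math.comb(10, 2))
# 45 45
```

Checking this count up front is a cheap guard against accidentally exploding a wide DataFrame into thousands of interaction columns.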
2. Building Cross-Category Feature Grids with product
itertools.product gives you the Cartesian product of two or more iterables — every possible combination across them — including repeats across different groups.
In the e-commerce sample we’re working with, this is useful when you want to build a feature matrix across customer segments and product categories.
import itertools

import numpy as np
import pandas as pd

customer_segments = ["new", "returning", "vip"]
product_categories = ["electronics", "apparel", "home_goods", "beauty"]
channels = ["mobile", "desktop"]

# All segment × category × channel combinations
combos = list(itertools.product(customer_segments, product_categories, channels))
grid_df = pd.DataFrame(combos, columns=["segment", "category", "channel"])

# Simulate a conversion rate lookup per combination
np.random.seed(7)
grid_df["avg_conversion_rate"] = np.round(
    np.random.uniform(0.02, 0.18, size=len(grid_df)), 3
)

print(grid_df.head(12))
print(f"\nTotal combinations: {len(grid_df)}")
Output:
      segment     category  channel  avg_conversion_rate
0         new  electronics   mobile                0.032
1         new  electronics  desktop                0.145
2         new      apparel   mobile                0.090
3         new      apparel  desktop                0.136
4         new   home_goods   mobile                0.176
5         new   home_goods  desktop                0.106
6         new       beauty   mobile                0.100
7         new       beauty  desktop                0.032
8   returning  electronics   mobile                0.063
9   returning  electronics  desktop                0.100
10  returning      apparel   mobile                0.129
11  returning      apparel  desktop                0.149

Total combinations: 24
This grid can then be merged back onto your main transaction dataset as a lookup feature, as every row gets the expected conversion rate for its specific segment × category × channel bucket. product ensures you haven’t missed any valid combination when building that grid.
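As a minimal sketch of that merge-back step, here is a left join of hypothetical transaction rows against a tiny stand-in for the grid (the transaction data and rates below are illustrative, not from the example above):

```python
import pandas as pd

# Hypothetical transaction rows; key columns must match the grid's columns
transactions = pd.DataFrame({
    "segment": ["new", "vip"],
    "category": ["apparel", "electronics"],
    "channel": ["mobile", "desktop"],
    "amount": [62.50, 349.99],
})

# Tiny stand-in for the grid built with itertools.product
grid_df = pd.DataFrame({
    "segment": ["new", "vip"],
    "category": ["apparel", "electronics"],
    "channel": ["mobile", "desktop"],
    "avg_conversion_rate": [0.090, 0.100],
})

# Left join: every transaction picks up its bucket's expected conversion rate
enriched = transactions.merge(
    grid_df, on=["segment", "category", "channel"], how="left"
)
print(enriched)
```

Because the grid is exhaustive, a left join like this never produces missing lookup values for valid segment/category/channel rows.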
3. Flattening Multi-Source Feature Sets with chain
In most pipelines, features come from multiple sources: a customer profile table, a product metadata table, and a browsing history table. You often need to flatten these into a single feature list for column selection or validation.
import itertools

customer_features = [
    "customer_age", "days_since_signup", "lifetime_value",
    "total_orders", "avg_order_value"
]
product_features = [
    "category", "brand_tier", "avg_rating",
    "review_count", "is_sponsored"
]
behavioral_features = [
    "pages_viewed_last_7d", "search_queries_last_7d",
    "cart_abandonment_rate", "wishlist_size"
]

# Flatten all feature groups into one list
all_features = list(itertools.chain(
    customer_features,
    product_features,
    behavioral_features
))

print(f"Total features: {len(all_features)}")
print(all_features)
Output:

Total features: 14
['customer_age', 'days_since_signup', 'lifetime_value', 'total_orders', 'avg_order_value', 'category', 'brand_tier', 'avg_rating', 'review_count', 'is_sponsored', 'pages_viewed_last_7d', 'search_queries_last_7d', 'cart_abandonment_rate', 'wishlist_size']
This might look like a roundabout way of concatenating lists with +, and for simple cases it is. But chain earns its keep when you have many sources, when some sources are generators rather than lists, or when you build the feature list conditionally because some feature groups are optional depending on data availability. It keeps the code readable and composable.
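The conditional case can be sketched with chain.from_iterable, which flattens however many groups survive the availability checks (the has_behavioral_data flag below is a hypothetical stand-in for a real data-availability check):

```python
import itertools

customer_features = ["customer_age", "lifetime_value"]
behavioral_features = ["pages_viewed_last_7d", "wishlist_size"]

# Hypothetical flag: behavioral logs may be missing for some runs
has_behavioral_data = False

feature_groups = [customer_features]
if has_behavioral_data:
    feature_groups.append(behavioral_features)

# chain.from_iterable flattens a list of lists into one flat iterator
all_features = list(itertools.chain.from_iterable(feature_groups))
print(all_features)
# ['customer_age', 'lifetime_value']
```

The same code works unchanged whether one group or ten make it into feature_groups, which is what makes the pattern composable.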
4. Creating Windowed Lag Features with islice
Lag features are important in many datasets. In e-commerce, for example, what a customer spent last month, their order count over the last 3 purchases, and their average basket size over the last 5 transactions can all be important features. Building these manually with index arithmetic is prone to errors.
islice lets you slice an iterator without converting it to a list first. This is useful when processing ordered transaction histories row by row.
import itertools

import pandas as pd

# Transaction history for customer C-10482, ordered chronologically
transactions = [
    {"order_id": "ORD-8821", "amount": 134.50, "items": 3},
    {"order_id": "ORD-8934", "amount": 89.00, "items": 2},
    {"order_id": "ORD-9102", "amount": 210.75, "items": 5},
    {"order_id": "ORD-9341", "amount": 55.20, "items": 1},
    {"order_id": "ORD-9488", "amount": 178.90, "items": 4},
    {"order_id": "ORD-9601", "amount": 302.10, "items": 7},
]

# Build lag-3 features for each transaction (using the 3 most recent prior orders)
window_size = 3
features = []

for i in range(window_size, len(transactions)):
    window = list(itertools.islice(transactions, i - window_size, i))
    current = transactions[i]

    lag_amounts = [t["amount"] for t in window]
    features.append({
        "order_id": current["order_id"],
        "current_amount": current["amount"],
        "lag_1_amount": lag_amounts[-1],
        "lag_2_amount": lag_amounts[-2],
        "lag_3_amount": lag_amounts[-3],
        "rolling_mean_3": round(sum(lag_amounts) / len(lag_amounts), 2),
        "rolling_max_3": max(lag_amounts),
    })

print(pd.DataFrame(features).to_string(index=False))
Output:

order_id  current_amount  lag_1_amount  lag_2_amount  lag_3_amount  rolling_mean_3  rolling_max_3
ORD-9341            55.2        210.75         89.00        134.50          144.75         210.75
ORD-9488           178.9         55.20        210.75         89.00          118.32         210.75
ORD-9601           302.1        178.90         55.20        210.75          148.28         210.75
islice(transactions, i - window_size, i) gives you exactly the preceding window_size transactions without building intermediate lists for the full history.
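When the history is a pure iterator you can't index (for example, rows streamed from a file), a small generator in the style of the itertools recipes does the same job; sliding_window below is a helper name of my own, not part of itertools:

```python
import itertools

def sliding_window(iterable, n):
    # Yield tuples of n consecutive elements, advancing one element at a time
    it = iter(iterable)
    window = tuple(itertools.islice(it, n))
    if len(window) == n:
        yield window
    for item in it:
        window = window[1:] + (item,)
        yield window

print(list(sliding_window([10, 20, 30, 40, 50], 3)))
# [(10, 20, 30), (20, 30, 40), (30, 40, 50)]
```

Each yielded tuple is one lag window, so the feature-building loop above could consume this generator directly instead of slicing by index.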
5. Aggregating Per-Category Features with groupby
groupby lets you group a sorted iterable and compute per-group statistics cleanly.
Going back to our example, a customer’s behavior often varies significantly by product category. Their average spend on electronics might be 4× their spend on accessories. Treating all orders as one pool loses that signal.
Here’s an example:
import itertools

import pandas as pd

orders = [
    {"customer": "C-10482", "category": "electronics", "amount": 349.99},
    {"customer": "C-10482", "category": "electronics", "amount": 189.00},
    {"customer": "C-10482", "category": "apparel", "amount": 62.50},
    {"customer": "C-10482", "category": "apparel", "amount": 88.00},
    {"customer": "C-10482", "category": "apparel", "amount": 45.75},
    {"customer": "C-10482", "category": "home_goods", "amount": 124.30},
]

# Must be sorted by the grouping key before using groupby
orders_sorted = sorted(orders, key=lambda x: x["category"])

category_features = {}
for category, group in itertools.groupby(orders_sorted, key=lambda x: x["category"]):
    amounts = [o["amount"] for o in group]
    category_features[category] = {
        "order_count": len(amounts),
        "total_spend": round(sum(amounts), 2),
        "avg_spend": round(sum(amounts) / len(amounts), 2),
        "max_spend": max(amounts),
    }

cat_df = pd.DataFrame(category_features).T
cat_df.index.name = "category"
print(cat_df)
Output:

             order_count  total_spend  avg_spend  max_spend
category
apparel              3.0       196.25      65.42      88.00
electronics          2.0       538.99     269.50     349.99
home_goods           1.0       124.30     124.30     124.30
These per-category aggregates become features on the customer row — electronics_avg_spend, apparel_order_count, and so on. The important thing to remember with itertools.groupby is that you must sort by the key first. Unlike pandas groupby, it only groups consecutive elements.
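The consecutive-only behavior is easy to see in isolation. A quick sketch of what happens if you skip the sort:

```python
import itertools

categories = ["apparel", "apparel", "electronics", "apparel"]

# Without sorting first, the trailing "apparel" becomes its own group
groups = [(key, len(list(grp))) for key, grp in itertools.groupby(categories)]
print(groups)
# [('apparel', 2), ('electronics', 1), ('apparel', 1)]
```

If you see duplicate group keys like this in your aggregates, a missing sort is almost always the cause.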
6. Building Polynomial Features with combinations_with_replacement
Polynomial features — squares, cubes, and cross-products — are a standard way to give linear models the ability to capture non-linear relationships.
Scikit-learn’s PolynomialFeatures does this, but combinations_with_replacement gives you the same result with full control over which features get expanded and how.
import itertools

import pandas as pd

df_poly = pd.DataFrame({
    "avg_order_value": [142.5, 89.0, 210.3, 67.8],
    "discount_rate": [0.10, 0.25, 0.05, 0.30],
    "items_per_order": [3.2, 1.8, 5.1, 1.2],
})

cols = df_poly.columns.tolist()

# Degree-2: includes col^2 and col_a × col_b
for col_a, col_b in itertools.combinations_with_replacement(cols, 2):
    feature_name = f"{col_a}^2" if col_a == col_b else f"{col_a}_x_{col_b}"
    df_poly[feature_name] = df_poly[col_a] * df_poly[col_b]

poly_cols = [c for c in df_poly.columns if "^2" in c or "_x_" in c]
print(df_poly[poly_cols].round(3))
Output:

   avg_order_value^2  avg_order_value_x_discount_rate  \
0           20306.25                           14.250
1            7921.00                           22.250
2           44226.09                           10.515
3            4596.84                           20.340

   avg_order_value_x_items_per_order  discount_rate^2  \
0                             456.00            0.010
1                             160.20            0.062
2                            1072.53            0.003
3                              81.36            0.090

   discount_rate_x_items_per_order  items_per_order^2
0                            0.320            10.24
1                            0.450             3.24
2                            0.255            26.01
3                            0.360             1.44
The difference from combinations is in the name: combinations_with_replacement allows the same element to appear twice. That’s what gives you the squared terms (avg_order_value^2). Use this when you want polynomial expansion without pulling in scikit-learn just for preprocessing.
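The same pattern extends to higher degrees by increasing the r argument; each combination is then a multiset of columns whose row-wise product is one polynomial term. A minimal sketch (the two-column DataFrame is illustrative):

```python
import itertools

import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
degree = 3

# Each combo is a multiset of columns, e.g. ('a', 'a', 'b') -> a^2 * b
for combo in itertools.combinations_with_replacement(df.columns.tolist(), degree):
    name = "_x_".join(combo)
    # Selecting repeated column labels and taking a row-wise product
    # computes the term for every row at once
    df[name] = df[list(combo)].prod(axis=1)

print(df.columns.tolist())
```

With r = 3 on two columns, this adds four terms (a^3, a^2 b, a b^2, b^3); be aware that the term count grows quickly with both degree and column count.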
7. Accumulating Cumulative Behavioral Features with accumulate
itertools.accumulate computes running aggregates over a sequence without needing pandas or NumPy.
Cumulative features — running total spend, cumulative order count, and running average basket size — are useful signals for lifetime value modeling and churn prediction. A customer’s cumulative spend at order 5 says something different than their spend at order 15. Here’s a useful example:
import itertools

import pandas as pd

# Customer C-20917: chronological order amounts
order_amounts = [56.80, 123.40, 89.90, 245.00, 67.50, 310.20, 88.75]

# Cumulative spend
cumulative_spend = list(itertools.accumulate(order_amounts))

# Cumulative max spend (highest single order so far)
cumulative_max = list(itertools.accumulate(order_amounts, func=max))

# Cumulative order count (just using addition on 1s)
cumulative_count = list(itertools.accumulate([1] * len(order_amounts)))

features_df = pd.DataFrame({
    "order_number": range(1, len(order_amounts) + 1),
    "order_amount": order_amounts,
    "cumulative_spend": cumulative_spend,
    "cumulative_max_order": cumulative_max,
    "order_count_so_far": cumulative_count,
})

features_df["avg_spend_so_far"] = (
    features_df["cumulative_spend"] / features_df["order_count_so_far"]
).round(2)

print(features_df.to_string(index=False))
Output:

 order_number  order_amount  cumulative_spend  cumulative_max_order  order_count_so_far  avg_spend_so_far
            1         56.80             56.80                  56.8                   1             56.80
            2        123.40            180.20                 123.4                   2             90.10
            3         89.90            270.10                 123.4                   3             90.03
            4        245.00            515.10                 245.0                   4            128.78
            5         67.50            582.60                 245.0                   5            116.52
            6        310.20            892.80                 310.2                   6            148.80
            7         88.75            981.55                 310.2                   7            140.22
accumulate takes an optional func argument — any two-argument function. The default is addition, but max, min, operator.mul, or a custom lambda all work. In this example, each row in the output is a snapshot of the customer’s history at that point in time. This is useful when building features for sequential models or training data where you must avoid leakage.
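To illustrate the func argument, here is accumulate with operator.mul and with a custom lambda; the 0.7/0.3 decay weights in the second example are an arbitrary choice for the sketch, not a recommended setting:

```python
import itertools
import operator

# Running product: useful for compounding ratios such as retention rates
print(list(itertools.accumulate([1, 2, 3, 4], operator.mul)))
# [1, 2, 6, 24]

# Custom lambda: exponentially weighted running spend
# (first element passes through unchanged, then acc is blended with each new value)
amounts = [56.80, 123.40, 89.90]
ewm_spend = list(itertools.accumulate(amounts, lambda acc, x: 0.7 * acc + 0.3 * x))
print([round(v, 2) for v in ewm_spend])
# [56.8, 76.78, 80.72]
```

Note that accumulate always emits the first element as-is before applying func, which is exactly the seeding behavior you want for running features.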
Wrapping Up
I hope you found this article on using Python’s itertools module for feature engineering helpful. Here’s a quick reference for when to reach for each function:
| Function | Feature Engineering Use Case |
|---|---|
| combinations | Pairwise interaction features |
| product | Cross-category feature grids |
| chain | Merging feature lists from multiple sources |
| islice | Lag and rolling window features |
| groupby | Per-group aggregation features |
| combinations_with_replacement | Polynomial / squared features |
| accumulate | Cumulative behavioral features |
A useful habit to build here is recognizing when a feature engineering problem is, at its core, an iteration problem. When it is, itertools almost always has a cleaner answer than a custom function with hard-to-maintain loops. In the next article, we’ll focus on building features for time series data. Until then, happy coding!
