Sustainability Toolkit

AI has a footprint. Pretending otherwise doesn't help anyone.

This toolkit is a starting point, not a verdict. If you're using AI (and you are), you're generating compute load. Some of it is necessary. Some of it isn't. The work is learning to tell the difference. Not using less. Using deliberately.

Light · Moderate · Heavy
Typical team usage

Not a guilt gauge. A compass. Shows where typical individual, team, and enterprise usage sits relative to common benchmarks. Ranges, not false precision.

The goal isn't to use less AI. It's to use it with the same intentionality you'd bring to any material decision.

See how AI is reshaping the workplace more broadly.

AI and the Workplace →
01 — Baseline

You can't reduce what you haven't named.

Most teams don't have a clear picture of what AI they're running, how often, or for what. That's not negligence. It's how tools get adopted. You install them, you use them, you don't inventory them.

But AI tools aren't free to run. They consume energy. The consumption varies a lot, depending on which model, how often, and where it's running.

Start here: map your AI touchpoints. Which tools does your team use? How often? For what types of tasks? This isn't a data science exercise. It's a ten-minute conversation with a piece of paper.

What generates AI energy use, roughly in order of impact: training large foundation models (not your problem directly, but worth knowing it happened) → inference at enterprise scale → individual use of frontier models for everyday tasks.
Baseline Worksheet — Map Your AI Footprint
Tool name | Frequency | Task type | Model tier
02 — Measure

Not everything. Just the signals that actually tell you something.

Token volume.

The most honest proxy for compute, where platforms expose it. More tokens = more compute = more energy. Imprecise. Still better than nothing.
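As a rough sketch, token tracking can start as a simple aggregation over whatever usage data your platforms expose. The log format and numbers below are assumptions for illustration; most platforms report per-request token counts in their usage dashboards or billing exports.

```python
# Hedged sketch: aggregate token volume per tool from a usage log.
# The log format and figures are illustrative assumptions, not real data.
from collections import defaultdict

usage_log = [
    # (tool, prompt_tokens, completion_tokens) -- hypothetical entries
    ("chat assistant", 350, 420),
    ("chat assistant", 1200, 800),
    ("report generator", 6000, 2500),
]

tokens_by_tool = defaultdict(int)
for tool, prompt_toks, completion_toks in usage_log:
    tokens_by_tool[tool] += prompt_toks + completion_toks

# More tokens = more compute = more energy; rank tools by volume.
for tool, total in sorted(tokens_by_tool.items(), key=lambda kv: -kv[1]):
    print(tool, total)
```

Imprecise, like the proxy itself. But a ranked list of where your tokens go is enough to know where right-sizing pays off first.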

Model selection.

This is where most of the variance lives. Running a frontier model on a three-word email subject line isn't a sustainability issue. It's a waste of compute and money. The right model for the right task is the highest-leverage decision most teams aren't making deliberately.

GPT-4 class on a 500-word summary vs. GPT-4o mini: roughly 10–20× higher compute cost per token, with comparable quality on most summaries.
Frontier (large models)

Best for: complex reasoning, multi-step synthesis, nuanced judgment calls. High cost per token.

Mid-tier (balanced models)

Mid-tier models handle most everyday tasks at 3–8× lower compute cost than frontier equivalents.

Best for: drafting, summarizing, Q&A, most daily work tasks. Good quality, lower cost.

Small / Local (efficient models)

Small models handle classification, extraction, and simple generation at a fraction of the cost. Most teams underuse them.

Best for: classification, extraction, structured data, simple single-turn tasks. Lowest compute cost.
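The tiering above can be made deliberate with a simple routing rule. The task labels and the mapping here are illustrative assumptions, not a fixed taxonomy; calibrate them against your own quality checks.

```python
# Illustrative task-to-tier router. The mapping is an assumption for
# illustration -- adjust it to your team's tools and quality bar.
TIER_BY_TASK = {
    "classification": "small",
    "extraction":     "small",
    "drafting":       "mid-tier",
    "summarizing":    "mid-tier",
    "qa":             "mid-tier",
    "synthesis":      "frontier",
    "reasoning":      "frontier",
}

def pick_tier(task_type: str) -> str:
    """Default to mid-tier for unknown tasks: good quality, lower cost,
    with the option to escalate to frontier if the output falls short."""
    return TIER_BY_TASK.get(task_type, "mid-tier")

print(pick_tier("extraction"))  # -> small
print(pick_tier("reasoning"))   # -> frontier
```

The design choice worth noting: default down, escalate up. Retrying a cheap model occasionally costs less than running the expensive one always.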

On-device vs. cloud inference.

This matters more as inference shifts toward local, on-device models. On-device is generally more efficient for simple tasks. Cloud is necessary for complex ones. Knowing the difference is an emerging skill worth building now.

Avoided emissions.

Underused as a frame. When AI replaces a process that had its own footprint: a business trip, a print run, a redundant approval cycle. That displacement counts. Name it. Track it.

When you add an AI workflow to your inventory: what did this replace? The answer changes the net impact math.
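The net-impact math is back-of-envelope by design. The figures below are illustrative assumptions, not measured data; substitute your own estimates for the workflow and the process it replaced.

```python
# Back-of-envelope net-impact sketch. All numbers here are illustrative
# assumptions -- replace them with your own estimates.
def net_impact_kg(ai_workflow_kg: float, replaced_process_kg: float) -> float:
    """Negative result = the workflow avoids more emissions than it adds."""
    return ai_workflow_kg - replaced_process_kg

# e.g. an AI review workflow (assumed 2 kg CO2e/month of inference)
# replacing a monthly print-and-courier cycle (assumed 15 kg CO2e/month):
print(net_impact_kg(2.0, 15.0))  # -> -13.0
```

If the replaced-process column is zero for every row in your inventory, that's the finding: the workflows are additive compute.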

Tools worth knowing

Electricity Maps: carbon intensity by region, real time
Green Software Foundation: standards and measurement frameworks
Provider reports: Microsoft, Google, and Anthropic each publish sustainability data. None of them are perfect. All are better than guessing.
03 — React

Three levels. Start at the one that's yours.

Individual

Right-size your model.

GPT-4 for a subject line is like driving a semi truck to pick up a sandwich. Use a smaller model for simple, high-frequency tasks. Most platforms let you choose. Most people don't.

Team

Build a use-case filter.

Before deploying a new AI workflow, name one thing it replaces. That replacement had a footprint too, and the net matters. Then ask: what model does this actually need? What's the frequency? If the answers point toward a high-compute, high-frequency workflow that doesn't replace anything, that's the conversation to have before you build, not after.

Enterprise

Procurement questions that matter.

These are reasonable questions. Vendors who won't answer them are telling you something.

Where are your data centers, and what's the carbon intensity of the grid they run on?
What sustainability SLAs do you offer, and how are they measured?
How do you report on scope 3 emissions from your customers' AI usage?
What are you doing to reduce inference cost per token over time?
What are you doing beyond carbon credits to address your environmental footprint?
04 — Best Practices

Organized by role. The decisions that move the needle.

Choose the smaller model when the task doesn't need the frontier one.

Why it matters: Model selection is the single highest-leverage behavioral change.
How to start: Next time you open an AI tool, look for the model selector. Try the mid-tier once.

Write prompts that get it right in fewer iterations. Every retry is compute.

Why it matters: Prompt iteration is invisible compute spend. It adds up.
How to start: Before submitting a prompt, add one sentence of context and one sentence of format instruction. Measure whether you iterate less.
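As a concrete illustration of that one-sentence habit, here is a before-and-after prompt; the scenario and wording are hypothetical.

```python
# Illustrative only: one sentence of context plus one sentence of
# format instruction, aiming for a usable draft in fewer iterations.
vague_prompt = "Summarize this report."

better_prompt = (
    "Summarize this report. "
    # Assumed scenario -- swap in your own audience and purpose:
    "Context: it's a quarterly facilities-cost review for the leadership team. "
    "Format: five bullet points, one sentence each, numbers included."
)
```

Two sentences of setup is cheaper than three retries.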

When in doubt about whether to run an AI task, ask: would I spend a human analyst's time on this?

Why it matters: Not every task earns the compute cost. This question surfaces the ones that don't.
How to start: Apply it to the next five AI tasks you run. Notice the pattern.

Before deploying an agent: what does this replace?

Why it matters: If nothing, it's additive compute. Name what it replaces before you build.
How to start: Make this question part of your agent deployment checklist. No answer = not ready to ship.

Set a 90-day review cadence for every agent before you ship it.

Why it matters: Agents drift. Workflows get stale. A check-in prevents zombie compute spend.
How to start: Put a calendar reminder the day you ship. 90 days. "Is this still earning its keep?"

Ask vendors for data center location and grid carbon intensity before signing.

Why it matters: Where compute runs matters. The same model running on a coal-heavy grid vs. a renewables-heavy one has very different emissions.
How to start: Add to your standard vendor questionnaire. Use the procurement questions above.

Don't accept carbon credits as a substitute for real measurement.

Why it matters: Credits are an accounting tool. They're not the same as reduced emissions. Ask for both.
How to start: When a vendor says "carbon neutral," ask how. If the answer is credits only, that's your answer.
Where We Stand

The Aeron wasn't designed to be ergonomic. It was designed because sitting in most office chairs for eight hours causes real physical harm. Nobody had treated that as a design problem worth solving seriously. The chair came from the conviction that material decisions and human outcomes are inseparable. You don't get to call something well-designed if it damages the body it's built for.

This is that, for compute.

We're not neutral on AI's environmental impact. It's real, underdisclosed, and worth naming honestly. The answer isn't to use less AI. It's to use it with the same intentionality we bring to any material decision. Internally, we frame that as three disciplines:

Weight class discipline

Right model for the right task, every time.

Grid-aware scheduling

Where it's possible, run heavy inference when the grid is cleaner.

Vanity gate

If you can't estimate the environmental cost of a project, you're not ready to run it.
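The grid-aware scheduling discipline above can be sketched as a simple gate. The threshold and the intensity values are illustrative assumptions; a real deployment would read intensity from a live feed such as Electricity Maps.

```python
# Hedged sketch of grid-aware scheduling: defer heavy batch inference
# when the grid is dirtier than a chosen cutoff. Threshold and inputs
# are illustrative assumptions, not recommended values.
THRESHOLD_G_PER_KWH = 300  # assumed cutoff, gCO2e per kWh

def should_run_now(grid_intensity_g_per_kwh: float) -> bool:
    """Run heavy inference only when the grid is cleaner than the cutoff."""
    return grid_intensity_g_per_kwh < THRESHOLD_G_PER_KWH

print(should_run_now(120))  # cleaner grid -> True
print(should_run_now(520))  # coal-heavy hour -> False
```

Only batch and background workloads can wait for a cleaner hour; interactive use can't. That's why this is a discipline for heavy inference, not for every request.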

We don't accept vendor carbon credits as a substitute for real measurement. We don't think you should either.

See the broader picture on AI and work.

AI and the Workplace →

What sustainability question do you want us to answer next?

Questions already in the queue
How does MillerKnoll measure Scope 3 emissions from AI vendor usage?
Which AI providers have the best verified sustainability track records right now?
Is on-device AI actually more efficient for our use cases?