Sustainability Toolkit

AI has a footprint. Pretending otherwise doesn't help anyone.

This toolkit is a starting point, not a verdict. If you're using AI (and you are), you're generating compute load. Some of it is necessary. Some of it isn't. The work is learning to tell the difference. Not using less. Using deliberately.

Light · Moderate · Heavy
Typical team usage

Not a guilt gauge. A compass. Shows where typical individual, team, and enterprise usage sits relative to common benchmarks. Ranges, not false precision.

The goal isn't to use less AI. It's to use it with the same intentionality you'd bring to any material decision.

See how AI is reshaping the workplace more broadly.

AI and the Workplace →
01 — Baseline

You can't reduce what you haven't named.

Most teams don't have a clear picture of what AI they're running, how often, or for what. That's not negligence. It's how tools get adopted. You install them, you use them, you don't inventory them.

But AI tools aren't free to run. They consume energy. The consumption varies a lot, depending on which model, how often, and where it's running.

Start here: map your AI touchpoints. Which tools does your team use? How often? For what types of tasks? This isn't a data science exercise. It's a ten-minute conversation with a piece of paper.

What generates AI energy use, roughly in order of impact: training large foundation models (not your problem directly, but worth knowing it happened) → inference at enterprise scale → individual use of frontier models for everyday tasks.
Baseline Worksheet — Map Your AI Footprint
Tool name | Frequency | Task type | Model tier
02 — Measure

Not everything. Just the signals that actually tell you something.

Token volume.

The most honest proxy for compute, where platforms expose it. More tokens = more compute = more energy. Imprecise. Still better than nothing.
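As a rough sketch, token tracking can start as a simple aggregation over whatever usage data your platforms expose. The log format and numbers below are assumptions for illustration; most platforms report per-request token counts in their usage dashboards or billing exports.

```python
# Hedged sketch: aggregate token volume per tool from a usage log.
# The log format and figures are illustrative assumptions, not real data.
from collections import defaultdict

usage_log = [
    # (tool, prompt_tokens, completion_tokens) -- hypothetical entries
    ("chat assistant", 350, 420),
    ("chat assistant", 1200, 800),
    ("report generator", 6000, 2500),
]

tokens_by_tool = defaultdict(int)
for tool, prompt_toks, completion_toks in usage_log:
    tokens_by_tool[tool] += prompt_toks + completion_toks

# More tokens = more compute = more energy; rank tools by volume.
for tool, total in sorted(tokens_by_tool.items(), key=lambda kv: -kv[1]):
    print(tool, total)
```

Imprecise, like the proxy itself. But a ranked list of where your tokens go is enough to know where right-sizing pays off first.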

Model selection.

This is where most of the variance lives. Running a frontier model on a three-word email subject line isn't a sustainability issue. It's a waste of compute and money. The right model for the right task is the highest-leverage decision most teams aren't making deliberately.

GPT-4 class on a 500-word summary vs. GPT-4o mini: roughly 10–20× higher compute cost per token, with comparable quality on most summaries.
Frontier (large models)

Best for: complex reasoning, multi-step synthesis, nuanced judgment calls. High cost per token.

Mid-tier (balanced models)

Mid-tier models handle most everyday tasks at 3–8× lower compute cost than frontier equivalents.

Best for: drafting, summarizing, Q&A, most daily work tasks. Good quality, lower cost.

Small / Local (efficient models)

Small models handle classification, extraction, and simple generation at a fraction of the cost. Most teams underuse them.

Best for: classification, extraction, structured data, simple single-turn tasks. Lowest compute cost.
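The tiering above can be made deliberate with a simple routing rule. The task labels and the mapping here are illustrative assumptions, not a fixed taxonomy; calibrate them against your own quality checks.

```python
# Illustrative task-to-tier router. The mapping is an assumption for
# illustration -- adjust it to your team's tools and quality bar.
TIER_BY_TASK = {
    "classification": "small",
    "extraction":     "small",
    "drafting":       "mid-tier",
    "summarizing":    "mid-tier",
    "qa":             "mid-tier",
    "synthesis":      "frontier",
    "reasoning":      "frontier",
}

def pick_tier(task_type: str) -> str:
    """Default to mid-tier for unknown tasks: good quality, lower cost,
    with the option to escalate to frontier if the output falls short."""
    return TIER_BY_TASK.get(task_type, "mid-tier")

print(pick_tier("extraction"))  # -> small
print(pick_tier("reasoning"))   # -> frontier
```

The design choice worth noting: default down, escalate up. Retrying a cheap model occasionally costs less than running the expensive one always.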

On-device vs. cloud inference.

This matters more as inference shifts toward local, on-device models. On-device is generally more efficient for simple tasks. Cloud is necessary for complex ones. Knowing the difference is an emerging skill worth building now.

Avoided emissions.

Underused as a frame. When AI replaces a process that had its own footprint: a business trip, a print run, a redundant approval cycle. That displacement counts. Name it. Track it.

When you add an AI workflow to your inventory: what did this replace? The answer changes the net impact math.
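The net-impact math is back-of-envelope by design. The figures below are illustrative assumptions, not measured data; substitute your own estimates for the workflow and the process it replaced.

```python
# Back-of-envelope net-impact sketch. All numbers here are illustrative
# assumptions -- replace them with your own estimates.
def net_impact_kg(ai_workflow_kg: float, replaced_process_kg: float) -> float:
    """Negative result = the workflow avoids more emissions than it adds."""
    return ai_workflow_kg - replaced_process_kg

# e.g. an AI review workflow (assumed 2 kg CO2e/month of inference)
# replacing a monthly print-and-courier cycle (assumed 15 kg CO2e/month):
print(net_impact_kg(2.0, 15.0))  # -> -13.0
```

If the replaced-process column is zero for every row in your inventory, that's the finding: the workflows are additive compute.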

Tools worth knowing

Electricity Maps: carbon intensity by region, real time
Green Software Foundation: standards and measurement frameworks
Provider reports: Microsoft, Google, and Anthropic each publish sustainability data. None of them are perfect. All are better than guessing.
03 — React

Three levels. Start at the one that's yours.

Individual

Right-size your model.

GPT-4 for a subject line is like driving a semi truck to pick up a sandwich. Use a smaller model for simple, high-frequency tasks. Most platforms let you choose. Most people don't.

Team

Build a use-case filter.

Before deploying a new AI workflow, name one thing it replaces. That replacement had a footprint too, and the net matters. Then ask: what model does this actually need? What's the frequency? If the answers point toward a high-compute, high-frequency workflow that doesn't replace anything, that's the conversation to have before you build, not after.

Enterprise

Procurement questions that matter.

These are reasonable questions. Vendors who won't answer them are telling you something.

Where are your data centers, and what's the carbon intensity of the grid they run on?
What sustainability SLAs do you offer, and how are they measured?
How do you report on scope 3 emissions from your customers' AI usage?
What are you doing to reduce inference cost per token over time?
What are you doing beyond carbon credits to address your environmental footprint?
04 — Best Practices

Organized by role. The decisions that move the needle.

Choose the smaller model when the task doesn't need the frontier one.

Why it matters: Model selection is the single highest-leverage behavioral change.
How to start: Next time you open an AI tool, look for the model selector. Try the mid-tier once.

Write prompts that get it right in fewer iterations. Every retry is compute.

Why it matters: Prompt iteration is invisible compute spend. It adds up.
How to start: Before submitting a prompt, add one sentence of context and one sentence of format instruction. Measure whether you iterate less.
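As a concrete illustration of that one-sentence habit, here is a before-and-after prompt; the scenario and wording are hypothetical.

```python
# Illustrative only: one sentence of context plus one sentence of
# format instruction, aiming for a usable draft in fewer iterations.
vague_prompt = "Summarize this report."

better_prompt = (
    "Summarize this report. "
    # Assumed scenario -- swap in your own audience and purpose:
    "Context: it's a quarterly facilities-cost review for the leadership team. "
    "Format: five bullet points, one sentence each, numbers included."
)
```

Two sentences of setup is cheaper than three retries.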

When in doubt about whether to run an AI task, ask: would I spend a human analyst's time on this?

Why it matters: Not every task earns the compute cost. This question surfaces the ones that don't.
How to start: Apply it to the next five AI tasks you run. Notice the pattern.

Before deploying an agent: what does this replace?

Why it matters: If nothing, it's additive compute. Name what it replaces before you build.
How to start: Make this question part of your agent deployment checklist. No answer = not ready to ship.

Set a 90-day review cadence for every agent before you ship it.

Why it matters: Agents drift. Workflows get stale. A check-in prevents zombie compute spend.
How to start: Put a calendar reminder the day you ship. 90 days. "Is this still earning its keep?"

Ask vendors for data center location and grid carbon intensity before signing.

Why it matters: Where compute runs matters. The same model running on a coal-heavy grid vs. a renewables-heavy one has very different emissions.
How to start: Add to your standard vendor questionnaire. Use the procurement questions above.

Don't accept carbon credits as a substitute for real measurement.

Why it matters: Credits are an accounting tool. They're not the same as reduced emissions. Ask for both.
How to start: When a vendor says "carbon neutral," ask how. If the answer is credits only, that's your answer.
Where We Stand

The Aeron wasn't designed to be ergonomic. It was designed because sitting in most office chairs for eight hours causes real physical harm. Nobody had treated that as a design problem worth solving seriously. The chair came from the conviction that material decisions and human outcomes are inseparable. You don't get to call something well-designed if it damages the body it's built for.

This is that, for compute.

We're not neutral on AI's environmental impact. It's real, underdisclosed, and worth naming honestly. The answer isn't to use less AI. It's to use it with the same intentionality we bring to any material decision. Internally, we frame that as three disciplines:

Weight class discipline

Right model for the right task, every time.

Grid-aware scheduling

Where it's possible, run heavy inference when the grid is cleaner.

Vanity gate

If you can't estimate the environmental cost of a project, you're not ready to run it.
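The grid-aware scheduling discipline above can be sketched as a simple gate. The threshold and the intensity values are illustrative assumptions; a real deployment would read intensity from a live feed such as Electricity Maps.

```python
# Hedged sketch of grid-aware scheduling: defer heavy batch inference
# when the grid is dirtier than a chosen cutoff. Threshold and inputs
# are illustrative assumptions, not recommended values.
THRESHOLD_G_PER_KWH = 300  # assumed cutoff, gCO2e per kWh

def should_run_now(grid_intensity_g_per_kwh: float) -> bool:
    """Run heavy inference only when the grid is cleaner than the cutoff."""
    return grid_intensity_g_per_kwh < THRESHOLD_G_PER_KWH

print(should_run_now(120))  # cleaner grid -> True
print(should_run_now(520))  # coal-heavy hour -> False
```

Only batch and background workloads can wait for a cleaner hour; interactive use can't. That's why this is a discipline for heavy inference, not for every request.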

We don't accept vendor carbon credits as a substitute for real measurement. We don't think you should either.

See the broader picture on AI and work.

AI and the Workplace →

What sustainability question do you want us to answer next?

Questions already in the queue
How does MillerKnoll measure Scope 3 emissions from AI vendor usage?
Which AI providers have the best verified sustainability track records right now?
Is on-device AI actually more efficient for our use cases?