Blog
All About That Baseline

The Leap Team

Baselining isn’t the flashiest topic, even in the energy world, but it’s arguably one of the most important ones for understanding how virtual power plant (VPP) performance is measured — and how grid revenue is captured.
Baselines 101
A baseline is an estimate of what a customer’s electricity consumption would have been without a grid event.
When a grid event is called, distributed energy resources (DERs) respond by reducing load, shifting usage, or discharging energy back to the grid. To measure whether that load curtailment actually happened, and by how much, you need two numbers: the actual load during the event, and the expected load if no action had been taken.
That expected load is the baseline. The difference between the baseline and actual consumption represents the delivered curtailment, which ultimately determines performance and compensation.

Here’s a simple example:
A building used 500 kWh during a grid event. How do we know whether it reduced load as intended? If its baseline for that period is 600 kWh (it would normally have used 600 kWh at the time of the event), it delivered 100 kWh of load reduction. If its baseline is 450 kWh, it used 50 kWh more energy than expected during the event and underperformed.
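The arithmetic fits in a couple of lines. The minimal Python sketch below treats delivered curtailment as baseline minus actual consumption, so a negative result signals underperformance. The function name is hypothetical.

```python
def delivered_curtailment_kwh(baseline_kwh, actual_kwh):
    """Baseline minus actual: positive = load reduced, negative = used more than expected."""
    return baseline_kwh - actual_kwh


print(delivered_curtailment_kwh(600, 500))  # 100 (delivered reduction)
print(delivered_curtailment_kwh(450, 500))  # -50 (underperformed)
```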

How are baselines calculated?
Grid services administrators use several different methodologies to calculate baselines, depending on the program.
Most baselines are built from historical usage data, looking back at how a site or device consumed electricity on similar days before the event. The most common approach is often called an "X of Y" formula: for example, averaging load across the five highest-usage days out of the ten most recent comparable days. This gives a reasonable approximation of what "normal" usage looks like.
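Below is a minimal Python sketch of a "5 of 10" style baseline. It rests on several assumptions:

- The data is hourly interval data.
- The list of days has already been filtered to comparable days (for example, non-holiday weekdays with no prior events).
- Days are ranked by their energy use during the event window.

Real programs define eligible days, ranking, and interval length in their own rules. The function name and parameters are illustrative, not taken from any program specification.

```python
from statistics import mean


def x_of_y_baseline(days, event_hours, x=5, y=10):
    """Illustrative "X of Y" baseline (hypothetical helper, not an official formula).

    days: list of 24-element lists of hourly kWh, ordered oldest to newest and
          already filtered to comparable days.
    event_hours: hour indices (0-23) covered by the event.
    Returns a dict mapping each event hour to its baseline kWh.
    """
    if len(days) < y:
        # Newly installed assets may not have enough history yet.
        raise ValueError(f"need {y} comparable days, got {len(days)}")
    recent = days[-y:]  # the Y most recent comparable days
    # Rank days by energy used during the event window and keep the X highest.
    top = sorted(recent, key=lambda d: sum(d[h] for h in event_hours), reverse=True)[:x]
    # Average those X days hour by hour.
    return {h: mean(d[h] for d in top) for h in event_hours}


# Ten flat days whose hourly load is 1, 2, ..., 10 kWh.
history = [[float(i)] * 24 for i in range(1, 11)]
print(x_of_y_baseline(history, event_hours=[16, 17, 18]))
# {16: 8.0, 17: 8.0, 18: 8.0}  (average of the five highest days: 6 to 10)
```

The `ValueError` branch mirrors a practical problem: a newly installed asset with fewer than Y comparable days cannot produce this kind of baseline at all.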

But historical data alone doesn't always tell the full story. Weather is one major variable; a baseline calculated on mild spring days may significantly underestimate how hard a building's HVAC would have been working on a hot summer event day. To address this, many programs include a weather adjustment that scales the baseline up or down to better reflect conditions on the day of the event itself.
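Weather adjustments take different forms across programs. One common pattern is a same-day ratio adjustment, sketched below under illustrative assumptions. It compares actual load with the unadjusted baseline in the hours just before the event, which reflects that day's conditions such as heat, and scales the event-hour baseline by that ratio. The function name, the choice of pre-event hours, and the 20% cap are hypothetical; each program specifies its own adjustment window, formula, and limits.

```python
def ratio_adjusted_baseline(baseline, actual_pre_event, baseline_pre_event, cap=0.20):
    """Scale an event baseline by same-day conditions (illustrative sketch).

    baseline: dict mapping event hour -> unadjusted baseline kWh.
    actual_pre_event, baseline_pre_event: kWh for the same pre-event hours,
        as metered on the event day and as predicted by the unadjusted baseline.
    cap: largest fractional adjustment allowed in either direction (assumed).
    """
    expected = sum(baseline_pre_event)
    if expected <= 0:
        return dict(baseline)  # nothing meaningful to scale against
    factor = sum(actual_pre_event) / expected
    factor = max(1 - cap, min(1 + cap, factor))  # clamp to [1 - cap, 1 + cap]
    return {h: kwh * factor for h, kwh in baseline.items()}


# A hot afternoon: load ran 25% above the unadjusted baseline before the event,
# so the baseline is scaled up, limited here by the assumed 20% cap.
adjusted = ratio_adjusted_baseline(
    baseline={17: 100.0, 18: 100.0},
    actual_pre_event=[120.0, 130.0],
    baseline_pre_event=[100.0, 100.0],
)
print(adjusted)  # {17: 120.0, 18: 120.0}
```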
Some emergency programs take a different approach altogether: instead of comparing performance to recent historical days, programs like PJM’s capacity market and NYISO’s Special Case Resources (SCR) calculate baselines using a customer’s average load during the prior year’s highest grid-stress hours — for example, the top five summer peak hours. The idea is that those hours best represent the conditions under which emergency events occur.
This creates an important dynamic for flexible assets. If a site is already reducing load during peak system hours — to manage demand charges, for example — that reduced consumption becomes part of the baseline itself. As a result, there may be less measurable curtailment available during an emergency event, even though the asset is still providing meaningful grid value.
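The simplified Python sketch below illustrates both a peak-hour baseline and the dynamic described above. It averages a site's load over the prior year's N highest system-load hours (five here, matching the example in the text). The helper name and toy data are hypothetical, and the actual PJM and NYISO calculations include additional rules not modeled here.

```python
from statistics import mean


def peak_hour_baseline(site_load, system_load, n_peaks=5):
    """Average site load over the n_peaks highest system-load hours (illustrative).

    site_load: dict mapping an hour key -> site load in kW for the prior year.
    system_load: dict mapping the same hour keys -> system load (any unit).
    """
    peaks = sorted(system_load, key=system_load.get, reverse=True)[:n_peaks]
    return mean(site_load[h] for h in peaks)


hours = list(range(10))
system = {h: 1000 + 100 * h for h in hours}  # hours 5-9 are the system peaks
unmanaged = {h: 500.0 for h in hours}
# Same site, but shaving its peaks to 300 kW to manage demand charges.
managed = {h: (300.0 if h >= 5 else 500.0) for h in hours}

event_load_kw = 250.0  # metered load during an emergency event
print(peak_hour_baseline(unmanaged, system) - event_load_kw)  # 250.0
print(peak_hour_baseline(managed, system) - event_load_kw)    # 50.0
```

The site's load during the event is identical in both cases. Because the peak-shaving site's reduced consumption already sits in its baseline, only 50 kW of curtailment is measurable instead of 250 kW.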
The baseline challenges for DERs
Even well-designed baselines can be distorted in practice, particularly for DERs. Newly installed battery systems, EV charging infrastructure, and other flexible assets may not have the historical data that traditional lookback methodologies need to estimate expected behavior accurately.
Battery storage and EV charging also introduce another layer of complexity because they often shift load on a day-to-day basis. A battery may charge overnight, discharge during peak hours, and continuously optimize around rates, solar production, or operational goals. Over time, those shifting behaviors become embedded in the historical data used to calculate the baseline itself.
The more consistently a flexible asset responds to price signals or grid conditions, the more that behavior can get “baked into” the expected baseline. In some cases, that means the asset’s actual flexibility is only partially reflected in measured performance.
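A toy calculation shows how routine battery cycling can get baked into a lookback baseline. To keep the arithmetic visible, it uses a plain average of ten lookback days instead of a full X-of-Y formula. The building and battery figures are hypothetical.

```python
from statistics import mean

BUILDING_KW = 100.0  # hypothetical underlying building load at the peak hour
BATTERY_KW = 40.0    # hypothetical battery discharge at the peak hour


def measured_curtailment(lookback_meter_kw, event_meter_kw):
    """Baseline (simple average of lookback days) minus metered event load."""
    return mean(lookback_meter_kw) - event_meter_kw


# The battery discharges during the event in both scenarios.
event_meter_kw = BUILDING_KW - BATTERY_KW

# Scenario A: the battery sat idle at the peak hour on every lookback day.
idle_history = [BUILDING_KW] * 10
# Scenario B: the battery discharged at the peak hour every day (e.g. for rates),
# so the meter already showed the reduced load throughout the lookback window.
cycling_history = [BUILDING_KW - BATTERY_KW] * 10

print(measured_curtailment(idle_history, event_meter_kw))     # 40.0
print(measured_curtailment(cycling_history, event_meter_kw))  # 0.0
```

The battery discharges 40 kW during the event in both cases. In the second scenario, however, that discharge is indistinguishable at the meter from the battery's routine behavior, so the meter-based baseline credits none of it.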

Because of this, understanding how a program calculates baselines increasingly needs to be part of DER operating strategy itself, especially for storage and EV assets whose charging and discharging behavior can directly influence future performance calculations.
Some grid services programs address this challenge by measuring DER performance with device-level telemetry rather than meter-level data, which gives a more direct view of how an asset responded during an event.
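Device-level measurement can be sketched as integrating an asset's own telemetry over the event. The Python below makes these assumptions:

- The battery reports power readings at a fixed interval, with positive values meaning discharge.
- The device baseline is 0 kW, meaning the battery would otherwise have been idle.
- The function name is hypothetical.

Each program sets its own rules for the device baseline, sampling rate, and data requirements.

```python
def device_event_kwh(samples_kw, interval_minutes, device_baseline_kw=0.0):
    """Energy delivered by a device during an event, from its own telemetry.

    samples_kw: one power reading per interval during the event (kW),
        positive = discharging to the site or grid.
    interval_minutes: spacing between readings.
    device_baseline_kw: assumed device output absent the event (0 = idle).
    """
    interval_hours = interval_minutes / 60
    return sum((kw - device_baseline_kw) * interval_hours for kw in samples_kw)


# One hour of 15-minute telemetry at a steady 40 kW discharge.
print(device_event_kwh([40.0, 40.0, 40.0, 40.0], interval_minutes=15))  # 40.0
```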
Baselining in a dynamic DER environment
Baseline management is an ongoing discipline. Different programs use different methodologies, and the rules that apply to one market may not apply to another. Some programs offer multiple baseline options, allowing aggregators to select the methodology that best fits a given portfolio.
The goal is always the same: a baseline that accurately represents what would have happened. When baselines are well-matched to the resources they're measuring, the value that DERs actually deliver to the grid gets properly credited. When they're not, that value can go uncaptured — not because the resource underperformed, but because the measurement didn't reflect it.
How Leap helps
Leap’s platform calculates baselines across a variety of load types on behalf of our 100+ partners, applying the methodologies each program requires and staying current as those rules evolve.
Accurate baseline calculations depend on continuous access to high-quality interval data, and Leap’s platform manages the ongoing collection and validation of that data stream across utilities, meters, and device integrations. That includes constantly requesting, processing, and validating the interval data needed to support performance measurement and settlement.
From there, Leap helps ensure the appropriate baseline methodologies are being applied based on program requirements and asset type, particularly for dynamic resources like battery storage and EV charging. Because we understand how different baseline methodologies interact with DER operations, we can help partners think strategically about dispatch behavior, charging patterns, and usage strategies that support strong grid performance while still balancing other asset priorities like bill savings or backup power readiness.
By performing baseline calculations internally, our platform automatically provides performance insights and preliminary revenue estimates far earlier than most program administrators do, often months ahead of official settlements. Our technology also offers more accurate revenue forecasting, helping partners better understand the grid revenue potential of their portfolios before events even occur.
Beyond baseline management, Leap works closely with program administrators and regulators to advocate for baseline approaches that fairly reflect how modern DERs operate.
The goal is to make sure that your portfolio’s grid contributions are accurately reflected in what you get paid. Baselines are the mechanism that makes that possible, and getting them right is a core part of how we maximize revenue for our partners.