
A Gravity Model in 6 Weeks: Site Selection a Board Will Defend


A retail CFO sits in a Tuesday lease committee. The team brings a one-pager. The candidate site, the rent, the build-out cost, the projected first-year revenue. The revenue number has three decimals. Nobody in the room can defend how those three decimals were arrived at.

In This Article

  1. What you actually walk into a lease committee with
  2. The four inputs nobody collects properly
  3. The gravity model itself
  4. What a 6-week build actually looks like
  5. The three numbers a board actually wants
  6. Four failure modes that quietly kill the model
  7. What to do this week

This is the moment most new-store programs quietly break. The lease gets signed because the math looks like math. The store opens fourteen months later. Revenue lands at 62% of plan. Cannibalization of the existing store nine kilometres east takes another 8% off the portfolio. The committee writes off the gap as “soft category”, or “weather”, or “post-launch ramp longer than expected”. The model that produced the original number is never recalibrated, because nobody owns it and nobody trusts it.

The fix is not a better spreadsheet. The fix is a gravity model. Properly built, it predicts revenue for a candidate site within a 10 to 15% band, accounts for cannibalization of your existing footprint, and gets recalibrated every time a store opens and twelve months of actuals come in. It is the only forecasting method we have seen survive a CFO and a board over multiple expansion cycles.

A gravity model is not exotic. The math is older than most retail finance teams. The data is mostly available. But almost nobody builds one, because each piece on its own looks like a small project, and the team that owns site selection is rarely the team that owns analytics. So expansion runs on comp-store averages and three-variable regressions, and the portfolio quietly takes the loss.

This is the playbook we use when a CFO asks us to put a defendable forecast under their next ten openings. Six weeks, one calibrated model, a board memo on the day of the lease decision.

What you actually walk into a lease committee with

Before we get to the model, define the decision. The lease committee needs to allocate roughly $1M to $5M of capex per site. They will ask, in this order:

  1. What is the expected revenue at this site over the first three years?
  2. What is the confidence interval? Specifically, what are the 10th and 90th percentiles?
  3. How much of that revenue is cannibalized from existing stores?
  4. Net of cannibalization, what is the contribution to the portfolio?
  5. What has to be true for our forecast to land within plan?

A comp-store average answers question one badly and leaves the other four unanswered. A regression on (footfall, square footage, format) answers questions one and two, but treats every site as an island and ignores the fact that your store nine kilometres east and your candidate share half a population. A gravity model answers all five.

The deliverable from the modeling team is not “the forecast.” It is a memo with five numbers and a 1-page map. Point estimate, low, high, cannibalization, net portfolio impact. Anything more is theatre. Anything less is reckless.

The reason this matters is that capital allocators behave very differently when they get all five numbers than when they get only one. With one number, every decision is a yes-or-no. With five, the committee can run trade-offs. Should we open this site at 60% confidence, or wait for the data, or hold the capex for a higher-confidence candidate three months from now? That is the conversation that allows a board to defend the expansion plan, and it cannot happen without a model that produces all five.

The four inputs nobody collects properly

A gravity model rests on four data layers. Most retailers have versions of all four. Almost no retailer has all four at the resolution the model needs. Here is the difference between what is in the network planning folder and what the model actually consumes.

The four inputs to a gravity model: what most companies have today vs what the model needs.
Population at house resolution. Network teams have district-level population from the national statistics office. The model needs population at the level of individual housing, weighted by household size and income. The reason is that catchment zones drawn on a 10-minute drive radius cut through dozens of housing blocks, and the average district number is unable to tell the model whether the candidate site sits next to a 2,000-household high-rise cluster or a 200-household private development. The gap is built using a residential cadastre, sometimes purchased, sometimes scraped, sometimes constructed from satellite imagery. It is the single biggest determinant of model quality.

Road network and traffic. Network teams have a road map. The model needs an average-daily-traffic estimate per road segment, plus an estimate of pedestrian flow at the segment closest to the site. The latter is often the bottleneck. Foot traffic counts at a candidate intersection are usually purchased from a specialist data provider, occasionally built from mobile-device pings, and almost never collected in-house consistently. Without pedestrian flow, the model overweights drivers and underweights walk-in customers, which is how a 60-store consumer electronics omnichannel retailer in a major European market under-forecast a 16,000 sqm city store by more than 30%.

Competitor footprint, with catchment overlap. Network teams count competitors per district. The model needs each competitor as a single point with its own approximate catchment, so that overlap with your candidate catchment can be computed. Counting competitors is not the same as modeling their pull. Two electronics stores in a district mean very different things when one of them sits on the same arterial as your site and the other is across a river with no easy crossing.

Real market capacity, not market share. Network teams reference market share figures from a syndicated source. The model needs the absolute market capacity in the catchment, computed from household counts, average income, average share of wallet on the category, and a corrective for the catchment’s distance from the regional income centre. Market share figures already assume the market is the size somebody else said it was. The model recomputes the market on the candidate’s own catchment, because that is what determines the ceiling.
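
As a rough illustration of the capacity calculation, here is a minimal Python sketch. The multiplicative form, the function name `market_capacity`, and every number are assumptions made for the example; the text above only names the ingredients, not how they are combined.

```python
def market_capacity(households, avg_household_income, category_wallet_share,
                    distance_correction=1.0):
    """Absolute category spend available in a catchment, per year.

    Assumed form: households x income x share of wallet x a corrective
    factor for distance from the regional income centre (1.0 = no correction).
    """
    return (households * avg_household_income
            * category_wallet_share * distance_correction)

# Hypothetical catchment: 12,000 households, 30,000 average income,
# 2% of income spent on the category, 10% haircut for distance.
capacity = market_capacity(12_000, 30_000, 0.02, 0.9)
print(round(capacity))
```

The point of computing it this way is that the ceiling comes from the candidate's own catchment rather than from a syndicated share figure.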

The gravity model itself

The math is older than the discipline. Reilly’s law of retail gravitation, 1931. Huff’s probabilistic version, 1964. The version that goes into a modern site-selection model looks roughly like this:

For each candidate site i, projected sales V are built from three things, combined per cell and summed over every catchment cell j: the population in cell j, the gravitational pull of the candidate site on cell j, and the gravitational pull of every other competitor or own-store on cell j. The pull terms decay with distance and adjust for the road segment carrying traffic to the site. Calibration coefficients turn these terms into a sales figure, and they are fit by minimising squared error against the actuals of your existing stores.

In a slightly less abstract form:

> V(i) = Σ over j of [ Population(j) × K(i, j) ], where K(i, j) = Pull(i, j) ÷ ( Pull(i, j) + Σ Pull(c, j) for every competitor and own-store c )

The pull term itself is where most teams stop reading. It is the product of two ratios. The first ratio captures direct distance and decays sharply. The second ratio captures access, weighted by traffic volume and the perpendicular distance from the store to its nearest road segment, also adjusted by pedestrian flow. Each of these subterms has a coefficient, and the whole apparatus is calibrated by minimising squared error between modelled and actual sales across the existing portfolio. Conjugate gradient or any modern optimiser converges in seconds.
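
The share formula can be sketched in a few lines of Python. This is a simplified stand-in, not the full model: the `pull` function below uses a plain Huff-style term (attractiveness divided by distance raised to a decay exponent) and leaves out the traffic and pedestrian-flow ratios. All names, coordinates, and numbers are illustrative assumptions.

```python
from math import hypot

def pull(store, cell, decay=2.0):
    """Simplified pull: attractiveness / distance**decay (assumed form)."""
    d = hypot(store["x"] - cell["x"], store["y"] - cell["y"])
    d = max(d, 0.1)  # floor the distance to avoid division by zero
    return store["attractiveness"] / d ** decay

def captured_population(site, rivals, cells, decay=2.0):
    """V(i) = sum over cells j of Population(j) * K(i, j)."""
    total = 0.0
    for cell in cells:
        p_site = pull(site, cell, decay)
        p_rivals = sum(pull(r, cell, decay) for r in rivals)
        k = p_site / (p_site + p_rivals)  # share of cell j won by the site
        total += cell["population"] * k
    return total

cells = [
    {"x": 0.0, "y": 0.0, "population": 1000},
    {"x": 4.0, "y": 0.0, "population": 1000},
]
candidate = {"x": 1.0, "y": 0.0, "attractiveness": 1.0}
competitor = {"x": 3.0, "y": 0.0, "attractiveness": 1.0}

v_alone = captured_population(candidate, [], cells)
v_contested = captured_population(candidate, [competitor], cells)
print(v_alone, v_contested)
```

In this toy layout the candidate captures both cells when it stands alone and roughly half the population once a symmetric competitor is added. The output is captured population, which the calibration coefficients would still have to convert into sales.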

What matters is what changes when you flip the inputs. Add a competitor next door, and K(i, j) for every nearby cell drops sharply, because the pull term in the denominator grows. Remove an own-store, and K(i, j) for cells that were partly absorbed by it goes up, which is the cannibalization recapture coming back. Open a new own-store, and every cell that the new store can reach pulls some of its share away from existing stores, which is cannibalization being computed automatically, in the same equation that produces the revenue forecast.

This is the single most important property of the model. Cannibalization is not a separate post-hoc adjustment. It falls out of the same gravity calculation that produces the revenue number for the new site. A regression model has to be retro-fitted with a cannibalization assumption that is almost always wrong. The gravity model produces the cannibalization number as a byproduct of the forecast.
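
To make that property concrete, here is a minimal sketch that runs the same share calculation twice, once without and once with the candidate, and reads cannibalization off the difference. The store names, distances, spend per head, and the simple distance-decay pull are all assumptions for illustration.

```python
def shares(stores, cell_distances, decay=2.0):
    """Share of one cell won by each store, using pull = distance**-decay."""
    pulls = {s: cell_distances[s] ** -decay for s in stores}
    total = sum(pulls.values())
    return {s: p / total for s, p in pulls.items()}

def revenue_by_store(stores, cells, spend_per_head, decay=2.0):
    """Sum each store's share of every cell's spend."""
    revenue = {s: 0.0 for s in stores}
    for cell in cells:
        for s, share in shares(stores, cell["distance"], decay).items():
            revenue[s] += cell["population"] * spend_per_head * share
    return revenue

# Two hypothetical own-stores, one competitor, one candidate site.
cells = [
    {"population": 1000,
     "distance": {"east": 1.0, "west": 3.0, "rival": 2.0, "candidate": 1.0}},
    {"population": 1000,
     "distance": {"east": 3.0, "west": 1.0, "rival": 2.0, "candidate": 2.0}},
]
own_existing = ["east", "west"]

before = revenue_by_store(own_existing + ["rival"], cells, 100.0)
after = revenue_by_store(own_existing + ["rival", "candidate"], cells, 100.0)

cannibalization = {s: before[s] - after[s] for s in own_existing}
net_contribution = after["candidate"] - sum(cannibalization.values())
print(cannibalization, net_contribution)
```

Note that in a pure share model with a fixed market, any net gain for the portfolio has to come out of competitors. With no rival in the cells, the candidate's revenue would be exactly offset by losses at the existing stores.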

A practical example. The 60-store consumer electronics retailer mentioned earlier had to decide whether to open a 2,000 sqm cybermarket on a major avenue in a regional capital, given that two existing stores sat within a 20-minute drive. The model returned a point estimate of roughly 109M in monthly revenue for the new store, and an estimated 6.5% portfolio uplift after recapture. It also estimated, in the same equation, that the two nearest existing stores would each lose 18 to 22% of their revenue to the new opening. That number, in particular, was what made the committee comfortable. They had a portfolio-net answer, not a single-store wish.

What a 6-week build actually looks like

The reason teams stall on this is that the data work looks like a year of pipeline-building before any model is calibrated. It is not. A focused team of two analysts, one geomarketing specialist, and a CFO-level sponsor can deliver a calibrated model and a board memo in six weeks. The cadence is what makes it possible.

The 6-week build: what gets shipped at the end of each week, and what carries over.
Week 1: diagnostic. Inventory the four input layers. Score what you already have and what is missing. Pick the pilot city, usually the one with the most existing stores plus at least one credible candidate site. This is where the project either closes a 50% data gap or admits it and goes to procurement. The deliverable is a one-page data-readiness memo.

Week 2: catchment geometry. Build catchment zones for every existing store and every candidate. Three zones each, 10, 20, and 30-minute drive isochrones, computed off a routing engine that respects time-of-day, not a Euclidean buffer. Overlay them on the population layer. The deliverable is a map per store and per candidate, with population, household count, and average income summed per zone.
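
The per-zone summary can be sketched as follows, assuming drive times from each population cell to the store have already been computed by a routing engine. The field names, the cumulative reading of the isochrones (the 20-minute zone includes the 10-minute zone), and the data are assumptions for the example.

```python
def summarise_zones(cells, limits=(10, 20, 30)):
    """Population, households and household-weighted income per isochrone."""
    summary = {}
    for limit in limits:
        inside = [c for c in cells if c["drive_minutes"] <= limit]
        households = sum(c["households"] for c in inside)
        income_total = sum(c["households"] * c["avg_income"] for c in inside)
        summary[limit] = {
            "population": sum(c["population"] for c in inside),
            "households": households,
            "avg_income": income_total / households if households else 0.0,
        }
    return summary

# Hypothetical cells around one store; the last one falls outside all zones.
cells = [
    {"drive_minutes": 8, "population": 500, "households": 200, "avg_income": 30_000},
    {"drive_minutes": 15, "population": 1000, "households": 400, "avg_income": 24_000},
    {"drive_minutes": 28, "population": 750, "households": 300, "avg_income": 20_000},
    {"drive_minutes": 45, "population": 900, "households": 350, "avg_income": 22_000},
]
zones = summarise_zones(cells)
for limit, stats in zones.items():
    print(limit, stats)
```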

Week 3: calibration. Pull two years of actuals from the data warehouse. Build the gravity model in code, not Excel, because you will recalibrate. Fit coefficients against the existing portfolio. Target R² north of 0.7. Anything lower means an input layer is broken, almost always the pedestrian flow or the income proxy, and the answer is to fix the input, not loosen the threshold. The deliverable is a calibration report with residuals per existing store.
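
A minimal sketch of what the calibration step does, under heavy simplification: one decay exponent and one scale coefficient, fit by least squares with a grid search instead of a gradient-based optimiser. The distances, populations, and "actual" sales are synthetic (generated with a known decay of 2.0 and scale of 50, no noise), so a perfect fit is expected here and says nothing about real data.

```python
def captured(store_idx, distances, populations, decay):
    """Huff-style captured population for one store."""
    total = 0.0
    for j, pop in enumerate(populations):
        pulls = [d ** -decay for d in distances[j]]
        total += pop * pulls[store_idx] / sum(pulls)
    return total

def calibrate(distances, populations, actual_sales, decay_grid):
    """Pick the decay and scale that minimise squared error against actuals."""
    n = len(actual_sales)
    best = None
    for decay in decay_grid:
        x = [captured(i, distances, populations, decay) for i in range(n)]
        # Closed-form least-squares scale for sales = scale * captured
        scale = sum(a * y for a, y in zip(x, actual_sales)) / sum(a * a for a in x)
        residuals = [y - scale * a for a, y in zip(x, actual_sales)]
        sse = sum(r * r for r in residuals)
        if best is None or sse < best["sse"]:
            best = {"decay": decay, "scale": scale, "sse": sse,
                    "residuals": residuals}
    mean_y = sum(actual_sales) / n
    sst = sum((y - mean_y) ** 2 for y in actual_sales)
    best["r2"] = 1 - best["sse"] / sst
    return best

# distances[j][i]: distance from cell j to existing store i (hypothetical)
distances = [[1, 3, 5], [2, 2, 4], [4, 1, 3], [5, 3, 1]]
populations = [1000, 800, 1200, 600]
actual_sales = [50.0 * captured(i, distances, populations, 2.0) for i in range(3)]

fit = calibrate(distances, populations, actual_sales, [1.0, 1.5, 2.0, 2.5, 3.0])
print(fit["decay"], round(fit["scale"], 3), round(fit["r2"], 3))
```

The `residuals` list is the raw material for the calibration report: one residual per existing store, which is where a broken input layer tends to show up first.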

Week 4: candidate scoring. Run the calibrated model against each candidate site. Produce point, low, and high estimates for first-year, second-year, and third-year revenue. The low and high are not invented. They come from sensitivity analysis on the three biggest uncertainty drivers, which the calibration step will have identified. The deliverable is a ranked candidate list with three-year revenue ranges.

Week 5: cannibalization and portfolio. For each candidate, run the model twice. Once with the candidate added to the network, once without. The delta on every existing store is the cannibalization. Net the candidate’s revenue against the cannibalization to get the portfolio contribution. This is the number the CFO actually cares about. The deliverable is a portfolio impact table per candidate.

Week 6: board memo and recalibration plan. Two pages. Page one, the five numbers and the map. Page two, the assumptions, the confidence intervals, the cannibalization estimates, and the recalibration cadence going forward. The recalibration plan is critical. Every time a store opens and twelve months of actuals come in, the model should refit, the residual on the new store should be reviewed, and the coefficients should be updated. A gravity model that is never recalibrated decays into a regression model within a year.

The team behind the build matters. The analytics owner cannot be the network planning team’s spreadsheet champion, because that person has the wrong incentives. The owner should be a fractional or full-time geomarketing analyst reporting into Strategy or Finance. The network planning team consumes the model. They do not own it. This separation is what stops the model from quietly bending to support deals that are already politically decided.

The three numbers a board actually wants

The output of the model is many things. The output of the project is three numbers per candidate, plus a fourth for the portfolio.

Point estimate. First-year revenue at the candidate site, in absolute currency. This is what gets compared to lease economics and capex.

Confidence interval. Low and high bounds on the first-year revenue, reported as the 10th and 90th percentiles the committee asks for. The interval should come from a sensitivity analysis, not a guess. Sensitivity is run on the three inputs that drive most of the model’s variance: usually competitor footprint, pedestrian flow, and household income in the inner catchment. If your confidence interval is wider than 30% of the point estimate, something in the calibration is broken. If it is narrower than 10%, your sensitivity analysis is too tight and you are pretending to know more than you do.
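
One simple way to derive low and high bounds from a sensitivity analysis is to swing each of the three drivers up and down and take the extremes. The sketch below does that with a placeholder `forecast` function standing in for the calibrated model; the function, the swing sizes, and the reading of "30% of the point estimate" as total interval width are all assumptions.

```python
from itertools import product

def forecast(competitor_pull, pedestrian_flow, household_income):
    """Placeholder for the calibrated model (hypothetical functional form)."""
    return 1_000.0 * pedestrian_flow * household_income / (1.0 + competitor_pull)

def sensitivity_range(model, base, swings):
    """Evaluate every low/base/high combination and return (low, point, high)."""
    point = model(**base)
    factor_sets = [(1 - swings[k], 1.0, 1 + swings[k]) for k in base]
    outcomes = []
    for factors in product(*factor_sets):
        scenario = {k: base[k] * f for k, f in zip(base, factors)}
        outcomes.append(model(**scenario))
    return min(outcomes), point, max(outcomes)

base = {"competitor_pull": 1.0, "pedestrian_flow": 1.0, "household_income": 1.0}
swings = {"competitor_pull": 0.08, "pedestrian_flow": 0.04, "household_income": 0.04}

low, point, high = sensitivity_range(forecast, base, swings)
relative_width = (high - low) / point
print(round(low, 1), round(point, 1), round(high, 1), round(relative_width, 3))
```

Taking the extremes of a scenario grid gives a range rather than true percentiles. Treating it as an interval is a convention the team should state explicitly in the memo.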

Cannibalization estimate. Revenue lost from existing stores due to the new opening. This is the number that almost no retailer reports and almost every board ends up asking about. With a gravity model, it falls out automatically. Without one, it is invented. We have reviewed expansion plans for retailers in three different verticals where the cannibalization assumption was either zero or a single round number applied to every opening, and in every case the portfolio impact was overstated by 8 to 15%.

Net portfolio contribution. The candidate’s revenue minus the cannibalization. This is the number the board memo leads with. A candidate with 100M in revenue and 40M in cannibalization contributes 60M to the portfolio. A candidate with 70M in revenue and 5M in cannibalization contributes 65M. The second candidate is better, even though its standalone forecast is smaller. This is the conversation that the gravity model enables, and the conversation that a comp-store average makes impossible.
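
The same comparison, written as a small Python sketch. The candidate labels "A" and "B" are hypothetical; the figures are the ones from the paragraph above, in millions.

```python
candidates = {
    "A": {"revenue": 100, "cannibalization": 40},
    "B": {"revenue": 70, "cannibalization": 5},
}

# Net portfolio contribution = standalone revenue minus cannibalized revenue
net = {name: c["revenue"] - c["cannibalization"] for name, c in candidates.items()}

# Rank by what the portfolio gains, not by the standalone forecast
ranked = sorted(net, key=net.get, reverse=True)
print(net, ranked)
```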

Before and after a new store opens: catchment redistribution and net portfolio contribution.
When the CFO sees those four numbers per candidate, and the recalibration cadence on top, the discussion shifts. It is no longer about whether to trust the forecast. It is about which sites in the candidate list contribute most to the three-year plan. That is the right conversation for a lease committee.

Four failure modes that quietly kill the model

These are the patterns we see when we audit existing gravity models. Any one of them is enough to break the forecast quietly.

Cannibalization is treated as a separate adjustment, not a model output. The team builds a clean gravity model for the candidate. Then they apply a flat 5% cannibalization haircut, because someone has read somewhere that 5% is a reasonable assumption. The actual cannibalization in a dense network can be 20 to 30% on the closest existing store. The portfolio impact is overstated, sometimes by more than the contribution of the new store itself.

Income and population data are five years old. A 200-store grocery hypermarket chain we reviewed had a calibration baseline tied to wages from a single historical year, never updated. The model still ran, the numbers still looked plausible, but the relative income ranking across districts had shifted enough that the model was systematically over-recommending the wealthier zones from the baseline year and under-recommending the new growth corridors. Refresh income and population data every two years, minimum. Three years is the upper limit before you should rebuild the calibration.

Competitor count instead of competitor catchment overlap. A district has eight electronics retailers. The model uses that count, weighted by some heuristic. But seven of those eight are in a part of the district that does not share a catchment with your candidate. The real competitive pressure on the candidate is one, not eight. This error is what makes site selection refuse otherwise excellent locations and what makes it accept others that quietly lose share to a competitor the model never saw.

The model is never recalibrated after openings. This is the most common one. A model is built, used to support a lease decision, then archived. Eighteen months later, the store opens, six months of actuals come in, and nobody refits the model. The next lease decision uses the unchanged model, even though four stores have opened since and the calibration would have moved. By the second expansion cycle, the model is decoration.

What to do this week

If you are a CFO, COO, or strategy lead with an expansion plan coming up, three things you can do without engaging anyone external.

Pull the last twelve months of new-store revenue forecasts and the last twelve months of actuals. Compute the average absolute deviation of actuals from forecast, as a percentage of forecast. If it is more than 15%, your forecasting method is not defendable, and the next lease committee should know that before they vote.
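
A minimal sketch of that check, assuming the deviation is measured relative to the forecast (the plan) rather than to the actual. The store figures are invented for the example.

```python
def mean_abs_pct_deviation(forecasts, actuals):
    """Average of |actual - forecast| / forecast across stores."""
    deviations = [abs(a - f) / f for f, a in zip(forecasts, actuals)]
    return sum(deviations) / len(deviations)

# Hypothetical new stores: forecast vs first-year actual revenue
forecasts = [100.0, 80.0, 120.0]
actuals = [62.0, 84.0, 108.0]

deviation = mean_abs_pct_deviation(forecasts, actuals)
defendable = deviation <= 0.15
print(f"{deviation:.1%}", defendable)
```

Using absolute deviations matters: one store 38% under plan and another comfortably over would otherwise average out and hide the problem.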

Pull the same period and compute the change in revenue for stores located within 10 km of any new opening in that twelve months. The number you get is your real cannibalization rate. Compare it to whatever assumption sits in your current forecast model. If they are far apart, the portfolio impact in your three-year plan is wrong.
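
This check can be sketched as follows, assuming store coordinates are available and that straight-line (great-circle) distance is an acceptable proxy for "within 10 km". Field names and figures are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def observed_cannibalization(existing, openings, radius_km=10.0):
    """Aggregate revenue change for existing stores near any new opening."""
    exposed = [
        s for s in existing
        if any(haversine_km(s["lat"], s["lon"], o["lat"], o["lon"]) <= radius_km
               for o in openings)
    ]
    before = sum(s["revenue_before"] for s in exposed)
    after = sum(s["revenue_after"] for s in exposed)
    return (after - before) / before if before else 0.0

openings = [{"lat": 50.00, "lon": 10.00}]
existing = [
    # roughly 5.6 km from the opening: counted
    {"lat": 50.05, "lon": 10.00, "revenue_before": 100.0, "revenue_after": 85.0},
    # roughly 22 km away: ignored
    {"lat": 50.20, "lon": 10.00, "revenue_before": 200.0, "revenue_after": 210.0},
]
rate = observed_cannibalization(existing, openings)
print(f"{rate:.1%}")
```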

Ask the network planning team to write the four-input checklist on a single page: population resolution, road and pedestrian flow source, competitor catchment method, market capacity construction. If any of these answers is “we use district-level numbers,” the data layer needs to be fixed before any forecasting method is going to land within plan.

That diagnostic, done over two afternoons, will tell you whether you have a forecasting problem, a data problem, or both. The answer determines whether the next 6-week build is feasible internally, or whether the apparatus has to be built end-to-end before the next opening.
