Grocery Price Index
Frequently Asked Questions

Explore the Grocery Price Index

We knew that understanding pricing trends is important for the grocery industry, especially in such volatile times. Since we have the largest, most hyper-local, and most real-time data with history, we created the index to give this visibility to ourselves, our customers, and the general public.

The biggest difference is that we created price indices not only at a national level, but at local levels as well. We know from our customers that price competition happens locally. Consumers shop for the best prices at stores that are nearby, which is a key reason why the demand for Datasembly's solutions has grown so rapidly. We therefore decided to create this index, which is actually a series of indices at state and major metro levels, to show pricing trends where they are most relevant to consumers.

We had a hypothesis that pricing trends would likely differ between very urban and very rural areas because of supply chain issues, different levels of demand, and different competitive dynamics where there are generally fewer stores over a large area.

We also found out that the NCHS (National Center for Health Statistics) had defined a six-level urban-rural classification scheme for all U.S. counties and that we could align our store and pricing data to that scheme. This allowed us to see pricing trends across these different urban/rural segments and we found some very interesting variances that you can see within the index.

The following is a detailed description of the six segments:

  • Large Central Metro (Inner Cities): Counties in MSAs (metropolitan statistical areas) of 1 million or more population that: 1. Contain the entire population of the largest principal city of the MSA, or 2. Have their entire population contained in the largest principal city of the MSA, or 3. Contain at least 250,000 inhabitants of any principal city of the MSA.
  • Large Fringe Metro (Suburbs): Counties in MSAs of 1 million or more population that did not qualify as large central metro counties.
  • Medium Metro: Counties in MSAs of populations of 250,000 to 999,999.
  • Small Metro: Counties in MSAs of populations less than 250,000.
  • Micropolitan: Counties in micropolitan statistical areas.
  • Non-core (Rural): Nonmetropolitan counties that did not qualify as micropolitan.
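
The population thresholds in the six segment definitions can be sketched as a small decision function. This is only an illustration of the rules listed above, not the NCHS implementation: the function name, the `area_type` labels, and the `qualifies_as_large_central` flag (standing in for the three principal-city conditions) are all assumptions made for the example.

```python
def classify_county(area_type, msa_population=0, qualifies_as_large_central=False):
    """Return the urban-rural segment for a county.

    area_type: hypothetical label, one of "metropolitan", "micropolitan", "noncore".
    msa_population: population of the MSA the county belongs to (metropolitan only).
    qualifies_as_large_central: assumed to be True when the county meets any of
        the three principal-city conditions for Large Central Metro.
    """
    if area_type == "metropolitan":
        if msa_population >= 1_000_000:
            if qualifies_as_large_central:
                return "Large Central Metro"
            return "Large Fringe Metro"
        if msa_population >= 250_000:
            return "Medium Metro"
        return "Small Metro"
    if area_type == "micropolitan":
        return "Micropolitan"
    # Nonmetropolitan counties that did not qualify as micropolitan.
    return "Non-core"
```

For example, under these assumptions a county in an MSA of 600,000 people would fall into the Medium Metro segment.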

We used the United States Office of Management and Budget (OMB) definition of metropolitan statistical areas (MSAs) and took the top 54 of the 392 on that list. The OMB defines a Metropolitan Statistical Area as one or more adjacent counties, or county equivalents, that have at least one urban core area of at least 50,000 population, plus adjacent territory that has a high degree of social and economic integration with the core as measured by commuting ties.

We analyzed the category taxonomies of the top 15 grocery retailers in the country, identified the most commonly used high-level categories, and rationalized the differences to arrive at what we think is an easy-to-understand and comprehensive set of categories.

Two things to keep in mind. First, the scale of the changes is in most instances smaller than it looks on the graph, as many of the spikes are just fractions of a percentage point. Second, we have observed that during COVID many banners stopped putting many of their products on promotion, so the week-to-week spikes are sometimes attributable to changes in promotion. You will probably also notice significant changes during the holiday season in many instances, and those often happen because of holiday promotions.

We can already use our customers' own sets of hierarchical categories within our applications to allow them to do pricing, promotion, and assortment analysis using our data. We are currently working on a capability to provide the same type of functionality you can see in our grocery pricing index using our customers' own categories as well. We don't yet have a date for when this capability will be available.

One of the unique aspects of our algorithm is that we create an index for each individual product represented in the index. This means that we can aggregate those indices in any way we like (by geo, state, or rural/urban segment). We have already calculated indices by banner as well, but we don't show them in the public version.

First, we chose the products in each category that had the most coverage across stores in the United States. To get rid of the "noise" of the hundreds of thousands of products we have in each category, we decided to use the top 1,000 products in terms of store coverage.

The index will be updated on a weekly basis.

  • Indices are always relative to the price of products collected in the first week of October 2019. We refer to this week as the base week.
Aggregations Calculation
  • For an individual product: price_index = target_week.best_price / base_week.best_price. The best price is the list price, or the promotion price if the product is on promotion and that price is lower.
  • For each store, aggregations are created for each category and the aggregation is simply the average of indices for each product in that category.
  • Banner-level category aggregations are based on the store-level category aggregations.
  • State-level category aggregations are based on the store-level category aggregations.
  • Geography-level category aggregations are based on the store-level category aggregations.
  • Overall category aggregations are based on the banner-level category aggregations.
  • Since the quantity of products is not uniform we apply weights to the aggregation calculations.
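
The product-level index and the store-level average described above can be sketched in a few lines of Python. This is a minimal illustration, not the production algorithm: the function names are hypothetical, and because the text does not say which weights are applied in the higher-level aggregations, `weighted_rollup` simply takes caller-supplied weights (for example, the number of products behind each store index) as an assumption.

```python
from statistics import mean


def best_price(list_price, promo_price=None):
    """Lower of the list price and the promotion price, if there is a promotion."""
    if promo_price is None:
        return list_price
    return min(list_price, promo_price)


def price_index(target_best_price, base_best_price):
    """Index of one product: target-week best price relative to the base week."""
    return target_best_price / base_best_price


def store_category_index(product_indices):
    """Store-level category index: simple average of the product indices."""
    return mean(product_indices)


def weighted_rollup(indices, weights):
    """Weighted average of lower-level indices (the weights are an assumption)."""
    total_weight = sum(weights)
    return sum(i * w for i, w in zip(indices, weights)) / total_weight


# Milk is on promotion in the target week; eggs are unchanged.
milk = price_index(best_price(2.40, 2.20), best_price(2.00))
eggs = price_index(best_price(4.00), best_price(4.00))
store_a = store_category_index([milk, eggs])  # about 1.05
```

A value of about 1.05 here would mean that the category in this store is roughly 5% more expensive than in the base week.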
Assortment Changes
  • The assortment of products in each category that makes up the Grocery Pricing Index changes often. This also happens differently at each level of aggregation, since the assortment can be slightly different in different geographies.
  • There are generally two types of assortment changes that need to be accounted for:
    a. Product appears in a target week but not in the base week.
    b. Product appears in the base week but not in a target week.
Appears in the target week but not in the base week
  • A best_price for that product for the base_week needs to be derived.
  • That best_price is derived using the first week, after the base week, for which a best_price appears for that product. That week is called the new_base_week.
  • The calculation is as follows for each category in each store:
    a. Find the first week, after the base week, for which a best_price appears for that product. This will be the new_base_week.
    b. Calculate the store index for that category for the new_base_week using only products that have best_prices in both the base_week and the new_base_week.
    c. Divide the best_price for the product in the new_base_week by the category index from "b" above.
    d. Use the value in "c" as the derived best_price for that product in the base_week, which was missing the best_price.
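
Steps "a" through "d" can be sketched as follows for one category in one store. The data layout (a dictionary of weekly prices keyed by a week label) and the function name are assumptions made for the illustration, and the sketch takes a simple unweighted average in step "b".

```python
from statistics import mean


def derive_base_price(prices, weeks, product):
    """Derive a base-week best_price for a product missing from the base week.

    prices: {week: {product: best_price}} for one category in one store
            (hypothetical layout).
    weeks:  week labels in chronological order; weeks[0] is the base week.
    Returns None if the product never appears after the base week.
    """
    base_week = weeks[0]

    # Step a: first week after the base week in which the product has a price.
    new_base_week = next((w for w in weeks[1:] if product in prices[w]), None)
    if new_base_week is None:
        return None

    # Step b: category index for new_base_week, using only products that are
    # priced in both the base week and the new base week.
    shared = [p for p in prices[base_week] if p in prices[new_base_week]]
    category_index = mean(
        prices[new_base_week][p] / prices[base_week][p] for p in shared
    )

    # Steps c and d: deflate the product's price back to the base week.
    return prices[new_base_week][product] / category_index


weeks = ["w0", "w1", "w2"]
prices = {
    "w0": {"a": 1.00, "b": 2.00},
    "w1": {"a": 1.10, "b": 2.20},
    "w2": {"a": 1.20, "b": 2.40, "c": 6.00},
}
derived = derive_base_price(prices, weeks, "c")  # about 5.00
```

In this example the category had risen by about 20% when product "c" first appeared at 6.00, so its derived base-week price is about 5.00.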
Appears in the base week but not in the target week
  • The algorithm for this case is far simpler than the one above. No derived best prices are created.
  • When a base price exists for a product but its price is missing in a target week, that product is simply not included in the averaging of the products in that category to create the category index.
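
A minimal sketch of this exclusion rule, assuming prices are held in plain dictionaries keyed by product (the function name and data layout are hypothetical):

```python
from statistics import mean


def category_index_skipping_missing(base_prices, target_prices):
    """Average the product indices, leaving out products with no target-week price.

    base_prices, target_prices: {product: best_price} for one category in one store.
    Returns None when no product is priced in both weeks.
    """
    indices = [
        target_prices[p] / base_prices[p]
        for p in base_prices
        if p in target_prices  # a product missing in the target week is skipped
    ]
    return mean(indices) if indices else None


base = {"a": 2.00, "b": 4.00, "c": 10.00}
target = {"a": 2.20, "b": 4.00}  # "c" has no price this week
index = category_index_skipping_missing(base, target)  # about 1.05
```

Product "c" does not pull the index up or down in that week; the average is taken over "a" and "b" only.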