Data Engineering

How to Export Apple Health History into Google Sheets

· 14 min read

The Health app on iOS quietly records about thirty metrics a day from the phone alone, and another forty if you wear an Apple Watch. After a year of normal use the dataset is on the order of two to three million records, all sitting in a single XML file that nobody outside Apple ever expects you to read. This is a guide to pulling that file out of the phone, parsing it into something a spreadsheet can hold, and shipping the result to Google Sheets so you can actually look at it.

In This Article

  1. Step one: getting the file off the phone
  2. Step two: parse the XML into a DataFrame
  3. Step three: group correctly per metric class
  4. Step four: push to Google Sheets
  5. Making it run on a schedule
  6. What the data is actually good for
  7. When this approach is the wrong call

I built the first version of this pipeline in March 2021, mostly because I wanted to compare my resting heart rate against a long stretch of work-from-home weeks and the Health app’s built-in charts top out at one year. The script has survived four years of iOS upgrades with minor changes, which makes it more durable than most of the data infrastructure I have written for paying clients. The pattern below is what it ended up looking like by 2026.

Step one: getting the file off the phone

Open the Health app. Tap your profile photo in the top right. Scroll to the bottom and tap Export All Health Data. The app spends thirty seconds to a few minutes compressing the archive (longer if you have years of Watch data) and then offers a standard iOS share sheet.

The archive is a .zip containing two files that matter: export.xml and export_cda.xml. The first is the dump of every recorded sample. The second is the Clinical Document Architecture format, which only matters if you have medical records imported from a US health provider, and we are ignoring it here.

Move the zip to your laptop (AirDrop, email, the cloud locker of your choice). Unzip it. Put export.xml in the same directory as the notebook you are about to write. If your archive is large (over a hundred megabytes) you may want to confirm the file integrity with wc -l export.xml before parsing; if the line count is suspiciously small the export was truncated by the share sheet and you need to re-export.

b23-pipeline
b23-pipeline

Step two: parse the XML into a DataFrame

The XML schema is one element per measurement, with attributes for the metric type, unit, value, and three timestamps (creation date, start date, end date). For most metrics the start and end timestamps are within a second of each other; for sleep, mindful minutes, and workout records they bracket a real-world interval.

The parser walks the tree once and extracts six attributes per record into parallel arrays. This is faster than building a list of dicts and lets you go straight to a pandas DataFrame.

from xml.etree import ElementTree
import pandas as pd

tree = ElementTree.parse('export.xml')
root = tree.getroot()
records = root.findall('Record')

records_dict = {
    'type': [],
    'unit': [],
    'creationDate': [],
    'startDate': [],
    'endDate': [],
    'value': [],
}

for record in records:
    for attribute in records_dict.keys():
        records_dict[attribute].append(record.get(attribute))

On a typical year of phone-plus-watch data this loop runs in about thirty seconds on a M2 MacBook. The XML file is the bottleneck; the rest of the pipeline finishes in under five seconds.

The metric type identifiers Apple uses are not human-readable. HKQuantityTypeIdentifierStepCount is steps; HKQuantityTypeIdentifierActiveEnergyBurned is active calories. Build a translation dictionary for the metrics you care about. The list below covers the twelve I have found worth the column space in 2026 (Apple has added quite a few since 2021, including HRV, blood oxygen, and walking stability).

types_dict = {
    'HKCategoryTypeIdentifierMindfulSession': 'Mindful Session',
    'HKQuantityTypeIdentifierDistanceCycling': 'Cycling Distance',
    'HKQuantityTypeIdentifierDistanceSwimming': 'Swimming Distance',
    'HKQuantityTypeIdentifierDistanceWalkingRunning': 'Walking + Running Distance',
    'HKQuantityTypeIdentifierFlightsClimbed': 'Flights Climbed',
    'HKQuantityTypeIdentifierHeartRate': 'Heart Rate',
    'HKQuantityTypeIdentifierRestingHeartRate': 'Resting Heart Rate',
    'HKQuantityTypeIdentifierStepCount': 'Steps',
    'HKQuantityTypeIdentifierActiveEnergyBurned': 'Active Calories',
    'HKQuantityTypeIdentifierBasalEnergyBurned': 'Resting Calories',
    'HKQuantityTypeIdentifierWalkingHeartRateAverage': 'Walking Heart Rate Average',
    'HKQuantityTypeIdentifierHeartRateVariabilitySDNN': 'Heart Rate Variability',
}

Mindful Session is the odd one out: its value attribute is empty and the duration is encoded in endDate - startDate. You handle it with a small helper.

def td_to_minutes(td):
    return (td.days * 86400 + td.seconds) // 60

The DataFrame transformations are routine: rename columns, filter to the twelve known types, cast types, drop the rows you do not care about.

df = pd.DataFrame(records_dict)
df.columns = ['type', 'unit', 'date', 'start', 'end', 'value']
df = df[df['type'].isin(types_dict.keys())]
df['value'] = pd.to_numeric(df['value'], errors='coerce')
df['date'] = pd.to_datetime(df['date'], utc=True).dt.tz_convert('US/Pacific').dt.date
df['start'] = pd.to_datetime(df['start'], utc=True)
df['end'] = pd.to_datetime(df['end'], utc=True)

The timezone conversion is important. Apple stores everything in UTC, but a “day” of steps means the day in your local timezone, and a step recorded at 2 AM UTC in San Francisco belongs to the previous calendar day. Pick your timezone explicitly; do not let dt.date infer it from the parser default.

Step three: group correctly per metric class

This is the step where naive implementations go wrong. The twelve metrics split into three groups based on how they should be aggregated to a daily value.

b23-schema
b23-schema
Rate metrics (heart rate, walking heart rate average) have a unit of count/min. The right daily aggregation is the mean of all samples in the day, not the sum. Summing 1,440 heart rate samples gives you a number with no physical meaning.

Session metrics (mindful session) need their durations summed across the day, with the duration computed from end - start.

Cumulative metrics (steps, calories, distance, flights) sum cleanly across samples. Apple records steps in batches (the watch may write a single record for several minutes of activity), so the sum across samples is the day’s total.

rate = df[df['unit'] == 'count/min']
rate = rate.groupby(['date', 'type'], as_index=False).agg(value=('value', 'mean'))

sessions = df[df['type'] == 'HKCategoryTypeIdentifierMindfulSession'].copy()
sessions['value'] = (sessions['end'] - sessions['start']).map(td_to_minutes)
sessions = sessions.groupby(['date', 'type'], as_index=False).agg(value=('value', 'sum'))

cumulative = df[
    (df['unit'] != 'count/min') &
    (df['type'] != 'HKCategoryTypeIdentifierMindfulSession')
]
cumulative = cumulative.groupby(['date', 'type'], as_index=False).agg(value=('value', 'sum'))

daily = pd.concat([rate, sessions, cumulative], ignore_index=True)
daily['type'] = daily['type'].map(types_dict)

Pivot from long to wide and round to two decimals. The result is one row per calendar day, twelve columns of metrics, one column of date.

result = (
    daily
    .pivot(index='date', columns='type', values='value')
    .reset_index()
    .round(2)
    .sort_values('date')
)

A quick sanity check before you push: result['Steps'].sum() against the Health app’s lifetime step total. If they are within five percent you are good. The five percent slack comes from records that arrived after the export was generated and from a handful of metric types Apple has changed how they store between iOS versions.

Step four: push to Google Sheets

The cleanest path is gspread with a Google Cloud service account. Create a project in the Google Cloud console, enable the Sheets API and Drive API, create a service account, download the JSON key, and share the destination sheet with the service account’s email address. This used to be a fifteen-step ceremony; as of 2026 the console flow is a single page.

import gspread
from gspread_dataframe import set_with_dataframe

gc = gspread.service_account(filename='service-account.json')
sh = gc.open_by_key('1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789AbCdEfG')
worksheet = sh.get_worksheet(0)
worksheet.clear()
set_with_dataframe(worksheet, result, include_index=False, resize=True)

set_with_dataframe writes the whole frame in one API call, which keeps you well under Google’s per-minute write quota even for several years of history. worksheet.clear() before the write is what makes the pipeline rerunnable; without it, a shorter dataset on the second run leaves stale rows from the first.

Making it run on a schedule

The one-shot version above is fine for a quarterly look. If you want the data fresh every week, the iOS Shortcuts app on iOS 17 and later can run a “Export Health Data” automation on a schedule and drop the archive in iCloud Drive. A macOS launchd job picks it up, runs the parser, and rewrites the sheet. The whole thing fits in about eighty lines of Python and one Shortcut.

A few things break this loop in practice. iOS occasionally fails the export silently (the Shortcut returns success but the file is one kilobyte); add a size check and alert on it. Apple has changed the XML schema with new metric types in roughly half the major iOS releases; rerun the metric inventory once a year to catch the additions. The service account’s JSON key never expires by default, but Google sometimes flags it for rotation on accounts that have not used it in over a year; rotate it once before you forget how it was created.

b23-result
b23-result

What the data is actually good for

The unromantic answer: showing you, six months later, that the period when you felt productive was also the period when you slept seven hours a night and walked eight thousand steps a day. The cause-and-effect is not magic; it is the chart that makes you believe the cause-and-effect.

I have used the output for resting-heart-rate trend tracking against caffeine periods, for HRV against sleep duration, for step counts against work travel, and for mindful minutes against (frankly) whether I bothered to open the Calm app that month. The pipeline is overkill for any single one of these questions. It pays for itself when you have three or four of them at once and want to look at the same time series from different angles without re-exporting every time.

The data also makes a credible input to a personal data project if that is your thing: feed the daily frame into a notebook with matplotlib and seaborn, or hand it off to a small dashboard tool, or pipe it through a forecasting model. None of those steps need the XML; they only need the clean daily frame.

When this approach is the wrong call

If you only need a single number (“how many steps did I average last month?”) the Health app’s built-in summary view is faster. If you need real-time data (heart rate during a workout, blood oxygen during a flight), the Health export is wrong tool entirely; you want HealthKit on the device or a third-party app that streams the data live. If you need data from a non-iPhone family member, ask them to do their own export; there is no good way to merge two phones into one pipeline without writing code that nobody else can run.

Otherwise, this is the cheapest path I have found from “Apple has my data” to “I can plot my data.” About eighty lines of code, one service account, one iOS button, and one spreadsheet. The first time you build it takes a Saturday afternoon. After that, you have a personal data warehouse that grows by a few thousand rows a week and never sends a bill.

Keep reading

Enjoyed this article?

Get weekly data strategy insights delivered to your inbox.

Get in Touch

Let's Discuss Your Project

Book a 30-minute discovery call. We'll assess your data maturity and recommend the right approach — no strings attached.

Book a Discovery Call →
Need help with your data strategy? Book a Discovery Call →