Pulling YouTube Channel Analytics with the Data API and Python

The YouTube Data API is the cheapest way to pull a structured table of every video on a public channel, with view counts, like counts, durations, and a half-dozen other fields, into a notebook. It does not require the channel owner’s credentials, it is free up to a generous quota, and the Python client library hides all the OAuth ceremony unless you actually need it.

In This Article

  1. What the API gives you (and what it does not)
  2. Step one: get an API key
  3. Step two: understand the quota before you call anything
  4. Step three: find the channels
  5. Step four: enrich each channel with statistics
  6. Step five: walk each channel’s uploads
  7. Step six: load to pandas and ask questions
  8. Authentication for the things you can do as the owner
  9. When this approach is the wrong call

This is a tutorial for the case I run into most often: you want to benchmark a creator (yours, a competitor’s, a category leader’s) against the rest of the field and you want the data in pandas, not in a browser tab. We will pull channels matching a search query, fetch each channel’s headline statistics, walk down to the video level, and end with a DataFrame you can group, plot, or export. The whole thing fits in about a hundred lines of Python and one Google Cloud project.

The pattern below is the version I ended up with after five years of small revisions, including the moments when Google quietly broke things (dislikes vanished in late 2021, the default quota tightened twice, the OAuth consent screen became less forgiving). It is current as of 2026.

What the API gives you (and what it does not)

There are actually two YouTube APIs and they are easy to confuse. The Data API v3 returns public information about any channel, video, playlist, or comment. Anyone with an API key can call it. The Analytics API v2 returns private information about your own channels: watch time, audience demographics, traffic sources, retention curves. You authenticate as the channel owner with OAuth. There is no overlap. If you need watch time on someone else’s channel, the answer is that you cannot have it.

This article is about the Data API. The Analytics API is a different tool with a different shape and deserves its own piece.

Step one: get an API key

Open the Google Cloud console. Create a new project (or reuse one). Enable the YouTube Data API v3 from the API library. Go to Credentials, create an API key, restrict it to the YouTube Data API. Copy the key into a .env file or a secret manager. Do not commit it.

The whole flow takes about three minutes if you have done it before, ten if it is your first time. There is no billing required for the free tier.

import os
import googleapiclient.discovery
import pandas as pd

API_KEY = os.environ['YOUTUBE_API_KEY']
youtube = googleapiclient.discovery.build(
    'youtube', 'v3', developerKey=API_KEY,
)

The youtube object is the only client you need. It exposes resources (search, channels, videos, playlistItems, commentThreads, captions) and each resource exposes methods (list, insert, update, delete). For read-only analytics work, list is the only verb you will use.

Step two: understand the quota before you call anything

Every Data API request costs a number of units against a per-project daily budget. The default budget is 10,000 units per day. The cheap calls (most list reads) cost 1 unit each. The expensive calls are:

  • search.list costs 100 units
  • videos.insert (uploading) costs 1,600 units
  • captions.insert costs 400 units

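A pull that searches for thirty channels and reads every video on each costs 100 units for the search, then 1 unit per list call. Each playlistItems.list or videos.list call covers up to 50 videos, so the cost scales with pages, not with videos. A typical channel has 100 to 1,000 uploads, which puts thirty channels at roughly 120 to 1,200 units on top of the search. For small channels the search is the largest single line item; for large ones the paging is. About seven of the largest pulls per day is the practical ceiling on the default budget, and fewer if you page the search itself.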
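
Plugging in numbers makes the budget concrete. The sketch below is a back-of-the-envelope estimator, not anything Google publishes: estimate_pull_cost and its constants are hypothetical names, and it assumes the costs quoted above (100 units per search.list call, 1 unit per other list call, at most 50 items per call).

```python
import math

SEARCH_COST = 100  # units per search.list call
LIST_COST = 1      # units per channels/playlistItems/videos list call
PAGE_SIZE = 50     # maximum items returned or accepted per call


def estimate_pull_cost(uploads_per_channel, search_pages=1):
    """Estimate quota units for one pull: search, channel stats, uploads, video details."""
    n_channels = len(uploads_per_channel)
    search = SEARCH_COST * search_pages
    channel_calls = math.ceil(n_channels / PAGE_SIZE)
    # Playlists are paged per channel; even an empty playlist costs one call.
    playlist_calls = sum(max(1, math.ceil(n / PAGE_SIZE)) for n in uploads_per_channel)
    # Video IDs can be pooled across channels before batching.
    video_calls = math.ceil(sum(uploads_per_channel) / PAGE_SIZE)
    return search + LIST_COST * (channel_calls + playlist_calls + video_calls)


print(estimate_pull_cost([1000] * 30))  # thirty channels with 1,000 uploads each
```

Run it with your own upload counts before scheduling anything; the number that matters is how many times the result fits into 10,000.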

If you need more, the console has a quota increase form. Google asks for your use case and turnaround time runs from a few days to a few weeks. In my experience the bar is low for analytics, demos, and academic work, and high for anything that smells like scraping search results to feed a downstream product.

Step three: find the channels

search.list is a single call that returns a page of search results. Set type='channel' to limit it to channels (it can also return videos and playlists). The relevanceLanguage and regionCode parameters bias results toward a language and a market without strictly filtering.

request = youtube.search().list(
    part='snippet',
    q='data engineering',
    type='channel',
    maxResults=25,
    relevanceLanguage='en',
    regionCode='US',
)
response = request.execute()

channels = [
    {
        'channel_id': item['snippet']['channelId'],
        'title': item['snippet']['channelTitle'],
        'description': item['snippet']['description'],
        'published_at': item['snippet']['publishedAt'],
    }
    for item in response['items']
]

maxResults caps at 50 per page. To get more, use the pageToken returned in the response and call again. The 100-unit cost is per call, not per result, so paging to 200 results costs 400 units across four calls.
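
The paging loop can be sketched as follows. search_channels and its max_pages cap are hypothetical names of mine, not part of the client library; the cap is there because every extra page costs another 100 units. The function takes the youtube client built earlier as an argument.

```python
def search_channels(youtube, query, max_pages=4, page_size=50):
    """Collect channel search results across pages; each page costs 100 quota units."""
    items = []
    page_token = None
    for _ in range(max_pages):
        response = youtube.search().list(
            part='snippet',
            q=query,
            type='channel',
            maxResults=page_size,
            pageToken=page_token,  # None on the first call
        ).execute()
        items.extend(response.get('items', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            break  # no more pages
    return items
```

A call like search_channels(youtube, 'data engineering', max_pages=2) spends at most 200 units and returns at most 100 raw result items, which you can feed to the same list comprehension as above.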

The search ranking is opaque and biased toward channels with recent activity, large subscriber counts, and content matching the query terms in the title or description. Treat the result as a candidate pool, not a ranked truth.

Step four: enrich each channel with statistics

channels.list accepts up to 50 channel IDs in a single comma-joined call and returns the parts you ask for. The statistics part has subscriberCount, videoCount, viewCount, and hiddenSubscriberCount. The contentDetails part has the relatedPlaylists.uploads ID, which is the magic that lets you walk every video on the channel without paginating through search results.

ids = ','.join(c['channel_id'] for c in channels)
request = youtube.channels().list(
    part='snippet,statistics,contentDetails',
    id=ids,
    maxResults=50,
)
response = request.execute()

channel_stats = {}
for item in response['items']:
    cid = item['id']
    channel_stats[cid] = {
        'subscribers': int(item['statistics'].get('subscriberCount', 0)),
        'videos': int(item['statistics'].get('videoCount', 0)),
        'views': int(item['statistics'].get('viewCount', 0)),
        'country': item['snippet'].get('country'),
        'uploads_playlist': item['contentDetails']['relatedPlaylists']['uploads'],
    }

A few traps. subscriberCount is rounded down to three significant figures on the API as of the 2019 change, so a channel with 12,345 subscribers shows as 12,300. If the channel has hidden its subscriber count, subscriberCount is missing entirely and hiddenSubscriberCount is true. The country field is optional and missing on perhaps a third of channels in practice.
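
One way to keep a hidden subscriber count from masquerading as zero is to normalize the statistics dict once, up front. This is a minimal sketch under the assumptions described above; parse_channel_statistics is a hypothetical helper, and it uses None to mean "not published" where the loop above uses 0.

```python
def parse_channel_statistics(statistics):
    """Normalize a channels.list `statistics` dict; None means 'not published'."""
    hidden = bool(statistics.get('hiddenSubscriberCount', False))
    raw_subs = statistics.get('subscriberCount')
    return {
        # A hidden or absent count stays None instead of becoming a fake zero.
        'subscribers': None if hidden or raw_subs is None else int(raw_subs),
        'subscribers_hidden': hidden,
        'videos': int(statistics.get('videoCount', 0)),
        'views': int(statistics.get('viewCount', 0)),
    }
```

Downstream, a None subscriber count drops out of means and medians in pandas instead of dragging them toward zero.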

Step five: walk each channel’s uploads

playlistItems.list against the uploads playlist gives you every video on a channel, fifty at a time, in reverse chronological order. The video ID is the only field worth keeping at this stage.

def list_video_ids(playlist_id):
    ids = []
    page_token = None
    while True:
        r = youtube.playlistItems().list(
            part='contentDetails',
            playlistId=playlist_id,
            maxResults=50,
            pageToken=page_token,
        ).execute()
        ids.extend(item['contentDetails']['videoId'] for item in r['items'])
        page_token = r.get('nextPageToken')
        if not page_token:
            break
    return ids

Then batch the video IDs (fifty per call again) and call videos.list with the parts you care about.

def fetch_videos(video_ids):
    rows = []
    for i in range(0, len(video_ids), 50):
        chunk = ','.join(video_ids[i:i+50])
        r = youtube.videos().list(
            part='snippet,statistics,contentDetails',
            id=chunk,
        ).execute()
        for item in r['items']:
            rows.append({
                'video_id': item['id'],
                'channel_id': item['snippet']['channelId'],
                'published_at': item['snippet']['publishedAt'],
                'title': item['snippet']['title'],
                'duration': item['contentDetails']['duration'],
                'definition': item['contentDetails']['definition'],
                'caption': item['contentDetails']['caption'] == 'true',
                'views': int(item['statistics'].get('viewCount', 0)),
                'likes': int(item['statistics'].get('likeCount', 0)),
                'comments': int(item['statistics'].get('commentCount', 0)),
            })
    return rows

Two changes worth knowing about. Dislikes are gone. YouTube made the dislike count private in December 2021. The dislikeCount field still exists in the schema, but it is only included in the response when the request is authenticated as the video's owner; for every other caller it is absent. If you see code online that pulls dislikes, it is either pre-2021 or assumes owner credentials.

Comments can be off. A creator can disable comments on a video, on a channel, or by default for kid-targeted content. When comments are off, commentCount is missing rather than zero. Use statistics.get('commentCount', 0) and treat zeros with suspicion.
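
If you would rather keep "off" distinct from "zero" all the way into pandas, a nullable integer column does it. The sketch below is one option, not the approach used in fetch_videos above; optional_count is a hypothetical helper and sample_stats is made-up data in the shape of the statistics part.

```python
import pandas as pd


def optional_count(statistics, key):
    """Return the counter as an int, or None when the API omitted the field."""
    value = statistics.get(key)
    return None if value is None else int(value)


# Made-up statistics dicts for illustration only.
sample_stats = [
    {'viewCount': '1000', 'likeCount': '50', 'commentCount': '10'},
    {'viewCount': '2000', 'likeCount': '80'},  # comments disabled: field absent
]

# The nullable Int64 dtype keeps missing values as <NA> rather than 0 or NaN floats.
comments = pd.Series(
    [optional_count(s, 'commentCount') for s in sample_stats],
    dtype='Int64',
)
```

With this column, comments.isna() flags the videos where comments are off, and comments.mean() averages only the videos that actually report a count.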

The duration field is an ISO 8601 string (PT4M13S for four minutes thirteen seconds). Once the rows are in a DataFrame (df, built in step six), the isodate library parses each value into a timedelta:

import isodate
df['duration_sec'] = df['duration'].map(lambda s: isodate.parse_duration(s).total_seconds())
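
If you would rather not add a dependency for one field, a small regular expression covers the common shapes. This is a sketch, not a full ISO 8601 parser: duration_to_seconds is a hypothetical helper, and it assumes durations use only days, hours, minutes, and whole seconds. Anything else (weeks, fractions) raises an error instead of being guessed at.

```python
import re

# Matches strings such as PT4M13S, PT1H, or P1DT2H3M4S.
_DURATION_RE = re.compile(
    r'^P(?:(?P<days>\d+)D)?'
    r'(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$'
)


def duration_to_seconds(value):
    """Convert a simple ISO 8601 duration string to whole seconds."""
    match = _DURATION_RE.match(value)
    if not match:
        raise ValueError(f'unsupported duration: {value!r}')
    # Components that did not appear in the string come back as None.
    parts = {k: int(v) if v else 0 for k, v in match.groupdict().items()}
    return (parts['days'] * 86400 + parts['hours'] * 3600
            + parts['minutes'] * 60 + parts['seconds'])
```

It drops into the same place as the isodate version: df['duration'].map(duration_to_seconds).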

Step six: load to pandas and ask questions

By this point you have a flat list of dicts and a one-line conversion to a DataFrame.

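# Gather video IDs from every channel's uploads playlist, then fetch the details.
video_ids = [v for s in channel_stats.values() for v in list_video_ids(s['uploads_playlist'])]
df = pd.DataFrame(fetch_videos(video_ids))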
df['published_at'] = pd.to_datetime(df['published_at'])
df['year'] = df['published_at'].dt.year
df['engagement_rate'] = (df['likes'] + df['comments']) / df['views'].clip(lower=1)

The questions worth asking depend on what you are benchmarking. A few that come up in client work:

  • Cadence: group by month, count uploads, plot. Is the channel accelerating or coasting?
  • Hit rate: percentile of views per video. The top decile drives the channel; the long tail is exploration. The ratio between them tells you whether the channel is one viral hit or a steady operator.
  • Format mix: bucket duration_sec into short (under 60s), standard (60s to 20 min), and long (over 20 min). Track how the mix has shifted. Most channels have a discoverable inflection point where they pivoted into or out of Shorts.
  • Engagement gradient: likes per view by video age. New videos tend to have higher engagement; the cliff tells you how much of the action happens in the first week.

None of these need the Analytics API. They are all derivable from public counters.

Authentication for the things you can do as the owner

If you do own the channel and want watch-time, audience retention, or traffic source breakdowns, you need OAuth and the Analytics API. The flow is:

  1. Add OAuth 2.0 client credentials in the Cloud console (Web application or Desktop application).
  2. Download the client secret JSON.
  3. Use google-auth-oauthlib to run the local server flow, which opens a browser tab, asks you to sign in, and returns credentials that you can save to a token file for reuse.

from google_auth_oauthlib.flow import InstalledAppFlow

flow = InstalledAppFlow.from_client_secrets_file(
    'client_secret.json',
    scopes=['https://www.googleapis.com/auth/yt-analytics.readonly'],
)
creds = flow.run_local_server(port=0)

analytics = googleapiclient.discovery.build(
    'youtubeAnalytics', 'v2', credentials=creds,
)

The Analytics API has a different shape from the Data API. Instead of resource-and-list, you call reports.query with dimensions, metrics, a date range, and filters. The full reference is in the official docs and the things you can ask for are roughly: estimated minutes watched, average view duration, subscribers gained, traffic source type, device type, geography. Aggregations come pre-cooked.
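
A minimal sketch of what a reports.query call looks like, assuming the analytics client built above and a channel you own. The two function names are hypothetical; the metric and dimension names are ones from the list above, and the response handling assumes the columnHeaders-plus-rows table shape the API returns.

```python
def report_to_records(response):
    """Turn a reports.query response (columnHeaders + rows) into a list of dicts."""
    names = [header['name'] for header in response.get('columnHeaders', [])]
    # 'rows' can be absent when the date range has no data.
    return [dict(zip(names, row)) for row in response.get('rows', [])]


def query_daily_watch_time(analytics, start_date, end_date):
    """Daily watch time for the authenticated owner's channel (dates as YYYY-MM-DD)."""
    response = analytics.reports().query(
        ids='channel==MINE',
        startDate=start_date,
        endDate=end_date,
        metrics='estimatedMinutesWatched,averageViewDuration',
        dimensions='day',
        sort='day',
    ).execute()
    return report_to_records(response)
```

The records go straight into pd.DataFrame(...), which puts the owner-only numbers in the same shape as the public ones.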

When this approach is the wrong call

If you only need a quick lookup of one or two channels, the YouTube Studio dashboard and Social Blade cover most questions in five clicks. If you need history, the API will not supply it: per-video stats are snapshots taken at call time, so you need a separate ingestion job that runs daily and writes to your own store. If you need watch time, retention, or traffic sources on a channel you do not own, the API does not give you that and no amount of clever querying will.
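
For the history case, the ingestion job can be very small. The sketch below is one possible shape, using SQLite from the standard library; the table name, the columns, and save_snapshot itself are hypothetical, and rows is assumed to be the list of dicts that fetch_videos returns.

```python
import sqlite3
from datetime import date


def save_snapshot(conn, rows, snapshot_date=None):
    """Store one day's counters per video; rerunning the same day overwrites."""
    snapshot_date = snapshot_date or date.today().isoformat()
    conn.execute(
        'CREATE TABLE IF NOT EXISTS video_snapshots ('
        ' snapshot_date TEXT NOT NULL,'
        ' video_id TEXT NOT NULL,'
        ' views INTEGER, likes INTEGER, comments INTEGER,'
        ' PRIMARY KEY (snapshot_date, video_id))'
    )
    conn.executemany(
        'INSERT OR REPLACE INTO video_snapshots VALUES (?, ?, ?, ?, ?)',
        [
            (snapshot_date, r['video_id'], r['views'], r['likes'], r['comments'])
            for r in rows
        ],
    )
    conn.commit()
```

Scheduled once a day, this accumulates the time series the API itself never returns. Daily view gains are then a difference between consecutive snapshot_date rows for the same video_id.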

Otherwise, this is the path. About a hundred lines of Python, one API key, one DataFrame, and an afternoon of exploration. The pattern stays valid as long as the Data API does, which is to say indefinitely.
