YouTube Analyzer Script

This script retrieves view counts for the videos on a set of YouTube channels and estimates a cost figure for each channel from those counts.

I wrote this script in the summer of 2020 while working at a small startup. The source code is on my GitHub, which is linked below.

This is a Python script that pulls data from the YouTube Data API v3, using the requests library to make the HTTP requests. For each specified channel it fetches video data, including the view count of each video, and calculates the cost per mille (CPM) for the channel from the average views per video.

The script begins by importing what it needs: the sys, yaml, json, requests, and numpy modules, plus the Retry and HTTPAdapter classes. It uses yaml to load keys and the channel list from external YAML files.

The script creates a Session object from the requests library, named s, and sets its verify attribute to False to disable SSL certificate verification. It then disables the SSL warnings that the urllib3 package would otherwise emit and sets a retry policy for the session using the Retry and HTTPAdapter classes.
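The session setup described here can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the retry count, backoff factor, and status codes are assumed placeholder values, since the write-up does not state them.

```python
import requests
import urllib3
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# One shared session. verify=False skips TLS certificate checks, mirroring
# the description above (unsafe outside a trusted network).
s = requests.Session()
s.verify = False

# Silence the InsecureRequestWarning urllib3 raises for unverified requests.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Assumed retry policy: these numbers are placeholders, not the script's values.
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)

# Route both schemes through the retrying adapter.
s.mount("https://", adapter)
s.mount("http://", adapter)
```

Mounting the adapter on the session means every request made through s inherits the retry policy, so the individual API calls do not need their own retry loops.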

The script defines several functions. The getSingleID function extracts the channel ID from a YouTube channel URL. The getChannelIDs function loads the channel URLs from a YAML file and extracts the channel IDs for each URL using the getSingleID function. The getVideoIDs function uses the YouTube Data API v3 to retrieve the video IDs for a given channel ID, and the getViews function retrieves the view count for a given video ID.
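As a rough sketch of what these helpers have to do, the code below extracts a channel ID from a URL and reads the relevant fields out of API responses. The function names are hypothetical stand-ins rather than the script's own. It assumes channel URLs of the form https://www.youtube.com/channel/<ID>, and the sample dictionaries are placeholders that only illustrate the shape of the search.list and videos.list responses.

```python
from urllib.parse import urlparse


def extract_channel_id(url):
    """Return the ID from a URL shaped like https://www.youtube.com/channel/<ID>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    raise ValueError(f"no channel ID found in {url!r}")


def video_ids_from_search(response):
    """Collect video IDs from a parsed search.list response."""
    return [
        item["id"]["videoId"]
        for item in response.get("items", [])
        if "videoId" in item.get("id", {})
    ]


def view_count_from_videos(response):
    """Read the view count from a parsed videos.list response (part=statistics)."""
    items = response.get("items", [])
    if not items:
        return None  # video missing or not visible
    # The API returns viewCount as a string, so convert it.
    return int(items[0]["statistics"]["viewCount"])


# Placeholder responses showing only the fields used above.
sample_search = {"items": [{"id": {"videoId": "VIDEO_ID_1"}},
                           {"id": {"videoId": "VIDEO_ID_2"}}]}
sample_videos = {"items": [{"statistics": {"viewCount": "1234"}}]}

channel_id = extract_channel_id("https://www.youtube.com/channel/CHANNEL_ID/videos")
video_ids = video_ids_from_search(sample_search)
views = view_count_from_videos(sample_videos)
```

Keeping the parsing separate from the HTTP calls makes the helpers easy to check against saved responses without spending API quota.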

The getData function writes the video data for each channel to a CSV file. It first writes a header row with columns for the video view counts and the channel's CPM. It then loops over the channel IDs: for each one it retrieves the video IDs with getVideoIDs and the view count of each video with getViews, computes the mean and standard deviation of the view counts, and removes any count more than two standard deviations from the mean. Finally, it calculates the CPM for the channel using the getCPM function and writes the row to the CSV file.
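The outlier step can be illustrated with a short sketch. It uses the standard library's statistics module in place of numpy, with the population standard deviation to match numpy's default. The function name and the sample view counts are made up for illustration.

```python
import statistics


def remove_outliers(views, max_deviations=2.0):
    """Keep only counts within max_deviations standard deviations of the mean."""
    if len(views) < 2:
        return list(views)
    mean = statistics.fmean(views)
    std = statistics.pstdev(views)  # population std, like numpy.std's default
    if std == 0:
        return list(views)  # all counts identical, nothing to drop
    return [v for v in views if abs(v - mean) <= max_deviations * std]


# Illustrative counts: nine ordinary videos and one viral one.
sample_views = [100, 110, 90, 105, 95, 100, 98, 102, 101, 5000]
filtered = remove_outliers(sample_views)
```

Here the single very large count pulls the mean to about 580 and the standard deviation to about 1470, so 5000 falls outside the two-deviation band while the other nine counts stay in.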

The getCPM function calculates the CPM for a given set of video views. The function calculates the average view count per video and multiplies it by 20 (the assumed cost per view).
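Following that description literally, the calculation amounts to the sketch below. The function name and the rate parameter are hypothetical. The factor of 20 is taken from the text above, and the result for an empty list is an assumption.

```python
import statistics


def estimate_cpm(views, rate=20):
    """Average the view counts and multiply by the assumed rate."""
    if not views:
        return 0.0  # assumption: no videos means no estimate
    return statistics.fmean(views) * rate


# Illustrative counts only: the average is 200, so the result is 200 * 20.
example = estimate_cpm([100, 200, 300])
```

Passing the rate as a parameter keeps the assumed figure in one place if it ever needs to change.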

The main function calls the getData function to retrieve the video data for each channel and write it to a CSV file.

Here’s an example of what the output in the data.csv file will look like:

In this example, three YouTube channels are listed. For each channel, the script has retrieved the view counts of the 10 most recent videos, filtered out any count more than two standard deviations from the mean (to remove outliers), and calculated the CPM (cost per thousand views) from the remaining counts. It then writes all of this data to a CSV file named data.csv. The first row of the file contains the column headers (including “url” for the channel URL, “view1” through “view10” for the view counts, and “cpm” for the calculated CPM).
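The column layout described above could be produced with the standard csv module, as in this sketch. The row values are placeholders, not real channel data. Padding short rows with blank cells after outlier removal is an assumption, and build_row is a hypothetical helper.

```python
import csv
import io

MAX_VIDEOS = 10

# url, view1 ... view10, cpm
header = ["url"] + [f"view{i}" for i in range(1, MAX_VIDEOS + 1)] + ["cpm"]


def build_row(url, views, cpm):
    """Lay out one channel's row, padding missing view columns with blanks."""
    kept = list(views[:MAX_VIDEOS])
    padding = [""] * (MAX_VIDEOS - len(kept))
    return [url] + kept + padding + [cpm]


# Write to an in-memory buffer here; the script writes to data.csv instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(build_row("https://www.youtube.com/channel/CHANNEL_ID", [100, 200, 300], 4000))
csv_text = buffer.getvalue()
```

Padding keeps every row at 12 columns, so the cpm value always lines up under its header even when some view counts were filtered out.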