Automating Your Browser with Selenium


It is frustrating how little control we have over the tools and data in our lives ...

BUT

As developers, we can often get around the lack of convenience with a bit of time, knowledge and effort.

In my case today I decided I was going to clean up my Watch Later playlist on YouTube.

My ADHD means that, just like the browser bookmarks I will never get to read, my WL is full of stuff I will never watch, with titles like "What's new in 2021".

There is no "nuke and start over" option, but apparently you can request a download of the list using Google Takeout.

So I looked into the API and it SEEMED to do everything I needed.

Except they don't give you access to the Watch Later playlist via the API!

A privacy thing maybe?

Weird.
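
For the record, here is roughly the call you would make with the official google-api-python-client if Watch Later behaved like any ordinary playlist. Treat it as a hypothetical sketch: the API key is a placeholder, Watch Later is private so you would really need OAuth rather than a key, and even then asking for playlistId="WL" gives you nothing useful back.

# Hypothetical sketch: how Watch Later WOULD be fetched via the YouTube Data API.
# In practice the "WL" playlist is not returned, which is why Selenium is needed.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder

youtube = build("youtube", "v3", developerKey=API_KEY)

# This pattern works for ordinary playlists...
request = youtube.playlistItems().list(
    part="snippet",
    playlistId="WL",  # ...but not for Watch Later
    maxResults=50,
)
response = request.execute()

for item in response.get("items", []):
    print(item["snippet"]["title"])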

Enter Selenium!

But wait, Google blocks Selenium from logging in.

Damn.

Even with 2FA turned off?

Well, the solution, as you might have already guessed, is to not automate the login step at all. This also has the nice benefit of letting you keep whatever passkey or other authentication protection you like, and of letting you log into any Google account!

We simply tell our script to wait for active confirmation that it is safe to continue.

Another wrinkle is the infinite scroll. We might have been able to get around that with some JS or CSS snooping, but simply scrolling, checking for no change in page height, and giving up after a timeout seems to work.


So here, for your delight, is a Python script that uses Selenium to download your Watch Later list to a CSV. I will follow up with code to then use that CSV list ...

import time
import csv
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By

# Function to wait for user to log into YouTube
def wait_for_manual_login(driver):
    driver.get('https://www.youtube.com/')
    print("Please log in to your YouTube account.")
    input("After logging in, navigate to your 'Watch Later' playlist and then press Enter here to continue...")

# Function to scrape Watch Later videos
def scrape_watch_later(driver):
    driver.get('https://www.youtube.com/playlist?list=WL')
    time.sleep(2)
    
    # Scroll to the bottom of the page to load all videos
    last_height = driver.execute_script("return document.documentElement.scrollHeight")
    new_height = last_height
    start_time = time.time()
    timeout = 60  # seconds

    while True:
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
        time.sleep(2)
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        
        if new_height == last_height:
            # Check if the timeout has been reached
            if time.time() - start_time > timeout:
                print("Reached the end of the page or timed out.")
                break
        else:
            start_time = time.time()  # Reset the timeout counter

        last_height = new_height
    
    # Extract video URLs (skip entries with no href, e.g. unavailable videos)
    video_elements = driver.find_elements(By.XPATH, '//*[@id="video-title"]')
    video_urls = [video.get_attribute('href') for video in video_elements if video.get_attribute('href')]
    return video_urls

# Function to save URLs to a CSV file
def save_to_csv(video_urls, filename='watch_later_videos.csv'):
    with open(filename, mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['Video URL'])
        for url in video_urls:
            writer.writerow([url])

# Main function
def main():
    options = uc.ChromeOptions()
    options.add_argument("--start-maximized")  # Start the browser maximized
    
    driver = uc.Chrome(options=options)
    
    try:
        wait_for_manual_login(driver)
        video_urls = scrape_watch_later(driver)
        save_to_csv(video_urls)
        print(f"Scraped {len(video_urls)} videos. URLs saved to 'watch_later_videos.csv'.")
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
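
To try it yourself you will need the undetected-chromedriver package (pip install undetected-chromedriver), which should pull in Selenium as a dependency, plus a reasonably current Chrome install. Run the script, log into YouTube in the browser window it opens, navigate to the Watch Later playlist, then come back to the terminal and press Enter to let it scroll and export.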


