Spying on Competitors using Python & Data Studio

Spying on competitors using Python and Google Data Studio involves several steps, including data collection, analysis, and visualization. By automating these processes, you can uncover valuable insights about your competition’s online performance and adjust your own strategy accordingly. This guide provides a comprehensive walkthrough.

1. Identify Key Competitors and Metrics

The first step is to clearly define who your competitors are and what you want to measure about them. Spend time researching and listing your main competitors – these could be direct business rivals or simply other websites ranking well for your target keywords. Next, decide on the key metrics you want to track. Here are some examples of metrics for competitor analysis:

  • Website Traffic: Estimate their website visits or page views (using third-party estimation tools such as SimilarWeb, since you can’t access a competitor’s own analytics).
  • SEO Rankings: Track how competitors rank for important keywords that you are also targeting.
  • Backlinks: Observe the quantity and quality of sites linking to your competitors (an indicator of authority).
  • Content Output: Note how often they publish new content (blog posts, videos, etc.) and what topics they cover.
  • Social Media Engagement: Monitor followers, likes, shares, and comments on platforms like Twitter, Facebook, Instagram, and LinkedIn.

Choosing the right metrics will depend on your goals. For example, if SEO is your primary concern, focus on rankings, backlinks, and content. If you’re interested in overall marketing, you might also track social media and traffic metrics. Having clear metrics helps focus your data collection efforts.

2. Data Collection with Python

Once you know what to track, you can start collecting data. Python is a versatile programming language that excels at data gathering tasks. You can use various Python libraries and APIs to fetch competitor data programmatically. Below are some common methods:

  • Web Scraping: Use libraries like Beautiful Soup or Scrapy to scrape public information from competitor websites. This could include scraping their website for content (e.g., titles of recent articles) or extracting specific data like product prices, etc., that are publicly visible.
  • Official APIs: Many platforms provide APIs to access data in a structured way. For example, you can use social media APIs such as the Twitter API or the Facebook Graph API to gather data on a competitor’s social media performance. These typically require signing up for developer access and using API keys, but they provide reliable data (likes, shares, followers, etc.) in compliance with platform rules.
  • SEO Tool APIs: Consider using APIs from SEO analytics tools like Ahrefs, SEMrush, or Moz. These services often offer data on keyword rankings, backlinks, domain authority, and more. For instance, Ahrefs and SEMrush have API endpoints (available to subscribers) that a Python script can query to get competitor keyword rankings or backlink profiles. This can save a lot of manual work if you have access to those tools.

Example: Let’s say you want to track a competitor’s social media following as one of your metrics. We can write a simple Python script to retrieve, for example, the follower count of a competitor’s Instagram account. We’ll use the requests library to fetch the page and BeautifulSoup to parse the HTML for the follower information. (Note: Be mindful of each platform’s terms of service. Scraping certain social media sites may violate their rules, so it’s best to use official APIs when possible.)

import requests
from bs4 import BeautifulSoup

# Target the Instagram page of the competitor (replace 'competitorusername' with the actual username)
url = "https://www.instagram.com/competitorusername/"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(response.text, "html.parser")

# Extract the content of the meta tag that holds the follower info
meta_tag = soup.find("meta", property="og:description")
if meta_tag:
    description = meta_tag.get("content", "")
    print(description)

In the above code, we fetch the Instagram page HTML and look for a meta tag with property="og:description". Instagram (as of now) uses that tag to store a description that includes the account’s follower count, following count, and post count. For example, the description might look like: “15k Followers, 500 Following, 120 Posts – See Instagram photos and videos from Competitor Name”. Our script simply prints out this description. In a real scenario, you might parse the numbers out of that text (using Python string methods or regex) to get the follower count as an integer for easier tracking or comparison. Always ensure that your scraping is done politely (with proper delays and within legal bounds).
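
Here is a minimal sketch of that number-parsing step using a regular expression. It assumes the description format shown above, which Instagram can change at any time:

import re

def parse_follower_count(description):
    """Pull the follower count out of a description like
    '15k Followers, 500 Following, 120 Posts - ...' and return it as an int."""
    match = re.search(r"([\d.,]+)\s*([kKmM]?)\s+Followers", description)
    if not match:
        return None
    number = match.group(1).replace(",", "")
    multiplier = {"k": 1_000, "m": 1_000_000}.get(match.group(2).lower(), 1)
    return int(float(number) * multiplier)

print(parse_follower_count("15k Followers, 500 Following, 120 Posts"))  # 15000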

This is just one example. You can similarly use Python to collect all sorts of competitor data – such as grabbing prices from competitor product pages, scraping review counts, or pulling SEO metadata from their site’s HTML. Python’s rich ecosystem of libraries makes it relatively straightforward to get the data you need.
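
For example, here is a minimal sketch of pulling basic SEO metadata (the title tag and meta description) from a page; the URL is a hypothetical placeholder:

import requests
from bs4 import BeautifulSoup

# Hypothetical competitor page; replace with a real URL
url = "https://www.example-competitor.com/blog/some-post/"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# The <title> tag and the meta description are the basics of on-page SEO
title = soup.title.string.strip() if (soup.title and soup.title.string) else ""
meta_tag = soup.find("meta", attrs={"name": "description"})
meta_description = meta_tag.get("content", "") if meta_tag else ""

print("Title:", title)
print("Meta description:", meta_description)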

3. Data Cleaning and Analysis

Raw data collected from different sources can be messy or unstructured. Before you draw insights, it’s important to clean and organize this data. Python’s Pandas library is excellent for data cleaning and analysis. With Pandas, you can load data into dataframes (tables) and then:

  • Remove or fill in missing values (for example, if some metrics couldn’t be retrieved for certain competitors).
  • Normalize and format data (such as converting all follower counts to numbers, removing commas or text).
  • Combine data from multiple sources. You might have one dataset from scraping and another from an API; these can be merged on a common key (e.g., competitor name).
  • Perform calculations or create new metrics. For instance, you could calculate the percentage difference between your traffic and a competitor’s traffic, or rank competitors by number of backlinks.
  • Summarize the data. You could use Pandas to group data (say, average values per competitor or totals) and identify trends. For example, if you scraped the number of blog posts each competitor published per month, you could sum those up to see who is the most active content creator.
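
For instance, here is a minimal sketch of a few of these cleaning steps in Pandas (the file name and the competitor/followers/backlinks columns are hypothetical placeholders for whatever your collection scripts produce):

import pandas as pd

# Hypothetical raw export with one row per competitor
df = pd.read_csv("competitor_metrics.csv")  # columns: competitor, followers, backlinks

# Normalize follower counts stored as text like "15,300" into integers
df["followers"] = (
    df["followers"].astype(str).str.replace(",", "", regex=False).astype(int)
)

# Fill in metrics that couldn't be retrieved (or use dropna() to discard those rows)
df["backlinks"] = df["backlinks"].fillna(0).astype(int)

# Create new metrics: rank competitors by backlinks, compute follower share
df["backlink_rank"] = df["backlinks"].rank(ascending=False).astype(int)
df["follower_share"] = df["followers"] / df["followers"].sum()

print(df.sort_values("backlink_rank"))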

By cleaning and structuring the data, you make it much easier to analyze and visualize later. You might also do some preliminary analysis in Python itself. For instance, you can quickly see which competitor has the highest values for a metric or use simple charts (via libraries like Matplotlib) to plot trends. However, in this article we’re focusing on using Google Data Studio for visualization, which is the next step.

4. Visualization with Google Data Studio

Google Data Studio (now known as Looker Studio) is a free tool by Google that allows you to turn your data into informative and shareable dashboards. Once you have your competitor data ready, visualizing it in Data Studio can help you and your stakeholders quickly grasp the insights. Here’s how you can use Data Studio in the process:

  • Connect Your Data: Import your cleaned dataset into Google Data Studio. One convenient way is to first upload your data to a Google Sheet, especially if you have it in a CSV or Excel format. Data Studio can connect to Google Sheets easily. (Data Studio also supports connecting to CSVs, databases, and other sources directly if needed.)
  • Create Reports: Use Data Studio’s drag-and-drop interface to create charts and tables. For example, you might create a bar chart comparing the backlink counts of all competitors, or a time series line chart showing the trend of social media followers over the past year for each competitor (if you have time-series data). You can also create a table that lists competitors alongside all the metrics you collected, sort by any metric, etc.
  • Interactive Filters: Data Studio allows you to add filters and date range selectors. For instance, if you have data on multiple competitors, you can add a dropdown filter to view one competitor at a time, or a date range filter if your data has a time component (like monthly numbers).
  • Share Insights: Once your report is built, you can easily share it via a link. This is useful if you want to provide a client or team member with an up-to-date view of competitor comparisons. Reports can be viewed with live data, so if your underlying data source (like the Google Sheet) updates, the report will reflect those changes.

Using Google Data Studio transforms raw numbers into visuals like graphs and charts, making it easier to spot patterns. For example, a chart might reveal that one competitor consistently outranks others in organic search presence, or that another competitor has had a sudden spike in social media activity.

5. Best Practices for Competitor Analysis with Python

Before we dive into a practical example, keep in mind some best practices when “spying” on competitors using Python and Data Studio:

  • Automation: One advantage of using Python is the ability to automate repetitive tasks. Consider scheduling your data collection scripts to run at regular intervals (e.g., weekly or monthly) using task schedulers or cron jobs (see the scheduling sketch after this list). This way, you can maintain an up-to-date dataset on your competitors without manual effort each time.
  • Compliance & Ethics: Always ensure your methods comply with legal regulations and terms of service. Scrape only publicly available data and respect robots.txt rules. Avoid any intrusive or unethical data gathering. If an official API is available and within your means, use it instead of scraping HTML, as this is more reliable and often allowed by the platform.
  • Data Validation: Double-check the data you collect. Scripts can break if a website changes its layout or an API’s response format changes. Validate that the numbers you’re seeing make sense (e.g., no sudden drops to zero unless that truly happened, no duplicate entries, etc.). This will ensure you base your analysis on correct information.
  • Continuous Monitoring: Competitor analysis isn’t a one-time task. Industries change, and competitors adjust their strategies. Set up an ongoing process to monitor changes over time. By continuously tracking, you can catch important shifts — for example, if a competitor suddenly gains a lot of backlinks or significantly improves their site’s speed and SEO, you’d want to know promptly.
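
As an example of the automation point above, here is a minimal scheduling sketch using the third-party schedule package (pip install schedule). The collect_competitor_data() function is a hypothetical stand-in for your own collection routine, and a cron job or Windows Task Scheduler entry would achieve the same thing:

import time

import schedule

def collect_competitor_data():
    # Hypothetical placeholder: call your scraping/API scripts here
    print("Collecting competitor data...")

# Run the collection job every Monday at 6:00 AM
schedule.every().monday.at("06:00").do(collect_competitor_data)

while True:
    schedule.run_pending()
    time.sleep(60)  # check once a minute whether the job is due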

Case Study: Using Python & Data Studio to Investigate Keyword Ranking Drops

Now, let’s walk through a practical example of competitor spying using Python and Data Studio. In this scenario, imagine you notice that some of your important keyword rankings have dropped significantly in a recent update. You want to investigate not just what happened to your site, but also understand how your competitors are performing for those keywords. In other words, you need to see which competitor pages are now ranking, and glean insights from them.

To do this efficiently, we’ll use a Python script to gather Google search results for a set of keywords, and then use Data Studio to visualize which competitors and pages show up most frequently. This approach can quickly highlight what content is dominating the results and may explain why your rankings fell (perhaps competitors have strong content that you need to match or outperform).

Why Python for SEO Analysis?

Python is an incredibly powerful programming language that can do just about anything when it comes to data. One of its most common uses in SEO is automating tedious or large-scale tasks. For instance, instead of manually checking where each competitor ranks for 50 different keywords, you could write a Python script to fetch that information automatically.

Another benefit of Python is the variety of ways to accomplish the same task. There are multiple libraries and techniques for web scraping, API calls, data parsing, etc. This means if one approach doesn’t work (or stops working, as things change), there’s usually an alternative available. However, it also means there’s a learning curve – you might need to try different libraries or debug scripts when things aren’t functioning as expected. The key is to have a clear idea of what you want to achieve; if you can imagine a data-related task, chances are high that Python can be used to automate it.

What Our Specific Script Does (and Doesn’t Do)

It’s important to clarify the purpose and limitations of the Python script we’ll use in this case study. Most SEO ranking tools (like SERP trackers) will give you an average ranking or track changes over time for a keyword. Our script, by contrast, performs a real-time crawl for the keywords at the moment you run it, using your computer’s IP address to query Google. It’s a snapshot, not a continuous tracker.

The goal of this script is to solve a specific problem: identify the top-ranking pages (especially competitors’ pages) for each of the keywords that dropped in ranking. Traditional keyword rank trackers might tell you that your page dropped from rank 3 to rank 7, but they won’t always tell you which competitor’s page replaced you in those higher spots. Knowing what pages have risen can be crucial – those pages hold clues about what content or SEO optimizations are currently favored by Google for that keyword.

What this script doesn’t do is monitor rankings over time or provide an ongoing history of a keyword’s performance. It’s not meant to replace a full-fledged rank tracking tool. Instead, consider it a quick investigative tool to fetch fresh data when you notice something odd (like a sudden ranking drop). It gives you immediate insight into the competitive landscape for your keywords at that point in time.

What You’ll Need to Get Started

Before running the script, make sure you have the following:

  • Python 3 installed: You can download it from the official site. If you’re new to Python, it might help to go through the official Python tutorial or the popular free book Automate the Boring Stuff with Python to familiarize yourself with the basics.
  • A code editor or IDE: This is where you’ll write and run your Python code. The example here uses PyCharm (Community Edition), but you can use VS Code, Sublime Text, Jupyter Notebook, or even the basic IDLE that comes with Python.
  • A virtual environment (optional but recommended): It’s good practice to set up a virtual environment for your project to manage dependencies. If you haven’t set one up before, this guide can help. This step ensures that the libraries you install for this project don’t conflict with other Python projects on your system.
  • Required Python libraries: Ensure you have the following libraries available in your environment:
    • requests – for making HTTP requests (to fetch Google search results page HTML)
    • lxml – for parsing HTML (we’ll use it to parse the Google results; lxml’s CSS-selector support also needs the small companion package cssselect)
    • csv – for writing out the results to a CSV file (this is part of Python’s standard library, no extra installation needed)
    • urllib.parse – also part of Python’s standard library, used to help parse URLs

    If you don’t have a library like requests or lxml installed, you can add it via pip. For example, run pip install requests lxml cssselect in your terminal.

1. Prepare a List of Keywords to Investigate

The first thing we need is a list of the keywords we want to investigate – those that experienced significant ranking drops. Let’s assume you identified several such keywords from your SEO tracking software. For this example, we will use a short list of sample keywords:

  • SEO Tips
  • Local SEO Advice
  • Learn SEO
  • Search Engine Optimization Articles
  • SEO Blog
  • SEO Basics

You might have more or fewer keywords depending on your situation. Create a plain text file called searches.txt and put each keyword on a separate line in that file, exactly as you would type them into Google. Our script will read from this file to know what to search for.
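
For this example, searches.txt would contain exactly these six lines:

SEO Tips
Local SEO Advice
Learn SEO
Search Engine Optimization Articles
SEO Blog
SEO Basics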

Disclaimer: Be cautious with the number of queries you run in one go. Automating too many Google searches rapidly can lead to your IP address being temporarily flagged or blocked by Google for sending automated traffic. In our example we only have a few keywords. If you plan to run dozens or hundreds of queries, you should implement delays between searches and possibly use an official API or scraping service that abides by Google’s terms. Always use such scripts responsibly.

2. Run the Python Ranking Script

Now it’s time to write and run the Python script that will perform the searches and collect the results. The high-level flow of the script is as follows:

  • Open the searches.txt file and read the keywords one by one.
  • For each keyword, send a request to Google Search and retrieve the HTML of the first page of results.
  • Parse the HTML to extract the titles and URLs of the search results on that page.
  • Write those results into a CSV file (let’s call it data.csv) along with the keyword they correspond to.

By the end of the run, we’ll have a CSV file that lists each keyword, and for each keyword, the URLs and titles of the top results (which will include competitor pages). Here’s the code broken into steps:

Step 2a: Import necessary modules

from urllib.parse import urlparse, parse_qs
from lxml.html import fromstring
import requests
import csv

We import urlparse and parse_qs from Python’s urllib.parse module to help with URL handling, fromstring from lxml.html to parse the HTML content, requests to fetch the HTML from Google, and csv to write out the results.

Step 2b: Define the scraping function

def scrape_run():
    csv_filename = 'data.csv'
    # Open the file containing our search queries (keywords) and the CSV file
    # we'll append results to. Opening the CSV once here (rather than once per
    # result) avoids repeatedly reopening the file inside the loop.
    with open('searches.txt', 'r') as searches, \
         open(csv_filename, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for query in searches:
            user_query = query.strip()
            if not user_query:
                continue  # skip any empty lines
            # Fetch the Google search result page for the query
            response = requests.get(
                "https://www.google.com/search?q=" + requests.utils.requote_uri(user_query),
                headers={"User-Agent": "Mozilla/5.0"}
            )
            # Parse the HTML content
            page = fromstring(response.text)
            # Google search result links (CSS selector might need updating if Google changes layout)
            links = page.cssselect('.r a')
            for result in links:
                raw_url = result.get('href')           # the URL found in search result link
                title = result.text_content().strip()  # the title text of the search result
                if not raw_url:
                    continue  # skip anchors without an href attribute
                if raw_url.startswith("/url?"):
                    # Google uses a redirect URL. Extract the actual target URL from the query params.
                    parsed_url = urlparse(raw_url)
                    query_params = parse_qs(parsed_url.query)
                    actual_url = query_params.get('q', [None])[0]
                    if actual_url is None:
                        actual_url = raw_url
                else:
                    # The URL is not in the redirected format (could be a direct link)
                    actual_url = raw_url
                # Save the keyword, actual URL, and title as one CSV row
                writer.writerow([user_query, actual_url, title])

Let’s break down a few important parts of the above code:

  • We strip each query to remove newline characters. If the line is empty, we skip it (to avoid sending empty queries to Google).
  • We construct the Google search URL by appending the query onto https://www.google.com/search?q=. We use requests.utils.requote_uri to ensure the query is properly URL-encoded (spaces, special characters, etc. are handled safely).
  • We set a User-Agent header in the request. This is a polite way to identify the traffic as a browser (in this case, a generic Mozilla string). Google might block requests that have no User-Agent or look like a bot by default. Using a common User-Agent helps us get the results. Keep in mind, scraping Google directly like this can still potentially get blocked if done aggressively.
  • We parse the returned HTML content with lxml. We then use a CSS selector .r a to find result links. (In Google’s HTML for search results, each result title link used to be within an element with class r; this could change over time, so this selector might need updating in the future.) Each result link element (result) has an href attribute and text content (the clickable title).
  • Google often doesn’t list the raw URL directly in the href; instead it uses a redirect URL like /url?q=http://actual-target.com&sa=U&ved=.... That’s why we check if raw_url starts with /url?. If it does, we parse it and extract the real URL from the q parameter. If it doesn’t (for instance, for some Google features or direct links), we take it as is.
  • Finally, we open (or create) data.csv in append mode once at the start of the run, and write a row containing the keyword, the extracted URL, and the title for each result on the page.

After the loop finishes, our CSV file will contain an entry for every search result for every keyword in searches.txt. We might get multiple rows for the same keyword (since a search yields multiple results). Essentially, the CSV will look like this (the header row is shown for clarity; the script itself writes only data rows):

Keyword, URL, Title
SEO Tips, http://example.com/page1, Title of Page 1
SEO Tips, http://example2.com/pageA, Title of Page A
Local SEO Advice, http://example3.com/pageX, Title of Page X
... etc.

You can open this CSV in Excel or a text editor to verify the data after running the script.
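
You can also sanity-check the output from Python itself with Pandas. A minimal sketch (note that because the script writes no header row, we supply column names manually):

import pandas as pd

# The script writes raw rows with no header, so supply column names here
df = pd.read_csv("data.csv", names=["keyword", "url", "title"])
print(df.head())                      # first few captured results
print(df["keyword"].value_counts())  # how many results were captured per keyword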

Step 2c: Run the function to perform the scrape

scrape_run()

When you execute scrape_run(), the script will go through all the keywords and populate data.csv with the results. Depending on the number of keywords and your internet connection, this could take a little time. For our small example list, it should finish in just a few seconds.

Note: As mentioned, scraping Google directly can trigger anti-robot measures if you do it on a large scale. If you plan to use this approach for a lot of keywords regularly, consider using the Google Custom Search API (which requires an API key and has usage limits), or third-party SERP API services that are designed for this purpose. For one-off analysis of a moderate list of keywords, the above script can work fine, especially if you use delays or split the list to be safe.
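
If you do scale up the keyword list, one simple mitigation is to pause between queries. Here is a minimal sketch of a randomized delay you could drop into the keyword loop of scrape_run(), right after each request:

import random
import time

# Inside the keyword loop of scrape_run(), after each Google request:
time.sleep(random.uniform(5, 15))  # wait 5-15 seconds before the next query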

3. Use Data Studio to Analyze the Results

Now that the script has run, you should have a file data.csv containing your raw results. The next step is to turn this raw data into something insightful. Google Data Studio will help us do that by creating a dashboard of charts.

You can manually import the CSV into Google Data Studio, but an easy method is to first upload the CSV to Google Sheets (or copy its contents into a Google Sheet). Once the data is in a Google Sheet, you can use it as a data source in Data Studio with just a few clicks.
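
If you’d rather push the results into Google Sheets from Python instead of uploading by hand, the third-party gspread library can handle it. This is a minimal sketch: it assumes you have created a Google service account, saved its credentials to a (hypothetical) creds.json file, and shared a sheet named 'Competitor SERP Data' with that service account:

import csv

import gspread

# Authenticate with a Google service account (creds.json is a hypothetical path)
gc = gspread.service_account(filename="creds.json")
sheet = gc.open("Competitor SERP Data").sheet1

# Read the rows produced by scrape_run() and push them in one batch
with open("data.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

sheet.clear()
sheet.update([["Keyword", "URL", "Title"]] + rows)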

There’s actually a ready-made Data Studio template for this kind of analysis, provided by Cardinal Digital Marketing (it was featured in an article on Search Engine Journal). You can find it here: Python Rank Investigation Data Studio template. The template is essentially a report that expects your data to be in a certain format (which matches the CSV we generated). Here’s how to use it:

  1. Click the template link and follow the instructions to make a copy of the associated Google Sheet and Data Studio report to your own Google account. (Usually, you’ll need to log in, then select to make a copy so that you can edit and use it with your data.)
  2. In the copied Google Sheet, you’ll find a sample dataset. Replace that sample data with the data from your data.csv — essentially, you can copy-paste your CSV content into the sheet, making sure the columns align (Keyword, URL, Title).
  3. Go to the copied Data Studio report. It might prompt you to select a data source; point it to the Google Sheet that now contains your data. If it’s already connected to the copied sheet, just refresh the data.

Once your Data Studio report is set up with your data, it will generate visualizations automatically (assuming you used the template). Typically, the template includes charts like:

  • Occurrences by Competitor Domain: A bar chart or table showing which domains (competitors) appear most frequently across all the top results for your keywords. This gives a sense of who your top competitors are in aggregate.
  • Occurrences by Page: Another chart listing the specific URLs (pages) that appear multiple times across the keyword set. This helps identify if a particular blog post or resource page from a competitor is ranking for many of your target keywords.
  • Filters for drilling down: Controls that let you filter the report by a specific keyword or a specific competitor, so you can focus on one at a time if needed.
Figure: A sample Data Studio dashboard (from Cardinal Digital Marketing’s template) showing the number of organic occurrences by company (domain) for each rank position in Google results.

In the above example dashboard (based on our sample data), you can see a chart at the top titled “Number of Organic Occurrences by Company by Rank”. Each colored segment represents a Google search rank position (1 through 10), and each group of bars represents a competitor’s domain. For instance, it shows that moz.com and ahrefs.com appear in the search results for our keywords multiple times (with bars indicating how often they appear at rank 1, rank 2, etc.). This tells us that Moz and Ahrefs are two dominant competitors across the set of keywords we checked.

However, just knowing the domains isn’t enough to form a strategy. We need to know which pages on those domains are ranking. That’s where the next chart comes in handy. Typically, the second chart in the template (often titled “Number of Organic Occurrences by Page by Rank”) will list the top-ranking URLs themselves.

Figure: Filter controls in the Data Studio report let you zero in on a specific competitor (Root URL), a specific page, or a particular keyword.

The screenshot above shows filter dropdowns for Root URL, Page, and Keyword. Using these, you could, for example, select one competitor’s domain (Root URL) to see all the keywords for which they have a top 10 ranking, or select one keyword to see which domains/pages rank for it in the snapshot data.
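
If you want to sanity-check those charts (or get a quick look without Data Studio), the same aggregations take only a few lines of Pandas against the data.csv our script produced:

from urllib.parse import urlparse

import pandas as pd

df = pd.read_csv("data.csv", names=["keyword", "url", "title"])

# Count how often each root domain appears across all captured results
df["domain"] = df["url"].apply(lambda u: urlparse(str(u)).netloc)
print(df["domain"].value_counts().head(10))

# And which individual pages rank for the most distinct keywords
print(df.groupby("url")["keyword"].nunique().sort_values(ascending=False).head(10))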

By analyzing the Data Studio report, you might discover insights such as:

  • A specific competitor has multiple high-ranking pages for many of your keywords. This competitor might be a major threat or a leader in your niche, so studying their content strategy could be beneficial.
  • Certain pages (either on your site or competitor sites) rank for numerous keywords. Perhaps a competitor’s comprehensive guide or resource is capturing many keywords – you might need to create or improve a similar cornerstone piece of content.
  • Maybe you’ll find that for some keywords that dropped, the top results are now populated by a certain type of content (for example, videos, forums, or a Google-featured snippet). This could indicate a shift in what Google considers relevant for that query, and you may need to adjust your approach (like creating video content or optimizing for featured snippets).

Once you identify the top competitor pages from the Data Studio analysis, you can go directly to those pages and do a deeper evaluation. Ask questions like: What is the content like on this page? Is it longer or more up-to-date than mine? Does it target the keyword in the title and headings? How is the user experience? Does it have lots of backlinks? By understanding why those pages perform well, you can strategize how to improve your own content or SEO to compete better.
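
Some of those questions can themselves be answered programmatically. As a minimal sketch (keyword_on_page is a hypothetical helper, and the URL is a placeholder), here is a rough on-page check with requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

def keyword_on_page(url, keyword):
    """Rough check: does the page target the keyword in its title and headings?"""
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    kw = keyword.lower()
    title = soup.title.string if (soup.title and soup.title.string) else ""
    headings = [h.get_text(" ", strip=True) for h in soup.find_all(["h1", "h2"])]
    return {
        "keyword_in_title": kw in title.lower(),
        "keyword_in_headings": any(kw in h.lower() for h in headings),
        "approx_word_count": len(soup.get_text(" ", strip=True).split()),
    }

# Hypothetical competitor URL; replace with a page from your Data Studio report
print(keyword_on_page("https://www.example-competitor.com/seo-tips/", "seo tips"))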

Getting Stuck?

Don’t be discouraged if some of this seems complex. Implementing a Python-based competitor analysis for the first time can be challenging. If you’re getting stuck at any point – whether it’s with writing the Python code, setting up the environment, or configuring the Data Studio report – help is available. You can reach out to the developer (Evan at Architek) who originally collaborated on this script for tips or even custom solutions. Additionally, feel free to contact our team at MiltonMarketing.com for free help and support with your SEO challenges. We’re happy to assist and provide guidance to get your competitor analysis up and running.

What’s Your Idea?

Hopefully, this walkthrough has sparked some creative ideas on how you can use Python to automate and enhance your SEO competitor analysis (and other digital marketing tasks). The combination of Python and data visualization tools like Data Studio is very powerful. Once you get comfortable, you can expand on these techniques – for example, tracking competitors over time, integrating other data sources (like PPC advertising data or technical SEO metrics), or even building an alert system that notifies you when a competitor makes a notable move.

The sky is the limit. Every time you find yourself wishing for data or spending time collecting information manually, think about how you might fetch and process that data with a Python script. By doing so, you’ll save time and uncover insights that give you a competitive edge. Happy spying and analyzing!

About the Author: Bernard Aybout (Virii8)

I am a dedicated technology enthusiast with over 45 years of life experience, passionate about computers, AI, emerging technologies, and their real-world impact. As the founder of my personal blog, MiltonMarketing.com, I explore how AI, health tech, engineering, finance, and other advanced fields leverage innovation—not as a replacement for human expertise, but as a tool to enhance it. My focus is on bridging the gap between cutting-edge technology and practical applications, ensuring ethical, responsible, and transformative use across industries.

MiltonMarketing.com is more than just a tech blog—it's a growing platform for expert insights. We welcome qualified writers and industry professionals from IT, AI, healthcare, engineering, HVAC, automotive, finance, and beyond to contribute their knowledge. If you have expertise to share in how AI and technology shape industries while complementing human skills, join us in driving meaningful conversations about the future of innovation. 🚀