How to Scrape the Apple App Store: Ratings and Reviews

June 27, 2026 11 min read

You want to track how Calm, Headspace, and Insight Timer stack up on ratings. Or pull every recent review of your own app to feed a sentiment model. The data sits right there on apps.apple.com, public and free to read. Then you open the page source and find nothing useful: the App Store is now a Svelte single-page app, the class names look like svelte-j9szs8, and a plain requests.get() returns an empty shell.

The old playbook is gone too. Guides still tell you to parse an embedded JSON cache or pull reviews from Apple's RSS feed. As of mid-2026, the JSON cache is no longer in the page, and the RSS reviews feed returns an empty document for every app we tested. So we render the page like a real browser, then read the parts that stay stable across deploys.

This guide builds an App Store scraper that survives Apple's hashed class names and pulls the data people actually want.

Here's what you'll build:

  • A two-call scraper that returns any app's full page as clean markdown
  • Structured JSON extraction for app metadata (rating, developer, category, size, age rating)
  • A review miner that pulls title, star rating, author, date, and full text
  • A multi-app comparison that ranks meditation apps by rating
  • A fallback path using Apple's free iTunes Search and Lookup APIs
  • CSV, JSON, and SQLite exporters for the scraped data

TL;DR: App Store pages render with JavaScript, so use a browser. observe returns the whole page as markdown without touching the hashed CSS classes. For structured output, execute_js reads stable signals: semantic tags (h3, time), the aria-label on the star rating, and the [aria-labelledby^="review-"] attribute on each review card. A datacenter proxy is enough. The iTunes Lookup API covers metadata when you don't need review text.


Don't have an API key yet? Create a free Browserbeam account - you get 5,000 credits, no credit card required.

What Data the App Store Page Exposes

Every app page on apps.apple.com follows the same layout. Here's what each section gives you, and which ones you can read without logging in.

| Section | URL pattern | Fields you get |
| --- | --- | --- |
| App header | /us/app/<slug>/id<APP_ID> | Name, subtitle, price, in-app purchase flag, developer, category, chart rank, age rating, size, languages |
| Ratings summary | same page | Average rating (rounded, like 4.8), total rating count (abbreviated, like 41M) |
| Reviews | same page | ~10 most-helpful reviews: title, star rating, author, date, full body |
| Information list | same page | Seller, exact size, compatibility, full language list, copyright |
| In-app purchases | same page | Product names and prices (for example, "Calm Premium $69.99") |
| Privacy | same page | Data Used to Track You, Data Linked to You, Data Not Linked to You |

Two limits matter. The page shows only a small sample of reviews (up to roughly ten, the ones Apple ranks as most helpful, and sometimes fewer), not the full history. And anything tied to an Apple ID, like personal download counts or recommendations, never appears on a public page, so it's off the table.

Everything in the table above is visible to anyone with a browser. That's exactly what we'll read.

Why the App Store Broke Your Old Scraper

Before we write code, it helps to know why the obvious approaches fail. Three things changed.

It's a JavaScript app, not a document

The App Store web frontend is a Svelte single-page application. The HTML that arrives from the server is a near-empty skeleton. The app name, ratings, and reviews all get injected by JavaScript after load. A requests + BeautifulSoup script sees the skeleton and returns nothing.

This is the same problem you hit on any modern React, Vue, or Svelte site. If you've read our guide on scraping JavaScript-heavy sites, the fix is familiar: render the page in a real browser first, then read the DOM.

The class names are hashed

Open the rendered page and inspect a review. The container looks like this:

<div class="container svelte-j9szs8" aria-labelledby="review-11994640656-title">
  <h3 class="title svelte-j9szs8">Favorite app ever</h3>
  <ol class="stars svelte-1fdd9o7" aria-label="5 Stars">...</ol>
  <time class="date svelte-j9szs8">11/26/2024</time>
  <span class="author svelte-j9szs8">l.lacx</span>
  <p class="content svelte-1b6lxg0">I personally love this app...</p>
</div>

That svelte-j9szs8 suffix is a hash that Svelte generates at compile time to scope styles to a component. It can change whenever Apple ships a build that touches that component. If you write a selector like .container.svelte-j9szs8, your scraper works today and can break on the next deploy.

The trick is to ignore the hashes and target what stays put: semantic HTML tags (h3, time, p), ARIA attributes (aria-label, aria-labelledby), and element roles. Those exist for accessibility and SEO, so Apple keeps them stable.
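
One way to hold yourself to that rule is a small guard that flags any selector depending on a Svelte hash before it reaches the scraper. This is only a sketch: the function name uses_svelte_hash and the regex are assumptions, based on the svelte-<hash> pattern visible in the markup above.

```python
import re

# Matches a Svelte scoping class such as ".svelte-j9szs8" inside a selector.
# Assumption: hashes are lowercase letters and digits, as in the sample markup.
SVELTE_HASH = re.compile(r"\.svelte-[a-z0-9]+")


def uses_svelte_hash(selector: str) -> bool:
    """Return True if the CSS selector depends on a Svelte hash class."""
    return bool(SVELTE_HASH.search(selector))


fragile = ".container.svelte-j9szs8 h3"
stable = '[aria-labelledby^="review-"] h3'

print(uses_svelte_hash(fragile))  # True
print(uses_svelte_hash(stable))   # False
```

Running a check like this over your selector list in a test keeps a hashed class from slipping back in after a copy-paste from the inspector.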

The embedded JSON cache is gone

Older App Store scrapers grabbed a <script> tag that held the page data as JSON (the "shoebox" or media-api cache). That tag no longer ships in the page. There is no application/ld+json block either. The data lives only in the rendered DOM now. So reading the DOM is not a shortcut, it's the only route.

Quick Start: Scrape Any App in Two Calls

Let's pull Spotify's page. Create a session on the app URL, then observe to read it as markdown. Two calls. We block images, fonts, and media because we only want text, and that cuts load time and credits.

# Step 1: Create the session
SESSION_ID=$(curl -s -X POST https://api.browserbeam.com/v1/sessions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://apps.apple.com/us/app/spotify-music-and-podcasts/id324684580",
    "proxy": { "kind": "datacenter", "country": "us" },
    "block_resources": ["image", "font", "media"]
  }' | jq -r '.session_id')

# Step 2: Read the page as markdown, then close
curl -s -X POST "https://api.browserbeam.com/v1/sessions/$SESSION_ID/act" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"steps": [{"observe": {}}, {"close": {}}]}' \
  | jq -r '.page.markdown.content'

The markdown comes back clean, with the hashed class names stripped away:

Spotify: Music and Podcasts

Songs, Playlists & Audiobooks

Free

-   41M Ratings 4.8
-   Age Rating 13+ In-App Controls
-   Chart #1 Music
-   Developer Spotify
-   Language EN + 62 More
-   Size 281.8 MB

...

Ratings & Reviews

-   4.8 out of 5  41M Ratings

-   ### Favorite app ever
    11/26/2024
    l.lacx
    I personally love this app. I have had it for over 4 years...

That's the whole page in one string: header facts, the rating summary, and the reviews. For a quick look or for feeding an LLM, observe is all you need. When you want columns instead of prose, move to execute_js.

The id at the end of the URL (id324684580) is the App ID. You'll reuse it everywhere, including the official APIs later. Grab it from any App Store link.
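
If you collect App Store links in bulk, pulling the ID out by hand gets old. A minimal sketch, assuming the /id<digits> URL shape shown above (the helper name extract_app_id is hypothetical):

```python
import re
from typing import Optional


def extract_app_id(url: str) -> Optional[str]:
    """Return the numeric App ID from an App Store URL, or None if absent."""
    match = re.search(r"/id(\d+)", url)
    return match.group(1) if match else None


url = "https://apps.apple.com/us/app/spotify-music-and-podcasts/id324684580"
print(extract_app_id(url))  # 324684580
```

The ID is kept as a string because it is only ever used to build URLs and query parameters.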

Extracting App Metadata as Structured JSON

For clean fields, run JavaScript against the rendered DOM with execute_js. We read three stable signals: the h1 for the name, the "out of 5" text in the ratings summary for the rating and rating count, and the Information dl for seller, category, and size.

SESSION_ID=$(curl -s -X POST https://api.browserbeam.com/v1/sessions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://apps.apple.com/us/app/notion-notes-tasks-ai/id1232780281",
    "proxy": { "kind": "datacenter", "country": "us" },
    "block_resources": ["image", "font", "media"]
  }' | jq -r '.session_id')

curl -s -X POST "https://api.browserbeam.com/v1/sessions/$SESSION_ID/act" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"steps": [{"execute_js": {"code": "const t=(el,s)=>el.querySelector(s)?.textContent?.trim()||null; const info={}; document.querySelectorAll(\"dl > div\").forEach(r=>{const dt=t(r,\"dt\"); const dd=r.querySelector(\"dd\")?.textContent?.replace(/\\s+/g,\" \").trim(); if(dt) info[dt]=dd;}); const x=document.body.innerText; const rt=x.match(/([\\d.]+)\\s+out of 5/); const ct=x.match(/out of 5\\s+([\\d.,KM]+)\\s+Ratings/); return {name:t(document,\"h1\"), rating:rt?rt[1]:null, rating_count:ct?ct[1]:null, seller:info.Seller, category:info.Category, size:info.Size};", "result_key": "app"}}, {"close": {}}]}' \
  | jq '.extraction.app'

The result is the structured object you actually wanted:

{
  "name": "Notion: Notes, Tasks, AI",
  "rating": "4.8",
  "rating_count": "87K",
  "seller": "Notion Labs, Incorporated",
  "category": "Productivity",
  "size": "550.8 MB"
}

Notice what we never wrote: a single svelte-* class. Tags and the page's visible text carry the whole extraction.
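
The rating_count comes back abbreviated ("87K" here, "41M" on Spotify's page), which is awkward to sort or chart. The sketch below converts it to an integer. The helper name parse_count is hypothetical, and it assumes only the K and M suffixes seen on the page. The result is approximate, since the page has already rounded the number.

```python
from typing import Optional

# Multipliers for the suffixes seen on the page (assumption: only K and M occur).
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text: Optional[str]) -> Optional[int]:
    """Convert an abbreviated count such as "87K" or "41M" to an integer."""
    if not text:
        return None
    cleaned = text.strip().replace(",", "")
    if not cleaned:
        return None
    multiplier = SUFFIXES.get(cleaned[-1].upper(), 1)
    if multiplier != 1:
        cleaned = cleaned[:-1]  # drop the suffix before converting
    return round(float(cleaned) * multiplier)


print(parse_count("87K"))  # 87000
print(parse_count("41M"))  # 41000000
```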

When the markup is too messy, use an AI selector

Sometimes the structure is awkward enough that hand-written selectors aren't worth it. Browserbeam's extract step accepts plain-English ai >> selectors that resolve to a real selector under the hood and get cached for reuse:

session.extract(
    name="h1 >> text",
    rating="ai >> the overall average star rating",
    rating_count="ai >> the total number of ratings shown",
)
print(session.extraction)
# {"name": "Notion: Notes, Tasks, AI", "rating": "4.8", "rating_count": "87K Ratings"}

This is handy on a Svelte page because you describe the value, not the DOM path. The downside is cost and a small accuracy risk, so reserve it for the fields that resist CSS. For more on extraction schemas, see the browser-to-JSON guide.

Mining Ratings and Reviews

Reviews are the prize, and they need a careful selector. Each review card carries aria-labelledby="review-<id>-title", so [aria-labelledby^="review-"] finds them all. There's one catch: the page renders a hidden detail-view clone of every card with the class is-detail-view. Skip it with :not(.is-detail-view), or every review shows up twice.

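Inside each card, the fields hang off semantic tags and plain, unhashed class names: h3 for the title, ol.stars with an aria-label like "5 Stars" for the rating, time for the date, .author for the username, and p.content for the body. The names stars, author, and content are the human-written part of the class attribute, not the svelte-* hash.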

SESSION_ID=$(curl -s -X POST https://api.browserbeam.com/v1/sessions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://apps.apple.com/us/app/spotify-music-and-podcasts/id324684580",
    "proxy": { "kind": "datacenter", "country": "us" },
    "block_resources": ["image", "font", "media"]
  }' | jq -r '.session_id')

curl -s -X POST "https://api.browserbeam.com/v1/sessions/$SESSION_ID/act" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"steps": [{"execute_js": {"code": "const t=(el,s)=>el.querySelector(s)?.textContent?.trim()||null; const cards=document.querySelectorAll(\"[aria-labelledby^=review-]:not(.is-detail-view)\"); return [...cards].map(c=>({title:t(c,\"h3\"), rating:c.querySelector(\"ol.stars\")?.getAttribute(\"aria-label\"), date:t(c,\"time\"), author:t(c,\".author\"), body:t(c,\"p.content\")}));", "result_key": "reviews"}}, {"close": {}}]}' \
  | jq '.extraction.reviews'

For Spotify, you get the five most-helpful reviews as structured rows:

[
  { "title": "Favorite app ever", "rating": "5 Stars", "date": "11/26/2024", "author": "l.lacx", "body": "I personally love this app. I have had it for over 4 years..." },
  { "title": "Please go back", "rating": "5 Stars", "date": "11/13/2020", "author": "Emanjeudy", "body": "I love the app but not so much the new update..." },
  { "title": "Stop changing the queue!!!!!", "rating": "5 Stars", "date": "12/29/2022", "author": "I_am_theSquirrel", "body": "Stop changing the way the queue works every single time..." },
  { "title": "Where's the functionality?", "rating": "5 Stars", "date": "09/23/2018", "author": "CoreySomers", "body": "1) I can't play JUST a playlist or JUST an album..." },
  { "title": "4 Stars but…", "rating": "4 Stars", "date": "10/01/2024", "author": "Luna Gilmore", "body": "…This is the #1 Music app I use but there's too many adds..." }
]

A few real-world quirks to plan for. The date field is not uniform: recent reviews show short dates without a year, like "Jun 8", while older ones use "MM/DD/YYYY". Promo blocks (Apple's "Editors' Choice" blurb) sometimes sit among the reviews, but they have no time or .author, so any that slip through show up as rows with a null date and author, which are easy to drop. And the page caps at the most-helpful reviews, so this is a sample, not the archive.
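
Those quirks are easy to smooth out after extraction. The sketch below post-processes the rows from the review step: it turns "5 Stars" into an integer, normalizes both date shapes to ISO format, and drops rows that have neither a date nor an author. The function names are hypothetical. One loud assumption: a short date with no year is treated as the most recent occurrence of that day on or before today.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_stars(label: Optional[str]) -> Optional[int]:
    """Turn an aria-label such as "5 Stars" into the integer 5."""
    match = re.match(r"\s*(\d+)", label or "")
    return int(match.group(1)) if match else None


def parse_review_date(text: Optional[str], today: Optional[date] = None) -> Optional[date]:
    """Parse "MM/DD/YYYY" or a short "Jun 8" style date."""
    if not text:
        return None
    cleaned = text.strip()
    today = today or date.today()
    try:
        return datetime.strptime(cleaned, "%m/%d/%Y").date()
    except ValueError:
        pass
    # Assumption: no year means this year, or last year if that would be in the future.
    for year in (today.year, today.year - 1):
        try:
            parsed = datetime.strptime(f"{cleaned} {year}", "%b %d %Y").date()
        except ValueError:
            continue
        if parsed <= today:
            return parsed
    return None


def clean_reviews(rows: list, today: Optional[date] = None) -> list:
    """Drop promo-like rows and add numeric stars plus an ISO date."""
    cleaned = []
    for row in rows:
        # Rows with neither a date nor an author are not real reviews.
        if not row.get("date") and not row.get("author"):
            continue
        parsed = parse_review_date(row.get("date"), today)
        cleaned.append({
            **row,
            "stars": parse_stars(row.get("rating")),
            "date_iso": parsed.isoformat() if parsed else None,
        })
    return cleaned


rows = [
    {"title": "Favorite app ever", "rating": "5 Stars", "date": "11/26/2024", "author": "l.lacx"},
    {"title": "Promo block", "rating": None, "date": None, "author": None},
]
print(clean_reviews(rows))
```

The original date string stays in the row, so nothing is lost if the year guess turns out to be wrong.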

Comparing Multiple Apps

The pattern shines when you run it across a category. Here's a script that scrapes three meditation apps and ranks them. It creates a fresh session per app, extracts the rating, counts the review cards on the page, and prints a leaderboard.

from browserbeam import Browserbeam
import time

client = Browserbeam(api_key="YOUR_API_KEY")

APPS = {
    "Insight Timer": "https://apps.apple.com/us/app/insight-timer-meditate-sleep/id337472899",
    "Calm":          "https://apps.apple.com/us/app/calm/id571800810",
    "Headspace":     "https://apps.apple.com/us/app/headspace-sleep-meditation/id493145008",
}

JS = """
const text = document.body.innerText;
const rating = text.match(/([\\d.]+)\\s+out of 5/);
const reviews = document.querySelectorAll('[aria-labelledby^="review-"]:not(.is-detail-view)');
return {
  name: document.querySelector('h1')?.textContent?.trim(),
  rating: rating ? rating[1] : null,
  sample_reviews: reviews.length,
};
"""

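rows = []
for name, url in APPS.items():
    s = client.sessions.create(
        url=url,
        proxy={"kind": "datacenter", "country": "us"},
        block_resources=["image", "font", "media"],
    )
    try:
        s.execute_js(code=JS, result_key="app")
        data = s.extraction["app"]
    finally:
        s.close()  # always release the session, even if extraction fails
    # rating is None when the page has no "out of 5" text
    rating = float(data["rating"]) if data["rating"] is not None else None
    rows.append({"name": name, "rating": rating, "reviews": data["sample_reviews"]})
    time.sleep(1)  # stay polite between apps

# Highest rating first; apps with no rating sort last
rows.sort(key=lambda r: r["rating"] if r["rating"] is not None else -1.0, reverse=True)
for r in rows:
    print(f"{r['name']:<16} {r['rating']}  ({r['reviews']} sample reviews)")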

Running it produces a real ranking you can refresh on a schedule:

Insight Timer    4.9  (4 sample reviews)
Calm             4.8  (4 sample reviews)
Headspace        4.8  (4 sample reviews)

The page rating is rounded to one decimal, so ties are common. When you need to break them, the official Lookup API returns the exact float. That's the next section.

The Official Apple Endpoints (and Where They Fall Short)

Not every job needs a browser. Apple runs two free JSON endpoints that take no key and no account. They're the fastest path when you only need metadata.

iTunes Lookup API

Pass an App ID and get the app's metadata as JSON:

curl -s "https://itunes.apple.com/lookup?id=337472899&country=us" | jq '.results[0]
  | {trackName, sellerName, primaryGenreName, averageUserRating, userRatingCount, price, version}'
{
  "trackName": "Insight Timer: Meditate, Sleep",
  "sellerName": "Insight Network Inc",
  "primaryGenreName": "Health & Fitness",
  "averageUserRating": 4.89,
  "userRatingCount": 441457,
  "price": 0,
  "version": "20.23.0"
}

This is where the exact numbers live. The page shows "4.9" and "441K"; the API gives 4.89 and 441457. Use the page for what users see, the API when you need precision.
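
The same lookup from Python needs only the standard library. This is a sketch rather than an official client: the function names are hypothetical, and the field list simply mirrors the jq filter above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

LOOKUP_URL = "https://itunes.apple.com/lookup"
FIELDS = (
    "trackName", "sellerName", "primaryGenreName",
    "averageUserRating", "userRatingCount", "price", "version",
)


def summarize_lookup(payload: dict) -> Optional[dict]:
    """Pick the fields of interest from a Lookup response, or None if no app matched."""
    results = payload.get("results") or []
    if not results:
        return None
    return {field: results[0].get(field) for field in FIELDS}


def lookup_app(app_id: str, country: str = "us", timeout: float = 10.0) -> Optional[dict]:
    """Fetch one app from the Lookup endpoint. Performs a network request."""
    query = urllib.parse.urlencode({"id": app_id, "country": country})
    with urllib.request.urlopen(f"{LOOKUP_URL}?{query}", timeout=timeout) as resp:
        return summarize_lookup(json.load(resp))


# Usage (needs network access):
#   lookup_app("337472899")
```

Keeping the parsing in summarize_lookup, separate from the request, makes it easy to test against a saved response.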

iTunes Search API

Discover apps by keyword. Add entity=software so you only get apps:

curl -s "https://itunes.apple.com/search?term=meditation&entity=software&country=us&limit=3" \
  | jq -r '.results[] | "\(.trackName) | \(.sellerName) | \(.averageUserRating)"'
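
In Python, the same search is a URL builder plus a small formatter. A sketch using the query parameters from the curl call above (both function names are hypothetical):

```python
import urllib.parse

SEARCH_URL = "https://itunes.apple.com/search"


def search_url(term: str, country: str = "us", limit: int = 3) -> str:
    """Build a Search API URL restricted to apps (entity=software)."""
    query = urllib.parse.urlencode(
        {"term": term, "entity": "software", "country": country, "limit": limit}
    )
    return f"{SEARCH_URL}?{query}"


def format_results(payload: dict) -> list:
    """Render each result as "name | seller | rating", like the jq filter above."""
    return [
        f"{r.get('trackName')} | {r.get('sellerName')} | {r.get('averageUserRating')}"
        for r in payload.get("results", [])
    ]


print(search_url("meditation"))
```

urlencode handles the escaping, so a multi-word term such as "sleep sounds" is safe to pass in as-is.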

What the official APIs won't give you

Two gaps push you back to the browser. First, neither endpoint returns review text. Second, and this is the part that surprises people in 2026: Apple's old RSS customer reviews feed is effectively dead.

The classic recommendation was to hit https://itunes.apple.com/us/rss/customerreviews/page=1/id=<APP_ID>/sortby=mostrecent/json and parse the entry array. We tested it against Spotify, Duolingo, Notion, and others. Every response now comes back with feed metadata and no entry array at all, in both the JSON and XML variants. If a tutorial still tells you to scrape reviews from that feed, it predates the change.
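
If you still have code that reads this feed, make the empty case explicit instead of assuming the array exists. A minimal sketch, assuming the JSON payload nests everything under a top-level "feed" key with reviews in "entry" (the helper name feed_entries is hypothetical):

```python
def feed_entries(payload: dict) -> list:
    """Return review entries from an RSS JSON payload, or [] if there are none."""
    feed = payload.get("feed")
    if not isinstance(feed, dict):
        return []
    entries = feed.get("entry")
    if entries is None:
        return []  # feed metadata only: the case described above
    # Assumption: a single review may arrive as one object instead of a list.
    return entries if isinstance(entries, list) else [entries]


empty_feed = {"feed": {"title": {"label": "Customer Reviews"}}}
print(len(feed_entries(empty_feed)))  # 0
```

With a helper like this, an empty feed yields a count of zero you can log or alert on, rather than a KeyError or a loop that silently does nothing.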

So the scorecard looks like this:

| Need | Best tool | Why |
| --- | --- | --- |
| App metadata, fast and free | iTunes Lookup API | Exact rating, count, version, no rendering |
| Find apps by keyword | iTunes Search API | Relevance-ranked catalog search |
| Review text | Browser (Browserbeam) | RSS feed returns nothing; data is only on the page |
| Chart rank, in-app prices, privacy labels | Browser (Browserbeam) | Rendered into the DOM, not in any API |

Start with the API for metadata. Reach for the browser the moment you need anything Apple doesn't expose, which is most of the interesting data.

DIY Playwright vs Browserbeam

You can render the page yourself with Playwright or Puppeteer. The browser automation works the same; the difference is everything around it.

| Concern | DIY (Playwright/Selenium) | Browserbeam API |
| --- | --- | --- |
| JavaScript rendering | You install and run the browser | Handled |
| Proxies | Buy, rotate, and debug bans yourself | proxy: { kind: "datacenter" } |
| Resource blocking | Custom request interception | block_resources: [...] |
| Page stability | Write your own waits | Automatic stability detection |
| Output | Parse raw HTML yourself | Markdown or JSON via observe / execute_js |
| Scaling | Manage browser pools and memory | API handles concurrency |

Here's the same App Store extraction in raw Playwright, so you can weigh it:

from playwright.sync_api import sync_playwright
import json

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://apps.apple.com/us/app/calm/id571800810")
    page.wait_for_selector("[aria-labelledby^='review-']")

    data = page.evaluate("""
      () => {
        const t = (el, s) => el.querySelector(s)?.textContent?.trim() || null;
        const cards = document.querySelectorAll('[aria-labelledby^="review-"]:not(.is-detail-view)');
        return [...cards].map((c) => ({
          title: t(c, 'h3'),
          rating: c.querySelector('ol.stars')?.getAttribute('aria-label'),
          author: t(c, '.author'),
        }));
      }
    """)
    print(json.dumps(data, indent=2))
    browser.close()

The selector logic is identical. What you take on with DIY is the browser install, the proxy contract, the memory tuning, and the crash handling. For one page on your laptop, Playwright is fine. For hundreds of apps across storefronts on a schedule, the managed path saves the parts that aren't about the data. Our Puppeteer vs Playwright vs Browserbeam comparison digs into the tradeoffs.

Saving Your Data

Once you have the rows, persist them. Pick the format that fits the next step.

JSON

import json

with open("app_store.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2, ensure_ascii=False)

CSV

import csv

# `reviews` is the list of dicts returned by the review extraction step
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "rating", "date", "author", "body"])
    writer.writeheader()
    writer.writerows(reviews)

SQLite

import sqlite3

conn = sqlite3.connect("apps.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS reviews (
        app TEXT, title TEXT, rating TEXT, date TEXT, author TEXT, body TEXT
    )
""")
# `reviews` is the list of dicts returned by the review extraction step
for r in reviews:
    conn.execute(
        "INSERT INTO reviews VALUES (?, ?, ?, ?, ?, ?)",
        # "spotify" labels which app these rows belong to; change it per app
        ("spotify", r["title"], r["rating"], r["date"], r["author"], r["body"]),
    )
conn.commit()
conn.close()
print(f"Stored {len(reviews)} reviews")

CSV drops straight into a spreadsheet or pandas. SQLite is better when you re-scrape over time and want to dedupe by author and date.
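
The table above has no uniqueness rule, so a second scrape would insert the same reviews again. One way to get the dedupe, sketched under the assumption that app, author, and date together identify a review, is a UNIQUE constraint plus INSERT OR IGNORE. The function name store_reviews is hypothetical, and a reviews table already created without the constraint would need to be rebuilt first.

```python
import sqlite3


def store_reviews(conn: sqlite3.Connection, app: str, reviews: list) -> int:
    """Insert reviews, skipping ones already stored. Returns the number of new rows."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS reviews (
            app TEXT, title TEXT, rating TEXT, date TEXT, author TEXT, body TEXT,
            UNIQUE (app, author, date)
        )
    """)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO reviews "
        "VALUES (:app, :title, :rating, :date, :author, :body)",
        [
            {"app": app, **{k: r.get(k) for k in ("title", "rating", "date", "author", "body")}}
            for r in reviews
        ],
    )
    conn.commit()
    # Ignored duplicates do not count as changes, so this is the number of new rows.
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
sample = [{"title": "Favorite app ever", "rating": "5 Stars", "date": "11/26/2024",
           "author": "l.lacx", "body": "I personally love this app..."}]
print(store_reviews(conn, "spotify", sample))  # 1
print(store_reviews(conn, "spotify", sample))  # 0
```

SQLite treats NULLs as distinct in a UNIQUE constraint, so rows with a missing date or author will not dedupe. Filter those out before storing.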

Common Mistakes When Scraping the App Store

1. Selecting on the hashed class names

Writing .container.svelte-j9szs8 feels natural because that's what you see in the inspector. It breaks on Apple's next deploy when the hash changes. Target semantic tags (h3, time, p) and ARIA attributes (aria-label, aria-labelledby) instead. They're there for accessibility and outlast any build.

2. Forgetting the :not(.is-detail-view) filter

The review section renders each card twice: the visible summary and a hidden detail-view clone. A bare [aria-labelledby^="review-"] returns every review twice with half the fields empty on the duplicate. Add :not(.is-detail-view) and the count is correct.

3. Using plain HTTP requests

requests.get() on an App Store URL returns an empty skeleton because the content is rendered by JavaScript. If your scraper comes back with no reviews and no rating, this is almost always why. You need a real browser.

4. Trusting the dead RSS feed

The RSS customer reviews endpoint returns an empty document now. Code built on it fails silently, looping over an entry array that never exists. Get review text from the rendered page instead.

5. Reaching for residential proxies first

App Store pages render fine through a datacenter proxy in our testing, and datacenter traffic is cheaper. Start there. Only switch to residential if you start seeing blocks at higher volume. Our residential vs datacenter proxy guide covers when the upgrade is worth it.

Frequently Asked Questions

How do I scrape App Store reviews?

Render the app page in a browser, then select review cards with [aria-labelledby^="review-"]:not(.is-detail-view). Each card holds the title (h3), star rating (ol.stars aria-label), date (time), author (.author), and body (p.content). The page shows the most-helpful reviews, not the full history.

Can I get App Store reviews from an API?

Not anymore in practice. Apple's iTunes Lookup and Search APIs return only metadata, never review text. The old RSS customer reviews feed now returns an empty document. Scraping the rendered page is the working way to collect review text as of 2026.

How do I find an app's App Store ID?

Open the app's App Store page and read the URL. The number after id is the App ID, like 324684580 in .../id324684580. You need it for both the page URL and the official APIs.

Why does my App Store scraper return an empty page?

The App Store web frontend is a JavaScript single-page app, so a plain HTTP request gets an empty shell. Use a headless browser or a browser API that renders the page before you read the DOM.

Do I need residential proxies for the App Store?

No, a datacenter proxy worked for every app page we tested. Residential is a fallback if you hit blocks at scale, not a starting requirement.

Is it legal to scrape the App Store?

Public App Store data like app names, prices, ratings, and public reviews is lower-risk to collect than anything behind a login. Respect Apple's Terms of Service and robots.txt, keep request rates polite, and never touch Apple ID account data. For commercial use, get legal advice for your specific case.

How many reviews can I scrape per app?

The app page surfaces roughly ten most-helpful reviews. To go deeper you'd page through the dedicated reviews view, but expect Apple to cap how far that goes. For full-archive needs, plan around that limit rather than assuming the whole history is reachable.

Can I scrape the Google Play Store the same way?

The approach carries over: render the page, target stable attributes, fall back to official endpoints for metadata. The DOM and APIs differ, so the selectors won't match one to one, but the create-then-read pattern is the same.

Start Building Your App Store Scraper

The App Store stopped being a static document and became a Svelte app, which is why the old scrapers return empty pages. The fix is the same every time: render the page, then read what stays stable. observe hands you the whole page as markdown. execute_js turns it into structured rows when you target semantic tags and ARIA attributes instead of the svelte-* hashes. Datacenter proxies are enough, and the iTunes Lookup API fills in exact metadata when you don't need review text.

Swap the Spotify URL for any app and the same two-call pattern works. Point it at a category, loop the App IDs, and you have a ratings tracker that refreshes on a schedule.

For the full API reference, see the Browserbeam docs. If you're scraping other rating-heavy sites, the IMDb scraping guide handles a React DOM with the same create-then-read approach, and the YouTube scraping guide covers another JavaScript-heavy target. What will you track first?
