Comparing Popular Web Scraping & Proxy APIs
Our report benchmarks nearly a dozen popular unblockers and web scraping APIs. These remote scrapers simplify web data collection by overcoming CAPTCHAs, JavaScript challenges, and other roadblocks erected by anti-bot systems.
Though growing in popularity every year, web scraping APIs have become especially relevant with the rise of AI models, the locking down of online platforms, and the commercialization of bot protection.
Our main goal is to see how well the APIs are able to unblock protected websites in late 2024 (for earlier reports refer to: comparison of web scraping APIs (2023), comparison of proxy APIs (2023)). We also take a look at their features and pricing strategies to get a well-rounded view of the market.
Summary
- Our list of participants included 11 API providers, which we tested on 10 protected websites at a rate of 10 requests per second.
- Five APIs managed to open all targets consistently, while the others failed to unblock between one and five websites. Oxylabs, Zyte, and Bright Data had the highest average success rate of around 98%.
- Zyte’s API worked extremely fast, managing to unblock all targets without headless browsers. Bright Data’s Web Unlocker, though second to last in speed, was the only API that didn’t fail a single run.
- G2 (Cloudflare) proved to be the hardest target by success rate, while the largest number of participants – five – failed to unblock Allegro (DataDome).
- Compared to proxy APIs (also called web unblockers), web scraping APIs have more features: asynchronous delivery, data parsing capabilities, and specialized endpoints. Some also include a proxy mode, making the distinction arbitrary.
- We’re seeing more providers release specialized endpoints for popular targets. In addition, several AI-based parsing approaches have appeared, ranging from models trained on page types to AI-generated parsing schemas.
- Credit-based pricing models are usually chosen by providers that service smaller customers; while extremely cheap for basic websites, they impose huge multipliers when accessing challenging targets.
- Out of the business-oriented providers, Zyte is very hard to beat on price for unblocking tasks. For targets that rely on JavaScript or need special functionality (e.g., localized Google or Amazon queries), Smartproxy and Oxylabs offer a compelling balance between performance and cost.
Participants
Our research includes 11 major providers of web scraping and proxy APIs (often called web unblockers). The tools technically form different product categories, but we decided not to separate them. Both tend to use the same tech, and web scraping APIs sometimes have a proxy mode as one of the integration formats, which further blurs the distinction.
Most participants are well known in the industry, though not necessarily for their scraping infrastructure. Here’s the full list:
Participant | Tested products | Target audience
---|---|---
Bright Data | Web Unlocker, SERP API | Companies & enterprise |
Infatica | Web, SERP, E-Commerce APIs | Individuals & small businesses |
NetNut | Web Unblocker | Companies & enterprise |
Nimble | Web, SERP, E-Commerce APIs | Companies & enterprise |
Oxylabs | Web Scraper API | Companies & enterprise |
Rayobyte | Scraping Robot | Individuals & small businesses |
ScraperAPI | Scraping API | Individuals & small businesses |
Scrapingdog | Web Scraping API | Individuals & small businesses |
Smartproxy | Web, SERP, E-Comm, Social APIs | Small to medium businesses |
SOAX | Web Unblocker | Companies & enterprise |
Zyte | Zyte API | Individuals to enterprise |
Methodology
We gave all participants the methodology doc in advance. Some actively monitored our progress, making adjustments to their scrapers on the fly. This is fine, as web scraping is a dynamic process, and it should be approached as such. Hopefully, we also helped to improve the success rate for actual customers.
We chose 10 targets based on their popularity and bot protection system. Our goal was to try the scrapers with all major anti-bot vendors.
Target | Bot protection
---|---
Allegro (products) | DataDome |
Amazon (products) | In-house |
Canadagoose (products) | Kasada |
G2 (product reviews) | Cloudflare |
Google (SERPs) | In-house |
Indeed (location directories) | Shape |
Instagram (HTML profiles) | In-house |
Lowe’s (products) | Akamai |
Safeway (products) | Imperva |
Walmart (products) | PerimeterX |
There are some caveats to consider:
- Anti-bot systems may have different levels of protection based on the website (or even categories of the same website).
- Some bot protection vendors focus on securing sensitive endpoints (such as internal APIs or login pages), so they may not show up in full force against simple collection of public content.
We ran at least three tests for each target throughout several weeks. We fetched ~6,000 unique URLs, navigating directly to the page. The rate was 10 requests per second, with a timeout of 600 seconds. This is enough to trigger bot protection systems – and, as we’ll see, seriously tax some of the scrapers.
We used a custom Python script – its function was to simply send the request to the scraper and receive the response, measuring the time it took to reach us. Our server was located in the US.
Participants were free to suggest the optimal parameters for the targets, and some did. Otherwise, we used our own discretion, starting with the simplest configuration and enabling optional features (such as premium proxies and headless browsers) if we couldn’t unblock or load the valuable content.
We verified a request’s success by checking the response code and HTML size. The latter was necessary, as some websites (such as Safeway) tend to return 200-coded responses without data.
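For illustration, here's a minimal sketch of what such a script might look like – the endpoint, API key, and size threshold below are hypothetical placeholders, not any participant's actual API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical endpoint and key – each provider uses its own request format.
API_ENDPOINT = "https://api.example-scraper.com/v1/scrape"
API_KEY = "YOUR_API_KEY"
MIN_HTML_SIZE = 50_000  # bytes; guards against 200-coded responses without data


def fetch(url: str) -> dict:
    """Send one target URL through the scraping API, record timing and success."""
    started = time.monotonic()
    try:
        response = requests.post(
            API_ENDPOINT,
            json={"api_key": API_KEY, "url": url},
            timeout=600,  # the timeout we used in the benchmark
        )
        ok = response.status_code == 200 and len(response.content) >= MIN_HTML_SIZE
        return {"url": url, "ok": ok, "status": response.status_code,
                "seconds": time.monotonic() - started}
    except requests.RequestException as error:
        return {"url": url, "ok": False, "error": str(error),
                "seconds": time.monotonic() - started}


def run_benchmark(urls: list[str], workers: int = 300) -> list[dict]:
    """Fan URLs out across worker threads; pacing to exactly 10 req/s is omitted."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```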
Some providers imposed concurrency limits that could potentially affect our scraping rate:
- ScraperAPI’s largest public plan has a limit of 100 concurrent threads, which isn’t always enough for 10 requests/second: at that rate, responses averaging over 10 seconds already keep more than 100 requests in flight, especially when headless browsers are involved.
- Infatica and Scrapingdog have the same restriction.
- Zyte’s default rate limit is 500 requests per minute (~8.3/second).
ScraperAPI, Scrapingdog, and Zyte lifted their restrictions for us. Infatica wasn’t able to, forcing us to scrape most websites at 1 req/s. We also encountered SOAX’s internal limits and decided to stick with ~5 req/s for more complex targets.
Benchmark Results
The results show the best run each API had with each target. We’ll provide comments to give you more context where necessary.
Overall Performance
Provider | Avg. success rate
---|---
Oxylabs | 98.50%
Zyte | 98.38%
Bright Data | 97.90%
Smartproxy | 96.29%
Nimble | 95.48%
NetNut | 80.82%
SOAX | 69.30%
ScraperAPI | 67.72%
Scrapingdog | 43.84%
Infatica | 38.40%
Rayobyte | 37.65%
Provider | Avg. response time
---|---
Zyte | 6.61 s
NetNut | 9.71 s
Smartproxy | 10.91 s
Scrapingdog | 10.92 s
Nimble | 13.01 s
SOAX | 13.41 s
Oxylabs | 13.45 s
ScraperAPI | 15.39 s
Infatica | 17.15 s
Bright Data | 22.08 s
Rayobyte | 26.24 s
Five providers managed to open all targets more or less consistently, which is an excellent result. Oxylabs and Zyte stand out for their overall success rate; and though the averages don’t show it, Bright Data was amazingly dependable, never failing a single test.
The rest had at least one target that gave them trouble. But you shouldn’t discount these APIs based on a single aggregate number: NetNut, for example, performed flawlessly on most websites aside from Lowe’s and Safeway.
In terms of response time, Zyte’s API was a speed demon, beating others by up to four times. The provider made some adjustments during the process, and by the end, it could somehow open all targets without requiring JavaScript rendering.
Bright Data obviously prioritized unblocking success, and so it performed slower than expected. We believe that a better way to scale its APIs would be through more parallel requests, which our testing parameters didn’t fully exploit.
Hardest Targets to Unblock
We omitted Infatica’s and SOAX’s results from the success rate column, as they were tested under lower rate limits (one and five requests per second, respectively).
Target | Avg. success rate | Participants with over 80% failure rate
---|---|---
G2 | 60.39% | 4 |
Lowe’s | 67.17% | 3 |
Allegro | 68.32% | 5 |
Safeway | 68.93% | 4 |
Canadagoose | 75.73% | 4 |
Indeed | 81.40% | 1 |
Instagram | 89.82% | 0
Google | 93.77% | 0
Walmart | 94.54% | 1 |
Amazon | 96.12% | 0 |
G2 (Cloudflare) proved the hardest to unblock judging by the average success rate across providers. However, it was Allegro that the largest number of participants failed to open consistently – its average is lifted by the providers that did unblock it nearly perfectly.
On the other hand, most APIs managed to open Google and Amazon nearly perfectly. As major web scraping targets, they’re the baseline for any commercial data collection service.
Breakdown by Individual Target
Allegro (DataDome)

Provider | Success rate | Response time
---|---|---
Oxylabs | 100% | 1.96 s |
Smartproxy | 100% | 2.38 s |
Bright Data | 99.90% | 3.68 s |
Nimble | 99.80% | 12.06 s |
NetNut | 99.62% | 6.18 s |
Zyte | 99.13% | 4.80 s |
Scrapingdog | 7.88% | 10.85 s
ScraperAPI | 7.01% | 26.68 s |
SOAX | 2.12% | 6.30 s |
Rayobyte | 1.54% | 7.65 s |
Infatica | Failed to unblock |
As one participant put it, Allegro is very hard to unblock. The website uses DataDome and is even featured as a success case on the anti-bot vendor’s website.
In reality, we saw one of two extremes: an API either opened Allegro nearly perfectly, or it failed completely. The same pattern repeated across our tests. All in all, Oxylabs and Smartproxy performed particularly well here.
Amazon (in-house)

Provider | Success rate | Response time
---|---|---
ScraperAPI | 100% | 3.79 s |
Oxylabs | 100% | 5.08 s |
Bright Data | 99.85% | 5.88 s |
Smartproxy | 99.83% | 5.05 s |
Nimble | 99.82% | 6.39 s |
Zyte | 99.80% | 3.26 s |
NetNut | 99.73% | 6.21 s |
SOAX | 99.67% | 12.11 s |
Infatica | 94.66% | 8.85 s |
Rayobyte | 87.86% | 12.93 s |
Scrapingdog | 78.23% | 13.97 s
Amazon is the website for web scraping, so unblocking it is a must for any self-respecting service. As such, Amazon proved to be the least problematic target.
We saw consistent results with little deviation across runs. Though the difference was minimal, ScraperAPI had the best showing.
Canadagoose (Kasada)

Provider | Success rate | Response time
---|---|---
NetNut | 99.90% | 7.01 s |
Zyte | 99.88% | 15.26 s |
ScraperAPI | 99.79% | 3.58 s |
Bright Data | 99.60% | 4.45 s |
Oxylabs | 98.87% | 4.09 s |
Nimble | 90.73% | 11.85 s |
Smartproxy | 79.88% | 6.22 s |
Rayobyte | 12.95% | 56.83 s |
SOAX | Failed to unblock | |
Infatica | Failed to unblock | |
Scrapingdog | Failed to unblock |
The Canada Goose store lists only a few hundred products, but it’s visited by hundreds of thousands of people every month. The website uses Kasada, which proved a hard nut to crack: the participants either had a bypass method, or they failed to unblock this target at all.
Like in The Web Scraping Club’s benchmark, NetNut’s unblocker had the best success rate, though it wasn’t the fastest. Some APIs had significant variance between runs: Nimble, ScraperAPI, and Smartproxy failed several tests and then fixed their scrapers for others.
G2 (Cloudflare)

Provider | Success rate | Response time
---|---|---
NetNut | 99.80% | 4.79 s |
SOAX | 99.38% | 13.75 s |
Bright Data | 91.74% | 26.80 s |
Zyte | 90.12% | 6.71 s |
Oxylabs | 87.35% | 27.45 s |
Smartproxy | 83.95% | 6.92 s |
Nimble | 69.11% | 39.23 s |
Scrapingdog | 19.80% | 3.33 s
ScraperAPI | 1.36% | 22.29 s |
Rayobyte | 0.32% | 94.20 s |
Infatica | Failed to unblock |
G2, a major company review website, is protected by Cloudflare. We found it to be the most challenging target, giving even the most solid APIs a run for their money.
Again, NetNut showed the best performance, both in success rate and response time. As with Canada Goose, the results weren’t always consistent between runs for more than one participant.
Google (in-house)

Provider | Success rate | Response time
---|---|---
Zyte | 100% | 0.81 s |
NetNut | 100% | 2.10 s |
Nimble | 100% | 3.24 s |
Smartproxy | 100% | 5.37 s |
Oxylabs | 99.98% | 4.79 s |
Scrapingdog | 99.97% | 2.93 s
Bright Data | 99.86% | 10.12 s |
Infatica | 95.07% | 2.44 s |
SOAX | 94.13% | 8.70 s |
Rayobyte | 92.20% | 4.49 s |
ScraperAPI | 51.93% | 5.83 s |
Google is a must for any web scraping API. The search engine is protected by the infamous reCAPTCHA, which quickly rate limits suspicious visitors. Still, it proved no challenge for all but one API.
We’re particularly impressed with Zyte’s performance. Zyte API not only achieved a perfect success rate, but it returned requests in under one second – much faster than the rest.
Indeed (Shape)

Provider | Success rate | Response time
---|---|---
NetNut | 100% | 2.52 s |
Smartproxy | 100% | 3.38 s |
Bright Data | 100% | 4.67 s |
Oxylabs | 99.88% | 3.69 s |
Infatica | 99.84% | 3.12 s |
Nimble | 99.76% | 10.80 s |
Zyte | 99.53% | 10.85 s |
SOAX | 98.92% | 12.84 s |
ScraperAPI | 98.80% | 5.02 s |
Scrapingdog | 25.46% | 20.03 s |
Rayobyte | 9.19% | 21.51 s |
Contrary to our expectations, Indeed wasn’t a hard target for the scrapers. The website employs Shape, a notoriously hard anti-bot system, but we either failed to trigger it or Indeed is using a lenient configuration.
In any case, at least five providers had excellent results, making it hard to single out a winner. The outcomes were similar throughout all runs, with the exception of ScraperAPI.
Instagram (in-house)

Provider | Success rate | Response time
---|---|---
Nimble | 99.97% | 7.01 s |
SOAX | 99.73% | 8.96 s |
Oxylabs | 99.55% | 27.46 s |
Smartproxy | 99.48% | 23.46 s |
Zyte | 99.13% | 2.63 s |
Bright Data | 96.61% | 55.04 s |
NetNut | 96.21% | 25.31 s |
Infatica | 93.04% | 20.40 s |
ScraperAPI | 79.33% | 21.90 s |
Scrapingdog | 75.36% | 8.83 s
Rayobyte | 62.75% | 13.63 s |
Instagram is another major source of web data, though TikTok has probably started challenging it in popularity by now. The social media network uses its own bot protection system that redirects suspicious users to a login page. In our tests, however, Instagram didn’t cause big issues for most participants.
Overall, Nimble’s results look the best on paper. It’s also interesting that Zyte adjusted its scraper in the process, and our last test ran successfully without JavaScript rendering enabled. As a result, Zyte’s response time is mighty impressive.
Lowe’s (Akamai)

Provider | Success rate | Response time
---|---|---
Zyte | 100% | 17.78 s |
Smartproxy | 99.98% | 24.20 s |
SOAX | 99.83% | 14.16 s |
Nimble | 99.81% | 18.56 s |
Oxylabs | 99.75% | 29.58 s |
Bright Data | 99.14% | 75.61 s |
ScraperAPI | 63.13% | 34.45 s |
NetNut | 27.00% | 16.09 s |
Scrapingdog | 9.90% | 23.31 s
Rayobyte | 5.79% | 39.36 s |
Infatica | 1.40% | 50.93 s |
Lowe’s is a decently popular target that employs Akamai’s bot protection system. It brought down a third of the participants, including NetNut, which came out strong against other anti-bots.
On the other hand, six APIs succeeded over 99% of the time, which is a great result.
Safeway (Imperva)

Provider | Success rate | Response time
---|---|---
Zyte | 100% | 1.65 s |
Smartproxy | 99.81% | 28.36 s |
Oxylabs | 99.69% | 27.57 s |
Nimble | 95.95% | 9.82 s |
Bright Data | 92.33% | 29.33 s |
ScraperAPI | 75.84% | 25.32 s |
Scrapingdog | 50.07% | 2.61 s
Rayobyte | 6.61% | 2.09 s |
NetNut | 0.05% | 11.55 s |
SOAX | 0.04% | 27.31 s |
Infatica | Failed to unblock |
Safeway, the U.S. supermarket chain, is protected by Imperva and imposes aggressive geo-restrictions outside North America. The website isn’t a very popular target, so most participants found it tricky, requiring several runs to adjust.
All in all, Zyte’s performance looks amazing on paper, but it was Bright Data that ensured consistent results throughout all tests.
Walmart (PerimeterX)

Provider | Success rate | Response time
---|---|---
Smartproxy | 99.98% | 3.80 s |
ScraperAPI | 99.98% | 5.04 s |
Bright Data | 99.98% | 5.20 s |
Oxylabs | 99.88% | 2.84 s |
Nimble | 99.88% | 11.12 s |
SOAX | 99.25% | 16.58 s |
Rayobyte | 97.32% | 9.68 s |
Zyte | 96.22% | 2.31 s |
NetNut | 85.91% | 15.68 s |
Scrapingdog | 71.70% | 12.46 s
Infatica | Failed to unblock |
Though probably eclipsed by Amazon, Walmart is a major e-commerce data source. It tends to juggle anti-bot systems but is generally associated with PerimeterX.
Most participants didn’t find Walmart problematic. However, we did see Nimble’s and Rayobyte’s (Scraping Robot’s) success rates crash after PerimeterX’s update in late August.
Other Observations
- When accessing protected targets, commercial APIs can be brittle. Less popular websites may need individual attention even if the provider has a general bypass for the underlying bot protection system. Even popular targets like Walmart or G2 may temporarily break after major updates.
- Providers use different approaches for unblocking the same websites. Nimble relies on what it calls browserless drivers, which render JavaScript without invoking traditional headless browsers, and it leaned on them heavily throughout our tests. Zyte, on the other hand, was able to access all targets without browser-rendered HTML at all by the end of our tests.
- There’s a big difference between running tests at one request per second and at ten or more. At a lower rate, we wouldn’t have discovered that some providers have scaling issues; moreover, some websites don’t start seriously blocking until five requests per second or more.
Feature Overview
Let’s take a quick look at what you can do with web scraping and proxy APIs.
Proxy vs API Integration
The question of integration method is often decided before purchase: if your codebase is built around proxies, you’ll naturally gravitate towards the proxy format. But is there a real difference between the features the API and proxy integration methods offer? In a way, yes.
Feature | Proxy APIs (unblockers) | Web scraping APIs
---|---|---
Data delivery | Real-time | Real-time or on-demand, sometimes with batching & cloud storage |
Geo-location selection | Often country-wide, sometimes up to city & ASN | Usually at the country level |
Sessions | ✅ | ✅ |
Custom headers & cookies | ✅ | ✅ |
JavaScript rendering | A toggle | A toggle with optional instructions for scrolling, waiting, and more |
Specialized endpoints | Usually unavailable | For popular websites with tailored parameters (e.g., ASIN entry, ZIP selection for Amazon) |
Data parsing | Usually unavailable | Through specialized endpoints, manual selectors, or lately LLMs |
Output formats | HTML | HTML, JSON, sometimes CSV |
Proxy APIs:
Provider | Integration | Geolocation | Sessions | Custom headers | JS rendering | Specialized endpoints | Data parsing |
---|---|---|---|---|---|---|---|
Bright Data | Proxy, async API | 150+ countries with city & ASN targeting | ✅ | ✅ | Automated, toggle | Search engines | Specialized endpoints |
NetNut | Proxy | 150+ countries | ✅ | ✅ | Toggle | ❌ | ❌ |
Web scraping APIs:
Provider | Integration | Geolocation | Sessions | Custom headers | JS rendering | Specialized endpoints | Data parsing |
---|---|---|---|---|---|---|---|
Infatica | Real-time, async API | 150+ countries | ✅ | ✅ | Toggle | Search, e-commerce | Specialized endpoints |
Nimble | Real-time, async API (with batching, cloud storage) | 150+ countries with state & city targeting | ✅ | ✅ | Toggle, instructions | Search, e-commerce, social media | Manual, autoparser, special endpoints |
Oxylabs | Real-time, async API (with batching, cloud storage), proxy | 150+ countries with ZIP for Amazon, city & coordinates for Google | ✅ | ✅ | Toggle, instructions | Search, e-commerce | Manual, special endpoints, parser builder |
Rayobyte | Real-time, async API (with batching) | 150+ countries | ✅ | ✅ | Toggle, instructions | Search, e-commerce | Manual, special endpoints |
ScraperAPI | Real-time, async API (with batching), proxy | 12 countries with 50+ upon request, ZIP code for Amazon | ✅ | ✅ | Toggle, instructions | Search, e-commerce | Manual, special endpoints |
Scrapingdog | Real-time, async API, proxy | 15 countries | ✅ | ✅ | Toggle, instructions | Search, e-commerce, social media, more | Special endpoints
Smartproxy | Real-time, async API (with batching), proxy | 150+ countries with ZIP for Amazon, city & coordinates for Google | ✅ | ✅ | Toggle, instructions | Search, e-commerce, social media | Manual, special endpoints |
SOAX | Real-time | 150+ countries | ❌ | Cookies | Toggle | Search, e-commerce, social media | Special endpoints |
Zyte | Real-time API, proxy | 150+ countries | ✅ | ✅ | Toggle, instructions, scripting | ❌ | Manual, category based |
Proxy APIs are often meant as a direct upsell from proxy servers, offering a drop-in replacement. At the same time, because you’re effectively outsourcing the page-opening stage, proxy APIs need to go beyond regular proxy network features like geo-location to cover request manipulation and even JavaScript rendering. So they do.
Despite their wealth of features, proxy APIs can still be limited. For example, they rarely offer specialized endpoints, on-demand access to scraped output, or data structuring features. Another big drawback for complex scenarios is incompatibility with headless browser libraries, combined with no browser instructions of their own. This is where web scraping APIs offer more flexibility.
Exceptions exist. Bright Data’s SERP API integrates as a proxy, but in reality it’s a highly specialized scraper with data parsing and custom parameters. Funnily enough, some providers that sell web unblockers also offer web scraping APIs with a fully-featured proxy mode. In these scenarios, the difference hinges on the pricing method and, likely, marketing strategy.
How do you work with proxy and web scraping APIs? The main requirement is simply sending an HTTP request to the provider’s server. However, the way you configure that request can differ: it’s usually either a GET request with parameters in the URL or headers, or a POST request with a JSON payload.
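As a hedged illustration – the gateway address, endpoint, and parameter names below are invented, not any particular provider’s – the two integration styles look roughly like this:

```python
import requests

TARGET = "https://example.com/product/123"

# 1) Proxy mode: route the request through the provider's gateway like an
#    ordinary proxy. Many unblockers re-encrypt traffic, so you typically
#    disable certificate verification or install the provider's CA certificate.
proxies = {"https": "http://USERNAME:PASSWORD@unblock.example-provider.com:8000"}
html = requests.get(TARGET, proxies=proxies, verify=False, timeout=120).text

# 2) API mode: send the target URL and options to the provider's endpoint,
#    either as GET parameters or as a JSON payload in a POST request.
payload = {
    "url": TARGET,
    "render_js": True,   # hypothetical headless browser toggle
    "country": "us",     # hypothetical geo-targeting parameter
}
response = requests.post(
    "https://api.example-provider.com/v1/scrape",
    json=payload,
    auth=("USERNAME", "PASSWORD"),
    timeout=120,
)
html = response.text
```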
Exploring Individual Features
All modern proxy and web scraping APIs can render JavaScript. With pages becoming increasingly interactive, a question that arises more often every year is: what else can I do on the page? Proxy APIs tend to ignore it; web scraping APIs answer by exposing browser controls through special parameters.
Provider | Screenshot | Click | Input | Scroll | Wait
---|---|---|---|---|---
Nimble | ✅ | ✅ | ✅ | ✅ | ✅ |
Oxylabs | ✅ | ✅ | ✅ | ✅ | ✅ |
Rayobyte | ✅ | ✅ | ✅ | ✅ | ✅ |
ScraperAPI | ❌ | ✅ | ✅ | ✅ | ✅ |
Scrapingdog | ✅ | ❌ | ❌ | ❌ | ✅ |
Smartproxy | ✅ | ✅ | ✅ | ✅ | ✅ |
SOAX | ✅ | ❌ | ❌ | ❌ | ✅ |
Zyte | ✅ | ✅ | ✅ | ✅ | ✅ |
Bright Data, Infatica, NetNut – only basic rendering functionality available.
It’s possible to combine the instructions. For example, you can select a field, enter text, click on it, and wait for the response. Providers impose execution time limits, which often range between 60 and 120 seconds.
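To make the idea concrete, here’s what a chained set of instructions might look like in a request payload – the action and parameter names are illustrative only, as every provider defines its own:

```python
# Hypothetical JSON payload chaining browser actions on a rendered page.
payload = {
    "url": "https://example.com/search",
    "render_js": True,
    "browser_instructions": [
        {"action": "input", "selector": "#search-box", "value": "winter jacket"},
        {"action": "click", "selector": "#search-button"},
        {"action": "wait", "seconds": 5},        # let the results load
        {"action": "scroll", "to": "bottom"},    # trigger lazy-loaded items
        {"action": "screenshot"},                # capture the final state
    ],
}
```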
Zyte takes this a step further. Its clients get access to a cloud-hosted VS Code environment, where they can write their own interaction scripts.
The latter functionality isn’t common. Instead, we’re seeing new product categories emerge that aim to increase web scraping success while keeping standard compatibility with headless browser libraries. Some examples would be Undetect, Bright Data’s Scraping Browser, and major anti-detect browsers like Multilogin and Gologin.
Specialized endpoints are tailor-made for particular websites or their properties (such as Amazon product pages). They often have custom parameters and data parsing capabilities. For instance, a Google SERP endpoint may be able to fetch local results (city-wide or at specific coordinates), which would otherwise be unavailable through a general-purpose API.
Provider | Google | Amazon | Others
---|---|---|---
Bright Data | SERP, ads, search types, local search | ❌ (available in other products) | Bing, Yandex, DDG |
Infatica | SERP, ads | Search, product | Booking |
Nimble | SERP, ad optimization, local search | Search, product (incl. ZIP code) | Bing, Yandex, adding more fast |
Oxylabs | SERP, ads, search types, hyperlocal search | Product, search, sellers, reviews, more (incl. ZIP code) | Walmart, Bing, Etsy, BestBuy, Target |
Rayobyte | SERP | Product | ❌ |
ScraperAPI | SERP, several search types | Product, search, offers, reviews (incl. ZIP) | Walmart |
Scrapingdog | SERP, search types | Product, search (incl. ZIP) | LinkedIn, Twitter, Yelp, Indeed
Smartproxy | SERP, search types, hyperlocal search | Search, product, sellers, reviews, more (incl. ZIP) | ❌ |
SOAX | SERP, search types | Search, product, reviews, questions | Walmart, all major search engines & social media platforms |
NetNut, Zyte – no specialized endpoints available for the tested products.
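To illustrate why such endpoints are convenient (the field names here are hypothetical, not any vendor’s actual schema), a localized SERP request might hand geography, device, and parsing off to the provider in a single payload:

```python
# Hypothetical payload for a specialized Google SERP endpoint.
payload = {
    "endpoint": "google_search",   # which specialized scraper to use
    "query": "running shoes",
    "domain": "google.de",         # localized Google property
    "geo": "Berlin, Germany",      # city-level result localization
    "device": "mobile",
    "parse": True,                 # return structured JSON instead of raw HTML
}
```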
Compared to the year before, we’re seeing an interesting trend: scraper vendors have been introducing more specialized endpoints to their products. One example is ScraperAPI, which now offers scrapers for Amazon, Google, and Walmart. Another is Nimbleway – the provider has set out to build what it calls online pipelines for targets in various verticals.
The direction is interesting, considering that LLMs have lowered the barrier to entry, particularly for parsing, and that they tempt the market to consolidate around one all-encompassing tool. Maybe a single-purpose scraper reassures customers that it will be fit for the task?
Data parsing is an area where some of the most exciting developments are taking place. Of course, this is thanks to machine learning and large language models. But we’re also seeing changes in less sophisticated approaches: since our previous report, Oxylabs, ScraperAPI, and Smartproxy have all implemented selector support for building parsers by hand.
Provider | Manual parsing | Pre-made templates | Other
---|---|---|---
Bright Data | ❌ | Specialized endpoints | |
Infatica | ❌ | Specialized endpoints | |
Nimble | Selectors | Specialized endpoints | Autoparsing, AI parser schemas |
Oxylabs | Selectors | Specialized endpoints | AI parser schemas |
Rayobyte | Selectors | Specialized endpoints | |
ScraperAPI | Selectors | Specialized endpoints | |
Scrapingdog | ❌ | Specialized endpoints | |
Smartproxy | Selectors | Specialized endpoints | |
SOAX | ❌ | Specialized endpoints | |
Zyte | Selectors | Models trained on page types |
NetNut – no data parsing available for the tested products.
Let’s explore several different approaches to AI-based parsing that lie in the modest Other column.
#1. Custom machine learning models trained on specific page types.
Zyte has been playing around with machine learning for years now. Instead of parsing individual targets, Zyte trained multiple in-house models for whole page categories: products, news, directories, etc. The caveat was that they relied on AI vision, which required browsers. Still, during its conference roughly a year ago, Zyte bragged about being dozens of times cheaper and more accurate than ChatGPT.
Since then, Zyte has adapted the models to non-rendered requests, significantly cutting down the cost. It’s also experimenting with supplementary LLM features. They can make the schema more flexible by adding custom data points, and they can also transform data: translate, normalize, summarize, etc.
#2. A universal AI parser.
Similarly to Zyte, Nimble uses HTML-trained AI agents to extract data from various page types. Unlike Zyte, the provider automatically chooses the relevant agent depending on the page, keeping the decision process in the backend.
In a way, this makes the customer’s job easier. But it’s also way less predictable (Will this target work? What will the schema be?). During our tests, we found the functionality to be more miss than hit: it parsed Lowe’s but failed to structure Canadagoose or G2. We’re sure it’s bound to improve fast.
To make the agents more robust, Nimble is preparing to release the ability to generate custom schemas. This feature will accept simple, likely natural language instructions and translate them into parsers. According to Nimble’s documentation, these parsers will get reusable IDs and heal automatically after identifying a failure.
For now, Nimble’s stopgap solution combines the dynamic parser with manual selectors to build a parser for the page.
#3. LLM-assisted parsers generated upon request.
This is the approach Oxylabs announced during its recent web scraping conference. Basically, you send a URL with natural language instructions to an LLM, then it generates a schema and selectors for scraping the data points. You get a preview of the output and the ability to adjust the schema to your needs. Once you’re happy, the selectors get added to the API request code.
Oxylabs’ approach is highly pragmatic, as the language model is invoked only once and not with every page access. However, it has limitations, namely that once a parser breaks, you have to manually repeat the generation process.
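A minimal sketch of the general pattern – not Oxylabs’ actual implementation, and the selectors are made up: the language model is called once to produce a schema of selectors, which you review and then reuse on every scraped page without further LLM calls.

```python
from bs4 import BeautifulSoup

# Selectors generated once by an LLM from a sample page and a prompt such as
# "extract the product title, price, and rating", then reviewed and stored.
GENERATED_SCHEMA = {
    "title": "h1.product-title",
    "price": "span.price-value",
    "rating": "div.rating > span.score",
}


def parse_product(html: str) -> dict:
    """Apply the pre-generated selectors to a freshly scraped page."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for field, selector in GENERATED_SCHEMA.items():
        node = soup.select_one(selector)
        result[field] = node.get_text(strip=True) if node else None
    return result
```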
Pricing Approaches
We’ll overview the pricing models of the participants and how much our benchmarks would’ve cost.
Request, Credit-Based and Black Box Models
There are multiple ways to price proxy and web scraping APIs. Proxy APIs use traffic or requests as the main metric. Web scraping APIs charge for successful requests, either keeping the model simple (one page = one request) or building increasingly elaborate schemes based on credits.
Zyte’s model is closer to credits, for it includes variables that affect the final rate. But it’s also unique because the cost may change with time, depending on how hard Zyte finds the target to scrape. According to the provider, these revisions take place once per quarter and affect around 0.1% of websites. Still, such a pricing scheme works as a kind of black box.
Provider | Model | Structure | Price range | Trial
---|---|---|---|---
Bright Data | Requests | PAYG, subscription | $1-$2,000 | 7 days for companies |
Infatica | Credits | Subscription | $25-$240 | 5k req, 7 days |
NetNut | Requests | Subscription | Not public | 7 days for companies |
Nimble | Requests | PAYG, subscription | $3-$3,000 | Available |
Oxylabs | Requests | Subscription | $49-$2,000 | 5k req, 7 days |
Rayobyte | Requests | PAYG | $1.8 | 5k free req / month |
ScraperAPI | Credits | Subscription | $49-$299 | 1k free credits / month, 7-day trial |
Scrapingdog | Credits | Subscription | $40-$200 | 1k credits, 30 days
Smartproxy | Requests | Subscription | $30-$500 | 1k req, 7 days |
SOAX | Requests | Subscription | $2.5-$2,200 | Available |
Zyte | Dynamic | PAYG, subscription | $1-not specified | $5 credits for 30 days |
The table provides some interesting data points:
- Scraper vendors prefer trials over pay-as-you-go. Only Rayobyte’s Scraping Robot has PAYG as its sole pricing model, and Zyte starts requiring commitment after $100. Furthermore, some trials extend into free plans that refresh monthly.
- Credit-based pricing usually targets customers with smaller needs. This is evident from looking at the price ranges of public plans.
Price Modifiers
To understand how exactly request-based and credit-based pricing models compare, we’ll have to explore the base price and available modifiers.
The Base CPM at $100 column shows how much 1,000 requests would cost when spending $100 with each participant. It may be a little biased against enterprise-minded providers, as their prices only start to scale well at $1,000 and up.
Provider | Base CPM at $100 | Price modifiers
---|---|---
Bright Data | $3 | A list of premium websites (2x) |
Infatica | $0.09 | JS rendering (10x), E-comm & SERP (10x), JS + E-comm/SERP (20x), LinkedIn (130x) |
Oxylabs | $1.80 | – |
Nimble | $3 | – |
Rayobyte | $1.80 | – |
ScraperAPI | $0.49 | Amazon (5x), SERP (30x), Social (30x), JS rendering (10x), premium IPs (10x), premium IPs + JS (30x), ultra premium IPs (30x), ultra premium + JS (75x) |
Scrapingdog | $0.09 | Google (5x), JS rendering (5x), premium IPs (10x), premium + JS (25x), LinkedIn (200x)
Smartproxy | $1 | – |
SOAX | $2.50 | – |
Zyte | From $0.10 | Target (up to 10x), parsing (up to 3x), JS rendering (up to 15x), JS + parsing (up to 25x), screenshot (up to 25x) |
NetNut – no public pricing available.
Credit-based pricing models can have huge multipliers reaching tens or even hundreds of times. These variables interact with one another: for example, you can toggle both JavaScript rendering and better quality proxies. From the user standpoint, having these options exposed can feel burdensome, as you need to experiment with parameters and mind the credit cost.
Having said that, credits are very efficient for basic websites that require neither residential proxies nor JavaScript rendering. The low baseline price also makes these scrapers look really good in marketing materials. However, for hard targets like G2, you’re likely to overpay.
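To put the multipliers into perspective using the table above: at ScraperAPI’s $0.49 base CPM, 1,000 pages of a basic website cost about $0.49, but the same volume with ultra premium IPs and JavaScript rendering (75x) works out to roughly $36.75 – far above the flat $1–$3 per 1,000 requests that the request-based providers charge.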
Request-based models have the opposite problem: they’re comparatively expensive when the target puts up no resistance. But since these providers are often enterprise-oriented, different considerations kick in, such as scalability and unblocking success.
The Cost to Run Our Benchmarks
So, how much did we pay to complete the full 180,000 requests (10 targets × ~6,000 URLs × 3 runs)? The graph shows aggregate costs, taking the rate of the closest suitable plan.
Three participants use credit-based pricing and failed to unblock some targets consistently (at least 20% of the time). We didn’t want to speculate on the configuration they’d need, so we excluded the following targets from the graph:
- Infatica: Canadagoose, G2, Lowe’s, Safeway, Walmart.
- Scrapingdog: Allegro, Canadagoose, G2.
- ScraperAPI: Allegro, G2.
For general unblocking without JavaScript rendering or data parsing, Zyte delivered incredible value considering its performance results. The provider’s price was closer to entry-level APIs like Infatica and Scrapingdog than the premium competitors.
Smartproxy and Oxylabs also look compelling, more so if you need headless browsers or the bundled parsing features. And while ScraperAPI may not be the most efficient choice overall, its prices for Amazon and Walmart in particular are worth attention.
Conclusion
This concludes the report. Assuming that very few readers will reach this part, we moved the summary to the beginning. But since you’re here – thank you for getting to the end! If you have any questions, feel free to contact us through info at proxyway dot com or our Discord server.