
Comparing Popular Web Scraping & Proxy APIs

Our report benchmarks nearly a dozen popular unblockers and web scraping APIs. These remote scrapers simplify web data collection by overcoming CAPTCHAs, JavaScript challenges, and other roadblocks erected by anti-bot systems.

Though growing in popularity every year, web scraping APIs have become especially relevant with the rise of AI models, the closing down of online platforms, and the commercialization of bot protection.

Our main goal is to see how well the APIs are able to unblock protected websites in late 2024 (for earlier reports, refer to our comparison of web scraping APIs (2023) and comparison of proxy APIs (2023)). We also take a look at their features and pricing strategies to get a well-rounded view of the market.

Summary

  • Our list of participants included 11 API providers, which we tested on 10 protected websites at a rate of 10 requests per second.
  • Five APIs managed to open all targets consistently, while the others failed to unblock between one and five websites. Oxylabs, Zyte, and Bright Data had the highest average success rate of around 98%.
  • Zyte’s API worked extremely fast, managing to unblock all targets without headless browsers. Bright Data’s Web Unlocker, though second to last in speed, was the only API that didn’t fail a single run.
  • G2 (Cloudflare) proved to be the hardest target by success rate, while the largest number of participants – five – failed to unblock Allegro (DataDome).
  • Compared to proxy APIs (also called web unblockers), web scraping APIs have more features: asynchronous delivery, data parsing capabilities, and specialized endpoints. Some also include a proxy mode, making the distinction arbitrary.
  • We’re seeing more providers release specialized endpoints for popular targets. In addition, several AI-based parsing approaches have appeared, ranging from models trained on page types to AI-generated parsing schemas. 
  • Credit-based pricing models are usually chosen by providers that service smaller customers; while extremely cheap for basic websites, they impose huge multipliers when accessing challenging targets. 
  • Out of the business-oriented providers, Zyte is very hard to beat on price for unblocking tasks. For targets that rely on JavaScript or need special functionality (e.g. localized Google or Amazon queries), Smartproxy and Oxylabs offer a compelling balance between performance and cost.

Participants

Our research includes 11 major providers of web scraping and proxy APIs (often called web unblockers). The tools technically form different product categories, but we decided not to separate them. Both tend to use the same tech, and web scraping APIs sometimes have a proxy mode as one of the integration formats, which further blurs the distinction. 

Most participants are well known in the industry, though not necessarily for their scraping infrastructure. Here’s the full list:

| Participant | Tested products | Target audience |
|---|---|---|
| Bright Data | Web Unlocker, SERP API | Companies & enterprise |
| Infatica | Web, SERP, E-Commerce APIs | Individuals & small businesses |
| NetNut | Web Unblocker | Companies & enterprise |
| Nimble | Web, SERP, E-Commerce APIs | Companies & enterprise |
| Oxylabs | Web Scraper API | Companies & enterprise |
| Rayobyte | Scraping Robot | Individuals & small businesses |
| ScraperAPI | Scraping API | Individuals & small businesses |
| Scrapingdog | Web Scraping API | Individuals & small businesses |
| Smartproxy | Web, SERP, E-Comm, Social APIs | Small to medium businesses |
| SOAX | Web Unblocker | Companies & enterprise |
| Zyte | Zyte API | Individuals to enterprise |

Methodology

We gave all participants the methodology doc in advance. Some actively monitored our progress, making adjustments to their scrapers on the fly. This is fine, as web scraping is a dynamic process, and it should be approached as such. Hopefully, we also helped to improve the success rate for actual customers.

We chose 10 targets based on their popularity and bot protection system. Our goal was to try the scrapers with all major anti-bot vendors.

| Target | Bot protection |
|---|---|
| Allegro (products) | DataDome |
| Amazon (products) | In-house |
| Canadagoose (products) | Kasada |
| G2 (product reviews) | Cloudflare |
| Google (SERPs) | In-house |
| Indeed (location directories) | Shape |
| Instagram (HTML profiles) | In-house |
| Lowe’s (products) | Akamai |
| Safeway (products) | Imperva |
| Walmart (products) | PerimeterX |

There are some caveats to consider:

  • Anti-bot systems may have different levels of protection based on the website (or even categories of the same website).
  • Some bot protection vendors focus on securing sensitive endpoints (such as internal APIs or login pages), so they may not show up in full force against simple collection of public content.

We ran at least three tests for each target throughout several weeks. We fetched ~6,000 unique URLs, navigating directly to the page. The rate was 10 requests per second, with a timeout of 600 seconds. This is enough to trigger bot protection systems – and, as we’ll see, seriously tax some of the scrapers.

We used a custom Python script – its function was simply to send the request to the scraper, receive the response, and measure the time it took to reach us. Our server was located in the US.
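Stripped of logging and concurrency control, the core of such a harness can look like the sketch below. The endpoint, payload format, and key are placeholders, since every provider uses its own:

```python
import time

import requests

# Placeholder endpoint and key – every provider has its own URL and payload format.
API_URL = "https://api.example-provider.com/v1/scrape"
API_KEY = "YOUR_API_KEY"


def fetch(page_url: str) -> tuple[int, str, float]:
    """Send one URL to the scraping API; return status, body, and elapsed seconds."""
    started = time.perf_counter()
    response = requests.post(
        API_URL,
        json={"url": page_url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=600,  # the timeout we used in the benchmark
    )
    return response.status_code, response.text, time.perf_counter() - started
```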

Participants were free to suggest the optimal parameters for the targets, and some did. Otherwise, we used our own discretion, starting with the simplest configuration and enabling optional features (such as premium proxies and headless browsers) if we couldn’t unblock or load the valuable content.

We verified a request’s success by checking the response code and HTML size. The latter was necessary, as some websites (such as Safeway) tend to return 200-coded responses without data.
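In code, the check looked roughly like this – the byte threshold is illustrative, as the sensible cutoff depends on the target:

```python
MIN_HTML_BYTES = 20_000  # illustrative threshold; pages with real data are far larger than stubs


def is_successful(status_code: int, html: str) -> bool:
    # A 200 status alone isn't proof of success: sites like Safeway
    # sometimes return 200-coded pages with no product data in them.
    return status_code == 200 and len(html) >= MIN_HTML_BYTES
```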

Some providers imposed concurrency limits that could potentially affect our scraping rate:

  • ScraperAPI’s largest public plan has a limit of 100 concurrent threads, which isn’t enough for 10 requests/second, especially when headless browsers are involved.
  • Infatica and Scrapingdog have the same restriction.
  • Zyte’s default rate limit is 500 requests per minute (~8.3/second).


ScraperAPI, Scrapingdog, and Zyte lifted their restrictions for us. Infatica wasn’t able to, forcing us to scrape most websites at 1 req/s. We also encountered SOAX’s internal limits and decided to stick with ~5 req/s for more complex targets. 

Benchmark Results

The results show the best run each API had with each target. We’ll provide comments to give you more context where necessary. 

Overall Performance

Average success rate:

| Provider | Success rate |
|---|---|
| Oxylabs | 98.50% |
| Zyte | 98.38% |
| Bright Data | 97.90% |
| Smartproxy | 96.29% |
| Nimble | 95.48% |
| NetNut | 80.82% |
| SOAX | 69.30% |
| ScraperAPI | 67.72% |
| Scrapingdog | 43.84% |
| Infatica | 38.40% |
| Rayobyte | 37.65% |

Average response time:

| Provider | Response time |
|---|---|
| Zyte | 6.61 s |
| NetNut | 9.71 s |
| Smartproxy | 10.91 s |
| Scrapingdog | 10.92 s |
| Nimble | 13.01 s |
| SOAX | 13.41 s |
| Oxylabs | 13.45 s |
| ScraperAPI | 15.39 s |
| Infatica | 17.15 s |
| Bright Data | 22.08 s |
| Rayobyte | 26.24 s |

Five providers managed to open all targets more or less consistently, which is an excellent result. Oxylabs and Zyte stand out for their overall success rate; and though the averages don’t show it, Bright Data was amazingly dependable, never failing a single test.

The rest had at least one target that gave them trouble. But you shouldn’t discount these APIs based on a single aggregate number: NetNut, for example, performed flawlessly on most websites, with only Lowe’s and Safeway dragging its average down.

In terms of response time, Zyte’s API was a speed demon, beating others by up to four times. The provider made some adjustments during the process, and by the end, it could somehow open all targets without requiring JavaScript rendering. 

Bright Data obviously prioritized unblocking success, and so it performed slower than expected. We believe that a better way to scale its APIs would be through more parallel requests, which our testing parameters didn’t fully exploit. 

Hardest Targets to Unblock

We omitted Infatica’s and SOAX’s results from the success rate column, as they were tested under lower rate limits (one and five requests per second, respectively). 

| Target | Avg. success rate | Participants with over 80% failure rate |
|---|---|---|
| G2 | 60.39% | 4 |
| Lowe’s | 67.17% | 3 |
| Allegro | 68.32% | 5 |
| Safeway | 68.93% | 4 |
| Canadagoose | 75.73% | 4 |
| Indeed | 81.40% | 1 |
| Instagram | 89.82% | 0 |
| Google | 93.77% | 0 |
| Walmart | 94.54% | 1 |
| Amazon | 96.12% | 0 |

Looking at the average success rate across all providers, G2 (Cloudflare) proved the hardest to unblock. However, it was actually Allegro that the most participants failed to open consistently – its average is propped up by those that did succeed.

On the other hand, most APIs managed to open Google and Amazon nearly perfectly. As major web scraping targets, they’re the baseline for any commercial data collection service.

Breakdown by Individual Target

Allegro (DataDome)

| Provider | Success rate | Response time |
|---|---|---|
| Oxylabs | 100% | 1.96 s |
| Smartproxy | 100% | 2.38 s |
| Bright Data | 99.90% | 3.68 s |
| Nimble | 99.80% | 12.06 s |
| NetNut | 99.62% | 6.18 s |
| Zyte | 99.13% | 4.80 s |
| Scrapingdog | 7.88% | 10.85 s |
| ScraperAPI | 7.01% | 26.68 s |
| SOAX | 2.12% | 6.30 s |
| Rayobyte | 1.54% | 7.65 s |
| Infatica | Failed to unblock | |

As one participant exclaimed, Allegro is very hard to unblock. The website uses DataDome and is even featured as a success case on the anti-bot vendor’s website.

In reality, we saw one of two extremes: an API either opened Allegro perfectly, or it failed completely. The same story repeated consistently across our tests. All in all, Oxylabs and Smartproxy performed particularly well here.

Amazon (in-house)

| Provider | Success rate | Response time |
|---|---|---|
| ScraperAPI | 100% | 3.79 s |
| Oxylabs | 100% | 5.08 s |
| Bright Data | 99.85% | 5.88 s |
| Smartproxy | 99.83% | 5.05 s |
| Nimble | 99.82% | 6.39 s |
| Zyte | 99.80% | 3.26 s |
| NetNut | 99.73% | 6.21 s |
| SOAX | 99.67% | 12.11 s |
| Infatica | 94.66% | 8.85 s |
| Rayobyte | 87.86% | 12.93 s |
| Scrapingdog | 78.23% | 13.97 s |

Amazon is the quintessential web scraping target, so unblocking it is a must for any self-respecting service. Fittingly, Amazon proved to be the least problematic website.

We saw consistent results with little deviation across runs. Though the difference was minimal, ScraperAPI had the best showing.

Canadagoose (Kasada)

| Provider | Success rate | Response time |
|---|---|---|
| NetNut | 99.90% | 7.01 s |
| Zyte | 99.88% | 15.26 s |
| ScraperAPI | 99.79% | 3.58 s |
| Bright Data | 99.60% | 4.45 s |
| Oxylabs | 98.87% | 4.09 s |
| Nimble | 90.73% | 11.85 s |
| Smartproxy | 79.88% | 6.22 s |
| Rayobyte | 12.95% | 56.83 s |
| SOAX | Failed to unblock | |
| Infatica | Failed to unblock | |
| Scrapingdog | Failed to unblock | |

The Canada Goose store carries only a few hundred products, but it’s visited by hundreds of thousands of people every month. The website uses Kasada, which proved a hard nut to crack: participants either had a bypass method, or they failed to unblock this target.

Like in The Web Scraping Club’s benchmark, NetNut’s unblocker had the best success rate, though it wasn’t the fastest. Some APIs had significant variance between runs: Nimble, ScraperAPI, and Smartproxy failed several tests and then fixed their scrapers for others.

G2 (Cloudflare)

| Provider | Success rate | Response time |
|---|---|---|
| NetNut | 99.80% | 4.79 s |
| SOAX | 99.38% | 13.75 s |
| Bright Data | 91.74% | 26.80 s |
| Zyte | 90.12% | 6.71 s |
| Oxylabs | 87.35% | 27.45 s |
| Smartproxy | 83.95% | 6.92 s |
| Nimble | 69.11% | 39.23 s |
| Scrapingdog | 19.80% | 3.33 s |
| ScraperAPI | 1.36% | 22.29 s |
| Rayobyte | 0.32% | 94.20 s |
| Infatica | Failed to unblock | |

G2, a major company review website, is protected by Cloudflare. We found it to be the most challenging target, giving even the most solid APIs a run for their money.

Again, NetNut showed the best performance, both in success rate and response time. As with Canada Goose, the results weren’t always consistent between runs for more than one participant. 

Google (in-house)

| Provider | Success rate | Response time |
|---|---|---|
| Zyte | 100% | 0.81 s |
| NetNut | 100% | 2.10 s |
| Nimble | 100% | 3.24 s |
| Smartproxy | 100% | 5.37 s |
| Oxylabs | 99.98% | 4.79 s |
| Scrapingdog | 99.97% | 2.93 s |
| Bright Data | 99.86% | 10.12 s |
| Infatica | 95.07% | 2.44 s |
| SOAX | 94.13% | 8.70 s |
| Rayobyte | 92.20% | 4.49 s |
| ScraperAPI | 51.93% | 5.83 s |

Google is a must for any web scraping API. The search engine is protected by the infamous reCAPTCHA, which quickly rate-limits suspicious visitors. Yet it proved no challenge for all but one API.

We’re particularly impressed with Zyte’s performance. Zyte API not only achieved a perfect success rate, but it also returned responses in under one second – much faster than the rest.

Indeed (Shape)

| Provider | Success rate | Response time |
|---|---|---|
| NetNut | 100% | 2.52 s |
| Smartproxy | 100% | 3.38 s |
| Bright Data | 100% | 4.67 s |
| Oxylabs | 99.88% | 3.69 s |
| Infatica | 99.84% | 3.12 s |
| Nimble | 99.76% | 10.80 s |
| Zyte | 99.53% | 10.85 s |
| SOAX | 98.92% | 12.84 s |
| ScraperAPI | 98.80% | 5.02 s |
| Scrapingdog | 25.46% | 20.03 s |
| Rayobyte | 9.19% | 21.51 s |

Contrary to our expectations, Indeed wasn’t a hard target for the scrapers. The website employs Shape, a notoriously hard anti-bot system, but we either failed to trigger it or Indeed is using a lenient configuration. 

In any case, at least five providers had amazing results, making it hard to single out a winner. Similar outcomes repeated throughout all runs, with the exception of ScraperAPI.

Instagram (in-house)

| Provider | Success rate | Response time |
|---|---|---|
| Nimble | 99.97% | 7.01 s |
| SOAX | 99.73% | 8.96 s |
| Oxylabs | 99.55% | 27.46 s |
| Smartproxy | 99.48% | 23.46 s |
| Zyte | 99.13% | 2.63 s |
| Bright Data | 96.61% | 55.04 s |
| NetNut | 96.21% | 25.31 s |
| Infatica | 93.04% | 20.40 s |
| ScraperAPI | 79.33% | 21.90 s |
| Scrapingdog | 75.36% | 8.83 s |
| Rayobyte | 62.75% | 13.63 s |

Instagram is another major source of web data, though TikTok has probably started challenging it in popularity by now. The social media network uses its own bot protection system that redirects suspicious users to a login page. In our tests, however, Instagram didn’t cause big issues for most participants.  

Overall, Nimble’s results look the best on paper. It’s also interesting that Zyte adjusted its scraper in the process, and our last test ran successfully without JavaScript rendering enabled. As a result, Zyte’s response time is mighty impressive. 

Lowe’s (Akamai)

| Provider | Success rate | Response time |
|---|---|---|
| Zyte | 100% | 17.78 s |
| Smartproxy | 99.98% | 24.20 s |
| SOAX | 99.83% | 14.16 s |
| Nimble | 99.81% | 18.56 s |
| Oxylabs | 99.75% | 29.58 s |
| Bright Data | 99.14% | 75.61 s |
| ScraperAPI | 63.13% | 34.45 s |
| NetNut | 27.00% | 16.09 s |
| Scrapingdog | 9.90% | 23.31 s |
| Rayobyte | 5.79% | 39.36 s |
| Infatica | 1.40% | 50.93 s |

Lowe’s is a decently popular target that employs Akamai’s bot protection system. It brought down a third of the participants, including NetNut, which had come out strong against the other anti-bot systems.

On the other hand, six APIs succeeded over 99% of the time, which is a great result. 

Safeway (Imperva)

| Provider | Success rate | Response time |
|---|---|---|
| Zyte | 100% | 1.65 s |
| Smartproxy | 99.81% | 28.36 s |
| Oxylabs | 99.69% | 27.57 s |
| Nimble | 95.95% | 9.82 s |
| Bright Data | 92.33% | 29.33 s |
| ScraperAPI | 75.84% | 25.32 s |
| Scrapingdog | 50.07% | 2.61 s |
| Rayobyte | 6.61% | 2.09 s |
| NetNut | 0.05% | 11.55 s |
| SOAX | 0.04% | 27.31 s |
| Infatica | Failed to unblock | |

Safeway, the U.S. supermarket chain, is protected by Imperva and imposes aggressive geo-restrictions outside North America. The website isn’t a very popular target, so most participants found it tricky, requiring several runs to adjust. 

All in all, Zyte’s performance looks amazing on paper, but it was Bright Data that ensured consistent results throughout all tests. 

Walmart (PerimeterX)

| Provider | Success rate | Response time |
|---|---|---|
| Smartproxy | 99.98% | 3.80 s |
| ScraperAPI | 99.98% | 5.04 s |
| Bright Data | 99.98% | 5.20 s |
| Oxylabs | 99.88% | 2.84 s |
| Nimble | 99.88% | 11.12 s |
| SOAX | 99.25% | 16.58 s |
| Rayobyte | 97.32% | 9.68 s |
| Zyte | 96.22% | 2.31 s |
| NetNut | 85.91% | 15.68 s |
| Scrapingdog | 71.70% | 12.46 s |
| Infatica | Failed to unblock | |

Though probably eclipsed by Amazon, Walmart is a major e-commerce data source. It tends to juggle anti-bot systems but is generally associated with PerimeterX. 

Most participants didn’t find Walmart problematic. However, we did see Nimble’s and Scraping Robot’s success rates crash after PerimeterX’s update in late August.

Other Observations

  • When accessing protected targets, commercial APIs can be brittle. Less popular websites may need to be unblocked individually, even if the provider has a general bypass for that bot protection system. Popular targets like Walmart or G2 may also temporarily break after major updates.
  • Providers use different approaches for unblocking the same websites. Nimble relies on what it calls browserless drivers – they render JavaScript without invoking traditional headless browsers. We saw a big reliance on these drivers. On the other hand, by the end of our tests Zyte was able to access all targets without requiring browser-rendered HTML at all. 
  • There’s a big difference between running tests at one request per second and at ten or more. At a lower rate, we wouldn’t have discovered that some providers have scaling issues. Moreover, some websites don’t start seriously blocking until five requests per second or more.

Feature Overview

Let’s take a quick look at what you can do with the scraper and proxy APIs. 

Proxy vs API Integration

The question of integration method is often decided before purchase: if your codebase already takes the proxy format, you’ll naturally gravitate towards it. But is there a real difference between the features that API and proxy integrations offer? In a way, yes.

| | Proxy APIs (unblockers) | Web scraping APIs |
|---|---|---|
| Data delivery | Real-time | Real-time or on-demand, sometimes with batching & cloud storage |
| Geo-location selection | Often country-wide, sometimes up to city & ASN | Usually at the country level |
| Sessions | ✅ | ✅ |
| Custom headers & cookies | ✅ | ✅ |
| JavaScript rendering | A toggle | A toggle with optional instructions for scrolling, waiting, and more |
| Specialized endpoints | Usually unavailable | For popular websites, with tailored parameters (e.g. ASIN entry, ZIP selection for Amazon) |
| Data parsing | Usually unavailable | Through specialized endpoints, manual selectors, or lately LLMs |
| Output formats | HTML | HTML, JSON, sometimes CSV |

Proxy APIs:

| Provider | Integration | Geolocation | Sessions | JS rendering | Specialized endpoints | Data parsing |
|---|---|---|---|---|---|---|
| Bright Data | Proxy, async API | 150+ countries with city & ASN targeting | Automated | Toggle | Search engines | Specialized endpoints |
| NetNut | Proxy | 150+ countries | | Toggle | | |

Web scraping APIs:

| Provider | Integration | Geolocation | JS rendering | Specialized endpoints | Data parsing |
|---|---|---|---|---|---|
| Infatica | Real-time, async API | 150+ countries | Toggle | Search, e-commerce | Specialized endpoints |
| Nimble | Real-time, async API (with batching, cloud storage) | 150+ countries with state & city targeting | Toggle, instructions | Search, e-commerce, social media | Manual, autoparser, special endpoints |
| Oxylabs | Real-time, async API (with batching, cloud storage), proxy | 150+ countries with ZIP for Amazon, city & coordinates for Google | Toggle, instructions | Search, e-commerce | Manual, special endpoints, parser builder |
| Rayobyte | Real-time, async API (with batching) | 150+ countries | Toggle, instructions | Search, e-commerce | Manual, special endpoints |
| ScraperAPI | Real-time, async API (with batching), proxy | 12 countries with 50+ upon request, ZIP code for Amazon | Toggle, instructions | Search, e-commerce | Manual, special endpoints |
| Scrapingdog | Real-time, async API, proxy | 15 countries | Toggle, instructions | Search, e-commerce, social media, more | Special endpoints |
| Smartproxy | Real-time, async API (with batching), proxy | 150+ countries with ZIP for Amazon, city & coordinates for Google | Toggle, instructions | Search, e-commerce, social media | Manual, special endpoints |
| SOAX | Real-time | 150+ countries | Toggle | Search, e-commerce, social media | Special endpoints |
| Zyte | Real-time API, proxy | 150+ countries | Toggle, instructions, scripting | | Manual, category based |

Proxy APIs are often meant to be a direct upsell to proxy servers with a drop-in replacement process. At the same time, because you’re effectively outsourcing the page opening stage, proxy APIs need to go beyond regular proxy network features like geo-location to cover request manipulation and even JavaScript rendering. So they do. 

Despite their wealth of features, proxy APIs can still be limited. For example, they rarely offer specialized endpoints, on-demand access to scraped output, or data structuring features. Another big drawback for complex scenarios is that they’re incompatible with headless browser libraries while offering no browser instructions of their own. This is where web scraping APIs offer more flexibility.

Exceptions exist. Bright Data’s SERP API integrates as a proxy, but in reality it’s a highly specialized scraper with data parsing and custom parameters. Funnily enough, some providers that sell web unblockers also offer web scraping APIs with a fully-featured proxy mode. In these scenarios, the difference hinges on the pricing method and, likely, marketing strategy. 

How do you work with proxy and web scraping APIs? Obviously, the main requirement is sending an HTTP request to the provider’s server. However, the way you configure that request can differ: it’s usually either a GET request with parameters in the URL or headers, or a POST request with a JSON payload.
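As a rough sketch (hostnames and parameter names here are invented for illustration), the two formats boil down to this:

```python
import requests

page_url = "https://www.example.com/product/123"

# Proxy mode: point your HTTP client at the provider's gateway.
# Options are often encoded in the proxy username (e.g. the country).
proxies = {"https": "http://USERNAME-country-us:PASSWORD@unblock.example-provider.com:8000"}
# Unblockers re-encrypt traffic with their own certificates,
# so certificate verification is commonly disabled.
proxy_response = requests.get(page_url, proxies=proxies, verify=False)

# API mode: a POST request with a JSON payload describing the job.
api_response = requests.post(
    "https://api.example-provider.com/v1/scrape",
    auth=("USERNAME", "PASSWORD"),
    json={"url": page_url, "geo": "US", "render_js": False},
)
```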

Exploring Individual Features

All modern proxy and web scraping APIs can render JavaScript. With pages becoming increasingly interactive, a question that arises more often every year is: what else can I do on the page? Proxy APIs tend to ignore it; web scraping API developers solve it by exposing browser controls through special parameters.

Nimble, Oxylabs, Rayobyte, ScraperAPI, Scrapingdog, Smartproxy, SOAX, and Zyte each expose some combination of screenshot, click, input, scroll, and wait controls.

Bright Data, Infatica, NetNut – only basic rendering functionality available. 

It’s possible to combine the instructions. For example, you can select a field, enter text, click on it, and wait for the response. Providers impose execution time limits, which often range between 60 and 120 seconds. 
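Such a combined instruction set is typically expressed as part of the JSON payload, along these lines (the parameter and action names vary by provider and are hypothetical here):

```python
# A hypothetical payload that fills a search box, submits it, waits for
# the results to render, and scrolls to trigger lazy loading.
payload = {
    "url": "https://www.example.com/search",
    "render_js": True,
    "browser_instructions": [
        {"action": "input", "selector": "#search-box", "value": "winter jacket"},
        {"action": "click", "selector": "#search-button"},
        {"action": "wait_for", "selector": ".results", "timeout": 10},
        {"action": "scroll", "direction": "bottom"},
    ],
}
```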

Zyte takes this a step further. Its clients get access to a cloud-hosted VS Code environment, where they can write their own interaction scripts.

The latter functionality isn’t common: instead, we’re seeing new product categories emerge that aim to increase web scraping success while remaining compatible with standard headless browser libraries. Some examples are Undetect, Bright Data’s Scraping Browser, and major anti-detect browsers like Multilogin and Gologin.

Specialized endpoints are tailor-made for websites or their properties (such as Amazon product pages). They often have custom parameters and data parsing capabilities. For instance, a Google SERP endpoint may be able to fetch local results (city-wide or at particular coordinates), which would otherwise be unavailable through a general-purpose API.
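A localized query against such an endpoint might look like the following sketch – a generic illustration rather than any particular provider’s API:

```python
import requests

# Hypothetical SERP endpoint; field names are invented for illustration.
serp = requests.post(
    "https://api.example-provider.com/v1/google/search",
    auth=("USERNAME", "PASSWORD"),
    json={
        "query": "coffee shops",
        "domain": "google.com",
        "locale": "en-US",
        "coordinates": {"lat": 40.7128, "lng": -74.0060},  # local results for New York
        "parse": True,  # structured JSON instead of raw HTML
    },
)
results = serp.json()
```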

| Provider | Google | Amazon | Others |
|---|---|---|---|
| Bright Data | SERP, ads, search types, local search | ❌ (available in other products) | Bing, Yandex, DDG |
| Infatica | SERP, ads | Search, product | Booking |
| Nimble | SERP, ad optimization, local search | Search, product (incl. ZIP code) | Bing, Yandex, adding more fast |
| Oxylabs | SERP, ads, search types, hyperlocal search | Product, search, sellers, reviews, more (incl. ZIP code) | Walmart, Bing, Etsy, BestBuy, Target |
| Rayobyte | SERP | Product | |
| ScraperAPI | SERP, several search types | Product, search, offers, reviews (incl. ZIP) | Walmart |
| Scrapingdog | SERP, search types | Product, search (incl. ZIP) | LinkedIn, Twitter, Yelp, Indeed |
| Smartproxy | SERP, search types, hyperlocal search | Search, product, sellers, reviews, more (incl. ZIP) | |
| SOAX | SERP, search types | Search, product, reviews, questions | Walmart, all major search engines & social media platforms |

NetNut, Zyte – no specialized endpoints available for the tested products.

Compared to the year before, we’re seeing an interesting trend: scraper vendors have been introducing more specialized endpoints to their products. One example is ScraperAPI, which now offers scrapers for Amazon, Google, and Walmart. Another is Nimbleway – the provider has set out to build what it calls online pipelines for targets in various verticals.

The direction is interesting, considering that LLMs have lowered the barrier to entry – particularly when it comes to parsing – and that they tempt providers to consolidate toward one all-encompassing tool. Maybe a single-purpose endpoint reassures customers that the scraper will be fit for the task?

Data parsing is an area where some of the most exciting developments are taking place. Of course, this is thanks to machine learning and large language models. But we’re also seeing changes in less sophisticated approaches: since our previous report, Oxylabs, ScraperAPI, and Smartproxy have all implemented selector support for building parsers by hand. 

| Provider | Manual parsing | Pre-made templates | Other |
|---|---|---|---|
| Bright Data | | Specialized endpoints | |
| Infatica | | Specialized endpoints | |
| Nimble | Selectors | Specialized endpoints | Autoparsing, AI parser schemas |
| Oxylabs | Selectors | Specialized endpoints | AI parser schemas |
| Rayobyte | Selectors | Specialized endpoints | |
| ScraperAPI | Selectors | Specialized endpoints | |
| Scrapingdog | | Specialized endpoints | |
| Smartproxy | Selectors | Specialized endpoints | |
| SOAX | | Specialized endpoints | |
| Zyte | Selectors | | Models trained on page types |

NetNut – no data parsing available for the tested products.

Let’s explore several different approaches to AI-based parsing that lie in the modest Other column.

#1. Custom machine learning models trained on specific page types.

Zyte has been playing around with machine learning for years now. Instead of parsing individual targets, Zyte trained multiple in-house models for whole page categories: products, news, directories, etc. The caveat was that they relied on AI vision, which required browsers. Still, during its conference roughly a year ago, Zyte bragged about being dozens of times cheaper and more accurate than ChatGPT.

Since then, Zyte has adapted the models to non-rendered requests, significantly cutting down the cost. It’s also experimenting with supplementary LLM features. They can make the schema more flexible by adding custom data points, and they can also transform data: translate, normalize, summarize, etc.

#2. A universal AI parser.

Similarly to Zyte, Nimble uses HTML-trained AI agents to extract data from various page types. Unlike Zyte, the provider automatically chooses the relevant agent depending on the page, keeping the decision process in the backend. 

In a way, this makes the customer’s job easier. But it’s also way less predictable (Will this target work? What’s the schema?). During our tests, we found the functionality to be more miss than hit: it parsed Lowe’s but failed to structure Canadagoose or G2. We’re sure it’s bound to improve fast.

To make the agents more robust, Nimble is preparing to release the ability to generate custom schemas. This feature will accept simple, likely natural language instructions and translate them into parsers. According to Nimble’s documentation, these parsers will get reusable IDs and heal automatically after identifying a failure. 

For now, Nimble’s stopgap solution combines the dynamic parser with manual selectors to build a parser for the page. 

#3. LLM-assisted parsers generated upon request.

This is the approach Oxylabs announced during its recent web scraping conference. Basically, you send a URL with natural language instructions to an LLM; it generates a schema and selectors for scraping the data points. You get a preview of the output and the ability to adjust the schema to your needs. Once you’re happy, the selectors are added to the API request code.
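Here’s a sketch of how that flow might look from the client’s side; the endpoints and response fields are invented for illustration, as we’re only describing the general pattern:

```python
import requests

AUTH = ("USERNAME", "PASSWORD")
page = "https://www.example.com/product/123"

# One-off generation step: the LLM turns a natural-language prompt into selectors.
schema = requests.post(
    "https://api.example-provider.com/v1/generate-parser",  # hypothetical endpoint
    auth=AUTH,
    json={"url": page, "prompt": "Extract the product title, price, and rating."},
).json()["schema"]  # e.g. {"title": "h1.product-title", "price": "span.price", ...}

# From then on, the stored selectors ride along with regular scraping requests;
# no LLM call is involved in day-to-day scraping.
result = requests.post(
    "https://api.example-provider.com/v1/scrape",
    auth=AUTH,
    json={"url": page, "parse": {"selectors": schema}},
)
```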

Oxylabs’ approach is highly pragmatic, as the language model is invoked only once rather than with every page access. However, it has limitations: once a parser breaks, you have to repeat the generation process manually.

Pricing Approaches

We’ll overview the pricing models of the participants and how much our benchmarks would’ve cost. 

Request-Based, Credit-Based, and Black-Box Models

There are multiple ways to price proxy and web scraping APIs. Proxy APIs use traffic or requests as the main metric. Web scraping APIs charge for successful requests, either keeping the model simple (one page = one request) or building increasingly elaborate schemes based on credits.

Zyte’s model is closer to credits, for it includes variables that affect the final rate. But it’s also unique in that the cost may change over time, depending on how hard Zyte finds the target to scrape. According to the provider, these revisions take place once per quarter and affect around 0.1% of websites. Still, such a pricing scheme works as a kind of black box.

| Provider | Model | Structure | Price range | Trial |
|---|---|---|---|---|
| Bright Data | Requests | PAYG, subscription | $1-$2,000 | 7 days for companies |
| Infatica | Credits | Subscription | $25-$240 | 5k requests, 7 days |
| NetNut | Requests | Subscription | Not public | 7 days for companies |
| Nimble | Requests | PAYG, subscription | $3-$3,000 | Available |
| Oxylabs | Requests | Subscription | $49-$2,000 | 5k requests, 7 days |
| Rayobyte | Requests | PAYG | $1.8 | 5k free requests / month |
| ScraperAPI | Credits | Subscription | $49-$299 | 1k free credits / month, 7-day trial |
| Scrapingdog | Credits | Subscription | $40-$200 | 1k credits, 30 days |
| Smartproxy | Requests | Subscription | $30-$500 | 1k requests, 7 days |
| SOAX | Requests | Subscription | $2.5-$2,200 | Available |
| Zyte | Dynamic | PAYG, subscription | $1-not specified | $5 credits for 30 days |

The table provides some interesting data points:

  • Scraper vendors prefer trials over pay-as-you-go. Only Scraping Robot has PAYG as its sole pricing model, and Zyte starts requiring commitment after $100. Furthermore, some trials extend into free plans that refresh monthly.
  • Credit-based pricing usually targets customers with smaller needs. This is evident from looking at the price ranges of public plans. 

Price Modifiers

To understand how exactly request-based and credit-based pricing models compare, we’ll have to explore the base price and available modifiers. 

The CPM at $100 column shows how much 1,000 requests would cost when spending $100 with each participant. It may be a little biased against enterprise-minded providers, as they only start to scale well at $1,000 and up.

| Provider | Base CPM at $100 | Price modifiers |
|---|---|---|
| Bright Data | $3 | A list of premium websites (2x) |
| Infatica | $0.09 | JS rendering (10x), E-comm & SERP (10x), JS + E-comm/SERP (20x), LinkedIn (130x) |
| Oxylabs | $1.80 | |
| Nimble | $3 | |
| Rayobyte | $1.80 | |
| ScraperAPI | $0.49 | Amazon (5x), SERP (30x), Social (30x), JS rendering (10x), premium IPs (10x), premium IPs + JS (30x), ultra premium IPs (30x), ultra premium + JS (75x) |
| Scrapingdog | $0.09 | Google (5x), JS rendering (5x), premium IPs (10x), premium + JS (25x), LinkedIn (200x) |
| Smartproxy | $1 | |
| SOAX | $2.50 | |
| Zyte | From $0.10 | Target (up to 10x), parsing (up to 3x), JS rendering (up to 15x), JS + parsing (up to 25x), screenshot (up to 25x) |

NetNut – no public pricing available.

Credit-based pricing models can have huge multipliers reaching tens or even hundreds of times. These variables interact with one another: for example, you can toggle both JavaScript rendering and better quality proxies. From the user standpoint, having these options exposed can feel burdensome, as you need to experiment with parameters and mind the credit cost. 
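As a rough worked example, assuming (as the credit systems imply) that a multiplier applies directly to the base rate:

```python
# Scrapingdog's figures from the table above
base_cpm = 0.09        # base price per 1,000 requests at $100
premium_plus_js = 25   # 'premium IPs + JS rendering' multiplier
linkedin = 200         # LinkedIn multiplier

print(f"${base_cpm * premium_plus_js:.2f} per 1k requests")  # $2.25 - still cheap
print(f"${base_cpm * linkedin:.2f} per 1k requests")         # $18.00 for LinkedIn
```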

Having said that, credit-based models are really efficient for basic websites that require neither residential proxies nor JavaScript rendering. The low baseline price also makes these scrapers look really good in marketing materials. However, for hard targets like G2, you’re likely to overpay.

Request-based models have the opposite problem: they’re comparatively expensive when the target puts up no fight. But given that these providers are often enterprise-oriented, different considerations kick in, such as scalability and unblocking success.

The Cost to Run Our Benchmarks

So, how much did we pay to complete the full 180,000 requests (10 targets × 6,000 URLs × 3 runs)? The graph shows aggregate costs, taking the rate of the closest suitable plan.

Three participants use credit-based pricing and failed to unblock some targets consistently (at least 20% of the time). We didn’t want to speculate on the configuration, so we excluded the following from the graph:

  • Infatica: Canadagoose, G2, Lowe’s, Safeway, Walmart.
  • Scrapingdog: Allegro, Canadagoose, G2.
  • ScraperAPI: Allegro, G2.


For general unblocking without JavaScript rendering or data parsing, Zyte delivered incredible value considering its performance results. The provider’s price was closer to that of entry-level APIs like Infatica and Scrapingdog than to the premium competitors.

Smartproxy and Oxylabs also look compelling, more so if you need headless browsers or the bundled parsing features. And while ScraperAPI may not be the most efficient choice overall, its prices for Amazon and Walmart in particular are definitely worth attention.

Conclusion

This concludes the report. Assuming that very few readers would reach this part, we moved the summary to the beginning. But since you’re here – thank you for getting to the end! If you have any questions, feel free to contact us through info at proxyway dot com or our Discord server.

Adam Dubois
Proxy geek and developer.