Crawlera is a rotating proxy network managed by the data extraction company ScrapingHub. Over 13 years, ScrapingHub has built a whole ecosystem around web scraping: an open-source scraping framework, a headless browser, data collection services, and more. Crawlera makes up one – albeit a very important – piece of its data collection puzzle.
Crawlera was born as an in-house solution to a nagging problem. As web scrapers, the guys at ScrapingHub dealt with proxies every day. However, no matter which provider they chose, they would soon run into issues with blocks and reliability. Eventually, they stopped focusing on what IPs they had, and instead put all their efforts toward how they worked. What emerged was ‘the world’s smartest proxy network’.
While that does sound like marketing speak (it is), the slogan hides a whole philosophy. Crawlera won’t sell you ‘good’ IPs. Instead, it takes mediocre datacenter proxies and uses its magic to bring websites like Google and even Yandex to their knees.
How? By renting IPs from other providers and throwing its intelligent management layer on top. Though it calls itself a proxy network, Crawlera actually covers a good deal of the complexities involved in web scraping. It automatically rotates and selects healthy IPs, adds browser headers, throttles requests, controls cookies, and even detects bans. Quite a bit more than your average proxy provider.
So, the IPs themselves are in the background here. What matters is data. This is clearly reflected in the pricing plans: they’re based on requests, and you’re only charged for the ones that succeed. Crawlera claims that this lets companies scale their operations more easily and transparently, and it’s ready to help them scale into the billions of requests if needed.
Any drawbacks? Sure. The cheaper plans limit everything from concurrent requests to customer support (no help on weekends!). Throttling and automatic retries may preserve proxies, but they might also slow you down. There’s no SOCKS5. And, of course, the IPs aren’t really suited for things outside of web scraping.
In this review, we’ll try to find out if Crawlera can compete with the dominant residential proxy networks. I’ll take a closer look at Crawlera’s features, performance, user experience, and customer support. If you’re time-constrained, the provider’s key features with pros & cons are just below.
- Great scraping performance
- Failed requests cost nothing
- Intelligent IP management
- Generous free trial
- Cheaper plans limit features
- Performance restrictions under load
- Limited use cases
- Proxy types: Datacenter, residential
- Protocols: HTTP, HTTPS
- Locations: 12+ countries
- Targeting: Country
- Authentication: API key
- Sub-users: Yes
- Dashboard: Yes
- Support: 24/5 (24/7 for enterprise)
- Pricing: From $99 for 200k requests
- Payment options: Credit card, PayPal
- Trial: 14 days
Crawlera sells access to a datacenter proxy pool. The most demanding clients can also rent residential IPs, but those cost extra and are reserved for the Enterprise plan.
Unimpressive IPs, impressive proxy management features.
Datacenter proxies are Crawlera’s main product – or, at least, the basis for it. What you’re really buying is the proxy management features mounted on top, and the provider will constantly remind you to ‘focus on the data, not proxies’.
So, we can’t know for sure how many IPs there are. Nor is there much information about location coverage: Crawlera told us it has 12 countries with more than 1,000 IPs each, and that it can get other locations on demand. That’s likely because Crawlera rents proxies from other providers, so it can accommodate your needs.
The proxies use backconnect servers: you get one gateway address, and it rotates the IPs for you in the backend. Sticky sessions are available, but not on the cheapest plan. Keeping the same IP imposes a 12-second delay between requests, which gets longer if you ignore the limit.
You can create sub-users, and authentication is done with a provided API key (which can also function as user:pass). IP whitelisting is possible, but only for Enterprise clients.
In short, what we see is pretty unspectacular. The interesting part is what we don’t see – Crawlera’s proxy management logic. Here are some of its underlying features:
- Request throttling – Crawlera looks at the website you’re scraping, estimates its load and ban history, and then limits the request rate just enough to keep you from getting banned.
- Automatic retries – Crawlera keeps a database of different ban types, captchas, and response codes. Once it detects a block, the system automatically switches the IP and retries the request.
- Cookies and sessions – the proxies can be configured to store cookies, so that you wouldn’t need to do so in your crawler. This feature isn’t available with the cheaper plans, which simply discard the cookies.
- Headers – Crawlera automatically assigns browser headers to your requests, chosen from its internal pool.
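For reference, Crawlera exposed most of these toggles as `X-Crawlera-*` request headers. The header names below follow its public documentation at the time, but treat them as illustrative rather than authoritative – a minimal sketch in Python:

```python
def crawlera_headers(session_id=None, disable_cookies=False, profile=None):
    """Build the X-Crawlera-* control headers for a request.

    Header names follow Crawlera's documented conventions; verify them
    against the current docs before relying on this in production.
    """
    headers = {}
    if session_id is not None:
        # Sticky session: ask the gateway to reuse the same outgoing IP
        headers["X-Crawlera-Session"] = str(session_id)
    if disable_cookies:
        # Let your own crawler manage cookies instead of the proxy
        headers["X-Crawlera-Cookies"] = "disable"
    if profile is not None:
        # Request a coherent set of browser headers (e.g. "desktop")
        headers["X-Crawlera-Profile"] = profile
    return headers

print(crawlera_headers(session_id=1234, disable_cookies=True, profile="desktop"))
```

You would then pass these headers along with each request through the gateway; omitting them leaves Crawlera's default rotation and cookie handling in charge.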
Evidently, a lot is going on in the background. The aim is both to get the most out of datacenter proxies and to make scraping more reliable. At times it can feel like this is hindering your performance – and that you’re dealing with a black box – but that’s the trade-off you have to accept. If you don’t like it, you can modify or disable most of the value-added features.
Only for edge cases.
Crawlera doesn’t treat residential proxies as a standalone product. Rather, it’s a paid add-on for enterprise clients that need to deal with the most unruly websites. While we had access to a custom Enterprise plan, it didn’t include the residential proxy functionality. I would assume it follows the same proxy management logic as the datacenter IPs, but that’s merely a guess.
Request-based pricing with feature lock.
Positioning itself as a data provider that also happens to sell proxies, Crawlera has a corresponding pricing model. It charges not by IPs (the datacenter standard) or traffic (99% of residential proxy providers) but rather by successful requests. That’s right: you only pay for the data you get.
There are a few gotchas here and there: for example, a redirect counts as a successful request. But these are all details; in general, the pricing model is transparent and simple to understand. I would only question the feature lock. I’m not sure if Crawlera has decided that smaller customers don’t need things like sticky sessions, more concurrent connections, or residential IPs, or if it’s simply an attempt to upsell.
The cheapest plan, Basic, starts from $99 per month. This gives you 200k requests, with the ability to top up for an extra fee. The most popular Advanced plan costs $349 for 2.5M requests – roughly 3.5 times more cost-efficient per request. The Enterprise plan starts from $999 and is tailored to your needs.
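To make the plan comparison concrete, here is a quick back-of-the-envelope calculation of the effective price per 1,000 successful requests, using the list prices above (a sketch, not official pricing math):

```python
def cost_per_1k(price_usd, requests):
    """Effective price per 1,000 requests for a monthly plan."""
    return price_usd / (requests / 1000)

# Plan figures taken from Crawlera's published pricing at review time
basic = cost_per_1k(99, 200_000)        # ~$0.50 per 1k requests
advanced = cost_per_1k(349, 2_500_000)  # ~$0.14 per 1k requests

print(round(basic, 3), round(advanced, 3), round(basic / advanced, 1))
```

The ratio is what makes the Advanced plan the popular choice: each request costs roughly 3.5 times less than on Basic.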
Most regular proxy providers shy away from free trials because people would simply abuse the proxies and not buy anything. Crawlera prevents proxy abuse by design, so it can afford to offer a generous 14-day trial. The included 10k requests are more than enough to try out the product (or even run a small scraping job).
Crawlera Performance Tests
Great web scraping performance.
We ran Crawlera through multiple performance tests to see if its proxies were stable, fast, and could access some of the major websites.
Proxy Quality Test
| Metric | Result |
| --- | --- |
| IP Protocol | IPv4 (100%) |
| Unique IPs | 51 (0%) |
| Average Response Time | 1.57 s |
As expected, the IPs came from a data center. They all used the IPv4 protocol, which proxy users still tend to prefer over IPv6. We made a lot of requests but could identify only 51 proxies. We don’t know if these were the only IPs we received, or if Crawlera’s proxy management logic decided there was no need to assign more for a website that wouldn’t block them.
Throughout days of testing, Crawlera’s proxy server proved very stable: 99.91% of the requests we made were successful. This is a great result.
Success Rate Test
The performance continued to delight us when we took Crawlera out of the sandbox and into the real world. The proxies did especially well with e-commerce websites: Amazon, eBay, and Craigslist all had a 100% success rate. Search engines proved harder to crack, especially Google (86.72%). The hardest challenge for Crawlera was social media platforms, Instagram in particular. This is understandable, considering how heavily Instagram is abused by datacenter IPs.
Due to Crawlera’s proxy management measures, we experienced few outright blocks. Whenever the proxies failed, it was mostly because of timeouts (over 30s). Increasing the timeout threshold would have likely made the success rate even better at the expense of speed.
Speaking of speed, the response times were generally good and reached 1.99s on average.
| Concurrency | Success Rate | Avg. Response Time |
| --- | --- | --- |
| 15 req/s | 99.76% | 2.62 s |
| 50 req/s | 99.94% | 1.36 s |
| 74 req/s | 99.92% | 1.36 s |
| 95 req/s | 81.14% | 24.56 s |
| 133 req/s | 49.75% | 26.07 s |
The one area where Crawlera didn’t shine was performance under load. It started struggling at around 95 requests per second: the success rate plummeted, and the average response time soared to nearly 25 seconds. Past this point, increasing the requests any more made the proxies almost unworkable. Other major proxy providers can cope with markedly larger numbers.
Crawlera told us that it limits the request rate intentionally, to protect websites from DDoS attacks. The limits can be lifted once it verifies that the client’s use case is legitimate and that the website can handle the load.
Overall, Crawlera performed very well with some of the major web scraping targets. It didn’t scale as well as we would have liked, but that was by design.
How to Use Crawlera
API-based setup and a dashboard with very detailed statistics.
You can register on Crawlera in three ways: by creating an account from scratch, by logging in with your Google account, or by using GitHub. The website will then push you to try the 14-day free trial (which is nice), but you’ll have to enter your billing details first (which is not as nice, but perfectly understandable). After activating a plan, you’ll be able to start using the service.
The dashboard includes not only Crawlera but ScrapingHub’s other products as well. To boost your scraping efforts, you can supplement the proxies with a cloud scraper, headless browser, and even access ready-made datasets.
If you don’t care about the extras, Crawlera’s part of the dashboard has all the main features you would expect. It lets you buy and modify subscriptions; provides setup instructions; allows creating and managing new users. I’m just not sure if you can assign request limits to the newly created users, which can be important for some.
The dashboard does a particularly good job at visualising your proxy use. You’ll be able to see not only the requests you have left, but also which sites you accessed, when, and whether the request succeeded. You can even narrow things down to each individual request. Very granular.
The proxy setup page includes short code examples for cURL and the main programming languages, as well as your API key. Crawlera uses the key to authenticate the proxies. If needed, it can also serve as user:pass credentials: enter it instead of a username and leave the password field blank. To make HTTPS requests, you’ll have to install an additional certificate.
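As a concrete sketch, this is how the key-as-username scheme typically looks from Python. The gateway address `proxy.crawlera.com:8010` follows Crawlera's documentation at the time of writing; verify it (and the certificate requirements) against the current docs:

```python
def crawlera_proxies(api_key, gateway="proxy.crawlera.com:8010"):
    """Build a requests-style proxies mapping for the Crawlera gateway.

    The API key doubles as the proxy username; the password stays empty.
    The default gateway address is taken from Crawlera's docs and may
    have changed since.
    """
    proxy_url = f"http://{api_key}:@{gateway}"
    return {"http": proxy_url, "https": proxy_url}

# Usage (requires the `requests` package and a valid API key):
#   import requests
#   r = requests.get("https://example.com",
#                    proxies=crawlera_proxies("YOUR_API_KEY"),
#                    verify=False)  # or point verify at Crawlera's CA cert
print(crawlera_proxies("YOUR_API_KEY"))
```

The commented-out `verify` line reflects the certificate note above: the proxy re-signs HTTPS traffic, so you either install Crawlera's certificate or disable verification for testing.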
To help you get to grips with the more advanced features and use cases, Crawlera has rich documentation covering the main information. If you find yourself at a loss, you can turn to the knowledge base or the community forum. Or simply open a support ticket.
Overall, Crawlera is comfortable and reasonably straightforward to work with.
Great for enterprise clients, not so great for everyone else.
Customers with cheaper plans can get help by reading the documentation, asking in the community forum, or creating a ticket. There is live chat functionality on the website. However, it’s reserved for sales.
Support tickets usually receive a response within a business day, at least according to Crawlera. I tried creating a ticket within business hours and received a response in around 60 minutes. This is satisfactory, but the fact that you won’t get help on weekends doesn’t inspire confidence.
Enterprise clients have it much better. They get their own dedicated account manager, 24/7 support, and direct access to Crawlera’s engineers.
Crawlera occupies an interesting spot in the proxy market. It sells neither pure proxies nor a full data collection service. Rather, it tries to meet you somewhere in between – with features that help you complete the job but don’t quite do it for you.
I find this approach fascinating. It lets users focus on data while still allowing some freedom of customization. The proxy management features should be especially helpful for less experienced scrapers, or for companies that haven’t written – or won’t write – all the scraping logic themselves.
The performance results are very good, considering that we’re working with datacenter proxies. This shows they can still be powerful in the right hands, as long as you avoid the most sensitive websites. Google, Amazon? Crawlera handles them admirably.
Even if it didn’t, you only pay for successful requests. So, with some exceptions, the question boils down not to ‘if’ but ‘when’ you will receive your data. Sooner is the usual answer, though sometimes you’ll have to make do with later.
However, you’ll also find Crawlera opinionated. With good intentions and some business interests in mind, it’ll tell you that you don’t need more concurrent requests with the Basic or Advanced plans, that throttling requests this way and retrying that way work best. The approach is based on years of experience and accumulated best practices, and it starkly differs from the near blank slate you get working with other proxy providers.
Crawlera’s offer might not always coincide with your needs, and it might not be the best option if you have the scraping logic figured out. But if you have some data to collect, and find proxy management problematic, Crawlera might be just what you were looking for.