A beginner-friendly web scraper for Windows and macOS.
Sticking with the theme of web scraping, today we’ll be reviewing another web scraper: Octoparse. Similar to ParseHub, it’s fairly easy to use and doesn’t require any coding.
I’ll be looking at the pricing, user interface, and features. I’ll also investigate how compatible and safe the software really is.
Just like ParseHub, Octoparse is free to download and use with limited features. But if you wanted to get a bit more use out of the software, you’d need to buy a subscription. While the free version doesn’t limit the number of pages scraped, it limits concurrent local runs to 2 and lets you build up to 10 crawlers.
The first paid plan is Standard, billed monthly at $89 ($75 per month if billed yearly). The next one, Professional, comes to $249 per month ($209 per month if billed yearly). Lastly, Octoparse has an Enterprise plan, but it’s meant for big custom solutions, so the price for this plan is not advertised. Octoparse also offers a five-day money-back guarantee.
The jump in features from the free version to the Standard plan is quite something: the limits on data exports and concurrent local runs disappear, and instead of 10 crawlers, you get to build up to 100 of them. New features include scheduled extractions, average speed extraction, auto IP Rotation, task templates, API access, and email support. The Professional plan tops it all with an increased number of crawlers (up to 250) as well as high speed extraction, advanced API, high priority email support, and 1-on-1 training and task reviews.
In addition to the pricing plans, Octoparse also offers data scraping services. Starting at $399, they’re aimed at those who don’t want the hassle of getting the information themselves. Another separate product Octoparse offers is Crawler Service. It provides custom-made crawlers for individual needs that can be run on Octoparse (starting at $189).
All in all, Octoparse seems to have options for both solo hustlers and big enterprises.
Interface & Features
I don’t want to keep referring back to ParseHub, but, once again, Octoparse’s interface is really similar to ParseHub’s. It has a built-in browser view, and the scraping works using the same click-and-scrape method. You can easily select exactly which information you need extracted and instruct the software to automatically repeat the same scraping task on each page of a site. As no coding knowledge is needed, Octoparse works for beginners as well.
A unique (yet limited to paid plans only) feature is task templates. These are super convenient if you want to maximize the efficiency of your scraping tasks. Octoparse has quite a big variety of templates, anything from Amazon (different country ones, too) to Google Maps and Twitter.
What makes these templates so convenient is that they have the whole selection set up for you, and you don’t need to go through the pages clicking on what needs to be downloaded. The information is extracted based on its placement on the page, so these templates already know where to locate the price, name, and product pictures according to different layouts (for example, US Amazon and India Amazon won’t have the reviews on the same side, so there are 2 different templates).
You can export the data you’ve extracted in Excel format or send it directly to your own database via an API.
Octoparse’s biggest disadvantage is that it’s not compatible with macOS or Linux. If you wanted to use Octoparse on a Mac, you’d have to either use a remote desktop connection or fire up a virtual machine such as Oracle VM VirtualBox. However, there seems to be some good news: at the start of May, Octoparse announced that a Mac version should be coming out soon! When? Unfortunately, that was never specified. Let’s say the company is keeping the mystery alive.
Edit: Octoparse has released a beta version for macOS devices. Yay!
Customer Service & Tutorials
On Octoparse’s website, you can find several links for contacting customer support. All of them lead to the same submission form. There’s no live chat, which is a bit disappointing: if there were a problem with your scraper, you couldn’t reach anyone straight away.
The level of customer support depends on your subscription plan. I’m not sure this is what Octoparse really meant to call its employees, but according to the website, if you’re only using the free plan, you’re entitled to “lazy support”. As you move up the plans, you can contact support via email and, eventually, become high priority.
Octoparse has impressed me with the amount of video content on YouTube. There are several video tutorials explaining everything from what web scraping is to building a custom task in Octoparse.
On a side note: many of the video tutorials focus on features that are only available to paying customers (e.g. task templates), but Octoparse doesn’t really mention that. So, if I were going by what I’d seen on YouTube, I’d be pretty disappointed to find out I had to pay extra for that ready-made Amazon task template.
Octoparse vs ParseHub
So here we are, the final showdown. Both scrapers are very similar in interface, features, and experience needed to start scraping.
If we’re comparing the free versions only, Octoparse offers a bit more. While ParseHub lets you build 5 projects and allows 200 pages per run, Octoparse lets you build 10 crawlers and doesn’t limit the number of pages crawled. On the other hand, if you look at the formats for downloading your scraped data, you’ll see that Octoparse has fewer options than ParseHub (e.g. no possibility of getting the results in JSON).
If you’re looking at the paid versions, ParseHub’s starting price is higher ($149 per month) than Octoparse’s ($89 per month). That said, both programs offer very similar features. One thing ParseHub doesn’t have is those handy task templates.
In the end, it all comes down to your preference. Think through what your scraping projects will be like, test out the free versions of both, and see which one fits your needs more.
Is Octoparse Safe?
It doesn’t come as a surprise to me that people are still on the fence about web scraping. Is it legal? Does using a scraper make you a criminal?
Let’s get one thing out of the way: scraping is legal, but it all depends on what information you’re gathering and how. If you want to make sure you’re not violating any laws, consider whether the information you’re scraping is publicly available. If it is, then you’re most likely fine. If you’re not sure, it’s always a good idea to look into the website’s Terms & Conditions and its robots.txt file.
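If you’d rather check a robots.txt file programmatically than read it by eye, here is a minimal sketch using Python’s standard library. The sample rules, the example.com URLs, and the is_allowed helper are made up for illustration; a real site’s file will differ, and passing this check says nothing about the site’s Terms & Conditions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("https://example.com/products/1",
                 "https://example.com/private/data"):
        print(page, "allowed" if is_allowed(SAMPLE_ROBOTS, page) else "disallowed")
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file for you.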
So, if you’re using Octoparse for the right reasons, then yes – it’s perfectly safe.
What Proxies Should You Use?
The free version of Octoparse doesn’t include IP address rotation like the paid versions do. So, if you plan to use the free version, you’ll need to get your own proxies.
Some might say it’s inconvenient not to include IP rotation; I say it’s a blessing in disguise. Now you can decide what proxies you want to use! Proxies are definitely just as important as the scraper itself, so make sure to pick reliable ones.
In general, when choosing proxies for a scraping project, you’d want them to be rotating and residential to avoid any potential blocks. We’ve compiled a list of the best web scraping proxy providers to make your choice easier.
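To show what “rotating” means in practice, here is a rough sketch in plain Python that sends each request through the next proxy in a list. This is not how Octoparse itself is configured; it only illustrates the idea. The proxy addresses and the helper names (proxy_cycle, opener_for, fetch) are placeholders, and a real provider would give you its own endpoints and credentials.

```python
import itertools
import urllib.request

# Placeholder proxy endpoints (assumptions, not real servers).
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
    "http://proxy3.example.com:8000",
]


def proxy_cycle(proxies):
    """Return an endless round-robin iterator over the proxy list."""
    return itertools.cycle(proxies)


def opener_for(proxy_url):
    """Build a URL opener that routes HTTP and HTTPS traffic via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, rotation, timeout=10):
    """Fetch url through the next proxy in the rotation and return the body."""
    proxy = next(rotation)
    with opener_for(proxy).open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    rotation = proxy_cycle(PROXIES)
    # Show which proxy each of the next four requests would use.
    for _ in range(4):
        print(next(rotation))
```

Many residential proxy providers handle the rotation on their side and give you a single endpoint, in which case a list like this isn’t needed.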
Overall, Octoparse is a solid beginner-friendly web scraper. It has a decent free plan for smaller projects, but doesn’t shy away from making custom Enterprise solutions.
Octoparse doesn’t limit the number of pages and allows you to build 10 scrapers in the free version, while offering neat task templates in the paid version. It has also impressed me with its extensive video tutorials, which almost make up for the lack of live chat. The downsides are the limited export formats and email-only customer service.