Zyte Extract Summit 2025 (Dublin): A Recap

Our virtual impressions from the second edition of Zyte’s annual web scraping conference.

Adam Dubois

Six weeks after running Extract Summit in Austin (we cover it here), Zyte brought the web scraping conference to its home turf of Ireland. 

It would be wrong to call the Dublin edition a reprise of the first event; for the most part, it had a new line-up and focused on different areas of web data collection. As such, the two events complement each other.

As always, our aim is to give you a brief (and very human) summary of the conference’s talks. Zyte has made them available on demand, so you can do some opinionated window shopping before committing to the full videos.

Organizational Matters

Zyte didn’t change much in the way it organized the event, so there’s no need to waste metaphorical ink describing it. The conference spanned two days (the first being dedicated to workshops), and there was an option to watch everything online. You had a Slido form for questions, and that’s about it. 

The execution wasn’t flawless: for the better part of the event, online viewers had to choose between mono audio on their headphones or cranking up the volume to the max on speakers. But other than that, Zyte’s organizing committee did a solid job.

Main Themes

AI? AI. It’s unavoidable, really. But if Austin was largely about LLM parsing and agent-assisted code generation, Dublin gave much more attention to the unblocking side of things. We had a panel discussion with none other than Antoine Vastel, and Kieron Spearing gave a structured drill-down into how websites construct their requests. We loved that. 

The panel of lawyers focused on intellectual property in particular, which is the hot-button topic of the day. And, of course, team Zyte once again tried to sell their internal-project-made-public (VS Code extension), which is proper etiquette for the host in these kinds of events. 

The last brief presentation, which for some reason escaped the official agenda, tried to outright dissuade viewers from building their own scrapers, claiming that outsourcing was the more rational choice. Though structured as a personal story, the talk was nonetheless delivered by Zyte’s developer advocate, so it was hard to take it at face value.

The Talks

Talk 1. How to Make AI Coding Work for Enterprise Web Scraping

This is the one presentation that does repeat Austin. Zytans Ian Lennon (CPO) and John Rooney (Dev Engagement Manager) introduced their new VS Code agentic spider builder to the audience in Dublin. 

John first gave a tech demo where he wrote a quick spider to scrape and structure some e-commerce pages. Ian then took over and addressed the bigger picture from the business point of view. The extension is free for now, so we recommend giving it a go to see if it works for you. We were told that it’s already saved a lot of dev resources in Zyte’s internal use. 

As for the talk, we suggest watching the Austin version. John executed the demonstration live, which unfortunately resulted in the LLM executing itself mid-process. But even companies like Meta don’t always get live demos 100% right, so we respect John for his bravery.

Guess what Zyte’s Web Scraping Copilot is trying to solve.

Talk 2. Scraping a Synthetic Web: Dead Internet Theory Meets Web Data Extraction

If you thought the dead internet theory was fringe – or you didn’t know about it at all – this is the talk for you. Domagoj Maric, AI Customer Delivery Manager at Pontis Tech, described the many ways bots have infiltrated our browsing lives, manipulating facts and impacting our decisions.

It’s a sprawling talk filled with examples, personal experiences, and even an overview of relevant legislation. Domagoj went as far as to build his own social media bot, proving how cheap and fast this process is. To spoil it a little, 10k comments cost just $2, and this is with current token prices. 

While there was less to do with web scraping than the title led us to believe, this truly was a fascinating presentation that we recommend without reservations.

When they’re not sowing discontent in the West, Russian bots are busy making delicious cupcakes.

Panel 1. Antiban Panel

This is probably the only panel we’ve seen that brought bot makers and bot breakers to the same stage. It was hosted by Zyte’s CEO Shane Evans and comprised Antoine Vastel (Head of Research at Castle), Fabien Vauchelles (Scrapoxy), and Kenny Aires (Team Lead at Zyte). Antoine is a bit of a mythical figure in our niche, and he was able to participate because his current role doesn’t deal with web scraping that much. 

The panel addressed a range of topics, such as how anti-bot companies distinguish between good and bad bots, or how the busy month of November impacts the data extraction and protection industries. However, it mostly dealt with change: in detection techniques, the role of proxies, and the cost of web scraping in general. 

We learned a lot. One of the main findings for us was that proxies are becoming less important in the big picture, to the point where they’re now considered a weak signal. Even the consistency of a fingerprint is no longer the ultimate giveaway due to improving botting tools and edge cases from regular users. 

Anti-bots face the constraint of retaining a good user experience, bots are constrained by scraping costs, and no one knows what exactly to do with AI agents yet. A great discussion overall.

Can you find the impostor?

Talk 3. AI and the Web: What 2025 Changed and What Comes Next

Zyte’s Senior Data Scientist Ivan Sanchez returned to talk about LLMs. Compared to Austin, this presentation gave a more high-level outlook; it overviewed the prevailing trends and allowed itself to speculate a little. 

Ivan spent a lot of time talking about reasoning models. He believes that GPT-4o and beyond caused a revolution of sorts that not only improved answers but unlocked new capabilities. The paradigm shifted from guessing the next word to solving problems. Reasoning models become even more powerful when made into AI agents, which is where we currently stand. 

The next part dealt with broader market movements, such as the growing number of foundation models (including Google’s turnaround with Gemini and Meta’s setbacks), China leading in open source, concerns about a potential bubble, and agents as the new consumers of web data. The presentation is worth watching, especially if you’re not well acquainted with the developments in AI.

What an inspiring time to be a small publisher.

Talk 4. The Anatomy of a Request: Bypassing Protections and Scaling Data Extraction

A former Michelin-star chef, Kieron Spearing from Centric Software, now runs 5,000 scrapers that make 130M requests per day. It’s a pretty huge scale, if you ask us! Kieron shared his process for scaling web scraping operations and not going insane with maintenance. It was a practical and highly actionable talk.

According to the speaker, building resilient scrapers starts with the methodology. This requires experimenting with the request through cookies, headers, proxies, and other identifiers, until you’re left with the leanest working configuration. 
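To make the idea of a "leanest working configuration" more tangible, here is a minimal sketch of how such an experiment could be automated. It is our own illustration, not code from the talk: the `minimize_headers` helper, the sample headers, and the `fake_site_accepts` check are all hypothetical, and the check stands in for a real HTTP request so the example runs offline.

```python
from typing import Callable, Dict


def minimize_headers(
    headers: Dict[str, str],
    works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Greedily drop headers one at a time, keeping a removal only if
    the request still succeeds without that header."""
    if not works(headers):
        raise ValueError("the baseline configuration does not work")
    lean = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in lean.items() if k != name}
        if works(trial):
            lean = trial  # the header was not needed
    return lean


# Hypothetical stand-in for "send the request and check the response".
# This pretend website only cares about the User-Agent and the cookie.
def fake_site_accepts(headers: Dict[str, str]) -> bool:
    return "User-Agent" in headers and "Cookie" in headers


baseline = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Accept-Language": "en-US",
    "Cookie": "session=abc",
    "Referer": "https://example.com/",
}

lean = minimize_headers(baseline, fake_site_accepts)
print(lean)
```

In a real investigation, `works` would fire an actual request and inspect the status code and body, and the same loop could be repeated for cookies or proxy types. A greedy pass like this can miss headers that only matter in combination, so treat the result as a starting point rather than proof.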

As a chef, Kieron is a big proponent of preparation. If there’s one thing we took away, it’s that every minute spent in investigation will save ten in implementation. But there was much more: for example, that the browser’s dev tools may not honor the original header order, or that going through a website’s API is always worth it, even if it requires much more upfront unblocking.

Kieron’s awesome prep list for building a web scraper.

Panel 2. The Future of Data Laws: AI, Web Data, and Intellectual Property

The inimitable Sanaea Daruwalla, Zyte’s Chief Legal Officer, invited three more lawyers to talk about intellectual property in the age of AI. Nikos Callum came from the Fortune 500 company Wesco, Dr Bernd Justin Jutte of University College Dublin represented academia, and Callum Henry works alongside Sanaea at Zyte.

The discussion revolved around relevant legislation and legal concepts. It explored the EU’s AI Act with its concept of risk tiers. We found it baffling that the level of risk should be self-assessed, and that this doesn’t apply to personal AI use. According to the panelists, the EU’s opt-out requirement may also cause challenges, as there’s no set format for this procedure.

We also had the chance to learn about US law, in particular its concept of fair use. Finally, the participants discussed some recent high-profile cases, namely the Anthropic book lawsuit and Getty vs Stability AI. It seems like so far judges have tended to favor AI companies in interpreting transformative use, but nothing has been set in stone yet. 

The panel discussion ended on a funny note: when it comes to giving legal advice on web scraping, large language models are much more cautious than even lawyers! Go figure. All in all, this one is highly recommended.

This was the only way to get all four panelists on one screen.

Talk 5. The New Era of AI Data Collection: A Deep Dive into Modern Web Scraping

Fabien Vauchelles, the man behind Scrapoxy, brought his famed slides to talk about the race between bots and anti-bots. Together with his collection of monochrome ducks, Fabien covered the main developments in bot protection. Then, he demonstrated how to build a self-healing scraper. 

Fabien’s anti-bot part talked about several threats. The network fingerprint, for one, is something that’s hard to create and easy to detect. The browser scene gave little relief, too, as our current champion Camoufox is open source and thus has been studied to death, and serious scraping requires expensive custom solutions. The presenter further identified new signals, such as the audio fingerprint. At least CAPTCHAs seem to be reaching a dead end for anti-bot tech. 

In the second part, Fabien showed several ways to maintain scrapers with large language models. He wrote an MCP server that injects middleware into Scrapy scrapers. Upon failure, an LLM generates new code until the spider works again. All a human needs to do is verify the pull request. 
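The general shape of such a repair loop can be sketched in a few lines. To be clear, this is not Fabien’s MCP server or his Scrapy middleware: `self_heal`, `run_spider`, and `propose_fix` are hypothetical names of ours, and the "LLM" here is a hard-coded stub so that the example runs without any API keys.

```python
from typing import Callable, Optional


def self_heal(
    code: str,
    run_spider: Callable[[str], None],
    propose_fix: Callable[[str, str], str],
    max_fixes: int = 3,
) -> Optional[str]:
    """Run the spider; on failure, ask for new code and retry.
    Returns the first working version, or None if every fix failed."""
    for fixes_used in range(max_fixes + 1):
        try:
            run_spider(code)
            return code  # hand this over to a human as a pull request
        except Exception as err:
            if fixes_used == max_fixes:
                return None
            code = propose_fix(code, str(err))
    return None


# Stub: the spider "breaks" while it still uses the old selector.
def run_spider(code: str) -> None:
    if "div.price-old" in code:
        raise RuntimeError("selector div.price-old matched nothing")


# Stub standing in for an LLM call that rewrites the failing code.
def propose_fix(code: str, error: str) -> str:
    return code.replace("div.price-old", "span.price")


broken = "response.css('div.price-old::text').get()"
healed = self_heal(broken, run_spider, propose_fix)
print(healed)
```

The cap on attempts matters in practice: without it, a spider that can’t be fixed would burn tokens indefinitely. The human review step stays outside the loop, which matches the pull request workflow described above.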

Fabien’s conclusions weren’t very inspiring. In-house scraping is becoming too resource demanding for many new players; and at the same time, the internet is closing off. But hey: we’re still here, so it’s not all doom and gloom.

A hero isn’t someone who doesn’t fall but rather someone who gets back up.

Talk 6. IPv6-Powered Web Scraping: Design Patterns, Pitfalls & Practical Checklists

Yuli Azarch, CEO of Rapidseedbox, explained why IPv6 proxies should be used in web scraping and how to do that effectively. The why part basically boiled down to IPv6 adoption and the costs associated with getting IPv4 IPs; the how part had few slides but made up the meat of the presentation.

It turns out that websites don’t treat IPv6 addresses as individual IPs – rather, they evaluate them in /48 blocks (roughly 1.2 septillion addresses each). That’s why it’s best to have multiple /48 subnets or, for serious web scraping jobs, go as far as a /29. Yuli found that setting up reverse DNS delegation also helps prevent blocks.
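If the subnet math feels abstract, Python’s built-in `ipaddress` module makes it easy to check. The sketch below is our own illustration rather than something shown in the talk: the `block_of` helper is hypothetical, and the addresses come from the 2001:db8::/32 range reserved for documentation.

```python
import ipaddress


def block_of(addr: str, prefix: int = 48) -> ipaddress.IPv6Network:
    """Return the /prefix network that an IPv6 address belongs to."""
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


a = "2001:db8:1:0::1"
b = "2001:db8:1:ffff::abcd"  # very different address, same /48
c = "2001:db8:2::1"          # neighbouring /48

same_block = block_of(a) == block_of(b)
other_block = block_of(a) == block_of(c)

# A /48 leaves 128 - 48 = 80 bits for addresses.
addresses_per_48 = block_of(a).num_addresses

# A /29 contains 2^(48 - 29) separate /48 blocks.
blocks_per_29 = 2 ** (48 - 29)

print(same_block, other_block)
print(addresses_per_48)
print(blocks_per_29)
```

So a website that bans per /48 sees `a` and `b` as the same visitor no matter how many addresses you rotate through inside that block, while a /29 gives you 524,288 separately judged /48 blocks to spread requests across.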

Frankly, we had high expectations for this talk. Can you use IPv6 to scrape Google? Amazon? How many requests can you realistically make per /48 subnet? What about IPv6-only residential proxy pools, which are now emerging as a new product? Alas, none of these questions were answered. But even if we ended up a little disappointed, we didn’t feel like our time was wasted watching the talk. Playing it at 1.5x speed and skimming through the first half can give you good bang for your buck.

The blueprint “it’s cheaper if you get in early” sounds vaguely familiar.

Ending Remarks

Thanks to Zyte for organizing yet another great conference. If you’re human and managed to get this far down the page – you have our sincerest admiration and respect. Otherwise, please give us your best cupcake recipe in the comments!