Is web scraping ethical? What is the future of online data collection? In this interview of our ethicality series, the Executive Director at i2Coalition, Christian Dawson, will share his thoughts about ethical data harvesting and issues in data scraping practices.
i2Coalition, or the Internet Infrastructure Coalition, is a US-based organization that speaks for web hosting companies, data centers, and anything tech or web related. Established in 2012, the organization has led many awareness campaigns, but we’ll discuss the most recent one: the Ethical Web Data Collection initiative.
What prompted i2Coalition to launch the Ethical Web Data Collection initiative?
“So, first, a little bit about the i2Coalition. We are a trade association that chooses to be the voice of the internet’s infrastructure, particularly when it comes to dealing with legislators and regulators, the people whose job is to try to solve problems on the internet through legislation.
Often those legislative leaders don’t have the knowledge to make laws that aren’t going to, what we call sort of, quote-unquote, “break the internet”. But that’s usually not their intention. They just don’t have the experience that the companies in this industry do. So, one of our jobs is to go in and be advocates for the small to medium businesses that build the internet’s infrastructure to ensure that it continues to be the incredible tool we all use daily.
One of the issues that we have in doing so, and being a broad advocate for the internet’s infrastructure, is that when new segments of the internet evolve, there tend to be a good deal of issues that need some resolving and some work to mature to the point where you can clearly advocate for a business segment. The web scraping industry is one where responsible organizations are doing good things, but some are cutting corners or finding paths that aren’t ideal.
Some work needs to be done to organize that segment of the internet’s infrastructure in the way other segments have been organized. The goal was to bring responsible parties into the room to figure out how we can collectively make the internet a better, safer place for their segment of the internet’s infrastructure, so we could advocate better for them and the internet in general.”
The data aggregation industry has many misconceptions regarding ethicality. So, how does the campaign address some of the concerns we’ve heard expressed in the market?
“When it comes to ethics in the data aggregation industry, you have somewhat mixed perspectives. But for the most part, when you sit down and talk to many of the major organizations that are doing work out there, you find that there are core principles that most of them hold across the board when it comes to important issues such as data privacy.
The issue is that not all companies in this space or any space are necessarily ethics-oriented.
And part of the job of an organization like this is to bring responsible parties in the room together and to figure out how we can sort of put on paper the expectations for this industry to be considered ethical operators. In a way, we’re trying to figure out how to set a bar. And there are going to be players that are above the bar and below it. But identifying what that bar is collectively is an important task that this industry and most industries should do.”
Web data harvesting is a broad process: from proxy service selection to data scraping practices. Does your initiative touch on all parts of the procedure? What factors call for most of your attention?
“When it comes to looking at the parts of the data harvesting and data collection process, we started with the perspective that all potential segments of that process are on the table. And we wanted to take a very broad view of the ecosystem in general regarding the different parts of data collection. So, different types of companies can be at the table trying to help figure out what that bar is that determines a base level of ethicality in the web data collection industry.
As we continue our process of talking with these companies and trying to figure out how we can collectively come to a set of principles that everybody can agree to, the goal is to think very broadly, think about not only the process but the entire ecosystem of players involved in the process.”
What does the group hope to achieve? And why is it important to the industry?
“So, what the group is ultimately trying to achieve in this first set of exercises that it’s undertaking is creating a set of core principles. Ultimately, these will be, in some respects, a note to the public saying that we, as an industry, are collectively stating that these are what we consider ethical practices for the data aggregation industry. And anything that is not at this bar or above it, we’re not going to advocate for; we don’t consider those ethical practices.
Having that on paper and clearly stated by leaders in this industry, I think, is going to give the web data collection industry, the data aggregation industry, the ability to go out there and talk about solutions to problems on the internet in a way that gives them much more credibility. Because you understand that when they as an industry are talking, they’re not talking about the activities of every player, but the ones behaving ethically. And there is some grounding for that sort of ethical baseline.”
i2Coalition has previously led similar campaigns, like the VPN Trust Initiative. Did you apply any insights from past experience to this campaign?
“So, we have done this kind of work before. We have a very successful program with many major VPN companies that we put together a couple of years ago. It is called the VPN Trust Initiative, and you can find it on the internet at vpntrust.net. We went through a very similar process with that part of the internet’s infrastructure, where VPN providers went out there and said, listen, we, as an industry, certainly have players that are using our important tool, which does a lot of good for the internet, for nefarious purposes, for illegal purposes. And that’s not something that an industry can advocate for.
And so, to figure out how we can talk to people about the good that the VPN industry does, we need to go out there and be very clear about what we are and are not speaking up for and advocating for. The VPN Trust principles are now in their second cycle, and you can go and read all about what the VPN industry is promising it’s doing as far as its own ethicality. I think that it’s very important work. I also think that it’s excellent to see that work continue to evolve. They went through a public comment period where they had the public weigh in on their principles last year. It’s something that perhaps the web data collection companies can look to as a model in the future, so that they can learn from the people out there in the general populace about what they’re expecting from that industry.”
Do you hope to grow the initiative beyond the five inaugural members you have now? And what are the action steps for attracting more industry players?
“So, the Ethical Web Data Collection Initiative is currently looking for new members. They’re looking to expand from the founding members that have come together to create this organization. Because now is the point where the organization has been founded, and it is time to have these important conversations about what this initial document that they are in the process of creating should be. The more voices that are part of this ecosystem, the better.
The more companies that can help shape the future of this industry through periodic, deliberate work, coming together and talking, being in meetings together, putting their heads together, and figuring out how to set that bar, the better. And so, if you are interested, you can learn more about becoming a member. They would love to have you become part of the group.”
You often highlight how vital industry-led contributions are. Can you elaborate on this?
“So, it is really important that the industry steps up and advocates for itself. When we look at the internet infrastructure, what we don’t really think of is that it’s still predominantly driven by small to medium businesses. Not necessarily the types of organizations that traditionally could organize and have a clear voice for themselves. When legislators and regulators are figuring out how to solve problems online, they generally think of the internet as Google, Facebook, Amazon, Microsoft, and a few other big companies. They’re the ones that spring to mind. Or if they do think of the internet’s infrastructure, they think of the major telcos.
But the fact of the matter is that there are all these small businesses, the ones racking and stacking servers in data centers or carrying out the activities that drive the internet’s infrastructure daily. And most of these are small to medium businesses. They’re the ones whose activities need to be understood by these legislators and regulators to make sure that problems are solved online and that it is still possible to have a small business on the internet. Because that’s where innovation comes from. You have to create an environment where you can still start an internet company out of your basement, your dorm room, or the middle of nowhere. Otherwise, we’re not going to have the internet that we all deserve. We will have the internet of a few major corporations giving you what they say you deserve.
So, protecting that innovation is one of our jobs.”
What would you say to an executive at another data aggregation company interested in becoming part of the initiative?
“If you are part of another data aggregation company and you’re considering becoming a part of this initiative, I would say that the work we’re doing right now is going to change this industry. First, our goal is to have a really clear and effective document that we put out there to the rest of the world and that really gets recognized, to help identify to people, in a very clear way, the work that is done by this industry when we talk about ethics.
When you are talking about what’s right and wrong to do, you will hopefully be referring to this document. Your organization being associated with this document is, I think, an important step in making sure that the industry matures. The goal immediately following that is to look for ways this industry can better advocate for itself. The industry doesn’t want to advocate for players that aren’t playing by the rules. The industry doesn’t want to advocate for the ones that are causing problems or that are leading people to consider legislation against this industry. They want to advocate for the groups of players within this industry that are playing by the rules and that are coming together collectively to try to figure out how to make this industry a better place.
I would say that you should think strongly about being part of that group that’s trying to play nice together and figure out a path forward that will take this industry to the next level.”
Is there any way a regular consumer could contribute to the initiative on a personal level?
“So, when it comes to contributions from a regular consumer, I would say that I think it’s an important part of the process for the industry itself to go out there, reach out to individual consumers and make them part of the process. I don’t think that they’re quite there yet. They need to finish internal conversations before they can have a draft of principles. But my full expectation is that they will take the work they do over the next few months and figure out what they believe are the core principles that make up an ethically sound set of activities within this space.
And they want to put them before the average person so that person can give them feedback and let these companies know what they expect. When that happens, you should absolutely get yourself involved. Make sure that you comment and participate in any collective outreach that this group does, so that your voice can be heard. Because based on experiences that we’ve had with other groups that we’ve worked with, the things that are contributed during public comment periods on issues like this make a big difference in making sure that the industry is responding to the needs of the industry’s customers.”
Do you see any tendencies in how different industries evolve in terms of ethicality, and if so, what stage is data collection at?
“What we tend to find in any new industry is that there tend to be many new entrants that are trying new things. And when you end up trying new things, you know, some of them end up being outside the bounds of what we would call ethics or legality. It tends to happen more at an industry’s beginning than in a more mature industry. And this cycle has repeated itself not only through the history of the internet, but probably through the history of every type of new industry that’s ever come up in the history of humankind.
And so, one of the reasons we’re working with this industry right now is because they’re at this really unique inflection point where there have been a lot of new and emerging companies in this space. Many of them are doing wonderful things for the world. But ensuring that the responsible ones, the truth-telling ones, the ones that are really thinking through how they can meet their obligations under the law, are those setting the tone for the rest of the industry is extremely important. It needs to happen for the industry to mature past the initial stage, what I call a gold rush stage. That happens not just in this industry but in every industry.
So, that’s basically what we’re trying to foster with this group. We’re trying to take the companies that want to see this part of the industry mature, that want to figure out how it can follow the rules, by really starting to put some of those rules on paper. And we’ll try to move it past a gold rush stage and into a mature stage. It’s an important step in making sure that i2Coalition follows its mandate of creating a voice that builds a better internet and makes it a better, safer place.”