Plenary session
27th October 2020
4 p.m.

MARIA CARRIEDO: Hello, everybody. Welcome to the four o'clock session. I am Maria, and I will be chairing this session with my colleague Dmitry, and we are starting with a little bit of housekeeping.

Remember that if you want to ask any questions, you can do two things: you can ask it in the Q&A section in the window at any time, or you can wait until the speaker finishes the presentation and then you can ask for the audio queue, and we will be able to listen to you and to your question but not to see your video.

And also, remember we have several candidates for the RIPE Programme Committee today. We have four candidates and two seats available, and ^ and I are not going for re‑election, so these seats are available. So we have four candidates, and you will be able to vote for them from now, from four o'clock. The four candidates, and forgive me if I pronounce your names in the wrong way: Fernando Garcia, etc. So these are the candidates, you have their biographies on the RIPE 81 programme website, and, having said that, our first speaker today is Giovane Moura, who is going to present "Clouding up the Internet: how centralised is DNS traffic becoming?" The floor is yours.

GIOVANE MOURA: All right. So, good afternoon everybody. My name is Giovane Moura. So this is work that I did together with Martin and Christian and also my colleagues in New Zealand. This is work that is actually being presented live right now at IMC as well as here, so my colleague, Wes, is doing the presentation at IMC, the conference where the paper is from, and I'll be covering it here.

So, let's give a little bit of context before I move into the detail of the paper.

There have been some major concerns over the last few years over Internet centralisation. Just last week, the Department of Justice has just accused Google of illegally protecting its monopoly.

In Europe, this is July 2020, you will see regulators in Europe are trying to curb the power of the big tech companies. We have some pushback from Europe and the US, and specifically the US Congress also condemned big tech's monopoly power and urged their breakup. There are a lot of things that have been happening this October. And it's not only governments and regulators pushing back, we have also seen a draft in the IETF. This is a draft where the author discusses specifically the consequences of centralised architectures on Internet infrastructure. I was sitting at the IETF, I don't remember which one it was, in one of those anonymous hotels, when I had this idea of doing this paper.

And there are various risks associated with centralisation in general. From a technical point of view, the major concern is that it creates a single point of failure. Of course there are other concerns, such as privacy, monopoly and market consolidation, but, technically speaking, we have seen in the past what happens when a major provider has disruptions. In this case here it's the Dyn attack: there was a denial‑of‑service attack against Dyn, a major DNS service provider, and anyone in red there on the US map had problems connecting to major websites like Spotify, New York Times, Netflix and a bunch of others.

Another DNS provider, Route 53 from Amazon, also had an attack and some services also suffered from that. So that's the downside of centralisation.

And the question I had in this paper is, like: How can I actually measure centralisation? And it is a very difficult question, because what metrics are you going to use to measure it? Are you going to talk about number of users? Are you going to be talking about the traffic? Are you going to be talking about infrastructure? Market share?
So, I think the short answer is there are many ways to measure, and we are a bunch of DNS folks, so we are going to look at it that way. What we did was to look into DNS traffic. We look at traffic that comes not directly from users, but actually from resolvers: these are DNS servers that, on behalf of the users that send queries to them, forward those queries to the authoritative servers, the servers that actually know the answer. So we measure here at the authoritative servers how much traffic comes from each provider, and I'm going to show where we actually measure that.

So, what do we measure? We measure traffic at two country‑code top‑level domains: .nl, from the Netherlands, and we also measured .nz, New Zealand, with 4.8 million inhabitants and 700,000 domain names. And b.root, one of the 13 root servers, which covers a much larger population and has a smaller zone, of course, only the TLDs. So we didn't want it to be biased towards just one country or one zone; that's why we combine all the different vantage points. What we do is, we pick five of the largest content providers on the Internet: Google, Amazon, Microsoft. We also chose Facebook because it's the biggest social networking platform, and they run a CDN for that, and we also chose CloudFlare, because they run a public DNS service, and we wanted to see the influence they have too.

We look at how much traffic comes from those providers to the Netherlands, New Zealand and b.root. For the country codes, we get snapshots per year: a week in 2018, in 2019 and in 2020, for .nl and for New Zealand, and specific dates per year for b.root. So, for example, in the specific week in 2020 we got like 13 billion queries for .nl, and out of those 11 billion were valid, and then the numbers for autonomous systems and so on and so forth. These are the datasets that we are going to look at and compare.

So, what did we find about these Cloud providers?
So, this graph here shows the percentage of traffic that comes from each of these Cloud providers per year for b.root. It's not that much. Roughly 10% of the traffic. It has grown from 2018 to 2019. But what happens when you look at New Zealand, for example? We see that much more traffic is actually originating from those Cloud providers. That is, if you look at all the traffic in New Zealand for that domain, how much of it actually comes from the Cloud providers? If you combine all of them, roughly 30% in 2018 came from only five providers, five companies.

For .nl it's even worse. You'll see here the differences. So, what I can actually conclude here is that roughly one third of the traffic for New Zealand and the Netherlands actually originates from five Cloud providers, and, as you can see there, there were more than 40,000 autonomous systems in total. That's a big concentration. That's for .nl and .nz; b.root has less traffic from the clouds, and one of the reasons is that the roots get a lot of traffic, a lot of garbage traffic, and a lot of that comes from what are called Chromium‑based garbage queries.
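A minimal sketch of how such a per-provider share could be computed, assuming each query seen at the authoritative server has already been mapped to the origin AS of its resolver. The AS-to-provider table below is illustrative only (one well-known AS number per company); the talk does not say which AS lists the study used, and the sample data is made up.

```python
from collections import Counter

# Illustrative mapping only: one well-known AS number per provider.
# The AS lists actually used in the study are not given in the talk.
PROVIDER_BY_ASN = {
    15169: "Google",
    16509: "Amazon",
    8075: "Microsoft",
    32934: "Facebook",
    13335: "Cloudflare",
}


def provider_share(query_asns):
    """Return the percentage of queries per provider, with the rest as 'other'."""
    counts = Counter(PROVIDER_BY_ASN.get(asn, "other") for asn in query_asns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy input: one origin AS number per observed query (made-up numbers;
# 64500 is an AS number reserved for documentation).
sample = [15169] * 3 + [13335] * 1 + [64500] * 6
shares = provider_share(sample)
cloud_total = sum(pct for name, pct in shares.items() if name != "other")
```

Summing every entry except `other` gives the combined cloud share, which is the figure the talk puts at roughly one third for the two ccTLDs.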

Interestingly, also, if you compare New Zealand to the Netherlands, we see that Google has a smaller penetration in New Zealand than in the Netherlands, so it's interesting to compare different countries to see how each Cloud provider performs in terms of query volume for a particular TLD.
So the DNS can be used to store different types of records, and in this figure here, each column shows a different Cloud provider and the colours show the different types of DNS records. It can be IP addresses, names, whatever. But if you look at the Netherlands here, most of the queries for '18 and '19, and 2020 as well, are actually A queries, these are for IPv4 addresses. New Zealand has a similar pattern, but we see some other folks asking NS queries, I'm going to get into that later. And b.root, a lot of A and AAAA, AAAA meaning IPv6.

So we can also look at how they change per year. These are all the queries we see for NL, NZ and B, per Cloud provider, and how they break down when you compare them. Again, A queries are the most common ones. In 2019, we started to see some providers having more NS queries, and in 2020 a similar pattern.

So, the short answer is that most queries are actually for A records. There is a but, and what is the but about? Well, you see here, I'm just repeating figures here, this is 2018 for .nl, and this is 2020. Let's focus here on Google. If you look at Google in 2018, more than 60% of queries were actually A queries, but if you look at 2020, the majority of queries are actually NS queries, they are asking for the name servers of domain names. And I went to analyse why this was happening. And I realised that the queries from Google servers were actually being QNAME minimised. There is an RFC that specifies that you should only send the minimal necessary information to the server to preserve users' privacy, so I realised that that was what was happening, and we actually confirmed with Google that they deployed that: in December 2019 they were partially deploying it, and in January 2020 it was fully deployed. So our data collection actually shows that, because if you deploy QNAME minimisation, the RFC that standardised it states that the first query you have to send is actually an NS query and not an A record query, so that's why you see the colours going from purple to blue here.
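A minimal sketch of why QNAME minimisation turns A queries into NS queries at a TLD's authoritative servers. This is a simplified model of the RFC 7816 behaviour, not Google's implementation: it assumes the queried name sits under the zone, and the function name and example domain are hypothetical.

```python
def query_sent_to_zone(qname, qtype, zone, minimise):
    """Return the (name, type) that a resolver sends to `zone`'s servers.

    Simplified model: a minimising resolver reveals only one label more
    than the zone it is asking, and asks for NS. If that shortened name
    is already the full name, the original query is sent unchanged.
    """
    labels = qname.rstrip(".").split(".")
    zone_len = len(zone.rstrip(".").split("."))
    if not minimise or len(labels) <= zone_len + 1:
        return qname.rstrip("."), qtype
    child = ".".join(labels[-(zone_len + 1):])
    return child, "NS"


# What the .nl servers see for a lookup of www.example.nl:
classic = query_sent_to_zone("www.example.nl", "A", "nl", minimise=False)
minimised = query_sent_to_zone("www.example.nl", "A", "nl", minimise=True)
```

Without minimisation the .nl servers see the full name and the A type; with it they only see an NS query for `example.nl`, which matches the shift in query types described for Google.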

What we did was extend our data set and look a little bit back in time, and we plotted, for Google only, the percentage of queries over time per month, and the blue line here is the one that matters, that's the one for NS queries. What do we see here in October 2019? Very few of the queries from Google to .nl were NS queries, but then almost 20%, and in January 2020 they stabilise at almost half of the queries. And at the same time, you see a drop of A and AAAA queries here.

So, we could actually see that. And that's an interesting finding, because a lot of people simply bash centralisation, and they have a right to do that. But there are also pros of centralisation, which is that when a security feature, a privacy feature like QNAME minimisation, RFC 7816, is deployed on a large infrastructure, you are going to benefit a lot of users at the same time. It's the same for DNSSEC validation, that's a pro of centralisation.

And we decided also to look at where the queries from Google were coming from, and Google very kindly makes publicly available the IP addresses from which they actually send queries to you, and which of those IP addresses belong to the public DNS service. It turns out that most of the queries are actually originating from Google Public DNS, roughly 85, 88% for the Netherlands here. So it's pretty much on the public DNS that what we were seeing is being deployed, and you can confirm it in that way.

It's also interesting to compare junk queries. Junk queries are queries for domains that don't exist in the zone, and you can look here at how much junk we see from each content provider for the Netherlands per year; for example, this one here, CloudFlare, had almost 40% junk queries in 2019. Then, in 2020, it reduces a little bit as a share of its own queries, so we are comparing here, let's say, CloudFlare to CloudFlare itself.
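A rough sketch of how a per-provider junk ratio could be computed, assuming "junk" means a query whose second-level name is not registered in the zone. The paper's exact definition may differ (for instance, it could be based on response codes), and the function names, zone contents and query list here are made up.

```python
def is_junk(qname, registered, zone="nl"):
    """True if the query is for a name that does not exist in the zone."""
    name = qname.lower().rstrip(".")
    if name == zone:
        return False  # a query for the zone apex itself is legitimate
    second_level = ".".join(name.split(".")[-2:])
    return second_level not in registered


def junk_ratio(qnames, registered, zone="nl"):
    """Fraction of queries that are junk; 0.0 for an empty sample."""
    if not qnames:
        return 0.0
    return sum(is_junk(q, registered, zone) for q in qnames) / len(qnames)


# Toy data: a zone with a single registered domain and five queries.
registered = {"example.nl"}
queries = [
    "www.example.nl",
    "example.nl.",
    "nl",
    "qwertyuiop.nl",
    "foo.bar.invalid-name.nl",
]
ratio = junk_ratio(queries, registered)
```

Because the ratio is taken over each provider's own queries, it compares a provider with itself across years, as in the talk, rather than comparing absolute volumes between providers.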

It's interesting that in New Zealand it was Amazon in 2020 sending more junk queries than the others, proportionally speaking. And for b.root, it was CloudFlare, all the way up here in 2019, but in 2020 they sort of normalised.

So we see all these differences also in junk traffic towards the clouds.
It varies a lot, it varies widely per zone. And there are a bunch of things that led to a reduction in junk. As you can see in b.root in 2020, probably NSEC aggressive caching and Chromium deployments now domains junk as well. That's one of the things you can measure.

Since we have these vantage points, we can look back at the content providers to measure their technology adoption. We get queries from them, and they come with different protocols. You can see if they are deploying DNSSEC, IPv6, and how much of the traffic is TCP, so we also did that, because our goal is to compare the different clouds in regard to technology adoption. Personally, I always thought these guys are all up to date and up to speed because they are large, so probably they use the latest technologies, and we can figure out now if that's the case.

So, DNSSEC provides authenticity and integrity, and we expect the clouds to have a similar usage of that, and how can you measure that? You can measure it by analysing the proportion of queries that ask for records related to DNSSEC validation, DS and DNSKEY queries specifically. Here are the graphs again, this is for .nl in the week of 2020. It's actually hard to see the DS queries for Google; you can see here it's a fairly small proportion, because they don't need them very often.

Some in connection not sent, but let's just get a proportion here. Microsoft has sent 1.1 billion queries in this period here, but only like 0.02 million were actually DNSSEC queries. And CloudFlare has sent half as many queries as Microsoft but way more DS and DNSKEY queries, so this is an indication of who is validating here, whose resolvers are asking for DNSSEC‑related records. It shows that different clouds are using the technology in different ways.
So, the conclusion here for DNSSEC adoption: not all the clouds are the same. Some are using it more than others, specifically for DNSSEC.
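To put the Microsoft figures quoted above into proportion, a small worked calculation, assuming the two numbers (1.1 billion queries in total, 0.02 million DS and DNSKEY queries) are exactly as stated in the talk. The function name is hypothetical.

```python
def dnssec_share(dnssec_queries, total_queries):
    """Fraction of a provider's queries that ask for DS or DNSKEY records."""
    if total_queries == 0:
        return 0.0
    return dnssec_queries / total_queries


# Figures quoted in the talk for Microsoft, .nl, the 2020 week.
microsoft_share = dnssec_share(0.02e6, 1.1e9)
microsoft_percent = 100 * microsoft_share
```

That works out to roughly 0.002% of Microsoft's queries, or about 1 in 55,000. The talk gives no count for CloudFlare, only that it sent half as many queries but far more DS and DNSKEY queries, so no figure is computed for it here.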

And we looked at IPv6 and IPv4 usage and adoption as well. What we see here is that Google and CloudFlare usually split their traffic, for both the Netherlands and New Zealand, roughly equally over v4 and v6. Facebook, actually in 2016, as you can see here for New Zealand and for the Netherlands ‑‑ so this is the Netherlands, and this is v4 and v6, and if you go down here, we are splitting 2020 into two parts, v4 and v6 ‑‑ you see that for 2020 and 2019, 76% of the traffic from Facebook to .nl is actually v6. And even more for New Zealand here, as you can see.

And the ones that have very little IPv6 traffic are actually Microsoft and Amazon; it's roughly zero here, and 2% here for Amazon.

Which is interesting, because, again, it shows the differences in adoption of technology.

So, we wondered why Microsoft and Amazon have so little IPv6 adoption. Why so few IPv6 queries? And if you compare the number of unique IP addresses we see from Amazon, for both New Zealand and the Netherlands, we see the vast majority are IPv4, and Microsoft has a similar pattern. So, they simply seem to have way more IPv4 addresses than IPv6 addresses.

Resolvers typically tend to send more queries to servers that are close by in terms of measured latency. So, we did this for Facebook, because they use reverse DNS records, very nice, so you can actually map every single IP address to a certain location. They use airport codes, and we identified 13 locations from where they send DNS queries, and we can work out what percentage of the traffic from these locations was v4 and v6. What we found per location, so each column here is a location: we see that the locations here, 8 to 10, send proportionately way more v4 traffic, and if you look at this figure here, you see that the RTT over v4 from these locations is roughly around 50 milliseconds. But v6, which is this point showing higher here, is above 120. You can expect those locations to send more v4 traffic, because resolvers tend to stick to the lowest latency.

Unfortunately, I cannot explain it for the biggest location, because we didn't have any TCP traffic to do the analysis. This analysis of RTT was done passively, using the TCP measurements that we had.

We can also analyse UDP versus TCP usage per provider. So, UDP dominates the traffic for New Zealand and the Netherlands, and this is sort of expected, because it's much faster to use UDP than TCP: you can get a response within one round‑trip time. And TCP actually requires you to have an additional RTT, a handshake, before you get an answer. And TCP in DNS is typically used for large answers: if your answer has been truncated, then the resolver should fall back on that.

But, you see here, for Facebook, you start to see like 15 and 14 percent for the Netherlands and 17 to 15 for New Zealand being sent over TCP, and this is the only one that seems to be doing that, and we also investigated why this is happening.

So, looking at the data for 2020, we look at the queries coming from Facebook. In these queries, the resolvers can announce their EDNS0 buffer size; this is a way for the resolver to say: hey, this is the largest answer I can handle in a single UDP packet. And if a DNS query has a large answer, like if you have an answer, like 50, 100 bytes, but it doesn't fit in your EDNS buffer size, the server is not going to send the full response to that query; it is going to truncate it, and then Facebook would have to send a TCP query to fetch all the data again, and the resolver does that automatically for you in an easy way.

We see here in this figure that, for Facebook, 30% of the queries that we see here actually have a UDP buffer size of 512 bytes. This is small. If you look at Google and Microsoft, theirs are far larger, more than 1,024. One of the reasons why Facebook uses more TCP may be related to the size of their EDNS buffers.
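A minimal sketch of the truncation decision described above, assuming the standard EDNS0 rules: without EDNS the UDP limit is 512 bytes, and an advertised size below 512 is treated as 512 (RFC 6891). The function name and the example response sizes are hypothetical; real servers may also apply their own lower cap.

```python
CLASSIC_UDP_LIMIT = 512  # DNS UDP payload limit when no EDNS0 size is advertised


def answer_transport(response_size, edns_bufsize=None):
    """Return 'udp' if the response fits in one UDP message, else 'tcp'.

    'tcp' stands for: the server sets the truncation (TC) bit and the
    resolver retries the same query over TCP.
    """
    if edns_bufsize is None:
        limit = CLASSIC_UDP_LIMIT
    else:
        # Advertised sizes below 512 bytes are treated as 512.
        limit = max(edns_bufsize, CLASSIC_UDP_LIMIT)
    return "udp" if response_size <= limit else "tcp"


# The same 800-byte response (size made up), two advertised buffer sizes.
small_buffer = answer_transport(800, edns_bufsize=512)
large_buffer = answer_transport(800, edns_bufsize=1232)
```

With the small buffer the response is truncated and costs an extra TCP exchange; with the larger buffer it is delivered in a single UDP round trip. That is consistent with the talk's suggestion that small advertised buffers may explain Facebook's higher TCP share.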

So, that brings us to the end. Clouds are not at all the same. So, the original idea that I had in the beginning, that they would use technology similarly, did not prove to be the case. What we found, actually, was that these five cloud content providers are responsible for one third of the queries for the two ccTLDs, the Netherlands and New Zealand, and there is a massive, massive variation in technology adoption for DNSSEC, transport protocols and routing. And centralisation has pros and cons: we have shown the pros, that when you deploy a security feature, it benefits many users at once, but if it breaks, it can affect many users at once, for example in the case of the Dyn attack. And this figure on the right side shows different types of clouds. These are actual real clouds, they all have names. And that's exactly what happens in the real world. We have these Cloud providers but they are not all the same.

And if you are interested in the paper, you can download it here and I think I'll be open for questions. Thanks very much.
(Virtual applause)

CHAIR: I hope I'm clear enough. I see one question from Patrik Tarpey. And please don't forget to queue up for the mic if you prefer to ask a question personally. The question is: Given that Facebook does not operate public DNS resolvers, does the significant presence of DNS over TCP suggest in‑app name resolution via DNS over TLS? So the question is, again: given that Facebook doesn't operate its own public resolvers, do the DNS queries over TCP suggest that the resolution in the Facebook app, I suppose, uses DNS over TLS? That the TCP use is due to the app's use of TCP transport for the DNS protocol?
GIOVANE MOURA: What we measure is traffic coming to authoritative servers that are in New Zealand. These servers don't support TLS. I know, I can speak for them because I work for them: we don't support it, we support DNS, standard DNS and UDP. I cannot say why Facebook is sending more queries over that, that's their infrastructure. Our data does not allow us to talk about that. The only thing I could point to is the EDNS buffer size, but yeah... I don't know what's behind the resolvers, to be honest.

CHAIR: Understood. I'm checking the audio queue and I don't see anybody with more questions. I give it a minute. Or, folks, don't hesitate, we have a couple of minutes for that. We are right on track with our session.

Thank you for that.

So I guess, okay, one last check before we close. I don't see anybody in the queue. So just a reminder, the button to ask a question with your own voice is on the top left. There is a trio of buttons. The mike and a little blue square means "I want the mike", and your name would appear in yellow and I would let you speak, or my co‑chair would let you.

So, thanks. I guess it will be it for you today and I'll introduce our next speaker. It's Boris Mimeur. And it's a MANRS, or Mutually Agreed Norms for Routing Security ‑ Project Update. So, please go ahead. You have about two minutes for you and then maybe a few for Q&A. Thank you.
BORIS MIMEUR: Thank you very much. So, as introduced, my name is Boris Mimeur, and I am going to give you a quick update on the activity at MANRS. I'll start by just setting the scene for MANRS, using a fairly wordy slide, but it's just the kind of vision and mission statement. So, at the end of the day, MANRS is all about ensuring inter‑domain routing security, so how do you make sure that the Internet is secure when it comes to routing?
I can read just the mission statement, it's all about improving the security and reliability of the global Internet routing system based on collaboration among participants and shared responsibility for the Internet infrastructure.

So I think for all the participants on that call and the rest of the world, it applies to all of us.

In terms of the current programmes that exist: in terms of my day‑to‑day activity, I look after a platform that's close to network operators. I also look after an IXP locally in Canada, and I have those organisations comply with the MANRS criteria. There is also something for CDNs, Content Delivery Networks, and cloud providers. So that you can find on our website. It's not the point of this presentation, because it's a fairly short one.

It's more about what happened this year and the initiatives which were started by the Internet Society through their MANRS programmes for ambassadors and fellows. I will introduce myself a bit more: as a MANRS participant and member, with my organisation's support, I applied to the Ambassador Programme, I think it was late May/early June, and I was successful in becoming an ambassador. Today we have three main categories: there is training, research and policy.

Training is all about making sure that the larger community globally has access to training that would help them understand how to adopt RPKI ROA and basically the security of routing. Research, the one that I'm looking after, is a group basically ensuring that we are aware of what is coming next and what considerations we should have, I'll cover that in a minute. And policy is more looking at the high level and how you connect that back to maybe some of the country security strategies, and making sure that it's relevant to that world as well. So maybe more business‑centric, and making sure that decision‑makers are aware of that.
So, as I said, as an ambassador I have a group of people called fellows. I don't have a reference to them because it's a very short call, but I wanted to update you on some of the initiatives that we started.

We started the activity in the middle of summer. We ramped up ‑‑ we didn't know each other, we are spread across the world, so a very good representation of what is a global platform. So we started with something very simple. It's analysis. Sadly, there were a few major incidents, and there are continuous instances of routing incidents, so we looked at how to classify them and report them, and some of that is going to come out through some of the writing that we have done for that group.

We also, and it's something where I would welcome people's feedback on the value of it: as a professional working on networks and so on, when you research these new security‑related technologies, you generally go to the RFCs, but then you have to go and do the mapping yourself between the RFCs that exist, and that may not be easy to do yourself. So, we took on the fairly challenging and ambitious plan of basically mapping and linking all of them and having a fairly nice visual presentation, to accelerate the adoption and so that people do not have to go through this cumbersome exercise of: which one should I look at? Which one is the latest? And all that stuff. So again, that's going to come soon. But at the end of the day, it's kind of summarising and highlighting what you would need to know in the RFCs, and I have seen some of the progress we have made and it's going to be very useful. So we'll definitely get feedback from the community once it's released.
RPKI validators option review and assessment.
So that is something that was well‑discussed last week within our group following the announcement from the RIPE.

There is also, I have seen some of presentations before, referencing the presentation at the IMC, actually happening as we speak, so, there is ‑‑ there are quite a few transits. The first is from an RPKI validator perspective, which one should you take, there are some options. There are people aware of the ones you should go to, but it's not really something that's well known and rather than just saying hey you should go with that because everybody is using that, it's more like having a qualified answer and also making sure that there is a scaling aspect that's answered. So we're working actively on that.

It's very much a community effort, right, so it's not just MANRS saying, hey, you should do it this way. We have been working in collaboration with the Global Cyber Alliance on a survey to be released very soon. We are finalising the last few questions, and obviously my presentation here is to encourage you all to answer the survey and provide some feedback.

In terms of what we are thinking longer term: when you deploy RPKI, you definitely have some tooling required. Today MANRS provides the Observatory. As a member, you can access fairly detailed reports on the Internet as it's reported through BGP routing. But there are also some tools that could help with detection and reporting. Most of them are Open Source, some of them are commercial. At the end of the day, it depends on the role you have in your community, in your organisation, in the group you work with. So that was something like the position that you have with regard to the potential detection and mitigation that you would have to face.

The other one is RPKI ecosystem infrastructure scaling. So the adoption of ROA today is, I would say, growing, but not where it should be, and we are actively looking at anticipating the potential feedback that may not be shared naturally by people saying, well, it doesn't scale really well here. Instead of saying that, let's look at it and acknowledge that there may be a problem and acknowledge there is a solution. So people who are a bit reluctant to adopt something new, for business reasons or many different reasons, can actually find the answer right away.

It's a bit of a longer term initiative simply because we need time to do that.

Beyond ROA. So ROA is great, a great start; it's kind of the foundation or, let's say, the base to get to something bigger, but at the end of the day, BGP path validation is what is kind of missing today to really talk about reliable security for inter‑domain routing, at least from my perspective. And there is not a right or wrong answer today; there is nothing really fully implemented or that can be adopted. So the point that we have from our perspective is to look at the solution that's going to lower the bar by providing as many answers or directions as possible, and this one is more like directions, and ensuring that there is, again from a community perspective, a good collection of MANRS members contributing to the pros and cons and potentially a recommendation of a solution for path validation. The way we are planning on doing this is by interacting with some of the universities already working on that. Some of the people in the research group are from universities. The MANRS community also has connections to some people actively working on that, so it's just a matter of combining the right people and ensuring that they work in the same direction.
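A minimal sketch of what ROA-based route origin validation checks, and so what it leaves to path validation. It follows the valid / invalid / not-found classification of RFC 6811 in simplified form; the function name, the ROA tuples and the documentation prefixes and AS numbers are illustrative, and this is not any particular validator's implementation.

```python
import ipaddress


def validate_origin(prefix, origin_asn, roas):
    """Classify a BGP announcement as 'valid', 'invalid' or 'not-found'.

    `roas` is an iterable of (roa_prefix, max_length, asn) tuples.
    Only the origin AS and prefix length are checked; the rest of the
    AS path is not examined at all.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if route.version != roa_net.version:
            continue  # never compare IPv4 with IPv6
        if not route.subnet_of(roa_net):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"


# Toy ROA set using documentation prefixes and AS numbers.
roas = [("192.0.2.0/24", 24, 64500)]
ok = validate_origin("192.0.2.0/24", 64500, roas)
wrong_origin = validate_origin("192.0.2.0/24", 64501, roas)
too_specific = validate_origin("192.0.2.0/25", 64500, roas)
unknown = validate_origin("198.51.100.0/24", 64500, roas)
```

An announcement that keeps the correct origin AS at the end of a forged path would still be classified as valid here, which is the gap that BGP path validation is meant to close.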

It was meant to be very quick and short and sweet. So, I'll keep it extremely short and invite you to ask any questions, if you have any.

CHAIR: Thank you very much. I do not see any questions. And I do not see anybody asking for their voice to be heard. So, perhaps just the last Plenary of the day and people are a bit tired. I'll pass it to my colleague and thank you, Maria Isobel, for handling this session.
MARIA CARRIEDO: No problem. And thank you, Boris. As there are no questions from the audience, maybe I will break the ice to see if anybody else wants to ask.

You mentioned that you are one of the ambassadors of the MANRS programme. Some of us have heard a lot about this before. Maybe there are some attendees that are new that had never heard of it. You have mentioned there are fellowships, so if anyone wants to collaborate in this project, what should they do and what are the possibilities?
BORIS MIMEUR: So the best way to collaborate is by becoming a member. The way you become a member is by presenting your organisation and ensuring that you follow all the different steps. Depending on the organisation that you have, there is an action plan that's available online, so if, like me maybe a bit more than a year ago, you have never really heard about MANRS, don't wait; look at the action plan, integrate that in your road map to make sure that it's clear to the business decision‑makers that there is not just value, but that it's going to actually contribute to the larger adoption of RPKI ROA. For people not familiar with that, the other contribution is by leveraging the training material that's on the MANRS website. It's extremely well presented, very accessible. And the moment you become a MANRS member, you have access to the Observatory. So if it's all a bit of a cloud to you in terms of why you need to worry about that, the MANRS website is the best place to start, and then as you become a member, you'll have access to resources which will help you decide how you can tackle that inter‑domain routing security challenge. There is a lot of work to be done on that front, and the point of MANRS is to reduce the effort required from any organisation by lowering the barrier to entry, and what we do in the research group is making sure that we augment the content that's going to be delivered through training. And that's not just training: the point I mentioned about the RFCs is, how do you make sure that your vendors are compliant? I mean, which portion of the RFC do you have to work on and present to your vendors? So you don't have to worry about that.

There are some tiny little things here and there that we are trying to highlight, as you know, making sure that, number 1, the vendors are going to implement that properly, and number 2, that as a customer, or user, or an implementer of network, you actually have clear understanding of what you need to achieve.
MARIA CARRIEDO: Thank you very much. That was clear and I hope many people will join the project if they are not there yet.
BORIS MIMEUR: It's something that everybody should do: like, right now, ask yourself the question, why am I not a MANRS member? And as I said, I have been through that, I have done it for a network and for an IXP. The value for the IXP is that there are some tools now, I can mention IXP Manager, for example, that highlight the MANRS membership as well, so it becomes clear to other participants in an IXP that you have implemented that. It's another way of growing the base, because, at the end of the day, if you want to get RPKI ROA successful as a founding step towards, you know, BGP path validation, that needs to be adopted by everybody on the Internet, so that's why I decided to become an ambassador. I used to be in Europe, so I worked with RIPE before, and that's the reason I wanted to be here today, and I also wanted to thank you for the opportunity for us to present and promote and propagate the adoption of RPKI ROA.
MARIA CARRIEDO: Thank you, Boris. I see there are no more questions.

Okay, no more questions. Thank you very much, Boris, thank you to the speakers, thanks to Dmitry also, my co‑chair.

A couple of reminders. Please rate the talks. Remember, you can vote for the RIPE PC members and now the next session is at five, where we have the opportunity to meet the ‑‑ virtually the RIPE Chair and vice‑chair, so Mirjam and Niall will be there, and at six we have the virtual meet the RIPE NCC Executive Board. Thank you everybody for your attention. You still have a few minutes left to go refresh yourself before the next session, and well it was a pleasure to be with you. Thank you very much.

(Coffee break)