A Peer-to-Peer Hardcore History for Tech Nerds (Mathias's Version :)

🌈
Interviewee:
Mathias Buus, Distributed Network Technology Researcher, Creator of the Hypercore Project, and CEO of Holepunch

Interviewer:
jiang, Contributor of Social Layer

Core Contributor: Zhao Yang 赵阳
Visuals & Editor: dca5 & Shiyu

Editors' Note: This article is published in two parts. This is the first part, focusing on the history of P2P and a discussion of its fundamental technology stack. In the second part of the interview with Mathias, we will hear more application-focused perspectives: what Mathias considers the "true superpower" of P2P technology for building media and data distribution applications, and whether Pear's technology stack and ecosystem have the need and potential for integration with blockchain or Bitcoin.


Jiang: Hi, I'm Jiang. Today I'm interviewing Mathias Buus, founder of Holepunch. Could you introduce yourself and tell us about your work at Holepunch on the Pear kit?

Mathias: Yes, certainly. I'm Mathias. I run Holepunch as the CEO. It's a peer-to-peer company where we focus on making peer-to-peer technology that is uncompromisingly peer-to-peer, which means that we only do things where computers connect to each other and exchange data directly, and never rely on servers. We do extensive cryptography and related work. We don't do blockchain or anything like that, but we develop many primitives to make peer-to-peer technology work.

We have a platform called the Pear Runtime, which packages much of this functionality for developers, giving them access to these tools through the various modules we've been writing. It runs on both desktops and phones through our JavaScript runtime, called Bare, and it's very exciting. So we build a lot of foundational infrastructure for peer-to-peer technology, and we also make apps on top.

We have an app called Keet, which is our chat app, a peer-to-peer chat app that uses this technology to create a chat app similar to Telegram, but operating entirely without servers. Peers connect directly to each other. It's very private, very scalable, very cost effective, and we're very excited about Keet and everything it brings to communication. We also work on other applications in the peer-to-peer space, all utilizing peer-to-peer technology.


Jiang: That's cool. Can you talk about the history of Hypercore? How did you get involved, and how did it evolve into such a technology stack?

Mathias: Yes, certainly. I've been interested in peer-to-peer technology throughout my adult life. I used to work extensively with BitTorrent. When I was younger, about 15 years ago, I started writing JavaScript, and I wrote a lot of JavaScript code for BitTorrent. I had a great time working with BitTorrent. BitTorrent is a really impressive protocol, very simple in a good way, very robust, very scalable, and very resilient. It provides an excellent introduction to peer-to-peer technology because it's understandable, but you also see the limitations of BitTorrent to some degree.

BitTorrent is really good for moving static files around, but it's not effective for data that changes. It doesn't have good mechanisms for modifying your data. It also has many limitations for connecting to peers, because it's quite old and built very much around file sharing. It's actually very difficult to append general data in BitTorrent. If you want to share a database, you have to treat it like you're sharing files.

I had extensive experience there, and I became very excited about the use cases that could be unlocked by addressing these issues. I started working on a project called Hypercore, which is somewhat like BitTorrent, heavily inspired by it, but adds features like mutability: you can append new blocks to it, and the blocks can be different sizes rather than all the same size. You can share all kinds of things: files, databases, and similar data. We also made it very modular, so it can run in all kinds of environments with varying amounts of compute power. It doesn't have to revolve around file-sharing networks; it can be embedded in many other applications.

So I started working on that, and I was very excited about it. As part of that work, we also tried to use this technology for nonprofit file sharing in science, where there are many datasets that nobody can really access and that are very expensive to maintain and host.

Jiang: And the non-profit is called Code for Science & Society.

Mathias: Yes, Code for Science & Society, CSS. I was very fortunate to receive funding from them to work on this type of project, primarily for science use cases. I did that for a couple of years and built a small team. As part of that journey, I also met Andrew, who's one of the co-founders of Holepunch, our peer-to-peer company where we do this work full-time.

I worked on that for a couple of years, and then it eventually ended, as many nonprofit projects unfortunately tend to do. I continued doing Hypercore work on my own. We did a lot of work on connectivity and tried to solve those challenges. Then, four years ago, we started Holepunch, which is a company built around these technologies. We're trying to scale it and make this a technology that can penetrate all markets and become a very serious technology that can solve real use cases. We've been excited about this for so many years, and we want to bring it everywhere. That's why we created Holepunch.

Jiang: The magic of hole punching is that it can connect any two nodes on the Internet whenever that's possible, which sounds simple but is quite difficult given the current structure of the Internet. Could you also tell me about the core of Hypercore: the incremental Merkle Tree and the append-only log?

Mathias: Yes. Hypercore, as I mentioned, is very similar to BitTorrent. I don't know if people are familiar with the technical aspects of BitTorrent, but it's essentially a Merkle Tree that spans over a series of fixed-size blocks. Hypercore is very similar to that. It's a Merkle Tree, but it's an expandable Merkle Tree, where you can add new blocks. If you can visualize that, it's like you add the blocks and then the tree grows, but the old tree remains the same. You're simply adding blocks to it. That makes it very easy to do incremental synchronization. And with a Merkle Tree, you can do other things like sparse syncing, where you only retrieve a few blocks, not the entire dataset. It's very powerful, as long as you can trace a path to the root of the Merkle Tree. And it's very secure.
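
The "expandable Merkle Tree" idea, where appending grows the tree without rewriting old nodes and any single block can be checked against the root, can be illustrated with a Merkle mountain range (MMR), one common way to build an append-only Merkle tree. The Python below is a minimal sketch under that assumption; it is not Hypercore's actual tree layout, hashing, or wire format, and every name in it (`AppendOnlyMerkle`, `peak_position`, `verify`, `bag`, the one-byte domain prefixes) is hypothetical. A reader holding only one block, its sibling path, and the peaks can verify that block without downloading the rest, which is what makes sparse syncing safe.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part)
    return digest.digest()


def leaf_hash(data: bytes) -> bytes:
    return h(b"\x00", data)  # domain prefix keeps leaves distinct from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01", left, right)


def bag(length: int, peaks: list[bytes]) -> bytes:
    # The root commits to the block count and to every peak.
    return h(b"\x02", length.to_bytes(8, "big"), *peaks)


class AppendOnlyMerkle:
    def __init__(self) -> None:
        self.levels: list[list[bytes]] = [[]]  # levels[0] holds leaf hashes

    def append(self, data: bytes) -> None:
        self.levels[0].append(leaf_hash(data))
        lvl = 0
        # An even count means the last two nodes now form a complete pair.
        # Existing nodes are never rewritten; new parents are only added on top.
        while len(self.levels[lvl]) % 2 == 0:
            if lvl + 1 == len(self.levels):
                self.levels.append([])
            self.levels[lvl + 1].append(node_hash(*self.levels[lvl][-2:]))
            lvl += 1

    def peaks(self) -> list[bytes]:
        # A level with an odd count ends in an unpaired node: that node is a peak.
        return [level[-1] for level in reversed(self.levels) if len(level) % 2 == 1]

    def root(self) -> bytes:
        return bag(len(self.levels[0]), self.peaks())

    def proof(self, index: int) -> list[bytes]:
        """Sibling hashes from leaf `index` up to the peak that covers it."""
        if not 0 <= index < len(self.levels[0]):
            raise IndexError(index)
        path, lvl, idx = [], 0, index
        while not (idx == len(self.levels[lvl]) - 1 and len(self.levels[lvl]) % 2 == 1):
            path.append(self.levels[lvl][idx ^ 1])
            lvl, idx = lvl + 1, idx // 2
        return path


def peak_position(index: int, length: int) -> tuple[int, int]:
    """Return (which peak covers leaf `index`, that peak's height)."""
    start, pos = 0, 0
    for height in range(length.bit_length() - 1, -1, -1):
        if (length >> height) & 1:
            if index < start + (1 << height):
                return pos, height
            start, pos = start + (1 << height), pos + 1
    raise IndexError(index)


def verify(data: bytes, index: int, path: list[bytes], peaks: list[bytes],
           length: int, trusted_root: bytes) -> bool:
    pos, height = peak_position(index, length)
    if len(path) != height or len(peaks) != bin(length).count("1"):
        return False
    node = leaf_hash(data)
    for lvl, sibling in enumerate(path):
        # Bit `lvl` of the index tells us whether we are the right child at that level.
        node = node_hash(sibling, node) if (index >> lvl) & 1 else node_hash(node, sibling)
    return peaks[pos] == node and bag(length, peaks) == trusted_root


tree = AppendOnlyMerkle()
for i in range(5):
    tree.append(f"block {i}".encode())
root = tree.root()
print(verify(b"block 2", 2, tree.proof(2), tree.peaks(), 5, root))  # True
print(verify(b"forged", 2, tree.proof(2), tree.peaks(), 5, root))   # False
```

Appending another block adds new parent nodes but leaves every existing hash untouched; only the peaks, and therefore the root, change. That changing root is exactly why a fixed identifier cannot simply be the root.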

Now, if the Merkle Tree keeps expanding, you cannot simply trust a single root as the identifier anymore, because the root is not static. In BitTorrent, the identifier actually is the root of the Merkle Tree. So in Hypercore, we have methods of signing those updates, with various strategies; for example, with a key pair, you can sign each update. We did extensive work on that. It worked very well, and it's the foundational component.

Other small but important changes include variable-size blocks. In BitTorrent, blocks have to be the same size for many reasons, whereas we support variable-size blocks. This doesn't really matter for file sharing, but it matters significantly for general-purpose data sharing where you don't know what you're storing. For key-value (KV) storage, for example, we have a module called Hyperbee and another called HyperDB, which build a database over these data structures by appending variable-size blocks. The blocks are broadly similar, but because database data is structured and not exactly the same size, you need some variability. And with this kind of sparse syncing, where you can get one block without the others, database queries can be relatively fast, because you retrieve only the blocks you need to examine for a query.
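
As a rough illustration of why sparse access makes queries cheap, here is a Python sketch that stores sorted key-value records as variable-size blocks and answers a lookup by binary search, fetching only the blocks it inspects. It is a deliberately simplified stand-in: Hyperbee actually builds a B-tree on top of Hypercore, and `SparseLog`, `write_sorted`, and `lookup` are hypothetical names, not its API.

```python
import json
from typing import Optional


class SparseLog:
    """Hypothetical stand-in for a remote log that is fetched block by block."""

    def __init__(self, blocks: list[bytes]) -> None:
        self._remote = blocks
        self.fetched: set[int] = set()  # which blocks we actually downloaded

    def __len__(self) -> int:
        return len(self._remote)

    def get(self, index: int) -> bytes:
        self.fetched.add(index)  # in a real system this would be a network request
        return self._remote[index]


def write_sorted(items: dict[str, str]) -> list[bytes]:
    # One JSON record per block, so block sizes vary with key and value length.
    return [json.dumps({"key": k, "value": v}).encode() for k, v in sorted(items.items())]


def lookup(log: SparseLog, key: str) -> Optional[str]:
    lo, hi = 0, len(log) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        record = json.loads(log.get(mid))
        if record["key"] == key:
            return record["value"]
        if record["key"] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


items = {f"user:{i:04d}": "x" * (i % 7 + 1) for i in range(1000)}
log = SparseLog(write_sorted(items))
print(lookup(log, "user:0421"), len(log.fetched))  # "xx", after at most 10 block fetches
```

With 1,000 records, a binary search inspects at most 10 blocks; the rest never need to leave the peer that holds them.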

Jiang: Yeah, I think sparsity is one of the strengths of Hypercore.

Mathias: It's one of the things we wanted to build from the beginning. BitTorrent actually has sparseness. It's quite good at sparseness. That's one of the things I thought was really impressive, where you just get a portion of the data, but you can still fully verify it. Many blockchains don't operate sparsely.

Jiang: For example, when you're streaming video, you only read the blocks that you're currently watching.

Mathias: Exactly, and you can use it as a strategy where the user, while something is syncing, can explore the dataset, and then you can still sync the entire thing if you want to. It's always better to fully sync, because then you have more resilient copies. But you can choose your approach where you sync just a bit, enough to satisfy some condition. So it's quite useful.
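
For the video case, a player has to turn a byte range (for example, the point the viewer just seeked to) into the few block indices worth requesting. With variable-size blocks, that is a prefix sum plus a binary search. The Python sketch below assumes the block sizes are already known locally; `blocks_for_range` is a hypothetical name, and this is not Hypercore's own seek implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def blocks_for_range(sizes: list[int], start: int, length: int) -> tuple[int, int, int]:
    """Return (first_block, offset_in_first_block, last_block) covering [start, start + length)."""
    ends = list(accumulate(sizes))  # ends[i] = byte offset just past block i
    if not ends or length <= 0 or start < 0 or start + length > ends[-1]:
        raise ValueError("range outside the stream")
    first = bisect_right(ends, start)
    last = bisect_right(ends, start + length - 1)
    return first, start - (ends[first] - sizes[first]), last


sizes = [100, 50, 200, 80]  # hypothetical variable block sizes, in bytes
print(blocks_for_range(sizes, 140, 50))  # (1, 40, 2): fetch blocks 1 and 2 only
```

The player requests just those blocks first and can keep syncing the rest in the background if a full, more resilient copy is wanted.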


Jiang: If I remember correctly, there was a white paper on the SLEEP protocol.

Mathias: Yes. We did write a white paper at some point. We were having a lot of fun with acronyms back then, so we tried to come up with acronym names. SLEEP was a pun on REST, you know, REST and HTTP. It was very fun, though it's very outdated by now. We wrote a couple of papers. We're still iterating on this technology very intensively. We haven't written many papers recently, but we want to. We actually just started a project to create comprehensive protocol documentation, so we have more resources. I really like exploring these concepts with infographics and visual materials where you can see how the blocks combine. We're doing a lot of that work right now.

Jiang: Also on the incremental Merkle Tree, I remember someone in the Scuttlebutt community proposed the Bamboo algorithm, an alternative scheme for incremental structures.

Mathias: I mean, there's not going to be one solution that's the end-all solution. Different approaches are always beneficial. I think our solution is really good.

Honestly, if all you want to do is sync static files, you should use BitTorrent, because it has better trade-offs for that than we do. But we make other trade-offs where we excel. Hypercore's file syncing is actually a little slower than BitTorrent's, but if you also want to sync incremental data, it's much better. So you always have to consider the trade-offs; different projects make different trade-offs.

But I think it's really great with peer-to-peer, because these protocols have to be open. If they're not open, you can't verify them, and if you can't verify them, it's not peer-to-peer at the protocol level. So the more the merrier in general, because that's how we get new ideas and move the technology forward.

Jiang: Yes, and can you tell me about Pear and what kind of platform you'd like to build, and the underlying technology?

Mathias: Sure. So Pear - the peer platform. First of all, it's called Pear because I worked with many Italians, and when they want to say "peer /pɪər/," they say "pear /peər/." So we thought, that's really amusing. Let's call it the Pear Runtime, because then we can have a cool logo, like a pear.

But the whole idea is this: we've been doing this work for about 15 years now, and we've created numerous modules around peer-to-peer technology. We have modules for file sharing, for database sharing, for collaborative databases, for distributed schemas, all kinds of things, connectivity and similar functionality. We've been working extensively over all these years. I think we probably have around 500 modules for various network functions.

What we wanted to do was create a foundation where you can run applications and get the baseline for running these apps everywhere. That's what we wanted to achieve with the Pear Runtime, where we said, let's create a layer where you can load an app from the network itself. So you use the peer-to-peer technology itself to load the app, so you can run something that's operating on the same principles, and then the app itself uses the same technology to perform some application function.

So with Keet, for example, that means Keet is an app that's distributed with Hypercore, and then the app itself uses Hypercore to create a chat system, which is somewhat mind-bending, but it creates really robust applications. You can scale very aggressively. So that's very exciting.

We've done extensive work on that, and recently, we've done significant work on getting this technology to work on mobile, because mobile is obviously incredibly important, since mobile phones are everywhere, but also very challenging, because mobile phones have different constraints than laptops.

Jiang: Yes, like battery life, connectivity, permissions, and the sandbox.

Mathias: The sandbox is a different environment to work with. It's very UI-heavy. Also, your desktop has ways to run things in the background—phones don't at all. We also had many struggles getting JavaScript to run on phones for what we needed, because all our code is written in JavaScript with a lot of C backing it. Getting that to work is really challenging, since all our networking code is in C.

So we wrote this thing called Bare, a JavaScript runtime. So Pear, Bare. We thought that was really amusing. It's a very minimal JavaScript runtime, so we can run this technology everywhere. We actually run 100% the same code for the peer-to-peer functionality on the phone as we do on desktops. For Keet, there's no difference. It's the same code, because it's code written with our modules that runs on the phone and runs on the desktop. So it's a very effective way to write applications that you write once and run everywhere.

Obviously, the UI is different. Phone UI is very different from desktop UI in general, because the screen is different, but the core functionality—the thing that makes the peer-to-peer system run—is the same, and I think that's really important.

And I think you also mentioned hole punching. Our company is called Holepunch because we're extremely focused on connectivity and making people connect to each other, which is a really difficult problem. One of the things we tried to solve early on was how to penetrate networks so devices can connect and send data. We became so excited about that technology that we named our company after it, which I think is appropriate.

We have a lot of sophisticated functionality in our various networking libraries. It tends to just work. I think that's impressive: it can penetrate most networks. Obviously there are some networks it cannot penetrate, and then you can use other strategies, like having other peers help you route your traffic. But you definitely need a very robust connectivity layer to get anything to work in peer-to-peer.

Jiang: Yes. And for Pear, it also allows bootstrapping the connection between two Pear instances.

Mathias: Yes. So that's one of the things we wanted to build in immediately. I don't know the technical term, but it's like this: if I want to connect to you, and we analyze the network, the network is either going to be very fast or it's going to be slow to establish that connection. If it's a difficult network, we can probably penetrate it in about 10 seconds or so, but 10 seconds is a long time for a user to wait. And if it's an impossible network, for some reason, like an extreme lockdown or something, we can't do it. So those are the three scenarios.

Luckily, most networks are actually relatively easy. A much smaller percentage are very difficult, and an even smaller percentage are simply impossible. But you need something more reliable than that. You don't want to make an app that's like, "Hey, it works for 99% of people." That 1% failure rate means it's not a good app. So you want solutions where it always works.
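
The three scenarios map naturally onto how the two peers' NATs behave. Below is a hypothetical Python sketch of that decision, based on common NAT-traversal heuristics rather than Holepunch's actual logic; the `Nat` categories and the `plan` rules are assumptions. A NAT that keeps the same public port for every destination is usually easy to punch through; one that picks a new port per destination (often called a symmetric NAT) forces slower port guessing; if both sides do that, or UDP is blocked outright, a relay is the fallback that makes the app always work.

```python
from enum import Enum


class Nat(Enum):
    OPEN = "open"              # publicly reachable, nothing to punch through
    CONSISTENT = "consistent"  # same public port for every destination
    RANDOM = "random"          # new public port per destination (symmetric NAT)
    BLOCKED = "blocked"        # UDP filtered entirely, e.g. a locked-down network


def plan(a: Nat, b: Nat) -> str:
    """Pick a connection strategy for two peers (heuristic sketch, not a spec)."""
    if Nat.BLOCKED in (a, b):
        return "relay"            # impossible to connect directly
    if Nat.OPEN in (a, b):
        return "direct"           # the firewalled side simply dials the open one
    if a is Nat.CONSISTENT and b is Nat.CONSISTENT:
        return "holepunch"        # easy: each side can predict the other's port
    if Nat.CONSISTENT in (a, b):
        return "holepunch-probe"  # difficult: guess the random side's port, can take seconds
    return "relay"                # both random: guessing is hopeless, relay instead


for pair in [(Nat.OPEN, Nat.RANDOM), (Nat.CONSISTENT, Nat.CONSISTENT),
             (Nat.CONSISTENT, Nat.RANDOM), (Nat.RANDOM, Nat.RANDOM)]:
    print(pair[0].value, pair[1].value, "->", plan(*pair))
```

In a real app the relay would typically be started in parallel with the slow path rather than after it times out, so the user never waits the full ten seconds.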

And one of the ways we make it always work is we have a protocol where if I want to connect to you, I can ask another peer to relay that traffic to you, because you and I are both identified by node IDs, like public keys. It's very secure, because I can say, "Could you forward my traffic to this other peer," and I can verify that they can read it and that the data is going the right way, because everything is addressed by cryptography and key pairs.
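
Relaying is safe because of end-to-end cryptography: the relay only moves bytes it cannot forge. A minimal Python sketch of the integrity half follows. It assumes the two peers already share a `session_key` from an authenticated handshake bound to their public keys (that handshake is not shown), and it uses HMAC-SHA256 over a sequence number and payload; a real stream would also encrypt the payload so the relay cannot read it. `seal`, `open_frame`, and the two relay functions are hypothetical helpers.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    header = seq.to_bytes(8, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + tag + payload


def open_frame(key: bytes, frame: bytes, expected_seq: int) -> bytes:
    header, tag, payload = frame[:8], frame[8:8 + TAG_LEN], frame[8 + TAG_LEN:]
    expected_tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected_tag):
        raise ValueError("frame was modified in transit")
    if int.from_bytes(header, "big") != expected_seq:
        raise ValueError("replayed or reordered frame")
    return payload


def honest_relay(frame: bytes) -> bytes:
    return frame  # forwards bytes it cannot forge


def tampering_relay(frame: bytes) -> bytes:
    return frame[:-1] + bytes([frame[-1] ^ 1])  # flips one bit of the payload


session_key = os.urandom(32)  # assumption: agreed during the (omitted) handshake
frame = seal(session_key, 0, b"hello")
print(open_frame(session_key, honest_relay(frame), 0))  # b'hello'
try:
    open_frame(session_key, tampering_relay(frame), 0)
except ValueError as err:
    print("rejected:", err)
```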

We spent a lot of time creating a protocol where, if you and I later find a better route to each other, such as a direct route, we can move the same connection over without re-establishing it, which makes it seamless from the app developer's point of view. You can have a connection that starts slow, with low bandwidth, and then suddenly becomes really fast as we connect directly to each other.
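
The interview doesn't spell out how the connection is moved, so the following is only a generic Python sketch of one way to make a path switch invisible to the application: number every frame, keep the unacknowledged ones, and retransmit them over the new path, while the receiver drops duplicates and delivers in order. `Connection`, `Receiver`, and the cumulative-ack scheme are hypothetical, not Holepunch's protocol.

```python
from typing import Callable

Frame = tuple[int, bytes]


class Connection:
    """Hypothetical sender whose underlying path can be swapped mid-stream."""

    def __init__(self, path: Callable[[Frame], None]) -> None:
        self._path = path
        self._next_seq = 0
        self._unacked: dict[int, bytes] = {}

    def send(self, data: bytes) -> None:
        seq, self._next_seq = self._next_seq, self._next_seq + 1
        self._unacked[seq] = data
        self._path((seq, data))

    def ack_upto(self, next_expected: int) -> None:
        for seq in [s for s in self._unacked if s < next_expected]:
            del self._unacked[seq]

    def switch_path(self, new_path: Callable[[Frame], None]) -> None:
        self._path = new_path
        # Anything still in flight on the old path is resent on the new one.
        for seq, data in sorted(self._unacked.items()):
            new_path((seq, data))


class Receiver:
    def __init__(self) -> None:
        self.delivered: list[bytes] = []
        self._expected = 0
        self._buffer: dict[int, bytes] = {}

    def on_frame(self, frame: Frame) -> int:
        seq, data = frame
        if seq >= self._expected:  # older frames are duplicates: ignore them
            self._buffer[seq] = data
        while self._expected in self._buffer:
            self.delivered.append(self._buffer.pop(self._expected))
            self._expected += 1
        return self._expected  # cumulative ack: everything below has arrived


rx = Receiver()


def relayed(frame: Frame) -> None:
    if frame[0] < 2:  # pretend frames 2+ are still stuck on the relay
        conn.ack_upto(rx.on_frame(frame))


def direct(frame: Frame) -> None:
    conn.ack_upto(rx.on_frame(frame))


conn = Connection(relayed)
for msg in (b"a", b"b", b"c"):
    conn.send(msg)
conn.switch_path(direct)  # a direct route was found
conn.send(b"d")
print(rx.delivered)  # [b'a', b'b', b'c', b'd']
```

The application only ever calls `send` and reads `delivered`; which path carried each frame is hidden below that interface.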

Jiang: But why would those third-party peers be willing to relay the traffic?

Mathias: Yes, that's a good question. It depends on the application you're building, right? The third peer could just be a peer you're running yourself; you could be subsidizing your network. It could also be an app requirement that you build into your app, where you say, well, that's just the cost of running the app: for example, for 2% of traffic we want other people to help. That's just the protocol. Sure, somebody could change the code, but that's how we build it.

Jiang: If there is a video group chat with five people, how many connections does each person make?

Mathias: Right now in Keet we do it very simply: it's a fully connected topology, where everybody connects to everybody. That's why calls don't scale well, though they work okay. If you go beyond about 10 people, you can do the math: it's very intensive. But we're now moving calls onto a protocol based on Hypercore. Today they rely on direct connections between every pair of participants; with the Hypercore-based approach, you don't need all those direct connections. Generally you need only about the square root of the connections, which is obviously much better. So you can choose your strategies.
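
To put numbers on "you can do the math": in a full mesh of n participants, each person keeps n − 1 connections (and, in a video call, sends their stream over each of them), and the group has n(n − 1)/2 connections in total. The short Python sketch below tabulates that against roughly √n links per participant, which is our reading of "about the square root"; the exact fan-out of the Hypercore-based design is not specified in the interview, and both function names are hypothetical.

```python
import math


def full_mesh(n: int) -> tuple[int, int]:
    """(connections per participant, total connections) when everyone links to everyone."""
    return n - 1, n * (n - 1) // 2


def sqrt_fanout(n: int) -> int:
    """About sqrt(n) links per participant, rounded up (assumed reading of the text)."""
    ceil_sqrt = math.isqrt(n - 1) + 1 if n > 1 else 0
    return min(n - 1, ceil_sqrt)  # can't link to more peers than exist


for n in (5, 10, 50, 100):
    per_peer, total = full_mesh(n)
    print(f"n={n:>3}: full mesh {per_peer:>2} per peer, {total:>4} total; ~sqrt: {sqrt_fanout(n)} per peer")
```

For the five-person call in the question, a full mesh means 4 connections per person and 10 in total; at 100 people it needs 4,950 connections, versus about 10 per person with square-root fan-out.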

Video calls in general are tricky because they're real-time. If you and I are talking, even a second of latency is too much; that's a long time. If I say something and you only hear it a second later, you're going to think that's strange. So the data needs to arrive pretty fast, which means it can't take too many hops.

Jiang: Yes. So Pear is an application development kit and platform for these decentralized, peer-to-peer applications, and everything can be bootstrapped or served by the peers, directly or indirectly. This really challenges the client-server architecture. It looks like a serverless application architecture that truly doesn't have any server.

Mathias: Actually serverless? Yes, I always find that term amusing. It's really funny, because some of the product people at Holepunch sometimes suggest we should rebrand it as serverless. As a technical person, I'm like, serverless means something else. Serverless is strange: it just means some other servers. That's what the industry term means, and I wish it didn't, but normal serverless systems like AWS Lambda basically just mean short-lived and, to some degree, stateless servers. But peer-to-peer is actually, literally serverless: you don't have to have servers at all, just peers.

I think there's also an interesting point here, because there's a misunderstanding about peer-to-peer networks that I hear a lot: people tend to think that peer-to-peer must mean only your laptops can run the software. There's nothing wrong with running peer-to-peer software in a data center if you want to. It just means it's a longer-lived peer. You can do whatever you want in peer-to-peer, and I think that's one of its strengths.

I always say that even today, if I were building a centralized network in a data center, I would still make it peer-to-peer, just in a data center, because you'd get that resilience. The technology is generally fast enough for normal operations, so why not get the extra resilience? If your server is compromised or whatever, that doesn't take all the data down; you just need nodes to be alive at some point. And that's a problem you can solve by simply running instances, or even laptops.

I have a computer at home that just sits in my house, and I run a lot of peer-to-peer services and host numerous peer-to-peer applications on it. I already pay for the bandwidth, since I have an internet subscription I pay for every month, so it doesn't cost me anything extra. Power is pretty cheap, and I can do it with something like a Mac Mini. Just with that, I have significant capacity. If I did the same in a data center, it would cost me so much more than just buying the machine, because data centers are very expensive. So yes, I'm very interested in that approach. Actually serverless.


To be continued...



Who we are 👇

Uncommons
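A corner of public space within the blockchain world, where a group of public goods builders exchange ideas on crypto and the humanities. Formerly the GreenPill Chinese community.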
Twitter: x.com/Un__commons
Newsletter: blog.uncommons.cc/
Join us: t.me/theuncommons