Date
6 June, 2017
Speakers
Transcript by
Michael Folkson
Slides: https://docs.google.com/presentation/d/1vjJUWOZIaPshjx1svfE6rFZ2lC07n47ifOdzpKWqpb4/
https://twitter.com/kanzure/status/1054031024085258240
My name is Laolu or Roasbeef on the internet. I’m going to talk about light clients in general and then also what we’re working on in terms of light clients and then also how that factors into the lightning network itself. The talk is called Neutrino because that is the name of our new light client that has been released. There has been a BIP proposal and I’ll go through all of that in the talk itself.
At a high level, this is what I call the usefulness meter for Bitcoin clients. So all the way on the left you have totally useless, all the way on the right, it is useful, it can actually serve the network, it is actually contributing to other peers. On the left we have eavesdroppers, this is like Chainalysis, those people who connect to you, have like 50 inbound connections, use mempool commands, try to intercept all your transactions. Then we have pseudo-nodes. These are nodes that basically pretend to be nodes, they actually advertise on the network but they basically just proxy all other requests to other nodes that are connected to you. It looks like they are a node but they’re not really a node, they are a pseudo-node. Then there are light clients which are more like nodes. They actually do some verification and provide useful services. Then there is SPV which are light clients but they’re more resilient meaning they’ll be able to actually verify incorrect behavior by miners or other nodes. Then we have pruned nodes. They’re like full nodes but they don’t have the entire blockchain. They’re not as useful as full nodes because they can’t serve… other peers but they can help relay transactions and provide connection slots. Then we have full nodes which are the most useful. This is the scale there is currently. I’ll be focusing on light clients and the border between light clients and SPV nodes.
So what is a light client? This is my definition. Light clients by definition don’t really verify the entire chain. If they did, they wouldn’t be light because the chain is heavy - about 150GB+ with all the indexes and such. So one thing they can verify is block headers. They can verify connectivity of the block headers and also that each header follows rules with respect to difficulty adjustments and actually has a valid… and can check some other factors like the timestamp and other consensus rules around that. Light clients themselves, they employ chain weightings. They get all the headers, they count how much work went into each particular block and then with Bitcoin, we pick the heaviest chain, meaning the chain with the most accumulated work. This is what most users will use to interact with Bitcoin because full nodes take hours (it is pretty fast now, there have been a bunch of optimizations recently) but it takes a very long time for full nodes to sync to the network and you can’t really run one on your mobile phone. This is likely what you’ll be running on your phone because it is much more lightweight, it can do with not much RAM and they have a different security model than regular full nodes meaning they can’t verify everything so they’re a little more trusting. They trust that the miners haven’t inflated all the coins because they can’t verify the UTXO set or the input value amounts. They can’t really do script execution because they don’t have the UTXO set and they can’t verify that everything has been spent accordingly. Therefore they basically just have this block, they know it has been timestamped in the chain but they’re not really concerned about the validity of the block in the chain itself. There are various models for querying the chain and that’s what I’m going to be talking about.
We can validate the entire chain to a degree that the chain has sufficient work but how are we going to interact with the chain to build useful network services on top of Bitcoin?
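As a rough illustration of the chain weighting described above, here is a minimal Python sketch. It assumes the usual definition of per-header work as 2^256 / (target + 1), with the target decoded from the header's compact `bits` field. The function names are made up for this example, and the compact decoding is simplified (it ignores the sign bit and overflow checks).

```python
def bits_to_target(bits: int) -> int:
    """Decode the compact 'bits' field of a header into a 256-bit target.

    Simplified: ignores the sign bit and overflow checks.
    """
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_work(bits: int) -> int:
    """Expected number of hashes needed to find a header at this target."""
    return (1 << 256) // (bits_to_target(bits) + 1)


def chain_work(bits_per_header) -> int:
    """Total work of a chain, given the 'bits' value of each header."""
    return sum(header_work(b) for b in bits_per_header)


def heaviest_chain(chains):
    """Pick the chain with the most accumulated work, not the most headers."""
    return max(chains, key=chain_work)
```

Note that a shorter chain can win: two headers mined at a much lower target outweigh three headers mined at the minimum difficulty.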
So now let’s go into the current state of light clients. So as you probably know and probably have on your phone or your full node has some connections to them right now is BIP 37. BIP 37 is the most widely implemented light client model. I think the original BIP was in 2012 so that’s pretty ancient in terms of cryptocurrency time because a bunch of things happen in one year these days. It is implemented in bitcoinj, the breadwallet libraries and also bcoin. I think they are the only implementations of it. I think bitcoinj is not really as maintained anymore or maybe it is in maintenance mode whilst breadwallet are on iOS and Android now and there’s also bcoin which I can run in my browser if I want to which is pretty cool. This is most of the decentralized mobile wallets. Some of the wallets maybe connect to a server or maybe they connect for fees or to do rescans or things like that. BIP37 is basically bloom filtering but performed on the server side. The original thing was that we have this bloom filtering thing with a false positive ratio so light clients are able to regulate that accordingly and to not tell all the full nodes what all their addresses are. We’ll see in practice that doesn’t work very well but that was the initial intention. So bloom filters have this execution model, it is kind of client-server essentially. What the client does is it crafts a bloom filter that contains all these false addresses… wants to be notified onchain and the client does a bloom filter message and sends that message to the full node. Now per connection, the full node is maintaining this filter in memory for the particular light client. What the light client does now whenever it wants to get relevant data from a block is it requests a filtered block. 
A filtered block is like a regular block but before sending any of the contents to a light client, the full node queries the bloom filter on the server side on behalf of the client and then will send anything that matches within the bloom filter itself. Because bloom filters are probabilistic data structures you may have false positives meaning something that matches the filter but something the client didn’t request initially. This can be problematic because every time a spent output matches then the full node has to modify the filter to add that new entry. This means that the filter fills up and the false positive rate can start to go up. Then clients need to regulate this and start to verify that I’m getting a bunch of data that isn’t mine so I need to reload my filter. Clients are supposed to dynamically regulate this false positive rate. If it is too high, the full node basically sends you the entire block and it’s all theater. If it is too low, then you have pretty large filters and that’s consuming the full node’s bandwidth and also your bandwidth and everyone loading these filters. If you’re doing an initial block download, you basically would need to be doing this method several times or I guess every few blocks or so. That’s the current state essentially. This is what’s been around for a while and we’re going to talk about improving it.
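To make the server-side model concrete, here is a toy Python sketch of a bloom filter and of a full node filtering a block against it. This is not the BIP 37 wire format: BIP 37 uses MurmurHash3, while this sketch substitutes SHA-256 with an index prefix so that it runs with only the standard library. The class and function names are invented for the example.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter; SHA-256 stands in for BIP 37's MurmurHash3."""

    def __init__(self, size_bits: int, num_hashes: int):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored as one big integer

    def _positions(self, item: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # True for every added item; may also be True for others
        # (a false positive), never False for an added item.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def filter_block(client_filter: BloomFilter, block_items):
    """What the full node does per client: keep only the matching items."""
    return [item for item in block_items if client_filter.might_contain(item)]
```

The smaller `size_bits` is relative to the number of added items, the more unrelated items match, which is the false positive rate the client is supposed to regulate.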
So what are some of the shortcomings of BIP 37? It has been working for a while and I think it was good initially to develop the ecosystem for people running Bitcoin on their phones. That’s a really big aspect. Every time you introduce someone to Bitcoin, you’ll be like “hey download this application”. They send you the Bitcoin and they’re like “That’s really cool, magic internet money”. It was good that it was around but it has some drawbacks. Managing this false positive rate in practice is pretty difficult and research has shown that no one has done it well. There are some pretty easy attacks that full nodes can run. They can collect multiple filters from light clients and intersect these filters and start to use intersection attacks against them. Another thing is that full nodes can lie by omission meaning that even though something matched the filter, they basically don’t tell the light client. If you’re only connected to one node, which you shouldn’t be, it is hard to detect this. You really can’t detect it. This may be benign, like my money didn’t come in for two days. But if your application is dependent on quickly reacting to onchain events, maybe there’s some application where you have t days to do some action. If you don’t see that action onchain you could have some bad things happen. Lastly, they are pretty resource intensive. All that work that I had on this other slide is all active work and that’s done per client. This can be detrimental because if a full node is seeing tens of light clients it is doing active work for every single light client. There are also some vectors where the light client can maliciously create a filter that matches everything which means that the full node needs to read every single block, everything is matching, they’re updating the filter in real time. This causes them to read every single block in the entire chain and then send that to the light client, do all the bloom filtering. That makes the full node halt essentially.
This has come up in the past before so it would be good to get rid of this. This hasn’t received any serious update since 2012 which is super ancient in Bitcoin.
That brings us to Bloom Filter Digests. This was initially brainstormed on #bitcoin-wizards probably in like 2014. Before this I searched on the logs that I have and I found the link to it. Someone basically said “Hey. What if the full node sent the client the filter?”. Everyone was like “That’s a pretty good idea. Why didn’t we think of this before when we were talking about the different ways to do it, the trade-offs and so on.” I think it was 2015, 2016 someone posted to the mailing list (protonmail) and they had a more fleshed out proposal. At that point we were like “pretty good idea”. The general idea is reverse dependency. With BIP 37 there was server side filtering. Now we were going to do client side filtering. Basically the client requests a filter for a particular block and the full node sends that filter. At this point, the client has the filter locally and can do whatever it wants with it. It can store it for later. Maybe there is an application… onchain data or if it is a wallet, it’s going to query against its addresses or its possible outputs and then if it gets a match then this might be relevant. It might be in the block. The cool part about this is that now the client can fetch the block anywhere. It uses the full nodes to be able to serve the compact index but once it has that index locally it can use that for all time. Maybe to do initial rescan it is using this but then it can also use this for future applications or anything else. There are a few improvements on this. One is that the full node basically does the work once. It indexes the chain and has that locally and then whatever it needs to serve the light client, it serves up this filter and that’s it. This is a big win because it is now all passive and all the work the full node did can now be amortized over a range of clients. The other cool part is that with BIP 37, before it would send the light client individual transactions. 
Those transactions were the light client’s money or it sending money and that allowed very easy transaction intersection attacks. Now this has gone because we fetched the entire block. The light client may not necessarily fetch the block from the same node that it got the filter from. It can be fetched from anywhere. The other cool part about this is that the light client can now verify the authenticity of the data they are given because given a block you can deterministically reconstruct… You can see if a node is trying to lie to you and if you are connected to many nodes you can verify the integrity of what they give you and ban the nodes that are giving you incorrect data.
The initial motivation was that we wanted the light client mode to be a little more compatible with lightning itself. We’re one of those smart contract applications where we need to act on onchain events within a certain time period. The old one wasn’t very usable in our context so we decided to take Bloom Filter Digests (BFD) and modify it slightly. Rather than using bloom filters we used something called Golomb-Rice coded sets and I’m going to go into this a little bit later. The reason we’re using them is that they’re more compact than regular bloom filters. With bloom filters, you can update them in real time. With these you can’t but we don’t really care because we basically can build the one for a block once and can serve it up to clients afterwards. The other thing is that the original proposal was meant to be committed eventually but with our proposal there’s no commitment… and this is good because we can test out the idea first, see if it works. We don’t want to jump straight to consensus commitment because then that’s in Bitcoin for all time (unless there’s some crazy hard fork in the future). We also introduced something called a header chain basically because we’re not actually having a commitment and this header chain is meant to verify the… We have a BIP proposal if you guys want to comment on it on the mailing list. It is by me and one of my other engineers, Alex who is not in SF unfortunately. We also have the reference implementation which is called Neutrino and we have a forked version of btcd where I’ve implemented the necessary P2P extensions and also indexing. There’s the set itself and that’s integrated into lnd and I’ll go into that later.
Now we can go into the details a bit. I won’t read everything in the slide but it’ll be online a bit later and there’s a BIP that goes into it in more detail and there’s code as well if you want to read into that. At a high level, what we’re doing with Golomb-Rice coded sets is we’re basically making a probabilistic data structure using a data compression code. Initially we have this set and it is a… set meaning that it has a false positive rate, meaning that even though it says something is in the set, it may not be in the set. We use this encoding because it lets us compress the set itself and by compressing the set we have space savings and it is like 40% more space… than regular bloom filters. So Golomb-Rice coding, this lossless compression code, we use it in a different format. It is typically used in audio and video compression where you have some picture, it has redundancy, you want to compress it down. With video similarly you compress that down. How it works, it’s pretty simple. You have some integer n and you have your divisor m and you’re trying to encode multiples of this divisor. So if our divisor is 2 and then we have 16, there’s 8 2’s in that. In order to encode an integer n, you have its components q and r: q is the quotient, r is the remainder. To get the quotient, you divide by m, to get the remainder you do mod m. You have these two fields. They are encoded a bit differently. q is encoded using unary and r is encoded using binary and will always be coded in our specific subset using k bits, where the divisor m (the special part that lets us do the Rice portion) is always some power of 2, m = 2^k. The way we do this is that we have some integer. It is kind of like run-length encoding. Run-length encoding is saying that rather than telling you there’s 5555, I say there are five… We do the same thing using this format and it turns out that because of the distribution of our data, this is pretty space efficient. That’s how we do the codes themselves.
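The quotient and remainder encoding described above can be sketched in a few lines of Python. This is an illustration rather than the reference implementation: it represents the bitstream as a string of "0" and "1" characters for readability, and it assumes the unary convention of q one-bits followed by a terminating zero. The function names are made up for the example, and k must be at least 1.

```python
def golomb_rice_encode(n: int, k: int) -> str:
    """Encode non-negative n with divisor m = 2**k (k >= 1).

    The quotient is written in unary (q ones, then a zero) and the
    remainder in exactly k binary digits.
    """
    q = n >> k               # n // m
    r = n & ((1 << k) - 1)   # n % m
    return "1" * q + "0" + format(r, f"0{k}b")


def golomb_rice_decode(bits: str, k: int, pos: int = 0):
    """Decode one value starting at pos; return (value, next_pos)."""
    q = 0
    while bits[pos] == "1":  # read the unary quotient
        q += 1
        pos += 1
    pos += 1                 # skip the terminating zero
    r = int(bits[pos:pos + k], 2)
    return (q << k) | r, pos + k
```

With the example from the talk, a divisor of 2 (k = 1) and the integer 16 gives a quotient of 8 and a remainder of 0, so the code is eight ones, the terminating zero, and a single zero bit for the remainder.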
Then what we do is we take this false positive set and we use the Golomb codes to compress it down. First, I’ll talk about the initial construction. First, we start with a certain false positive rate (fp). You can say that is 1/(2^20) or 1/1,000,000 or 1/1000 and then you have a parameter P which is 1/fp. The other parameter is N which is the number of items in the set and then you have F = N * P. So N * P is the restricted range that we’ll be hashing our items into and this is where the false positive rate comes in. Items are hashed uniformly into that range of size N * P, so the probability that an item that isn’t in the set collides with one of the N that are is roughly 1/P, which is the false positive rate. There’s basically two steps to constructing the set. The first one is construction. The second one is compression. We first take all of our items and then we use siphash which is this pseudorandom function. It is used in a bunch of places. It is a module in… , it is used in BIP 152. It is used in Python for hash tables. It is basically used all over the place, it is pretty good. So what we do is we take the items and then we hash using siphash and then we mod that into our field itself and we do that for every single item. Then afterwards we have this set of hash values. We take those hash values and we sort them. We sort them in ascending order. That’s basically the first part of the set. That can be done pretty quickly and siphash is super simple and can be optimized pretty well.
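The construction step (hash every item into the range F = N * P, then sort) might look roughly like the Python below. One loud assumption: Python's standard library does not expose a keyed SipHash, so this sketch substitutes keyed BLAKE2b truncated to 64 bits purely as a stand-in pseudorandom function. The function names and the key handling are also invented for the example.

```python
import hashlib


def hash_to_range(item: bytes, key: bytes, f: int) -> int:
    """Map an item to an integer in [0, f).

    Keyed BLAKE2b is a stand-in here; the talk uses SipHash.
    """
    digest = hashlib.blake2b(item, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % f


def build_sorted_hash_set(items, key: bytes, p: int):
    """Hash each item into [0, N * P) and return the values in ascending order.

    p is 1 / fp, for example 2**20 for a false positive rate of 1 in 2**20.
    """
    n = len(items)
    f = n * p
    return sorted(hash_to_range(item, key, f) for item in items)
```

The sorted output is what the compression step consumes: because the values are ascending, only the gaps between neighbours need to be stored.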
After that we have a set of items. We use Golomb-Rice to compress the set. In the prior slide, we had them in ascending order. We’re going to take the difference between every element in the set. That’s going to be a smaller number due to the smaller set that we’re hashing into. The differences between these items, because they’re uniformly distributed, will follow something close to a geometric distribution, meaning that Golomb-Rice is a very good fit for this. We’re going to encode the deltas between every single element. It is kind of like a… where we take two elements, we encode the delta of that, there’s a remainder, we take that element and the next one and encode the delta between that and itself. Encoding all of these deltas, at the very end we have this very, very compressed set of the actual elements, there’s some pseudo-code there which is not super important… You’re running over all the items, you first calculate the remainder using that prior element then calculate the quotient using that prior element then write the quotient in unary, write the remainder in binary and then you have the… which is that last value. Then you put out that set itself. This is pretty small. We have some stats that have been published along with the BIP proposal. For really big blocks, not really big… I think the average in the last year or so was like 20 kilobytes per filter then historically it is like 61 bytes or so because most big blocks were empty in the past. Now we have this compressed set and it is… and we get this false positive aspect from it and now we can go into querying. Unlike bloom filters, these can’t be queried natively in the regular format. They also can’t be modified. You have the set compressed and at that point you can serve it to anyone else.
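The loop described above (take the delta from the prior element, write the quotient in unary and the remainder in binary, remember the last value) can be sketched as follows. As before this is an illustrative Python version using a "0"/"1" string as the bitstream, with q ones followed by a zero as the assumed unary convention. The function name is hypothetical.

```python
def compress_sorted_values(values, k: int) -> str:
    """Delta-encode ascending integers with Golomb-Rice codes (m = 2**k, k >= 1)."""
    out = []
    prev = 0
    for value in values:
        delta = value - prev             # gap to the prior element
        q = delta >> k                   # quotient, written in unary
        r = delta & ((1 << k) - 1)       # remainder, written in k bits
        out.append("1" * q + "0" + format(r, f"0{k}b"))
        prev = value                     # the last value seen
    return "".join(out)
```

For example, the values 3, 10, 11 with k = 2 have deltas 3, 7, 1, which encode as `011`, `1011` and `001`, ten bits in total.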
But in order to query the set you need to decompress it. We basically do the reverse of what we did before. So we’re walking through the set, we get the current value, we add it to the accumulator and now we have the item. We can then query against this item and see if it matches. If not, we go to the next step. We get the next accumulator, we add it and we keep on going. This basically lets you query the set incrementally without decompressing the entire thing. You could decompress the entire thing but you might not necessarily want to do that - more space efficient to let you do that in memory. You can either query a regular item, you have an item, basically walking down the set or you can have two sets of items that you want to see if its in the Golomb set. What you do is you hash the items using that F value from before and then siphash. You can basically walk down each item and using like a fast and slow pointer, see if anything actually matches. If any element matches then you can decompress the entire thing if you actually need to. So now, filter construction. We basically have this really cool compact set. It is probabilistic, it is pretty small. Now what do we put into the set?
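The incremental query described above can be sketched as a generator that decodes one delta at a time and adds it to an accumulator, plus a two-pointer walk over the sorted query values. This is an illustrative Python version. It assumes a "0"/"1" string as the bitstream, a unary quotient of q ones followed by a zero, and query values that have already been hashed into the same range as the set. The names are invented for the example.

```python
def iter_values(bits: str, k: int):
    """Yield the set's values one by one without decompressing everything."""
    pos = 0
    accumulator = 0
    while pos < len(bits):
        q = 0
        while bits[pos] == "1":          # unary quotient
            q += 1
            pos += 1
        pos += 1                         # terminating zero
        r = int(bits[pos:pos + k], 2)    # k-bit remainder
        pos += k
        accumulator += (q << k) | r      # delta added to the running value
        yield accumulator


def match_any(bits: str, k: int, query_values) -> bool:
    """Return True if any (already hashed) query value is in the set."""
    queries = sorted(query_values)
    i = 0
    for value in iter_values(bits, k):
        while i < len(queries) and queries[i] < value:
            i += 1                       # advance the query pointer
        if i == len(queries):
            return False                 # every query is smaller than what remains
        if queries[i] == value:
            return True
    return False
```

Because both sequences are ascending, each is walked at most once, and decoding stops as soon as a match is found or the queries are exhausted.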
In our proposal, we had two filter types. The first one is for regular wallets, basically what a wallet would need to rescan or any other imports. The second one is more extended, maybe we’ll take out some of them but we can always remove things and that’s ok. The first set includes all of the inputs in a transaction, basically which outputs are being spent. We encode the full outpoint meaning transaction ID and output index, and that also includes all the push data in the output scripts themselves. We ended up doing push data because we thought it was a little more general. We could’ve done the entire script assuming we had certain script templates but maybe bare multisig comes back and it’s really popular in future. We can use those to index…. The extended sets are for people that are doing a more passive scanning of the chain or possibly trying to… events in an application that they need to act upon. The extended filter contains transaction IDs so you can use this to say “is this txid in this block?” or “maybe it is in the block?” or “has this transaction ID been confirmed in the block?”. We also encode the witness items which is basically the SegWit version of signature scripts and we also encode the signature scripts themselves. That can be useful if you’re doing some smart contract item or you have something where someone reveals a preimage and you want to know that preimage was revealed in a particular block. You can use this to check against the filters and see if the preimage has been included or not. We use both these filters in our implementation, Neutrino and also in lightning as I’ll get to later.
There’s another new component which is basically the compact filter header. Because there’s no consensus commitment, the client doesn’t really know if these are valid or not. If there was a commitment, the client would ask for the filter, ask for the coinbase and then ask for a Merkle path for the coinbase and then in the OP_RETURN, they could verify it hashes to the value in the OP_RETURN and that actually hashes to the Merkle root. But we don’t have a commitment so we can’t do that. So what we did instead was provide a way for the client to verify the authenticity and then reject the invalid filters. So what it is, you have every single filter for every single block. You basically create another blockchain, you create another hashchain of every single one of those elements. What this allows us to do is that when the client is initially syncing, it gets all the block headers and then also gets all the filter headers themselves. So then once it has all the knowledge committed to some prior history in the past, whenever it fetches a filter, it can then reconstruct, using the prior filter header and the current filter that it fetched, that filter header, and if it matches, the full node was giving it true data. The other thing is the space savings when it is syncing at the tip, because otherwise we have to fetch the filter from every single peer and that’s basically the bandwidth of one filter times the number of peers you’re connected to. Instead, now you fetch the new filter header from all the peers, see if those match up, and if they match you fetch the filter. If they don’t match up, then it fetches the block itself and can then reconstruct the filter from the block to see which filter was valid. From there, they can ban any peers that were giving it fake data essentially.
The way it works is that it is kind of like a recurrence where there is an edge case where there can be an empty block due to the fact we don’t index the inputs of the… transaction and basically maybe someone just threw away coins in the past. We ran into some on testnet where someone wasn’t taking the coinbase output so we had to add this special case for empty blocks. In the case of an empty filter then it is basically just a zero hash and we can continue going.
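The filter header recurrence can be sketched in Python as below. Several details are assumptions made for illustration and should be checked against the BIP text: the use of double SHA-256, the concatenation order (filter hash first, then the previous header), an all-zero previous header at the start of the chain, and an all-zero hash standing in for an empty filter. The function names are hypothetical.

```python
import hashlib

ZERO_HASH = bytes(32)


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used elsewhere in Bitcoin."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def next_filter_header(prev_header: bytes, filter_bytes: bytes) -> bytes:
    """One step of the recurrence: commit to this filter and all prior ones."""
    # Assumption: an empty filter contributes a zero hash.
    filter_hash = sha256d(filter_bytes) if filter_bytes else ZERO_HASH
    return sha256d(filter_hash + prev_header)


def build_filter_header_chain(filters):
    """Full node side: derive one filter header per block, in block order."""
    headers = []
    prev = ZERO_HASH  # assumption: the chain starts from a zero header
    for filter_bytes in filters:
        prev = next_filter_header(prev, filter_bytes)
        headers.append(prev)
    return headers


def verify_filter(prev_header: bytes, filter_bytes: bytes, expected: bytes) -> bool:
    """Client side: check a fetched filter against the synced filter headers."""
    return next_filter_header(prev_header, filter_bytes) == expected
```

A client that has already synced the filter headers only needs the previous header and the fetched filter to detect a peer serving a bogus filter.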
Now we’re getting to some of the peer extensions. Because this is a new servicing mode in Bitcoin, the light client needs to be able to preferentially find different peers that can actually service it. So the first thing we do is we add a service bit and this is cool because when the client connects or it gets an addr message it can see if the peer actually supports it or not. Also the DNS… has a subdomain and using that subdomain you can query the DNS seeds for peers that actually have these service bits. This will save you some time when connecting to peers. You can say “DNS seed, only give me peers that actually have this new method”. We also bump the protocol version, I think it is 17. There’s sendheaders and.. filter before that but that’s the new protocol version. In addition to that, we also add a few new P2P message types. The first type is basically because there are multiple filter types and in future it is feasible that they could be extended, we could have new things in the filters or new encoding types if we figure out ones that are more efficient for particular use cases or in general. This allows the client to query the full node to see which filters it supports. Usually it does getcftype and then gets the types and says “ok it supports these n filters and that’s the n that I want”. Then there are two other peer messages: getcfheaders and cfheaders. These work basically identically to the way getheaders works. You have a block locator and you can… You have cfheaders that gives you the compact headers themselves. Clients will use these during initial block download to fetch the filters that are relevant to them. Then finally we have getcfilter and cfilter and these basically let you get a filter by a particular block hash. The cool thing is that this can be done… The client can sync the entire chain with the header chain and then from the tip, it can fetch filters for different blocks or even start at the end and then go backwards.
This will let you get up to speed very quickly. This is super useful because if you run Bitcoin Core nodes or any other node, rescanning takes forever. If you’re going to rescan it is going to take hours, it’s going to have to read every single block. What you can do now is if you have a RPC for these filters, the client or the person that wants to rescan from the full node can fetch these filters and then see if maybe it is in the block and then do a manual rescan themselves. This is faster because the full node no longer needs to read every single block from disk and match all the relevant addresses. It can basically just serve these filters over the RPC which will be very efficient. Another cool thing this enables is that decentralized light clients can now do very succinct rescans and key imports. Now they have all the filters on disk, maybe once they have a key they need to import, they can query to see which blocks they may need to fetch. Another cool thing about this is that currently if you’re importing an HD key into a wallet, it typically has to hit a centralized server to look at the extended public key and see how far ahead addresses have been generated. That’s a dependency we’d like to eliminate. This lets you eliminate the dependency because if you have all the filters on disk you can query them locally. This is also cool because it lets you do high level Bitcoin applications because it is a more natural application model. I query the set, maybe I get the block rather than I have a filter, I send the filter, I need to regulate the false positive rate, I get the Merkle block, I get the transaction, that’s long and complicated.
So now to Neutrino…. lets you make a wallet on top of it, lets you make a lightning node or lets you make any other thing on top of it. We wanted to put this out because we felt there was a lack of well maintained libraries for light clients in the space and it offers more utility than BIP 37 itself. Our primary motivating factor was to make a lightweight lightning node because we work on lightning, the application and also the code. The application model is very cool because now we can directly query these filters and respond to onchain events. We don’t need to worry about full nodes omitting data which could cause us to possibly lose money by broadcasting a rogue state. This is also a necessary piece for us to eventually have mobile clients for lightning so in the near future you could have lightning nodes on your phone and that would be a killer aspect of it because you’re not going to be carrying around your laptop to make transactions and things like that. That’s pretty helpful. There’s still a bit to be done. In the future, we’re going to start to use sendheaders which basically lets you cut down the amount of time it takes you to sync the next block. Right now, the full nodes send you an inv and then you say “ok maybe that’s useful” then you send a getdata then you have the block. With sendheaders, it just sends you the headers directly. Because we’re a light client all we care about is headers. So now as soon as we get the headers, we can get the filter headers and then from that we can see if we need to get the filter or not. The other cool thing is because we consume entire blocks we can do much more consensus validation. In the past BIP 37 clients could have done more but the way they were implemented, they basically don’t verify anything which is pretty bad. We’ll be able to verify soft forks maybe. Most soft forks can be verified with BIP 9, maybe some hard forks as well.
We can also verify things like the block size, sigop limits, transaction size and things like that. Another cool thing is that because you’re able to fetch the block from anywhere once you match the filter, we can add new pluggable backends into that and that could be some crazy thing like computational private information retrieval or fetching from several peers or some onion routing network or through some CDN or through some server or something like that. There’s a lot of flexibility in terms of how you actually get the block once you realize it may be relevant.
So lightning and neutrino. So neutrino is now, as I’m speaking, integrated as a backend for lnd meaning you can now run lnd on Bitcoin’s testnet. We don’t have light client support for it yet because of that whole PoW change, different difficulty readjustment, things like that. It is currently fully integrated into lnd and it was pretty easy to do so because lnd’s backends are all fully abstracted. With about the amount of work we did to add Neutrino, you could add Bitcoin Core or any other backend. Right now, we have lnd fully integrated with Neutrino and if you want to download it right now you can run this command, these particular arguments, Neutrino’s active and you add a peer. From there you can connect to the network and start syncing from nodes and start syncing all the headers and… channels. This is very cool because now I’ve been running on my laptop, I only have Neutrino I don’t have a full node but I need to connect to someone else’s full node. You need another full node. Sorry, but it’s indirect so it’s a little bit different. This is cool because this means we can run lnd and lightning on phones, on Raspberry Pis, on embedded devices, on smaller resource constrained devices. We need to do a bit of optimization in terms of initial block download because the database we use kind of falls over. Especially on testnet, testnet has 1 million blocks and you’re basically shoving a bunch of data in there very quickly so we’re going to switch to another one, make it a little bit faster. We also designed the lightning protocol, lightning-rfc, there’s a link you can click later, with light clients in mind. The way it is designed, light clients can verify all these kinds of proofs. Within the network, we don’t just say “here’s a channel” we say “here’s a channel and here’s a proof that it’s valid”. A light client can still verify that because we encode the block number, the transaction, the output index.
So using that information, it can go fetch its header file, fetch the block and also fetch the transaction and verify that it is still unspent. We also use this to respond to onchain events like people’s channels being closed, and also any time one of our channels gets closed, either by us unilaterally or in the dangerous case where you breach the contract and we can catch that and take all your money. There’s still some optimizations to be done but it is usable today so if you download it and start… It is much faster. Before, testnet would take maybe a few hours, this can be done in twenty minutes or so, and we’re trying to make it faster.
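To make the "block number, transaction, output index" encoding concrete, here is a minimal sketch of packing those three values into one 8-byte channel identifier. The 3/3/2 byte split follows the short channel ID layout as described in lightning-rfc (BOLT 7); the function names are hypothetical and are not taken from lnd.

```python
def encode_short_channel_id(block_height: int, tx_index: int, output_index: int) -> int:
    """Pack the location of a funding output into a single 64-bit integer.

    Assumed layout (BOLT 7): 3 bytes block height, 3 bytes transaction
    index within the block, 2 bytes output index.
    """
    if not 0 <= block_height < (1 << 24):
        raise ValueError("block height must fit in 3 bytes")
    if not 0 <= tx_index < (1 << 24):
        raise ValueError("transaction index must fit in 3 bytes")
    if not 0 <= output_index < (1 << 16):
        raise ValueError("output index must fit in 2 bytes")
    return (block_height << 40) | (tx_index << 16) | output_index


def decode_short_channel_id(scid: int) -> tuple:
    """Recover (block_height, tx_index, output_index) from the packed form."""
    return (scid >> 40) & 0xFFFFFF, (scid >> 16) & 0xFFFFFF, scid & 0xFFFF
```

A light client that receives such an identifier knows exactly which block to fetch and which output inside it to inspect, which is what lets it check a channel announcement without a full node's index.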
There are a few future directions that we want to head into. Some of these require modifications to Bitcoin, some don’t. If we had UTXO set commitments, it would become much more efficient. Right now, in order to verify that a channel is not yet closed, we basically need to get the block, find where the channel is in the block itself and then scan for it in the chain to see if it has ever been spent. Scanning for it isn’t that bad because we don’t have to download the blocks, we can instead check all the filters. If we don’t have the filter on disk or it is a very old channel, that can be pretty resource intensive.
Another future P2P extension we can get is merklegetblocktxn which I just made up before drafting these slides. This is basically like getblocktxn, which is used by compact blocks and lets you get a transaction by index. We want to get that by index but also get a Merkle proof. This is kind of like BIP 37 but it is better because we’re getting a particular transaction. We don’t really care if they know we’re getting a transaction to verify the graph because the graph is essentially an authenticated data structure.
We can say in a post-HORNET world… For those not familiar, lightning uses onion routing to route the payments themselves so people don’t know who the destination is or where they are in the route. We use something called Sphinx and there’s something that’s a little bit better called HORNET. HORNET uses Sphinx but the cool part about HORNET is that you actually form a circuit. When you form the circuit to send the message to the responder, you also give them the backwards route. This backwards route is fully encrypted. This means they can send information back to you but not actually know the source. So if I’m fetching these filters from the regular network, I can then use HORNET, maybe on lightning, to query some random node on the network and ask them for a block but they don’t know who I am (with some caveats about graph diversity and things like that).
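The idea of fetching a transaction by index together with a Merkle proof can be illustrated with a small sketch of how a client would check such a proof against the Merkle root in a block header. This is not an implementation of the proposed merklegetblocktxn message, which has no specification here; it only shows Bitcoin-style Merkle branch verification (double SHA-256, last entry duplicated on odd levels). All function names are hypothetical.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin's transaction Merkle tree."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    # Bitcoin duplicates the last hash when a level has an odd length.
    if len(level) % 2:
        level = level + [level[-1]]
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Compute the root from a non-empty list of 32-byte transaction hashes."""
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_branch(leaves, index):
    """Server side: collect the sibling hashes needed to prove leaves[index]."""
    branch = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        branch.append(level[index ^ 1])  # sibling of the current node
        level = _next_level(level)
        index //= 2
    return branch


def verify_branch(leaf, index, branch, root):
    """Client side: recompute the root from one leaf and its branch."""
    node = leaf
    for sibling in branch:
        if index & 1:
            node = dsha256(sibling + node)  # we are the right child
        else:
            node = dsha256(node + sibling)  # we are the left child
        index >>= 1
    return node == root
```

The client only needs the header (for the root), the transaction, its index and a number of sibling hashes that grows with the logarithm of the block's transaction count, rather than the whole block.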
This is pretty cool because then we actually put services on top, make it more useful and make things easier for light clients and also provide a more private way to fetch blocks. Then maybe in the future, we can start to serve headers directly on the network itself. We have a 65K message size but still you could chop them up and get people synced up pretty quickly. So there’s source code on lightninglabs/neutrino and that’s the Twitter that we have. Thanks, any questions anyone?
Q - You actually did reference potentially using private information retrieval. So your sets when they’re returned are actually giving an ordering of the things in it so it would seem very natural that instead of.. the entire block if you need something, you use private information retrieval, just sending the bitfields of what things XORed together to a set of peers where you think the XOR of all those is the one thing that you want. That could make it so your communications are really super EDB.
A - For super bandwidth intensive, yeah that’s something I hadn’t thought about. We should follow up on that, that’s a good idea.
Q - I have two questions. One is you talk about a basic filter and an extended filter. Do you need the extended filter functionality for the lightning protocol implementation you’re building now?
A - As it is now, the only thing we actually need is the txid. We need that currently to be able to check our transactions are being confirmed. Once we talked about this on IRC, we realized we could compress it down because if we know the txid we probably know the script itself and then can actually compress it down a little bit. We don’t really yet have direct users for the witness or the scriptSig data but we put that in because maybe it could be useful. But because we do have the filter types, that could be introduced later possibly. So we could slim that down a little bit.
Q - The other question I have is how do you determine the optimal value of m to use in your Golomb-Rice?
A - So initially, we just chose a parameter and then did all the work and realized maybe we should try to justify that with some data. The current value we’ve chosen is… we’ve basically used a model, kind of like using a cdf of a geometric distribution and also the actual data onchain. We’ve tried to minimize the data, given a client with 100 addresses or 200 addresses… the expected bandwidth from downloading the filters and also the false positive rates themselves. That’s how we arrived at P=20, I think we used like 50 queries for the client. We also have a calculator that lets you adjust all the values and we need further tuning. When we pushed to the mailing list there was someone who had done pretty extensive research on this in the past. We’re going to talk to him and see if we can make it more optimal. Because it is a global value, we want to have a good value rather than having something that is poor down the line.
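For readers unfamiliar with the role of the parameter, the following is a minimal sketch of a Golomb-Rice coded set with P=20, the value mentioned in the answer. It is an illustration under stated assumptions, not the filter construction from the BIP proposal: SHA-256 is used here as a stand-in hash, bits are kept in a plain Python list for readability, and every name is hypothetical.

```python
import hashlib

P = 20  # Rice parameter from the talk; the per-query false positive rate is roughly 2**-P


def hash_to_range(item: bytes, n: int, p: int) -> int:
    # Stand-in hash (an assumption): SHA-256 reduced into [0, n * 2**p).
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big") % (n << p)


def rice_encode(values, p):
    """Encode each value as a unary quotient followed by a p-bit remainder."""
    bits = []
    for v in values:
        q, r = v >> p, v & ((1 << p) - 1)
        bits.extend([1] * q + [0])                              # quotient in unary
        bits.extend((r >> i) & 1 for i in range(p - 1, -1, -1))  # remainder, MSB first
    return bits


def rice_decode(bits, count, p):
    """Inverse of rice_encode for a known number of values."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1  # skip the terminating 0 of the unary part
        r = 0
        for _ in range(p):
            r = (r << 1) | bits[pos]
            pos += 1
        out.append((q << p) | r)
    return out


def build_filter(items, p=P):
    """Hash items into a range, sort them, and Rice-code the differences."""
    n = len(items)
    if n == 0:
        raise ValueError("need at least one item")
    hashed = sorted({hash_to_range(it, n, p) for it in items})
    deltas = [b - a for a, b in zip([0] + hashed, hashed)]
    return n, len(hashed), rice_encode(deltas, p)


def filter_matches(filt, item, p=P):
    """Check membership by walking the decoded, cumulative differences."""
    n, count, bits = filt
    target = hash_to_range(item, n, p)
    running = 0
    for delta in rice_decode(bits, count, p):
        running += delta
        if running == target:
            return True
        if running > target:
            return False
    return False
```

A larger P lowers the false positive rate but makes every encoded item cost more bits, which is the trade-off between filter bandwidth and unnecessary block downloads that the answer describes tuning.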
Q - Just a general lightning one. Can you give us an update on lightning on Litecoin?
A - It was pushed to the side to finish all this stuff essentially. The steps for that are basically there’s an issue in Neutrino to make the PoW and the… pluggable. With that, we could add a new command line parameter and you could use Neutrino on Bitcoin and on Litecoin. Once we do that, we can also enable the mode for Litecoin and lnd. Right now, it is there but if you try to do it with Litecoin, it gives you an error because it is not supported currently. Beyond that, we’re basically very close to having cross implementation compatibility. We’ve frozen most of the aspects of this specification. We’re not going to change anything for now. We have what we know we want on the next version or at least we can start to converge towards this. Probably by the end of the month, we’ll have a new release of lnd which should be fully specification compatible. From there we do compatibility testing, robustness, making sure it’s fault tolerant and so on. There’s some other stuff we’re working on with Litecoin, that’ll be cool.
Q - I was wondering, this might be too general a question, but what do you think is the lowest possible resource device that you could run something like Neutrino and get a wallet working on? Obviously you mentioned Raspberry Pis, could you go lower?
A - At least on testnet, that’s the metric, that’s kind of like old man Bitcoin, it has like a million blocks. All the header state is like 150MB essentially with all the Bitcoin headers and all the filter headers itself. You’d at least need that much in terms of… or whatever you’re using for a device. lnd itself, the Go runtime blows up a little bit but it runs on 50.. or so in terms of what it needs actively when you have a few channels. You can go beyond that if you have more channels and you’re doing a bunch of routing and things like that. I’m not quite sure but those are some rough specs in terms of memory usage and space issues.
Q - Would it be plausible to do something on RISC-V or is that way in the future?
A - Maybe. I’m not sure if Go has cross compilation support for that. If it does, we could try it and see if it works or not. It’d also be cool, once we do some more optimization to make lnd much slimmer, to see what kind of device we can get it into. It’d be cool if you were running it on your router and maybe you’re doing some crazy… there’s micropayments and all that other stuff. That’s in the future and something we envision that this could be used for in that context.
Q - I haven’t dug into how lightning is handling onion routing but does it still suffer from exit routing signal analysis essentially if you are an exit node and you notice this usage profile… Does it still suffer from that?
A - There’s an analogous issue currently in lightning now. There are payment hashes and those payment hashes are the same throughout… We know how to randomize them in the future but that needs a little fancier stuff. If we have Schnorr signatures, we can basically do that. That lets us randomize the payment hash at every single point on the route. So currently if you have two nodes on the same route, they can identify that it is the same payment. We know how to fix that in the future. Beyond that, there’s regular traffic analysis but in the protocol, we do have a ping message. In that ping message, you tell the responding node how many bytes to respond with… Also the ping can be padded out itself so you can use that to fake some traffic. That’s definitely going to be an uphill battle in the future but at least we have something that works in the abstract. Once we actually do the deployment, we’ll get into the nitty gritty implementation issues.
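As a rough illustration of the padded ping described in the answer, here is a sketch of serializing a ping and the matching pong. The field layout and type numbers (18 for ping, 19 for pong, each with 16-bit big-endian length fields) follow the ping/pong messages as described in lightning-rfc (BOLT 1), which may have changed since this talk; the function names are hypothetical and the specification's size limits and transport encryption are left out.

```python
import struct

PING_TYPE = 18  # assumed from BOLT 1
PONG_TYPE = 19  # assumed from BOLT 1


def build_ping(num_pong_bytes: int, padding_len: int) -> bytes:
    """Ping = type, requested pong size, own padding length, then padding bytes."""
    header = struct.pack(">HHH", PING_TYPE, num_pong_bytes, padding_len)
    return header + bytes(padding_len)  # zero-filled padding that the peer ignores


def build_pong_for(ping: bytes) -> bytes:
    """Answer a ping with exactly as many padding bytes as it asked for."""
    msg_type, num_pong_bytes, padding_len = struct.unpack(">HHH", ping[:6])
    if msg_type != PING_TYPE:
        raise ValueError("not a ping message")
    if len(ping) != 6 + padding_len:
        raise ValueError("padding length does not match message size")
    return struct.pack(">HH", PONG_TYPE, num_pong_bytes) + bytes(num_pong_bytes)
```

Because the sender controls both its own padding and the size of the reply, two peers can generate cover traffic of arbitrary shape in either direction, which is the "fake some traffic" idea from the answer.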
Q - People have been asking about cross-chain swaps. Do you want to talk a little bit more about that?
A - So swaps are something we’re not committing a bunch of time to right now but at least there’s the scaffolding to do so in lnd. I was talking about the way the different backends were all abstracted out. The way it is right now, you basically just have a map of chains with different backends so then from there you can have lnd that’s running on two chains or three chains or however many chains. Once on the chain… you can facilitate trades between…We don’t necessarily want to put pricing information directly on the network. Instead there can be a higher layer. Once you’re matched up, you basically do the HTLC, boom and it’s done. The work of finding the matches for people to trade with in addition to what their rates are and which currencies they support, that can all be lifted onto another network, I guess maybe Layer 3. Below that we have lightning itself and we can do cool things from that. Also it’s cool that with lightning you can combine on and offchain HTLCs. So you can say, if a chain you want to trade on doesn’t have a robust malleability fix, you could on one of the final hops (or on any hop), make a regular onchain HTLC. That may slow things down a little bit, waiting for a transaction confirmation but this is a more adaptive bridge when these other chains aren’t compatible to do trading. Short answer, it’s possible, we know how to do it, we’re not focusing on it yet. We’re focusing on making it work well on Bitcoin and then we can branch out to other chains. Likely first Bitcoin, Litecoin, crosschain coming soon.
Q - How would Layer 3 work?
A - I don’t know. Layer 3 is basically matchmaking and signaling. One thing you can do is have the matchmaker make itself part of the onion route, to force itself to be the intermediate node and extract some fees for that service essentially. A bunch of cool designs like that utilize different components of lightning itself to make this higher level stuff work. There’s a bunch of people working on this stuff now, people are into this decentralized exchange thing because people are crazy about this token stuff and maybe it would be nice to have two exchanges that do all the volume in the world. People are working on it to figure it out.