Friday, March 28, 2025

Lookup Speed of uint Sets in Go

[]uint vs map[uint]struct{}

A wee bit of premature optimization: I was coding a thing where I wanted a set of uint, and I wanted to filter several million records against it. I had a hunch that a sorted []uint with binary search would be faster than hashing the key and doing a lookup in a map[uint]struct{}. I was right, but only for small sets. At 10 and 100 elements, slices.BinarySearch() is faster. The crossover point in this test is somewhere between 100 and 1000 elements, and from 1000 elements up map[uint]struct{} is faster.

I'm not counting setup time to build the set, just the time to do 100_000_000 lookups against the set.

ns/op (lower is better)
set size   array    map
10         3.952    5.911
100        5.843    7.104
1000       7.899    6.008
10000      10.35    6.746
100000     13.15    9.547
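For anyone who wants to try this at home, here is a minimal sketch of how such a comparison could be set up. It is not the code that produced the numbers above. The helper names (buildSets, inSlice, inMap, timeLookups), the 50/50 mix of hits and misses, and the reduced lookup count are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"slices"
	"time"
)

// buildSets returns the same n random keys as a sorted slice and as a map.
func buildSets(n int, seed int64) ([]uint, map[uint]struct{}) {
	rng := rand.New(rand.NewSource(seed))
	m := make(map[uint]struct{}, n)
	for len(m) < n {
		m[uint(rng.Uint64())] = struct{}{}
	}
	s := make([]uint, 0, n)
	for k := range m {
		s = append(s, k)
	}
	slices.Sort(s) // binary search requires sorted input
	return s, m
}

// inSlice reports whether k is in the sorted slice s.
func inSlice(s []uint, k uint) bool {
	_, found := slices.BinarySearch(s, k)
	return found
}

// inMap reports whether k is a key of m.
func inMap(m map[uint]struct{}, k uint) bool {
	_, found := m[k]
	return found
}

// timeLookups runs n lookups cycling through probes and returns ns per
// lookup plus the hit count (returned so the loop cannot be optimized away).
func timeLookups(n int, probes []uint, contains func(uint) bool) (float64, int) {
	hits := 0
	start := time.Now()
	for i := 0; i < n; i++ {
		if contains(probes[i%len(probes)]) {
			hits++
		}
	}
	return float64(time.Since(start).Nanoseconds()) / float64(n), hits
}

func main() {
	const lookups = 1_000_000 // assumption: far fewer than the post's 100_000_000
	rng := rand.New(rand.NewSource(2))
	fmt.Printf("%-10s %10s %10s\n", "set size", "array", "map")
	for _, n := range []int{10, 100, 1000, 10000, 100000} {
		s, m := buildSets(n, 1)
		// Assumption: half the probes are known members, half are random misses.
		probes := make([]uint, 4096)
		for i := range probes {
			if i%2 == 0 {
				probes[i] = s[rng.Intn(len(s))]
			} else {
				probes[i] = uint(rng.Uint64())
			}
		}
		sliceNs, h1 := timeLookups(lookups, probes, func(k uint) bool { return inSlice(s, k) })
		mapNs, h2 := timeLookups(lookups, probes, func(k uint) bool { return inMap(m, k) })
		if h1 != h2 {
			panic("slice and map disagree")
		}
		fmt.Printf("%-10d %10.3f %10.3f\n", n, sliceNs, mapNs)
	}
}
```

The closure passed to timeLookups adds the same call overhead to both sides, so absolute numbers will differ from the table, but the relative trend should still be visible. A testing.B benchmark would be the more conventional way to measure this.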


Friday, March 21, 2025

How to develop an atproto firehose source with relay and goat

Big changes are coming to the atproto firehose with sync 1.1 and major code changes to the relay. This blog post is about developing a custom firehose source (e.g. a PDS implementation) and ensuring it is compatible with Bluesky's relay and firehose consumer libraries.

First, install Go 1.23 or newer

Then, git clone the Bluesky indigo repo and build the relay

git clone https://github.com/bluesky-social/indigo.git
cd indigo/cmd/relay
go build

If you have your custom PDS deployed somewhere with https enabled, start the relay like this. It will log to stdout/stderr:

./relay --admin-key hunter2 --api-listen :2470 --metrics-listen :2471 --time-seq

If your custom PDS is local or otherwise plain http, add `--crawl-insecure-ws`, which allows non-https PDSes:

./relay --admin-key hunter2 --api-listen :2470 --metrics-listen :2471 --time-seq --crawl-insecure-ws

In a separate terminal, tell your relay to start crawling your PDS and adjust its rate limits:

curl --silent --include -H 'Authorization: Bearer hunter2' -H 'Content-Type: application/json' --data '{"hostname":"mygreatpds.address.tld"}' http://127.0.0.1:2470/admin/pds/requestCrawl

curl --silent --include -H 'Authorization: Bearer hunter2' -H 'Content-Type: application/json' --data '{"host":"mygreatpds.address.tld","per_second":5000,"crawl_rate":50,"repo_limit":10000000,"per_hour":50000000,"per_day":500000000}' http://127.0.0.1:2470/admin/pds/changeLimits

Is it working? Let's browse the firehose with `goat`

cd indigo/cmd/goat
go build
./goat firehose --relay-host ws://127.0.0.1:2470

If you create a record in your PDS, you should see it on the relay firehose.
You can also `goat firehose` your PDS directly. Ideally, if you have both running, you should see an event in both within a few milliseconds of each other.

The other thing to check is /metrics on your relay:

curl --silent http://127.0.0.1:2471/metrics

One useful filter is to get events tied to your PDS name

curl --silent http://127.0.0.1:2471/metrics | grep mygreatpds

That's about it! If all the events you emit from your com.atproto.sync.subscribeRepos endpoint get through to `goat firehose` reading the relay, you're in good condition to be part of the atproto world!