If I Rework SSC Token History, These Are the First Things I’d Change

Hey everyone,

Lately I have been looking through ssc_tokens_history and thinking about what I would do if I decided to give the history node a proper modernization pass.

This is not one of those "burn it all down and rewrite it in Rust" posts. The project still works. It still serves useful data. It still does the job it was built to do.

I also want to be clear up front: this is not me revisiting my own original design and pretending I had some grand plan all along. I am coming into an existing codebase and looking at it as an operator and maintainer.

Current Git Log

Like a lot of older infrastructure projects, it carries the fingerprints of the moment it was built:

  • quick fixes that made sense at the time
  • query shapes that were good enough for the first consumer
  • indexes designed for the obvious case, not every case
  • API behavior that is functional, but not always ergonomic

None of that is a criticism of the original work. It is just the normal reality of software that survives long enough to become infrastructure.

So I figured this would make a decent Hive post: if I sit down and improve the history node from where it is today, what are the first things I would actually change?

First: I Would Make the Filters Less Annoying

One of the first things that jumps out at me in the current API is that some filtering behavior is more rigid than it needs to be.

For example, accountHistory currently only applies a timestamp filter if both timestampStart and timestampEnd are present and valid.

That means you can do this:

  • "show me data between X and Y"

But you cannot cleanly do these:

  • "show me everything since X"
  • "show me everything before Y"

That is fine if you are building around a narrow UI, but it is not great if you are treating the endpoint like a general history API.

If I were updating it, I would make the filter logic much more flexible:

  • timestampStart only: apply $gte
  • timestampEnd only: apply $lte
  • both: apply a bounded range

That is not a flashy change, but it is the kind of thing that makes an API feel less stubborn.
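As a concrete illustration, the three filter rules above could be sketched like this. This is my own sketch, not code from the project; the parameter names `timestampStart` and `timestampEnd` come from the post, and the surrounding handler is assumed.

```javascript
// Build a MongoDB timestamp clause from optional bounds.
// Either bound may be missing; only attach the clause when at
// least one was supplied.
function buildTimestampFilter(timestampStart, timestampEnd) {
  const range = {};
  if (timestampStart !== undefined) range.$gte = timestampStart; // "everything since X"
  if (timestampEnd !== undefined) range.$lte = timestampEnd;     // "everything before Y"
  return Object.keys(range).length > 0 ? { timestamp: range } : {};
}
```

The result would then be merged into the rest of the query filter, so a single-sided request degrades gracefully instead of being silently ignored.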

I would probably do the same kind of cleanup pass anywhere else the API expects very specific combinations of parameters when a looser interpretation would still be safe and predictable.

Second: I Would Revisit the Indexes Based on Real Query Patterns

Right now the main accountsHistory collection is created with these indexes:

{ account: 1, symbol: 1, operation: 1, timestamp: -1 }
{ transactionId: 1 }
{ timestamp: -1 }

That first compound index is sensible for one very specific query shape:

  • exact account
  • optional symbol
  • optional operation
  • sorted by newest first

The problem is that in the real world, people do not always query the API in the most index-friendly order.

Sometimes what you really want is:

  • all history for one account, newest first
  • all history for one account and one operation, newest first
  • all history for one account in a time range, newest first

The current compound index helps some of that, but it is not the same as having indexes that are intentionally designed around those actual patterns.

If I were tuning this for production traffic today, I would seriously consider adding indexes like:

{ account: 1, timestamp: -1 }
{ account: 1, operation: 1, timestamp: -1 }
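With the official MongoDB Node.js driver, declaring those two indexes might look roughly like this. `ensureHistoryIndexes` is a hypothetical helper of mine, not something in the project; it would be called once at startup against the `accountsHistory` collection.

```javascript
// Proposed account-centric indexes, matching the query shapes above.
const proposedIndexes = [
  { key: { account: 1, timestamp: -1 } },                // account, newest first
  { key: { account: 1, operation: 1, timestamp: -1 } },  // account + operation, newest first
];

// Sketch: createIndexes accepts an array of index specifications
// and is a no-op for indexes that already exist.
async function ensureHistoryIndexes(collection) {
  return collection.createIndexes(proposedIndexes);
}
```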

I am not saying "throw indexes at the wall and pray." Every index has a write cost. But a history node is a read-heavy thing by nature, and the whole point of it is to answer these account-centric questions efficiently.

So this is one of the first areas where I would stop guessing and start profiling.

Third: I Would Move Away From Deep Skip/Limit Pagination

This is probably the biggest API-level improvement I would want to make.

The current endpoints use skip and limit. That is easy to implement, easy to explain, and perfectly fine for shallow browsing.

It is much less fine when someone wants to dig deep into an account's history.

The further you paginate with skip, the worse the experience gets. The database still has to walk past everything you skipped. That is one of those problems that does not look too bad at first, but gets uglier the more your dataset grows.

If I were reworking the API, I would strongly consider switching to cursor-based pagination using something like:

  • timestamp
  • _id
  • or a combined "before" cursor

That would make the API more efficient for long browsing sessions and more predictable for frontends that need to page through history without playing offset roulette.
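A minimal sketch of what that could look like, keyed on `(timestamp, _id)`. The cursor format and helper names are illustrative, not part of the current API, and the `_id` tiebreaker assumes ids are comparable in a stable order.

```javascript
// Encode the last document of a page into an opaque cursor.
function encodeCursor(doc) {
  return Buffer.from(`${doc.timestamp}:${doc._id}`).toString('base64');
}

// Extend a base filter so the next page resumes strictly after the
// cursor, newest first: older timestamp, or same timestamp with a
// smaller _id as a tiebreaker.
function buildPageQuery(baseFilter, cursor) {
  if (!cursor) return baseFilter;
  const [ts, id] = Buffer.from(cursor, 'base64').toString('utf8').split(':');
  return {
    ...baseFilter,
    $or: [
      { timestamp: { $lt: Number(ts) } },
      { timestamp: Number(ts), _id: { $lt: id } },
    ],
  };
}
```

The database can then satisfy every page with an index seek instead of walking past everything the client already saw.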

This is also one of those changes that makes the project feel more current without requiring a total redesign.

Fourth: I Would Take Another Hard Look at NFT History

The NFT side of the history node is where the existing design compromises are most obvious.

The nftHistory endpoint currently builds an aggregation pipeline that:

  1. matches NFT-side records
  2. groups by accountHistoryId
  3. looks the referenced records back up in accountsHistory
  4. merges the result back into one object

It works, but it is not exactly elegant.

There is even a comment in the NFT contract parser that basically acknowledges the current structure is a compromise:

the account history should eventually be restructured into separate collections that only reference the account-history entry by ID, similar to how NFT history works.

That kind of comment is useful because it tells me the code itself is already hinting at where the pressure points are.

If I were revisiting the project seriously, I would ask a few questions:

  • Should NFT account history and NFT item history be more explicitly separated?
  • Should the read path avoid the $lookup where possible?
  • Would denormalizing a few more fields into nftHistory reduce query cost enough to be worth it?

I do not think the answer is automatically "more normalization" or "more denormalization." The right answer depends on the actual query traffic. But I do think this is one of the areas where I would stop accepting the current compromise just because it already exists.

Fifth: I Would Probably Expand the Filter Surface a Bit

The current API is useful, but there are some obvious places where I could make it friendlier without turning it into a monster.

For example:

  • richer operation filtering
  • better single-sided time filters
  • maybe account-history views that distinguish token transfers, contract transfers, and fee transfers more explicitly
  • maybe a cleaner contract/action query model for consumers who think in those terms rather than the flattened operation string

I would be careful here, because API bloat is real. Not every "wouldn't it be nice if" belongs in production.

But I do think there is room for a little more flexibility without making the interface messy.

Sixth: I Would Recheck the Builder’s Rollback Behavior

The parser currently does a rollback by timestamp based on the last parsed block, then replays forward from there.

That is practical and understandable, but it is also one of those places where I would want to re-read the code with a more suspicious mindset if I were making the node more robust.

Any time a system says:

"delete everything with timestamp greater than or equal to X and rebuild from there"

I want to be very sure I agree with the assumptions behind that decision.

I am not saying it is wrong. I am saying this is exactly the kind of logic that deserves a fresh review once a project graduates from "works for my current deployment" into "I want to tighten this up for the long haul."
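For reference, the rollback described above boils down to something like the following. This is a sketch under my own assumptions: the `lastParsedBlock` shape is guessed, and I am assuming both history collections get rolled back to the same cutoff; the project's actual builder code may differ.

```javascript
// Delete everything at or after the cutoff, then the parser
// replays forward from the last parsed block.
async function rollbackFrom(db, lastParsedBlock) {
  const filter = { timestamp: { $gte: lastParsedBlock.timestamp } };
  await db.collection('accountsHistory').deleteMany(filter);
  await db.collection('nftHistory').deleteMany(filter);
  return filter;
}
```

The assumption worth auditing is the `$gte`: it presumes timestamps are monotonic per block and that nothing at the cutoff timestamp should survive.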

Seventh: I Would Probably Add Some Query Telemetry Before Touching Too Much

This may actually be the most important point in the whole post.

Before I started adding indexes or restructuring the API, I would want real information about how the history node is actually being used.

What are the most common queries?

  • account-only lookups?
  • account + symbol?
  • deep pagination?
  • NFT account lookups?
  • same symbols queried constantly?

Because the worst way to optimize a history service is by optimizing for the queries you imagine people are making instead of the ones they are actually making.

So if I pick this project back up, I would probably start with:

  • lightweight query logging
  • maybe some aggregation on common parameter combinations
  • response timing for the slowest query shapes

That gives me a reasoned path to better indexes instead of just adding whatever sounds plausible in the moment.
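A lightweight version of that logging could be as small as a wrapper around each endpoint handler. Everything below is illustrative; nothing like it exists in the project today.

```javascript
// Wrap a handler so each call records the query *shape* (which
// parameters were present, not their values) plus timing. Keeping
// shapes instead of values keeps the log small and privacy-friendly.
function withTelemetry(name, handler, log = []) {
  return async (params) => {
    const started = Date.now();
    const result = await handler(params);
    log.push({
      endpoint: name,
      shape: Object.keys(params).sort().join('+'),
      ms: Date.now() - started,
    });
    return result;
  };
}
```

Aggregating `shape` counts after a week of traffic would answer the "what are people actually querying" question directly, and the `ms` column points at the query shapes that deserve an index first.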

This Is the Kind of Project That Ages Quietly

That is probably why I find this interesting enough to write about.

ssc_tokens_history is not flashy. It is not the kind of service most people think about unless it breaks, slows down, or fails to answer the question they wanted answered.

But these are exactly the kinds of projects that quietly accumulate little design decisions over time until one day you look at them and realize:

"Yeah, this still works, but I can already see where I would start cleaning it up."

And honestly, I think those kinds of posts are useful. Not because every idea turns into code tomorrow, but because writing down where the pain points are is often the first real step toward improving a service.

The Short Version

If I were going to modernize ssc_tokens_history, my first targets would be:

  1. make the timestamp filters more flexible
  2. revisit indexes around actual account-history query patterns
  3. replace deep skip pagination with cursor-style paging
  4. rethink the NFT history lookup path
  5. add enough telemetry to know which changes matter most

That is probably the roadmap I would start with before touching anything more ambitious.

Will I do all of it? Maybe. Maybe not.

But these are the things I think about when I look at the history node today.

As always,
Michael Garcia a.k.a. TheCrazyGM
