I’ve been experimenting with plain text accounting tools like hledger and Beancount, and one challenge I keep running into is maintaining a clean and scalable journal structure as the number of transactions grows. At the beginning, it feels simple, but over time categories, accounts, and file organization can become harder to manage consistently.
I’m curious how others here approach this. Do you prefer splitting journals into multiple files, using strict naming conventions, or relying more on automation and scripts to keep things organized? Also, how do you balance simplicity vs flexibility in your setup as your data grows?
Welcome @chickpeafilae. (The Discourse posting delay was because "new user typed their first post suspiciously fast".)
It is a challenge, and a balancing act. Here are some related things:
I've been running hledger for a few years and settled on an approach that might be useful, especially for those coming from a spreadsheet-heavy background.
I keep a single main.journal for all transactions (I may split it by bank account in the future). The only files I split out are:
I find year-splitting more trouble than it's worth — the opening/closing balance dance never feels clean to me.
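For anyone new to splitting files, hledger's `include` directive is how a main journal pulls the pieces together. A minimal sketch (the filenames here are illustrative, not my actual layout):

```
; main.journal -- top-level file
include accounts.journal
include prices.journal

; transactions can live here, or in further included files
2026-04-20 Grocery store
    expenses:food:groceries      $42.50
    assets:bank:checking
```

Every report then just points at main.journal and sees the whole dataset.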
For account names, I use hledger's --strict mode, but enforce names at the data entry layer with Excel data validation. All valid account names live in a dedicated sheet, and every entry cell validates against that list. It works like autocomplete with a hard constraint — you can't enter a typo or an ad-hoc sub-account without first adding it to the list deliberately. New accounts become an intentional act, not an accident.
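To show how the declared-names approach works on the hledger side (account names here are illustrative), the declarations file is just a list of `account` directives:

```
; accounts.journal -- the single source of truth for valid account names
account assets:bank:checking
account expenses:food:groceries
account expenses:food:dining
```

With these declared, running `hledger --strict print` (or `hledger check accounts`) rejects any transaction posting to an undeclared account, so a typo fails fast instead of silently creating a new account.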
The account hierarchy matters more than the account count. A deep hierarchy (expenses:food:dining:work-lunch) gives you flexibility to query at any level, but you pay for it at entry time. I keep it to 3 levels unless there's a real reporting reason for a 4th.
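The "query at any level" payoff looks something like this (account names are illustrative; these are real hledger flags):

```shell
# Everything under expenses:food, rolled up to two levels of hierarchy
hledger balance expenses:food --depth 2

# Drill all the way down to one leaf account
hledger register expenses:food:dining:work-lunch
```

The deeper the hierarchy, the more roll-up levels `--depth` gives you; the cost is typing those long names at entry time.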
I maintain hledger-Excel, a VBA-based pipeline that handles this data validation setup if anyone wants to see it in practice.
I have almost everything in a single file except a small subset of ~70 transactions in a separate file that’s included in the main one. Here’s what hledger stats says:
```
Txns span           : 2014-04-01 to 2026-04-25 (4407 days)
Last txn            : 2026-04-24 (0 days ago)
Txns                : 7025 (1.6 per day)
Txns last 30 days   : 100 (3.3 per day)
Txns last 7 days    : 23 (3.3 per day)
Payees/descriptions : 1260
Accounts            : 591 (depth 8)
Commodities         : 41
Market prices       : 102
```
I exclusively work with my journal files in Emacs, where I have live checking with flycheck-hledger and a bunch of handy features from hledger-mode, plus some extra auto-completion I really need to merge upstream at some point. When I need to query the data, I use the hledger CLI. (I’ve wrapped the most common queries with just, the command runner, for convenience.)
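For anyone unfamiliar with just, the wrappers are short recipes in a justfile. A sketch of the kind of thing I mean (recipe names and the journal path are hypothetical, not my actual setup):

```
# justfile
journal := "main.journal"

# Top-level balances, rolled up
bal:
    hledger -f {{journal}} balance --depth 2

# This month's activity
month:
    hledger -f {{journal}} register -p thismonth
```

Then `just bal` or `just month` runs the full command without having to remember the flags.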
I built a web interface for hledger last year to experiment with some new technologies but I wasn’t very happy with the experience, so I’ll have to rebuild it when I can find the time. If I can get it working properly, I expect I’ll spend much less time looking at transactions in Emacs or querying my journals with the CLI.
I don’t do AI.
I don't know where you live, but in the US where I live, it is a good idea to lock your files every tax year. Our IRS (Internal Revenue Service) takes a dim view of records that mutate, and I gather that tax authorities tend to be similar worldwide. Practically speaking, in the US we have from January 1 to the tax deadline on April 15 to make sure the previous year's transactions are clean and lock them down, though we can request an extension if there is a problem.
In other words, it may be more about mutability and auditability than scalability.