How do you keep your journal structure scalable over time?

I’ve been experimenting with plain text accounting tools like hledger and Beancount, and one challenge I keep running into is maintaining a clean and scalable journal structure as the number of transactions grows. At the beginning, it feels simple, but over time categories, accounts, and file organization can become harder to manage consistently.

I’m curious how others here approach this. Do you prefer splitting journals into multiple files, using strict naming conventions, or relying more on automation and scripts to keep things organized? Also, how do you balance simplicity vs flexibility in your setup as your data grows?

Welcome @chickpeafilae. (The discourse posting delay was because "New user typed their first post suspiciously fast".)

It is a challenge, and a balancing act. Here are a few thoughts:

I've been running hledger for a few years and settled on an approach that might be useful, especially for those coming from a spreadsheet-heavy background.

I keep a single main.journal for all transactions (may split by bank account in the future). The only files I split out are:

  • prices.journal — commodity prices

  • budget.journal — periodic budget declarations

I find year-splitting more trouble than it's worth — the opening/closing balance dance never feels clean to me.
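To make the layout above concrete, the main file pulls the split-out files in with include directives, something like this (file names follow my conventions above; the sample transaction and account names are just illustrations):

```journal
; main.journal — all transactions live here
include prices.journal   ; commodity prices (P directives)
include budget.journal   ; periodic budget declarations

2026-04-24 * grocery store
    expenses:food:groceries      45.00 USD
    assets:bank:checking
```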

For account names, I use hledger's --strict mode, but enforce names at the data entry layer with Excel data validation. All valid account names live in a dedicated sheet, and every entry cell validates against that list. It works like autocomplete with a hard constraint — you can't enter a typo or an ad-hoc sub-account without first adding it to the list deliberately. New accounts become an intentional act, not an accident.
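For anyone who hasn't used --strict: the journal declares every allowed account up front, and hledger then rejects postings to anything undeclared. A minimal sketch (account names invented):

```journal
; under --strict, accounts must be declared before use
account assets:bank:checking
account expenses:food:groceries

2026-04-24 * grocery store
    expenses:food:groceries    45.00 USD
    assets:bank:checking
```

A typo like expenses:food:grocereis would now be reported as an error by hledger check rather than silently creating a new account.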

The account hierarchy matters more than the account count. A deep hierarchy (expenses:food:dining:work-lunch) gives you flexibility to query at any level, but you pay for it at entry time. I keep it to 3 levels unless there's a real reporting reason for a 4th.
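The payoff of the deep hierarchy is that reports can roll it up to any level with --depth, so the entry-time cost buys optional detail rather than permanent clutter. Schematically (accounts invented):

```journal
2026-04-20 * lunch with client
    expenses:food:dining:work-lunch    18.50 USD
    assets:bank:checking

; hledger bal expenses --depth 2  rolls this into expenses:food
; hledger bal expenses --depth 4  shows the full work-lunch detail
```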
I maintain hledger-Excel, a VBA-based pipeline that handles this data validation setup if anyone wants to see it in practice.

I have almost everything in a single file except a small subset of ~70 transactions in a separate file that’s included in the main one. Here’s what hledger stats says:

Txns span           : 2014-04-01 to 2026-04-25 (4407 days)
Last txn            : 2026-04-24 (0 days ago)
Txns                : 7025 (1.6 per day)
Txns last 30 days   : 100 (3.3 per day)
Txns last 7 days    : 23 (3.3 per day)
Payees/descriptions : 1260
Accounts            : 591 (depth 8)
Commodities         : 41
Market prices       : 102

I exclusively work with my journal files in Emacs, where I have live checking with flycheck-hledger and a bunch of handy features with hledger-mode, plus some extra auto-completion I really need to merge upstream at some point. When I need to query the data, I use the hledger CLI. (I’ve wrapped the most common queries with just for convenience.)
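For anyone unfamiliar with just, the wrapping amounts to a justfile of short recipes; a sketch along these lines (recipe names and queries are just examples, not my actual setup):

```just
# justfile — shortcuts for common hledger queries
bal:
    hledger bal -f main.journal --depth 2

month:
    hledger reg -f main.journal -p "this month"
```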

I built a web interface for hledger last year to experiment with some new technologies but I wasn’t very happy with the experience, so I’ll have to rebuild it when I can find the time. If I can get it working properly, I expect I’ll spend much less time looking at transactions in Emacs or querying my journals with the CLI.

I don’t do AI.

I don't know where you live, but in the US where I live, it is a good idea to lock your files every tax year. Our IRS (Internal Revenue Service) takes a dim view of records that mutate, and I gather that tax authorities tend to be the same worldwide. Practically speaking, in the US we have from January 1 to the tax deadline on April 15 to make sure the previous year's transactions are clean and lock them down, though we can request an extension if there is a problem.

In other words, it may be more about mutability and auditability than scalability.

@chickpeafilae it looks like a post of yours was being flagged as spam by discourse. If you want to try again, I'll process it.

Since 2024 I have been splitting my files by year, month, and account. For example, 202604_citibank.journal is the Citibank statement for April 2026.

This does mean many, many files, and it causes a slight performance hit since hledger has to load all the files. But in return it is very easy to manage and makes my import process idempotent.

Sometimes I change my CSV rules, and then I can just reprocess the old CSVs and the journals are regenerated. This lets me target by year, month, or bank.
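For those who haven't used them, hledger's CSV rules files are what make this reprocessing cheap: one rules file maps a raw bank export to journal entries, so regenerating old months is just rerunning the import. A minimal sketch (the file name, columns, and accounts are made up):

```rules
# citibank.csv.rules — hypothetical mapping for a bank export
skip 1
fields date, description, amount
date-format %m/%d/%Y
account1 assets:bank:citibank
account2 expenses:unknown
```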

Although the performance hit is getting worse now, so I'm thinking of a hybrid approach.

Are you sure it's the many files (how many?) causing the slowdown? Is hledger faster if you cat all the files into one first? I wouldn't expect much difference.

hledger stats reports 205 files.

Your comment made me curious so I went ahead and profiled it.

Metric        Separate files   Single file
user          0.50s            0.46s
sys           0.08s            0.02s
cpu           80%              97%
wall clock    0.730s           0.483s

There is a difference: the single-file run is about 34% faster by wall clock (0.483s vs 0.730s). That's not significant for day-to-day use, so I haven't bothered to change my process. But I believe this won't scale forever.

One option I want to try is using account opening/closing properly, but only for data that is more than 2 years old, since I still find myself referring to transactions from the last 2 years.

The other option I may try is to setup a wrapper or pre-hook to cat the files into a single file before passing it to hledger.
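That pre-hook could be as small as a script that globs the statement files in order and writes one combined journal before calling hledger. A minimal Python sketch (the function name and directory layout are hypothetical, not part of my actual setup):

```python
from pathlib import Path

def combine_journals(src_dir: str, out_file: str) -> int:
    """Concatenate every *.journal under src_dir into out_file.

    Sorting the paths keeps YYYYMM-prefixed statement files in
    chronological order. Returns the number of files combined.
    """
    parts = sorted(Path(src_dir).rglob("*.journal"))
    text = "\n".join(p.read_text() for p in parts)
    Path(out_file).write_text(text)
    return len(parts)

# afterwards, something like: hledger -f combined.journal stats
```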

But the bottom line is that the pain of using multiple files isn't yet big enough for me to explore those options seriously, while the benefits of splitting by statement are much larger!

I guess you ran this a bunch of times? Those particular numbers don't add up. (And thanks for the testing.)

Oops, I was lazy and ran it only once for each. :slight_smile:

@chickpeafilae Welcome. :slight_smile:

You asked a bit about journals and also accounts. Lots of good discussion about journal files, so maybe I can add something about accounts?

I keep one journal file per year (I'm on year 6 or 7). I started only recording for my business accounts but added personal accounts in 2025.

I close my books at year end and start a new journal file in order to 'lock in' my accounts for tax purposes (like @rdsteed mentioned).
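For anyone who hasn't done a year-end close: it means zeroing out the balance-sheet accounts at year end and re-opening them in the new file, so each year's journal balances on its own. Schematically (amounts and accounts invented):

```journal
; at the end of 2026.journal
2026-12-31 closing balances
    assets:bank:checking       -1,234.00 CAD
    equity:opening/closing balances

; at the top of 2027.journal
2027-01-01 opening balances
    assets:bank:checking        1,234.00 CAD
    equity:opening/closing balances
```

hledger's close command can generate these transaction pairs for you rather than writing them by hand.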

account stuff

I have one header file, included from my journal, which contains all my account declarations. And I run in --strict mode to make sure I'm not mistyping accounts. Since it's a mix of business and personal accounts, my account list is quite long right now.

In my experience the number of accounts shrinks over time. At least it did for my business. I started with all kinds of granularity in my business expense accounts, but slowly combined them over the years. I don't need massive granularity to manage the business, and when it comes time to do my taxes (Canada), there are only so many categories the government wants to see. 80% of my expense accounts get rolled up into "Supplies" at tax time anyway.

So my business accounts are tight. My personal accounts are another story. It's because I don't really know what I want to track yet. As I figure that out, I can reduce the number of accounts.

For example, do I really want to separate food expenditures by groceries, fast food, restaurants, coffee shops, sweet treats, and snacks? Yes for 2026, likely not in 2027. I can create 2027headers.journal and reduce them down. If I need more granularity once in a while, I can always look at the register. :slight_smile:
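One way to get that reduction without rewriting old entries is hledger's alias directive, which folds fine-grained accounts into coarser ones at report time. A sketch with invented names:

```journal
; 2027headers.journal — roll the 2026 food sub-accounts together
alias expenses:food:fast-food    = expenses:food:eating-out
alias expenses:food:restaurants  = expenses:food:eating-out
alias expenses:food:coffee-shops = expenses:food:eating-out
```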

more journal file stuff

I got into CSV imports in a big way last year (I used to hand-type transactions!) and my workflow creates a lot of journal files -- one per bank account. But they are all temporary files. After I create a set of temporary journal files from the CSV files, I check them with the --strict flag, do a bulk import into my current year's journal file, and archive the temp files.

I hope some of this helps. I find that everyone is different -- so long as you have a setup that makes sense in your brain, and you can use efficiently every week/month/etc, you're fine.