For years I was too intimidated by the Hacker News posts about beancount and ledger, even though plain text accounting seemed useful and likely to pay off eventually.
I realized that large language models would cut most of the overhead of writing custom parsers for the formats institutions make data available in (CSV, OFX).
In the meantime, I have been making dashboards for a project with my partner: https://jaanli.github.io/american-community-survey/income - these rely on new tools like duckdb and Observable Framework to transform decades of data from 38 million individuals and hundreds of datapoints per person.
So I figured I might get past my fears through exposure therapy: having a large language model like GPT-4 walk me through converting one institution's data to beancount format this weekend.
The results were... better than I expected! Wrote it up here: https://gist.github.com/jaanli/1f735ce0ddec4aa4d1fccb4535f3843f
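The core of what the model generated was a small CSV-to-beancount converter. This is a minimal sketch of that pattern, not the exact code from the gist; the column names (`Date`, `Description`, `Amount`) and account names are hypothetical, since every institution's export differs.

```python
import csv
from datetime import datetime
from io import StringIO

# Hypothetical bank export; real column names vary by institution.
SAMPLE = """Date,Description,Amount
03/09/2024,COFFEE SHOP,-4.50
03/10/2024,PAYROLL DEPOSIT,2500.00
"""

def to_beancount(row, account="Assets:Checking"):
    """Render one CSV row as a balanced two-leg beancount transaction,
    booking the other side to a placeholder category for later review."""
    date = datetime.strptime(row["Date"], "%m/%d/%Y").date().isoformat()
    amount = float(row["Amount"])
    other = "Expenses:Uncategorized" if amount < 0 else "Income:Uncategorized"
    return (
        f'{date} * "{row["Description"]}"\n'
        f"  {account}  {amount:.2f} USD\n"
        f"  {other}  {-amount:.2f} USD\n"
    )

entries = [to_beancount(r) for r in csv.DictReader(StringIO(SAMPLE))]
print("\n".join(entries))
```

The `Uncategorized` placeholder accounts are the part an LLM (or later, DSPy) could fill in by classifying each description into a real expense category.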
Would love to see a prompt library that others are using. If I have some more time (trying to keep it to <1 hour per quarter), might see if DSPy can further reduce the time it takes to parse and categorize transactions from diverse institutions (example here: https://github.com/stanfordnlp/dspy/blob/main/intro.ipynb).
Appreciate all the resources out there for plain text accounting, and hope this helps someone else who has been feeling intimidated! Setting out requirements beforehand really helped, like insisting to myself that my partner (non-technical background) must be able to understand every step; otherwise I would be uncomfortable with the implicit power and information asymmetry that could result. That constraint narrowed the options and kept things to one prompt at a time.
Open to any tips/suggestions/workflows others have come up with!
This is a companion discussion topic for the original entry at https://www.reddit.com/r/plaintextaccounting/comments/1bbow6e/anyone_else_using_large_language_models_llms_with/