Ways to sanitize payee names

Hello fellow plain text people,

I’m looking for ways to sanitize payee names when importing CSV.

Currently I have hledger CSV rules that assign accounts and deal with incoming vs outgoing payments.

But a lot of payees go through payment providers, or for some reason have variations in their names.

I’m looking for a way to deal with that.

Something like this came to mind:

if %name
# list of names the payee had before
       %to <sanitized>

Would this work? Are there better names?

I would also like to be able to match against a field without repeating the CSV field's name every time.

Thanks for any replies.

Hello @thaodan,

I just use simple rules like this (the patterns that will work best will vary by person):

if
3 SQUARE CAFE
A.A ROOTS
BAJA FRESH
BELCAMPO
...
BBQ
BURRITO
DINER
GRILL
PIZZA
TACO
THAI
\bDELI\b
\bBURGER\b
\bVEGAN\b
\bCAFE\b
\?MCC.581[124]
 account2 expenses:food:dining

If you really need to limit these to a specific field, then yes, I think repeating the field name is unavoidable:

if
%description 3 SQUARE CAFE
%description A.A ROOTS
%description BAJA FRESH
...

but with my CSVs at least, whole-record matching usually just works.
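
If repeating %description on every line gets tedious, one possible workaround is to join the alternatives with regex alternation (|) in a single field matcher, so the field name is written only once. This is a sketch that assumes your hledger version treats everything after the field name as one regular expression, which is how field matchers normally work. It reuses some of the patterns above:

# one field matcher; the alternatives are joined with | (regex alternation)
if %description 3 SQUARE CAFE|A\.A ROOTS|BAJA FRESH|BELCAMPO
 account2 expenses:food:dining

The trade-off is readability: a long alternation is harder to scan and edit than one pattern per line.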

Oops, I answered a question you didn't ask, but I'll leave the example there in case it helps someone.

To also sanitise payee names, it might look something like:

if
AMZN
AMAZON MKTPL
Amazon\.com
AMAZON-SERVICES-KI
CMX UNLIMITED
COMICS
 account2 expenses:books/periodicals
 description Amazon US | %description

if 
BLANKSPACES
CULVERWORKS
 account2 expenses:rent
 description Blankspaces
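
With many payees, the tabular form of if (an "if table", supported in recent hledger versions; check the CSV rules manual for yours) may be easier to maintain. The first line lists the hledger fields to assign, and each following row gives a matcher followed by the values for those fields. Here is a sketch reusing some of the payees above, with a comma as the separator, so matchers and values must not contain commas:

# each row below: matcher,account2,description
if,account2,description
AMZN,expenses:books/periodicals,Amazon US
AMAZON MKTPL,expenses:books/periodicals,Amazon US
Amazon\.com,expenses:books/periodicals,Amazon US
BLANKSPACES,expenses:rent,Blankspaces
CULVERWORKS,expenses:rent,Blankspaces

Unlike the Amazon block above, this sketch sets a plain "Amazon US" description and does not append the original %description.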

That will be a lot of rules to maintain, if you want to do this consistently. Another option is to get data from a bank aggregator instead of directly from the bank; they usually do a certain amount of payee name cleanup for you.

Listing multiple names for a match is cheap and cheerful. I am with Simon on this: whenever something slips through the net, I just add a rule. I have maybe a couple of rules to add per month of transactions (unless I do something drastically different from my usual routine, but then that is basically expected).