Ways to sanitize payee names

Hello fellow plain text people,

I’m looking for ways to sanitize payee names when importing CSV.

Currently I have hledger CSV rules to assign accounts and to deal with incoming vs. outgoing payments.

But a lot of places use payment providers, or for some reason have variations in their names.

I’m looking for a way to deal with that.

Something like this came to mind:

if %name
# list of names the payee had before
       %to <sanitized>

Would this work, are there better names?

I would also like to be able to match against a field without repeating the CSV field's name every time.

Thanks for any replies.

Hello @thaodan,

I just use simple rules like this (the patterns that will work best will vary by person):

if
3 SQUARE CAFE
A.A ROOTS
BAJA FRESH
BELCAMPO
...
BBQ
BURRITO
DINER
GRILL
PIZZA
TACO
THAI
\bDELI\b
\bBURGER\b
\bVEGAN\b
\bCAFE\b
\?MCC.581[124]
 account2 expenses:food:dining

If you really need to limit these to a specific field, yes I think repeating the field name is unavoidable:

if
%description 3 SQUARE CAFE
%description A.A ROOTS
%description BAJA FRESH
...

(except by compressing into fewer lines:)

if
%description 3 SQUARE CAFE|A.A ROOTS|BAJA FRESH|...
%description BBQ|BURRITO|DINER|...
 ...

but with my csvs at least, whole-record matching usually just works.

Oops, I answered a question you didn't ask, but I'll leave the example there in case it helps someone.

To also sanitise payee names, it might look something like:

if
AMZN
AMAZON MKTPL
Amazon\.com
AMAZON-SERVICES-KI
CMX UNLIMITED
COMICS
 description Amazon US | %description
 account2 expenses:books/periodicals

if 
BLANKSPACES
CULVERWORKS
 description Blankspaces
 account2 expenses:rent

That will be a lot of rules to maintain, if you want to do this consistently. Another option is to get data from a bank aggregator instead of directly from the bank; they usually do a certain amount of payee name cleanup for you.
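If the number of if blocks becomes tedious, hledger's CSV rules also offer a more compact "if table" form, where each row holds one pattern followed by the values to assign. Below is a minimal sketch using the payees from the examples above. It assumes a reasonably recent hledger version, and that the separator chosen on the first line (a comma here) never appears inside a pattern or value:

```
if,description,account2
AMZN|AMAZON MKTPL|Amazon\.com|AMAZON-SERVICES-KI,Amazon US,expenses:books/periodicals
BLANKSPACES|CULVERWORKS,Blankspaces,expenses:rent
```

Each row is matched against the whole record, like a plain if block; a pattern can be prefixed with %description (or another CSV field name) to limit it to that field. The table ends at the first blank line.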


Listing multiple names for a match is cheap and cheerful. I am with Simon on this: whenever something slips through the net, I just add a rule. I have maybe a couple of rules to add per month of transactions (unless I do something drastically different from my usual routine, but then that is basically expected).

Simon Michael writes:

That will be a lot of rules to maintain, if you want to do this
consistently. Another option is to get data from a bank aggregator
instead of directly from the bank; they usually do a certain amount of
payee name cleanup for you.

My issue is that a lot of places use payment providers; sometimes you
might use one, sometimes the other.

That makes it hard to group them all together.

Is it possible to also reassign %to and %from? I would like to do that
so that I can still use the same setting for the description field.

I'm not sure about using an aggregator, both for privacy reasons and
because of missing APIs on my bank's side. Oh god, do I wish that Nordic
banks would make it possible to use HBCI/FinTS.

You can create a file that assigns, say, just the description and account2 fields.

You can then include this file in multiple CSV rules files, each of which would assign account1 the way it sees fit. This way you would share some or all of the rules between them.
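As a rough sketch of that layout, the shared file could hold only the payee cleanup and category rules. The file names (payees.rules, bank1.csv.rules), the CSV column layout, and the account names are hypothetical and would need adapting to your own files:

```
# payees.rules (hypothetical name): shared by all banks
if
BLANKSPACES
CULVERWORKS
 description Blankspaces
 account2 expenses:rent
```

```
# bank1.csv.rules (hypothetical name): per-bank settings
# assumes a CSV with one header line and three columns
skip 1
fields date, description, amount
account1 assets:bank1:checking

# pull in the shared payee rules
include payees.rules
```

A second bank's rules file would have its own skip, fields, and account1 lines and the same include line.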

Also you can of course assign account1, 2, ... based on conditions, independently of description. So you may have different if rules handling different parts of the transaction. (And later assignments override earlier ones, so you can set generic values then override for specific patterns.)
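A small sketch of that pattern, reusing payees from earlier in the thread; the account names are placeholders. The first line sets a generic default, and the if blocks that follow override or extend it only for matching records:

```
# generic default for every record
account2 expenses:unknown

# a later, more specific rule overrides the default
if
BLANKSPACES
CULVERWORKS
 account2 expenses:rent

# a separate rule can clean up the description on its own
if
AMZN
Amazon\.com
 description Amazon US
```

Keeping the description cleanup and the account assignment in separate blocks lets each list of patterns grow independently.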