Confusion around hledger csv "skip" directive

I have two questions relating to hledger csv imports...

A counting question

I am importing transactions from Fidelity. A transaction report starts like this (I've added the line numbers):

1	
2	Plan name:,YOUR COMPANY, INC. 401K
3	Date Range,01/01/2024 - 12/31/2024,,,,
4	
5	
6	Date,Investment,Transaction Type,Amount,Shares/Unit
7	12/31/2024,FANTASTIC STOCK INDEX,REVENUE CREDIT,"0.67","0.005"

By my calculation, I need to skip 6 lines to get to the data, but this...

skip 6

...ends up skipping the first three transactions. To make this work, I actually need:

skip 3

Why is that? Are we not counting blank lines?

Skipping variable numbers of lines

Transaction reports from Vanguard start with a list of funds held by the account. For example, something like this:

Account Number,Investment Name,Symbol,Shares,Share Price,Total Value,
12345678,VANGUARD FUND 1,VSFND1,100.00,10.00,1000.00,
12345678,VANGUARD FUND 2,VSFND2,200.00,20.00,4000.00,
12345678,VANGUARD FUND 3,VSFND3,300.00,30.00,9000.00,
12345678,VANGUARD FUND 4,VSFND4,400.00,40.00,16000.00,



Account Number,Trade Date,Settlement Date,Transaction Type,Transaction Description,Investment Name,Symbol,Shares,Share Price,Principal Amount,Commissions and Fees,Net Amount,Accrued Interest,Account Type,
12345678,2024-01-31,2024-01-31,Reinvestment,Dividend Reinvestment,VANGUARD FUND 1,VSFND1,7.58600,...

If the funds held by the account change, that leading section changes length, so a static skip value doesn't work. I've looked at the documentation on conditional skip/end rules, but I'm not sure how to apply them here: the records don't contain any phrase or pattern I can match on. The identifying feature is the number of fields, and I don't think that's available as a value for matching in hledger.

I realize I can fix this by preprocessing the files before feeding them to hledger, but this seems like such a common situation that I was hoping to handle it without introducing an extra step.

Correct, we're not counting blank lines. https://hledger.org/hledger.html#skip
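For the Fidelity file above, the non-blank lines before the data are 2, 3, and 6 (plan name, date range, column headings), hence skip 3. A minimal sketch of a rules file for it (the field names and the account name are my guesses, not something from your actual setup):

```
# fidelity.rules (sketch; account name is hypothetical)
skip 3
fields date, investment, type, amount, shares
date-format %m/%d/%Y
description %investment %type
account1 assets:fidelity:401k
currency $
```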

Here are some (partial) examples:

You can see how the vanguard one does conditional skipping based on number of fields.
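The shape of that conditional skip is roughly this (a sketch of the idea, not the original rules): the fund-holdings records always have exactly 7 fields (6 commas), while the transaction records have far more, so a comma-counting regex identifies them:

```
# drop the fund-holdings section and its header, whose records
# have exactly 7 fields (6 commas); transaction records have more
if ^([^,]*,){6}[^,]*$
 skip
```

One caveat: this counts raw commas, so it would miscount if a quoted field ever contained a comma.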

My personal rules have evolved since then, I found I did need a preprocessing step for some reason or other (you can't always avoid it, in general). Here it is:

# drop shorter records and blank lines, sort by date keeping headings at top
grep -vE '^(([^,]*,){6}[^,]*|)$' "$CSV" | sort -t, -k3 -n >"$CLEANCSV"
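To see what the grep stage does, here's a tiny self-contained demo on a file shaped like the Vanguard sample (the data is invented for illustration, not a real export):

```shell
#!/bin/sh
# Build a miniature file shaped like the Vanguard export:
# a 7-field holdings section, blank lines, then longer records.
cat > sample.csv <<'EOF'
Account Number,Investment Name,Symbol,Shares,Share Price,Total Value,
12345678,VANGUARD FUND 1,VSFND1,100.00,10.00,1000.00,

Account Number,Trade Date,Settlement Date,Transaction Type,Description,Investment Name,Symbol,Shares
12345678,2024-01-31,2024-01-31,Reinvestment,Dividend Reinvestment,VANGUARD FUND 1,VSFND1,7.58600
EOF
# Drop blank lines and any record with exactly 7 fields (6 commas);
# only the 8-field transaction header and record survive.
grep -vE '^(([^,]*,){6}[^,]*|)$' sample.csv
```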

I have found Vanguard's csv to be one of the more difficult to convert, as it is representing more complex transactions than a bank's.


My personal rules have evolved since then, I found I did need a preprocessing step for some reason or other (you can't always avoid it, in general).

Would it make sense to give hledger import the ability to run filters itself? Something like:

source OfxDownload.csv
filter sed '1,/^$/d'
filter sort -t, -k3

Or:

source OfxDownload.csv | sed '1,/^$/d' | sort -t, -k3

That would keep the logic in import.rules, and might simplify a number of configurations.


Yes, I think so. Nice mock-ups there.