API Reference

This page documents the public API for Transtractor.

Parser

The main entry point for parsing bank statement PDFs.

StatementData

Represents the parsed bank statement data, including account information and transactions.

class transtractor.structs.statement_data.StatementData(key: str, account_number: str, transactions: list[Transaction])

Bases: object

Class representing bank statement data.

__init__(key: str, account_number: str, transactions: list[Transaction])

Initialize StatementData with validated attributes.

Parameters:
  • key (str) – Unique identifier for the statement

  • account_number (str) – Account number associated with the statement

  • transactions (list[Transaction]) – List of transactions in the statement

property account_number: str

Get the account number.

property filename: str

Get the filename.

property key: str

Get the statement key.

set_account_number(account_number: str) → None

Set the account number for the statement data.

Parameters:

account_number (str) – Account number associated with the statement

Raises:

TypeError – If account_number is not a string

set_filename(filename: str) → None

Set the filename for the statement data.

Parameters:

filename (str) – Filename for the statement

Raises:

TypeError – If filename is not a string

set_key(key: str) → None

Set the key for the statement data.

Parameters:

key (str) – Unique identifier for the statement

Raises:

TypeError – If key is not a string

set_transactions(transactions: list[Transaction]) → None

Set the transactions for the statement data.

Parameters:

transactions (list[Transaction]) – List of transactions

Raises:

TypeError – If transactions is not a list or contains non-Transaction items
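The setters above share one pattern: check the argument's type, raise TypeError on a mismatch, then store the value. A minimal sketch of that pattern for set_transactions follows. The classes `Transaction` and `StatementDataSketch` are hypothetical stand-ins written for illustration, not the library's own classes, and the error messages are assumptions.

```python
class Transaction:
    """Hypothetical stand-in for transtractor's Transaction."""


class StatementDataSketch:
    """Illustrative only; not the library's StatementData."""

    def __init__(self) -> None:
        self._transactions: list[Transaction] = []

    def set_transactions(self, transactions: list[Transaction]) -> None:
        # Reject anything that is not a list.
        if not isinstance(transactions, list):
            raise TypeError("transactions must be a list")
        # Reject lists holding items that are not Transaction instances.
        if not all(isinstance(item, Transaction) for item in transactions):
            raise TypeError("transactions must contain only Transaction items")
        self._transactions = transactions

    @property
    def transactions(self) -> list[Transaction]:
        return self._transactions
```

Under this sketch, callers that build transactions from untrusted input would wrap the setter call in a try/except TypeError block.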

to_csv(file_path: str, fields: tuple[str, ...] | list[str] = ('date', 'description', 'amount', 'balance')) → None

Export the statement data to a CSV file.

Parameters:
  • file_path (str) – Path to the output CSV file

  • fields (Union[tuple[str, ...], list[str]]) – Fields to include in the CSV. Defaults to ('date', 'description', 'amount', 'balance'). Valid fields are: 'date', 'date_index', 'description', 'amount', 'balance', 'key', 'filename', 'account_number'.

Example usage:

# Export with default fields
statement_data.to_csv('transactions.csv')

# Export with all available fields using list (or tuple)
statement_data.to_csv(
    'full_export.csv',
    fields=['date', 'date_index', 'description', 'amount',
            'balance', 'key', 'filename', 'account_number']
)
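
To show how the fields argument could map onto CSV columns, here is a sketch that uses only the standard csv module. It does not call Transtractor. The helper `rows_to_csv_text` and the sample row are hypothetical, and the header row and column order are assumptions about the output rather than documented behavior of to_csv.

```python
import csv
import io
from datetime import date

DEFAULT_FIELDS = ("date", "description", "amount", "balance")


def rows_to_csv_text(rows, fields=DEFAULT_FIELDS):
    """Render dict rows as CSV text, keeping only the requested fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fields),   # column order follows `fields`
        extrasaction="ignore",     # drop keys that were not requested
        lineterminator="\n",
    )
    writer.writeheader()           # assumption: a header row is written
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row standing in for one parsed transaction.
rows = [
    {"date": date(2024, 1, 2), "date_index": 0, "description": "Coffee",
     "amount": -4.5, "balance": 95.5},
]

default_text = rows_to_csv_text(rows)
custom_text = rows_to_csv_text(rows, ["amount", "date_index"])
```

With the default fields, the date_index key is dropped. Passing a list or a tuple gives the same result, since only the order of names matters.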

to_pandas_dict(fields: tuple[str, ...] | list[str] = ('date', 'description', 'amount', 'balance')) → dict[str, list]

Convert the statement data to a dictionary suitable for pandas DataFrame.

Parameters:

fields (Union[tuple[str, ...], list[str]]) – Fields to include in the dictionary. Defaults to ('date', 'description', 'amount', 'balance').

Returns:

Dictionary with keys as field names and values as lists of field values

Return type:

dict[str, list]

Example usage:

import pandas as pd

# Default fields
data_dict = statement_data.to_pandas_dict()
df = pd.DataFrame(data_dict)

# Custom fields with list (tuple also supported)
data_dict = statement_data.to_pandas_dict(
    fields=['date', 'description', 'amount', 'balance', 'key']
)
df = pd.DataFrame(data_dict)
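
The return value is column-oriented: one key per field, each mapping to a list with one entry per transaction. The sketch below builds a dictionary of that shape from plain dict rows. The helper `rows_to_columns` and the sample rows are hypothetical and do not come from the library.

```python
from datetime import date

DEFAULT_FIELDS = ("date", "description", "amount", "balance")


def rows_to_columns(rows, fields=DEFAULT_FIELDS):
    """Pivot a list of row dicts into {field: [values...]}."""
    return {field: [row[field] for row in rows] for field in fields}


# Hypothetical sample rows standing in for parsed transactions.
rows = [
    {"date": date(2024, 1, 2), "description": "Coffee",
     "amount": -4.5, "balance": 95.5},
    {"date": date(2024, 1, 3), "description": "Salary",
     "amount": 1200.0, "balance": 1295.5},
]

columns = rows_to_columns(rows)
```

A dictionary of this shape can be passed straight to pd.DataFrame, which creates one column per key.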

property transactions: list[Transaction]

Get the list of transactions.

Transaction

Represents an individual transaction within a bank statement.

class transtractor.structs.transaction.Transaction(date: date | int, date_index: int, description: str, amount: float, balance: float)

Bases: object

Class representing a bank transaction.

__init__(date: date | int, date_index: int, description: str, amount: float, balance: float)

Initialize a Transaction.

Parameters:
  • date (date | int) – Either a date object or milliseconds since epoch (int)

  • date_index (int) – Transaction index for the day

  • description (str) – Transaction description

  • amount (float) – Transaction amount (will be rounded to 2 decimal places)

  • balance (float) – Account balance (will be rounded to 2 decimal places)
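
The constructor normalizes two kinds of input: a date given as epoch milliseconds, and monetary values rounded to 2 decimal places. The sketch below shows one way such normalization could work. The helpers `normalize_date` and `normalize_money` are hypothetical. Reading the milliseconds as UTC and rounding with Python's built-in round are both assumptions, since the source does not state how the library handles either.

```python
from datetime import date, datetime, timezone


def normalize_date(value):
    """Return a date from either a date object or epoch milliseconds."""
    if isinstance(value, date):
        return value
    # Assumption: the integer counts milliseconds since the epoch in UTC.
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc).date()


def normalize_money(value: float) -> float:
    """Round a monetary value to 2 decimal places."""
    return round(value, 2)


from_millis = normalize_date(1704153600000)   # midnight UTC, 2 January 2024
from_object = normalize_date(date(2024, 1, 2))
amount = normalize_money(12.3456)
```

Both calls to normalize_date produce the same date here, so downstream code can rely on the date attribute being a date whichever form was passed in.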

amount: float
balance: float
date: date
date_index: int
description: str