Day 2: Python OOP, File I/O & Poetry

Why OOP Matters for BFSI AI Work

On Day 1 I wrote procedural code: one script, top to bottom, downloading data and printing results. That's fine for exploration. But production systems in banking and financial services are commonly modelled around domain objects: a Transaction, a LoanApplication, a RiskScore. OOP gives that code a shared vocabulary.

Today I built a FinancialRecord class — the simplest possible representation of one row of market data — and made each CSV row an instance of it. This is exactly the pattern that scales to ML pipelines: instead of passing raw dicts around, you pass structured objects with validated fields and methods.
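
As a rough sketch of what "structured objects with validated fields" can mean in practice: `ValidatedRecord` is a hypothetical name used only here, and the positive-price rule is an assumed example of validation, not something the Day 2 script itself does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidatedRecord:
    """Hypothetical record type that checks its own fields on creation."""
    date: str
    open_price: float
    close: float

    def __post_init__(self):
        # Assumed rule for illustration: a price bar cannot be zero or negative
        if self.open_price <= 0 or self.close <= 0:
            raise ValueError("prices must be positive")

    def is_bullish(self):
        return self.close > self.open_price

good = ValidatedRecord("2024-01-02", 21665.6, 21710.8)
print(good.is_bullish())  # True

try:
    ValidatedRecord("2024-01-03", -1.0, 21517.3)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

A raw dict would have accepted the negative price without complaint; here the bad row is stopped at construction time.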

🏦 BFSI Relevance

Fraud detection models, credit scoring systems, and trade reconciliation tools at banks commonly model their domain with classes. A Transaction object can carry its own validation, formatting, and risk methods. Learning this pattern on Day 2 means you're building the right mental model from the start.

The Dataset — NIFTY50 Historical Data

Day 1 used AAPL from yfinance. Day 2 shifts to NIFTY50 — India's benchmark index, directly relevant for BFSI roles in the Indian market. The data is free, no login required, downloaded directly using yfinance with the ticker ^NSEI.

Terminal — Run Day 2 Script

# Activate your environment first
conda activate ai_dev
python day2_financial_project.py

CSV already exists. Skipping download.
Date: 2024-01-02 | Open: 21665.599609 | Close: 21710.800781
Bullish Day
----------------------------------------
Date: 2024-01-03 | Open: 21719.800781 | Close: 21517.349609
Bearish Day
----------------------------------------
Date: 2024-01-04 | Open: 21519.199219 | Close: 21737.599609
Bullish Day
----------------------------------------

💡 Why "CSV already exists. Skipping download"?

The script checks if the file is already on disk before calling yfinance. This is a real engineering habit — never make a network request you don't need. In production pipelines, redundant downloads waste time and can hit API rate limits.

The KeyError Bug — and Why It Happened

The first run crashed with this traceback:

⚠ TRACEBACK — KeyError

Traceback (most recent call last):
  File "day2_financial_project.py", line 38, in <module>
    row["Date"],
    ~~~^^^^^^^^
KeyError: 'Date'

The cause: yfinance sets the Date as the DataFrame index, not a regular column. When you export to CSV without calling reset_index(), the date gets written with a cryptic multi-level label — not "Date". So row["Date"] inside csv.DictReader raises a KeyError because the key simply doesn't exist in the CSV headers.

⚠️ The Fix — One Line

Call data.reset_index(inplace=True) before saving the CSV. This promotes the index into a proper column named "Date", and row["Date"] works correctly from then on. The lesson: when data doesn't match your expectation, print list(row.keys()) before editing the code.
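
A minimal sketch of that inspection step, using an in-memory CSV so it runs without a download. The sample header row (with `Price` as its first field) is an assumption for illustration, not the actual file produced on Day 2, and `inspect_headers` is a hypothetical helper name.

```python
import csv
import io

def inspect_headers(csv_text):
    """Return the header names csv.DictReader will use as dict keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader.fieldnames)

# Hypothetical CSV whose first header is not "Date" (illustrative only)
sample = "Price,Open,Close\n2024-01-02,21665.6,21710.8\n"

headers = inspect_headers(sample)
print(headers)            # ['Price', 'Open', 'Close']
print("Date" in headers)  # False, so row["Date"] would raise KeyError
```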

The One-Line Fix

python — the fix

import yfinance as yf

# yf.download() returns the dates as the DataFrame index, not as a column
data = yf.download("^NSEI", start="2024-01-01")

# This promotes the index into a proper "Date" column
data.reset_index(inplace=True)   # ← the fix

data.to_csv("nifty50.csv", index=False)

# Now csv.DictReader sees: Date, Open, High, Low, Close, Volume
# And row["Date"] works correctly

The FinancialRecord Class — Full Code

Here is the complete day2_financial_project.py — every line written on Day 2, with the bug fixed and all four steps in one script:

python — day2_financial_project.py

import os
import yfinance as yf
import csv

file_name = "nifty50.csv"

# ── Step 1: Download only if CSV doesn't already exist ───
if os.path.exists(file_name) and os.path.getsize(file_name) > 0:
    print("CSV already exists. Skipping download.")
else:
    print("Downloading NIFTY50 data...")
    data = yf.download("^NSEI", start="2024-01-01")
    data.reset_index(inplace=True)   # moves Date from index → column
    data.to_csv(file_name, index=False)
    print("NIFTY50 CSV saved successfully")


# ── Step 2: Define the FinancialRecord class ─────────────
class FinancialRecord:
    def __init__(self, date, open_price, high, low, close):
        self.date       = date
        self.open_price = open_price
        self.high       = high
        self.low        = low
        self.close      = close

    def summary(self):
        print(
            f"Date: {self.date} | "
            f"Open: {self.open_price} | "
            f"Close: {self.close}"
        )

    def is_bullish(self):
        return self.close > self.open_price


# ── Step 3: Load CSV rows into FinancialRecord objects ───
records = []

with open(file_name, "r", newline="") as file:   # newline="" is what the csv module recommends
    csv_reader = csv.DictReader(file)

    for row in csv_reader:
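        # csv.DictReader yields strings; convert prices so comparisons are numeric
        record = FinancialRecord(
            row["Date"],
            float(row["Open"]),
            float(row["High"]),
            float(row["Low"]),
            float(row["Close"])
        )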
        records.append(record)


# ── Step 4: Print first 5 records with bullish/bearish ──
for record in records[:5]:
    record.summary()

    if record.is_bullish():
        print("Bullish Day")
    else:
        print("Bearish Day")

    print("-" * 40)
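
One caveat worth illustrating: csv.DictReader returns every field as a string, and Python compares strings character by character rather than numerically. The sketch below uses made-up prices (not NIFTY50 data) to show where the two kinds of comparison disagree.

```python
# Made-up prices for illustration; csv.DictReader would hand these over as strings
open_str, close_str = "9999.50", "10000.20"

# String comparison is lexicographic: "1" sorts before "9", so this is False
as_strings = close_str > open_str

# Converting to float first compares the actual numbers
as_floats = float(close_str) > float(open_str)

print(as_strings)  # False
print(as_floats)   # True
```

Passing prices through float() before is_bullish() compares them keeps the signal correct when the number of digits changes.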

Breaking Down What Each Part Does

os.path.exists() + os.path.getsize() — Guard the Download

Before calling yfinance, check if the CSV is already on disk and non-empty. Both conditions matter — an empty file from a failed download would otherwise fool the check. This prevents redundant API calls every run, which matters in production pipelines that execute daily.
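
The same guard can be pulled into a small function and exercised against temporary files. This is a sketch only; `needs_download` is a hypothetical helper name, not part of the Day 2 script.

```python
import os
import tempfile

def needs_download(path):
    """True when the file is missing or empty, so a fresh download is required."""
    return not (os.path.exists(path) and os.path.getsize(path) > 0)

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.csv")
    empty = os.path.join(tmp, "empty.csv")
    filled = os.path.join(tmp, "filled.csv")

    open(empty, "w").close()              # zero-byte file, like a failed download
    with open(filled, "w") as f:
        f.write("Date,Open,Close\n")

    results = [needs_download(p) for p in (missing, empty, filled)]

print(results)  # [True, True, False]
```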

data.reset_index(inplace=True) — Why This Fixes the KeyError

yfinance sets the Date as the DataFrame index, not a regular column. Without reset_index(), the CSV has no Date column — it gets a cryptic multi-level label instead. Calling reset_index() promotes the index into a proper column named "Date", so row["Date"] works cleanly in csv.DictReader.
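
The effect of reset_index() can be seen on a toy DataFrame without touching the network. The frame below is an assumption built by hand: it has a named "Date" index and plain single-level columns, so it does not model any extra column labels a real yfinance download may carry.

```python
import csv
import io

import pandas as pd

# Toy frame with the date as a named index (hand-built, illustrative values)
df = pd.DataFrame(
    {"Open": [21665.6, 21719.8], "Close": [21710.8, 21517.3]},
    index=pd.Index(["2024-01-02", "2024-01-03"], name="Date"),
)
columns_before = list(df.columns)   # "Date" is not among the columns yet

df = df.reset_index()               # the index becomes a regular "Date" column
columns_after = list(df.columns)

# Round-trip through CSV text and read it back with csv.DictReader
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
first_row = next(csv.DictReader(buffer))

print(columns_before)     # ['Open', 'Close']
print(columns_after)      # ['Date', 'Open', 'Close']
print(first_row["Date"])  # 2024-01-02
```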

summary() and is_bullish() — Two Methods, Two Responsibilities

summary() handles display — it prints the record's key fields. is_bullish() handles logic — it returns True or False based on whether close beat open. This separation of display logic from business logic is a core OOP principle. Later you might swap summary() for a JSON formatter without touching is_bullish() at all.
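
A sketch of that swap, assuming a JSON formatter is wanted: `to_json()` is a hypothetical method added for illustration, and the class is a trimmed copy of FinancialRecord rather than the full Day 2 version.

```python
import json

class FinancialRecord:
    """Trimmed copy of the Day 2 class, with a hypothetical to_json() added."""

    def __init__(self, date, open_price, close):
        self.date = date
        self.open_price = open_price
        self.close = close

    def is_bullish(self):
        # Business logic: unaffected by any formatting decision
        return self.close > self.open_price

    def to_json(self):
        # Display logic: hypothetical alternative to summary()
        return json.dumps({
            "date": self.date,
            "open": self.open_price,
            "close": self.close,
            "bullish": self.is_bullish(),
        })

record = FinancialRecord("2024-01-02", 21665.6, 21710.8)
print(record.to_json())
```

The formatter reuses is_bullish() instead of repeating the comparison, so the rule lives in one place.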

records[:5] — Slicing the List of Objects

records[:5] gives you the first 5 objects from the list. Each iteration calls summary() to print the row, then is_bullish() for the market signal, then prints a divider line. This pattern of iterating over objects and calling their methods is the basic loop behind most data pipelines. Because the CSV rows are in ascending date order, records[-1] is the most recent trading day; that holds only while the list stays sorted by date.
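
A small sketch of slicing and negative indexing, with plain date strings standing in for FinancialRecord objects. The list is deliberately shuffled to show that the last element is the latest date only once the data is sorted.

```python
# Plain date strings stand in for FinancialRecord objects here
dates = ["2024-01-03", "2024-01-02", "2024-01-05", "2024-01-04"]

# ISO dates (YYYY-MM-DD) sort chronologically as strings
ordered = sorted(dates)

first_two = ordered[:2]   # earliest two entries
latest = ordered[-1]      # last entry after sorting, i.e. the most recent date

print(first_two)  # ['2024-01-02', '2024-01-03']
print(latest)     # 2024-01-05
```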

Setting Up Poetry

On Day 1, packages were installed with bare pip. That works but it's not reproducible — there's no record of which versions were installed or why. Poetry solves this by creating a pyproject.toml (the project manifest) and a poetry.lock (the exact version lock). Anyone who clones the project runs poetry install and gets the exact same environment.

Installing Poetry

Terminal — Install Poetry

# Install Poetry using the official installer
curl -sSL https://install.python-poetry.org | python3 -

# Add Poetry to PATH (add this line to ~/.zshrc too)
export PATH="$HOME/.local/bin:$PATH"

# Verify the installation
poetry --version
Poetry (version 2.4.1)

Initialising the Project

Terminal — poetry init

# Run from your project root (AI-Architect-Roadmap/)
poetry init

# Answer the prompts:
# Package name: ai-architect-roadmap
# Version: 0.1.0
# Description: AI Architect learning roadmap — BFSI Edition
# License: (leave empty, press Enter)
# Define dependencies interactively? → no
# Confirm generation? → yes


# Now register your actual packages
poetry add pandas yfinance matplotlib
Updating dependencies
Resolving dependencies... (1.2s)
Writing lock file

The pyproject.toml That Was Generated

toml — pyproject.toml

[project]
name = "ai-architect-roadmap"
version = "0.1.0"
description = "AI Architect learning roadmap projects using Python, AI, and BFSI examples"
authors = [
    {name = "Prabhu"}
]
requires-python = ">=3.11"
dependencies = [
    "pandas (>=3.0.3,<4.0.0)",
    "yfinance (>=1.3.0,<2.0.0)",
    "matplotlib (>=3.10.9,<4.0.0)"
]

[build-system]
requires = ["poetry-core>=2.0.0,<3.0.0"]
build-backend = "poetry.core.masonry.api"

💡 What does poetry.lock do?

It records the exact version of every package and every transitive dependency, including the packages your packages depend on. Running poetry install on another machine then resolves to the same pinned versions, for as long as those releases remain available on the package index. That is what makes builds reproducible, a strict requirement for any production AI system in banking.

Project Structure at End of Day 2

tree — AI-Architect-Roadmap/

AI-Architect-Roadmap/
│
├── pyproject.toml          # ← NEW: Poetry project manifest
├── poetry.lock             # ← NEW: Exact dependency versions
│
├── ai_dev/                 # conda environment (not committed to git)
│
├── Day1/
│   ├── basics.py
│   ├── day1_stock_project.py
│   ├── aapl_stock_data.csv
│   └── chart.pdf
│
└── Day2/
    ├── day2_financial_project.py   # ← NEW: OOP + CSV loading
    └── nifty50.csv                 # ← NEW: NIFTY50 historical data

Key OOP Concepts Internalised Today

python — OOP concepts in context

# Class: the blueprint
class FinancialRecord:

    # __init__: constructor; called automatically on creation
    def __init__(self, date, close):
        self.date  = date     # instance attribute
        self.close = close    # each instance has its own copy

    # Method: a function that belongs to the class
    # Always takes self as first argument
    def summary(self):
        return f"{self.date}: {self.close}"

# Instance: one specific object created from the blueprint
record = FinancialRecord("2024-01-02", 21710.80)

# Calling a method on an instance
print(record.summary())

# Storing many instances in a list: the standard pattern
records = []
records.append(record)
records.append(FinancialRecord("2024-01-03", 21517.35))

# Iterating: same as any list
for r in records:
    print(r.summary())

Day 2 Writing Reflection

Today's prompt: "Why does structuring data as objects (OOP) produce more maintainable code than raw dictionaries for financial data?"

✍️ Day 2 Writing Reflection

"A raw dictionary like {'Date': '2020-01-02', 'Close': '12282.20'} is just data: it carries no behaviour, no validation, and no guaranteed structure. If I rename a key, every piece of code that reads the old key fails at runtime with a KeyError, or silently gets None if it used .get(). A FinancialRecord object, by contrast, defines its fields once in __init__ and exposes behaviour through methods. For BFSI systems where a Transaction object might pass through fraud detection, accounting, and reporting in the same pipeline, encapsulating data and behaviour together means each stage only needs to call a method; it doesn't need to know the internal structure of the object. That separation is what makes financial AI systems auditable and maintainable at scale."

What I Built by End of Day 2

✅ DAY 2 DELIVERABLES

✅ FinancialRecord class

✅ __init__ + summary()

✅ is_bullish() method

✅ csv.DictReader loading

✅ NIFTY50 2024 dataset

✅ reset_index() fix

✅ Bullish/bearish signal

✅ Poetry installed

✅ pyproject.toml

✅ Writing reflection

⚠️ One Honest Note

The KeyError bug took longer to debug than expected. But that's realistic: real engineering is mostly reading error messages and inspecting data. Printing list(row.keys()) revealed the mismatched headers, and reset_index() was the actual fix. The lesson: when data doesn't match your expectation, inspect the data before editing the code.

What's Coming on Day 3

Day 3 covers NumPy + Pandas Foundations. I'll move beyond csv.DictReader and load the NIFTY50 data directly into a Pandas DataFrame. Then I'll compute real financial statistics — rolling averages, daily returns, volatility — using NumPy operations. The FinancialRecord objects from today will evolve into a proper DataFrame-based pipeline.
