# What’s going on inside `mathstream`

Figured I’d write down how the whole thing is wired, more like a brain dump than polished docs. If you’re diving in to fix something, this should save you a bunch of spelunking.

> New top-level `README.md` now mirrors the high-level pitch, cross-links experiments (`collatz_ui`, `pi_finder`, `find_my.py`, `stupid.py`, `seed_start.py`), and lists quick-start commands (`pip install -e .`, `python test.py`, etc.). Treat this file as the deep dive into the mathstream core.

## Directory map (so you know where to poke)

```
mathstream/
  __init__.py      # re-export central
  engine.py        # arithmetic guts
  exceptions.py    # custom errors
  number.py        # StreamNumber, manual GC, watcher
  utils.py         # sqlite junk drawer
test.py            # smoke/integration script
```

## StreamNumber – the heart of it

`mathstream/number.py` owns the `StreamNumber` class. The class does a couple jobs:

- wraps a file of digits (either you give us a path or a literal; literals get canonicalised and dropped into `instance/log/literal_.txt`).
- streaming happens via `.stream(chunk_size)` so we never load the whole thing; every time we read we call `touch_log_file` so the usage timestamp keeps moving.
- when a new stream gets created we check if it lives under `LOG_DIR`. If yes, we register it with the sqlite tracker (`register_log_file`) and also bump a ref counter via `register_reference`.
- there’s a weakref finaliser plus a global `_ACTIVE_COUNTER` that keeps tabs on python-side references. If the object falls out of scope we run `_finalize_instance`, which decrements the counter and if it was the last one we call `release_reference` (that may nuke the file instantly).
- explicit `free()` exists for people who want deterministic cleanup. It’s basically like `free()` in C: drop ref count and optionally delete the file now. There’s an alias `free_stream` and the class is a context manager so `with StreamNumber(...) as sn:` cleans up automatically.

So any time you’ve got a staged result hanging around in memory, the watcher knows about it. Once you ditch it, either by `free()` or just letting the object die, the sqlite ref count drops.

## Engine – maths without ints

Living in `mathstream/engine.py`. All the operators (`add/sub/mul/div/mod/pow`) pull chunks from the `StreamNumber` inputs, normalise them into sign + digit strings, run grade-school algorithms, then write the result back into `LOG_DIR`.

- `_write_result` is the important bit: writes to disk, calls `register_log_file`, then wraps the file in a new `StreamNumber`. Because of that call, every staged result is tracked automatically.
- We’re careful about signs: division and modulo follow Python’s floor division rules. Divide-by-zero is intercepted and converted into `DivideByZeroError`.
- `clear_logs()` wipes the folder and calls `wipe_log_records()` to empty sqlite so the next run isn’t polluted.

## Exceptions

`mathstream/exceptions.py` just defines `MathStreamError` and the more specific `DivideByZeroError`. Nothing fancy, just so we don’t leak raw `ZeroDivisionError`.

## SQLite watcher (`mathstream/utils.py`)

This is the garbage-collection HQ. On import we run `_ensure_db(reset=True)` so every run starts from a clean DB (no migrations, no surprises). Two tables:

- `logs` → metadata about every staged file: created time, last access, access count.
- `refs` → current reference count (think “how many StreamNumber instances think they own this file”).

Important functions:

- `register_log_file(path)` – ensure both tables have a row (initial ref count 0).
- `register_reference(path)` – increments the ref count, updates last access, access count etc. Called whenever a new `StreamNumber` points at the staged file.
- `touch_log_file(path)` – called from `.stream()` so we know the file is being read.
- `release_reference(path, delete_file=True)` – the inverse of register. If the count hits zero we remove the DB row and (optionally) delete the file right away.
- `collect_garbage(score_threshold)` – this is the periodic sweeper. Computes `score = age / ((ref_count + 1) * (access_count + 1))`. Bigger score means older + less used. If `score >= threshold` it gets unlinked and removed from DB. Negative thresholds blow up on purpose.
- `tracked_files()` – dumb helper that dumps `{path: ref_count}` out of the DB.
- `wipe_log_records()` – nukes both tables; used by `clear_logs`.

## How cleanup flows

1. You run an operation (`add`, `mul`, whatever). Result file lands in `LOG_DIR`, gets registered, comes back as a `StreamNumber`.
2. You stream it or create more streams from it – metadata keeps getting updated via `touch_log_file`/`register_reference`.
3. When you’re done, call `.free()` or just drop references. Manual free is immediate. Otherwise the weakref finaliser catches it eventually.
4. `release_reference` is what actually removes the sqlite entries and unlinks the data file when there are no logical references left.
5. If you still have detritus (e.g. you crashed before refs hit zero), run `collect_garbage(threshold)` to sweep anything whose age outweighs usage.
6. `active_streams()` reports what’s still alive in Python land; `tracked_files()` shows what the DB thinks is referenced.

## Example run (`test.py`)

`test.py` is half regression, half reference script. It:

- seeds some numbers, runs every operation, checks results.
- makes sure `DivideByZeroError` fires.
- frees every staged number to prove files vanish on the spot.
- runs `collect_garbage(0)` just to make sure nothing else lingers.
- dumps `active_streams()` and `tracked_files()` so you can see python vs sqlite state.

If the logs ever seem suspicious, run that script: it’ll tell you immediately whether something’s still referenced or if the GC is forgetting to clean up.
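
If you want a feel for the numbers `collect_garbage` works with, the scoring rule from the watcher section is easy to play with in isolation. The sketch below just restates the formula. `gc_score` and `should_collect` are made-up helper names, age is assumed to be in seconds, and `ValueError` is a guess at how a negative threshold gets rejected; the real function may do any of that differently.

```python
def gc_score(age_seconds, ref_count, access_count):
    """Score from the formula above: older and less used means a bigger number."""
    return age_seconds / ((ref_count + 1) * (access_count + 1))


def should_collect(age_seconds, ref_count, access_count, threshold):
    """Return True when a file's score reaches the threshold."""
    if threshold < 0:
        # Assumption: the real sweeper may raise a different exception type.
        raise ValueError("threshold must be non-negative")
    return gc_score(age_seconds, ref_count, access_count) >= threshold


# An untouched, unreferenced file that is two minutes old.
stale = gc_score(age_seconds=120, ref_count=0, access_count=0)  # 120.0
# Same age, but one live reference and two reads.
busy = gc_score(age_seconds=120, ref_count=1, access_count=2)   # 20.0
```

With a threshold of 60 the stale file would be swept and the busy one kept. Since age is never negative, a threshold of 0 matches every file under this formula, which fits how `test.py` uses `collect_garbage(0)` as a final sweep.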