What’s going on inside `mathstream`

Figured I’d write down how the whole thing is wired, more like a brain dump than polished docs. If you’re diving in to fix something, this should save you a bunch of spelunking.

The new top-level README.md mirrors the high-level pitch, cross-links the experiments (collatz_ui, pi_finder, find_my.py, stupid.py, seed_start.py), and lists quick-start commands (`pip install -e .`, `python test.py`, etc.). Treat this file as the deep dive into the mathstream core.

Directory map (so you know where to poke)

mathstream/
  __init__.py    # re-export central
  engine.py      # arithmetic guts
  exceptions.py  # custom errors
  number.py      # StreamNumber, manual GC, watcher
  utils.py       # sqlite junk drawer

test.py          # smoke/integration script

StreamNumber – the heart of it

mathstream/number.py owns the StreamNumber class, which does a few jobs:

wraps a file of digits (either you give us a path or a literal; literals get canonicalised and dropped into instance/log/literal_<hash>.txt).
streaming happens via .stream(chunk_size) so we never load the whole thing; every time we read we call touch_log_file so the usage timestamp keeps moving.
when a new stream gets created we check whether it lives under LOG_DIR. If so, we register it with the sqlite tracker (register_log_file) and also bump a ref counter via register_reference.
there’s a weakref finaliser plus a global _ACTIVE_COUNTER that keeps tabs on Python-side references. If the object falls out of scope we run _finalize_instance, which decrements the counter and, if that was the last reference, calls release_reference (which may nuke the file instantly).
explicit free() exists for people who want deterministic cleanup. It’s basically like free() in C: drop the ref count and optionally delete the file right away. There’s an alias, free_stream, and the class is a context manager, so `with StreamNumber(...) as sn:` cleans up automatically.
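Here’s a stripped-down sketch of that lifecycle, handy if you want to see the moving parts without sqlite in the way. It is not the real class: `SketchNumber`, `_ACTIVE`, `RELEASED` and the stub `release_reference` are hypothetical stand-ins, and the stub just deletes the file instead of updating the DB. It leans on `weakref.finalize`, which runs its callback at most once, so an explicit `free()` followed by a later garbage collection can’t double-release.

```python
import collections
import os
import weakref

# Hypothetical stand-ins for pieces of mathstream/number.py and mathstream/utils.py.
_ACTIVE = collections.Counter()  # path -> live Python objects pointing at it
RELEASED = []                    # paths handed back to the (stub) tracker


def release_reference(path, delete_file=True):
    """Stub tracker release: records the path and optionally deletes the file."""
    RELEASED.append(path)
    if delete_file and os.path.exists(path):
        os.remove(path)


def _finalize_instance(path):
    # Called at most once per object, from free() or when the object is collected.
    _ACTIVE[path] -= 1
    if _ACTIVE[path] <= 0:
        del _ACTIVE[path]
        release_reference(path)


class SketchNumber:
    def __init__(self, path):
        self.path = path
        _ACTIVE[path] += 1
        # The callback only gets the path; passing self would keep the object alive.
        self._finalizer = weakref.finalize(self, _finalize_instance, path)

    def stream(self, chunk_size=4096):
        # Yield the digits chunk by chunk instead of reading the whole file.
        with open(self.path) as fh:
            while chunk := fh.read(chunk_size):
                yield chunk

    def free(self):
        # Deterministic cleanup; calling an already-dead finalizer is a no-op.
        self._finalizer()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()
        return False
```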

So any time you’ve got a staged result hanging around in memory, the watcher knows about it. Once you ditch it, either by calling free() or by just letting the object die, the sqlite ref count drops.

Engine – maths without ints

It lives in mathstream/engine.py. All the operators (add/sub/mul/div/mod/pow) pull chunks from the StreamNumber inputs, normalise them into sign + digit strings, run grade-school algorithms, then write the result back into LOG_DIR.
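For a feel of the grade-school approach, here’s a sketch of signed addition over sign + digit strings. It works on whole in-memory strings rather than chunks streamed off disk, and the function names (`normalise`, `add_magnitudes`, `sub_magnitudes`, `compare_magnitudes`, `add`) are illustrative, not the engine’s actual API.

```python
DIGITS = "0123456789"


def normalise(text: str) -> tuple[int, str]:
    """Split a literal like '-000123' into (sign, digits) = (-1, '123')."""
    text = text.strip()
    sign = 1
    if text and text[0] in "+-":
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not text or any(ch not in DIGITS for ch in text):
        raise ValueError(f"not an integer literal: {text!r}")
    digits = text.lstrip("0") or "0"
    return (1 if digits == "0" else sign), digits  # -0 collapses to +0


def compare_magnitudes(a: str, b: str) -> int:
    """-1, 0 or 1; valid because normalised strings have no leading zeros."""
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    return (a > b) - (a < b)


def add_magnitudes(a: str, b: str) -> str:
    """Grade-school addition, least significant digit first, carrying as we go."""
    out, carry = [], 0
    a, b = a.rjust(len(b), "0"), b.rjust(len(a), "0")
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        out.append(DIGITS[digit])
    if carry:
        out.append("1")
    return "".join(reversed(out))


def sub_magnitudes(a: str, b: str) -> str:
    """Grade-school subtraction a - b, assuming a >= b."""
    out, borrow = [], 0
    b = b.rjust(len(a), "0")
    for da, db in zip(reversed(a), reversed(b)):
        d = int(da) - int(db) - borrow
        borrow = 1 if d < 0 else 0
        out.append(DIGITS[d + 10 * borrow])
    return "".join(reversed(out)).lstrip("0") or "0"


def add(x: str, y: str) -> str:
    """Signed addition built from the magnitude routines above."""
    sx, dx = normalise(x)
    sy, dy = normalise(y)
    if sx == sy:
        sign, mag = sx, add_magnitudes(dx, dy)
    elif compare_magnitudes(dx, dy) >= 0:
        sign, mag = sx, sub_magnitudes(dx, dy)
    else:
        sign, mag = sy, sub_magnitudes(dy, dx)
    return "-" + mag if sign < 0 and mag != "0" else mag
```

For example, `add("-00120", "45")` returns `'-75'`: the signs differ, so the smaller magnitude is subtracted from the larger and the result keeps the larger one’s sign.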

_write_result is the important bit: writes to disk, calls register_log_file, then wraps the file in a new StreamNumber. Because of that call, every staged result is tracked automatically.
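We’re careful about signs: division and modulo follow Python’s floor-division rules, so the quotient rounds toward negative infinity and a non-zero remainder takes the divisor’s sign (-7 divided by 2 gives quotient -4, remainder 1). Divide-by-zero is intercepted and converted into DivideByZeroError.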
clear_logs() wipes the folder and calls wipe_log_records() to empty sqlite so the next run isn’t polluted.

Exceptions

mathstream/exceptions.py just defines MathStreamError and the more specific DivideByZeroError. Nothing fancy; it just means we don’t leak a raw ZeroDivisionError.
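A minimal sketch of that hierarchy, assuming DivideByZeroError subclasses MathStreamError (the "more specific" wording suggests so, but check exceptions.py). The context-manager helper is hypothetical; the real engine may simply raise DivideByZeroError as soon as it spots a zero divisor. Chaining with `raise ... from` keeps the original ZeroDivisionError visible in tracebacks as `__cause__`.

```python
from contextlib import contextmanager


class MathStreamError(Exception):
    """Base class, so callers can catch every mathstream failure in one place."""


class DivideByZeroError(MathStreamError):
    """Raised instead of letting a raw ZeroDivisionError escape."""


@contextmanager
def zero_division_as_mathstream_error():
    # Hypothetical helper: re-raise ZeroDivisionError as DivideByZeroError,
    # chaining the original so the traceback still shows where it came from.
    try:
        yield
    except ZeroDivisionError as exc:
        raise DivideByZeroError(str(exc)) from exc


try:
    with zero_division_as_mathstream_error():
        _ = 1 // 0
except MathStreamError as err:
    print(type(err).__name__)  # DivideByZeroError
```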

SQLite watcher (`mathstream/utils.py`)

This is the garbage-collection HQ. On import we run _ensure_db(reset=True) so every run starts from a clean DB (no migrations, no surprises). Two tables:

logs → metadata about every staged file: created time, last access, access count.
refs → current reference count (think “how many StreamNumber instances think they own this file”).
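If it helps to picture the tables, here’s a guess at an equivalent schema plus the reset-on-import behaviour. Column names and types are assumptions, not copied from utils.py, and `ensure_db` is a hypothetical stand-in for _ensure_db that takes an explicit connection.

```python
import sqlite3


def ensure_db(conn: sqlite3.Connection, reset: bool = False) -> None:
    """Create the two watcher tables; reset=True drops them first."""
    if reset:
        # Fresh start: no migrations, just drop and recreate.
        conn.executescript("DROP TABLE IF EXISTS refs; DROP TABLE IF EXISTS logs;")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS logs (
            path         TEXT PRIMARY KEY,
            created_at   REAL NOT NULL,              -- epoch seconds
            last_access  REAL NOT NULL,              -- epoch seconds
            access_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS refs (
            path      TEXT PRIMARY KEY,
            ref_count INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_db(conn, reset=True)
```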

Important functions:

register_log_file(path) – ensures both tables have a row (initial ref count 0).
register_reference(path) – increments the ref count and updates the last-access time and access count. Called whenever a new StreamNumber points at the staged file.
touch_log_file(path) – called from .stream() so we know the file is being read.
release_reference(path, delete_file=True) – the inverse of register. If the count hits zero we remove the DB row and (optionally) delete the file right away.
collect_garbage(score_threshold) – this is the periodic sweeper. Computes score = age / ((ref_count + 1) * (access_count + 1)). Bigger score means older + less used. If score >= threshold it gets unlinked and removed from DB. Negative thresholds blow up on purpose.
tracked_files() – dumb helper that dumps {path: ref_count} out of the DB.
wipe_log_records() – nukes both tables; used by clear_logs.
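The collect_garbage scoring rule is easy to get backwards, so here’s a sketch of just the selection step as a pure function. `TrackedRow`, `gc_score` and `pick_garbage` are hypothetical names; measuring age from creation time (rather than last access) and raising ValueError for negative thresholds are both assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrackedRow:
    # Hypothetical row shape; the real columns live in mathstream/utils.py.
    path: str
    created_at: float  # epoch seconds
    ref_count: int
    access_count: int


def gc_score(row: TrackedRow, now: float) -> float:
    """score = age / ((ref_count + 1) * (access_count + 1)); bigger = staler."""
    age = now - row.created_at  # assumption: age is measured from creation time
    return age / ((row.ref_count + 1) * (row.access_count + 1))


def pick_garbage(rows, score_threshold: float, now: float) -> list[str]:
    """Paths a collect_garbage-style sweep would unlink and drop from the DB."""
    if score_threshold < 0:
        # Negative thresholds blow up on purpose; ValueError is a guess at the type.
        raise ValueError("score_threshold must be >= 0")
    return [row.path for row in rows if gc_score(row, now) >= score_threshold]
```

Under this formula a threshold of 0 selects every row with a non-negative age, which fits how test.py uses collect_garbage(0) as a catch-all sweep. The +1 terms keep the denominator positive even for a file nobody references or reads.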

How cleanup flows

You run an operation (add, mul, whatever). Result file lands in LOG_DIR, gets registered, comes back as a StreamNumber.
You stream it or create more streams from it – metadata keeps getting updated via touch_log_file/register_reference.
When you’re done, call .free() or just drop references. Manual free is immediate. Otherwise the weakref finaliser catches it eventually.
release_reference is what actually removes the sqlite entries and unlinks the data file when there are no logical references left.
If you still have detritus (e.g. you crashed before refs hit zero), run collect_garbage(threshold) to sweep anything whose age outweighs usage.
active_streams() reports what’s still alive in Python land; tracked_files() shows what the DB thinks is referenced.
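Stripped down to the refs table, the register/release half of that flow looks roughly like this. It’s a hypothetical sketch: an in-memory sqlite DB, illustrative column names, and none of the logs-table bookkeeping (timestamps, access counts) that the real utils.py also does.

```python
import os
import sqlite3
import tempfile

# In-memory stand-in for the refs table; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (path TEXT PRIMARY KEY, ref_count INTEGER NOT NULL)")


def register_log_file(path):
    # A freshly staged result gets a row with ref_count 0.
    conn.execute("INSERT OR IGNORE INTO refs (path, ref_count) VALUES (?, 0)", (path,))
    conn.commit()


def register_reference(path):
    # One more StreamNumber now points at this file.
    conn.execute("UPDATE refs SET ref_count = ref_count + 1 WHERE path = ?", (path,))
    conn.commit()


def release_reference(path, delete_file=True):
    row = conn.execute("SELECT ref_count FROM refs WHERE path = ?", (path,)).fetchone()
    if row is None:
        return  # already released; nothing to do
    if row[0] > 1:
        conn.execute("UPDATE refs SET ref_count = ref_count - 1 WHERE path = ?", (path,))
    else:
        # Last logical reference: drop the row and, optionally, the data file.
        conn.execute("DELETE FROM refs WHERE path = ?", (path,))
        if delete_file and os.path.exists(path):
            os.remove(path)
    conn.commit()


def tracked_files():
    return dict(conn.execute("SELECT path, ref_count FROM refs"))


# Walk the flow: stage a result, hand it to two owners, free both.
fd, staged = tempfile.mkstemp(suffix=".txt")
os.close(fd)
register_log_file(staged)   # operation finished, result file registered
register_reference(staged)  # first StreamNumber
register_reference(staged)  # a second stream over the same file
release_reference(staged)   # first owner frees: count drops to 1, file stays
release_reference(staged)   # last owner frees: row and file are gone
```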

Example run (`test.py`)

test.py is half regression, half reference script. It:

seeds some numbers, runs every operation, checks results.
makes sure DivideByZeroError fires.
frees every staged number to prove files vanish on the spot.
runs collect_garbage(0) just to make sure nothing else lingers.
dumps active_streams() and tracked_files() so you can see python vs sqlite state.

If the logs ever seem suspicious, run that script: it’ll tell you immediately whether something’s still referenced or whether the GC is forgetting to clean up.
