What’s going on inside mathstream
Figured I’d write down how the whole thing is wired, more like a brain dump than polished docs. If you’re diving in to fix something, this should save you a bunch of spelunking.
New top-level README.md now mirrors the high-level pitch, cross-links experiments (collatz_ui, pi_finder, find_my.py, stupid.py, seed_start.py), and lists quick-start commands (pip install -e ., python test.py, etc.). Treat this file as the deep dive into the mathstream core.
Directory map (so you know where to poke)
mathstream/
    __init__.py    # re-export central
    engine.py      # arithmetic guts
    exceptions.py  # custom errors
    number.py      # StreamNumber, manual GC, watcher
    utils.py       # sqlite junk drawer
test.py            # smoke/integration script
StreamNumber – the heart of it
mathstream/number.py owns the StreamNumber class. The class does a couple jobs:
- wraps a file of digits (either you give us a path or a literal; literals get canonicalised and dropped into instance/log/literal_<hash>.txt).
- streaming happens via .stream(chunk_size) so we never load the whole thing; every time we read we call touch_log_file so the usage timestamp keeps moving.
- when a new stream gets created we check if it lives under LOG_DIR. If yes, we register it with the sqlite tracker (register_log_file) and also bump a ref counter via register_reference.
- there’s a weakref finaliser plus a global _ACTIVE_COUNTER that keeps tabs on python-side references. If the object falls out of scope we run _finalize_instance, which decrements the counter, and if it was the last one we call release_reference (that may nuke the file instantly).
- explicit free() exists for people who want deterministic cleanup. It’s basically like free() in C: drop ref count and optionally delete the file now. There’s an alias free_stream, and the class is a context manager, so "with StreamNumber(...) as sn:" cleans up automatically.
So any time you’ve got a staged result hanging around in memory, the watcher knows about it. Once you ditch it—either by free() or just letting the object die—the sqlite ref count drops.
Engine – maths without ints
Living in mathstream/engine.py. All the operators (add/sub/mul/div/mod/pow) pull chunks from the StreamNumber inputs, normalise them into sign + digit strings, run grade-school algorithms, then write the result back into LOG_DIR.
- _write_result is the important bit: writes to disk, calls register_log_file, then wraps the file in a new StreamNumber. Because of that call, every staged result is tracked automatically.
- We’re careful about signs: division and modulo follow Python’s floor division rules. Divide-by-zero is intercepted and converted into DivideByZeroError.
- clear_logs() wipes the folder and calls wipe_log_records() to empty sqlite so the next run isn’t polluted.
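For a feel of what "grade-school algorithms" and "floor division rules" mean in practice, here is a rough sketch. Both function names are made up for this example and neither is lifted from engine.py. The first adds two non-negative digit strings column by column. The second uses plain ints purely to show the sign fix-up that turns an unsigned quotient and remainder into Python's floor semantics.

```python
def add_digit_strings(a, b):
    """Add two non-negative decimal strings right to left with a carry."""
    i, j = len(a) - 1, len(b) - 1
    carry = 0
    out = []
    while i >= 0 or j >= 0 or carry:
        da = ord(a[i]) - 48 if i >= 0 else 0  # ord("0") == 48
        db = ord(b[j]) - 48 if j >= 0 else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(chr(48 + digit))
        i -= 1
        j -= 1
    # Digits were produced least-significant first; flip and trim zeros.
    result = "".join(reversed(out)).lstrip("0")
    return result or "0"


def floor_divmod(a, b):
    """Rebuild floor quotient and remainder from unsigned magnitudes."""
    q_mag, r_mag = divmod(abs(a), abs(b))  # what unsigned long division gives
    same_sign = (a < 0) == (b < 0)
    if same_sign or r_mag == 0:
        q = q_mag if same_sign else -q_mag
        r = r_mag
    else:
        # Signs differ and the division is inexact: round the quotient
        # towards negative infinity and flip the remainder around |b|.
        q = -(q_mag + 1)
        r = abs(b) - r_mag
    # The remainder always takes the sign of the divisor.
    return q, (-r if b < 0 else r)
```

For instance, floor_divmod(-7, 2) gives (-4, 1), the same answer as Python's built-in divmod(-7, 2).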
Exceptions
mathstream/exceptions.py just defines MathStreamError and the more specific DivideByZeroError. Nothing fancy, just so we don’t leak raw ZeroDivisionError.
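A sketch of how that hierarchy might look, assuming DivideByZeroError subclasses MathStreamError (the wording above suggests it, but check the file). checked_floor_div is a hypothetical helper that only shows the "intercept and convert" idea.

```python
class MathStreamError(Exception):
    """Base class for errors raised by the library (assumed layout)."""


class DivideByZeroError(MathStreamError):
    """Raised in place of the built-in ZeroDivisionError."""


def checked_floor_div(a, b):
    """Hypothetical helper: refuse a zero divisor before doing any work."""
    if b == 0:
        raise DivideByZeroError("division by zero")
    return a // b
```

Callers can then catch MathStreamError once and be sure no raw ZeroDivisionError slips through.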
SQLite watcher (mathstream/utils.py)
This is the garbage-collection HQ. On import we run _ensure_db(reset=True) so every run starts from a clean DB (no migrations, no surprises). Two tables:
- logs → metadata about every staged file: created time, last access, access count.
- refs → current reference count (think “how many StreamNumber instances think they own this file”).
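If it helps to picture the two tables, here is a guess at a schema that would carry that information, using an in-memory database. Column names are assumptions. The functions also take an explicit conn argument to keep the sketch self-contained, which is not how the real path-only helpers are called.

```python
import sqlite3
import time


def make_db():
    """Create a throwaway in-memory DB with an assumed two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE logs (
            path TEXT PRIMARY KEY,
            created_at REAL NOT NULL,
            last_access REAL NOT NULL,
            access_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE refs (
            path TEXT PRIMARY KEY,
            ref_count INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def register_log_file(conn, path):
    """Make sure both tables have a row; the ref count starts at 0."""
    now = time.time()
    conn.execute(
        "INSERT OR IGNORE INTO logs (path, created_at, last_access) "
        "VALUES (?, ?, ?)",
        (path, now, now),
    )
    conn.execute(
        "INSERT OR IGNORE INTO refs (path, ref_count) VALUES (?, 0)",
        (path,),
    )


def register_reference(conn, path):
    """Bump the ref count and refresh the usage metadata."""
    register_log_file(conn, path)
    conn.execute(
        "UPDATE refs SET ref_count = ref_count + 1 WHERE path = ?", (path,)
    )
    conn.execute(
        "UPDATE logs SET last_access = ?, access_count = access_count + 1 "
        "WHERE path = ?",
        (time.time(), path),
    )


def tracked_files(conn):
    """Return {path: ref_count} straight out of the refs table."""
    return dict(conn.execute("SELECT path, ref_count FROM refs"))
```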
Important functions:
- register_log_file(path) – ensure both tables have a row (initial ref count 0).
- register_reference(path) – increments the ref count, updates last access, access count etc. Called whenever a new StreamNumber points at the staged file.
- touch_log_file(path) – called from .stream() so we know the file is being read.
- release_reference(path, delete_file=True) – the inverse of register. If the count hits zero we remove the DB row and (optionally) delete the file right away.
- collect_garbage(score_threshold) – this is the periodic sweeper. Computes score = age / ((ref_count + 1) * (access_count + 1)). Bigger score means older + less used. If score >= threshold it gets unlinked and removed from DB. Negative thresholds blow up on purpose.
- tracked_files() – dumb helper that dumps {path: ref_count} out of the DB.
- wipe_log_records() – nukes both tables; used by clear_logs.
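The scoring rule is easier to reason about with numbers plugged in. Below is a small sketch of the formula and the selection step over plain dict records. It assumes age is measured in seconds since creation and that a negative threshold raises ValueError; both are guesses. gc_score and select_garbage are names made up for this example.

```python
def gc_score(age_seconds, ref_count, access_count):
    """score = age / ((ref_count + 1) * (access_count + 1))"""
    return age_seconds / ((ref_count + 1) * (access_count + 1))


def select_garbage(records, now, score_threshold):
    """Return paths whose score meets the threshold.

    records maps path -> (created_at, ref_count, access_count).
    """
    if score_threshold < 0:
        # Assumed exception type; the real sweeper may raise something else.
        raise ValueError("score_threshold must be non-negative")
    doomed = []
    for path, (created_at, ref_count, access_count) in records.items():
        score = gc_score(now - created_at, ref_count, access_count)
        if score >= score_threshold:
            doomed.append(path)
    return doomed
```

A file that is 100 seconds old with one reference and four reads scores 100 / (2 * 5) = 10, so it survives a threshold of 20, while an untouched, unreferenced file of the same age scores 100 and gets swept. A threshold of 0 selects everything, since scores are never negative.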
How cleanup flows
- You run an operation (add, mul, whatever). Result file lands in LOG_DIR, gets registered, comes back as a StreamNumber.
- You stream it or create more streams from it – metadata keeps getting updated via touch_log_file / register_reference.
- When you’re done, call .free() or just drop references. Manual free is immediate. Otherwise the weakref finaliser catches it eventually.
- release_reference is what actually removes the sqlite entries and unlinks the data file when there are no logical references left.
- If you still have detritus (e.g. you crashed before refs hit zero), run collect_garbage(threshold) to sweep anything whose age outweighs usage.
- active_streams() reports what’s still alive in Python land; tracked_files() shows what the DB thinks is referenced.
Example run (test.py)
test.py is half regression, half reference script. It:
- seeds some numbers, runs every operation, checks results.
- makes sure DivideByZeroError fires.
- frees every staged number to prove files vanish on the spot.
- runs collect_garbage(0) just to make sure nothing else lingers.
- dumps active_streams() and tracked_files() so you can see python vs sqlite state.
If the logs ever seem suspicious, run that script—it’ll tell you immediately whether something’s still referenced or if the GC is forgetting to clean up.