# What’s going on inside `mathstream`

Figured I’d write down how the whole thing is wired, more like a brain dump than polished docs. If you’re diving in to fix something, this should save you a bunch of spelunking.

> New top-level `README.md` now mirrors the high-level pitch, cross-links experiments (`collatz_ui`, `pi_finder`, `find_my.py`, `stupid.py`, `seed_start.py`), and lists quick-start commands (`pip install -e .`, `python test.py`, etc.). Treat this file as the deep dive into the mathstream core.

## Directory map (so you know where to poke)

```
mathstream/
  __init__.py      # re-export central
  engine.py        # arithmetic guts
  exceptions.py    # custom errors
  number.py        # StreamNumber, manual GC, watcher
  utils.py         # sqlite junk drawer

test.py            # smoke/integration script
```

## StreamNumber – the heart of it

`mathstream/number.py` owns the `StreamNumber` class. The class does a few jobs:

- Wraps a file of digits. You give it either a path or a literal; literals get canonicalised and dropped into `instance/log/literal_<hash>.txt`.
- Streams via `.stream(chunk_size)` so we never load the whole thing. Every read calls `touch_log_file` so the usage timestamp keeps moving.
- When a new stream gets created we check if its file lives under `LOG_DIR`. If yes, we register it with the sqlite tracker (`register_log_file`) and also bump a ref counter via `register_reference`.
- There’s a weakref finaliser plus a global `_ACTIVE_COUNTER` that keeps tabs on Python-side references. If the object falls out of scope we run `_finalize_instance`, which decrements the counter; if it was the last one we call `release_reference` (that may nuke the file instantly).
- Explicit `free()` exists for people who want deterministic cleanup. It’s basically like `free()` in C: drop the ref count and optionally delete the file now. There’s an alias `free_stream`, and the class is a context manager, so `with StreamNumber(...) as sn:` cleans up automatically.

So any time you’ve got a staged result hanging around in memory, the watcher knows about it. Once you ditch it (either by `free()` or just letting the object die) the sqlite ref count drops.

## Engine – maths without ints

Lives in `mathstream/engine.py`. All the operators (`add`, `sub`, `mul`, `div`, `mod`, `pow`) pull chunks from the `StreamNumber` inputs, normalise them into sign + digit strings, run grade-school algorithms, then write the result back into `LOG_DIR`.

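As a rough illustration of “normalise, then grade-school”, here’s a sketch of addition on digit strings. The helper names (`normalise`, `add_digits`) are invented for this example and the real engine may split the work differently; the point is just that nothing here ever builds a big `int`.

```python
from typing import Iterable, Tuple


def normalise(chunks: Iterable[str]) -> Tuple[int, str]:
    """Join streamed chunks into (sign, digits) with no leading zeros."""
    text = "".join(chunks).strip()
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not (text.isascii() and text.isdigit()):
        raise ValueError("expected an optional sign followed by decimal digits")
    digits = text.lstrip("0") or "0"
    if digits == "0":
        sign = 1  # avoid a negative zero
    return sign, digits


def add_digits(a: str, b: str) -> str:
    """Add two non-negative digit strings column by column, right to left."""
    i, j = len(a) - 1, len(b) - 1
    carry = 0
    out = []
    while i >= 0 or j >= 0 or carry:
        da = ord(a[i]) - 48 if i >= 0 else 0  # 48 == ord("0")
        db = ord(b[j]) - 48 if j >= 0 else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(chr(48 + digit))
        i -= 1
        j -= 1
    return "".join(reversed(out)).lstrip("0") or "0"
```

For instance, `normalise(["-00", "42"])` gives `(-1, "42")`, and `add_digits("999", "1")` gives `"1000"`. Signed addition would pick between this and a matching subtraction routine based on the two signs.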
- `_write_result` is the important bit: writes to disk, calls `register_log_file`, then wraps the file in a new `StreamNumber`. Because of that call, every staged result is tracked automatically.
- We’re careful about signs: division and modulo follow Python’s floor division rules. Divide-by-zero is intercepted and converted into `DivideByZeroError`.
- `clear_logs()` wipes the folder and calls `wipe_log_records()` to empty sqlite so the next run isn’t polluted.

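The floor-division rule is the fiddly part when you only have magnitudes to work with, so here’s a small sketch of the sign fix-up. It uses ordinary small `int`s purely to show the rule; `floor_divmod` is a made-up name, and the engine itself presumably does the same adjustment on digit strings after long division.

```python
from typing import Tuple


def floor_divmod(a: int, b: int) -> Tuple[int, int]:
    """Rebuild Python-style (a // b, a % b) from a division of magnitudes."""
    # Long division on |a| and |b| yields a non-negative quotient and remainder.
    q, r = divmod(abs(a), abs(b))
    same_sign = (a < 0) == (b < 0)
    if not same_sign and r != 0:
        # Flooring rounds towards minus infinity: go one step further and
        # flip the remainder to the other side of the divisor.
        q, r = q + 1, abs(b) - r
    quotient = q if same_sign else -q
    remainder = -r if b < 0 else r  # remainder takes the sign of the divisor
    return quotient, remainder
```

So `floor_divmod(-7, 2)` comes out as `(-4, 1)` and `floor_divmod(7, -2)` as `(-4, -1)`, matching what Python’s own `divmod` returns.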
## Exceptions

`mathstream/exceptions.py` just defines `MathStreamError` and the more specific `DivideByZeroError`. Nothing fancy, just so we don’t leak raw `ZeroDivisionError`.

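A guess at what that looks like in practice, for readers who haven’t opened the file. The two class names come from this doc; the inheritance (`DivideByZeroError` deriving only from `MathStreamError`), the message text and the `safe_floor_div` helper are assumptions made for the example.

```python
class MathStreamError(Exception):
    """Base class for errors raised by the library (body assumed)."""


class DivideByZeroError(MathStreamError):
    """Raised instead of the built-in ZeroDivisionError (body assumed)."""


def safe_floor_div(a: int, b: int) -> int:
    """Hypothetical wrapper showing the conversion of the built-in error."""
    try:
        return a // b
    except ZeroDivisionError as exc:
        # Chain the original so the traceback still shows where it came from.
        raise DivideByZeroError("division by zero") from exc
```

Callers can then catch `MathStreamError` once and cover everything the library raises on purpose.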
## SQLite watcher (`mathstream/utils.py`)

This is the garbage-collection HQ. On import we run `_ensure_db(reset=True)` so every run starts from a clean DB (no migrations, no surprises). Two tables:

- `logs` → metadata about every staged file: created time, last access, access count.
- `refs` → current reference count (think “how many StreamNumber instances think they own this file”).

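For orientation, a two-table layout along those lines could look like the sketch below. The table names match the doc; every column name, the `SCHEMA` constant and the `ensure_db` signature are guesses for illustration, so check `utils.py` for the real definitions.

```python
import sqlite3

# Assumed column names; only the table names come from the notes above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    path         TEXT PRIMARY KEY,
    created_at   REAL NOT NULL,
    last_access  REAL NOT NULL,
    access_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS refs (
    path      TEXT PRIMARY KEY,
    ref_count INTEGER NOT NULL DEFAULT 0
);
"""


def ensure_db(conn: sqlite3.Connection, reset: bool = False) -> None:
    """Create both tables; with reset=True, drop them first for a clean run."""
    if reset:
        conn.executescript(
            "DROP TABLE IF EXISTS logs; DROP TABLE IF EXISTS refs;"
        )
    conn.executescript(SCHEMA)
```

Resetting on import trades persistence for simplicity: there is never an old schema to migrate, but nothing is remembered between runs either.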
Important functions:

- `register_log_file(path)` – ensure both tables have a row (initial ref count 0).
- `register_reference(path)` – increments the ref count, updates last access, access count etc. Called whenever a new `StreamNumber` points at the staged file.
- `touch_log_file(path)` – called from `.stream()` so we know the file is being read.
- `release_reference(path, delete_file=True)` – the inverse of register. If the count hits zero we remove the DB row and (optionally) delete the file right away.
- `collect_garbage(score_threshold)` – this is the periodic sweeper. Computes `score = age / ((ref_count + 1) * (access_count + 1))`. Bigger score means older + less used. If `score >= threshold` the file gets unlinked and removed from the DB. Negative thresholds are rejected with an error, on purpose.
- `tracked_files()` – dumb helper that dumps `{path: ref_count}` out of the DB.
- `wipe_log_records()` – nukes both tables; used by `clear_logs`.

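The score formula is easier to reason about with numbers plugged in. The sketch below only reproduces the arithmetic quoted above; the function names, the choice of seconds as the age unit and `ValueError` for negative thresholds are assumptions.

```python
def gc_score(age_seconds: float, ref_count: int, access_count: int) -> float:
    """score = age / ((ref_count + 1) * (access_count + 1))"""
    return age_seconds / ((ref_count + 1) * (access_count + 1))


def should_collect(
    age_seconds: float, ref_count: int, access_count: int, threshold: float
) -> bool:
    """True when the file's score reaches the sweep threshold."""
    if threshold < 0:
        # Assumed exception type; the notes only say negative values fail.
        raise ValueError("score threshold must be non-negative")
    return gc_score(age_seconds, ref_count, access_count) >= threshold
```

A ten-minute-old file nobody references or has read scores 600 / (1 * 1) = 600, while the same file with one reference and two reads scores 600 / (2 * 3) = 100. Since the score can never go below zero, a threshold of 0 matches every tracked file under this formula.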
## How cleanup flows

1. You run an operation (`add`, `mul`, whatever). Result file lands in `LOG_DIR`, gets registered, comes back as a `StreamNumber`.
2. You stream it or create more streams from it – metadata keeps getting updated via `touch_log_file`/`register_reference`.
3. When you’re done, call `.free()` or just drop references. Manual free is immediate. Otherwise the weakref finaliser catches it eventually.
4. `release_reference` is what actually removes the sqlite entries and unlinks the data file when there are no logical references left.
5. If you still have detritus (e.g. you crashed before refs hit zero), run `collect_garbage(threshold)` to sweep anything whose age outweighs usage.
6. `active_streams()` reports what’s still alive in Python land; `tracked_files()` shows what the DB thinks is referenced.

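Steps 1 to 4 boil down to reference counting in a table. Here’s a toy version against an in-memory database. The `toy_` functions are stand-ins that take an explicit connection, which the real helpers listed earlier do not, and the single-table schema is simplified for the example.

```python
import os
import sqlite3
from typing import Dict


def open_tracker() -> sqlite3.Connection:
    """In-memory DB with a simplified, assumed refs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refs (path TEXT PRIMARY KEY, ref_count INTEGER NOT NULL)"
    )
    return conn


def toy_register_reference(conn: sqlite3.Connection, path: str) -> None:
    """Create the row if needed, then bump its count by one."""
    conn.execute(
        "INSERT OR IGNORE INTO refs (path, ref_count) VALUES (?, 0)", (path,)
    )
    conn.execute(
        "UPDATE refs SET ref_count = ref_count + 1 WHERE path = ?", (path,)
    )


def toy_release_reference(
    conn: sqlite3.Connection, path: str, delete_file: bool = True
) -> int:
    """Drop one reference; at zero, remove the row and optionally the file."""
    row = conn.execute(
        "SELECT ref_count FROM refs WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return 0  # unknown path: nothing to release
    remaining = max(row[0] - 1, 0)
    if remaining == 0:
        conn.execute("DELETE FROM refs WHERE path = ?", (path,))
        if delete_file and os.path.exists(path):
            os.remove(path)
    else:
        conn.execute(
            "UPDATE refs SET ref_count = ? WHERE path = ?", (remaining, path)
        )
    return remaining


def toy_tracked_files(conn: sqlite3.Connection) -> Dict[str, int]:
    """Dump {path: ref_count}, mirroring the helper described earlier."""
    return dict(conn.execute("SELECT path, ref_count FROM refs"))
```

Registering the same path twice and releasing it twice leaves the file on disk after the first release and removes both the row and the file after the second.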
## Example run (`test.py`)

`test.py` is half regression, half reference script. It:

- seeds some numbers, runs every operation, checks results.
- makes sure `DivideByZeroError` fires.
- frees every staged number to prove files vanish on the spot.
- runs `collect_garbage(0)` just to make sure nothing else lingers.
- dumps `active_streams()` and `tracked_files()` so you can see Python vs sqlite state.

If the logs ever seem suspicious, run that script; it’ll tell you immediately whether something’s still referenced or if the GC is forgetting to clean up.