# Mathstream

*Scalable streamed arithmetic for ultra-large integers, chunked from disk.*

Do math on numbers too big for RAM by streaming digits off disk. Feed the library paths or literals, compose operations, and keep memory flat while outputs land in `instance/log`.

## Why?

Traditional `int` types break when your numbers don’t fit in memory. `mathstream` trades RAM for disk by using streamed digit files, letting you work with absurdly large integers on normal machines.

Perfect for:

- Iterative transformations like Collatz walks
- Memory-constrained pipelines
- Low-level big-number experimentation
- Long-running experiments where deterministic cleanup beats GC guesswork

## Quick Demo

```python
from mathstream import StreamNumber, add

a = StreamNumber(literal="999999999999999999")
b = StreamNumber(literal="1")
print("sum =", "".join(add(a, b).stream()))
```

## Installation

```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```

## Usage

```python
from mathstream import mul, StreamNumber

a = StreamNumber("path/to/big.txt")
b = StreamNumber(literal="1337")
result = mul(a, b)
print("".join(result.stream()))

# same helpers are available via Python operators
total = a + b      # calls mathstream.add under the hood
ratio = total / 2  # literal coercion is automatic

with StreamNumber(literal="10") as temp:
    product = temp * ratio
```

Available operations:

- Core arithmetic: `add`, `sub`, `mul`, `div`, `mod`, `pow`
- Introspection & helpers: `is_even`, `is_odd`, `free_stream`, `active_streams`, `tracked_files`
- Lifecycle control: `collect_garbage`, `set_manual_free_only`, `manual_free_only_enabled`
- Environment helpers: `clear_logs`, `StreamNumber.write_stage`, `engine.LOG_DIR`
- Python sugar: `StreamNumber` implements `+`, `-`, `*`, `/`, `%`, `**`, and their reflected counterparts. Raw `int`, `str`, or `pathlib.Path` operands are coerced automatically.
- Context manager support: `with StreamNumber(...) as sn:` ensures `.free()` is called at exit.
- Module entry point: `python -m mathstream` launches the interactive streamed-math REPL from `stream_repl.py`.

## How It Works

- **Streamed operands** – `StreamNumber` wraps either a user-supplied digit file or an integer literal materialised in `LOG_DIR`. Data is read linearly in configurable chunks, never promoted to a Python `int`.
- **Staging directory** – Every operation writes results into `mathstream.number.LOG_DIR` (default `instance/log`). File names include hashes of input paths so repeated calls reuse the same staged copies.
- **Bookkeeping database** – `mathstream/utils.py` keeps `instance/mathstream_logs.sqlite`, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it trivial to audit what’s on disk.
- **Reference counting** – Every `StreamNumber` bumps a ref count in sqlite and in-process counters. Dropping the last reference (or calling `free_stream`) decrements counts and optionally unlinks the file immediately.
- **Manual-only mode** – Call `set_manual_free_only(True)` when you want absolute control over lifecycle. The weakref finaliser stops deleting staged files, so outputs persist until you call `.free()` or `collect_garbage()`.
- **Zero-copy chaining** – Since staged files stay on disk, you can pass `StreamNumber` handles between processes or reuse them in later runs without recomputing.

## Performance Tips

- Reuse literal `StreamNumber` objects to avoid rewriting identical data.
- Call `free_stream(...)` or use context managers to drop staged results quickly.
- Run `collect_garbage(score_threshold)` to purge stale intermediates.
- Keep an eye on disk space in `instance/log`: streaming shifts the pressure from RAM to storage.
- For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing `literal=...`.
- Tweak `StreamNumber.stream(chunk_size)` to balance syscalls vs. memory: large chunks speed up CPU-bound math, smaller chunks play nicer with slow disks.
- If you are scripting long sessions, snapshot `tracked_files()` periodically; it’s an easy indicator of leaked references.

### Common Pitfalls & Recoveries

- **Accidentally freed files** – Automatic finalizers may delete staged outputs while you still hold the path elsewhere. Fix: call `set_manual_free_only(True)` at the start of long-lived workflows, or pass `delete_file=False` to `free_stream` when you need to keep the digits around manually.
- **Operator coercion surprises** – Arithmetic operators turn `int`, `str`, or `Path` operands into streamed numbers. If a string happens to be a *file path* instead of a literal, the actual file will be wrapped. Fix: be explicit (`StreamNumber(literal="...")`) when in doubt.
- **Literal churn** – Recreating the same `StreamNumber(literal="123")` millions of times hammers the filesystem. Fix: stash the first instance, or cache the `.path` and rely on `StreamNumber(existing_path)` in hot loops.
- **GC too aggressive** – Running `collect_garbage(0)` after every operation removes recently written files. Fix: raise the threshold (e.g., `collect_garbage(1000)`) or run GC only after you’ve freed all references.
- **Chunk mismatch** – Some editors save files with BOMs or commas. `_normalize_stream` will raise `ValueError("Non-digit characters found...")`. Fix: sanitise input files (only ASCII digits with optional leading sign).
- **Disk exhaustion** – Terabyte-scale runs fill `instance/log`. Fix: relocate `engine.LOG_DIR` to a larger volume or run periodic `collect_garbage` sweeps and archive intermediate files.
- **Concurrency surprises** – Multiple processes writing to the same `LOG_DIR` share the sqlite tracker. Ensure each writer calls `free_stream` and `collect_garbage` responsibly, or isolate runs by changing `LOG_DIR` per worker.

## Tools and Experiments

- `test.py` – Regression smoke test covering all arithmetic helpers.
- `collatz.py` / `collatz_ui/` – Curses dashboard that streams Collatz sequences.
- `seed_start.py` – Seeds `start.txt` via streamed additions from various sources.
- `find_my.py` + `pi_finder/` – Nilakantha-based π explorer that writes results to `found.pi`.
- `stream_repl.py` / `python -m mathstream` – Interactive REPL for streamed math (supports `save `, `:show`, `:purge`, `:cleanmode`, `:stats`, and `exit -s` to keep staging files).
- `WORK.md` – Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
- `collatz_ui/views.py` – Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
- `pi_finder/engine.py` – Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.

## Extending

You can:

- Implement custom storage backends (e.g., S3-backed digit files).
- Compose primitives to build new helpers (gcd, factorial, etc.).
- Point `engine.LOG_DIR` at your own staging directory before running operations.
- Add new operations by mirroring the pattern in `mathstream/engine.py`: normalise inputs with `_normalize_stream`, perform chunk-based math, then `_write_result(...)`.
- Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
- Layer parity or divisibility checks by reading the streamed digits lazily; there’s no requirement to materialise entire outputs unless you need them.

## Contributing

Open to PRs for:

- New streamed math operations and optimizations.
- Smarter garbage collection / tooling around the sqlite tracker.
- Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).

Please lint with `ruff` and follow the existing streaming patterns.

## License

MIT. Use it, remix it, but keep backups: massive streamed math can chew through SSDs fast. 😅
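
## Appendix: Lazy Divisibility Sketch

The Extending section mentions layering parity or divisibility checks on lazily read digits. The sketch below illustrates that idea in plain Python, without importing `mathstream`. The `stream_mod` helper, its parameters, and the digit-file layout (ASCII digits, non-negative, optional trailing newline) are assumptions made for illustration; they are not part of the library's API, and the library's own `mod`, `is_even`, and `is_odd` may work differently.

```python
import tempfile
from pathlib import Path


def stream_mod(path, modulus, chunk_size=65536):
    """Reduce the non-negative integer stored in `path` modulo `modulus`.

    Hypothetical helper: assumes the file holds only ASCII digits,
    optionally followed by trailing whitespace. Only one chunk of the
    file is held in memory at a time.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    remainder = 0
    with open(path, "r", encoding="ascii") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digits = chunk.strip()
            if not digits:
                continue
            if not digits.isdigit():
                raise ValueError("unexpected non-digit characters in input")
            # Fold the chunk in 18-digit slices so every int() call stays
            # small, whatever chunk_size the caller picks.
            for start in range(0, len(digits), 18):
                part = digits[start:start + 18]
                # Appending k digits multiplies the running value by 10**k.
                remainder = (remainder * 10 ** len(part) + int(part)) % modulus
    return remainder


# Usage: a 100,000-digit number written straight to disk, then checked
# for parity and divisibility by 7 without building the full integer.
with tempfile.TemporaryDirectory() as tmp:
    digit_file = Path(tmp) / "big.txt"
    digit_file.write_text("9" * 100_000 + "\n", encoding="ascii")
    print("odd:", stream_mod(digit_file, 2) == 1)
    print("divisible by 7:", stream_mod(digit_file, 7, chunk_size=1000) == 0)
```

Because only the running remainder is carried between chunks, memory use stays bounded by `chunk_size` no matter how long the digit file is. Tuning that one value gives the same syscalls-versus-memory trade-off described under Performance Tips.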