diff --git a/mathstream/README.md b/mathstream/README.md
index 24be591..2d955ab 100644
--- a/mathstream/README.md
+++ b/mathstream/README.md
@@ -1,8 +1,31 @@
-# Mathstream Library
+# Mathstream
 
-`mathstream` offers streamed, string-based arithmetic for very large integers that you may not want to load entirely into memory. Instead of parsing numbers into Python `int` values, you work with digit files on disk via `StreamNumber` and call math operations that operate chunk-by-chunk.
+*Scalable streamed arithmetic for ultra-large integers, chunked from disk.*
 
-## Quick Start
+Do math on numbers too big for RAM by streaming digits off disk. Feed the library paths or literals, compose operations, and keep memory flat while outputs land in `instance/log`.
+
+## Why?
+
+Traditional `int` types break when your numbers don’t fit in memory. `mathstream` trades RAM for disk by using streamed digit files, letting you work with absurdly large integers on normal machines.
+
+Perfect for:
+- Iterative transformations like Collatz walks
+- Memory-constrained pipelines
+- Low-level big-number experimentation
+- Long-running experiments where deterministic cleanup beats GC guesswork
+
+## Quick Demo
+
+```python
+from mathstream import StreamNumber, add
+
+a = StreamNumber(literal="999999999999999999")
+b = StreamNumber(literal="1")
+
+print("sum =", "".join(add(a, b).stream()))
+```
+
+## Installation
 
 ```bash
 python -m venv venv
@@ -10,90 +33,81 @@ source venv/bin/activate
 pip install -e .
 ```
 
-Create digit files anywhere you like (the examples below use `instance/log`), or supply ad-hoc literals, then construct `StreamNumber` objects and call the helpers:
+## Usage
 
 ```python
-from mathstream import (
-    StreamNumber,
-    add,
-    sub,
-    mul,
-    div,
-    mod,
-    pow,
-    is_even,
-    is_odd,
-    free_stream,
-    collect_garbage,
-)
+from mathstream import mul, StreamNumber
 
-a = StreamNumber("instance/log/huge.txt")
-b = StreamNumber(literal="34567")
-e = StreamNumber(literal="3")
+a = StreamNumber("path/to/big.txt")
+b = StreamNumber(literal="1337")
 
-print("sum =", "".join(add(a, b).stream()))
-print("difference =", "".join(sub(a, b).stream()))
-print("product =", "".join(mul(a, b).stream()))
-print("quotient =", "".join(div(a, b).stream()))
-print("modulo =", "".join(mod(a, b).stream()))
-print("power =", "".join(pow(a, e).stream()))
-print("a is even?", is_even(a))
-print("b is odd?", is_odd(b))
-
-# drop staged artifacts immediately when you are done
-free_stream(b)
-
-# reclaim space for files whose age outweighs their use
-collect_garbage(0.5)
+result = mul(a, b)
+print("".join(result.stream()))
 ```
 
-Each arithmetic call writes its result back into `instance/log` (configurable via `mathstream.number.LOG_DIR`) so you can stream the digits later or reuse them in further operations.
+Available operations:
+- Core arithmetic: `add`, `sub`, `mul`, `div`, `mod`, `pow`
+- Introspection & helpers: `is_even`, `is_odd`, `free_stream`, `active_streams`, `tracked_files`
+- Lifecycle control: `collect_garbage`, `set_manual_free_only`, `manual_free_only_enabled`
+- Environment helpers: `clear_logs`, `StreamNumber.write_stage`, `engine.LOG_DIR`
 
-## Core Concepts
+## How It Works
 
-- **StreamNumber(path | literal=...)** – Wraps a digit text file or creates one for an integer literal inside `LOG_DIR`. Literal operands are persisted as `literal_.txt`, so repeated runs reuse the same staged file (note that `clear_logs()` removes these cache files too).
-- **`.stream(chunk_size)`** – Yields strings of digits with the provided chunk size. Operations in `mathstream.engine` consume these streams to avoid loading the entire number at once.
-- **Automatic staging** – Outputs are stored under `LOG_DIR` with hashes based on input file paths, letting you compose operations without manual bookkeeping.
-- **Sign-aware** – Addition, subtraction, multiplication, division (`//` behavior), modulo, and exponentiation (non-negative exponents) all respect operand sign. Division/modulo follow Python’s floor-division rules.
-- **Utilities** – `clear_logs()` wipes prior staged results so you can start fresh.
-- **Manual freeing** – Call `stream.free()` (or `free_stream(stream)`) once you are done with a staged number to release its reference immediately. Logger metadata keeps per-path reference counts so the final free removes the backing file on the spot.
-- **GC toggle** – Need total control over when files disappear? Flip `mathstream.set_manual_free_only(True)` so automatic finalizers stop unlinking staged files; they will persist until you call `free()` (or `collect_garbage`). Use `mathstream.manual_free_only_enabled()` to inspect the current setting.
-- **Parity helpers** – `is_even` and `is_odd` inspect the streamed digits without materializing the integer.
-- **Garbage collection** – `collect_garbage(score_threshold)` computes a score from file age, access count, and reference count (tracked in `instance/mathstream_logs.sqlite`, freshly truncated each run). Files whose score meets or exceeds the threshold are deleted, letting you tune how aggressively to reclaim space. Both staged results and literal caches participate. Use `tracked_files()` or `active_streams()` to inspect current state.
-
-Divide-by-zero scenarios raise the custom `DivideByZeroError` so callers can distinguish mathstream issues from Python’s native exceptions.
+- **Streamed operands** – `StreamNumber` wraps either a user-supplied digit file or an integer literal materialised in `LOG_DIR`. Data is read linearly in configurable chunks, never promoted to a Python `int`.
+- **Staging directory** – Every operation writes results into `mathstream.number.LOG_DIR` (default `instance/log`). File names include hashes of input paths so repeated calls reuse the same staged copies.
+- **Bookkeeping database** – `mathstream/utils.py` keeps `instance/mathstream_logs.sqlite`, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it trivial to audit what’s on disk.
+- **Reference counting** – Every `StreamNumber` bumps a ref count in sqlite and in-process counters. Dropping the last reference (or calling `free_stream`) decrements counts and optionally unlinks the file immediately.
+- **Manual-only mode** – Call `set_manual_free_only(True)` when you want absolute control over lifecycle. The weakref finaliser stops deleting staged files, so outputs persist until you call `.free()` or `collect_garbage()`.
+- **Zero-copy chaining** – Since staged files stay on disk, you can pass `StreamNumber` handles between processes or reuse them in later runs without recomputing.
 
 ## Performance Tips
 
-- **Reuse literal streams** – `StreamNumber(literal=...)` persists a hashed copy under `LOG_DIR`. Reuse those objects (or their filenames) across operations instead of recreating them every call. Repeated literal construction churns the filesystem: you pay the cost to rewrite identical data, poll the logger database, and spike disk I/O. Hang on to the staged literal or memoize it so it can be streamed repeatedly without rewriting.
-- **Free aggressively** – When a staged result or literal copy is no longer needed, call `free_stream()` (or use `with StreamNumber(...) as n:`) so the reference count drops immediately. This keeps the cache tidy and reduces the chance that stale literal files pile up between runs.
+- Reuse literal `StreamNumber` objects to avoid rewriting identical data.
+- Call `free_stream(...)` or use context managers to drop staged results quickly.
+- Run `collect_garbage(score_threshold)` to purge stale intermediates.
+- Keep an eye on disk space in `instance/log`—streaming shifts the pressure from RAM to storage.
+- For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing `literal=...`.
+- Tweak `StreamNumber.stream(chunk_size)` to balance syscalls vs. memory: large chunks speed up CPU-bound math, smaller chunks play nicer with slow disks.
+- If you are scripting long sessions, snapshot `tracked_files()` periodically; it’s an easy indicator of leaked references.
 
-## Example Script
+### Common Pitfalls & Recoveries
 
-`test.py` in the repository root demonstrates a minimal workflow:
+- **Accidentally freed files** – Automatic finalizers may delete staged outputs while you still hold the path elsewhere. Fix: call `set_manual_free_only(True)` at the start of long-lived workflows, or pass `delete_file=False` to `free_stream` when you need to keep the digits around manually.
+- **Literal churn** – Recreating the same `StreamNumber(literal="123")` millions of times hammers the filesystem. Fix: stash the first instance, or cache the `.path` and rely on `StreamNumber(existing_path)` in hot loops.
+- **GC too aggressive** – Running `collect_garbage(0)` after every operation removes recently written files. Fix: raise the threshold (e.g., `collect_garbage(1000)`) or run GC only after you’ve freed all references.
+- **Chunk mismatch** – Some editors save files with BOMs or commas. `_normalize_stream` will raise `ValueError("Non-digit characters found...")`. Fix: sanitise input files (only ASCII digits with optional leading sign).
+- **Disk exhaustion** – Terabyte-scale runs fill `instance/log`. Fix: relocate `engine.LOG_DIR` to a larger volume or run periodic `collect_garbage` sweeps and archive intermediate files.
+- **Concurrency surprises** – Multiple processes writing to the same `LOG_DIR` share the sqlite tracker. Ensure each writer calls `free_stream` and `collect_garbage` responsibly, or isolate runs by changing `LOG_DIR` per worker.
 
-1. Writes sample operands to `tests/*.txt`.
-2. Calls every arithmetic primitive plus the modulo/parity helpers.
-3. Asserts that the streamed outputs match known values (helpful for quick regression checks).
+## Tools and Experiments
 
-Run it via:
-
-```bash
-python test.py
-```
+- `test.py` – Regression smoke test covering all arithmetic helpers.
+- `collatz.py` / `collatz_ui/` – Curses dashboard that streams Collatz sequences.
+- `seed_start.py` – Seeds `start.txt` via streamed additions from various sources.
+- `find_my.py` + `pi_finder/` – Nilakantha-based π explorer that writes results to `found.pi`.
+- `WORK.md` – Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
+- `collatz_ui/views.py` – Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
+- `pi_finder/engine.py` – Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.
 
 ## Extending
 
-- To hook into other storage backends, implement your own `StreamNumber` variant with the same `.stream()` interface.
-- Need modulo or gcd? Compose the existing primitives (e.g., repeated subtraction or using `div` + remainder tracking inside `_divide_abs`) or add new helpers following the same streamed pattern.
-- For more control over output locations, override `LOG_DIR` before using the operations:
+You can:
+- Implement custom storage backends (e.g., S3-backed digit files).
+- Compose primitives to build new helpers (gcd, factorial, etc.).
+- Point `engine.LOG_DIR` at your own staging directory before running operations.
+- Add new operations by mirroring the pattern in `mathstream/engine.py`: normalise inputs with `_normalize_stream`, perform chunk-based math, then `_write_result(...)`.
+- Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
+- Layer parity or divisibility checks by reading the streamed digits lazily; there’s no requirement to materialise entire outputs unless you need them.
 
-```python
-from mathstream import engine
-from pathlib import Path
+## Contributing
 
-engine.LOG_DIR = Path("/tmp/my_mathstage")
-engine.clear_logs()
-```
+Open to PRs for:
+- New streamed math operations and optimizations.
+- Smarter garbage collection / tooling around the sqlite tracker.
+- Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).
 
-With these building blocks, you can manipulate arbitrarily large integers while keeping memory usage constant. Happy streaming!
+Please lint with `ruff` and follow the existing streaming patterns.
+
+## License
+
+MIT. Use it, remix it, but keep backups—massive streamed math can chew through SSDs fast. 😅
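
Note on the "Data is read linearly in configurable chunks, never promoted to a Python `int`" claim in the new README text: the idea can be illustrated without the library. The following is a minimal sketch, not mathstream's actual implementation. The helpers `read_chunks_from_end`, `add_files`, and `add_digit_strings` are hypothetical names, and the sketch assumes each file holds only ASCII digits (no sign, no trailing newline). It adds two non-negative integers by reading fixed-size chunks from the least significant end and carrying between chunks.

```python
import os
import tempfile
from typing import Iterator


def read_chunks_from_end(path: str, chunk_size: int) -> Iterator[str]:
    """Yield digit chunks from least significant to most significant."""
    end = os.path.getsize(path)
    with open(path, "rb") as fh:
        while end > 0:
            start = max(0, end - chunk_size)
            fh.seek(start)
            yield fh.read(end - start).decode("ascii")
            end = start


def add_files(path_a: str, path_b: str, chunk_size: int = 4) -> Iterator[str]:
    """Yield zero-padded chunks of a + b, least significant chunk first."""
    chunks_a = read_chunks_from_end(path_a, chunk_size)
    chunks_b = read_chunks_from_end(path_b, chunk_size)
    base = 10 ** chunk_size
    carry = 0
    while True:
        part_a = next(chunks_a, None)
        part_b = next(chunks_b, None)
        if part_a is None and part_b is None:
            break
        # Only one chunk per operand is converted to int at a time.
        total = int(part_a or "0") + int(part_b or "0") + carry
        carry, digits = divmod(total, base)
        yield str(digits).zfill(chunk_size)
    if carry:
        yield str(carry)


def add_digit_strings(a: str, b: str, chunk_size: int = 4) -> str:
    """Demo wrapper: stage operands on disk, add them, return the sum."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.txt")
        path_b = os.path.join(tmp, "b.txt")
        with open(path_a, "w", encoding="ascii") as fh:
            fh.write(a)
        with open(path_b, "w", encoding="ascii") as fh:
            fh.write(b)
        # Collected in memory for the demo only; a streamed version
        # would write each chunk to a staging file instead.
        chunks = list(add_files(path_a, path_b, chunk_size))
    return "".join(reversed(chunks)).lstrip("0") or "0"


if __name__ == "__main__":
    # Same operands as the Quick Demo; prints 1000000000000000000
    print(add_digit_strings("999999999999999999", "1"))
```

Reading from the end keeps chunk boundaries aligned between the two operands, so the carry is the only state passed from one chunk to the next. Larger `chunk_size` values mean fewer seeks at the cost of bigger per-chunk integers, which is the trade-off the Performance Tips describe.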