# Mathstream

*Scalable streamed arithmetic for ultra-large integers, chunked from disk.*

Do math on numbers too big for RAM by streaming digits off disk. Feed the library paths or literals, compose operations, and keep memory flat while outputs land in `instance/log`.

## Why?

Traditional `int` types break when your numbers don’t fit in memory. `mathstream` trades RAM for disk by using streamed digit files, letting you work with absurdly large integers on normal machines.

Perfect for:
- Iterative transformations like Collatz walks
- Memory-constrained pipelines
- Low-level big-number experimentation
- Long-running experiments where deterministic cleanup beats GC guesswork

## Quick Demo

```python
from mathstream import StreamNumber, add

a = StreamNumber(literal="999999999999999999")
b = StreamNumber(literal="1")

print("sum =", "".join(add(a, b).stream()))
```

## Installation

```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```

## Usage

```python
from mathstream import mul, StreamNumber

a = StreamNumber("path/to/big.txt")
b = StreamNumber(literal="1337")

result = mul(a, b)
print("".join(result.stream()))

# same helpers are available via Python operators
total = a + b  # calls mathstream.add under the hood
ratio = total / 2  # literal coercion is automatic
with StreamNumber(literal="10") as temp:
    product = temp * ratio
```

Available operations:
- Core arithmetic: `add`, `sub`, `mul`, `div`, `mod`, `pow`
- Introspection & helpers: `is_even`, `is_odd`, `free_stream`, `active_streams`, `tracked_files`
- Lifecycle control: `collect_garbage`, `set_manual_free_only`, `manual_free_only_enabled`
- Environment helpers: `clear_logs`, `StreamNumber.write_stage`, `engine.LOG_DIR`
- Python sugar: `StreamNumber` implements `+`, `-`, `*`, `/`, `%`, `**`, and their reflected counterparts. Raw `int`, `str`, or `pathlib.Path` operands are coerced automatically.
- Context manager support: `with StreamNumber(...) as sn:` ensures `.free()` is called at exit.
- Module entry point: `python -m mathstream` launches the interactive streamed-math REPL from `stream_repl.py`.

## How It Works

- **Streamed operands** – `StreamNumber` wraps either a user-supplied digit file or an integer literal materialised in `LOG_DIR`. Data is read linearly in configurable chunks, never promoted to a Python `int`.
- **Staging directory** – Every operation writes results into `mathstream.number.LOG_DIR` (default `instance/log`). File names include hashes of input paths so repeated calls reuse the same staged copies.
- **Bookkeeping database** – `mathstream/utils.py` keeps `instance/mathstream_logs.sqlite`, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it trivial to audit what’s on disk.
- **Reference counting** – Every `StreamNumber` bumps a ref count in sqlite and in-process counters. Dropping the last reference (or calling `free_stream`) decrements counts and optionally unlinks the file immediately.
- **Manual-only mode** – Call `set_manual_free_only(True)` when you want absolute control over lifecycle. The weakref finaliser stops deleting staged files, so outputs persist until you call `.free()` or `collect_garbage()`.
- **Zero-copy chaining** – Since staged files stay on disk, you can pass `StreamNumber` handles between processes or reuse them in later runs without recomputing.
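The staging and bookkeeping ideas above can be sketched with the standard library alone. This is not mathstream's actual code: the table layout and the helper names (`staged_name`, `open_tracker`, `acquire`, `release`) are assumptions made up for illustration. Only the general idea comes from the list above: hash-derived file names plus a sqlite table holding creation time, last access, access count, and reference count.

```python
import hashlib
import sqlite3
import time


def staged_name(op: str, *input_paths: str) -> str:
    """Derive a deterministic file name from the operation and its input paths."""
    key = "|".join((op, *input_paths)).encode("utf-8")
    return f"{op}_{hashlib.sha256(key).hexdigest()[:16]}.txt"


def open_tracker(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) bookkeeping table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS staged (
               name TEXT PRIMARY KEY,
               created REAL NOT NULL,
               last_access REAL NOT NULL,
               access_count INTEGER NOT NULL,
               ref_count INTEGER NOT NULL
           )"""
    )
    return conn


def acquire(conn: sqlite3.Connection, name: str) -> int:
    """Register one more reference to a staged file and return the new count."""
    now = time.time()
    # Insert a zeroed row on first sight, then bump the counters.
    conn.execute("INSERT OR IGNORE INTO staged VALUES (?, ?, ?, 0, 0)", (name, now, now))
    conn.execute(
        "UPDATE staged SET last_access = ?, access_count = access_count + 1, "
        "ref_count = ref_count + 1 WHERE name = ?",
        (now, name),
    )
    conn.commit()
    return conn.execute("SELECT ref_count FROM staged WHERE name = ?", (name,)).fetchone()[0]


def release(conn: sqlite3.Connection, name: str) -> bool:
    """Drop one reference; return True once no references remain."""
    conn.execute("UPDATE staged SET ref_count = MAX(ref_count - 1, 0) WHERE name = ?", (name,))
    conn.commit()
    row = conn.execute("SELECT ref_count FROM staged WHERE name = ?", (name,)).fetchone()
    return row is not None and row[0] == 0


conn = open_tracker()
name = staged_name("add", "a.txt", "b.txt")
acquire(conn, name)
acquire(conn, name)
print(release(conn, name))  # one reference is still held
print(release(conn, name))  # last reference dropped, file could be unlinked
```

Because the name depends only on the operation and its inputs, repeating the same call maps to the same staged file, which is what makes reuse possible. A caller would unlink the file only when `release` reports that the count has reached zero.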

## Performance Tips

- Reuse literal `StreamNumber` objects to avoid rewriting identical data.
- Call `free_stream(...)` or use context managers to drop staged results quickly.
- Run `collect_garbage(score_threshold)` to purge stale intermediates.
- Keep an eye on disk space in `instance/log`: streaming shifts the pressure from RAM to storage.
- For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing `literal=...`.
- Tweak `StreamNumber.stream(chunk_size)` to balance syscalls vs. memory: large chunks speed up CPU-bound math, smaller chunks play nicer with slow disks.
- If you are scripting long sessions, snapshot `tracked_files()` periodically; it’s an easy indicator of leaked references.
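As a rough illustration of the "generate huge literals on disk" and chunk-size tips, here is a minimal standard-library sketch. It assumes only that a digit file is plain ASCII digits; `write_power_of_ten` and `stream_digits` are hypothetical helpers, not part of mathstream.

```python
import os
import tempfile
from typing import Iterator


def write_power_of_ten(path: str, exponent: int, chunk_size: int = 1 << 16) -> None:
    """Write 10**exponent as ASCII digits without building the full string in memory."""
    zeros = "0" * chunk_size
    with open(path, "w", encoding="ascii") as fh:
        fh.write("1")
        remaining = exponent
        while remaining > 0:
            n = min(remaining, chunk_size)
            fh.write(zeros[:n])
            remaining -= n


def stream_digits(path: str, chunk_size: int = 1 << 16) -> Iterator[str]:
    """Yield the file's digits in fixed-size chunks, most significant first."""
    with open(path, "r", encoding="ascii") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "big.txt")
    write_power_of_ten(big, 1_000_000)
    # Count digits while holding at most one 4 KiB chunk at a time.
    digit_count = sum(len(chunk) for chunk in stream_digits(big, chunk_size=4096))
    print(digit_count)
```

A file produced this way could then be wrapped by path, as in `StreamNumber("path/to/big.txt")` from the Usage section, instead of passing a million-character `literal=...`.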

### Common Pitfalls & Recoveries

- **Accidentally freed files** – Automatic finalisers may delete staged outputs while you still hold the path elsewhere. Fix: call `set_manual_free_only(True)` at the start of long-lived workflows, or pass `delete_file=False` to `free_stream` when you need to keep the digits around manually.
- **Operator coercion surprises** – Arithmetic operators turn `int`, `str`, or `Path` operands into streamed numbers. If a string happens to be a *file path* instead of a literal, the actual file will be wrapped. Fix: be explicit (`StreamNumber(literal="...")`) when in doubt.
- **Literal churn** – Recreating the same `StreamNumber(literal="123")` millions of times hammers the filesystem. Fix: stash the first instance, or cache the `.path` and rely on `StreamNumber(existing_path)` in hot loops.
- **GC too aggressive** – Running `collect_garbage(0)` after every operation removes recently written files. Fix: raise the threshold (e.g., `collect_garbage(1000)`) or run GC only after you’ve freed all references.
- **Chunk mismatch** – Some editors save files with BOMs or commas. `_normalize_stream` will raise `ValueError("Non-digit characters found...")`. Fix: sanitise input files (only ASCII digits with optional leading sign).
- **Disk exhaustion** – Terabyte-scale runs fill `instance/log`. Fix: relocate `engine.LOG_DIR` to a larger volume or run periodic `collect_garbage` sweeps and archive intermediate files.
- **Concurrency surprises** – Multiple processes writing to the same `LOG_DIR` share the sqlite tracker. Fix: ensure each writer calls `free_stream` and `collect_garbage` responsibly, or isolate runs by changing `LOG_DIR` per worker.
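For the BOM-and-commas case above, input files can be cleaned before they are wrapped. The following is a minimal sketch under stated assumptions: `sanitise_digit_file` is a hypothetical helper (it does not call `_normalize_stream`), and the set of separators it silently drops is an arbitrary choice for illustration.

```python
import os
import tempfile

# Characters dropped silently; anything else that is not a digit is an error.
_SEPARATORS = {",", "_", " ", "\t", "\r", "\n"}


def sanitise_digit_file(src: str, dst: str, chunk_size: int = 1 << 16) -> int:
    """Copy src to dst keeping an optional leading '-' and ASCII digits only.

    Returns the number of digits written; raises ValueError on other characters.
    """
    written = 0
    seen_sign = False
    # "utf-8-sig" drops a leading byte-order mark if one is present.
    with open(src, "r", encoding="utf-8-sig") as fin, open(dst, "w", encoding="ascii") as fout:
        while chunk := fin.read(chunk_size):
            out = []
            for ch in chunk:
                if ch in _SEPARATORS:
                    continue
                if ch in "+-" and written == 0 and not seen_sign:
                    seen_sign = True
                    if ch == "-":  # a leading '+' is redundant, so it is dropped
                        out.append(ch)
                    continue
                if not ("0" <= ch <= "9"):
                    raise ValueError(f"Non-digit character {ch!r} in {src}")
                out.append(ch)
                written += 1
            fout.write("".join(out))
    if written == 0:
        raise ValueError(f"No digits found in {src}")
    return written


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw.txt")
    clean = os.path.join(tmp, "clean.txt")
    with open(raw, "w", encoding="utf-8-sig") as fh:  # writes a BOM first
        fh.write("-1,234,567\n")
    count = sanitise_digit_file(raw, clean)
    with open(clean, encoding="ascii") as fh:
        cleaned = fh.read()
    print(count, cleaned)
```

The copy is streamed chunk by chunk, so it works on files larger than RAM. If a `ValueError` is raised partway through, `dst` is left partially written and should be discarded.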

## Tools and Experiments

- `test.py` – Regression smoke test covering all arithmetic helpers.
- `collatz.py` / `collatz_ui/` – Curses dashboard that streams Collatz sequences.
- `seed_start.py` – Seeds `start.txt` via streamed additions from various sources.
- `find_my.py` + `pi_finder/` – Nilakantha-based π explorer that writes results to `found.pi`.
- `stream_repl.py` / `python -m mathstream` – Interactive REPL for streamed math (supports `save <var> <path>`, `:show`, `:purge`, `:cleanmode`, `:stats`, and `exit -s` to keep staging files).
- `WORK.md` – Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
- `collatz_ui/views.py` – Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
- `pi_finder/engine.py` – Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.

## Extending

You can:
- Implement custom storage backends (e.g., S3-backed digit files).
- Compose primitives to build new helpers (gcd, factorial, etc.).
- Point `engine.LOG_DIR` at your own staging directory before running operations.
- Add new operations by mirroring the pattern in `mathstream/engine.py`: normalise inputs with `_normalize_stream`, perform chunk-based math, then `_write_result(...)`.
- Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
- Layer parity or divisibility checks by reading the streamed digits lazily; there’s no requirement to materialise entire outputs unless you need them.
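The lazy parity and divisibility idea in the last bullet can be sketched directly on a digit file. This is an illustration only, assuming an unsigned decimal number stored as ASCII digits with an optional trailing newline; `is_even_file` and `mod_file` are hypothetical names, not mathstream functions.

```python
import os
import tempfile


def is_even_file(path: str) -> bool:
    """Parity depends only on the last digit, so scan backwards from the end."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        while pos > 0:
            pos -= 1
            fh.seek(pos)
            byte = fh.read(1)
            if byte.isdigit():  # skips a trailing newline, if any
                return int(byte.decode("ascii")) % 2 == 0
    raise ValueError("no digits in file")


def mod_file(path: str, modulus: int, chunk_size: int = 1 << 16) -> int:
    """Reduce the number modulo a small int with Horner's rule, digit by digit."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    remainder = 0
    with open(path, "r", encoding="ascii") as fh:
        while chunk := fh.read(chunk_size):
            for ch in chunk.strip():
                # Appending digit d turns n into 10*n + d; reduce at every step.
                remainder = (remainder * 10 + int(ch)) % modulus
    return remainder


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "n.txt")
    with open(path, "w", encoding="ascii") as fh:
        fh.write("123456789012345678901234567890\n")
    even = is_even_file(path)
    rem7 = mod_file(path, 7, chunk_size=8)
    print(even, rem7)
```

The parity check touches only the tail of the file, and the remainder never grows beyond `modulus`, so neither function holds more than one chunk in memory. A divisibility test is then just `mod_file(path, k) == 0`.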

## Contributing

Open to PRs for:
- New streamed math operations and optimizations.
- Smarter garbage collection / tooling around the sqlite tracker.
- Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).

Please lint with `ruff` and follow the existing streaming patterns.

## License

MIT. Use it, remix it, but keep backups: massive streamed math can chew through SSDs fast. 😅