# Mathstream

*Scalable streamed arithmetic for ultra-large integers, chunked from disk.*

Do math on numbers too big for RAM by streaming digits off disk. Feed the library paths or literals, compose operations, and keep memory flat while outputs land in `instance/log`.

## Why?

Python's built-in `int` is arbitrary-precision, but the whole value still has to fit in memory. `mathstream` trades RAM for disk by using streamed digit files, letting you work with absurdly large integers on normal machines.

Perfect for:
- Iterative transformations like Collatz walks
- Memory-constrained pipelines
- Low-level big-number experimentation
- Long-running experiments where deterministic cleanup beats GC guesswork

## Quick Demo

```python
from mathstream import StreamNumber, add

a = StreamNumber(literal="999999999999999999")
b = StreamNumber(literal="1")

print("sum =", "".join(add(a, b).stream()))
```

## Installation

```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```

## Usage

```python
from mathstream import mul, StreamNumber

a = StreamNumber("path/to/big.txt")
b = StreamNumber(literal="1337")

result = mul(a, b)
print("".join(result.stream()))

# same helpers are available via Python operators
total = a + b          # calls mathstream.add under the hood
ratio = total / 2      # literal coercion is automatic
with StreamNumber(literal="10") as temp:
    product = temp * ratio
```

Available operations:
- Core arithmetic: `add`, `sub`, `mul`, `div`, `mod`, `pow`
- Introspection & helpers: `is_even`, `is_odd`, `free_stream`, `active_streams`, `tracked_files`
- Lifecycle control: `collect_garbage`, `set_manual_free_only`, `manual_free_only_enabled`
- Environment helpers: `clear_logs`, `StreamNumber.write_stage`, `engine.LOG_DIR`
- Python sugar: `StreamNumber` implements `+`, `-`, `*`, `/`, `%`, `**`, and their reflected counterparts. Raw `int`, `str`, or `pathlib.Path` operands are coerced automatically.
- Context manager support: `with StreamNumber(...) as sn:` ensures `.free()` is called at exit.
- Module entry point: `python -m mathstream` launches the interactive streamed-math REPL from `stream_repl.py`.

## How It Works

- **Streamed operands** – `StreamNumber` wraps either a user-supplied digit file or an integer literal materialised in `LOG_DIR`. Data is read linearly in configurable chunks, never promoted to a Python `int`.
- **Staging directory** – Every operation writes results into `mathstream.number.LOG_DIR` (default `instance/log`). File names include hashes of input paths so repeated calls reuse the same staged copies.
- **Bookkeeping database** – `mathstream/utils.py` keeps `instance/mathstream_logs.sqlite`, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it trivial to audit what’s on disk.
- **Reference counting** – Every `StreamNumber` bumps a ref count in sqlite and in-process counters. Dropping the last reference (or calling `free_stream`) decrements counts and optionally unlinks the file immediately.
- **Manual-only mode** – Call `set_manual_free_only(True)` when you want absolute control over lifecycle. The weakref finaliser stops deleting staged files, so outputs persist until you call `.free()` or `collect_garbage()`.
- **Zero-copy chaining** – Since staged files stay on disk, you can hand their paths to other processes or reuse them in later runs without recomputing, as long as they have not been freed or garbage-collected in the meantime.
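
To make the first two bullets concrete, here is a minimal, library-independent sketch of hash-named staging and chunked reading. It is not `mathstream`'s actual code: `stage_literal`, `stream_digits`, and the SHA-256 naming scheme are hypothetical stand-ins, and this sketch hashes the literal's contents, whereas the library is described as hashing input paths for operation results.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterator


def stage_literal(digits: str, log_dir: Path) -> Path:
    """Write a digit string into log_dir under a content-derived name.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ASCII digits only")
    log_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(digits.encode("ascii")).hexdigest()[:16]
    path = log_dir / f"literal_{name}.txt"
    if not path.exists():  # identical literals reuse the same staged file
        path.write_text(digits, encoding="ascii")
    return path


def stream_digits(path: Path, chunk_size: int = 4096) -> Iterator[str]:
    """Yield the file's digits in order, at most chunk_size characters at a time."""
    with path.open("r", encoding="ascii") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


staging = Path(tempfile.mkdtemp())
first = stage_literal("999999999999999999", staging)
second = stage_literal("999999999999999999", staging)  # no second write happens
chunks = list(stream_digits(first, chunk_size=5))
```

Because the file name is derived from the input, staging the same literal twice resolves to one file, and the reader never holds more than `chunk_size` characters at once.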

## Performance Tips

- Reuse literal `StreamNumber` objects to avoid rewriting identical data.
- Call `free_stream(...)` or use context managers to drop staged results quickly.
- Run `collect_garbage(score_threshold)` to purge stale intermediates.
- Keep an eye on disk space in `instance/log`—streaming shifts the pressure from RAM to storage.
- For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing `literal=...`.
- Tweak `StreamNumber.stream(chunk_size)` to balance syscalls vs. memory: larger chunks mean fewer reads and less per-chunk overhead, while smaller chunks keep the memory footprint low.
- If you are scripting long sessions, snapshot `tracked_files()` periodically; it’s an easy indicator of leaked references.
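
As an illustration of the "generate huge literals directly on disk" tip, the sketch below writes a million-digit number in fixed-size chunks so the full digit string never exists in memory. `write_random_digits` is a hypothetical helper, not part of `mathstream`; only the final step of wrapping the path in `StreamNumber` comes from the library.

```python
import random
import tempfile
from pathlib import Path


def write_random_digits(path: Path, n_digits: int,
                        chunk_size: int = 65536, seed: int = 0) -> None:
    """Write an n_digits-long decimal number to path, one chunk at a time."""
    if n_digits < 1:
        raise ValueError("n_digits must be at least 1")
    rng = random.Random(seed)
    with path.open("w", encoding="ascii") as handle:
        handle.write(str(rng.randint(1, 9)))  # leading digit must not be zero
        remaining = n_digits - 1
        while remaining > 0:
            size = min(chunk_size, remaining)
            handle.write("".join(rng.choices("0123456789", k=size)))
            remaining -= size


big_path = Path(tempfile.mkdtemp()) / "big.txt"
write_random_digits(big_path, n_digits=1_000_000)
# The file can then be wrapped by path, e.g. StreamNumber(str(big_path)).
```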

### Common Pitfalls & Recoveries

- **Accidentally freed files** – Automatic finalisers may delete staged outputs while you still hold the path elsewhere. Fix: call `set_manual_free_only(True)` at the start of long-lived workflows, or pass `delete_file=False` to `free_stream` when you need to keep the digits around manually.
- **Operator coercion surprises** – Arithmetic operators turn `int`, `str`, or `Path` operands into streamed numbers. If a string happens to be a *file path* instead of a literal, the actual file will be wrapped. Fix: be explicit (`StreamNumber(literal="...")`) when in doubt.
- **Literal churn** – Recreating the same `StreamNumber(literal="123")` millions of times hammers the filesystem. Fix: stash the first instance, or cache the `.path` and rely on `StreamNumber(existing_path)` in hot loops.
- **GC too aggressive** – Running `collect_garbage(0)` after every operation removes recently written files. Fix: raise the threshold (e.g., `collect_garbage(1000)`) or run GC only after you’ve freed all references.
- **Malformed input files** – Some editors save files with BOMs or thousands separators such as commas. `_normalize_stream` will then raise `ValueError("Non-digit characters found...")`. Fix: sanitise input files so they contain only ASCII digits with an optional leading sign.
- **Disk exhaustion** – Terabyte-scale runs fill `instance/log`. Fix: relocate `engine.LOG_DIR` to a larger volume or run periodic `collect_garbage` sweeps and archive intermediate files.
- **Concurrency surprises** – Multiple processes writing to the same `LOG_DIR` share the sqlite tracker. Fix: ensure each writer calls `free_stream` and `collect_garbage` responsibly, or isolate runs by changing `LOG_DIR` per worker.
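
For the input-file pitfall above (BOMs, commas), one possible pre-processing step is a streaming clean-up pass before the file is handed to `StreamNumber`. This is a sketch under assumptions: `sanitise_digit_file` is a hypothetical helper that lives outside the library, and the set of separators it drops silently is a choice made for illustration.

```python
import tempfile
from pathlib import Path

_STRIP = set(", _\r\n\t")  # separators this sketch drops silently (an assumption)


def sanitise_digit_file(src: Path, dst: Path, chunk_size: int = 65536) -> None:
    """Copy src to dst, keeping only an optional leading sign and ASCII digits."""
    seen_any = False
    # "utf-8-sig" transparently skips a leading BOM if the editor wrote one.
    with src.open("r", encoding="utf-8-sig") as fin, \
            dst.open("w", encoding="ascii") as fout:
        while chunk := fin.read(chunk_size):
            kept = []
            for ch in chunk:
                if ch in _STRIP:
                    continue
                if ch in "+-" and not seen_any:
                    kept.append(ch)  # a sign is only valid as the first character
                elif ch.isascii() and ch.isdigit():
                    kept.append(ch)
                else:
                    raise ValueError(f"unexpected character {ch!r} in {src}")
                seen_any = True
            fout.write("".join(kept))


workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.txt"
clean = workdir / "clean.txt"
raw.write_bytes(b"\xef\xbb\xbf-1,234,567\n")  # BOM, sign, commas, newline
sanitise_digit_file(raw, clean)
cleaned = clean.read_text(encoding="ascii")
```

Anything that is neither a digit, a leading sign, nor a known separator still raises, so genuinely corrupt files are not masked.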

## Tools and Experiments

- `test.py` – Regression smoke test covering all arithmetic helpers.
- `collatz.py` / `collatz_ui/` – Curses dashboard that streams Collatz sequences.
- `seed_start.py` – Seeds `start.txt` via streamed additions from various sources.
- `find_my.py` + `pi_finder/` – Nilakantha-based π explorer that writes results to `found.pi`.
- `stream_repl.py` / `python -m mathstream` – Interactive REPL for streamed math (supports `save <var> <path>`, `:show`, `:purge`, `:cleanmode`, `:stats`, and `exit -s` to keep staging files).
- `WORK.md` – Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
- `collatz_ui/views.py` – Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
- `pi_finder/engine.py` – Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.

## Extending

You can:
- Implement custom storage backends (e.g., S3-backed digit files).
- Compose primitives to build new helpers (gcd, factorial, etc.).
- Point `engine.LOG_DIR` at your own staging directory before running operations.
- Add new operations by mirroring the pattern in `mathstream/engine.py`: normalise inputs with `_normalize_stream`, perform chunk-based math, then `_write_result(...)`.
- Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
- Layer parity or divisibility checks by reading the streamed digits lazily; there’s no requirement to materialise entire outputs unless you need them.
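
The last bullet can be sketched as a remainder check that consumes digit chunks lazily. `streamed_mod` is a hypothetical helper, not a library function; it assumes the chunks are strings of ASCII digits for a non-negative number, most significant digit first, which is what the `"".join(result.stream())` idiom in the Usage section suggests `.stream()` yields.

```python
from typing import Iterable


def streamed_mod(chunks: Iterable[str], modulus: int) -> int:
    """Reduce the number spelled by the digit chunks modulo a small int."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    remainder = 0
    for chunk in chunks:
        for ch in chunk:
            # Horner's rule: only the running remainder is kept in memory.
            remainder = (remainder * 10 + int(ch)) % modulus
    return remainder


demo_chunks = ["12345", "67890", "123"]  # stands in for result.stream()
divisible_by_three = streamed_mod(demo_chunks, 3) == 0
remainder_mod_seven = streamed_mod(demo_chunks, 7)
```

With a real `StreamNumber`, the same call would presumably look like `streamed_mod(result.stream(), 3)`, and the full output never needs to be joined into one string.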

## Contributing

Open to PRs for:
- New streamed math operations and optimizations.
- Smarter garbage collection / tooling around the sqlite tracker.
- Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).

Please lint with `ruff` and follow the existing streaming patterns.

## License

MIT. Use it, remix it, but keep backups—massive streamed math can chew through SSDs fast. 😅