
Mathstream

Scalable streamed arithmetic for ultra-large integers, chunked from disk.

Do math on numbers too big for RAM by streaming digits off disk. Feed the library paths or literals, compose operations, and keep memory flat while outputs land in instance/log.

Why?

Traditional int types break when your numbers don't fit in memory. mathstream trades RAM for disk by using streamed digit files, letting you work with absurdly large integers on normal machines.

Perfect for:

  • Iterative transformations like Collatz walks
  • Memory-constrained pipelines
  • Low-level big-number experimentation
  • Long-running experiments where deterministic cleanup beats GC guesswork

Quick Demo

from mathstream import StreamNumber, add

a = StreamNumber(literal="999999999999999999")
b = StreamNumber(literal="1")

print("sum =", "".join(add(a, b).stream()))

Installation

python -m venv venv
source venv/bin/activate
pip install -e .

Usage

from mathstream import mul, StreamNumber

a = StreamNumber("path/to/big.txt")
b = StreamNumber(literal="1337")

result = mul(a, b)
print("".join(result.stream()))

# same helpers are available via Python operators
total = a + b          # calls mathstream.add under the hood
ratio = total / 2      # literal coercion is automatic
with StreamNumber(literal="10") as temp:
    product = temp * ratio

Available operations:

  • Core arithmetic: add, sub, mul, div, mod, pow
  • Introspection & helpers: is_even, is_odd, free_stream, active_streams, tracked_files
  • Lifecycle control: collect_garbage, set_manual_free_only, manual_free_only_enabled
  • Environment helpers: clear_logs, StreamNumber.write_stage, engine.LOG_DIR
  • Python sugar: StreamNumber implements +, -, *, /, %, **, and their reflected counterparts. Raw int, str, or pathlib.Path operands are coerced automatically.
  • Context manager support: with StreamNumber(...) as sn: ensures .free() is called at exit.
  • Module entry point: python -m mathstream launches the interactive streamed-math REPL from stream_repl.py.
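
To make the lifecycle bullets concrete, here is a minimal, standard-library-only sketch of how a handle can combine a context manager, a weakref finaliser, and a manual-only switch. It is not mathstream's implementation: StagedHandle, set_manual_only, and _finalise are hypothetical names invented for this illustration.

```python
import gc
import os
import tempfile
import weakref

_MANUAL_ONLY = False  # hypothetical module-level switch


def set_manual_only(flag):
    """Hypothetical stand-in for a manual-only toggle."""
    global _MANUAL_ONLY
    _MANUAL_ONLY = bool(flag)


def _finalise(path):
    # Called when a handle is garbage collected; honours the manual-only switch.
    if not _MANUAL_ONLY and os.path.exists(path):
        os.unlink(path)


class StagedHandle:
    """Hypothetical handle that owns one staged digit file."""

    def __init__(self, digits, directory):
        fd, self.path = tempfile.mkstemp(suffix=".txt", dir=directory)
        with os.fdopen(fd, "w", encoding="ascii") as fh:
            fh.write(digits)
        # The callback receives the path, not self, so it cannot keep the handle alive.
        self._finaliser = weakref.finalize(self, _finalise, self.path)

    def free(self):
        # Explicit release: disarm the finaliser and delete regardless of mode.
        self._finaliser.detach()
        if os.path.exists(self.path):
            os.unlink(self.path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()
        return False


with tempfile.TemporaryDirectory() as staging:
    with StagedHandle("12345", staging) as handle:
        staged_path = handle.path
        print(os.path.exists(staged_path))  # the file exists inside the block
    print(os.path.exists(staged_path))      # the file has been removed on exit

    leaked = StagedHandle("678", staging)
    leaked_path = leaked.path
    del leaked      # CPython runs the finaliser as soon as the last reference goes
    gc.collect()    # extra nudge for interpreters without reference counting
    print(os.path.exists(leaked_path))
```

The point of the sketch is the ordering: an explicit free() always wins, the finaliser is only a safety net, and the manual-only switch disables just the safety net.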

How It Works

  • Streamed operands: StreamNumber wraps either a user-supplied digit file or an integer literal materialised in LOG_DIR. Data is read linearly in configurable chunks and is never promoted to a Python int.
  • Staging directory: Every operation writes its result into mathstream.number.LOG_DIR (default instance/log). File names include hashes of the input paths, so repeated calls reuse the same staged copies.
  • Bookkeeping database: mathstream/utils.py keeps instance/mathstream_logs.sqlite, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it trivial to audit what's on disk.
  • Reference counting: Every StreamNumber bumps a ref count in sqlite and in the in-process counters. Dropping the last reference (or calling free_stream) decrements the counts and optionally unlinks the file immediately.
  • Manual-only mode: Call set_manual_free_only(True) when you want absolute control over lifecycle. The weakref finaliser stops deleting staged files, so outputs persist until you call .free() or collect_garbage().
  • Zero-copy chaining: Since staged files stay on disk, you can pass StreamNumber handles between processes or reuse them in later runs without recomputing.
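
The staging and bookkeeping ideas above can be sketched with the standard library alone. The block below is an illustration under assumptions, not the project's actual code: the function staged_name, the class Tracker, the table name, and the column names are all hypothetical, and the real schema is described in WORK.md.

```python
import hashlib
import sqlite3
import time


def staged_name(operation, *input_paths):
    """Derive a deterministic file name from an operation and its input paths."""
    digest = hashlib.sha256()
    digest.update(operation.encode("utf-8"))
    for path in input_paths:
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
        digest.update(str(path).encode("utf-8"))
    return f"{operation}_{digest.hexdigest()[:16]}.txt"


class Tracker:
    """Hypothetical tracker holding one row per staged file."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS staged ("
            " path TEXT PRIMARY KEY,"
            " created REAL NOT NULL,"
            " last_access REAL NOT NULL,"
            " access_count INTEGER NOT NULL,"
            " ref_count INTEGER NOT NULL)"
        )

    def acquire(self, path):
        """Register one more reference to path, creating the row if needed."""
        now = time.time()
        cur = self.db.execute(
            "UPDATE staged SET last_access = ?,"
            " access_count = access_count + 1,"
            " ref_count = ref_count + 1 WHERE path = ?",
            (now, path),
        )
        if cur.rowcount == 0:
            self.db.execute(
                "INSERT INTO staged VALUES (?, ?, ?, 1, 1)", (path, now, now)
            )
        self.db.commit()

    def release(self, path):
        """Drop one reference; return True once nothing references path."""
        self.db.execute(
            "UPDATE staged SET ref_count = MAX(ref_count - 1, 0) WHERE path = ?",
            (path,),
        )
        self.db.commit()
        row = self.db.execute(
            "SELECT ref_count FROM staged WHERE path = ?", (path,)
        ).fetchone()
        return row is None or row[0] == 0


name = staged_name("add", "a.txt", "b.txt")
tracker = Tracker()
tracker.acquire(name)
tracker.acquire(name)
print(name == staged_name("add", "a.txt", "b.txt"))  # same inputs, same name
print(tracker.release(name), tracker.release(name))  # only the last release frees
```

Because the name depends only on the operation and its inputs, a repeated call can find the existing staged file instead of recomputing it, and the reference count tells a cleanup pass when the file is safe to unlink.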

Performance Tips

  • Reuse literal StreamNumber objects to avoid rewriting identical data.
  • Call free_stream(...) or use context managers to drop staged results quickly.
  • Run collect_garbage(score_threshold) to purge stale intermediates.
  • Keep an eye on disk space in instance/log—streaming shifts the pressure from RAM to storage.
  • For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing literal=....
  • Tweak StreamNumber.stream(chunk_size) to balance syscalls vs. memory: large chunks speed up CPU-bound math, smaller chunks play nicer with slow disks.
  • If you are scripting long sessions, snapshot tracked_files() periodically; it's an easy indicator of leaked references.
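
As a rough illustration of the "generate on disk" and chunk-size tips, the sketch below writes a million-digit number without ever building it in memory and then reads it back in fixed-size chunks. Both helpers (write_repeated_digits and stream_digits) are hypothetical names written for this example and use only the standard library; the resulting path is the kind of file you would then wrap with StreamNumber("path/to/big.txt").

```python
import os
import tempfile


def write_repeated_digits(path, pattern, total_digits, block_size=1 << 16):
    """Write total_digits digits to path by repeating pattern block by block."""
    if not (pattern.isascii() and pattern.isdigit()):
        raise ValueError("pattern must contain ASCII digits only")
    # Each block is a whole number of pattern repeats, so blocks join seamlessly.
    block = pattern * max(1, block_size // len(pattern))
    written = 0
    with open(path, "w", encoding="ascii") as fh:
        while written < total_digits:
            piece = block[: total_digits - written]
            fh.write(piece)
            written += len(piece)


def stream_digits(path, chunk_size=1 << 16):
    """Yield the file's digits as strings of at most chunk_size characters."""
    with open(path, "r", encoding="ascii") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return
            yield chunk


with tempfile.TemporaryDirectory() as workdir:
    big = os.path.join(workdir, "big.txt")
    write_repeated_digits(big, "1234567890", 1_000_000)
    # Only one chunk is resident at a time while counting.
    total = sum(len(chunk) for chunk in stream_digits(big, chunk_size=4096))
    print(total)
```

Peak memory here is bounded by block_size and chunk_size rather than by the size of the number, which is the property the tips above rely on.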

Common Pitfalls & Recoveries

  • Accidentally freed files: Automatic finalizers may delete staged outputs while you still hold the path elsewhere. Fix: call set_manual_free_only(True) at the start of long-lived workflows, or pass delete_file=False to free_stream when you need to keep the digits around manually.
  • Operator coercion surprises: Arithmetic operators turn int, str, or Path operands into streamed numbers. If a string happens to be a file path instead of a literal, the actual file will be wrapped. Fix: be explicit (StreamNumber(literal="...")) when in doubt.
  • Literal churn: Recreating the same StreamNumber(literal="123") millions of times hammers the filesystem. Fix: stash the first instance, or cache the .path and rely on StreamNumber(existing_path) in hot loops.
  • GC too aggressive: Running collect_garbage(0) after every operation removes recently written files. Fix: raise the threshold (e.g., collect_garbage(1000)) or run GC only after you've freed all references.
  • Chunk mismatch: Some editors save files with BOMs or commas. _normalize_stream will raise ValueError("Non-digit characters found..."). Fix: sanitise input files (only ASCII digits with an optional leading sign).
  • Disk exhaustion: Terabyte-scale runs fill instance/log. Fix: relocate engine.LOG_DIR to a larger volume, or run periodic collect_garbage sweeps and archive intermediate files.
  • Concurrency surprises: Multiple processes writing to the same LOG_DIR share the sqlite tracker. Fix: ensure each writer calls free_stream and collect_garbage responsibly, or isolate runs by changing LOG_DIR per worker.
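
For the BOM-and-commas pitfall, a pre-processing pass along the following lines may help. This is a sketch under assumptions rather than part of mathstream: sanitise_digit_file is a hypothetical helper, and the set of characters it silently drops (commas, underscores, spaces, tabs, newlines) is a choice made for this example.

```python
import os
import tempfile

SEPARATORS = set(", _\t\r\n")  # characters dropped silently (an assumption)


def sanitise_digit_file(src, dst, chunk_size=1 << 16):
    """Stream src into dst, keeping an optional leading sign and ASCII digits.

    Returns the number of digits written. Raises ValueError, and removes dst,
    if any other character is present or if no digits are found.
    """
    digits_written = 0
    at_start = True
    try:
        # "utf-8-sig" transparently strips a leading byte order mark.
        with open(src, "r", encoding="utf-8-sig") as fin, \
                open(dst, "w", encoding="ascii") as fout:
            for chunk in iter(lambda: fin.read(chunk_size), ""):
                cleaned = "".join(ch for ch in chunk if ch not in SEPARATORS)
                if not cleaned:
                    continue
                if at_start:
                    at_start = False
                    if cleaned[0] in "+-":
                        fout.write(cleaned[0])
                        cleaned = cleaned[1:]
                if cleaned and not (cleaned.isascii() and cleaned.isdigit()):
                    raise ValueError(f"Non-digit characters found in {src}")
                fout.write(cleaned)
                digits_written += len(cleaned)
        if digits_written == 0:
            raise ValueError(f"No digits found in {src}")
    except ValueError:
        # Do not leave a half-written output behind.
        if os.path.exists(dst):
            os.unlink(dst)
        raise
    return digits_written


with tempfile.TemporaryDirectory() as workdir:
    raw = os.path.join(workdir, "raw.txt")
    clean = os.path.join(workdir, "clean.txt")
    with open(raw, "w", encoding="utf-8-sig") as fh:  # writes a BOM first
        fh.write("-1,234,567\n")
    count = sanitise_digit_file(raw, clean, chunk_size=3)
    with open(clean, encoding="ascii") as fh:
        print(count, fh.read())
```

The cleaned file can then be wrapped by path. Like the library, the helper works chunk by chunk, so it stays usable on inputs that do not fit in memory.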

Tools and Experiments

  • test.py: Regression smoke test covering all arithmetic helpers.
  • collatz.py / collatz_ui/: Curses dashboard that streams Collatz sequences.
  • seed_start.py: Seeds start.txt via streamed additions from various sources.
  • find_my.py + pi_finder/: Nilakantha-based π explorer that writes results to found.pi.
  • stream_repl.py / python -m mathstream: Interactive REPL for streamed math (supports save <var> <path>, :show, :purge, :cleanmode, :stats, and exit -s to keep staging files).
  • WORK.md: Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
  • collatz_ui/views.py: Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
  • pi_finder/engine.py: Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.

Extending

You can:

  • Implement custom storage backends (e.g., S3-backed digit files).
  • Compose primitives to build new helpers (gcd, factorial, etc.).
  • Point engine.LOG_DIR at your own staging directory before running operations.
  • Add new operations by mirroring the pattern in mathstream/engine.py: normalise inputs with _normalize_stream, perform chunk-based math, then _write_result(...).
  • Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
  • Layer parity or divisibility checks by reading the streamed digits lazily; there's no requirement to materialise entire outputs unless you need them.
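
As one example of a lazy check, the remainder of a huge number modulo a small integer can be computed in a single left-to-right pass using Horner's rule, and parity needs only the final digit. The sketch below works on a plain digit file with the standard library; stream_mod and is_even_from_tail are hypothetical helpers, not part of mathstream, and they assume the file holds only ASCII digits (no sign, no trailing newline).

```python
import os
import tempfile


def stream_mod(path, modulus, chunk_size=1 << 16):
    """Return the decimal number stored in path modulo modulus, chunk by chunk."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    remainder = 0
    with open(path, "r", encoding="ascii") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), ""):
            if not chunk.isdigit():
                raise ValueError(f"Non-digit characters found in {path}")
            # Convert 18 digits at a time so int() never sees a huge string
            # (recent Python versions cap the length int() accepts by default).
            for start in range(0, len(chunk), 18):
                piece = chunk[start:start + 18]
                remainder = (remainder * 10 ** len(piece) + int(piece)) % modulus
    return remainder


def is_even_from_tail(path):
    """Decide parity from the last digit only, without reading the whole file."""
    with open(path, "rb") as fh:
        fh.seek(-1, os.SEEK_END)
        return int(fh.read(1)) % 2 == 0


with tempfile.TemporaryDirectory() as workdir:
    number_path = os.path.join(workdir, "n.txt")
    digits = "123456789" * 50
    with open(number_path, "w", encoding="ascii") as fh:
        fh.write(digits)
    # The number is small enough here to cross-check against a Python int.
    print(stream_mod(number_path, 97, chunk_size=7) == int(digits) % 97)
    print(is_even_from_tail(number_path))
```

The running remainder never exceeds the modulus, so memory use is independent of the number's length. The same single-pass shape extends to other digit-wise checks, such as digit sums.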

Contributing

Open to PRs for:

  • New streamed math operations and optimizations.
  • Smarter garbage collection / tooling around the sqlite tracker.
  • Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).

Please lint with ruff and follow the existing streaming patterns.

License

MIT. Use it, remix it, but keep backups—massive streamed math can chew through SSDs fast. 😅