Mathstream
Scalable streamed arithmetic for ultra-large integers, chunked from disk.
Do math on numbers too big for RAM by streaming digits off disk. Feed the library paths or literals, compose operations, and keep memory flat while outputs land in instance/log.
Why?
In-memory integer types, including Python's arbitrary-precision int, stop working once a number no longer fits in RAM. mathstream trades RAM for disk by streaming digits from files, so you can work with absurdly large integers on ordinary machines.
Perfect for:
- Iterative transformations like Collatz walks
- Memory-constrained pipelines
- Low-level big-number experimentation
- Long-running experiments where deterministic cleanup beats GC guesswork
Quick Demo
from mathstream import StreamNumber, add
a = StreamNumber(literal="999999999999999999")
b = StreamNumber(literal="1")
print("sum =", "".join(add(a, b).stream()))
Installation
python -m venv venv
source venv/bin/activate
pip install -e .
Usage
from mathstream import mul, StreamNumber
a = StreamNumber("path/to/big.txt")
b = StreamNumber(literal="1337")
result = mul(a, b)
print("".join(result.stream()))
# same helpers are available via Python operators
total = a + b # calls mathstream.add under the hood
ratio = total / 2 # literal coercion is automatic
with StreamNumber(literal="10") as temp:
    product = temp * ratio
Available operations:
- Core arithmetic: add, sub, mul, div, mod, pow
- Introspection & helpers: is_even, is_odd, free_stream, active_streams, tracked_files
- Lifecycle control: collect_garbage, set_manual_free_only, manual_free_only_enabled
- Environment helpers: clear_logs, StreamNumber.write_stage, engine.LOG_DIR
- Python sugar: StreamNumber implements +, -, *, /, %, **, and their reflected counterparts. Raw int, str, or pathlib.Path operands are coerced automatically.
- Context manager support: with StreamNumber(...) as sn: ensures .free() is called at exit.
- Module entry point: python -m mathstream launches the interactive streamed-math REPL from stream_repl.py.
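To make the operator sugar and context-manager behaviour concrete, here is a toy sketch of how reflected operators, operand coercion, and free-on-exit can fit together. ToyNumber is a hypothetical stand-in, not mathstream's StreamNumber: it keeps digits in memory, uses plain int arithmetic, and treats every str as a literal, purely to show the dispatch pattern.

```python
from pathlib import Path


class ToyNumber:
    """Hypothetical stand-in for StreamNumber; holds digits in memory for brevity."""

    def __init__(self, digits):
        self.digits = str(digits)
        self.freed = False

    @classmethod
    def coerce(cls, value):
        # Assumed rule for this toy: instances pass through, Path contents
        # are read as digits, and int/str values are wrapped as literals.
        if isinstance(value, cls):
            return value
        if isinstance(value, Path):
            return cls(value.read_text(encoding="ascii").strip())
        if isinstance(value, (int, str)):
            return cls(value)
        return NotImplemented

    def __add__(self, other):
        other = self.coerce(other)
        if other is NotImplemented:
            return NotImplemented
        # A streamed engine would add chunk by chunk; int() keeps the toy short.
        return ToyNumber(int(self.digits) + int(other.digits))

    # Addition commutes, so the reflected form (2 + number) can share the code.
    __radd__ = __add__

    def free(self):
        # Stands in for releasing the staged file behind the number.
        self.freed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()
        return False  # never swallow exceptions
```

With this in place, both ToyNumber("5") + 2 and 2 + ToyNumber("5") resolve to the same code path, and leaving a with block marks the object as freed.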
How It Works
- Streamed operands – StreamNumber wraps either a user-supplied digit file or an integer literal materialised in LOG_DIR. Data is read linearly in configurable chunks, never promoted to a Python int.
- Staging directory – Every operation writes results into mathstream.number.LOG_DIR (default instance/log). File names include hashes of input paths so repeated calls reuse the same staged copies.
- Bookkeeping database – mathstream/utils.py keeps instance/mathstream_logs.sqlite, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it trivial to audit what’s on disk.
- Reference counting – Every StreamNumber bumps a ref count in sqlite and in-process counters. Dropping the last reference (or calling free_stream) decrements counts and optionally unlinks the file immediately.
- Manual-only mode – Call set_manual_free_only(True) when you want absolute control over lifecycle. The weakref finaliser stops deleting staged files, so outputs persist until you call .free() or collect_garbage().
- Zero-copy chaining – Since staged files stay on disk, you can pass StreamNumber handles between processes or reuse them in later runs without recomputing.
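The chunked idea behind streamed operands can be sketched with the standard library alone. The example below adds two digit files chunk by chunk, carrying between chunks, so neither operand is ever turned into one big int. It is an illustrative sketch, not the code in mathstream/engine.py: the function names are hypothetical, and it assumes non-negative numbers stored as plain ASCII digits with no sign and no trailing newline.

```python
import os


def read_chunks_from_end(path, chunk_size):
    """Yield digit chunks from least to most significant."""
    end = os.path.getsize(path)
    with open(path, "rb") as fh:
        while end > 0:
            start = max(0, end - chunk_size)
            fh.seek(start)
            yield fh.read(end - start).decode("ascii")
            end = start


def streamed_add(path_a, path_b, out_path, chunk_size=4):
    """Add two non-negative decimal digit files and write the sum to out_path."""
    base = 10 ** chunk_size
    chunks_a = read_chunks_from_end(path_a, chunk_size)
    chunks_b = read_chunks_from_end(path_b, chunk_size)
    carry = 0
    # Least-significant chunk first. Kept in a list for brevity; an engine that
    # wants flat memory would spill these pieces to disk instead.
    pieces = []
    while True:
        a = next(chunks_a, None)
        b = next(chunks_b, None)
        if a is None and b is None:
            break
        # Only one chunk per operand is converted to int at a time.
        total = int(a or "0") + int(b or "0") + carry
        carry, rest = divmod(total, base)
        pieces.append(str(rest).zfill(chunk_size))
    if carry:
        pieces.append(str(carry))
    digits = "".join(reversed(pieces)).lstrip("0") or "0"
    with open(out_path, "w", encoding="ascii") as fh:
        fh.write(digits)
    return out_path
```

Both files are read from the end with the same chunk size, so the chunks stay aligned by place value. A tiny chunk_size such as 4 makes the carry logic easy to trace by hand; larger values reduce the number of reads.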
Performance Tips
- Reuse literal StreamNumber objects to avoid rewriting identical data.
- Call free_stream(...) or use context managers to drop staged results quickly.
- Run collect_garbage(score_threshold) to purge stale intermediates.
- Keep an eye on disk space in instance/log: streaming shifts the pressure from RAM to storage.
- For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing literal=....
- Tweak StreamNumber.stream(chunk_size) to balance syscalls vs. memory: large chunks speed up CPU-bound math, smaller chunks play nicer with slow disks.
- If you are scripting long sessions, snapshot tracked_files() periodically; it’s an easy indicator of leaked references.
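As an illustration of the "generate huge literals directly on disk" tip, the sketch below writes the digits of a power of ten to a file in fixed-size pieces, so the full digit string never exists in memory. write_power_of_ten is a hypothetical helper written for this example, not part of mathstream.

```python
def write_power_of_ten(path, exponent, chunk_size=1 << 16):
    """Write the decimal digits of 10**exponent to path in bounded pieces."""
    zeros = "0" * chunk_size  # reused buffer; the largest string held in memory
    with open(path, "w", encoding="ascii") as fh:
        fh.write("1")
        remaining = exponent
        while remaining > 0:
            n = min(remaining, chunk_size)
            fh.write(zeros[:n])
            remaining -= n
    return path
```

The resulting path can then be wrapped the same way the Usage section wraps path/to/big.txt, rather than passing a million-character string through literal=.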
Common Pitfalls & Recoveries
- Accidentally freed files – Automatic finalizers may delete staged outputs while you still hold the path elsewhere. Fix: call set_manual_free_only(True) at the start of long-lived workflows, or pass delete_file=False to free_stream when you need to keep the digits around manually.
- Operator coercion surprises – Arithmetic operators turn int, str, or Path operands into streamed numbers. If a string happens to be a file path instead of a literal, the actual file will be wrapped. Fix: be explicit (StreamNumber(literal="...")) when in doubt.
- Literal churn – Recreating the same StreamNumber(literal="123") millions of times hammers the filesystem. Fix: stash the first instance, or cache the .path and rely on StreamNumber(existing_path) in hot loops.
- GC too aggressive – Running collect_garbage(0) after every operation removes recently written files. Fix: raise the threshold (e.g., collect_garbage(1000)) or run GC only after you’ve freed all references.
- Chunk mismatch – Some editors save files with BOMs or commas. _normalize_stream will raise ValueError("Non-digit characters found..."). Fix: sanitise input files (only ASCII digits with optional leading sign).
- Disk exhaustion – Terabyte-scale runs fill instance/log. Fix: relocate engine.LOG_DIR to a larger volume or run periodic collect_garbage sweeps and archive intermediate files.
- Concurrency surprises – Multiple processes writing to the same LOG_DIR share the sqlite tracker. Ensure each writer calls free_stream and collect_garbage responsibly, or isolate runs by changing LOG_DIR per worker.
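For the "sanitise input files" fix, a pre-flight check can catch BOMs, commas, and other stray bytes before a file reaches the library. The sketch below is a standalone, hypothetical helper (check_digit_file is not a mathstream function) that enforces the rule stated above: ASCII digits with an optional leading sign. It is deliberately strict and also rejects a trailing newline; whether mathstream itself tolerates one is not something this example assumes.

```python
def check_digit_file(path, chunk_size=65536):
    """Raise ValueError unless the file is an optional sign followed by ASCII digits."""
    seen_digit = False
    first = True
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            if first:
                first = False
                # A single leading '+' or '-' is allowed.
                if chunk[:1] in (b"+", b"-"):
                    chunk = chunk[1:]
            # bytes.isdigit() is True only when every byte is an ASCII digit,
            # so BOM bytes, commas, spaces, and newlines all fail here.
            if chunk and not chunk.isdigit():
                raise ValueError(f"{path}: expected only ASCII digits")
            seen_digit = seen_digit or bool(chunk)
    if not seen_digit:
        raise ValueError(f"{path}: no digits found")
```

Because the file is scanned in chunks, the check itself stays within the same flat-memory budget as the streamed math it protects.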
Tools and Experiments
- test.py – Regression smoke test covering all arithmetic helpers.
- collatz.py / collatz_ui/ – Curses dashboard that streams Collatz sequences.
- seed_start.py – Seeds start.txt via streamed additions from various sources.
- find_my.py + pi_finder/ – Nilakantha-based π explorer that writes results to found.pi.
- stream_repl.py / python -m mathstream – Interactive REPL for streamed math (supports save <var> <path>, :show, :purge, :cleanmode, :stats, and exit -s to keep staging files).
- WORK.md – Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
- collatz_ui/views.py – Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
- pi_finder/engine.py – Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.
Extending
You can:
- Implement custom storage backends (e.g., S3-backed digit files).
- Compose primitives to build new helpers (gcd, factorial, etc.).
- Point engine.LOG_DIR at your own staging directory before running operations.
- Add new operations by mirroring the pattern in mathstream/engine.py: normalise inputs with _normalize_stream, perform chunk-based math, then _write_result(...).
- Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
- Layer parity or divisibility checks by reading the streamed digits lazily; there’s no requirement to materialise entire outputs unless you need them.
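Lazy parity and divisibility checks, as described in the last bullet, only need a single pass over the digit chunks. The sketch below works on any iterable of digit strings. That matches what the Quick Demo's "".join(result.stream()) suggests stream() yields, but treat that as an assumption. stream_digits, mod_small, and is_even_stream are hypothetical names for this example and assume unsigned digits.

```python
def stream_digits(path, chunk_size=65536):
    """Yield the digits of a file as string chunks, most significant first."""
    with open(path, "r", encoding="ascii") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return
            yield chunk


def mod_small(chunks, modulus):
    """Remainder of a streamed decimal number modulo a small int (Horner's rule)."""
    remainder = 0
    for chunk in chunks:
        # Shift the running remainder left by len(chunk) digits, then add the chunk.
        shift = pow(10, len(chunk), modulus)
        remainder = (remainder * shift + int(chunk)) % modulus
    return remainder


def is_even_stream(chunks):
    """Parity depends only on the final digit, so just remember the last one seen."""
    last = ""
    for chunk in chunks:
        if chunk:
            last = chunk[-1]
    if not last:
        raise ValueError("empty digit stream")
    return last in "02468"
```

Only one chunk and one small remainder are alive at any time, so a divisibility test on a terabyte-scale file costs a sequential read and nothing more.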
Contributing
Open to PRs for:
- New streamed math operations and optimizations.
- Smarter garbage collection / tooling around the sqlite tracker.
- Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).
Please lint with ruff and follow the existing streaming patterns.
License
MIT. Use it, remix it, but keep backups—massive streamed math can chew through SSDs fast. 😅