mathy/mathstream/README.md
2025-11-05 11:16:15 +01:00


Mathstream Library

mathstream offers streamed, string-based arithmetic for very large integers that you may not want to load entirely into memory. Instead of parsing numbers into Python int values, you work with digit files on disk via StreamNumber and call math operations that operate chunk-by-chunk.

Quick Start

python -m venv venv
source venv/bin/activate
pip install -e .

Create digit files anywhere you like (the examples below use instance/log), or supply ad-hoc literals, then construct StreamNumber objects and call the helpers:

from mathstream import (
    StreamNumber,
    add,
    sub,
    mul,
    div,
    mod,
    pow,
    is_even,
    is_odd,
    free_stream,
    collect_garbage,
)

a = StreamNumber("instance/log/huge.txt")
b = StreamNumber(literal="34567")
e = StreamNumber(literal="3")

print("sum =", "".join(add(a, b).stream()))
print("difference =", "".join(sub(a, b).stream()))
print("product =", "".join(mul(a, b).stream()))
print("quotient =", "".join(div(a, b).stream()))
print("modulo =", "".join(mod(a, b).stream()))
print("power =", "".join(pow(a, e).stream()))
print("a is even?", is_even(a))
print("b is odd?", is_odd(b))

# drop staged artifacts immediately when you are done
free_stream(b)

# reclaim space for files whose age outweighs their use
collect_garbage(0.5)

Each arithmetic call writes its result back into instance/log (configurable via mathstream.number.LOG_DIR) so you can stream the digits later or reuse them in further operations.
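To picture what streaming a staged digit file involves, here is a minimal, self-contained sketch. It does not use mathstream itself: stream_digits, mod_small, and is_even_streamed are hypothetical stand-ins written for illustration, and the sketch assumes a non-negative number stored as plain ASCII digits with an optional trailing newline.

```python
import tempfile
from pathlib import Path


def stream_digits(path, chunk_size=4096):
    """Yield the file's digits as strings of at most chunk_size characters."""
    with open(path, "r", encoding="ascii") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            chunk = chunk.strip()  # drop a trailing newline, if any
            if chunk:
                yield chunk


def mod_small(chunks, m):
    """Remainder of a non-negative streamed number modulo a small int m.

    Uses Horner's rule, so only the running remainder is kept in memory.
    """
    rem = 0
    for chunk in chunks:
        for ch in chunk:
            rem = (rem * 10 + int(ch)) % m
    return rem


def is_even_streamed(chunks):
    """Parity depends only on the final digit of the stream."""
    last = None
    for chunk in chunks:
        last = chunk[-1]
    if last is None:
        raise ValueError("empty digit stream")
    return last in "02468"


# Demo: write a 30-digit number to a temporary file and stream it back.
digits = "123456789012345678901234567890"
demo_path = Path(tempfile.mkdtemp()) / "huge.txt"
demo_path.write_text(digits + "\n", encoding="ascii")

print(mod_small(stream_digits(demo_path, chunk_size=7), 97))
print(is_even_streamed(stream_digits(demo_path, chunk_size=7)))
```

Both helpers consume the generator once and hold a single chunk at a time, which is the property that makes the file-backed approach attractive for numbers too large to parse into an int.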

Core Concepts

  • StreamNumber(path | literal=...): Wraps a digit text file, or creates one inside LOG_DIR for an integer literal. Literal operands are persisted as literal_<hash>.txt, so repeated runs reuse the same staged file (note that clear_logs() removes these cache files too).
  • .stream(chunk_size): Yields strings of digits of the requested chunk size. Operations in mathstream.engine consume these streams to avoid loading the entire number at once.
  • Automatic staging: Outputs are stored under LOG_DIR with hashes based on input file paths, letting you compose operations without manual bookkeeping.
  • Sign-aware: Addition, subtraction, multiplication, division (// behavior), modulo, and exponentiation (non-negative exponents) all respect operand sign. Division and modulo follow Python's floor-division rules.
  • Utilities: clear_logs() wipes prior staged results so you can start fresh.
  • Manual freeing: Call stream.free() (or free_stream(stream)) once you are done with a staged number to release its reference immediately. Logger metadata keeps per-path reference counts, so the final free removes the backing file on the spot.
  • Parity helpers: is_even and is_odd inspect the streamed digits without materializing the integer.
  • Garbage collection: collect_garbage(score_threshold) computes a score from file age, access count, and reference count (tracked in instance/mathstream_logs.sqlite, freshly truncated each run). Files whose score meets or exceeds the threshold are deleted, letting you tune how aggressively to reclaim space. Both staged results and literal caches participate. Use tracked_files() or active_streams() to inspect current state.
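The threshold-based garbage collection described above can be illustrated with a small sketch. The scoring formula here is invented purely for illustration and is not mathstream's actual formula, which this README does not specify; TrackedFile, gc_score, select_for_deletion, and the half_life parameter are all hypothetical names. The only property carried over from the description is that the score grows with age, shrinks with use, and is compared against a threshold.

```python
from dataclasses import dataclass


@dataclass
class TrackedFile:
    path: str
    age_seconds: float
    access_count: int
    ref_count: int


def gc_score(f, half_life=3600.0):
    """HYPOTHETICAL score in [0, 1): older files score higher, used files lower."""
    age_term = f.age_seconds / (f.age_seconds + half_life)  # approaches 1 with age
    use_term = 1 + f.access_count + f.ref_count             # grows with use
    return age_term / use_term


def select_for_deletion(files, threshold):
    """Return paths whose score meets or exceeds the threshold."""
    return [f.path for f in files if gc_score(f) >= threshold]


files = [
    TrackedFile("old_unused.txt", age_seconds=36000, access_count=0, ref_count=0),
    TrackedFile("old_but_used.txt", age_seconds=36000, access_count=3, ref_count=0),
    TrackedFile("fresh.txt", age_seconds=60, access_count=0, ref_count=0),
    TrackedFile("still_referenced.txt", age_seconds=1e6, access_count=0, ref_count=2),
]

print(select_for_deletion(files, threshold=0.5))
```

Under this made-up formula, lowering the threshold toward 0 reclaims more files and raising it toward 1 keeps nearly everything, which mirrors the tuning knob that collect_garbage(score_threshold) exposes.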

Divide-by-zero scenarios raise the custom DivideByZeroError so callers can distinguish mathstream issues from Python's native exceptions.
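To make the floor-division convention concrete, the sketch below derives a floored quotient and remainder from a division of absolute values, then checks the result against Python's built-in divmod. It works on ordinary int values rather than streams. floor_divmod is a hypothetical helper, and the DivideByZeroError class defined here is only a local stand-in: the base class of the real mathstream exception is an assumption.

```python
class DivideByZeroError(ArithmeticError):
    """Local stand-in; the real exception's base class is an assumption."""


def floor_divmod(a, b):
    """Floored (quotient, remainder) built from a division of magnitudes."""
    if b == 0:
        raise DivideByZeroError("division by zero")
    q, r = abs(a) // abs(b), abs(a) % abs(b)
    if (a < 0) != (b < 0):
        # Signs differ: the quotient is negative and rounds toward -infinity.
        q = -q
        if r != 0:
            q -= 1
            r = abs(b) - r
    if b < 0:
        # The remainder takes the sign of the divisor.
        r = -r
    return q, r


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, floor_divmod(a, b), divmod(a, b))
```

The invariant to keep in mind is a == b * q + r with r sharing the sign of b, so -7 divided by 2 gives a quotient of -4 and a remainder of 1 rather than -3 and -1.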

Performance Tips

  • Reuse literal streams: StreamNumber(literal=...) persists a hashed copy under LOG_DIR. Reuse those objects (or their filenames) across operations instead of recreating them every call. Repeated literal construction churns the filesystem: you pay the cost to rewrite identical data, poll the logger database, and spike disk I/O. Hang on to the staged literal or memoize it so it can be streamed repeatedly without rewriting.
  • Free aggressively: When a staged result or literal copy is no longer needed, call free_stream() (or use with StreamNumber(...) as n:) so the reference count drops immediately. This keeps the cache tidy and reduces the chance that stale literal files pile up between runs.
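The memoization advice can be sketched without mathstream. In the example below, stage_literal and cached_literal are hypothetical helpers, the temporary LOG_DIR is a stand-in for the library's directory, and the truncated SHA-256 file name is an assumption (the README only says the file is named literal_<hash>.txt). The point is simply that a memoized factory touches the disk once, however many times the same literal is requested.

```python
import hashlib
import tempfile
from functools import lru_cache
from pathlib import Path

LOG_DIR = Path(tempfile.mkdtemp())  # stand-in for the library's LOG_DIR
writes = 0  # counts how often a file is actually written


def stage_literal(digits):
    """Write digits to literal_<hash>.txt unless that file already exists."""
    global writes
    digest = hashlib.sha256(digits.encode("ascii")).hexdigest()[:16]  # assumed scheme
    path = LOG_DIR / f"literal_{digest}.txt"
    if not path.exists():
        path.write_text(digits, encoding="ascii")
        writes += 1
    return path


@lru_cache(maxsize=None)
def cached_literal(digits):
    """Memoized wrapper: repeat calls skip hashing and filesystem checks."""
    return stage_literal(digits)


for _ in range(1000):
    cached_literal("34567")

print("files written:", writes)
print("cache:", cached_literal.cache_info())
```

With real StreamNumber objects the same idea would presumably apply by caching the constructed object in a dict or behind lru_cache, then calling free_stream() once the cached entry is no longer needed.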

Example Script

test.py in the repository root demonstrates a minimal workflow:

  1. Writes sample operands to tests/*.txt.
  2. Calls every arithmetic primitive plus the modulo/parity helpers.
  3. Asserts that the streamed outputs match known values (helpful for quick regression checks).

Run it via:

python test.py

Extending

  • To hook into other storage backends, implement your own StreamNumber variant with the same .stream() interface.
  • Need gcd or another operation that is not built in? Compose the existing primitives (e.g., Euclid's algorithm built on mod, or remainder tracking inside _divide_abs) or add new helpers following the same streamed pattern.
  • For more control over output locations, override LOG_DIR before using the operations:
from mathstream import engine
from pathlib import Path

engine.LOG_DIR = Path("/tmp/my_mathstage")
engine.clear_logs()
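As an illustration of composing primitives into a new helper, here is a sketch of Euclid's algorithm written against two injected callables, so it does not depend on how the operands are stored. The gcd function and the int-based stand-ins int_mod and int_is_zero are hypothetical and assume non-negative operands; they convert to Python int only to keep the example self-contained, which a real streamed helper would avoid.

```python
def gcd(a, b, mod, is_zero):
    """Euclid's algorithm expressed only through `mod` and a zero test."""
    while not is_zero(b):
        a, b = b, mod(a, b)
    return a


# Stand-ins operating on decimal strings (NOT streamed; illustration only).
def int_mod(x, y):
    return str(int(x) % int(y))


def int_is_zero(x):
    return int(x) == 0


print(gcd("1071", "462", int_mod, int_is_zero))
```

With mathstream one would presumably pass the library's mod and a zero check that inspects the streamed digits. Because every iteration stages a new intermediate result, freeing each superseded operand inside the loop would keep LOG_DIR from filling up.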

With these building blocks, you can manipulate arbitrarily large integers while keeping memory usage constant. Happy streaming!