mathy/mathstream/README.md
2025-11-05 11:16:15 +01:00

99 lines
5.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Mathstream Library
`mathstream` offers streamed, string-based arithmetic for very large integers that you may not want to load entirely into memory. Instead of parsing numbers into Python `int` values, you work with digit files on disk via `StreamNumber` and call math operations that operate chunk-by-chunk.
## Quick Start
```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```
Create digit files anywhere you like (the examples below use `instance/log`), or supply ad-hoc literals, then construct `StreamNumber` objects and call the helpers:
```python
from mathstream import (
StreamNumber,
add,
sub,
mul,
div,
mod,
pow,
is_even,
is_odd,
free_stream,
collect_garbage,
)
a = StreamNumber("instance/log/huge.txt")
b = StreamNumber(literal="34567")
e = StreamNumber(literal="3")
print("sum =", "".join(add(a, b).stream()))
print("difference =", "".join(sub(a, b).stream()))
print("product =", "".join(mul(a, b).stream()))
print("quotient =", "".join(div(a, b).stream()))
print("modulo =", "".join(mod(a, b).stream()))
print("power =", "".join(pow(a, e).stream()))
print("a is even?", is_even(a))
print("b is odd?", is_odd(b))
# drop staged artifacts immediately when you are done
free_stream(b)
# reclaim space for files whose age outweighs their use
collect_garbage(0.5)
```
Each arithmetic call writes its result back into `instance/log` (configurable via `mathstream.number.LOG_DIR`) so you can stream the digits later or reuse them in further operations.
## Core Concepts
- **StreamNumber(path | literal=...)** Wraps a digit text file or creates one for an integer literal inside `LOG_DIR`. Literal operands are persisted as `literal_<hash>.txt`, so repeated runs reuse the same staged file (note that `clear_logs()` removes these cache files too).
- **`.stream(chunk_size)`** Yields strings of digits with the provided chunk size. Operations in `mathstream.engine` consume these streams to avoid loading the entire number at once.
- **Automatic staging** Outputs are stored under `LOG_DIR` with hashes based on input file paths, letting you compose operations without manual bookkeeping.
- **Sign-aware** Addition, subtraction, multiplication, division (`//` behavior), modulo, and exponentiation (non-negative exponents) all respect operand sign. Division/modulo follow Pythons floor-division rules.
- **Utilities** `clear_logs()` wipes prior staged results so you can start fresh.
- **Manual freeing** Call `stream.free()` (or `free_stream(stream)`) once you are done with a staged number to release its reference immediately. Logger metadata keeps per-path reference counts so the final free removes the backing file on the spot.
- **Parity helpers** `is_even` and `is_odd` inspect the streamed digits without materializing the integer.
- **Garbage collection** `collect_garbage(score_threshold)` computes a score from file age, access count, and reference count (tracked in `instance/mathstream_logs.sqlite`, freshly truncated each run). Files whose score meets or exceeds the threshold are deleted, letting you tune how aggressively to reclaim space. Both staged results and literal caches participate. Use `tracked_files()` or `active_streams()` to inspect current state.
Divide-by-zero scenarios raise the custom `DivideByZeroError` so callers can distinguish mathstream issues from Pythons native exceptions.
## Performance Tips
- **Reuse literal streams** `StreamNumber(literal=...)` persists a hashed copy under `LOG_DIR`. Reuse those objects (or their filenames) across operations instead of recreating them every call. Repeated literal construction churns the filesystem: you pay the cost to rewrite identical data, poll the logger database, and spike disk I/O. Hang on to the staged literal or memoize it so it can be streamed repeatedly without rewriting.
- **Free aggressively** When a staged result or literal copy is no longer needed, call `free_stream()` (or use `with StreamNumber(...) as n:`) so the reference count drops immediately. This keeps the cache tidy and reduces the chance that stale literal files pile up between runs.
## Example Script
`test.py` in the repository root demonstrates a minimal workflow:
1. Writes sample operands to `tests/*.txt`.
2. Calls every arithmetic primitive plus the modulo/parity helpers.
3. Asserts that the streamed outputs match known values (helpful for quick regression checks).
Run it via:
```bash
python test.py
```
## Extending
- To hook into other storage backends, implement your own `StreamNumber` variant with the same `.stream()` interface.
- Need modulo or gcd? Compose the existing primitives (e.g., repeated subtraction or using `div` + remainder tracking inside `_divide_abs`) or add new helpers following the same streamed pattern.
- For more control over output locations, override `LOG_DIR` before using the operations:
```python
from mathstream import engine
from pathlib import Path
engine.LOG_DIR = Path("/tmp/my_mathstage")
engine.clear_logs()
```
With these building blocks, you can manipulate arbitrarily large integers while keeping memory usage constant. Happy streaming!