# Mathstream

*Scalable streamed arithmetic for ultra-large integers, chunked from disk.*

`mathstream` offers streamed, string-based arithmetic for very large integers that you may not want to load entirely into memory. Instead of parsing numbers into Python `int` values, you work with digit files on disk via `StreamNumber` and call math operations that process them chunk by chunk.

## Quick Start

Do math on numbers too big for RAM by streaming digits off disk. Feed the library file paths or literals, compose operations, and keep memory flat while outputs land in `instance/log`.

## Why?

Traditional `int` types break when your numbers don’t fit in memory. `mathstream` trades RAM for disk by using streamed digit files, letting you work with absurdly large integers on normal machines.

Perfect for:

- Iterative transformations like Collatz walks
- Memory-constrained pipelines
- Low-level big-number experimentation
- Long-running experiments where deterministic cleanup beats GC guesswork

## Quick Demo

```python
from mathstream import StreamNumber, add

a = StreamNumber(literal="999999999999999999")
b = StreamNumber(literal="1")

print("sum =", "".join(add(a, b).stream()))
```

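The demo hands both operands to `add`. To make the "chunk by chunk" idea concrete, here is a standalone sketch of schoolbook addition over two digit files that keeps memory bounded by a few chunks. It is not mathstream's implementation: `reversed_chunks` and `add_digit_files` are hypothetical helpers, and the sketch assumes non-negative operands stored as bare ASCII digits (no sign, no trailing newline).

```python
import os
from itertools import chain, zip_longest
from typing import Iterator


def reversed_chunks(path: str, chunk_size: int = 4096) -> Iterator[str]:
    """Yield a file's characters from the end backwards, one chunk at a time."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            # Reverse each chunk so digits arrive least-significant first.
            yield f.read(step).decode("ascii")[::-1]


def add_digit_files(path_a: str, path_b: str, out_path: str, chunk_size: int = 4096) -> None:
    """Write the sum of two non-negative digit files to out_path (hypothetical helper)."""
    digits_a = chain.from_iterable(reversed_chunks(path_a, chunk_size))
    digits_b = chain.from_iterable(reversed_chunks(path_b, chunk_size))
    scratch = out_path + ".rev"
    carry = 0
    # Pass 1: add digit pairs with carry, emitting the result least-significant first.
    with open(scratch, "w", encoding="ascii") as rev:
        buf: list[str] = []
        for da, db in zip_longest(digits_a, digits_b, fillvalue="0"):
            carry, digit = divmod(int(da) + int(db) + carry, 10)
            buf.append(str(digit))
            if len(buf) >= chunk_size:
                rev.write("".join(buf))
                buf.clear()
        if carry:
            buf.append(str(carry))
        rev.write("".join(buf))
    # Pass 2: read the scratch file backwards to restore most-significant-first order.
    with open(out_path, "w", encoding="ascii") as out:
        for chunk in reversed_chunks(scratch, chunk_size):
            out.write(chunk)
    os.remove(scratch)
```

Two linear passes keep memory independent of the operands' length; the price is a scratch file as large as the result, the same RAM-for-disk trade described above.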
## Installation

```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```

## Usage

Create digit files anywhere you like (the examples below use `instance/log`), or supply ad-hoc literals, then construct `StreamNumber` objects and call the helpers:

```python
from mathstream import (
    StreamNumber,
    add,
    sub,
    mul,
    div,
    mod,
    pow,  # note: shadows Python's built-in pow() in this module
    is_even,
    is_odd,
    free_stream,
    collect_garbage,
)

a = StreamNumber("instance/log/huge.txt")  # wrap an existing digit file
b = StreamNumber(literal="34567")          # stage a literal under LOG_DIR
e = StreamNumber(literal="3")              # non-negative exponent for pow

# Joining a stream materializes every digit; fine for a demo, avoid it for huge results.
print("sum =", "".join(add(a, b).stream()))
print("difference =", "".join(sub(a, b).stream()))
print("product =", "".join(mul(a, b).stream()))
print("quotient =", "".join(div(a, b).stream()))
print("modulo =", "".join(mod(a, b).stream()))
print("power =", "".join(pow(a, e).stream()))
print("a is even?", is_even(a))
print("b is odd?", is_odd(b))

# drop staged artifacts immediately when you are done
free_stream(b)

# reclaim space for files whose age outweighs their use
collect_garbage(0.5)
```

Each arithmetic call writes its result back into `instance/log` (configurable via `mathstream.number.LOG_DIR`), so you can stream the digits later or reuse them in further operations.

Available operations:

- Core arithmetic: `add`, `sub`, `mul`, `div`, `mod`, `pow`
- Introspection & helpers: `is_even`, `is_odd`, `free_stream`, `active_streams`, `tracked_files`
- Lifecycle control: `collect_garbage`, `set_manual_free_only`, `manual_free_only_enabled`
- Environment helpers: `clear_logs`, `StreamNumber.write_stage`, `engine.LOG_DIR`

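`div` and `mod` are documented to follow Python’s floor-division rules for signed operands. Python’s own `//` and `%` on small integers show what those rules produce; if mathstream behaves as documented, streaming the same operands through `div` and `mod` should agree. `floor_divmod` is only an illustrative wrapper, not part of the library:

```python
def floor_divmod(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder under Python's floor-division rules."""
    q, r = a // b, a % b
    # Defining identity: a == b * q + r, where r is 0 or shares b's sign, and |r| < |b|.
    assert a == b * q + r and abs(r) < abs(b)
    return q, r


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, floor_divmod(a, b))
# 7 2 (3, 1)
# -7 2 (-4, 1)
# 7 -2 (-4, -1)
# -7 -2 (3, -1)
```

Rounding toward negative infinity differs from the truncating division of C or Java, where `-7 / 2` is `-3` and `-7 % 2` is `-1`.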
## Core Concepts

- **`StreamNumber(path | literal=...)`** – Wraps a digit text file, or creates one for an integer literal inside `LOG_DIR`. Literal operands are persisted as `literal_<hash>.txt`, so repeated runs reuse the same staged file (note that `clear_logs()` removes these cache files too).
- **`.stream(chunk_size)`** – Yields strings of digits in chunks of the given size. Operations in `mathstream.engine` consume these streams to avoid loading the entire number at once.
- **Automatic staging** – Outputs are stored under `LOG_DIR` with names hashed from the input file paths, letting you compose operations without manual bookkeeping.
- **Sign-aware** – Addition, subtraction, multiplication, division (`//` behavior), modulo, and exponentiation (non-negative exponents only) all respect operand sign. Division and modulo follow Python’s floor-division rules.
- **Utilities** – `clear_logs()` wipes prior staged results so you can start fresh.
- **Manual freeing** – Call `stream.free()` (or `free_stream(stream)`) once you are done with a staged number to release its reference immediately. Logger metadata keeps per-path reference counts, so the final free removes the backing file on the spot.
- **GC toggle** – Need total control over when files disappear? Call `mathstream.set_manual_free_only(True)` so automatic finalizers stop unlinking staged files; they persist until you call `free()` (or `collect_garbage`). Use `mathstream.manual_free_only_enabled()` to inspect the current setting.
- **Parity helpers** – `is_even` and `is_odd` inspect the streamed digits without materializing the integer.
- **Garbage collection** – `collect_garbage(score_threshold)` computes a score from file age, access count, and reference count (tracked in `instance/mathstream_logs.sqlite`, which is freshly truncated each run). Files whose score meets or exceeds the threshold are deleted, so the threshold controls how aggressively space is reclaimed. Both staged results and literal caches participate. Use `tracked_files()` or `active_streams()` to inspect the current state.

Divide-by-zero scenarios raise the custom `DivideByZeroError`, so callers can distinguish mathstream issues from Python’s native exceptions.

## How It Works

- **Streamed operands** – `StreamNumber` wraps either a user-supplied digit file or an integer literal materialized in `LOG_DIR`. Data is read linearly in configurable chunks and never promoted to a Python `int`.
- **Staging directory** – Every operation writes results into `mathstream.number.LOG_DIR` (default `instance/log`). File names include hashes of the input paths, so repeated calls reuse the same staged copies.
- **Bookkeeping database** – `mathstream/utils.py` maintains `instance/mathstream_logs.sqlite`, recording creation time, last access, total access count, and reference counts. This powers GC decisions and makes it easy to audit what’s on disk.
- **Reference counting** – Every `StreamNumber` bumps a reference count both in sqlite and in in-process counters. Dropping the last reference (or calling `free_stream`) decrements the counts and optionally unlinks the file immediately.
- **Manual-only mode** – Call `set_manual_free_only(True)` when you want absolute control over the lifecycle. The weakref finalizer stops deleting staged files, so outputs persist until you call `.free()` or `collect_garbage()`.
- **Zero-copy chaining** – Because staged files stay on disk, you can pass `StreamNumber` handles between processes or reuse them in later runs without recomputing.

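The manual-freeing and manual-only rules above can be modeled in a few lines. This sketch is not mathstream's Logger: `RefTracker` and `Handle` are hypothetical stand-ins that keep counts in a dict instead of sqlite and use Python's `weakref.finalize` for the automatic path, in the spirit of the weakref finalizer mentioned above. An explicit `free()` always unlinks on the last release, while finalizer-driven releases leave the file alone when `manual_free_only` is set.

```python
import os
import weakref


class RefTracker:
    """Hypothetical per-path reference counter (in memory, not sqlite)."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.manual_free_only = False

    def acquire(self, path: str) -> None:
        self.counts[path] = self.counts.get(path, 0) + 1

    def release(self, path: str, automatic: bool) -> None:
        remaining = self.counts[path] - 1
        if remaining > 0:
            self.counts[path] = remaining
            return
        del self.counts[path]
        # Last reference gone. Finalizer-driven releases keep the file in
        # manual-only mode; explicit frees always unlink it.
        if automatic and self.manual_free_only:
            return
        if os.path.exists(path):
            os.remove(path)


class Handle:
    """Hypothetical stand-in for a StreamNumber that owns a staged file."""

    def __init__(self, tracker: RefTracker, path: str) -> None:
        self.path = path
        self._tracker = tracker
        tracker.acquire(path)
        # Fires when the handle is garbage-collected (or at interpreter exit)
        # unless free() detached it first. The callback must not reference self.
        self._finalizer = weakref.finalize(self, tracker.release, path, True)

    def free(self) -> None:
        # detach() returns None once the finalizer has run or been detached,
        # which makes a second free() a no-op.
        if self._finalizer.detach() is not None:
            self._tracker.release(self.path, automatic=False)
```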
## Performance Tips

- **Reuse literal streams** – `StreamNumber(literal=...)` persists a hashed copy under `LOG_DIR`. Reuse those objects (or their filenames) across operations instead of recreating them on every call. Repeated literal construction churns the filesystem: you pay to rewrite identical data, poll the logger database, and spike disk I/O. Hold on to the staged literal, or memoize it, so it can be streamed repeatedly without rewriting.
- **Free aggressively** – When a staged result or literal copy is no longer needed, call `free_stream()` (or use `with StreamNumber(...) as n:`) so the reference count drops immediately. This keeps the cache tidy and reduces the chance that stale literal files pile up between runs.
- Run `collect_garbage(score_threshold)` to purge stale intermediates.
- Keep an eye on disk space in `instance/log`; streaming shifts the pressure from RAM to storage.
- For huge literals (10⁶+ digits), generate them directly on disk and wrap the path instead of passing `literal=...`.
- Tune the `chunk_size` passed to `StreamNumber.stream(chunk_size)` to balance syscalls against memory: large chunks speed up CPU-bound math, smaller chunks play nicer with slow disks.
- If you are scripting long sessions, snapshot `tracked_files()` periodically; it’s an easy indicator of leaked references.

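For the huge-literal tip, the digits can be written to disk in bounded chunks so the full number never exists as one Python string. This is a minimal sketch: `repeated_digits` and `write_digit_file` are hypothetical helpers, and the output format (bare ASCII digits, no trailing newline) is an assumption based on the "only ASCII digits with optional leading sign" rule this README gives for input files.

```python
from typing import Iterable, Iterator


def repeated_digits(digit: str, count: int, chunk_size: int = 1 << 16) -> Iterator[str]:
    """Yield `count` copies of `digit` in chunks of at most `chunk_size`."""
    full, rest = divmod(count, chunk_size)
    block = digit * chunk_size
    for _ in range(full):
        yield block
    if rest:
        yield digit * rest


def write_digit_file(path: str, chunks: Iterable[str]) -> int:
    """Stream digit chunks to `path` and return how many digits were written."""
    written = 0
    with open(path, "w", encoding="ascii", newline="") as f:
        for chunk in chunks:
            # Reject anything but ASCII 0-9 so downstream readers never choke.
            if chunk and not (chunk.isascii() and chunk.isdigit()):
                raise ValueError(f"non-digit data in chunk starting {chunk[:10]!r}")
            f.write(chunk)
            written += len(chunk)
    return written
```

For example, `write_digit_file(path, itertools.chain(["1"], repeated_digits("0", 10**6)))` writes 10^1,000,000 one block at a time; the resulting path can then be wrapped with `StreamNumber(path)`.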
## Example Script

`test.py` in the repository root demonstrates a minimal workflow:

1. Writes sample operands to `tests/*.txt`.
2. Calls every arithmetic primitive plus the modulo/parity helpers.
3. Asserts that the streamed outputs match known values (helpful for quick regression checks).

Run it via:

```bash
python test.py
```

## Tools and Experiments

- `test.py` – Regression smoke test covering all arithmetic helpers.
- `collatz.py` / `collatz_ui/` – Curses dashboard that streams Collatz sequences.
- `seed_start.py` – Seeds `start.txt` via streamed additions from various sources.
- `find_my.py` + `pi_finder/` – Nilakantha-based π explorer that writes results to `found.pi`.
- `WORK.md` – Deep dive into architecture (Logger DB schema, reference lifetimes, cleanup flow).
- `collatz_ui/views.py` – Reference implementation of a threaded worker that coordinates streamed math and curses rendering without blocking.
- `pi_finder/engine.py` – Example of building high-precision algorithms (Nilakantha π) purely via streamed primitives, including manual caching of million-digit scale factors.

## Common Pitfalls & Recoveries

- **Accidentally freed files** – Automatic finalizers may delete staged outputs while you still hold the path elsewhere. Fix: call `set_manual_free_only(True)` at the start of long-lived workflows, or pass `delete_file=False` to `free_stream` when you need to keep the digits around manually.
- **Literal churn** – Recreating the same `StreamNumber(literal="123")` millions of times hammers the filesystem. Fix: stash the first instance, or cache the `.path` and rely on `StreamNumber(existing_path)` in hot loops.
- **GC too aggressive** – Running `collect_garbage(0)` after every operation removes recently written files. Fix: raise the threshold (e.g., `collect_garbage(1000)`) or run GC only after you’ve freed all references.
- **Chunk mismatch** – Some editors save files with BOMs or commas, and `_normalize_stream` will then raise `ValueError("Non-digit characters found...")`. Fix: sanitize input files (only ASCII digits with an optional leading sign).
- **Disk exhaustion** – Terabyte-scale runs fill `instance/log`. Fix: relocate `engine.LOG_DIR` to a larger volume, or run periodic `collect_garbage` sweeps and archive intermediate files.
- **Concurrency surprises** – Multiple processes writing to the same `LOG_DIR` share the sqlite tracker. Ensure each writer calls `free_stream` and `collect_garbage` responsibly, or isolate runs by changing `LOG_DIR` per worker.

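For the BOM-and-commas case, a streaming clean-up pass can strip a UTF-8 BOM, thousands separators, and whitespace before you wrap a file in `StreamNumber`. `sanitize_digit_file` is a hypothetical helper, not part of mathstream, and the set of characters it silently drops is an assumption to adjust; it keeps an optional leading sign and raises on any other non-digit character rather than guessing.

```python
# Characters treated as harmless formatting and dropped (an assumption; adjust to taste).
_DROP = str.maketrans("", "", ",_ \t\r\n")


def sanitize_digit_file(src: str, dst: str, chunk_size: int = 1 << 16) -> int:
    """Copy src to dst as bare ASCII digits with an optional leading sign.

    Returns the number of digits written; raises ValueError on any other character.
    """
    digits = 0
    at_start = True
    # "utf-8-sig" silently strips a leading BOM if one is present.
    with open(src, encoding="utf-8-sig") as fin:
        with open(dst, "w", encoding="ascii", newline="") as fout:
            while chunk := fin.read(chunk_size):
                cleaned = chunk.translate(_DROP)
                if at_start and cleaned:
                    at_start = False
                    if cleaned[0] in "+-":
                        fout.write(cleaned[0])  # keep the sign once, at the front
                        cleaned = cleaned[1:]
                if cleaned and not (cleaned.isascii() and cleaned.isdigit()):
                    raise ValueError(f"non-digit characters found in {src!r}")
                fout.write(cleaned)
                digits += len(cleaned)
    if digits == 0:
        raise ValueError(f"no digits found in {src!r}")
    return digits
```

On failure `dst` may be left partially written, so treat it as scratch output until the call returns.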
## Extending

For more control over output locations, override `LOG_DIR` before using the operations:

```python
from pathlib import Path

from mathstream import engine

# Point staging at your own directory, then wipe any previously staged results there.
engine.LOG_DIR = Path("/tmp/my_mathstage")
engine.clear_logs()
```

Beyond that, you can:

- Implement custom storage backends (e.g., S3-backed digit files) by providing your own `StreamNumber` variant with the same `.stream()` interface.
- Compose primitives to build new helpers (gcd, factorial, etc.).
- Add new operations by mirroring the pattern in `mathstream/engine.py`: normalize inputs with `_normalize_stream`, perform chunk-based math, then call `_write_result(...)`.
- Build higher-level services (REST APIs, workers, dashboards) by sharing staged file paths instead of raw numbers.
- Layer parity or divisibility checks by reading the streamed digits lazily; there’s no need to materialize entire outputs unless you need them.

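As a sketch of that last point, the remainder of a streamed number modulo a small integer can be computed with Horner's rule while the digits flow past, keeping only a value smaller than the modulus. `mod_small` and `divisible_by` are hypothetical helpers, not mathstream APIs. They assume the chunks are bare ASCII digits of a non-negative number, most significant first; if your streams can carry a sign, strip it before calling them.

```python
from typing import Iterable

# Recent CPython releases refuse int() on strings longer than 4300 digits by
# default, so long chunks are folded in as smaller pieces.
_PIECE = 1000


def mod_small(chunks: Iterable[str], m: int) -> int:
    """Return N mod m for a non-negative N given as most-significant-first digit chunks."""
    if m <= 0:
        raise ValueError("modulus must be positive")
    r = 0
    for chunk in chunks:
        for i in range(0, len(chunk), _PIECE):
            piece = chunk[i:i + _PIECE]
            # Horner step: shift the running remainder left by len(piece) digits.
            # Uses the built-in pow(); don't import mathstream's pow into this module.
            r = (r * pow(10, len(piece), m) + int(piece)) % m
    return r


def divisible_by(chunks: Iterable[str], m: int) -> bool:
    return mod_small(chunks, m) == 0
```

With a non-negative `StreamNumber` called `number`, something like `divisible_by(number.stream(4096), 3)` would test divisibility by 3 without ever holding more than one chunk of digits.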
With these building blocks, you can manipulate arbitrarily large integers while keeping memory usage constant. Happy streaming!

## Contributing

Open to PRs for:

- New streamed math operations and optimizations.
- Smarter garbage collection / tooling around the sqlite tracker.
- Experiments that showcase creative uses (Collatz encoding, π spigots, etc.).

Please lint with `ruff` and follow the existing streaming patterns.

## License

MIT. Use it, remix it, but keep backups; massive streamed math can chew through SSDs fast. 😅