Compare commits

10 Commits

Author SHA1 Message Date
Dominik Krenn
5336eb2c16 expaneed tests 2025-11-05 10:11:46 +01:00
Dominik Krenn
8986a515e2 added seeder 2025-11-05 10:11:35 +01:00
Dominik Krenn
89746b1076 beter garbage collecting 2025-11-05 10:10:51 +01:00
Dominik Krenn
443f9f4f4b added better garabge collection 2025-11-05 08:35:01 +01:00
Dominik Krenn
df9b2b5f29 added literals 2025-11-05 08:16:35 +01:00
Dominik Krenn
8699d8f7ab added mudolor and is even is odd 2025-11-05 08:11:35 +01:00
Dominik Krenn
7e67c3fcf9 added readme for module 2025-11-05 08:05:21 +01:00
Dominik Krenn
de886e30ea added tests 2025-11-05 08:05:03 +01:00
Dominik Krenn
f807a3efa5 added gitignore entry 2025-11-05 08:04:54 +01:00
Dominik Krenn
034fc2b8b6 made module better 2025-11-05 08:04:45 +01:00
18 changed files with 1320 additions and 37 deletions

2
.gitignore vendored

@@ -1 +1,3 @@
venv/
instance/
__pycache__/

78
WORK.md Normal file

@@ -0,0 +1,78 @@
# What's going on inside `mathstream`
Figured I'd write down how the whole thing is wired, more like a brain dump than polished docs. If you're diving in to fix something, this should save you a bunch of spelunking.
## Directory map (so you know where to poke)
```
mathstream/
    __init__.py      # re-export central
    engine.py        # arithmetic guts
    exceptions.py    # custom errors
    number.py        # StreamNumber, manual GC, watcher
    utils.py         # sqlite junk drawer
test.py              # smoke/integration script
```
## StreamNumber: the heart of it
`mathstream/number.py` owns the `StreamNumber` class. The class does a couple of jobs:
- wraps a file of digits (either you give us a path or a literal; literals get canonicalised and dropped into `instance/log/literal_<hash>.txt`).
- streams via `.stream(chunk_size)` so we never load the whole thing; every time we read we call `touch_log_file` so the usage timestamp keeps moving.
- when a new stream gets created we check if it lives under `LOG_DIR`. If yes, we register it with the sqlite tracker (`register_log_file`) and also bump a ref counter via `register_reference`.
- there's a weakref finaliser plus a global `_ACTIVE_COUNTER` that keeps tabs on python-side references. If the object falls out of scope we run `_finalize_instance`, which decrements the counter, and if it was the last one we call `release_reference` (that may nuke the file instantly).
- explicit `free()` exists for people who want deterministic cleanup. It's basically like `free()` in C: drop the ref count and optionally delete the file now. There's an alias `free_stream`, and the class is a context manager, so `with StreamNumber(...) as sn:` cleans up automatically.
So any time you've got a staged result hanging around in memory, the watcher knows about it. Once you ditch it, either by `free()` or just letting the object die, the sqlite ref count drops.
## Engine: maths without ints
Living in `mathstream/engine.py`. All the operators (`add/sub/mul/div/mod/pow`) pull chunks from the `StreamNumber` inputs, normalise them into sign + digit strings, run grade-school algorithms, then write the result back into `LOG_DIR`.
- `_write_result` is the important bit: writes to disk, calls `register_log_file`, then wraps the file in a new `StreamNumber`. Because of that call, every staged result is tracked automatically.
- We're careful about signs: division and modulo follow Python's floor division rules. Divide-by-zero is intercepted and converted into `DivideByZeroError`.
- `clear_logs()` wipes the folder and calls `wipe_log_records()` to empty sqlite so the next run isn't polluted.
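The floor-division sign handling is the bit that's easiest to get wrong, so here is a small sketch of the rule. It uses plain ints for brevity (the engine works on digit strings), and `floor_divmod_from_abs` is a hypothetical helper name, not something that exists in `engine.py`.

```python
def floor_divmod_from_abs(sign_a, sign_b, q_abs, r_abs, b_abs):
    """Turn |a| // |b| and |a| % |b| into Python-style floor results."""
    if r_abs == 0:
        # exact division: only the sign of the quotient can change
        return sign_a * sign_b * q_abs, 0
    if sign_a == sign_b:
        # same signs: quotient stays positive, remainder takes the divisor's sign
        return q_abs, sign_b * r_abs
    # opposite signs: push the quotient one step towards negative infinity
    return -(q_abs + 1), sign_b * (b_abs - r_abs)


# Cross-check against Python's own divmod for every sign combination.
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2), (6, -3)]:
    sign_a = -1 if a < 0 else 1
    sign_b = -1 if b < 0 else 1
    q_abs, r_abs = divmod(abs(a), abs(b))
    result = floor_divmod_from_abs(sign_a, sign_b, q_abs, r_abs, abs(b))
    assert result == divmod(a, b)
```

So the unsigned long division only has to be done once; the signed answer is a cheap fix-up afterwards.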
## Exceptions
`mathstream/exceptions.py` just defines `MathStreamError` and the more specific `DivideByZeroError`. Nothing fancy, just so we don't leak raw `ZeroDivisionError`.
## SQLite watcher (`mathstream/utils.py`)
This is the garbage-collection HQ. On import we run `_ensure_db(reset=True)` so every run starts from a clean DB (no migrations, no surprises). Two tables:
- `logs` → metadata about every staged file: created time, last access, access count.
- `refs` → current reference count (think “how many StreamNumber instances think they own this file”).
Important functions:
- `register_log_file(path)`: ensures both tables have a row (initial ref count 0).
- `register_reference(path)`: increments the ref count, updates last access, access count etc. Called whenever a new `StreamNumber` points at the staged file.
- `touch_log_file(path)`: called from `.stream()` so we know the file is being read.
- `release_reference(path, delete_file=True)`: the inverse of register. If the count hits zero we remove the DB row and (optionally) delete the file right away.
- `collect_garbage(score_threshold)`: this is the periodic sweeper. Computes `score = age / ((ref_count + 1) * (access_count + 1))`, where `age` is the time in seconds since the last access. Bigger score means older + less used. If score >= threshold it gets unlinked and removed from DB. Negative thresholds blow up on purpose.
- `tracked_files()`: dumb helper that dumps `{path: ref_count}` out of the DB.
- `wipe_log_records()`: nukes both tables; used by `clear_logs`.
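To get a feel for the score formula, here it is as a standalone toy. `gc_score` and `should_collect` are illustrative names only; the real sweeper reads these numbers out of sqlite.

```python
def gc_score(age_seconds, ref_count, access_count):
    """Older and less-used files score higher."""
    return age_seconds / ((ref_count + 1) * (access_count + 1))


def should_collect(age_seconds, ref_count, access_count, threshold):
    if threshold < 0:
        raise ValueError("score_threshold must be non-negative")
    return gc_score(age_seconds, ref_count, access_count) >= threshold


# untouched for two minutes, no live references, read once
idle = gc_score(120, ref_count=0, access_count=1)  # 120 / (1 * 2)
# same age, but two live references and nine reads
busy = gc_score(120, ref_count=2, access_count=9)  # 120 / (3 * 10)
```

With a threshold of 10 the idle file gets swept and the busy one survives. A threshold of 0 sweeps everything, since the score is never negative.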
## How cleanup flows
1. You run an operation (`add`, `mul`, whatever). Result file lands in `LOG_DIR`, gets registered, comes back as a `StreamNumber`.
2. You stream it or create more streams from it; metadata keeps getting updated via `touch_log_file`/`register_reference`.
3. When you're done, call `.free()` or just drop references. Manual free is immediate. Otherwise the weakref finaliser catches it eventually.
4. `release_reference` is what actually removes the sqlite entries and unlinks the data file when there are no logical references left.
5. If you still have detritus (e.g. you crashed before refs hit zero), run `collect_garbage(threshold)` to sweep anything whose age outweighs usage.
6. `active_streams()` reports what's still alive in Python land; `tracked_files()` shows what the DB thinks is referenced.
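The register/release half of that flow boils down to an upsert and a decrement. Below is a toy version against an in-memory database, assuming a one-table schema; the function names mirror the real ones but these are simplified stand-ins, and nothing here touches files on disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (path TEXT PRIMARY KEY, ref_count INTEGER DEFAULT 0)")


def register_reference(path):
    """Insert with count 1, or bump the existing row (SQLite upsert)."""
    conn.execute(
        """
        INSERT INTO refs (path, ref_count) VALUES (?, 1)
        ON CONFLICT(path) DO UPDATE SET ref_count = ref_count + 1
        """,
        (path,),
    )


def release_reference(path):
    """Decrement; drop the row once nothing references the file any more."""
    conn.execute("UPDATE refs SET ref_count = ref_count - 1 WHERE path = ?", (path,))
    conn.execute("DELETE FROM refs WHERE path = ? AND ref_count <= 0", (path,))
    # a real watcher would also unlink the staged file at this point


def tracked():
    return dict(conn.execute("SELECT path, ref_count FROM refs"))


register_reference("add_abc.bin")
register_reference("add_abc.bin")
release_reference("add_abc.bin")
after_one_release = tracked()   # one reference still outstanding
release_reference("add_abc.bin")
after_last_release = tracked()  # row is gone
```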
## Example run (`test.py`)
`test.py` is half regression, half reference script. It:
- seeds some numbers, runs every operation, checks results.
- makes sure `DivideByZeroError` fires.
- frees every staged number to prove files vanish on the spot.
- runs `collect_garbage(0)` just to make sure nothing else lingers.
- dumps `active_streams()` and `tracked_files()` so you can see python vs sqlite state.
If the logs ever seem suspicious, run that script. It'll tell you immediately whether something's still referenced or if the GC is forgetting to clean up.

211
collatz.py Normal file

@@ -0,0 +1,211 @@
#!/usr/bin/env python3
import curses
import time
import os
from pathlib import Path

from mathstream import StreamNumber, add, mul, div, is_even, clear_logs

LOG_DIR = Path("instance/log")


def collatz_step(n, three, two, one):
    return div(n, two) if is_even(n) else add(mul(n, three), one)


def draw_header(win, step, elapsed, avg_step, digits_len):
    """Render header above graph and panels."""
    win.erase()
    cols = curses.COLS - 1
    bar = "" * cols
    lines = [
        f" Collatz (3n + 1) Streamed Viewer ",
        f" Step: {step}",
        f" Elapsed: {elapsed:8.2f}s | Avg/Step: {avg_step:8.5f}s",
        f" Digits: {digits_len:,} | ↑↓ scroll number | PgUp/PgDn scroll log | q quit",
    ]
    for i, line in enumerate(lines):
        win.addstr(i, 0, line[:cols], curses.color_pair(1))
    win.addstr(len(lines), 0, bar, curses.color_pair(2))
    win.noutrefresh()


def draw_graph(win, graph_buf, direction_up, width):
    """Render a single-line graph: grey ░ for empty, colored █ for data."""
    win.erase()
    # Derive the actual width of the window so we never draw past its edge.
    _, max_x = win.getmaxyx()
    effective_width = max_x or width
    padding = 3  # leave space for arrow and a small gap
    cols = max(0, effective_width - padding)
    # Clamp buffer to visible width
    visible = graph_buf[-cols:] if len(graph_buf) > cols else graph_buf
    fill_len = len(visible)
    # arrow first
    if effective_width > 0:
        arrow = "↑" if direction_up else "↓"
        arrow_color = curses.color_pair(6 if direction_up else 5)
        try:
            win.addstr(0, 0, arrow, arrow_color)
        except curses.error:
            pass
    # draw filled section
    for i, val in enumerate(visible):
        col = padding + i
        if col >= effective_width:
            break
        color = curses.color_pair(6 if val > 0 else 5)
        try:
            win.addstr(0, col, "█", color)
        except curses.error:
            break
    # fill remaining with ░
    remaining = max(0, cols - fill_len)
    fill_start = padding + fill_len
    if remaining > 0 and fill_start < effective_width:
        run = min(remaining, effective_width - fill_start)
        if run > 0:
            try:
                win.addstr(0, fill_start, "░" * run, curses.color_pair(7))
            except curses.error:
                pass
    win.noutrefresh()


def draw_number(win, digits, scroll):
    win.erase()
    cols = curses.COLS
    lines = [digits[i:i + cols - 1] for i in range(0, len(digits), cols - 1)]
    max_lines = win.getmaxyx()[0]
    scroll = max(0, min(scroll, max(0, len(lines) - max_lines)))
    for i, chunk in enumerate(lines[scroll:scroll + max_lines]):
        try:
            win.addstr(i, 0, chunk, curses.color_pair(3))
        except curses.error:
            pass
    win.noutrefresh()
    return scroll


def draw_log_list(win, scroll):
    win.erase()
    if not LOG_DIR.exists():
        LOG_DIR.mkdir(parents=True, exist_ok=True)
    files = sorted(LOG_DIR.iterdir(), key=os.path.getmtime, reverse=True)
    names = [f"{f.name}" for f in files]
    max_lines = win.getmaxyx()[0]
    scroll = max(0, min(scroll, max(0, len(names) - max_lines)))
    for i, name in enumerate(names[scroll:scroll + max_lines]):
        try:
            win.addstr(i, 0, name[: curses.COLS - 1], curses.color_pair(4))
        except curses.error:
            pass
    win.noutrefresh()
    return scroll


def run_collatz(stdscr):
    curses.curs_set(0)
    curses.start_color()
    curses.use_default_colors()
    curses.init_pair(1, curses.COLOR_CYAN, -1)  # header text
    curses.init_pair(2, curses.COLOR_BLACK, curses.COLOR_CYAN)  # separator
    curses.init_pair(3, curses.COLOR_WHITE, -1)  # number
    curses.init_pair(4, curses.COLOR_YELLOW, -1)  # log list
    curses.init_pair(5, curses.COLOR_RED, -1)
    curses.init_pair(6, curses.COLOR_GREEN, -1)
    curses.init_pair(7, curses.COLOR_WHITE, -1)  # grey for empty
    stdscr.nodelay(True)
    stdscr.timeout(100)
    start_file = Path("start.txt")
    if not start_file.exists():
        stdscr.addstr(0, 0, "Missing start.txt — please create one with your starting number.")
        stdscr.refresh()
        stdscr.getch()
        return
    clear_logs()
    n = StreamNumber(start_file)
    one, two, three = (StreamNumber(literal=s) for s in ("1", "2", "3"))
    start_time = time.time()
    step = 0
    num_scroll = 0
    log_scroll = 0
    header_h = 5
    graph_h = 1
    num_h = (curses.LINES - header_h - graph_h) * 3 // 4
    log_h = curses.LINES - header_h - graph_h - num_h - 1
    num_win = curses.newwin(num_h, curses.COLS, header_h + graph_h + 1, 0)
    graph_win = curses.newwin(graph_h, curses.COLS, header_h + 1, 0)
    log_win = curses.newwin(log_h, curses.COLS, header_h + graph_h + num_h + 2, 0)
    last_len = 0
    graph_buf = []
    while True:
        step += 1
        n = collatz_step(n, three, two, one)
        digits = "".join(n.stream()) or "0"
        cur_len = len(digits)
        diff = cur_len - last_len
        last_len = cur_len
        graph_width = curses.COLS - 4
        # Add new value to graph buffer
        if diff != 0:
            graph_buf.append(1 if diff > 0 else -1)
        else:
            graph_buf.append(0)
        # Shift left if full
        if len(graph_buf) > graph_width:
            graph_buf = graph_buf[-graph_width:]
        direction_up = diff >= 0
        elapsed = time.time() - start_time
        avg_step = elapsed / step if step else 0.0
        draw_header(stdscr, step, elapsed, avg_step, len(digits))
        draw_graph(graph_win, graph_buf, direction_up, curses.COLS)
        num_scroll = draw_number(num_win, digits, num_scroll)
        log_scroll = draw_log_list(log_win, log_scroll)
        curses.doupdate()
        ch = stdscr.getch()
        if ch == ord("q"):
            break
        elif ch == curses.KEY_UP:
            num_scroll = max(0, num_scroll - 1)
        elif ch == curses.KEY_DOWN:
            num_scroll += 1
        elif ch == curses.KEY_PPAGE:
            log_scroll = max(0, log_scroll - 3)
        elif ch == curses.KEY_NPAGE:
            log_scroll += 3
        if digits == "1":
            stdscr.nodelay(False)
            stdscr.addstr(curses.LINES - 1, 0, "Reached 1 — press any key to exit.")
            stdscr.refresh()
            stdscr.getch()
            break


def main():
    curses.wrapper(run_collatz)


if __name__ == "__main__":
    main()

93
mathstream/README.md Normal file

@@ -0,0 +1,93 @@
# Mathstream Library
`mathstream` offers streamed, string-based arithmetic for very large integers that you may not want to load entirely into memory. Instead of parsing numbers into Python `int` values, you work with digit files on disk via `StreamNumber` and call math operations that operate chunk-by-chunk.
## Quick Start
```bash
python -m venv venv
source venv/bin/activate
pip install -e .
```
Create digit files anywhere you like (the examples below use `instance/log`), or supply ad-hoc literals, then construct `StreamNumber` objects and call the helpers:
```python
from mathstream import (
    StreamNumber,
    add,
    sub,
    mul,
    div,
    mod,
    pow,
    is_even,
    is_odd,
    free_stream,
    collect_garbage,
)

a = StreamNumber("instance/log/huge.txt")
b = StreamNumber(literal="34567")
e = StreamNumber(literal="3")
print("sum =", "".join(add(a, b).stream()))
print("difference =", "".join(sub(a, b).stream()))
print("product =", "".join(mul(a, b).stream()))
print("quotient =", "".join(div(a, b).stream()))
print("modulo =", "".join(mod(a, b).stream()))
print("power =", "".join(pow(a, e).stream()))
print("a is even?", is_even(a))
print("b is odd?", is_odd(b))
# drop staged artifacts immediately when you are done
free_stream(b)
# reclaim space for files whose age outweighs their use
collect_garbage(0.5)
```
Each arithmetic call writes its result back into `instance/log` (configurable via `mathstream.number.LOG_DIR`) so you can stream the digits later or reuse them in further operations.
## Core Concepts
- **StreamNumber(path | literal=...)**: Wraps a digit text file or creates one for an integer literal inside `LOG_DIR`. Literal operands are persisted as `literal_<hash>.txt`, so repeated runs reuse the same staged file (note that `clear_logs()` removes these cache files too).
- **`.stream(chunk_size)`**: Yields strings of digits with the provided chunk size. Operations in `mathstream.engine` consume these streams to avoid loading the entire number at once.
- **Automatic staging**: Outputs are stored under `LOG_DIR` with hashes based on input file paths, letting you compose operations without manual bookkeeping.
- **Sign-aware**: Addition, subtraction, multiplication, division (`//` behavior), modulo, and exponentiation (non-negative exponents) all respect operand sign. Division/modulo follow Python's floor-division rules.
- **Utilities**: `clear_logs()` wipes prior staged results so you can start fresh.
- **Manual freeing**: Call `stream.free()` (or `free_stream(stream)`) once you are done with a staged number to release its reference immediately. Logger metadata keeps per-path reference counts so the final free removes the backing file on the spot.
- **Parity helpers**: `is_even` and `is_odd` inspect the streamed digits without materializing the integer.
- **Garbage collection**: `collect_garbage(score_threshold)` computes a score from file age, access count, and reference count (tracked in `instance/mathstream_logs.sqlite`, freshly truncated each run). Files whose score meets or exceeds the threshold are deleted, letting you tune how aggressively to reclaim space. Both staged results and literal caches participate. Use `tracked_files()` or `active_streams()` to inspect current state.
Divide-by-zero scenarios raise the custom `DivideByZeroError` so callers can distinguish mathstream issues from Python's native exceptions.
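The following is a minimal sketch of the chunked-reading idea behind `.stream()` and the parity helpers. It does not use `mathstream` and is not how `is_even` is implemented; `stream_digits` and `last_digit_is_even` are hypothetical names chosen for illustration, and the sample file is created in a temporary directory.

```python
import tempfile
from pathlib import Path


def stream_digits(path, chunk_size=4096):
    """Yield the file's contents in fixed-size text chunks."""
    with open(path, "r", encoding="utf-8") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def last_digit_is_even(path, chunk_size=4096):
    """Parity check that only remembers the last non-blank character."""
    last = ""
    for chunk in stream_digits(path, chunk_size):
        stripped = chunk.strip()
        if stripped:
            last = stripped[-1]
    if not last.isdigit():
        raise ValueError("no digits found")
    return int(last) % 2 == 0


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "n.txt"
    # a 10,001-digit number ending in 4, followed by a newline
    sample.write_text("1" * 10_000 + "4\n", encoding="utf-8")
    even = last_digit_is_even(sample, chunk_size=64)
```

Only one chunk is held in memory at a time, so the cost of the check depends on the chunk size rather than on the length of the number.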
## Example Script
`test.py` in the repository root demonstrates a minimal workflow:
1. Writes sample operands to `tests/*.txt`.
2. Calls every arithmetic primitive plus the modulo/parity helpers.
3. Asserts that the streamed outputs match known values (helpful for quick regression checks).
Run it via:
```bash
python test.py
```
## Extending
- To hook into other storage backends, implement your own `StreamNumber` variant with the same `.stream()` interface.
- Need gcd or similar helpers? Compose the existing primitives (e.g., repeated subtraction, or `div`/`mod`, which share the remainder tracking inside `_divide_abs`) or add new helpers following the same streamed pattern.
- For more control over output locations, override `LOG_DIR` before using the operations:
```python
from mathstream import engine
from pathlib import Path
engine.LOG_DIR = Path("/tmp/my_mathstage")
engine.clear_logs()
```
With these building blocks, you can manipulate arbitrarily large integers while keeping memory usage constant. Happy streaming!


@@ -1,2 +1,23 @@
from .engine import clear_logs, add, sub, mul, div
from .number import StreamNumber
from .engine import clear_logs, add, sub, mul, div, mod, pow, is_even, is_odd
from .exceptions import MathStreamError, DivideByZeroError
from .number import StreamNumber, free_stream, active_streams
from .utils import collect_garbage, tracked_files
__all__ = [
    "clear_logs",
    "collect_garbage",
    "tracked_files",
    "add",
    "sub",
    "mul",
    "div",
    "mod",
    "pow",
    "is_even",
    "is_odd",
    "StreamNumber",
    "free_stream",
    "active_streams",
    "MathStreamError",
    "DivideByZeroError",
]


@@ -1,45 +1,329 @@
from pathlib import Path
from __future__ import annotations
from typing import Iterable, Tuple
from .exceptions import DivideByZeroError
from .number import StreamNumber, LOG_DIR
from .utils import register_log_file, wipe_log_records
def _ensure_log_dir() -> None:
LOG_DIR.mkdir(parents=True, exist_ok=True)
def _strip_leading_zeros(digits: str) -> str:
digits = digits.lstrip("0")
return digits or "0"
def _normalize_stream(num: StreamNumber) -> Tuple[int, str]:
"""Return (sign, digits) tuple for the streamed number."""
parts: list[str] = []
for chunk in num.stream():
chunk = chunk.strip()
if not chunk:
continue
parts.append(chunk)
raw = "".join(parts)
if not raw:
raise ValueError(f"Stream for {num.path} is empty")
sign = 1
if raw[0] in "+-":
sign = -1 if raw[0] == "-" else 1
raw = raw[1:]
if not raw.isdigit():
raise ValueError(f"Non-digit characters found in stream for {num.path}")
digits = _strip_leading_zeros(raw)
if digits == "0":
sign = 1
return sign, digits
def _compare_abs(a: str, b: str) -> int:
"""Compare two positive digit strings."""
if len(a) != len(b):
return 1 if len(a) > len(b) else -1
if a == b:
return 0
return 1 if a > b else -1
def _add_abs(a: str, b: str) -> str:
carry = 0
idx_a = len(a) - 1
idx_b = len(b) - 1
out: list[str] = []
while idx_a >= 0 or idx_b >= 0 or carry:
da = ord(a[idx_a]) - 48 if idx_a >= 0 else 0
db = ord(b[idx_b]) - 48 if idx_b >= 0 else 0
total = da + db + carry
carry, digit = divmod(total, 10)
out.append(str(digit))
idx_a -= 1
idx_b -= 1
return "".join(reversed(out))
def _sub_abs(a: str, b: str) -> str:
"""Return a - b for digit strings assuming a >= b."""
borrow = 0
idx_a = len(a) - 1
idx_b = len(b) - 1
out: list[str] = []
while idx_a >= 0:
da = ord(a[idx_a]) - 48
db = ord(b[idx_b]) - 48 if idx_b >= 0 else 0
diff = da - borrow - db
if diff < 0:
diff += 10
borrow = 1
else:
borrow = 0
out.append(str(diff))
idx_a -= 1
idx_b -= 1
return _strip_leading_zeros("".join(reversed(out)))
def _multiply_abs(a: str, b: str) -> str:
if a == "0" or b == "0":
return "0"
result = [0] * (len(a) + len(b))
for i in range(len(a) - 1, -1, -1):
ai = ord(a[i]) - 48
carry = 0
for j in range(len(b) - 1, -1, -1):
bj = ord(b[j]) - 48
pos = i + j + 1
total = result[pos] + ai * bj + carry
carry, result[pos] = divmod(total, 10)
result[i] += carry
return _strip_leading_zeros("".join(str(d) for d in result))
def _multiply_digit(num: str, digit: int) -> str:
if digit == 0 or num == "0":
return "0"
carry = 0
out: list[str] = []
for i in range(len(num) - 1, -1, -1):
total = (ord(num[i]) - 48) * digit + carry
carry, d = divmod(total, 10)
out.append(str(d))
if carry:
out.append(str(carry))
return "".join(reversed(out))
def _divide_abs(dividend: str, divisor: str) -> Tuple[str, str]:
if divisor == "0":
raise DivideByZeroError("division by zero")
if dividend == "0":
return "0", "0"
quotient_digits: list[str] = []
remainder = "0"
for digit in dividend:
remainder = _strip_leading_zeros(remainder + digit)
q_digit = 0
for guess in range(9, -1, -1):
candidate = _multiply_digit(divisor, guess)
if _compare_abs(candidate, remainder) <= 0:
q_digit = guess
remainder = _sub_abs(remainder, candidate) if guess else remainder
break
quotient_digits.append(str(q_digit))
quotient = _strip_leading_zeros("".join(quotient_digits))
remainder = _strip_leading_zeros(remainder)
return quotient, remainder
def _is_zero(digits: str) -> bool:
return digits == "0"
def _is_odd(digits: str) -> bool:
return (ord(digits[-1]) - 48) % 2 == 1
def _halve(digits: str) -> str:
carry = 0
out: list[str] = []
for ch in digits:
current = carry * 10 + (ord(ch) - 48)
quotient = current // 2
carry = current % 2
out.append(str(quotient))
return _strip_leading_zeros("".join(out))
def _write_result(operation: str, operands: Iterable[StreamNumber], digits: str) -> StreamNumber:
_ensure_log_dir()
operand_hash = "_".join(num.hash for num in operands)
out_file = LOG_DIR / f"{operation}_{operand_hash}.bin"
with open(out_file, "w", encoding="utf-8") as out:
out.write(digits)
register_log_file(out_file)
return StreamNumber(out_file)
def clear_logs():
if LOG_DIR.exists():
for p in LOG_DIR.glob("*"):
p.unlink()
LOG_DIR.mkdir(parents=True, exist_ok=True)
_ensure_log_dir()
wipe_log_records()
def add(num_a: StreamNumber, num_b: StreamNumber) -> StreamNumber:
"""Digit-by-digit streamed addition."""
out_file = LOG_DIR / f"{num_a.hash}_add_{num_b.hash}.bin"
"""Return num_a + num_b without loading full ints into memory."""
sign_a, a_digits = _normalize_stream(num_a)
sign_b, b_digits = _normalize_stream(num_b)
carry = 0
a_buf = list(num_a.stream(1))
b_buf = list(num_b.stream(1))
if sign_a == sign_b:
digits = _add_abs(a_digits, b_digits)
sign = sign_a
else:
cmp = _compare_abs(a_digits, b_digits)
if cmp == 0:
digits = "0"
sign = 1
elif cmp > 0:
digits = _sub_abs(a_digits, b_digits)
sign = sign_a
else:
digits = _sub_abs(b_digits, a_digits)
sign = sign_b
# align lengths
max_len = max(len(a_buf), len(b_buf))
a_buf = ["0"] * (max_len - len(a_buf)) + a_buf
b_buf = ["0"] * (max_len - len(b_buf)) + b_buf
result = digits if sign > 0 or digits == "0" else f"-{digits}"
return _write_result("add", (num_a, num_b), result)
with open(out_file, "wb") as out:
for i in range(max_len - 1, -1, -1):
s = int(a_buf[i]) + int(b_buf[i]) + carry
carry, digit = divmod(s, 10)
out.write(str(digit).encode())
if carry:
out.write(str(carry).encode())
return StreamNumber(out_file)
def sub(num_a, num_b):
"""Basic streamed subtraction (assumes a >= b)."""
# similar pattern with borrow propagation...
pass
def sub(num_a: StreamNumber, num_b: StreamNumber) -> StreamNumber:
"""Return num_a - num_b using streamed integer arithmetic."""
sign_a, a_digits = _normalize_stream(num_a)
sign_b, b_digits = _normalize_stream(num_b)
def mul(num_a, num_b):
"""Chunked multiplication using repeated addition."""
# create temporary stage files for partial sums
pass
if sign_a != sign_b:
digits = _add_abs(a_digits, b_digits)
sign = sign_a
else:
cmp = _compare_abs(a_digits, b_digits)
if cmp == 0:
digits = "0"
sign = 1
elif cmp > 0:
digits = _sub_abs(a_digits, b_digits)
sign = sign_a
else:
digits = _sub_abs(b_digits, a_digits)
sign = -sign_a
def div(num_a, num_b):
"""Long division, streamed stage by stage."""
# create multiple intermediate files: div_stage_1, div_stage_2, etc.
pass
result = digits if sign > 0 or digits == "0" else f"-{digits}"
return _write_result("sub", (num_a, num_b), result)
def mul(num_a: StreamNumber, num_b: StreamNumber) -> StreamNumber:
"""Return num_a * num_b with grade-school multiplication."""
sign_a, a_digits = _normalize_stream(num_a)
sign_b, b_digits = _normalize_stream(num_b)
digits = _multiply_abs(a_digits, b_digits)
sign = 1 if digits == "0" else sign_a * sign_b
result = digits if sign > 0 else f"-{digits}"
return _write_result("mul", (num_a, num_b), result)
def div(num_a: StreamNumber, num_b: StreamNumber) -> StreamNumber:
"""Return floor division num_a // num_b with streamed long division."""
sign_a, a_digits = _normalize_stream(num_a)
sign_b, b_digits = _normalize_stream(num_b)
quotient, remainder = _divide_abs(a_digits, b_digits)
if quotient == "0" and remainder == "0":
return _write_result("div", (num_a, num_b), "0")
sign_product = sign_a * sign_b
if sign_product < 0 and remainder != "0":
quotient = _add_abs(quotient, "1")
sign = -1
else:
sign = sign_product if quotient != "0" else 1
result = quotient if sign > 0 else f"-{quotient}"
return _write_result("div", (num_a, num_b), result)
def mod(num_a: StreamNumber, num_b: StreamNumber) -> StreamNumber:
"""Return num_a % num_b following Python's floor-division semantics."""
sign_a, a_digits = _normalize_stream(num_a)
sign_b, b_digits = _normalize_stream(num_b)
if b_digits == "0":
raise DivideByZeroError("modulo by zero")
_, remainder = _divide_abs(a_digits, b_digits)
if remainder == "0":
return _write_result("mod", (num_a, num_b), "0")
if sign_a == sign_b:
digits = remainder
else:
digits = _sub_abs(b_digits, remainder)
sign = 1 if sign_b > 0 else -1
result = digits if sign > 0 else f"-{digits}"
return _write_result("mod", (num_a, num_b), result)
def pow(num_a: StreamNumber, num_b: StreamNumber) -> StreamNumber:
"""Return num_a ** num_b using repeated squaring (integer exponent only)."""
base_sign, base_digits = _normalize_stream(num_a)
exp_sign, exp_digits = _normalize_stream(num_b)
if exp_sign < 0:
raise ValueError("Negative exponents are not supported for integer streams.")
if exp_digits == "0":
return _write_result("pow", (num_a, num_b), "1")
result_digits = "1"
base_abs = base_digits
exponent = exp_digits
while not _is_zero(exponent):
if _is_odd(exponent):
result_digits = _multiply_abs(result_digits, base_abs)
exponent = _halve(exponent)
if not _is_zero(exponent):
base_abs = _multiply_abs(base_abs, base_abs)
base_negative = base_sign < 0
result_sign = -1 if base_negative and _is_odd(exp_digits) else 1
if result_digits == "0":
result_sign = 1
result = result_digits if result_sign > 0 else f"-{result_digits}"
return _write_result("pow", (num_a, num_b), result)
def is_even(num: StreamNumber) -> bool:
"""Return True if the streamed integer is even."""
_, digits = _normalize_stream(num)
return (ord(digits[-1]) - 48) % 2 == 0
def is_odd(num: StreamNumber) -> bool:
"""Return True if the streamed integer is odd."""
return not is_even(num)

6
mathstream/exceptions.py Normal file

@@ -0,0 +1,6 @@
class MathStreamError(Exception):
    """Base class for mathstream-specific errors."""


class DivideByZeroError(MathStreamError):
    """Raised when division or modulo operations encounter a zero divisor."""


@@ -1,27 +1,151 @@
import hashlib
import weakref
from collections import Counter
from pathlib import Path
from typing import Dict, Optional, Union
from .utils import (
register_log_file,
register_reference,
touch_log_file,
release_reference,
)
LOG_DIR = Path("./instance/log")
def _ensure_log_dir() -> None:
LOG_DIR.mkdir(parents=True, exist_ok=True)
def _canonicalize_literal(value: str) -> str:
raw = value.strip()
if not raw:
raise ValueError("Literal value cannot be empty.")
sign = ""
digits = raw
if raw[0] in "+-":
sign = "-" if raw[0] == "-" else ""
digits = raw[1:]
if not digits or not digits.isdigit():
raise ValueError(f"Literal must be an integer string, got: {value!r}")
digits = digits.lstrip("0") or "0"
if digits == "0":
sign = ""
return f"{sign}{digits}"
def _is_in_log_dir(path: Path) -> bool:
try:
path.resolve().relative_to(LOG_DIR.resolve())
return True
except ValueError:
return False
class StreamNumber:
def __init__(self, file_path):
self.path = Path(file_path)
if not self.path.exists():
raise FileNotFoundError(self.path)
def __init__(
self,
file_path: Optional[Union[str, Path]] = None,
*,
literal: Optional[str] = None,
):
if (file_path is None) == (literal is None):
raise ValueError("Provide exactly one of file_path or literal.")
if literal is not None:
normalized = _canonicalize_literal(literal)
_ensure_log_dir()
literal_hash = hashlib.sha1(normalized.encode()).hexdigest()[:10]
self.path = LOG_DIR / f"literal_{literal_hash}.txt"
self.path.write_text(normalized, encoding="utf-8")
else:
self.path = Path(file_path)
if not self.path.exists():
raise FileNotFoundError(self.path)
self.hash = hashlib.sha1(str(self.path).encode()).hexdigest()[:10]
self._normalized_path = str(self.path.resolve())
self._released = False
_increment_active(self.path)
if _is_in_log_dir(self.path):
register_log_file(self.path)
register_reference(self.path)
self._finalizer = weakref.finalize(
self, _finalize_instance, self._normalized_path
)
def __repr__(self):
return f"<StreamNumber {self.path.name}>"
def stream(self, chunk_size=4096):
"""Yield chunks of digits as strings."""
if _is_in_log_dir(self.path):
touch_log_file(self.path)
with open(self.path, "r", encoding="utf-8") as f:
while chunk := f.read(chunk_size):
yield chunk.strip().replace(",", ".")
def write_stage(self, stage, data: str):
"""Write intermediate stage result."""
_ensure_log_dir()
stage_file = LOG_DIR / f"{self.hash}_stage_{stage}.bin"
with open(stage_file, "wb") as f:
f.write(data.encode())
register_log_file(stage_file)
return stage_file
def free(self, *, delete_file: bool = True) -> None:
"""Release this stream's reference and optionally delete the staged file."""
if self._released:
return
self._released = True
if self._finalizer.alive:
self._finalizer.detach()
_decrement_active(Path(self._normalized_path), delete_file=delete_file)
def __enter__(self):
return self
def __exit__(self, exc_type, exc, tb):
self.free()
_ACTIVE_COUNTER: Counter[str] = Counter()
def _increment_active(path: Path) -> None:
key = str(path.resolve())
_ACTIVE_COUNTER[key] += 1
def _decrement_active(path: Path, delete_file: bool = True) -> None:
key = str(path.resolve())
current = _ACTIVE_COUNTER.get(key, 0)
if current <= 1:
_ACTIVE_COUNTER.pop(key, None)
else:
_ACTIVE_COUNTER[key] = current - 1
if _is_in_log_dir(path):
release_reference(path, delete_file=delete_file)
def _finalize_instance(path_str: str) -> None:
_decrement_active(Path(path_str))
def free_stream(number: StreamNumber, *, delete_file: bool = True) -> None:
"""Convenience helper mirroring manual memory management semantics."""
number.free(delete_file=delete_file)
def active_streams() -> Dict[str, int]:
"""Return the active StreamNumber paths mapped to in-memory reference counts."""
return dict(_ACTIVE_COUNTER)


@@ -0,0 +1,220 @@
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, List, Dict

LOG_DB_PATH = Path("./instance/mathstream_logs.sqlite")


def _normalize_paths(paths: Iterable[Path]) -> List[str]:
    return [str(Path(p).resolve()) for p in paths]


def _ensure_db(reset: bool = False) -> None:
    LOG_DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(LOG_DB_PATH) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS logs (
                path TEXT PRIMARY KEY,
                created_at REAL,
                last_access REAL,
                access_count INTEGER DEFAULT 0
            )
            """
        )
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS refs (
                path TEXT PRIMARY KEY,
                ref_count INTEGER DEFAULT 0
            )
            """
        )
        if reset:
            conn.execute("DELETE FROM logs")
            conn.execute("DELETE FROM refs")
        conn.commit()


_ensure_db(reset=True)


def register_log_file(path: Path) -> None:
    """Ensure the log database is aware of a file's existence."""
    normalized = _normalize_paths([path])[0]
    _ensure_db()
    timestamp = datetime.now(timezone.utc).timestamp()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        conn.execute(
            """
            INSERT INTO logs (path, created_at, last_access, access_count)
            VALUES (?, ?, ?, 0)
            ON CONFLICT(path)
            DO NOTHING
            """,
            (normalized, timestamp, timestamp),
        )
        conn.execute(
            """
            INSERT INTO refs (path, ref_count)
            VALUES (?, 0)
            ON CONFLICT(path)
            DO NOTHING
            """,
            (normalized,),
        )
        conn.commit()


def register_reference(path: Path) -> None:
    """Increment reference count similarly to Python's ref counter."""
    normalized = _normalize_paths([path])[0]
    _ensure_db()
    timestamp = datetime.now(timezone.utc).timestamp()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        conn.execute(
            """
            INSERT INTO logs (path, created_at, last_access, access_count)
            VALUES (?, ?, ?, 1)
            ON CONFLICT(path)
            DO NOTHING
            """,
            (normalized, timestamp, timestamp),
        )
        conn.execute(
            """
            INSERT INTO refs (path, ref_count)
            VALUES (?, 1)
            ON CONFLICT(path)
            DO UPDATE SET ref_count = ref_count + 1
            """,
            (normalized,),
        )
        conn.execute(
            """
            UPDATE logs
            SET last_access = ?, access_count = access_count + 1
            WHERE path = ?
            """,
            (timestamp, normalized),
        )
        conn.commit()


def touch_log_file(path: Path) -> None:
    """Refresh access metadata when a file is streamed."""
    normalized = _normalize_paths([path])[0]
    _ensure_db()
    timestamp = datetime.now(timezone.utc).timestamp()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        conn.execute(
            """
            INSERT INTO logs (path, created_at, last_access, access_count)
            VALUES (?, ?, ?, 1)
            ON CONFLICT(path)
            DO UPDATE SET
                last_access = excluded.last_access,
                access_count = logs.access_count + 1
            """,
            (normalized, timestamp, timestamp),
        )
        conn.commit()


def wipe_log_records() -> None:
    """Drop all bookkeeping (used after manual log purges)."""
    _ensure_db()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        conn.execute("DELETE FROM logs")
        conn.execute("DELETE FROM refs")
        conn.commit()


def _delete_records(paths: List[Path]) -> None:
    if not paths:
        return
    normalized = [(str(p.resolve()),) for p in paths]
    with sqlite3.connect(LOG_DB_PATH) as conn:
        conn.executemany("DELETE FROM logs WHERE path = ?", normalized)
        conn.executemany("DELETE FROM refs WHERE path = ?", normalized)
        conn.commit()


def collect_garbage(score_threshold: float) -> list[Path]:
    """Remove seldom-used staged files based on an age/refcount score."""
    if score_threshold < 0:
        raise ValueError("score_threshold must be non-negative")
    _ensure_db()
    now = datetime.now(timezone.utc).timestamp()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        rows = conn.execute(
            """
            SELECT
                l.path,
                COALESCE(l.created_at, ?),
                COALESCE(l.last_access, l.created_at, ?),
                COALESCE(l.access_count, 0),
                COALESCE(r.ref_count, 0)
            FROM logs l
            LEFT JOIN refs r ON l.path = r.path
            """,
            (now, now),
        ).fetchall()
    removed: list[Path] = []
    for path_str, created_at, last_access, access_count, ref_count in rows:
        path = Path(path_str)
        age = now - (last_access or created_at or now)
        score = age / ((ref_count + 1) * (access_count + 1))
        if score < score_threshold:
            continue
        if path.exists():
            try:
                path.unlink()
            except OSError:
                continue
        removed.append(path)
    _delete_records(removed)
    return removed


def release_reference(path: Path, delete_file: bool = True) -> bool:
    """Decrease the reference count and optionally delete the file when it hits zero."""
    normalized = _normalize_paths([path])[0]
    _ensure_db()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        row = conn.execute(
            "SELECT ref_count FROM refs WHERE path = ?", (normalized,)
        ).fetchone()
        if row is None:
            return False
        current = row[0] or 0
        new_count = max(current - 1, 0)
        if new_count > 0:
            conn.execute(
                "UPDATE refs SET ref_count = ? WHERE path = ?", (new_count, normalized)
            )
            conn.commit()
            return False
        conn.execute("DELETE FROM refs WHERE path = ?", (normalized,))
        conn.execute("DELETE FROM logs WHERE path = ?", (normalized,))
        conn.commit()
    removed = False
    if delete_file and path.exists():
        try:
            path.unlink()
            removed = True
        except OSError:
            removed = False
    return removed


def tracked_files() -> Dict[str, int]:
    """Return a mapping of tracked file paths to their reference counts."""
    _ensure_db()
    with sqlite3.connect(LOG_DB_PATH) as conn:
        rows = conn.execute("SELECT path, ref_count FROM refs").fetchall()
    return {path: ref_count for path, ref_count in rows}
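
A note on the eviction rule in `collect_garbage` above: a file is removed once `age / ((ref_count + 1) * (access_count + 1))` reaches the threshold, so both live references and recorded accesses buy a file extra idle time. Below is a minimal standalone sketch of that arithmetic. The helper name `gc_score`, the sample numbers, and the threshold are illustrative assumptions, not part of the module.

```python
def gc_score(age_seconds: float, ref_count: int, access_count: int) -> float:
    """Mirror the score computed inside collect_garbage (illustrative helper)."""
    return age_seconds / ((ref_count + 1) * (access_count + 1))


# A file idle for one hour with no references and no recorded accesses.
idle = gc_score(3600.0, ref_count=0, access_count=0)
# The same idle time, but with 1 live reference and 3 recorded accesses.
busy = gc_score(3600.0, ref_count=1, access_count=3)

threshold = 600.0  # hypothetical threshold, same units as the score
print(idle, idle >= threshold)  # score at or above the threshold: collected
print(busy, busy >= threshold)  # score below the threshold: kept
```

Because the score is never negative when the age is non-negative, a threshold of `0` matches every tracked row.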

98
seed_start.py Normal file

@ -0,0 +1,98 @@
#!/usr/bin/env python3
"""
Ultra-fast seed generator for mathstream start.txt.

Usage:
    python seed_start.py --seed 10 --mode huge

Modes:
    ur   = /dev/urandom (1 byte per step)
    ran  = Python random.randint(0,255)
    asc  = random printable ASCII ord()
    seq  = deterministic sequence 0-255 loop
    huge = massive random digit chunks (SSD-limited chaos)
"""
import argparse
import random
import time
from pathlib import Path

from mathstream import StreamNumber, add, clear_logs
from tqdm import tqdm


def archive_start_file(start_path: Path):
    """Archive old start.txt and reset to 0."""
    if start_path.exists():
        timestamp = int(time.time())
        backup = start_path.with_name(f"start.{timestamp}.txt")
        backup.write_text(start_path.read_text())
    start_path.write_text("0")


def seed_once(start_path: Path, byte_val: str):
    """Add a single number (string form) to start.txt using mathstream (streamed)."""
    current = StreamNumber(start_path)
    delta = StreamNumber(literal=byte_val)
    result = add(current, delta)
    new_value = "".join(result.stream())
    start_path.write_text(new_value)


def fast_huge_random_string(size_bytes=65536):
    """Return a huge decimal string generated from /dev/urandom bytes."""
    with open("/dev/urandom", "rb") as rnd:
        chunk = rnd.read(size_bytes)
    # Convert to digits quickly
    digits = ''.join(str(b % 10) for b in chunk)
    # Trim leading zeros so mathstream doesn't choke on '00000'
    return digits.lstrip('0') or "0"


def main():
    parser = argparse.ArgumentParser(description="Fast seeding for start.txt using mathstream")
    parser.add_argument("--seed", type=int, required=True, help="number of random additions")
    parser.add_argument("--mode", choices=["ur", "ran", "asc", "seq", "huge"], default="ur", help="random mode")
    parser.add_argument("--chunk", type=int, default=65536,
                        help="bytes per chunk for huge mode (default 64KB)")
    args = parser.parse_args()

    start_path = Path("start.txt")
    clear_logs()
    archive_start_file(start_path)
    print(f"Seeding {args.seed} iterations with mode '{args.mode}'")

    seq_val = 0
    if args.mode == "ur":
        with open("/dev/urandom", "rb") as rnd:
            for _ in tqdm(range(args.seed), desc="Seeding", unit="byte", ncols=80):
                byte_val = rnd.read(1)[0]
                seed_once(start_path, str(byte_val))
    elif args.mode == "ran":
        for _ in tqdm(range(args.seed), desc="Seeding", unit="val", ncols=80):
            seed_once(start_path, str(random.randint(0, 255)))
    elif args.mode == "asc":
        printable = [chr(i) for i in range(32, 127)]
        for _ in tqdm(range(args.seed), desc="Seeding", unit="char", ncols=80):
            seed_once(start_path, str(ord(random.choice(printable))))
    elif args.mode == "seq":
        for _ in tqdm(range(args.seed), desc="Seeding", unit="seq", ncols=80):
            seed_once(start_path, str(seq_val))
            seq_val = (seq_val + 1) % 256
    elif args.mode == "huge":
        for _ in tqdm(range(args.seed), desc="Seeding", unit="huge", ncols=80):
            huge_str = fast_huge_random_string(args.chunk)
            seed_once(start_path, huge_str)

    print(f"\nFinal start.txt value: {start_path.read_text().strip()}")


if __name__ == "__main__":
    main()
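
One caveat about `fast_huge_random_string`: mapping each byte with `b % 10` slightly favours the digits 0 to 5, because 256 is not a multiple of 10. If uniform digits matter, one option is rejection sampling. The sketch below is only an illustration: it uses `os.urandom` instead of opening `/dev/urandom`, and the helper name `uniform_digits` is hypothetical and does not appear in the script.

```python
import os
from collections import Counter

# Count how often each digit is produced by b % 10 over all 256 byte values.
bias = Counter(b % 10 for b in range(256))
print(sorted(bias.items()))  # digits 0-5 occur 26 times, digits 6-9 occur 25 times


def uniform_digits(n: int) -> str:
    """Return n decimal digits, discarding bytes >= 250 to avoid modulo bias."""
    out = []
    while len(out) < n:
        for b in os.urandom(n):
            # 250 is the largest multiple of 10 that fits in one byte's range.
            if b < 250 and len(out) < n:
                out.append(str(b % 10))
    return "".join(out)


print(len(uniform_digits(32)))
```

Rejecting about 2.3% of bytes (6 out of 256) costs a few extra reads and removes the skew.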

1
start.txt Normal file

@ -0,0 +1 @@
55569392576944383732069997790263232211253447162098935262971634652345115098934212633724484589756741539606575

139
test.py Normal file

@ -0,0 +1,139 @@
from __future__ import annotations

from pathlib import Path

from mathstream import (
    StreamNumber,
    add,
    sub,
    mul,
    div,
    mod,
    pow,
    is_even,
    is_odd,
    clear_logs,
    collect_garbage,
    DivideByZeroError,
    active_streams,
    tracked_files,
)

NUMBERS_DIR = Path(__file__).parent / "tests"


def write_number(name: str, digits: str) -> StreamNumber:
    """Persist digits to disk and return a streamable handle."""
    NUMBERS_DIR.mkdir(parents=True, exist_ok=True)
    target = NUMBERS_DIR / f"{name}.txt"
    target.write_text(digits, encoding="utf-8")
    return StreamNumber(target)


def read_number(num: StreamNumber) -> str:
    """Collapse streamed chunks back into a concrete string."""
    return "".join(num.stream())


def check(label: str, result: StreamNumber, expected: str) -> None:
    actual = read_number(result)
    assert (
        actual == expected
    ), f"{label} expected {expected}, got {actual}"
    print(f"{label} = {actual}")


def check_bool(label: str, value: bool, expected: bool) -> None:
    assert value is expected, f"{label} expected {expected}, got {value}"
    print(f"{label} = {value}")


def main() -> None:
    clear_logs()

    # Build a handful of example operands on disk.
    big = write_number("huge", "98765432123456789")
    small = write_number("tiny", "34567")
    negative = write_number("negative", "-1200")
    exponent = write_number("power", "5")
    negative_divisor = write_number("neg_divisor", "-34567")
    literal_even = StreamNumber(literal="2000")
    literal_odd = StreamNumber(literal="-3")
    zero_literal = StreamNumber(literal="0")

    # Showcase the core operations.
    total = add(big, small)
    difference = sub(big, small)
    product = mul(small, negative)
    quotient = div(big, small)
    powered = pow(small, exponent)
    modulus = mod(big, small)
    neg_mod_pos = mod(negative, small)
    pos_mod_neg = mod(small, negative)
    neg_mod_neg = mod(negative, negative_divisor)
    literal_combo = add(literal_even, literal_odd)

    print("Operands stored under:", NUMBERS_DIR)
    check("huge + tiny", total, "98765432123491356")
    check("huge - tiny", difference, "98765432123422222")
    check("tiny * negative", product, "-41480400")
    check("huge // tiny", quotient, "2857217349595")
    check("tiny ** power", powered, "49352419431622775997607")
    check("huge % tiny", modulus, "6424")
    check("negative % tiny", neg_mod_pos, "33367")
    check("tiny % negative", pos_mod_neg, "-233")
    check("negative % neg_divisor", neg_mod_neg, "-1200")
    check("literal_even + literal_odd", literal_combo, "1997")
    check_bool("is_even(negative)", is_even(negative), True)
    check_bool("is_even(tiny)", is_even(small), False)
    check_bool("is_odd(tiny)", is_odd(small), True)
    check_bool("is_odd(negative)", is_odd(negative), False)
    check_bool("is_even(literal_even)", is_even(literal_even), True)
    check_bool("is_odd(literal_odd)", is_odd(literal_odd), True)

    # Custom exception coverage
    try:
        div(literal_even, zero_literal)
    except DivideByZeroError:
        print("div(literal_even, zero_literal) raised DivideByZeroError as expected")
    else:
        raise AssertionError("div by zero did not raise DivideByZeroError")

    try:
        mod(literal_even, zero_literal)
    except DivideByZeroError:
        print("mod(literal_even, zero_literal) raised DivideByZeroError as expected")
    else:
        raise AssertionError("mod by zero did not raise DivideByZeroError")

    # manual frees should immediately drop staged files
    staged = [
        total,
        difference,
        product,
        quotient,
        powered,
        modulus,
        neg_mod_pos,
        pos_mod_neg,
        neg_mod_neg,
        literal_combo,
    ]
    for stream in staged:
        stream.free()
    literal_even.free()
    literal_odd.free()
    zero_literal.free()

    check_bool("total freed file gone", total.path.exists(), False)
    check_bool("literal_even freed file gone", literal_even.path.exists(), False)

    removed = collect_garbage(0)
    print(f"collect_garbage removed {len(removed)} files after manual free")
    check_bool("huge operand persists", big.path.exists(), True)
    print("Active streams:", active_streams())
    print("Tracked files:", tracked_files())


if __name__ == "__main__":
    main()
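
The expected values for `div` and `mod` in the checks above agree with Python's floor division and modulo on built-in integers, where the remainder takes the sign of the divisor. The snippet below cross-checks them with plain `int` arithmetic. It does not import `mathstream`; it only confirms the reference values used in the assertions.

```python
huge, tiny, negative, neg_divisor = 98765432123456789, 34567, -1200, -34567

# Floor division and the matching remainder.
assert huge // tiny == 2857217349595
assert huge % tiny == 6424

# With mixed signs the remainder follows the divisor's sign.
assert negative % tiny == 33367
assert tiny % negative == -233
assert negative % neg_divisor == -1200

# divmod ties the two together: a == q * b + r.
q, r = divmod(tiny, negative)
assert q * negative + r == tiny
print(q, r)
```

If `mathstream` ever switched to truncating division, the mixed-sign cases (`negative % tiny` and `tiny % negative`) are the ones whose expected values would change.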

1
tests/huge.txt Normal file

@ -0,0 +1 @@
98765432123456789

1
tests/neg_divisor.txt Normal file

@ -0,0 +1 @@
-34567

1
tests/negative.txt Normal file

@ -0,0 +1 @@
-1200

1
tests/power.txt Normal file

@ -0,0 +1 @@
5


@ -0,0 +1 @@
993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887993284887

1
tests/tiny.txt Normal file

@ -0,0 +1 @@
34567