Compare commits

14 Commits
dev ... develop

Author SHA1 Message Date
Alfredo Di Stasio
b3c301e69e Merge branch 'feature/fix-invalid-datetime-sorting' into develop 2026-04-27 15:08:46 +02:00
Alfredo Di Stasio
a2ab2674e3 Keep invalid datetimes at end of sort 2026-04-27 15:08:34 +02:00
Alfredo Di Stasio
3e370c25b6 Merge branch 'feature/harden-secret-key-config' into develop 2026-04-27 14:24:18 +02:00
Alfredo Di Stasio
41c63980f0 Harden secret key configuration 2026-04-27 14:23:13 +02:00
Alfredo Di Stasio
846a22c047 Merge branch 'feature/output-cleanup-policy' into develop 2026-04-27 14:18:03 +02:00
Alfredo Di Stasio
b8069d6771 Add output cleanup policy 2026-04-27 14:17:44 +02:00
Alfredo Di Stasio
93cebeb002 Merge branch 'feature/reduce-memory-footprint' into develop 2026-04-27 12:42:38 +02:00
Alfredo Di Stasio
f9f792f6a1 Reduce conversion memory footprint 2026-04-27 11:44:40 +02:00
Alfredo Di Stasio
9313b54abb Move compose settings to env file 2026-04-27 09:48:55 +02:00
Alfredo Di Stasio
15240aee59 Increase compose upload limit 2026-04-24 15:14:01 +02:00
Alfredo Di Stasio
235aa47dd3 Harden parser for malformed multiline records 2026-04-24 15:12:51 +02:00
Alfredo Di Stasio
f64deb9c0d Improve log upload handling 2026-04-24 15:00:43 +02:00
Alfredo Di Stasio
e793b51e4f Merge branch 'feature/flask-waf-log-converter' into develop 2026-04-24 14:41:00 +02:00
Alfredo Di Stasio
355d61f11f Build Flask WAF log converter app 2026-04-24 14:40:32 +02:00
27 changed files with 1984 additions and 1 deletion

6
.dockerignore Normal file
View File

@@ -0,0 +1,6 @@
.git
.pytest_cache
__pycache__
*.pyc
instance
.venv

7
.gitignore vendored Normal file
View File

@@ -0,0 +1,7 @@
__pycache__/
.pytest_cache/
*.pyc
*.pyo
*.egg-info/
.venv/
instance/

37
Dockerfile Normal file
View File

@@ -0,0 +1,37 @@
FROM python:3.12-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY pyproject.toml README.md ./
COPY app ./app
COPY tests ./tests
COPY wsgi.py ./
RUN useradd --create-home appuser && \
mkdir -p /app/instance/outputs && \
chown -R appuser:appuser /app
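# Production stage: install only the runtime dependencies and serve the app with Gunicorn as the non-root user.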
FROM base AS production
ENV OUTPUT_DIRECTORY=/app/instance/outputs
RUN pip install --no-cache-dir .
USER appuser
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "--workers", "2", "--threads", "4", "wsgi:app"]
FROM base AS test
ENV OUTPUT_DIRECTORY=/app/instance/outputs
RUN pip install --no-cache-dir ".[dev]"
USER appuser
CMD ["python", "-m", "pytest"]

168
README.md
View File

@@ -1,3 +1,169 @@
# webfortilog
Flask-based web application that converts WAF log files into aligned text reports or CSV exports.
## Features
- Upload a log file (UTF-8, UTF-8 with BOM, Windows-1252, or Latin-1) where each line is a single record
- Parse shell-style `key=value` and `key="value with spaces"` tokens, as shown in the example record after this list
- Support `vendor` mode with fixed columns and `full` mode with dynamic columns
- Filter by policy and severity with case-sensitive or case-insensitive partial matching
- Sort by combined datetime or severity ranking
- Preview results in the browser and download the generated file
- Run locally with Flask or in Docker with Gunicorn
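A single input record looks like the following line (adapted from the repository's test fixtures; the field values are illustrative):
```text
v015xxxxdate=2024-05-01 time=10:00:00 policy="Prod Policy" http_method=GET http_host=example.com http_url="/login" msg="SQL injection blocked" action=blocked severity_level=high
```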
## Project structure
```text
app/
services/
templates/
tests/
Dockerfile
pyproject.toml
wsgi.py
```
## Local usage
### Requirements
- Python 3.12
### Install
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
### Run
```bash
export FLASK_APP=wsgi.py
export APP_ENV=development
export MAX_UPLOAD_SIZE_MB=100
flask run --debug
```
Open `http://127.0.0.1:5000`.
### Example input file
If you have a local WAF export such as `attack_download.log`, you can use it as a real example upload.
- Example file: `attack_download.log`
- Approximate size in the current workspace: `98.5 MiB`
- The default `MAX_UPLOAD_SIZE_MB=100` setting is sized to accept a file of that size
### Test
```bash
pytest
```
## Docker usage
### Build
```bash
docker build -t webfortilog .
```
### Run
```bash
docker run --rm -p 8000:8000 -e APP_ENV=development -e MAX_UPLOAD_SIZE_MB=100 webfortilog
```
Open `http://127.0.0.1:8000`.
## Docker Compose usage
### Start the web app
```bash
docker compose up --build web
```
Compose settings are stored in `env`. Update that file to change values such as:
- `SECRET_KEY`
- `APP_ENV`
- `MAX_UPLOAD_SIZE_MB`
- `OUTPUT_DIRECTORY`
- `OUTPUT_RETENTION_HOURS`
- `CLEANUP_ON_STARTUP`
- `CLEANUP_AFTER_DOWNLOAD`
For local Docker Compose usage, `APP_ENV=development` allows an internal development-only fallback secret key.
For production-like environments, set a strong `SECRET_KEY` explicitly.
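As a sketch, an `env` file for local Compose usage might look like this (values are illustrative; the repository's `env` file holds the actual defaults):
```text
APP_ENV=development
MAX_UPLOAD_SIZE_MB=100
OUTPUT_DIRECTORY=/app/instance/outputs
OUTPUT_RETENTION_HOURS=24
CLEANUP_ON_STARTUP=true
CLEANUP_AFTER_DOWNLOAD=false
```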
### Run the test suite in a container
```bash
docker compose run --rm test
```
## Example usage
### Browser upload
1. Start the app with `flask run --debug` or `docker compose up --build web`
2. Open the web UI
3. Upload `attack_download.log`
4. Try `vendor` mode with `text` output for a readable preview
5. Try `full` mode with `csv` output for complete export coverage
### Command-line upload example
```bash
curl -X POST http://127.0.0.1:5000/convert \
-F "log_file=@attack_download.log" \
-F "mode=vendor" \
-F "output_format=text" \
-F "sort_by=datetime" \
-F "order=asc" \
-F "policy_cs=" \
-F "policy_ci=" \
-F "severity_cs=" \
-F "severity_ci="
```
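The response is the HTML result page containing a `/download/<result_id>` link. As a sketch, once the id is taken from that page, the generated file can be fetched with curl (the `<result_id>` below is a placeholder):
```bash
curl -OJ http://127.0.0.1:5000/download/<result_id>
```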
## Notes
- Temporary output files are written to `instance/outputs`
- Generated files are cleaned up according to the configured output retention policy
- The application does not require a database
- Gunicorn is used as the production WSGI server
- Parsing and export writing are streamed to reduce memory usage on large uploads
- Sorting still materializes the filtered record set because global ordering by datetime or severity requires the full filtered input
- Default upload limit is 100 MiB
- Set `MAX_UPLOAD_SIZE_MB` to configure the upload limit in megabytes
- `MAX_CONTENT_LENGTH` is also supported as a lower-level byte-based override
- `SECRET_KEY` is required in production-like environments and must not use placeholder values such as `change-me`
- Development-only fallback secret key behavior is enabled only when `APP_ENV=development` or `FLASK_ENV=development`
- `OUTPUT_RETENTION_HOURS` controls how long generated output files are kept
- `CLEANUP_ON_STARTUP=true` removes expired generated files when the app starts
- `CLEANUP_AFTER_DOWNLOAD=true` deletes a result only after the response finishes sending
## Secure configuration example
### Production-like environment
```bash
python3 - <<'PY'
import secrets
print(secrets.token_urlsafe(48))
PY
```
Use the generated value as `SECRET_KEY`, for example:
```bash
docker run --rm -p 8000:8000 \
-e SECRET_KEY='replace-with-a-long-random-secret' \
-e MAX_UPLOAD_SIZE_MB=100 \
webfortilog
```
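For Docker Compose, the same value can instead be placed in the `env` file read by `compose.yaml`, for example (placeholder shown, generate your own value):
```text
APP_ENV=production
SECRET_KEY=replace-with-a-long-random-secret
```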

49
app/__init__.py Normal file
View File

@@ -0,0 +1,49 @@
from pathlib import Path
from flask import Flask, flash, render_template
from werkzeug.exceptions import RequestEntityTooLarge
from app.config import Config, validate_secret_key
from app.routes import main_blueprint
from app.services.storage import cleanup_expired_outputs
def _format_size_limit(size_limit_bytes: int) -> str:
"""Render the upload limit in a friendly unit for error messages."""
if size_limit_bytes >= 1024 * 1024:
return f"{size_limit_bytes / (1024 * 1024):.0f} MB"
if size_limit_bytes >= 1024:
return f"{size_limit_bytes / 1024:.0f} KB"
return f"{size_limit_bytes} bytes"
def create_app(config_class: type[Config] = Config) -> Flask:
"""Application factory used by Flask and Gunicorn."""
app = Flask(__name__, instance_relative_config=True)
app.config.from_object(config_class)
validate_secret_key(app.config["SECRET_KEY"])
output_dir = Path(app.config["OUTPUT_DIRECTORY"])
if not output_dir.is_absolute():
output_dir = Path(app.instance_path) / output_dir
app.config["OUTPUT_DIRECTORY"] = output_dir
output_dir.mkdir(parents=True, exist_ok=True)
if app.config.get("CLEANUP_ON_STARTUP", False):
cleanup_expired_outputs(
output_dir=output_dir,
retention_hours=app.config.get("OUTPUT_RETENTION_HOURS", 24),
)
app.register_blueprint(main_blueprint)
@app.errorhandler(RequestEntityTooLarge)
def handle_file_too_large(_error):
size_limit_bytes = int(app.config["MAX_CONTENT_LENGTH"])
flash(
f"The uploaded file is too large. Maximum allowed size is {_format_size_limit(size_limit_bytes)}.",
"danger",
)
return render_template("index.html"), 413
return app

83
app/config.py Normal file
View File

@@ -0,0 +1,83 @@
import os
from pathlib import Path
DEVELOPMENT_SECRET_KEY = "dev-secret-key-change-me"
UNSAFE_SECRET_KEYS = {
"",
"change-me",
"dev-secret-key-change-me",
"secret",
"default",
}
def _get_bool_setting(name: str, default: bool) -> bool:
"""Parse conventional boolean environment values."""
value = os.environ.get(name)
if value is None:
return default
return value.strip().lower() in {"1", "true", "yes", "on"}
def _get_max_content_length() -> int:
"""Resolve the upload size limit from environment settings."""
upload_limit_mb = os.environ.get("MAX_UPLOAD_SIZE_MB")
if upload_limit_mb:
return int(upload_limit_mb) * 1024 * 1024
max_content_length = os.environ.get("MAX_CONTENT_LENGTH")
if max_content_length:
return int(max_content_length)
return 100 * 1024 * 1024
def _get_app_env() -> str:
"""Resolve the effective application environment."""
return (
os.environ.get("APP_ENV")
or os.environ.get("FLASK_ENV")
or "production"
).strip().lower()
def _is_development_env() -> bool:
"""Return whether the app is explicitly running in development mode."""
return _get_app_env() == "development"
def _get_secret_key() -> str:
"""Resolve the secret key with a development-only fallback."""
secret_key = os.environ.get("SECRET_KEY", "").strip()
if secret_key:
return secret_key
if _is_development_env():
return DEVELOPMENT_SECRET_KEY
return ""
def validate_secret_key(secret_key: str) -> None:
"""Fail fast when a production-like environment uses an unsafe secret key."""
normalized = secret_key.strip()
if _is_development_env():
return
if normalized.lower() in UNSAFE_SECRET_KEYS:
raise RuntimeError(
"SECRET_KEY is missing or unsafe for a production-like environment. "
"Set SECRET_KEY to a long random value, or use APP_ENV=development only for local development."
)
class Config:
"""Default configuration for local and container usage."""
SECRET_KEY = _get_secret_key()
# Default to 100 MiB so larger WAF exports can be processed without tuning.
MAX_CONTENT_LENGTH = _get_max_content_length()
PREVIEW_RECORD_LIMIT = int(os.environ.get("PREVIEW_RECORD_LIMIT", 5))
OUTPUT_DIRECTORY = Path(
os.environ.get("OUTPUT_DIRECTORY", Path("instance") / "outputs")
)
OUTPUT_RETENTION_HOURS = int(os.environ.get("OUTPUT_RETENTION_HOURS", 24))
CLEANUP_ON_STARTUP = _get_bool_setting("CLEANUP_ON_STARTUP", True)
CLEANUP_AFTER_DOWNLOAD = _get_bool_setting("CLEANUP_AFTER_DOWNLOAD", False)

35
app/constants.py Normal file
View File

@@ -0,0 +1,35 @@
VENDOR_FIELDS = [
"v015xxxxdate",
"time",
"policy",
"http_method",
"http_host",
"http_url",
"http_refer",
"service",
"backend_service",
"msg",
"signature_subclass",
"signature_id",
"owasp_top10",
"match_location",
"action",
"severity_level",
]
SEVERITY_RANKING = {
"critical": 5,
"high": 4,
"medium": 3,
"low": 2,
"info": 1,
"informational": 1,
"unknown": 0,
"none": 0,
"n/a": 0,
}
SORTABLE_FIELDS = {"datetime", "severity"}
SORT_ORDERS = {"asc", "desc"}
MODES = {"vendor", "full"}
OUTPUT_FORMATS = {"text", "csv"}

156
app/routes.py Normal file
View File

@@ -0,0 +1,156 @@
from dataclasses import dataclass
from pathlib import Path
from flask import (
Blueprint,
current_app,
flash,
redirect,
render_template,
request,
send_file,
url_for,
)
from werkzeug.datastructures import FileStorage
from werkzeug.wsgi import ClosingIterator
from app.constants import MODES, OUTPUT_FORMATS, SORTABLE_FIELDS, SORT_ORDERS
from app.services.conversion import convert_uploaded_log
from app.services.parser import LogParseError
from app.services.processing import ProcessingError, ProcessingOptions
from app.services.storage import delete_result_files, load_result_metadata
main_blueprint = Blueprint("main", __name__)
@dataclass(slots=True)
class FormData:
mode: str
output_format: str
sort_by: str
order: str
policy_cs: str
policy_ci: str
severity_cs: str
severity_ci: str
def _normalize_form() -> FormData:
return FormData(
mode=request.form.get("mode", "vendor").strip(),
output_format=request.form.get("output_format", "text").strip(),
sort_by=request.form.get("sort_by", "datetime").strip(),
order=request.form.get("order", "asc").strip(),
policy_cs=request.form.get("policy_cs", "").strip(),
policy_ci=request.form.get("policy_ci", "").strip(),
severity_cs=request.form.get("severity_cs", "").strip(),
severity_ci=request.form.get("severity_ci", "").strip(),
)
def _validate_form(file: FileStorage | None, form: FormData) -> list[str]:
errors: list[str] = []
if file is None or not file.filename:
errors.append("Please choose a log file to upload.")
if form.mode not in MODES:
errors.append("Invalid mode selection.")
if form.output_format not in OUTPUT_FORMATS:
errors.append("Invalid output format selection.")
if form.sort_by not in SORTABLE_FIELDS:
errors.append("Invalid sort field selection.")
if form.order not in SORT_ORDERS:
errors.append("Invalid sort order selection.")
if form.policy_cs and form.policy_ci:
errors.append(
"Policy filter must use either case-sensitive or case-insensitive match, not both."
)
if form.severity_cs and form.severity_ci:
errors.append(
"Severity filter must use either case-sensitive or case-insensitive match, not both."
)
return errors
@main_blueprint.get("/")
def index():
return render_template("index.html")
@main_blueprint.post("/convert")
def convert():
uploaded_file = request.files.get("log_file")
form = _normalize_form()
errors = _validate_form(uploaded_file, form)
if errors:
for error in errors:
flash(error, "danger")
return render_template("index.html", form=form), 400
assert uploaded_file is not None
try:
options = ProcessingOptions(
policy_cs=form.policy_cs,
policy_ci=form.policy_ci,
severity_cs=form.severity_cs,
severity_ci=form.severity_ci,
sort_by=form.sort_by,
order=form.order,
mode=form.mode,
)
conversion_result = convert_uploaded_log(
stream=uploaded_file.stream,
options=options,
output_dir=current_app.config["OUTPUT_DIRECTORY"],
output_format=form.output_format,
preview_record_limit=current_app.config["PREVIEW_RECORD_LIMIT"],
)
except (LogParseError, ProcessingError) as exc:
flash(str(exc), "danger")
return render_template("index.html", form=form), 400
except UnicodeDecodeError:
flash(
"The uploaded file could not be decoded. Supported encodings are UTF-8, UTF-8 with BOM, Windows-1252, and Latin-1.",
"danger",
)
return render_template("index.html", form=form), 400
return render_template(
"result.html",
result_id=conversion_result.metadata.result_id,
preview_text=conversion_result.export_result.preview(
current_app.config["PREVIEW_RECORD_LIMIT"]
),
output_format=form.output_format,
record_count=conversion_result.filtered_count,
parsed_count=conversion_result.parsed_count,
filtered_count=conversion_result.filtered_count,
mode=form.mode,
sort_by=form.sort_by,
order=form.order,
)
@main_blueprint.get("/download/<result_id>")
def download(result_id: str):
metadata = load_result_metadata(current_app.config["OUTPUT_DIRECTORY"], result_id)
if metadata is None:
flash("Requested output file could not be found.", "danger")
return redirect(url_for("main.index"))
response = send_file(
Path(metadata["file_path"]),
as_attachment=True,
download_name=metadata["download_name"],
mimetype=metadata["mimetype"],
max_age=0,
)
if current_app.config.get("CLEANUP_AFTER_DOWNLOAD", False):
output_dir = current_app.config["OUTPUT_DIRECTORY"]
response.response = ClosingIterator(
response.response,
[lambda: delete_result_files(output_dir=output_dir, result_id=result_id)],
)
return response

1
app/services/__init__.py Normal file
View File

@@ -0,0 +1 @@
"""Service layer for parsing, processing, exporting, and file storage."""

47
app/services/conversion.py Normal file
View File

@@ -0,0 +1,47 @@
from dataclasses import dataclass
from pathlib import Path
from app.services.exporter import ExportResult
from app.services.parser import create_parse_session
from app.services.processing import ProcessingOptions, filter_records, sort_records
from app.services.storage import ResultMetadata, persist_result
@dataclass(slots=True)
class ConversionResult:
metadata: ResultMetadata
export_result: ExportResult
parsed_count: int
filtered_count: int
def convert_uploaded_log(
stream,
options: ProcessingOptions,
output_dir: Path,
output_format: str,
preview_record_limit: int,
) -> ConversionResult:
"""Convert an uploaded log into a persisted export with a small in-memory preview.
Parsing, filtering, and export writing are streamed to keep memory usage low.
Sorting still materializes the filtered records because global ordering by datetime
or severity requires seeing the whole filtered result set first.
"""
parse_session = create_parse_session(stream)
sorted_records = sort_records(filter_records(parse_session.iter_records(), options), options)
metadata, export_result = persist_result(
output_dir=output_dir,
records=sorted_records,
union_keys=parse_session.union_keys(),
mode=options.mode,
output_format=output_format,
preview_record_limit=preview_record_limit,
)
return ConversionResult(
metadata=metadata,
export_result=export_result,
parsed_count=parse_session.parsed_count,
filtered_count=len(sorted_records),
)

107
app/services/exporter.py Normal file
View File

@@ -0,0 +1,107 @@
import csv
import io
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence, TextIO
from app.constants import VENDOR_FIELDS
@dataclass(slots=True)
class ExportResult:
columns: list[str]
output_format: str
preview_text: str
def preview(self, _record_limit: int) -> str:
"""Return the preview that was collected during export writing."""
return self.preview_text
def write_export(
file_path: Path,
records: Sequence[dict[str, str]],
union_keys: list[str],
mode: str,
output_format: str,
preview_record_limit: int,
) -> ExportResult:
"""Write the final export directly to disk and keep only a small preview in memory."""
columns = VENDOR_FIELDS if mode == "vendor" else union_keys
with file_path.open("w", encoding="utf-8", newline="") as export_file:
if output_format == "text":
preview_text = _write_text(
export_file=export_file,
records=records,
columns=columns,
preview_record_limit=preview_record_limit,
)
else:
preview_text = _write_csv(
export_file=export_file,
records=records,
columns=columns,
preview_record_limit=preview_record_limit,
)
return ExportResult(
columns=columns,
output_format=output_format,
preview_text=preview_text,
)
def _write_text(
export_file: TextIO,
records: Sequence[dict[str, str]],
columns: list[str],
preview_record_limit: int,
) -> str:
max_key_length = max((len(column) for column in columns), default=0)
preview_lines: list[str] = []
wrote_line = False
for index, record in enumerate(records, start=1):
header = f"--- record {index} ---"
wrote_line = _write_line(export_file, header, wrote_line)
if index <= preview_record_limit:
preview_lines.append(header)
for column in columns:
line = f" {column.ljust(max_key_length)} = {record.get(column, '')}"
wrote_line = _write_line(export_file, line, wrote_line)
if index <= preview_record_limit:
preview_lines.append(line)
return "\n".join(preview_lines)
def _write_csv(
export_file: TextIO,
records: Sequence[dict[str, str]],
columns: list[str],
preview_record_limit: int,
) -> str:
writer = csv.DictWriter(export_file, fieldnames=columns, extrasaction="ignore")
writer.writeheader()
preview_buffer = io.StringIO()
preview_writer = csv.DictWriter(preview_buffer, fieldnames=columns, extrasaction="ignore")
preview_writer.writeheader()
for index, record in enumerate(records, start=1):
row = {column: record.get(column, "") for column in columns}
writer.writerow(row)
if index <= preview_record_limit:
preview_writer.writerow(row)
return preview_buffer.getvalue().rstrip("\n")
def _write_line(export_file: TextIO, line: str, wrote_line: bool) -> bool:
"""Write lines without leaving a trailing newline at the end of the file."""
if wrote_line:
export_file.write("\n")
export_file.write(line)
return True

187
app/services/parser.py Normal file
View File

@@ -0,0 +1,187 @@
import codecs
from collections import OrderedDict
from dataclasses import dataclass, field
from io import BufferedIOBase, TextIOBase
import re
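# Every logical record starts with this token; it is also used to stitch together records split across physical lines.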
RECORD_PREFIX = "v015xxxxdate="
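# A key is an identifier followed by '=', matched only at the start of a record or immediately after whitespace.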
KEY_PATTERN = re.compile(r"(?:(?<=\s)|^)([A-Za-z_][A-Za-z0-9_]*)=")
class LogParseError(ValueError):
"""Raised when the uploaded log file cannot be parsed."""
@dataclass(slots=True)
class ParseSession:
"""Stateful streamed parser for uploaded log files."""
stream: BufferedIOBase | TextIOBase
encoding: str | None
_union_keys: OrderedDict[str, None] = field(default_factory=OrderedDict)
parsed_count: int = 0
_consumed: bool = False
def iter_records(self):
if self._consumed:
raise RuntimeError("ParseSession records can only be consumed once.")
self._consumed = True
for line_number, line in _iter_logical_records(_iter_physical_lines(self.stream, self.encoding)):
record = _parse_record(line, line_number)
for key in record:
self._union_keys.setdefault(key, None)
self.parsed_count += 1
yield record
def union_keys(self) -> list[str]:
return list(self._union_keys.keys())
def create_parse_session(stream: BufferedIOBase | TextIOBase) -> ParseSession:
"""Prepare a streamed parser session without materializing the full upload in memory."""
return ParseSession(stream=stream, encoding=_resolve_stream_encoding(stream))
def _normalize_value(value: str) -> str:
"""Remove balanced shell-style quotes while tolerating malformed values."""
value = value.strip()
if len(value) >= 2 and value[0] == value[-1] and value[0] in {'"', "'"}:
return value[1:-1]
if value[:1] in {'"', "'"}:
return value[1:]
return value
def _resolve_stream_encoding(stream: BufferedIOBase | TextIOBase) -> str | None:
"""Detect the most suitable stream encoding without reading the full file into memory."""
probe = stream.read(0)
if isinstance(probe, str):
return None
for encoding in ("utf-8-sig", "cp1252", "latin-1"):
try:
_validate_stream_encoding(stream, encoding)
return encoding
except UnicodeDecodeError:
continue
raise UnicodeDecodeError("unknown", b"", 0, 1, "Unsupported text encoding.")
def _validate_stream_encoding(stream: BufferedIOBase | TextIOBase, encoding: str) -> None:
"""Scan the stream to verify that the candidate encoding can decode it fully."""
_rewind_stream(stream)
decoder = codecs.getincrementaldecoder(encoding)()
for chunk in iter(lambda: stream.read(64 * 1024), b""):
decoder.decode(chunk, final=False)
decoder.decode(b"", final=True)
_rewind_stream(stream)
def _iter_physical_lines(
stream: BufferedIOBase | TextIOBase,
encoding: str | None,
):
"""Yield decoded physical lines from the uploaded stream without full-file buffering."""
_rewind_stream(stream)
if encoding is None:
for line_number, raw_line in enumerate(stream, start=1):
yield line_number, raw_line
return
line_number = 1
decoder = codecs.getincrementaldecoder(encoding)()
pending = ""
for chunk in iter(lambda: stream.read(64 * 1024), b""):
text = decoder.decode(chunk, final=False)
pending += text
while True:
newline_index = pending.find("\n")
if newline_index == -1:
break
line = pending[: newline_index + 1]
pending = pending[newline_index + 1 :]
yield line_number, line
line_number += 1
pending += decoder.decode(b"", final=True)
while True:
newline_index = pending.find("\n")
if newline_index == -1:
break
line = pending[: newline_index + 1]
pending = pending[newline_index + 1 :]
yield line_number, line
line_number += 1
if pending:
yield line_number, pending
def _rewind_stream(stream: BufferedIOBase | TextIOBase) -> None:
"""Move the uploaded stream back to the start."""
if not hasattr(stream, "seek"):
raise LogParseError("The uploaded file stream is not seekable.")
stream.seek(0)
def _parse_record(line: str, line_number: int) -> dict[str, str]:
"""Parse a logical record by locating `key=` boundaries instead of splitting on spaces."""
matches = list(KEY_PATTERN.finditer(line))
if not matches:
raise LogParseError(f"Line {line_number}: no key=value pairs were found.")
record: dict[str, str] = {}
for index, match in enumerate(matches):
key = match.group(1)
value_start = match.end()
value_end = matches[index + 1].start() if index + 1 < len(matches) else len(line)
raw_value = line[value_start:value_end].strip()
if raw_value and raw_value[:1] not in {'"', "'"} and any(
char.isspace() for char in raw_value
):
raise LogParseError(
f"Line {line_number}: invalid unquoted value for key '{key}'."
)
value = _normalize_value(raw_value)
record[key] = value
return record
def _iter_logical_records(physical_lines):
"""Rebuild logical records when embedded newlines split a single log entry."""
current_record: list[str] = []
current_start_line: int | None = None
for line_number, raw_line in physical_lines:
line = raw_line.strip()
if not line:
continue
if line.startswith(RECORD_PREFIX):
if current_record and current_start_line is not None:
yield current_start_line, "".join(current_record)
current_record = [line]
current_start_line = line_number
continue
if current_record:
current_record.append(line)
continue
raise LogParseError(
f"Line {line_number}: unexpected content before the first log record."
)
if current_record and current_start_line is not None:
yield current_start_line, "".join(current_record)
def parse_log_file(stream: BufferedIOBase | TextIOBase) -> tuple[list[dict[str, str]], list[str]]:
"""Compatibility helper that still materializes all parsed records when needed."""
session = create_parse_session(stream)
records = list(session.iter_records())
return records, session.union_keys()

92
app/services/processing.py Normal file
View File

@@ -0,0 +1,92 @@
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable
from app.constants import SEVERITY_RANKING
class ProcessingError(ValueError):
"""Raised when records cannot be processed according to the selected options."""
@dataclass(slots=True)
class ProcessingOptions:
policy_cs: str
policy_ci: str
severity_cs: str
severity_ci: str
sort_by: str
order: str
mode: str
def filter_records(
records: Iterable[dict[str, str]], options: ProcessingOptions
) -> Iterable[dict[str, str]]:
"""Apply user-selected filters lazily to parsed records."""
for record in records:
policy_value = record.get("policy", "")
severity_value = record.get("severity_level", "")
if options.policy_cs and options.policy_cs not in policy_value:
continue
if options.policy_ci and options.policy_ci.lower() not in policy_value.lower():
continue
if options.severity_cs and options.severity_cs not in severity_value:
continue
if options.severity_ci and options.severity_ci.lower() not in severity_value.lower():
continue
yield record
def sort_records(
records: Iterable[dict[str, str]], options: ProcessingOptions
) -> list[dict[str, str]]:
"""Sort records by datetime or severity using the requested order."""
reverse = options.order == "desc"
if options.sort_by == "datetime":
return _sort_records_by_datetime(records, reverse)
elif options.sort_by == "severity":
key_func = _severity_key
else:
raise ProcessingError("Unsupported sort field.")
return sorted(records, key=key_func, reverse=reverse)
def _sort_records_by_datetime(
records: Iterable[dict[str, str]], reverse: bool
) -> list[dict[str, str]]:
"""Sort valid datetimes normally and always place invalid/missing values last."""
valid_records: list[tuple[datetime, dict[str, str]]] = []
invalid_records: list[dict[str, str]] = []
for record in records:
parsed_datetime = _parse_datetime(record)
if parsed_datetime is None:
invalid_records.append(record)
continue
valid_records.append((parsed_datetime, record))
sorted_valid_records = sorted(valid_records, key=lambda item: item[0], reverse=reverse)
return [record for _parsed, record in sorted_valid_records] + invalid_records
def _parse_datetime(record: dict[str, str]) -> datetime | None:
date_value = record.get("v015xxxxdate", "").strip()
time_value = record.get("time", "").strip()
if not date_value or not time_value:
return None
try:
return datetime.strptime(f"{date_value} {time_value}", "%Y-%m-%d %H:%M:%S")
except ValueError:
return None
def _severity_key(record: dict[str, str]) -> tuple[int, str]:
raw_value = record.get("severity_level", "").strip().lower()
rank = SEVERITY_RANKING.get(raw_value, 0)
return (rank, raw_value)

109
app/services/storage.py Normal file
View File

@@ -0,0 +1,109 @@
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path
from app.services.exporter import ExportResult, write_export
@dataclass(slots=True)
class ResultMetadata:
result_id: str
file_path: str
download_name: str
mimetype: str
def _result_paths(output_dir: Path, result_id: str) -> tuple[Path, Path]:
"""Build the sidecar metadata and output file search pattern for a result id."""
metadata_path = output_dir / f"{result_id}.json"
return metadata_path, output_dir / f"{result_id}"
def persist_result(
output_dir: Path,
records: list[dict[str, str]],
union_keys: list[str],
mode: str,
output_format: str,
preview_record_limit: int,
) -> tuple[ResultMetadata, ExportResult]:
"""Persist generated output and sidecar metadata in a temporary directory."""
result_id = uuid.uuid4().hex
extension = "txt" if output_format == "text" else "csv"
mimetype = "text/plain; charset=utf-8" if extension == "txt" else "text/csv; charset=utf-8"
file_path = output_dir / f"{result_id}.{extension}"
metadata_path = output_dir / f"{result_id}.json"
export_result = write_export(
file_path=file_path,
records=records,
union_keys=union_keys,
mode=mode,
output_format=output_format,
preview_record_limit=preview_record_limit,
)
metadata = ResultMetadata(
result_id=result_id,
file_path=str(file_path),
download_name=f"waf-report.{extension}",
mimetype=mimetype,
)
metadata_path.write_text(json.dumps(asdict(metadata)), encoding="utf-8")
return metadata, export_result
def load_result_metadata(output_dir: Path, result_id: str) -> dict[str, str] | None:
"""Load sidecar metadata for a generated file."""
metadata_path, _base_path = _result_paths(output_dir, result_id)
if not metadata_path.exists():
return None
return json.loads(metadata_path.read_text(encoding="utf-8"))
def delete_result_files(output_dir: Path, result_id: str) -> None:
"""Delete a generated output file and its metadata sidecar if they still exist."""
metadata_path, base_path = _result_paths(output_dir, result_id)
for output_file in output_dir.glob(f"{base_path.name}.*"):
if output_file.name == metadata_path.name:
continue
output_file.unlink(missing_ok=True)
metadata_path.unlink(missing_ok=True)
def cleanup_expired_outputs(output_dir: Path, retention_hours: int) -> int:
"""Delete generated output sets older than the configured retention window."""
cutoff = datetime.now(timezone.utc) - timedelta(hours=retention_hours)
deleted_results = 0
for metadata_path in output_dir.glob("*.json"):
try:
payload = json.loads(metadata_path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
payload = {}
result_id = payload.get("result_id") or metadata_path.stem
file_path = Path(payload["file_path"]) if "file_path" in payload else None
newest_mtime = _newest_mtime(metadata_path, file_path)
if newest_mtime is None or newest_mtime >= cutoff:
continue
delete_result_files(output_dir=output_dir, result_id=result_id)
deleted_results += 1
return deleted_results
def _newest_mtime(metadata_path: Path, file_path: Path | None) -> datetime | None:
"""Return the newest modification time across the metadata and output file."""
mtimes: list[datetime] = []
if metadata_path.exists():
mtimes.append(datetime.fromtimestamp(metadata_path.stat().st_mtime, tz=timezone.utc))
if file_path is not None and file_path.exists():
mtimes.append(datetime.fromtimestamp(file_path.stat().st_mtime, tz=timezone.utc))
if not mtimes:
return None
return max(mtimes)

38
app/templates/base.html Normal file
View File

@@ -0,0 +1,38 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>WAF Log Converter</title>
<link
href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"
rel="stylesheet"
integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH"
crossorigin="anonymous"
>
</head>
<body class="bg-body-tertiary">
<main class="container py-5">
<div class="row justify-content-center">
<div class="col-lg-10">
<div class="mb-4">
<h1 class="display-6 fw-semibold">WAF Log Converter</h1>
<p class="text-secondary mb-0">
Upload a UTF-8 WAF log file and export a filtered report as readable text or CSV.
</p>
</div>
{% with messages = get_flashed_messages(with_categories=true) %}
{% if messages %}
{% for category, message in messages %}
<div class="alert alert-{{ category }}" role="alert">{{ message }}</div>
{% endfor %}
{% endif %}
{% endwith %}
{% block content %}{% endblock %}
</div>
</div>
</main>
</body>
</html>

100
app/templates/index.html Normal file
View File

@@ -0,0 +1,100 @@
{% extends "base.html" %}
{% set form = form or none %}
{% block content %}
<div class="card shadow-sm border-0">
<div class="card-body p-4">
<form method="post" action="{{ url_for('main.convert') }}" enctype="multipart/form-data" novalidate>
<div class="mb-4">
<label for="log_file" class="form-label fw-semibold">Log file</label>
<input class="form-control" id="log_file" name="log_file" type="file" required>
<div class="form-text">Each line must contain one record using shell-like key/value tokens.</div>
</div>
<div class="row g-3">
<div class="col-md-3">
<label for="mode" class="form-label">Mode</label>
<select class="form-select" id="mode" name="mode">
<option value="vendor" {% if form and form.mode == "vendor" %}selected{% endif %}>Vendor</option>
<option value="full" {% if form and form.mode == "full" %}selected{% endif %}>Full</option>
</select>
</div>
<div class="col-md-3">
<label for="output_format" class="form-label">Format</label>
<select class="form-select" id="output_format" name="output_format">
<option value="text" {% if form and form.output_format == "text" %}selected{% endif %}>Text</option>
<option value="csv" {% if form and form.output_format == "csv" %}selected{% endif %}>CSV</option>
</select>
</div>
<div class="col-md-3">
<label for="sort_by" class="form-label">Sort by</label>
<select class="form-select" id="sort_by" name="sort_by">
<option value="datetime" {% if not form or form.sort_by == "datetime" %}selected{% endif %}>Datetime</option>
<option value="severity" {% if form and form.sort_by == "severity" %}selected{% endif %}>Severity</option>
</select>
</div>
<div class="col-md-3">
<label for="order" class="form-label">Order</label>
<select class="form-select" id="order" name="order">
<option value="asc" {% if not form or form.order == "asc" %}selected{% endif %}>Ascending</option>
<option value="desc" {% if form and form.order == "desc" %}selected{% endif %}>Descending</option>
</select>
</div>
</div>
<hr class="my-4">
<div class="row g-3">
<div class="col-md-6">
<label for="policy_cs" class="form-label">Policy filter, case-sensitive</label>
<input
class="form-control"
id="policy_cs"
name="policy_cs"
type="text"
value="{{ form.policy_cs if form else '' }}"
>
</div>
<div class="col-md-6">
<label for="policy_ci" class="form-label">Policy filter, case-insensitive</label>
<input
class="form-control"
id="policy_ci"
name="policy_ci"
type="text"
value="{{ form.policy_ci if form else '' }}"
>
</div>
<div class="col-md-6">
<label for="severity_cs" class="form-label">Severity filter, case-sensitive</label>
<input
class="form-control"
id="severity_cs"
name="severity_cs"
type="text"
value="{{ form.severity_cs if form else '' }}"
>
</div>
<div class="col-md-6">
<label for="severity_ci" class="form-label">Severity filter, case-insensitive</label>
<input
class="form-control"
id="severity_ci"
name="severity_ci"
type="text"
value="{{ form.severity_ci if form else '' }}"
>
</div>
</div>
<div class="alert alert-light border mt-4 mb-0" role="note">
Use only one policy filter and one severity filter at a time. Matching happens as a partial substring.
</div>
<div class="mt-4 d-flex gap-2">
<button class="btn btn-primary" type="submit">Convert log</button>
<button class="btn btn-outline-secondary" type="reset">Reset</button>
</div>
</form>
</div>
</div>
{% endblock %}

45
app/templates/result.html Normal file
View File

@@ -0,0 +1,45 @@
{% extends "base.html" %}
{% block content %}
<div class="row g-4">
<div class="col-lg-4">
<div class="card shadow-sm border-0 h-100">
<div class="card-body">
<h2 class="h4">Result summary</h2>
<dl class="row mb-4">
<dt class="col-sm-5">Parsed records</dt>
<dd class="col-sm-7">{{ parsed_count }}</dd>
<dt class="col-sm-5">Output records</dt>
<dd class="col-sm-7">{{ filtered_count }}</dd>
<dt class="col-sm-5">Mode</dt>
<dd class="col-sm-7 text-capitalize">{{ mode }}</dd>
<dt class="col-sm-5">Format</dt>
<dd class="col-sm-7 text-uppercase">{{ output_format }}</dd>
<dt class="col-sm-5">Sort</dt>
<dd class="col-sm-7">{{ sort_by }} / {{ order }}</dd>
</dl>
<div class="d-grid gap-2">
<a class="btn btn-primary" href="{{ url_for('main.download', result_id=result_id) }}">
Download export
</a>
<a class="btn btn-outline-secondary" href="{{ url_for('main.index') }}">
Convert another file
</a>
</div>
</div>
</div>
</div>
<div class="col-lg-8">
<div class="card shadow-sm border-0">
<div class="card-body">
<div class="d-flex justify-content-between align-items-center mb-3">
<h2 class="h4 mb-0">Preview</h2>
<span class="badge text-bg-secondary">Showing up to {{ record_count if record_count < 5 else 5 }} records</span>
</div>
<pre class="bg-dark-subtle p-3 rounded small mb-0" style="white-space: pre-wrap;">{{ preview_text }}</pre>
</div>
</div>
</div>
</div>
{% endblock %}

16
compose.yaml Normal file
View File

@@ -0,0 +1,16 @@
services:
web:
build:
context: .
target: production
ports:
- "8000:8000"
env_file:
- env
test:
build:
context: .
target: test
env_file:
- env

6
env Normal file
View File

@@ -0,0 +1,6 @@
APP_ENV=development
MAX_UPLOAD_SIZE_MB=120
OUTPUT_DIRECTORY=/app/instance/outputs
OUTPUT_RETENTION_HOURS=24
CLEANUP_ON_STARTUP=true
CLEANUP_AFTER_DOWNLOAD=false

28
pyproject.toml Normal file
View File

@@ -0,0 +1,28 @@
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "webfortilog"
version = "0.1.0"
description = "Flask application to convert WAF log files into text or CSV reports."
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"Flask>=3.0,<4.0",
"gunicorn>=22.0,<24.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0,<9.0",
]
[tool.pytest.ini_options]
testpaths = ["tests"]
filterwarnings = [
"error",
]
[tool.setuptools]
packages = ["app", "app.services"]

29
tests/conftest.py Normal file
View File

@@ -0,0 +1,29 @@
import shutil
from pathlib import Path
import pytest
from app import create_app
class TestConfig:
TESTING = True
SECRET_KEY = "test-secret"
MAX_CONTENT_LENGTH = 100 * 1024 * 1024
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = "test-outputs"
OUTPUT_RETENTION_HOURS = 24
CLEANUP_ON_STARTUP = False
CLEANUP_AFTER_DOWNLOAD = False
@pytest.fixture()
def app():
flask_app = create_app(TestConfig)
yield flask_app
shutil.rmtree(Path(flask_app.instance_path) / "test-outputs", ignore_errors=True)
@pytest.fixture()
def client(app):
return app.test_client()

261
tests/test_app.py Normal file
View File

@@ -0,0 +1,261 @@
import io
import json
from pathlib import Path
from app import create_app
SAMPLE_LOG = (
'v015xxxxdate=2024-05-01 time=10:00:00 policy="Prod Policy" '
'http_method=GET http_host=example.com http_url="/login" '
'http_refer="https://ref.example" service=edge backend_service=api '
'msg="SQL injection blocked" signature_subclass=SQL signature_id=942100 '
'owasp_top10=A03 match_location=body action=blocked severity_level=high\n'
'v015xxxxdate=2024-05-02 time=11:00:00 policy="Prod Policy" '
'http_method=POST http_host=example.com http_url="/checkout" '
'http_refer="https://shop.example" service=edge backend_service=orders '
'msg="XSS blocked" signature_subclass=XSS signature_id=941100 '
'owasp_top10=A03 match_location=query action=monitored severity_level=medium\n'
)
def test_index_page_loads(client):
response = client.get("/")
assert response.status_code == 200
assert b"WAF Log Converter" in response.data
def test_convert_returns_text_preview_and_download_link(client):
log_file = io.BytesIO(SAMPLE_LOG.encode("utf-8"))
response = client.post(
"/convert",
data={
"mode": "vendor",
"output_format": "text",
"sort_by": "severity",
"order": "desc",
"policy_cs": "",
"policy_ci": "prod",
"severity_cs": "",
"severity_ci": "",
"log_file": (log_file, "sample.log"),
},
content_type="multipart/form-data",
)
log_file.close()
assert response.status_code == 200
assert b"Download export" in response.data
assert b"--- record 1 ---" in response.data
response.close()
def test_convert_full_mode_csv_preserves_union_order(client):
log_file = io.BytesIO(SAMPLE_LOG.encode("utf-8"))
response = client.post(
"/convert",
data={
"mode": "full",
"output_format": "csv",
"sort_by": "datetime",
"order": "asc",
"policy_cs": "",
"policy_ci": "",
"severity_cs": "",
"severity_ci": "",
"log_file": (log_file, "sample.log"),
},
content_type="multipart/form-data",
)
log_file.close()
assert response.status_code == 200
assert b"TEXT" not in response.data
assert b"Download export" in response.data
response.close()
def test_convert_rejects_mutually_exclusive_filters(client):
log_file = io.BytesIO(SAMPLE_LOG.encode("utf-8"))
response = client.post(
"/convert",
data={
"mode": "vendor",
"output_format": "csv",
"sort_by": "datetime",
"order": "asc",
"policy_cs": "A",
"policy_ci": "a",
"severity_cs": "",
"severity_ci": "",
"log_file": (log_file, "sample.log"),
},
content_type="multipart/form-data",
)
log_file.close()
assert response.status_code == 400
assert b"Policy filter must use either case-sensitive or case-insensitive match" in response.data
response.close()
def test_download_route_returns_generated_file(client):
log_file = io.BytesIO(SAMPLE_LOG.encode("utf-8"))
convert_response = client.post(
"/convert",
data={
"mode": "vendor",
"output_format": "csv",
"sort_by": "datetime",
"order": "asc",
"policy_cs": "",
"policy_ci": "",
"severity_cs": "",
"severity_ci": "",
"log_file": (log_file, "sample.log"),
},
content_type="multipart/form-data",
)
log_file.close()
html = convert_response.data.decode("utf-8")
marker = '/download/'
start = html.index(marker) + len(marker)
end = html.index('"', start)
result_id = html[start:end]
download_response = client.get(f"/download/{result_id}")
assert download_response.status_code == 200
assert download_response.headers["Content-Type"].startswith("text/csv")
assert b"v015xxxxdate,time,policy" in download_response.data
convert_response.close()
download_response.close()
def test_download_route_can_cleanup_files_after_download(tmp_path):
class CleanupAfterDownloadConfig:
TESTING = True
SECRET_KEY = "test-secret"
MAX_CONTENT_LENGTH = 100 * 1024 * 1024
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = tmp_path / "download-cleanup-outputs"
OUTPUT_RETENTION_HOURS = 24
CLEANUP_ON_STARTUP = False
CLEANUP_AFTER_DOWNLOAD = True
app = create_app(CleanupAfterDownloadConfig)
client = app.test_client()
log_file = io.BytesIO(SAMPLE_LOG.encode("utf-8"))
convert_response = client.post(
"/convert",
data={
"mode": "vendor",
"output_format": "csv",
"sort_by": "datetime",
"order": "asc",
"policy_cs": "",
"policy_ci": "",
"severity_cs": "",
"severity_ci": "",
"log_file": (log_file, "sample.log"),
},
content_type="multipart/form-data",
)
log_file.close()
html = convert_response.data.decode("utf-8")
marker = "/download/"
start = html.index(marker) + len(marker)
end = html.index('"', start)
result_id = html[start:end]
metadata_path = Path(app.config["OUTPUT_DIRECTORY"]) / f"{result_id}.json"
download_response = client.get(f"/download/{result_id}")
download_response.close()
convert_response.close()
assert not metadata_path.exists()
def test_cleanup_on_startup_removes_expired_outputs(tmp_path):
output_dir = tmp_path / "startup-cleanup-outputs"
output_dir.mkdir(parents=True)
result_id = "expired-result"
file_path = output_dir / f"{result_id}.csv"
metadata_path = output_dir / f"{result_id}.json"
file_path.write_text("header\nvalue\n", encoding="utf-8")
metadata_path.write_text(
json.dumps(
{
"result_id": result_id,
"file_path": str(file_path),
"download_name": "waf-report.csv",
"mimetype": "text/csv; charset=utf-8",
}
),
encoding="utf-8",
)
    old_timestamp = 946684800
    import os
    os.utime(file_path, (old_timestamp, old_timestamp))
    os.utime(metadata_path, (old_timestamp, old_timestamp))
class StartupCleanupConfig:
TESTING = True
SECRET_KEY = "test-secret"
MAX_CONTENT_LENGTH = 100 * 1024 * 1024
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = output_dir
OUTPUT_RETENTION_HOURS = 1
CLEANUP_ON_STARTUP = True
CLEANUP_AFTER_DOWNLOAD = False
create_app(StartupCleanupConfig)
assert not file_path.exists()
assert not metadata_path.exists()
def test_default_upload_limit_is_100_mib(app):
assert app.config["MAX_CONTENT_LENGTH"] == 100 * 1024 * 1024
def test_too_large_upload_returns_friendly_message(tmp_path):
class SmallLimitConfig:
TESTING = True
SECRET_KEY = "test-secret"
MAX_CONTENT_LENGTH = 128
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = tmp_path / "tiny-limit-outputs"
app = create_app(SmallLimitConfig)
client = app.test_client()
log_file = io.BytesIO(SAMPLE_LOG.encode("utf-8"))
response = client.post(
"/convert",
data={
"mode": "vendor",
"output_format": "text",
"sort_by": "datetime",
"order": "asc",
"policy_cs": "",
"policy_ci": "",
"severity_cs": "",
"severity_ci": "",
"log_file": (log_file, "sample.log"),
},
content_type="multipart/form-data",
)
log_file.close()
assert response.status_code == 413
assert b"Maximum allowed size is 128 bytes." in response.data
response.close()

100
tests/test_config.py Normal file
View File

@@ -0,0 +1,100 @@
import pytest
from app import create_app
from app.config import (
DEVELOPMENT_SECRET_KEY,
_get_max_content_length,
_get_secret_key,
validate_secret_key,
)
def test_max_upload_size_mb_environment_variable(monkeypatch):
monkeypatch.setenv("MAX_UPLOAD_SIZE_MB", "42")
monkeypatch.delenv("MAX_CONTENT_LENGTH", raising=False)
assert _get_max_content_length() == 42 * 1024 * 1024
def test_max_content_length_environment_variable_is_supported(monkeypatch):
monkeypatch.delenv("MAX_UPLOAD_SIZE_MB", raising=False)
monkeypatch.setenv("MAX_CONTENT_LENGTH", "2048")
assert _get_max_content_length() == 2048
def test_secret_key_uses_development_fallback(monkeypatch):
monkeypatch.setenv("APP_ENV", "development")
monkeypatch.delenv("FLASK_ENV", raising=False)
monkeypatch.delenv("SECRET_KEY", raising=False)
assert _get_secret_key() == DEVELOPMENT_SECRET_KEY
def test_secret_key_is_required_outside_development(monkeypatch):
monkeypatch.setenv("APP_ENV", "production")
monkeypatch.delenv("FLASK_ENV", raising=False)
monkeypatch.delenv("SECRET_KEY", raising=False)
assert _get_secret_key() == ""
def test_validate_secret_key_rejects_unsafe_value_outside_development(monkeypatch):
monkeypatch.setenv("APP_ENV", "production")
monkeypatch.delenv("FLASK_ENV", raising=False)
with pytest.raises(RuntimeError, match="SECRET_KEY is missing or unsafe"):
validate_secret_key("change-me")
def test_create_app_allows_development_without_explicit_secret_key(tmp_path, monkeypatch):
monkeypatch.setenv("APP_ENV", "development")
monkeypatch.delenv("FLASK_ENV", raising=False)
monkeypatch.delenv("SECRET_KEY", raising=False)
class DevelopmentConfig:
SECRET_KEY = DEVELOPMENT_SECRET_KEY
MAX_CONTENT_LENGTH = 1024
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = tmp_path / "dev-outputs"
OUTPUT_RETENTION_HOURS = 24
CLEANUP_ON_STARTUP = False
CLEANUP_AFTER_DOWNLOAD = False
app = create_app(DevelopmentConfig)
assert app.config["SECRET_KEY"] == DEVELOPMENT_SECRET_KEY
def test_create_app_rejects_unsafe_secret_key_outside_development(tmp_path, monkeypatch):
monkeypatch.setenv("APP_ENV", "production")
monkeypatch.delenv("FLASK_ENV", raising=False)
class ProductionConfig:
SECRET_KEY = "change-me"
MAX_CONTENT_LENGTH = 1024
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = tmp_path / "prod-outputs"
OUTPUT_RETENTION_HOURS = 24
CLEANUP_ON_STARTUP = False
CLEANUP_AFTER_DOWNLOAD = False
with pytest.raises(RuntimeError, match="SECRET_KEY is missing or unsafe"):
create_app(ProductionConfig)
def test_create_app_rejects_missing_secret_key_outside_development(tmp_path, monkeypatch):
monkeypatch.setenv("APP_ENV", "production")
monkeypatch.delenv("FLASK_ENV", raising=False)
class ProductionConfig:
SECRET_KEY = ""
MAX_CONTENT_LENGTH = 1024
PREVIEW_RECORD_LIMIT = 5
OUTPUT_DIRECTORY = tmp_path / "prod-outputs-missing-key"
OUTPUT_RETENTION_HOURS = 24
CLEANUP_ON_STARTUP = False
CLEANUP_AFTER_DOWNLOAD = False
with pytest.raises(RuntimeError, match="SECRET_KEY is missing or unsafe"):
create_app(ProductionConfig)

92
tests/test_parser.py Normal file
View File

@@ -0,0 +1,92 @@
import io
import pytest
from app.services.parser import LogParseError, parse_log_file
def test_parse_log_file_supports_shell_style_quotes():
stream = io.BytesIO(
b'v015xxxxdate=2024-02-15 time=09:10:11 policy="Strict Policy" msg="blocked request"\n'
)
records, union_keys = parse_log_file(stream)
assert records == [
{
"v015xxxxdate": "2024-02-15",
"time": "09:10:11",
"policy": "Strict Policy",
"msg": "blocked request",
}
]
assert union_keys == ["v015xxxxdate", "time", "policy", "msg"]
def test_parse_log_file_rejects_tokens_without_equals():
stream = io.BytesIO(b"v015xxxxdate=2024-02-15 broken-token\n")
with pytest.raises(LogParseError):
parse_log_file(stream)
def test_parse_log_file_supports_utf8_bom():
stream = io.BytesIO(
b'\xef\xbb\xbfv015xxxxdate=2024-02-15 time=09:10:11 msg="blocked request"\n'
)
records, _union_keys = parse_log_file(stream)
assert records[0]["v015xxxxdate"] == "2024-02-15"
def test_parse_log_file_supports_cp1252_text():
stream = io.BytesIO(
'v015xxxxdate=2024-02-15 time=09:10:11 msg="caf\xe9 request"\n'.encode("cp1252")
)
records, _union_keys = parse_log_file(stream)
assert records[0]["msg"] == "cafe request".replace("e", "é", 1)
def test_parse_log_file_tolerates_unterminated_quotes():
stream = io.BytesIO(
b'v015xxxxdate=2024-02-15 time=09:10:11 msg="broken quoted value\n'
)
records, _union_keys = parse_log_file(stream)
assert records[0]["msg"] == "broken quoted value"
def test_parse_log_file_rebuilds_record_after_embedded_newlines():
stream = io.BytesIO(
b'v015xxxxdate=2024-02-15 time=09:10:11 msg="hello\n'
b'broken-fragment\n'
b'world" action=Alert\n'
b'v015xxxxdate=2024-02-15 time=09:10:12 msg="next" action=Monitor\n'
)
records, _union_keys = parse_log_file(stream)
assert len(records) == 2
assert records[0]["msg"] == "hellobroken-fragmentworld"
assert records[0]["action"] == "Alert"
assert records[1]["msg"] == "next"
def test_parse_log_file_does_not_require_full_stream_read():
class NoFullReadBytesIO(io.BytesIO):
def read(self, size=-1):
if size == -1:
raise AssertionError("full stream read should not be used")
return super().read(size)
stream = NoFullReadBytesIO(
b'v015xxxxdate=2024-02-15 time=09:10:11 policy="Strict Policy" msg="blocked request"\n'
)
records, _union_keys = parse_log_file(stream)
assert records[0]["policy"] == "Strict Policy"

108
tests/test_processing.py Normal file
View File

@@ -0,0 +1,108 @@
from app.services.processing import ProcessingOptions, filter_records, sort_records
def test_filter_records_supports_case_insensitive_filters():
records = [
{"policy": "ProdPolicy", "severity_level": "HIGH"},
{"policy": "OtherPolicy", "severity_level": "low"},
]
options = ProcessingOptions(
policy_cs="",
policy_ci="prod",
severity_cs="",
severity_ci="high",
sort_by="datetime",
order="asc",
mode="vendor",
)
filtered = list(filter_records(records, options))
assert filtered == [{"policy": "ProdPolicy", "severity_level": "HIGH"}]
def test_sort_records_by_severity_desc_uses_defined_ranking():
records = [
{"severity_level": "medium"},
{"severity_level": "critical"},
{"severity_level": "info"},
]
options = ProcessingOptions(
policy_cs="",
policy_ci="",
severity_cs="",
severity_ci="",
sort_by="severity",
order="desc",
mode="vendor",
)
sorted_records = sort_records(records, options)
assert [record["severity_level"] for record in sorted_records] == [
"critical",
"medium",
"info",
]
def test_sort_records_by_datetime_asc_places_invalid_records_last():
records = [
{"v015xxxxdate": "2024-05-03", "time": "08:00:00", "msg": "latest-valid"},
{"v015xxxxdate": "", "time": "09:00:00", "msg": "missing-date"},
{"v015xxxxdate": "2024-05-01", "time": "10:00:00", "msg": "earliest-valid"},
{"v015xxxxdate": "2024-05-02", "time": "", "msg": "missing-time"},
{"v015xxxxdate": "bad-date", "time": "99:99:99", "msg": "invalid-datetime"},
{"v015xxxxdate": "2024-05-02", "time": "09:30:00", "msg": "middle-valid"},
]
options = ProcessingOptions(
policy_cs="",
policy_ci="",
severity_cs="",
severity_ci="",
sort_by="datetime",
order="asc",
mode="vendor",
)
sorted_records = sort_records(records, options)
assert [record["msg"] for record in sorted_records] == [
"earliest-valid",
"middle-valid",
"latest-valid",
"missing-date",
"missing-time",
"invalid-datetime",
]
def test_sort_records_by_datetime_desc_places_invalid_records_last():
records = [
{"v015xxxxdate": "2024-05-03", "time": "08:00:00", "msg": "latest-valid"},
{"v015xxxxdate": "", "time": "09:00:00", "msg": "missing-date"},
{"v015xxxxdate": "2024-05-01", "time": "10:00:00", "msg": "earliest-valid"},
{"v015xxxxdate": "2024-05-02", "time": "", "msg": "missing-time"},
{"v015xxxxdate": "bad-date", "time": "99:99:99", "msg": "invalid-datetime"},
{"v015xxxxdate": "2024-05-02", "time": "09:30:00", "msg": "middle-valid"},
]
options = ProcessingOptions(
policy_cs="",
policy_ci="",
severity_cs="",
severity_ci="",
sort_by="datetime",
order="desc",
mode="vendor",
)
sorted_records = sort_records(records, options)
assert [record["msg"] for record in sorted_records] == [
"latest-valid",
"middle-valid",
"earliest-valid",
"missing-date",
"missing-time",
"invalid-datetime",
]

75
tests/test_storage.py Normal file
View File

@@ -0,0 +1,75 @@
import json
import os
from pathlib import Path
from app.services.storage import cleanup_expired_outputs, delete_result_files, persist_result
def test_persist_result_writes_csv_and_collects_preview(tmp_path: Path):
metadata, export_result = persist_result(
output_dir=tmp_path,
records=[
{
"v015xxxxdate": "2024-05-01",
"time": "10:00:00",
"policy": "Prod Policy",
"severity_level": "high",
},
{
"v015xxxxdate": "2024-05-02",
"time": "11:00:00",
"policy": "Other Policy",
"severity_level": "low",
},
],
union_keys=["v015xxxxdate", "time", "policy", "severity_level"],
mode="full",
output_format="csv",
preview_record_limit=1,
)
written = Path(metadata.file_path).read_text(encoding="utf-8")
assert metadata.download_name == "waf-report.csv"
assert "v015xxxxdate,time,policy,severity_level" in written
assert "2024-05-01,10:00:00,Prod Policy,high" in written
assert export_result.preview(1).count("\n") == 1
def test_delete_result_files_removes_output_and_metadata(tmp_path: Path):
result_id = "delete-me"
output_file = tmp_path / f"{result_id}.txt"
metadata_file = tmp_path / f"{result_id}.json"
output_file.write_text("content", encoding="utf-8")
metadata_file.write_text("{}", encoding="utf-8")
delete_result_files(output_dir=tmp_path, result_id=result_id)
assert not output_file.exists()
assert not metadata_file.exists()
def test_cleanup_expired_outputs_removes_only_old_results(tmp_path: Path):
old_result_id = "old-result"
new_result_id = "new-result"
old_output = tmp_path / f"{old_result_id}.csv"
old_metadata = tmp_path / f"{old_result_id}.json"
new_output = tmp_path / f"{new_result_id}.csv"
new_metadata = tmp_path / f"{new_result_id}.json"
old_output.write_text("old", encoding="utf-8")
new_output.write_text("new", encoding="utf-8")
old_metadata.write_text(json.dumps({"result_id": old_result_id, "file_path": str(old_output)}), encoding="utf-8")
new_metadata.write_text(json.dumps({"result_id": new_result_id, "file_path": str(new_output)}), encoding="utf-8")
old_timestamp = 946684800
os.utime(old_output, (old_timestamp, old_timestamp))
os.utime(old_metadata, (old_timestamp, old_timestamp))
deleted_results = cleanup_expired_outputs(output_dir=tmp_path, retention_hours=1)
assert deleted_results == 1
assert not old_output.exists()
assert not old_metadata.exists()
assert new_output.exists()
assert new_metadata.exists()

3
wsgi.py Normal file
View File

@@ -0,0 +1,3 @@
from app import create_app
app = create_app()