diff --git a/CI-and-Workflow.md b/CI-and-Workflow.md new file mode 100644 index 0000000..b5d070f --- /dev/null +++ b/CI-and-Workflow.md @@ -0,0 +1,238 @@ +# CI and Workflow + +## Overview + +Synchronisation is driven by a single Gitea Actions workflow, +`.gitea/workflows/sync.yml`. It runs on a schedule and on manual dispatch. +Each run executes the sync script, commits any changes to `blacklist` or +`blacklist.prev`, and pushes the commit to `main`. + +There is no CI in the traditional sense for this repository -- no tests, +no build, no lint. The workflow's only job is to keep the blacklist in +sync with upstream. + +## Workflow file + +```yaml +name: Sync blocklists from upstream + +on: + schedule: + - cron: '0 4 */7 * *' + workflow_dispatch: + +jobs: + sync: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + + - name: Fetch and merge upstream files + run: python3 scripts/merge_blocklists.py + + - name: Commit and push if changed + run: | + git config user.name "gitea-actions" + git config user.email "actions@gitea" + git add . + git diff --staged --quiet || git commit -m "Sync blocklists from upstream" + git push +``` + +## Schedule + +The cron expression `0 4 */7 * *` runs at 04:00 UTC on days 1, 8, 15, 22, +and 29 of each month -- effectively every 7 days, with a small skip at +the end of each month because day 29 and day 1 of the next month are only +1-3 days apart. + +This cadence is deliberate: upstream Cleanuparr rarely changes the +blacklist, and running less frequently reduces noise in the commit +history. If upstream is updated and you want the change immediately, +use manual dispatch (see below) instead of waiting for the next scheduled +run. + +### Changing the schedule + +Edit the `cron` line in `.gitea/workflows/sync.yml`. 
Common alternatives: + +| Cron expression | Meaning | +|---|---| +| `0 4 */7 * *` | Every 7 days at 04:00 UTC (current) | +| `0 4 * * 1` | Every Monday at 04:00 UTC | +| `0 4 1 * *` | First day of every month at 04:00 UTC | +| `0 */6 * * *` | Every 6 hours | + +All times are UTC. Gitea Actions does not support timezones in cron +expressions. + +## Manual dispatch + +The `workflow_dispatch` trigger lets you run the sync on demand from +the Gitea UI or via the API. Use this after editing `whitelist` if you +want the change to take effect immediately instead of waiting for the +next scheduled run. + +### From the Gitea UI + +1. Open the repository on Gitea. +2. Go to **Actions** -> **Sync blocklists from upstream**. +3. Click **Run workflow**. +4. Select branch `main`. +5. Click the confirm button. + +The run appears in the Actions list within a few seconds and typically +completes in under a minute. + +### From the API + +```bash +curl -X POST \ + -H "Authorization: token YOUR_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"ref": "main"}' \ + https://git.hisp.no/api/v1/repos/arr/blocklists/actions/workflows/sync.yml/dispatches +``` + +The token needs `write:repository` scope for the `arr/blocklists` repo. + +## What the workflow does + +### Step 1: checkout + +Standard `actions/checkout@v3`. Checks out the repository at the current +HEAD of `main`. No submodules, no LFS, no special configuration. + +### Step 2: fetch and merge + +Runs `python3 scripts/merge_blocklists.py`. The script: + +1. Fetches the upstream blacklist from + `https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist`. +2. Reads `blacklist.prev`, `blacklist`, and `whitelist` from the checked-out + repository. +3. Performs the three-way merge and whitelist subtraction. +4. Writes `blacklist` and `blacklist.prev` back to disk. + +The script is idempotent: running it twice in a row with no upstream or +whitelist changes produces no diff on the second run. 
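The idempotence claim can be checked on in-memory sets. The sketch below is a minimal illustration, assuming a hypothetical `merge_once` helper that mirrors the three set operations of the sync. The helper name and the sample entries are made up for this example and are not taken from `scripts/merge_blocklists.py`.

```python
def merge_once(upstream_new, upstream_prev, local, whitelist):
    """Hypothetical helper: return (new blacklist, new blacklist.prev)."""
    custom = local - upstream_prev          # manual local additions
    merged = upstream_new | custom          # fresh upstream plus custom entries
    return merged - whitelist, set(upstream_new)


upstream = {"*.exe", "*.srt", "*sample.srt"}   # sample data, not real upstream
whitelist = {"*.srt"}

# First run: the committed blacklist has one manual addition, baseline is older.
first, prev = merge_once(upstream, {"*.exe"}, {"*.exe", "*.nfo.gz"}, whitelist)

# Second run: same upstream and whitelist, inputs are the first run's outputs.
second, prev2 = merge_once(upstream, prev, first, whitelist)

print(sorted(first))
print(first == second and prev == prev2)
```

If both files are unchanged on the second pass, `git diff --staged --quiet` finds nothing to commit, which is the behaviour the workflow relies on.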
+ +See [Sync](Sync) for the full algorithm. + +### Step 3: commit and push if changed + +```bash +git config user.name "gitea-actions" +git config user.email "actions@gitea" +git add . +git diff --staged --quiet || git commit -m "Sync blocklists from upstream" +git push +``` + +This commits and pushes only if the script actually changed something. +The `git diff --staged --quiet` check returns non-zero when there are +staged changes, which triggers the commit via `||`. If nothing changed, +`git commit` is skipped and the final `git push` is a no-op (push with +no local commits ahead of the remote). + +The commit author is always `gitea-actions <actions@gitea>`, regardless +of who triggered the run. This makes automated syncs easy to distinguish +from human commits in the history. + +## Permissions + +The workflow runs with the default `GITHUB_TOKEN` (Gitea equivalent) that +Gitea Actions provides automatically. This token has write access to the +repository, which is necessary for the commit-and-push step. No additional +secrets are required. + +No external API tokens are needed -- the upstream blacklist is fetched +from a public raw URL on `raw.githubusercontent.com` without +authentication. + +## Monitoring + +### Checking recent runs + +Go to **Actions** -> **Sync blocklists from upstream** in the Gitea UI. +Each run shows: + +- Status (success / failure) +- Trigger (schedule / manual dispatch) +- Commit created (if any) +- Full log output + +### Reading the log + +The Python script prints four summary lines per run. These appear in +the "Fetch and merge upstream files" step log: + +``` +[blacklist] Upstream added: [...] +[blacklist] Upstream removed: [...] +[blacklist] Custom preserved: [...] +[blacklist] Whitelist stripped: [...] +``` + +Use these to verify the sync behaved as expected. "Whitelist stripped" +should list every entry in your whitelist that was present in the upstream +blacklist at fetch time. 
+ +### Run history in git log + +Every automated commit uses the same message, so filtering the history +is easy: + +```bash +git log --author="gitea-actions" --oneline +``` + +Or to see commits that actually touched the blacklist: + +```bash +git log --oneline -- blacklist +``` + +## Failure modes + +### Upstream unreachable + +If `raw.githubusercontent.com` is unreachable or returns a non-200 +response, `urllib.request.urlopen` raises an exception and the script +exits non-zero. The workflow fails at the "Fetch and merge upstream +files" step. No commit is made, no push happens. The repository state +is unchanged. + +Retry the workflow manually once upstream is available again. + +### Script error + +If the sync script crashes (malformed upstream, disk full, etc.), the +step fails and no commit is made. Read the full step log to diagnose. + +### Push rejected + +If someone pushes to `main` between the checkout and the push, the push +is rejected (non-fast-forward). The workflow fails at the push step. +No data is lost -- the next scheduled run will fetch the latest state +and re-apply the sync. + +### Commit is empty + +This is not a failure. The `git diff --staged --quiet || git commit` +pattern explicitly skips the commit when nothing changed, and the +subsequent `git push` is a no-op. The workflow reports success. + +## Disabling the scheduled run + +To pause automatic syncing without removing the workflow entirely, +comment out the `schedule` section in `.gitea/workflows/sync.yml`: + +```yaml +on: + # schedule: + # - cron: '0 4 */7 * *' + workflow_dispatch: +``` + +Manual dispatch still works. Uncomment to re-enable scheduling. 
diff --git a/Consumers.md b/Consumers.md new file mode 100644 index 0000000..60540a6 --- /dev/null +++ b/Consumers.md @@ -0,0 +1,181 @@ +# Consumers + +The blocklists are consumed by two tools in the ARR stack: + +| Tool | Role | File consumed | Mode | +|---|---|---|---| +| qBittorrent | Download client | `blacklist` | Excluded file names | +| Cleanuparr | Media cleanup / malware blocker | `blacklist` or `whitelist` | Blacklist or whitelist mode | + +Both tools read a remote text file over HTTPS, one glob pattern per line. +They refresh on their own schedule (qBittorrent on restart or manual +refresh; Cleanuparr on its configured interval). + +## Raw URLs + +Point consumers at the raw file URLs, not the Gitea blob viewer URLs: + +``` +https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist +https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist +``` + +The `raw/branch/main/` path serves the file contents directly with the +correct `text/plain` content type. Using `src/branch/main/` instead serves +the HTML viewer page and will break the consumer. + +## qBittorrent + +qBittorrent has an **excluded file names** feature that skips files +matching any of the configured glob patterns when downloading a torrent. +There is no "included file names" or whitelist mode -- qBittorrent only +supports exclusion. This is why it consumes the merged `blacklist` and not +the `whitelist`. + +### Configuration + +1. Open **Options** (Tools -> Options, or Ctrl+,). +2. Go to **Downloads**. +3. Scroll to **Excluded file names**. +4. Enable the checkbox. +5. Set the URL to: + +``` +https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist +``` + +qBittorrent fetches the list on startup and whenever you click **Reload** +next to the field. There is no automatic refresh interval -- a restart or +manual reload is required to pick up changes. 
+ +### What qBittorrent does with the list + +When a torrent is added, qBittorrent iterates the files inside it and +checks each filename against the excluded patterns. Matching files are +marked as "do not download" and will not be written to disk. The rest of +the torrent downloads normally. + +This means the list operates at the **file level within a torrent**, not +the torrent level. A torrent containing `movie.mkv` and `movie.nor.srt` +would download both files if `*.srt` is in the whitelist (and thus not in +the blacklist), or just `movie.mkv` if `*.srt` were in the blacklist. + +### Refreshing after a whitelist change + +qBittorrent does not auto-refresh the list. After updating `whitelist`: + +1. Wait for the next sync run (or dispatch the workflow manually). +2. In qBittorrent, open the excluded file names setting and click + **Reload**, or restart qBittorrent. +3. New torrents added from this point on will use the updated list. + Torrents already in the client are not retroactively changed. + +## Cleanuparr + +Cleanuparr supports two modes for its Malware Blocker and Blacklist Sync +features. The repository provides files suitable for both. + +### Blacklist mode + +In blacklist mode, Cleanuparr deletes any file matching a pattern in the +configured list. + +Point it at the same URL as qBittorrent: + +``` +https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist +``` + +Because the whitelist has already been subtracted, this file will not +cause Cleanuparr to delete anything you have marked as "keep" in the +whitelist. Consistent behaviour between the two tools without any +per-tool customisation. + +### Whitelist mode + +In whitelist mode, Cleanuparr keeps only files matching a pattern in the +configured list and deletes everything else. + +Point it at: + +``` +https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist +``` + +This is the more conservative choice: only the extensions explicitly +listed (video containers and subtitles) are allowed. 
Anything else -- +including extensions that upstream has not yet flagged as malicious -- +is deleted. + +### Which mode to use + +| Use case | Mode | Why | +|---|---|---| +| You trust upstream Cleanuparr's coverage and want to keep everything except known-bad | Blacklist | Lets through unusual-but-legitimate file types (e.g. exotic subtitle formats) | +| You only want a strict set of video + subtitle files on disk | Whitelist | Much stricter; deletes anything not explicitly listed | +| You want behaviour consistent with qBittorrent | Blacklist | Same source file, same semantics | + +Blacklist mode is the recommended default because it matches the +qBittorrent side and avoids unexpected deletions of legitimate but +non-listed files. + +## Keeping both consumers in sync + +Both consumers ultimately read the whitelist (directly in Cleanuparr +whitelist mode, indirectly via subtraction in blacklist mode and in +qBittorrent). This means maintenance is centralised: + +1. Add a line to `whitelist`. +2. Wait for the next sync run (or dispatch manually). +3. Both consumers honour the change after their next refresh. + +There is no per-tool configuration drift because there is no per-tool +configuration to drift. + +## Troubleshooting + +### A file I whitelisted is still being blocked / deleted + +Check each layer in order: + +1. **Sync ran successfully?** Open the Gitea Actions page for the + repository and verify the most recent run is green and newer than + your whitelist commit. +2. **Blacklist was updated?** Read `blacklist` in Gitea and confirm your + whitelisted entry is not present. +3. **Consumer refreshed?** qBittorrent requires a manual reload or + restart. Cleanuparr refreshes on its own interval -- check its logs + to confirm it picked up the new file. +4. **Exact string match?** Whitelist entries must match blacklist entries + exactly. `*.srt` in whitelist does not strip `*sample.srt` from + blacklist. See [Lists](Lists) for pattern semantics. 
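The exact-match rule in step 4 can be illustrated with plain Python sets. This is a sketch with made-up entries; `fnmatch.fnmatchcase` appears only to show what a glob-based exclusion would do instead, which is an assumption for contrast and not something the sync script does.

```python
from fnmatch import fnmatchcase

blacklist = {"*.srt", "*sample.srt", "*.exe"}  # sample entries
whitelist = {"*.srt"}

# Exact-string subtraction: only the identical line is removed.
exact = blacklist - whitelist

# Hypothetical glob-based exclusion, shown for contrast only.
globbed = {
    entry for entry in blacklist
    if not any(fnmatchcase(entry, pattern) for pattern in whitelist)
}

print(sorted(exact))    # '*sample.srt' is still blocked
print(sorted(globbed))  # '*sample.srt' has been unblocked as a side effect
```

If an entry you whitelisted is still blocked, compare the two files character by character: the whitelist line has to be identical to the blacklist line for the subtraction to remove it.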
+ +### A file I did not whitelist is passing through + +Check whether the pattern is in the blacklist at all: + +1. Open `blacklist` in Gitea and search for the extension. +2. If it is not there, upstream does not block it either. You can add + it to `blacklist` directly (manual local addition, preserved by the + three-way merge) or file an upstream issue. + +### Consumer returns 404 + +Verify the URL uses `raw/branch/main/`, not `src/branch/main/`: + +``` +# Correct +https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist + +# Wrong (serves HTML, not the file) +https://git.hisp.no/arr/blocklists/src/branch/main/blacklist +``` + +Also check the repository name and branch are correct +(`arr/blocklists`, `main`). + +### Cleanuparr deletes subtitle files + +Cleanuparr is running in whitelist mode against `blacklist`, which is +the wrong combination. Either switch it to blacklist mode (keep the URL), +or keep whitelist mode and point it at `whitelist` instead. diff --git a/Docs.md b/Docs.md deleted file mode 100644 index 5d08b7b..0000000 --- a/Docs.md +++ /dev/null @@ -1 +0,0 @@ -Welcome to the Wiki. \ No newline at end of file diff --git a/Lists.md b/Lists.md new file mode 100644 index 0000000..76d8462 --- /dev/null +++ b/Lists.md @@ -0,0 +1,206 @@ +# Lists + +## The two-file model + +The repository contains exactly two data files. Each has a single, clear +role: + +| File | Role | Source of truth | Edit it? | +|---|---|---|---| +| `blacklist` | Extensions blocked by downloaders and file cleaners | Upstream Cleanuparr, minus `whitelist` | Only for manual additions that upstream missed. Removals do not stick -- use `whitelist` instead | +| `whitelist` | Extensions that must never be blocked or deleted | Locally maintained, not synced from upstream | Yes. This is the main file you interact with | + +`blacklist.prev` also exists in the repo but is not a data file -- it is +the three-way merge baseline used by the sync script. Never edit it. 
+ +## `blacklist` + +The blacklist is the output file consumed by qBittorrent and (optionally) +Cleanuparr. It is regenerated on every sync as: + +``` +(upstream_new | custom_local_additions) - whitelist +``` + +Where `custom_local_additions` is detected by comparing the committed +`blacklist` against the previous upstream snapshot. See +[Sync](Sync) for the full algorithm. + +### When to edit `blacklist` directly + +In almost every case, you do not. The intended workflow is: + +- To **remove** an entry (stop blocking it): add it to `whitelist`. +- To **add** an entry that upstream should also have: file an upstream + issue with Cleanuparr. +- To **add** an entry that is specific to your setup and not worth + upstreaming: edit `blacklist` directly. The three-way merge preserves + manual additions across syncs. + +### When editing `blacklist` directly does not work + +Removing a line from `blacklist` does not work as a removal mechanism. +The sync will re-add anything upstream has on the next run. If you want +something gone, put it in `whitelist`. + +## `whitelist` + +The whitelist is the locally-maintained allow list. It is the single source +of truth for "what must be kept." It is not synced from upstream -- any +changes you make are permanent until you change them again. + +### Format + +One glob pattern per line, sorted, no blank lines, no comments: + +``` +*.ass +*.avi +*.mkv +*.mp4 +*.srt +*.ssa +*.sub +*.webm +``` + +The sort order is not enforced by the script but is the convention and +makes diffs easier to read. + +### Semantics + +Each line is treated as an exact string and subtracted from the blacklist. +See [Pattern matching](#pattern-matching) below for the details. + +### Adding an entry + +Edit `whitelist` in Gitea (or via a local clone and push), add the new +line, commit. The next sync run (or manual dispatch) will strip it from +the blacklist automatically. + +You do not also need to remove it from the blacklist by hand -- the sync +does that. 
+ +### Removing an entry + +Delete the line from `whitelist` and commit. The next sync will re-add +the entry to the blacklist if upstream still has it. If upstream no longer +has the entry, the entry stays gone (which is probably what you want). + +## Pattern matching + +The whitelist-to-blacklist exclusion uses **exact-string set subtraction**, +not glob matching. This is an intentional design choice that has two +important consequences. + +### Exact entries are stripped + +`*.srt` in the whitelist removes exactly the string `*.srt` from the +blacklist. If upstream has `*.srt` as a line, it gets removed. If upstream +does not have `*.srt`, nothing happens. + +### Partial matches are not affected + +`*.srt` in the whitelist strips only the identical blacklist entry: + +| Blacklist entry | Stripped? | Why | +|---|---|---| +| `*.srt` | yes | Identical string | +| `*sample.srt` | no | Different string | +| `*.srt.bak` | no | Different string | +| `file.srt` | no | Different string | + +This is what makes the whitelist safe to maintain. You can whitelist +`*.srt` to keep bundled subtitle files without accidentally unblocking +sample files or junk variants that happen to end in `.srt`. + +### Why not glob matching + +A glob-based exclusion would strip anything matching `*.srt` as a pattern, +which would also strip `*sample.srt` and `file.srt`. That is usually not +what you want -- sample files are genuine junk that the blacklist +should still remove. + +Exact-string subtraction is also trivially simple to reason about: if the +line you want stripped is in the blacklist as the exact same string, put +that same string in the whitelist. Done. + +## Examples + +### Keeping Norwegian subtitle files + +Scenario: torrents include `.srt` files as bundled Norwegian subtitles. +You want qBittorrent to download them, not strip them. + +``` +# whitelist entry +*.srt +``` + +After the next sync, `*.srt` is gone from `blacklist`. qBittorrent now +accepts `.srt` files from within torrents. 
`*sample.srt` remains blocked. + +### Supporting AV1 in `.webm` containers + +Scenario: you want qBittorrent to accept `.webm` AV1 releases, which are +currently blocked because the upstream blacklist treats `*.webm` as junk. + +``` +# whitelist entry +*.webm +``` + +After the next sync, `*.webm` is gone from `blacklist`. `.webm` torrents +download normally. `*sample.webm` remains blocked. + +### Adding a site-specific junk extension + +Scenario: a private tracker keeps injecting `*.nfo.gz` spam files that +upstream does not block. + +``` +# Edit blacklist directly, add the line: +*.nfo.gz +``` + +Commit and push. The next sync runs, the three-way merge sees +`*.nfo.gz` in `local - upstream_prev`, classifies it as a manual addition, +and preserves it through the merge. Subsequent syncs continue to preserve +it even as upstream evolves. + +If upstream ever adds `*.nfo.gz` itself, the entry moves from "custom" +to "upstream" on the next sync -- still present, still blocked, just +sourced differently. + +## What lives in each file right now + +The whitelist ships with the extensions required for a normal media +stack with subtitles and AV1/webm releases: + +``` +*.ass - SubStation Alpha subtitles +*.avi - Audio Video Interleave container +*.mkv - Matroska container +*.mp4 - MPEG-4 container +*.srt - SubRip subtitles +*.ssa - SubStation Alpha subtitles +*.sub - MicroDVD / VobSub subtitles +*.webm - WebM container (AV1, VP9) +``` + +The blacklist contains whatever upstream Cleanuparr ships, minus everything +in the whitelist above. The actual contents change as upstream evolves -- +check the file in Gitea for the current state. 
+ +## Consumer consequences + +Changes to either file affect what consumers see: + +| Change | Effect on qBittorrent | Effect on Cleanuparr (blacklist mode) | Effect on Cleanuparr (whitelist mode) | +|---|---|---|---| +| Add to `whitelist` | Stops blocking this extension | Stops deleting this extension | Starts allowing this extension | +| Remove from `whitelist` | Resumes blocking (if upstream has it) | Resumes deleting (if upstream has it) | Stops allowing this extension | +| Add to `blacklist` directly | Starts blocking this extension | Starts deleting this extension | No effect | +| Remove from `blacklist` directly | No effect (sync re-adds) | No effect (sync re-adds) | No effect | + +See [Consumers](Consumers) for configuration details. diff --git a/Sync.md b/Sync.md new file mode 100644 index 0000000..71eb745 --- /dev/null +++ b/Sync.md @@ -0,0 +1,208 @@ +# Sync + +## Overview + +The sync process fetches the upstream Cleanuparr blacklist, preserves any +manual local additions, subtracts the locally-maintained whitelist, and +writes the result back to `blacklist`. It runs on a schedule (every 7 days) +and on manual dispatch. All logic lives in `scripts/merge_blocklists.py` +(about 45 lines of pure Python standard library, no third-party deps). + +## Inputs + +The script reads four sources on every run: + +| Source | Path | Role | +|---|---|---| +| Upstream | `https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist` | Current upstream state, fetched over HTTPS | +| Upstream snapshot | `blacklist.prev` | What upstream looked like on the previous sync (baseline) | +| Committed blacklist | `blacklist` | Current committed state, may contain manual local additions | +| Whitelist | `whitelist` | Locally-maintained entries to strip from the merged result | + +All four are parsed the same way: one entry per non-empty line, stripped of +leading/trailing whitespace, loaded into a Python `set`. 
+ +## Three-way merge + +The script performs a classic three-way merge, git-style, using set +operations: + +``` +custom = local - upstream_prev +merged = upstream_new | custom +result = merged - whitelist +``` + +Each line does one specific job: + +### `custom = local - upstream_prev` + +Compute what was added locally. Anything in the committed `blacklist` that +was not in the previous upstream snapshot must be a manual local addition, +because the sync script is the only other thing that writes to `blacklist` +and it always produces a subset of `upstream_new | custom`. Tracking this +set lets the next sync re-apply those additions on top of the new upstream. + +### `merged = upstream_new | custom` + +Union the fresh upstream with the preserved local additions. Upstream +additions flow in (they appear in `upstream_new`), upstream removals flow +out (they were in `upstream_prev` but are not in `upstream_new`, and are +also not in `custom`), and manual local additions survive. + +### `result = merged - whitelist` + +Strip every entry that appears in the locally-maintained whitelist. This +is the step that enables local removals: an extension placed in `whitelist` +is always removed from the final `blacklist`, no matter how many times +upstream re-adds it. + +After the merge the script writes `result` to `blacklist` and overwrites +`blacklist.prev` with `upstream_new` so the next run has a fresh baseline. + +## Why a three-way merge + +A simpler design would be `result = upstream_new - whitelist`, with no +`.prev` file and no custom tracking. That works for the common case but +drops an escape hatch: if you spot something upstream missed (a new +malware extension, a tracker-specific junk file) and add it directly to +`blacklist`, the next sync would silently drop it. + +The three-way merge preserves those manual additions without requiring +them to live in a separate "additions" file. 
If you never add anything +directly, the `custom` set is empty on every run and the merge reduces to +`upstream_new - whitelist`. The overhead is one extra file (`blacklist.prev`) +and two set operations. + +## Whitelist exclusion + +The whitelist is subtracted with exact-string set subtraction, not pattern +matching. This has two important consequences: + +### Exact entries are stripped + +`*.srt` in `whitelist` strips exactly `*.srt` from the blacklist. Same for +`*.webm`, `*.mkv`, etc. + +### Sample patterns are preserved + +The upstream blacklist contains entries like `*sample.srt`, `*sample.webm`, +and `*sample.mkv` that block files with "sample" in the name regardless of +extension. These are separate string entries from `*.srt` or `*.webm`, so +whitelisting the plain extension does not remove the sample-file variant. +Sample files continue to be blocked. + +This is almost always the behaviour you want: subtitle files shipped inside +a release are kept, but standalone "sample.srt" clutter is still filtered. + +## The `.prev` file + +`blacklist.prev` is a plain text snapshot of whatever `upstream_new` was on +the previous successful run. It has no special format, no metadata, and is +never edited manually. The sync script rewrites it at the end of every run. + +It exists purely as the baseline for the `local - upstream_prev` step in +the three-way merge. Without it, the script could not distinguish "this +entry was in local because upstream had it" from "this entry was in local +because someone added it manually." + +If `blacklist.prev` is missing or empty (first run, or manually deleted), the +script treats the current `upstream_new` as the baseline. Manual additions +already in `blacklist` are still preserved, because they are not in +`upstream_new`. The cost is that any entry in `blacklist` that upstream has +since dropped is also classified as custom and kept. To drop such an entry, +delete it from `blacklist` after the sync completes, or add it to `whitelist`. + +## Edge cases + +### First run + +`blacklist.prev` does not exist, `blacklist` may or may not exist. 
+`upstream_prev = upstream_new`, so `custom = local - upstream_new` (anything +in `local` that is not upstream). After the run, `.prev` exists and +subsequent runs use the normal path. + +### Empty or missing whitelist + +If `whitelist` is missing or empty, `whitelist = set()` and the subtraction +is a no-op. The merge degenerates to a plain upstream sync with local +additions preserved. + +### Empty or missing blacklist + +If `blacklist` is missing, `local = set()`, `custom = set()`, and +`result = upstream_new - whitelist`. Equivalent to a fresh install. + +### Upstream removes an entry that is also in the whitelist + +Harmless. `upstream_new` does not contain it, so `merged` does not contain +it, and the whitelist subtraction removes nothing (the entry was already +absent). The whitelist entry stays as a harmless no-op for future syncs. + +### An entry appears in both whitelist and blacklist custom additions + +You manually added `*.foo` to `blacklist` and also added `*.foo` to +`whitelist`. The whitelist wins: `*.foo` is in `custom`, survives the +union, then gets stripped by the final subtraction. The committed +`blacklist` will not contain `*.foo`. The custom entry is effectively +invisible until you remove `*.foo` from `whitelist`. + +## Reporting + +Each sync run logs four lines to the workflow output: + +``` +[blacklist] Upstream added: [...] +[blacklist] Upstream removed: [...] +[blacklist] Custom preserved: [...] +[blacklist] Whitelist stripped: [...] +``` + +These are sorted lists showing exactly what changed. Check the Actions run +log after any sync to see what happened, especially if a consumer reports +unexpected behaviour. 
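The logging code is not shown in this document, so the following is only a guess at how the four lists could be derived from the sets used in the merge. The `report` function and its definitions (for example, treating "Whitelist stripped" as `merged & whitelist`) are assumptions made for illustration and may differ from the real script.

```python
def report(upstream_new, upstream_prev, local, whitelist):
    """Hypothetical reconstruction of the four summary lines."""
    custom = local - upstream_prev
    merged = upstream_new | custom
    lines = {
        "Upstream added": upstream_new - upstream_prev,
        "Upstream removed": upstream_prev - upstream_new,
        "Custom preserved": custom - whitelist,
        "Whitelist stripped": merged & whitelist,
    }
    return [
        f"[blacklist] {label}: {sorted(entries)}"
        for label, entries in lines.items()
    ]


# Sample inputs, not real upstream data.
for line in report(
    upstream_new={"*.exe", "*.scr", "*.srt"},
    upstream_prev={"*.exe", "*.lnk", "*.srt"},
    local={"*.exe", "*.lnk", "*.nfo.gz"},
    whitelist={"*.srt"},
):
    print(line)
```

With these inputs, `*.scr` shows up as added, `*.lnk` as removed, `*.nfo.gz` as a preserved custom entry, and `*.srt` as stripped by the whitelist.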
+ +## Full script + +```python +import urllib.request + +UPSTREAM_URL = "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist" +BLACKLIST = "blacklist" +BLACKLIST_PREV = "blacklist.prev" +WHITELIST = "whitelist" + + +def read_lines(path): + try: + with open(path) as f: + return set(line.strip() for line in f if line.strip()) + except FileNotFoundError: + return set() + + +def main(): + with urllib.request.urlopen(UPSTREAM_URL) as r: + upstream_new = set( + line.strip() for line in r.read().decode().splitlines() if line.strip() + ) + + upstream_prev = read_lines(BLACKLIST_PREV) + if not upstream_prev: + upstream_prev = upstream_new.copy() + + local = read_lines(BLACKLIST) + whitelist = read_lines(WHITELIST) + + custom = local - upstream_prev + merged = upstream_new | custom + result = merged - whitelist + + with open(BLACKLIST, "w") as f: + f.write("\n".join(sorted(result)) + "\n") + + with open(BLACKLIST_PREV, "w") as f: + f.write("\n".join(sorted(upstream_new)) + "\n") +``` + +Logging and the `__main__` guard are omitted above for clarity. See +`scripts/merge_blocklists.py` in the repository for the full source.
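To see how the merge behaves across several runs without touching the network or the filesystem, the set logic can be replayed in memory. This is a sketch under stated assumptions: `sync` is a hypothetical stand-in for `main()` that takes the upstream set as an argument, and all entries are sample data.

```python
def sync(upstream_new, state, whitelist):
    """Hypothetical in-memory version of one run; state = (blacklist, prev)."""
    local, upstream_prev = state
    if not upstream_prev:                   # missing baseline: use current upstream
        upstream_prev = set(upstream_new)
    custom = local - upstream_prev          # manual local additions
    result = (upstream_new | custom) - whitelist
    return result, set(upstream_new)


whitelist = {"*.srt"}

# Run 1: fresh install, nothing committed yet.
state = sync({"*.exe", "*.lnk", "*.srt"}, (set(), set()), whitelist)

# A maintainer adds a custom entry by hand between runs.
state = (state[0] | {"*.nfo.gz"}, state[1])

# Run 2: upstream drops *.lnk and adds *.scr.
state = sync({"*.exe", "*.scr", "*.srt"}, state, whitelist)
print(sorted(state[0]))
```

After the second run the upstream removal (`*.lnk`) has flowed out, the upstream addition (`*.scr`) has flowed in, the manual entry (`*.nfo.gz`) has survived, and the whitelisted `*.srt` is absent throughout.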