Initial wiki: sync, lists, consumers, workflow

CodeX
2026-04-07 01:09:01 +02:00
parent 726d8170b9
commit d4fdeb66ac
5 changed files with 833 additions and 1 deletions
+238
@@ -0,0 +1,238 @@
# CI and Workflow
## Overview
Synchronisation is driven by a single Gitea Actions workflow,
`.gitea/workflows/sync.yml`. It runs on a schedule and on manual dispatch.
Each run executes the sync script, commits any changes to `blacklist` or
`blacklist.prev`, and pushes the commit to `main`.
There is no CI in the traditional sense for this repository -- no tests,
no build, no lint. The workflow's only job is to keep the blacklist in
sync with upstream.
## Workflow file
```yaml
name: Sync blocklists from upstream
on:
  schedule:
    - cron: '0 4 */7 * *'
  workflow_dispatch:
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Fetch and merge upstream files
        run: python3 scripts/merge_blocklists.py
      - name: Commit and push if changed
        run: |
          git config user.name "gitea-actions"
          git config user.email "actions@gitea"
          git add .
          git diff --staged --quiet || git commit -m "Sync blocklists from upstream"
          git push
```
## Schedule
The cron expression `0 4 */7 * *` runs at 04:00 UTC on days 1, 8, 15, 22,
and 29 of each month. That is roughly every 7 days, but the interval is
shorter at the end of most months: day 29 and day 1 of the next month are
only 1-3 days apart. In a non-leap February there is no day 29, so the
gap from day 22 to 1 March is a full 7 days.
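The day-of-month arithmetic can be checked with a short sketch. It assumes the usual cron reading of `*/7` in the day-of-month field (start at day 1, step by 7, reset every month); the function names are illustrative and not part of the repository.

```python
import calendar


def scheduled_days(year: int, month: int, step: int = 7) -> list[int]:
    """Days of the month matched by a `*/step` day-of-month cron field."""
    last_day = calendar.monthrange(year, month)[1]
    return list(range(1, last_day + 1, step))


def gap_into_next_month(year: int, month: int, step: int = 7) -> int:
    """Days between the last scheduled run of this month and day 1 of the next."""
    last_day = calendar.monthrange(year, month)[1]
    return last_day - scheduled_days(year, month, step)[-1] + 1


if __name__ == "__main__":
    for year, month in [(2026, 1), (2026, 2), (2026, 4), (2028, 2)]:
        print(year, month, scheduled_days(year, month), gap_into_next_month(year, month))
```

The gap is 3 days after a 31-day month, 2 after a 30-day month, 1 after a leap-year February, and 7 after a 28-day February.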
This cadence is deliberate: upstream Cleanuparr rarely changes the
blacklist, and running less frequently reduces noise in the commit
history. If upstream is updated and you want the change immediately,
use manual dispatch (see below) instead of waiting for the next scheduled
run.
### Changing the schedule
Edit the `cron` line in `.gitea/workflows/sync.yml`. Common alternatives:
| Cron expression | Meaning |
|---|---|
| `0 4 */7 * *` | Days 1, 8, 15, 22, 29 of each month at 04:00 UTC (current) |
| `0 4 * * 1` | Every Monday at 04:00 UTC |
| `0 4 1 * *` | First day of every month at 04:00 UTC |
| `0 */6 * * *` | Every 6 hours |
All times are UTC. Gitea Actions does not support timezones in cron
expressions.
## Manual dispatch
The `workflow_dispatch` trigger lets you run the sync on demand from
the Gitea UI or via the API. Use this after editing `whitelist` if you
want the change to take effect immediately instead of waiting for the
next scheduled run.
### From the Gitea UI
1. Open the repository on Gitea.
2. Go to **Actions** -> **Sync blocklists from upstream**.
3. Click **Run workflow**.
4. Select branch `main`.
5. Click the confirm button.
The run appears in the Actions list within a few seconds and typically
completes in under a minute.
### From the API
```bash
curl -X POST \
  -H "Authorization: token YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ref": "main"}' \
  https://git.hisp.no/api/v1/repos/arr/blocklists/actions/workflows/sync.yml/dispatches
```
The token needs `write:repository` scope for the `arr/blocklists` repo.
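For scripted use, the same request can be built with Python's standard library. This is a sketch that mirrors the `curl` call above; `build_dispatch_request` is a hypothetical helper and the token value is a placeholder.

```python
import json
import urllib.request

DISPATCH_URL = (
    "https://git.hisp.no/api/v1/repos/arr/blocklists"
    "/actions/workflows/sync.yml/dispatches"
)


def build_dispatch_request(token: str, ref: str = "main") -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers the workflow."""
    return urllib.request.Request(
        DISPATCH_URL,
        data=json.dumps({"ref": ref}).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_dispatch_request("YOUR_TOKEN")
    # Sending is left commented out so the sketch has no side effects:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
    print(request.get_method(), request.full_url)
```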
## What the workflow does
### Step 1: checkout
Standard `actions/checkout@v3`. Checks out the repository at the current
HEAD of `main`. No submodules, no LFS, no special configuration.
### Step 2: fetch and merge
Runs `python3 scripts/merge_blocklists.py`. The script:
1. Fetches the upstream blacklist from
`https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist`.
2. Reads `blacklist.prev`, `blacklist`, and `whitelist` from the checked-out
repository.
3. Performs the three-way merge and whitelist subtraction.
4. Writes `blacklist` and `blacklist.prev` back to disk.
The script is idempotent: running it twice in a row with no upstream or
whitelist changes produces no diff on the second run.
See [Sync](Sync) for the full algorithm.
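The idempotence claim can be checked against the set operations the script uses. The sketch below models one sync as a pure function over sets, with no network or file I/O; `sync_once` and the sample entries are illustrative only.

```python
def sync_once(upstream_new, upstream_prev, local, whitelist):
    """Return (new blacklist, new blacklist.prev) for one sync run."""
    custom = local - upstream_prev
    result = (upstream_new | custom) - whitelist
    return result, set(upstream_new)


upstream = {"*.exe", "*.scr", "*.srt"}
whitelist = {"*.srt"}

# First run: the committed blacklist carries one manual addition.
first, prev = sync_once(upstream, {"*.exe", "*.srt"}, {"*.exe", "*.nfo.gz"}, whitelist)

# Second run with the same upstream and whitelist.
second, prev_again = sync_once(upstream, prev, first, whitelist)

print(sorted(first))
print(first == second and prev == prev_again)
```

Because the second run reproduces both files exactly, `git diff --staged --quiet` finds nothing to commit.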
### Step 3: commit and push if changed
```bash
git config user.name "gitea-actions"
git config user.email "actions@gitea"
git add .
git diff --staged --quiet || git commit -m "Sync blocklists from upstream"
git push
```
This commits and pushes only if the script actually changed something.
The `git diff --staged --quiet` check returns non-zero when there are
staged changes, which triggers the commit via `||`. If nothing changed,
`git commit` is skipped and the final `git push` is a no-op (push with
no local commits ahead of the remote).
The commit author is always `gitea-actions <actions@gitea>`, regardless
of who triggered the run. This makes automated syncs easy to distinguish
from human commits in the history.
## Permissions
The workflow runs with the default `GITHUB_TOKEN` (Gitea equivalent) that
Gitea Actions provides automatically. This token has write access to the
repository, which is necessary for the commit-and-push step. No additional
secrets are required.
No external API tokens are needed -- the upstream blacklist is fetched
from a public raw URL on `raw.githubusercontent.com` without
authentication.
## Monitoring
### Checking recent runs
Go to **Actions** -> **Sync blocklists from upstream** in the Gitea UI.
Each run shows:
- Status (success / failure)
- Trigger (schedule / manual dispatch)
- Commit created (if any)
- Full log output
### Reading the log
The Python script prints four summary lines per run. These appear in
the "Fetch and merge upstream files" step log:
```
[blacklist] Upstream added: [...]
[blacklist] Upstream removed: [...]
[blacklist] Custom preserved: [...]
[blacklist] Whitelist stripped: [...]
```
Use these to verify the sync behaved as expected. "Whitelist stripped"
should list every entry in your whitelist that was present in the upstream
blacklist at fetch time.
### Run history in git log
Every automated commit uses the same author and message, so filtering
the history is easy:
```bash
git log --author="gitea-actions" --oneline
```
Or to see commits that actually touched the blacklist:
```bash
git log --oneline -- blacklist
```
## Failure modes
### Upstream unreachable
If `raw.githubusercontent.com` is unreachable or returns a non-200
response, `urllib.request.urlopen` raises an exception and the script
exits non-zero. The workflow fails at the "Fetch and merge upstream
files" step. No commit is made, no push happens. The repository state
is unchanged.
Retry the workflow manually once upstream is available again.
### Script error
If the sync script crashes (malformed upstream, disk full, etc.), the
step fails and no commit is made. Read the full step log to diagnose.
### Push rejected
If someone pushes to `main` between the checkout and the push, the push
is rejected (non-fast-forward). The workflow fails at the push step.
No data is lost -- the next scheduled run will fetch the latest state
and re-apply the sync.
### Commit is empty
This is not a failure. The `git diff --staged --quiet || git commit`
pattern explicitly skips the commit when nothing changed, and the
subsequent `git push` is a no-op. The workflow reports success.
## Disabling the scheduled run
To pause automatic syncing without removing the workflow entirely,
comment out the `schedule` section in `.gitea/workflows/sync.yml`:
```yaml
on:
  # schedule:
  #   - cron: '0 4 */7 * *'
  workflow_dispatch:
```
Manual dispatch still works. Uncomment to re-enable scheduling.
+181
@@ -0,0 +1,181 @@
# Consumers
The blocklists are consumed by two tools in the ARR stack:
| Tool | Role | File consumed | Mode |
|---|---|---|---|
| qBittorrent | Download client | `blacklist` | Excluded file names |
| Cleanuparr | Media cleanup / malware blocker | `blacklist` or `whitelist` | Blacklist or whitelist mode |
Both tools read a remote text file over HTTPS, one glob pattern per line.
They refresh on their own schedule (qBittorrent on restart or manual
refresh; Cleanuparr on its configured interval).
## Raw URLs
Point consumers at the raw file URLs, not the Gitea blob viewer URLs:
```
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist
```
The `raw/branch/main/` path serves the file contents directly with the
correct `text/plain` content type. Using `src/branch/main/` instead serves
the HTML viewer page and will break the consumer.
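If a consumer was configured with a viewer URL by mistake, the fix is a simple path substitution. A minimal sketch, assuming the Gitea URL layout shown above; `to_raw_url` is a hypothetical helper.

```python
def to_raw_url(url: str) -> str:
    """Rewrite a Gitea viewer URL (`/src/branch/`) to its raw equivalent."""
    return url.replace("/src/branch/", "/raw/branch/", 1)


viewer = "https://git.hisp.no/arr/blocklists/src/branch/main/blacklist"
print(to_raw_url(viewer))
```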
## qBittorrent
qBittorrent has an **excluded file names** feature that skips files
matching any of the configured glob patterns when downloading a torrent.
There is no "included file names" or whitelist mode -- qBittorrent only
supports exclusion. This is why it consumes the merged `blacklist` and not
the `whitelist`.
### Configuration
1. Open **Options** (Tools -> Options, or Ctrl+,).
2. Go to **Downloads**.
3. Scroll to **Excluded file names**.
4. Enable the checkbox.
5. Set the URL to:
```
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
```
qBittorrent fetches the list on startup and whenever you click **Reload**
next to the field. There is no automatic refresh interval -- a restart or
manual reload is required to pick up changes.
### What qBittorrent does with the list
When a torrent is added, qBittorrent iterates the files inside it and
checks each filename against the excluded patterns. Matching files are
marked as "do not download" and will not be written to disk. The rest of
the torrent downloads normally.
This means the list operates at the **file level within a torrent**, not
the torrent level. A torrent containing `movie.mkv` and `movie.nor.srt`
would download both files if `*.srt` is in the whitelist (and thus not in
the blacklist), or just `movie.mkv` if `*.srt` were in the blacklist.
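The file-level behaviour can be approximated in a few lines. This sketch uses Python's `fnmatch` as a stand-in for qBittorrent's wildcard matching, which may differ in details such as case sensitivity; `split_torrent_files` is an illustrative name.

```python
from fnmatch import fnmatchcase


def split_torrent_files(filenames, excluded_patterns):
    """Split a torrent's files into (downloaded, skipped) by excluded patterns."""
    downloaded, skipped = [], []
    for name in filenames:
        if any(fnmatchcase(name, pattern) for pattern in excluded_patterns):
            skipped.append(name)
        else:
            downloaded.append(name)
    return downloaded, skipped


files = ["movie.mkv", "movie.nor.srt"]
# `*.srt` whitelisted, so it is absent from the excluded patterns.
print(split_torrent_files(files, ["*.exe"]))
# `*.srt` still blacklisted.
print(split_torrent_files(files, ["*.exe", "*.srt"]))
```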
### Refreshing after a whitelist change
qBittorrent does not auto-refresh the list. After updating `whitelist`:
1. Wait for the next sync run (or dispatch the workflow manually).
2. In qBittorrent, open the excluded file names setting and click
**Reload**, or restart qBittorrent.
3. New torrents added from this point on will use the updated list.
Torrents already in the client are not retroactively changed.
## Cleanuparr
Cleanuparr supports two modes for its Malware Blocker and Blacklist Sync
features. The repository provides files suitable for both.
### Blacklist mode
In blacklist mode, Cleanuparr deletes any file matching a pattern in the
configured list.
Point it at the same URL as qBittorrent:
```
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
```
Because the whitelist has already been subtracted, this file will not
cause Cleanuparr to delete anything you have marked as "keep" in the
whitelist. The result is consistent behaviour between the two tools
without any per-tool customisation.
### Whitelist mode
In whitelist mode, Cleanuparr keeps only files matching a pattern in the
configured list and deletes everything else.
Point it at:
```
https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist
```
This is the more conservative choice: only the extensions explicitly
listed (video containers and subtitles) are allowed. Anything else --
including extensions that upstream has not yet flagged as malicious --
is deleted.
### Which mode to use
| Use case | Mode | Why |
|---|---|---|
| You trust upstream Cleanuparr's coverage and want to keep everything except known-bad | Blacklist | Lets through unusual-but-legitimate file types (e.g. exotic subtitle formats) |
| You only want a strict set of video + subtitle files on disk | Whitelist | Much stricter; deletes anything not explicitly listed |
| You want behaviour consistent with qBittorrent | Blacklist | Same source file, same semantics |
Blacklist mode is the recommended default because it matches the
qBittorrent side and avoids unexpected deletions of legitimate but
non-listed files.
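The difference between the two modes comes down to which side of the match gets deleted. A sketch of that logic, assuming glob patterns matched with Python's `fnmatch` rather than Cleanuparr's own matcher; `files_to_delete` is hypothetical.

```python
from fnmatch import fnmatchcase


def files_to_delete(filenames, patterns, mode):
    """Return the files a cleaner would delete in 'blacklist' or 'whitelist' mode."""
    if mode not in ("blacklist", "whitelist"):
        raise ValueError(f"unknown mode: {mode}")
    deleted = []
    for name in filenames:
        matched = any(fnmatchcase(name, pattern) for pattern in patterns)
        # Blacklist mode deletes matches; whitelist mode deletes non-matches.
        if matched == (mode == "blacklist"):
            deleted.append(name)
    return deleted


files = ["movie.mkv", "movie.srt", "notes.txt", "setup.exe"]
print(files_to_delete(files, ["*.exe"], "blacklist"))
print(files_to_delete(files, ["*.mkv", "*.srt"], "whitelist"))
```

In this example blacklist mode removes only `setup.exe`, while whitelist mode also removes `notes.txt`, which is the kind of unexpected deletion the recommendation is meant to avoid.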
## Keeping both consumers in sync
Both consumers ultimately read the whitelist (directly in Cleanuparr
whitelist mode, indirectly via subtraction in blacklist mode and in
qBittorrent). This means maintenance is centralised:
1. Add a line to `whitelist`.
2. Wait for the next sync run (or dispatch manually).
3. Both consumers honour the change after their next refresh.
There is no per-tool configuration drift because there is no per-tool
configuration to drift.
## Troubleshooting
### A file I whitelisted is still being blocked / deleted
Check each layer in order:
1. **Sync ran successfully?** Open the Gitea Actions page for the
repository and verify the most recent run is green and newer than
your whitelist commit.
2. **Blacklist was updated?** Read `blacklist` in Gitea and confirm your
whitelisted entry is not present.
3. **Consumer refreshed?** qBittorrent requires a manual reload or
restart. Cleanuparr refreshes on its own interval -- check its logs
to confirm it picked up the new file.
4. **Exact string match?** Whitelist entries must match blacklist entries
exactly. `*.srt` in whitelist does not strip `*sample.srt` from
blacklist. See [Lists](Lists) for pattern semantics.
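Steps 2 and 4 can be checked mechanically against local copies of the two files. A rough sketch: `diagnose` is a hypothetical helper, and the "near miss" heuristic (blacklist entries that a whitelist entry would match as a glob) is only a hint, not something the sync script does.

```python
from fnmatch import fnmatchcase


def parse(text):
    """Parse a list file: one entry per non-empty line, whitespace stripped."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diagnose(blacklist_text, whitelist_text):
    """Report whitelist entries still blacklisted, and glob-only near misses."""
    blacklist, whitelist = parse(blacklist_text), parse(whitelist_text)
    still_blocked = sorted(whitelist & blacklist)
    near_misses = {
        entry: sorted(b for b in blacklist if b != entry and fnmatchcase(b, entry))
        for entry in sorted(whitelist)
    }
    return still_blocked, {k: v for k, v in near_misses.items() if v}


print(diagnose("*.exe\n*sample.srt\n", "*.srt\n"))
```

A non-empty first list means the sync has not run since the whitelist change; a near miss means the blacklist entry needs its own exact line in `whitelist` if it should be unblocked too.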
### A file I did not whitelist is passing through
Check whether the pattern is in the blacklist at all:
1. Open `blacklist` in Gitea and search for the extension.
2. If it is not there, upstream does not block it either. You can add
it to `blacklist` directly (manual local addition, preserved by the
three-way merge) or file an upstream issue.
### Consumer returns 404
Verify the URL uses `raw/branch/main/`, not `src/branch/main/`:
```
# Correct
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
# Wrong (serves HTML, not the file)
https://git.hisp.no/arr/blocklists/src/branch/main/blacklist
```
A genuine 404 usually means the repository name or branch in the URL is
wrong (`arr/blocklists`, `main`); the `src/` form returns HTML rather
than a 404.
### Cleanuparr deletes subtitle files
The most likely cause is that Cleanuparr is running in whitelist mode
against `blacklist`, which is the wrong combination. Either switch it to
blacklist mode (keep the URL), or keep whitelist mode and point it at
`whitelist` instead.
-1
@@ -1 +0,0 @@
Welcome to the Wiki.
+206
@@ -0,0 +1,206 @@
# Lists
## The two-file model
The repository contains exactly two data files. Each has a single, clear
role:
| File | Role | Source of truth | Edit it? |
|---|---|---|---|
| `blacklist` | Extensions blocked by downloaders and file cleaners | Upstream Cleanuparr, minus `whitelist` | Only for manual additions that upstream missed. Removals do not stick -- use `whitelist` instead |
| `whitelist` | Extensions that must never be blocked or deleted | Locally maintained, not synced from upstream | Yes. This is the main file you interact with |
`blacklist.prev` also exists in the repo but is not a data file -- it is
the three-way merge baseline used by the sync script. Never edit it.
## `blacklist`
The blacklist is the output file consumed by qBittorrent and (optionally)
Cleanuparr. It is regenerated on every sync as:
```
(upstream_new | custom_local_additions) - whitelist
```
Where `custom_local_additions` is detected by comparing the committed
`blacklist` against the previous upstream snapshot. See
[Sync](Sync) for the full algorithm.
### When to edit `blacklist` directly
In almost every case, you do not. The intended workflow is:
- To **remove** an entry (stop blocking it): add it to `whitelist`.
- To **add** an entry that upstream should also have: file an upstream
issue with Cleanuparr.
- To **add** an entry that is specific to your setup and not worth
upstreaming: edit `blacklist` directly. The three-way merge preserves
manual additions across syncs.
### When editing `blacklist` directly does not work
Removing a line from `blacklist` does not work as a removal mechanism.
The sync will re-add anything upstream has on the next run. If you want
something gone, put it in `whitelist`.
## `whitelist`
The whitelist is the locally-maintained allow list. It is the single source
of truth for "what must be kept." It is not synced from upstream -- any
changes you make are permanent until you change them again.
### Format
One glob pattern per line, sorted, no blank lines, no comments:
```
*.ass
*.avi
*.mkv
*.mp4
*.srt
*.ssa
*.sub
*.webm
```
The sort order is not enforced by the script but is the convention and
makes diffs easier to read.
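Because the script does not enforce the format, a small check can catch mistakes before they are committed. This is a sketch, not part of the repository; `check_whitelist` and its messages are made up for illustration.

```python
def check_whitelist(text):
    """Return a list of format problems found in whitelist file contents."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif line.lstrip().startswith("#"):
            problems.append(f"line {number}: comments are not supported")
        elif line != line.strip():
            problems.append(f"line {number}: leading or trailing whitespace")
    entries = [line.strip() for line in lines if line.strip()]
    if entries != sorted(entries):
        problems.append("entries are not sorted")
    if len(entries) != len(set(entries)):
        problems.append("duplicate entries")
    return problems


print(check_whitelist("*.ass\n*.avi\n*.mkv\n"))
print(check_whitelist("*.mkv\n\n# video\n*.avi\n"))
```

A comment line matters more than the others: the sync script would treat it as a literal entry rather than ignore it.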
### Semantics
Each line is treated as an exact string and subtracted from the blacklist.
See [Pattern matching](#pattern-matching) below for the details.
### Adding an entry
Edit `whitelist` in Gitea (or via a local clone and push), add the new
line, commit. The next sync run (or manual dispatch) will strip it from
the blacklist automatically.
You do not also need to remove it from the blacklist by hand -- the sync
does that.
### Removing an entry
Delete the line from `whitelist` and commit. The next sync will re-add
the entry to the blacklist if upstream still has it. If upstream no longer
has the entry, the entry stays gone (which is probably what you want).
## Pattern matching
The whitelist-to-blacklist exclusion uses **exact-string set subtraction**,
not glob matching. This is an intentional design choice that has two
important consequences.
### Exact entries are stripped
`*.srt` in the whitelist removes exactly the string `*.srt` from the
blacklist. If upstream has `*.srt` as a line, it gets removed. If upstream
does not have `*.srt`, nothing happens.
### Partial matches are not affected
`*.srt` in the whitelist does **not** strip:
| Blacklist entry | Stripped? | Why |
|---|---|---|
| `*.srt` | yes | Identical string |
| `*sample.srt` | no | Different string |
| `*.srt.bak` | no | Different string |
| `file.srt` | no | Different string |
This is what makes the whitelist safe to maintain. You can whitelist
`*.srt` to keep bundled subtitle files without accidentally unblocking
sample files or junk variants that happen to end in `.srt`.
### Why not glob matching
A glob-based exclusion would strip anything matching `*.srt` as a pattern,
which would also strip `*sample.srt` and `file.srt`. That is usually not
what you want -- sample files are genuine junk that the blacklist
should still remove.
Exact-string subtraction is also trivially simple to reason about: if the
line you want stripped is in the blacklist as the exact same string, put
that same string in the whitelist. Done.
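The contrast between the two strategies is easy to see on a small blacklist. The sketch below implements both; the glob variant is the rejected alternative, emulated here with Python's `fnmatch`, and is shown only for comparison. The `*.exe` entry is added for illustration.

```python
from fnmatch import fnmatchcase

blacklist = {"*.exe", "*.srt", "*sample.srt", "file.srt"}
whitelist = {"*.srt"}

# What the sync does: exact-string set subtraction.
exact = blacklist - whitelist

# The rejected alternative: drop every blacklist entry a whitelist glob matches.
globbed = {
    entry
    for entry in blacklist
    if not any(fnmatchcase(entry, pattern) for pattern in whitelist)
}

print(sorted(exact))
print(sorted(globbed))
```

Exact subtraction removes only `*.srt`; the glob variant leaves nothing but `*.exe`, silently unblocking the sample pattern as well.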
## Examples
### Keeping Norwegian subtitle files
Scenario: torrents include `.srt` files as bundled Norwegian subtitles.
You want qBittorrent to download them, not strip them.
```
# whitelist entry
*.srt
```
After the next sync, `*.srt` is gone from `blacklist`. qBittorrent now
accepts `.srt` files from within torrents. `*sample.srt` remains blocked.
### Supporting AV1 in `.webm` containers
Scenario: you want qBittorrent to accept `.webm` AV1 releases, which are
currently blocked because the upstream blacklist treats `*.webm` as junk.
```
# whitelist entry
*.webm
```
After the next sync, `*.webm` is gone from `blacklist`. `.webm` torrents
download normally. `*sample.webm` remains blocked.
### Adding a site-specific junk extension
Scenario: a private tracker keeps injecting `*.nfo.gz` spam files that
upstream does not block.
```
# Edit blacklist directly, add the line:
*.nfo.gz
```
Commit and push. The next sync runs, the three-way merge sees
`*.nfo.gz` in `local - upstream_prev`, classifies it as a manual addition,
and preserves it through the merge. Subsequent syncs continue to preserve
it even as upstream evolves.
If upstream ever adds `*.nfo.gz` itself, the entry moves from "custom"
to "upstream" on the next sync -- still present, still blocked, just
sourced differently.
## What lives in each file right now
The whitelist ships with the extensions required for a normal media
stack with subtitles and AV1/webm releases:
```
*.ass - SubStation Alpha subtitles
*.avi - Audio Video Interleave container
*.mkv - Matroska container
*.mp4 - MPEG-4 container
*.srt - SubRip subtitles
*.ssa - SubStation Alpha subtitles
*.sub - MicroDVD / VobSub subtitles
*.webm - WebM container (AV1, VP9)
```
The blacklist contains whatever upstream Cleanuparr ships, minus everything
in the whitelist above. The actual contents change as upstream evolves --
check the file in Gitea for the current state.
## Consumer consequences
Changes to either file affect what consumers see:
| Change | Effect on qBittorrent | Effect on Cleanuparr (blacklist mode) | Effect on Cleanuparr (whitelist mode) |
|---|---|---|---|
| Add to `whitelist` | Stops blocking this extension | Stops deleting this extension | Starts allowing this extension |
| Remove from `whitelist` | Resumes blocking (if upstream has it) | Resumes deleting (if upstream has it) | Stops allowing this extension |
| Add to `blacklist` directly | Starts blocking this extension | Starts deleting this extension | No effect |
| Remove from `blacklist` directly | No effect (sync re-adds) | No effect (sync re-adds) | No effect |
See [Consumers](Consumers) for configuration details.
+208
@@ -0,0 +1,208 @@
# Sync
## Overview
The sync process fetches the upstream Cleanuparr blacklist, preserves any
manual local additions, subtracts the locally-maintained whitelist, and
writes the result back to `blacklist`. It runs on a schedule (every 7 days)
and on manual dispatch. All logic lives in `scripts/merge_blocklists.py`
(about 45 lines of pure Python standard library, no third-party deps).
## Inputs
The script reads four sources on every run:
| Source | Path | Role |
|---|---|---|
| Upstream | `https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist` | Current upstream state, fetched over HTTPS |
| Upstream snapshot | `blacklist.prev` | What upstream looked like on the previous sync (baseline) |
| Committed blacklist | `blacklist` | Current committed state, may contain manual local additions |
| Whitelist | `whitelist` | Locally-maintained entries to strip from the merged result |
All four are parsed the same way: one entry per non-empty line, stripped of
leading/trailing whitespace, loaded into a Python `set`.
## Three-way merge
The script performs a classic three-way merge, git-style, using set
operations:
```
custom = local - upstream_prev
merged = upstream_new | custom
result = merged - whitelist
```
Each line does one specific job:
### `custom = local - upstream_prev`
Compute what was added locally. Anything in the committed `blacklist` that
was not in the previous upstream snapshot must be a manual local addition,
because the sync script is the only other thing that writes to `blacklist`
and it always produces a subset of `upstream_new | custom`. Tracking this
set lets the next sync re-apply those additions on top of the new upstream.
### `merged = upstream_new | custom`
Union the fresh upstream with the preserved local additions. Upstream
additions flow in (they appear in `upstream_new`), upstream removals flow
out (they were in `upstream_prev` but are not in `upstream_new`, and are
also not in `custom`), and manual local additions survive.
### `result = merged - whitelist`
Strip every entry that appears in the locally-maintained whitelist. This
is the step that enables local removals: an extension placed in `whitelist`
is always removed from the final `blacklist`, no matter how many times
upstream re-adds it.
After the merge the script writes `result` to `blacklist` and overwrites
`blacklist.prev` with `upstream_new` so the next run has a fresh baseline.
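A worked example with small sets makes the three steps concrete. The entries are made up for illustration; only the three set expressions come from the script.

```python
# Previous upstream snapshot (contents of blacklist.prev).
upstream_prev = {"*.exe", "*.lnk", "*.srt"}
# Fresh upstream: dropped `*.lnk`, added `*.scr`.
upstream_new = {"*.exe", "*.scr", "*.srt"}
# Committed blacklist: `*.srt` was stripped earlier, `*.nfo.gz` added by hand.
local = {"*.exe", "*.lnk", "*.nfo.gz"}
whitelist = {"*.srt"}

custom = local - upstream_prev   # manual additions only
merged = upstream_new | custom   # new upstream plus manual additions
result = merged - whitelist      # local removals applied last

print(sorted(custom))
print(sorted(result))
```

Here `*.lnk` disappears because upstream dropped it, `*.scr` arrives from upstream, `*.nfo.gz` survives as a manual addition, and `*.srt` stays out because it is whitelisted.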
## Why a three-way merge
A simpler design would be `result = upstream_new - whitelist`, with no
`.prev` file and no custom tracking. That works for the common case but
drops an escape hatch: if you spot something upstream missed (a new
malware extension, a tracker-specific junk file) and add it directly to
`blacklist`, the next sync would silently drop it.
The three-way merge preserves those manual additions without requiring
them to live in a separate "additions" file. If you never add anything
directly, the `custom` set is empty on every run and the merge reduces to
`upstream_new - whitelist`. The overhead is one extra file (`blacklist.prev`)
and two set operations.
## Whitelist exclusion
The whitelist is subtracted with exact-string set subtraction, not pattern
matching. This has two important consequences:
### Exact entries are stripped
`*.srt` in `whitelist` strips exactly `*.srt` from the blacklist. Same for
`*.webm`, `*.mkv`, etc.
### Sample patterns are preserved
The upstream blacklist contains entries like `*sample.srt`, `*sample.webm`,
and `*sample.mkv` that block files with "sample" in the name regardless of
extension. These are separate string entries from `*.srt` or `*.webm`, so
whitelisting the plain extension does not remove the sample-file variant.
Sample files continue to be blocked.
This is almost always the behaviour you want: subtitle files shipped inside
a release are kept, but standalone "sample.srt" clutter is still filtered.
## The `.prev` file
`blacklist.prev` is a plain text snapshot of whatever `upstream_new` was on
the previous successful run. It has no special format, no metadata, and is
never edited manually. The sync script rewrites it at the end of every run.
It exists purely as the baseline for the `local - upstream_prev` step in
the three-way merge. Without it, the script could not distinguish "this
entry was in local because upstream had it" from "this entry was in local
because someone added it manually."
If `blacklist.prev` is missing or empty (first run, or manually deleted),
the script treats the current `upstream_new` as the baseline. Everything
in `blacklist` that the current upstream does not contain is then
classified as a manual addition and preserved. Genuine manual additions
survive, but so do entries that upstream dropped since `blacklist` was
last written, because without a baseline the script cannot tell the two
apart. Delete such stale entries from `blacklist` by hand after the run.
## Edge cases
### First run
`blacklist.prev` does not exist, `blacklist` may or may not exist.
`upstream_prev = upstream_new`, so `custom = local - upstream_new` (anything
in `local` that is not upstream). After the run, `.prev` exists and
subsequent runs use the normal path.
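The first-run fallback can be traced with concrete sets. In this sketch the committed `blacklist` holds one manual addition plus one entry that upstream has since dropped; `first_run` and all entries are illustrative.

```python
def first_run(upstream_new, local, whitelist):
    """Model a run where blacklist.prev is missing or empty."""
    upstream_prev = set(upstream_new)   # fallback baseline
    custom = local - upstream_prev      # everything local that upstream lacks
    return (upstream_new | custom) - whitelist, custom


upstream_new = {"*.exe", "*.scr"}
local = {"*.exe", "*.lnk", "*.nfo.gz"}  # `*.lnk` was dropped upstream earlier

result, custom = first_run(upstream_new, local, whitelist=set())
print(sorted(custom))
print(sorted(result))
```

Both `*.nfo.gz` and the stale `*.lnk` end up in `custom`, since without a stored baseline the two cases look identical to the script.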
### Empty or missing whitelist
If `whitelist` is missing or empty, `whitelist = set()` and the subtraction
is a no-op. The merge degenerates to a plain upstream sync with local
additions preserved.
### Empty or missing blacklist
If `blacklist` is missing, `local = set()`, `custom = set()`, and
`result = upstream_new - whitelist`. Equivalent to a fresh install.
### Upstream removes an entry that is also in the whitelist
Harmless. `upstream_new` does not contain it, so `merged` does not contain
it, and the whitelist subtraction removes nothing (the entry was already
absent). The whitelist entry stays as a harmless no-op for future syncs.
### An entry appears in both whitelist and blacklist custom additions
You manually added `*.foo` to `blacklist` and also added `*.foo` to
`whitelist`. The whitelist wins: `*.foo` is in `custom`, survives the
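union, then gets stripped by the final subtraction. The committed
`blacklist` will not contain `*.foo`. Because the entry is then gone from
the committed file, later syncs no longer see it as a custom addition:
removing `*.foo` from `whitelist` afterwards does not bring it back, and
you would have to re-add it to `blacklist` by hand.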
## Reporting
Each sync run logs four lines to the workflow output:
```
[blacklist] Upstream added: [...]
[blacklist] Upstream removed: [...]
[blacklist] Custom preserved: [...]
[blacklist] Whitelist stripped: [...]
```
These are sorted lists showing exactly what changed. Check the Actions run
log after any sync to see what happened, especially if a consumer reports
unexpected behaviour.
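The logging code is omitted from the script listing, so the following is only a guess at how the four lists could be derived from the merge sets. The set expressions and the `report` helper are assumptions, not the repository's code.

```python
def report(upstream_new, upstream_prev, local, whitelist):
    """Derive the four summary lists from the merge inputs (assumed logic)."""
    custom = local - upstream_prev
    merged = upstream_new | custom
    return {
        "Upstream added": sorted(upstream_new - upstream_prev),
        "Upstream removed": sorted(upstream_prev - upstream_new),
        "Custom preserved": sorted(custom),
        "Whitelist stripped": sorted(merged & whitelist),
    }


summary = report(
    upstream_new={"*.exe", "*.scr", "*.srt"},
    upstream_prev={"*.exe", "*.lnk", "*.srt"},
    local={"*.exe", "*.lnk", "*.nfo.gz"},
    whitelist={"*.srt"},
)
for label, entries in summary.items():
    print(f"[blacklist] {label}: {entries}")
```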
## Full script
```python
import urllib.request

UPSTREAM_URL = "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist"
BLACKLIST = "blacklist"
BLACKLIST_PREV = "blacklist.prev"
WHITELIST = "whitelist"


def read_lines(path):
    """Return the non-empty, stripped lines of a file as a set (empty if missing)."""
    try:
        with open(path) as f:
            return set(line.strip() for line in f if line.strip())
    except FileNotFoundError:
        return set()


def main():
    # Fetch the current upstream blacklist.
    with urllib.request.urlopen(UPSTREAM_URL) as r:
        upstream_new = set(
            line.strip() for line in r.read().decode().splitlines() if line.strip()
        )

    # Fall back to the fresh upstream when there is no usable baseline.
    upstream_prev = read_lines(BLACKLIST_PREV)
    if not upstream_prev:
        upstream_prev = upstream_new.copy()

    local = read_lines(BLACKLIST)
    whitelist = read_lines(WHITELIST)

    custom = local - upstream_prev   # manual local additions
    merged = upstream_new | custom   # new upstream plus manual additions
    result = merged - whitelist      # strip whitelisted entries

    with open(BLACKLIST, "w") as f:
        f.write("\n".join(sorted(result)) + "\n")
    with open(BLACKLIST_PREV, "w") as f:
        f.write("\n".join(sorted(upstream_new)) + "\n")
```
Logging and the `__main__` guard are omitted above for clarity. See
`scripts/merge_blocklists.py` in the repository for the full source.