Table of Contents
- Sync
  - Overview
  - Inputs
  - Three-way merge
    - Why a three-way merge
  - Whitelist exclusion
  - The .prev file
  - Edge cases
    - First run
    - Empty or missing whitelist
    - Empty or missing blacklist
    - Upstream removes an entry that is also in the whitelist
    - An entry appears in both whitelist and blacklist custom additions
  - Reporting
  - Full script
Sync
Overview
The sync process fetches the upstream Cleanuparr blacklist, preserves any
manual local additions, subtracts the locally-maintained whitelist, and
writes the result back to blacklist. It runs on a schedule (every 7 days)
and on manual dispatch. All logic lives in scripts/merge_blocklists.py
(about 45 lines of pure Python standard library, no third-party deps).
Inputs
The script reads four sources on every run:
| Source | Path | Role |
|---|---|---|
| Upstream | https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist | Current upstream state, fetched over HTTPS |
| Upstream snapshot | blacklist.prev | What upstream looked like on the previous sync (baseline) |
| Committed blacklist | blacklist | Current committed state, may contain manual local additions |
| Whitelist | whitelist | Locally-maintained entries to strip from the merged result |
All four are parsed the same way: one entry per non-empty line, stripped of
leading/trailing whitespace, loaded into a Python set.
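The parsing rule can be sketched as a small helper. The name `parse_entries` and the sample text are illustrative assumptions, not taken from the repository; the rule itself (strip each line, skip empty ones, collect into a set) is the one described above.

```python
def parse_entries(text):
    """Return the set of non-empty, whitespace-stripped lines in text."""
    return {line.strip() for line in text.splitlines() if line.strip()}


# Sample input with a blank line, padded whitespace, and a duplicate entry.
sample = "*.exe\n\n  *.lnk  \n*.exe\n"
entries = parse_entries(sample)
print(sorted(entries))  # ['*.exe', '*.lnk']
```

Because the result is a set, duplicate lines collapse and the original line order is not preserved.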
Three-way merge
The script performs a classic three-way merge, git-style, using set operations:
    custom = local - upstream_prev
    merged = upstream_new | custom
    result = merged - whitelist
Each line does one specific job:
custom = local - upstream_prev
Compute what was added locally. Anything in the committed blacklist that
was not in the previous upstream snapshot must be a manual local addition,
because the sync script is the only other thing that writes to blacklist
and it always produces a subset of upstream_new | custom. Tracking this
set lets the next sync re-apply those additions on top of the new upstream.
merged = upstream_new | custom
Union the fresh upstream with the preserved local additions. Upstream
additions flow in (they appear in upstream_new), upstream removals flow
out (they were in upstream_prev but are not in upstream_new, and are
also not in custom), and manual local additions survive.
result = merged - whitelist
Strip every entry that appears in the locally-maintained whitelist. This
is the step that enables local removals: an entry placed in whitelist
is always removed from the final blacklist, no matter how many times
upstream re-adds it.
After the merge the script writes result to blacklist and overwrites
blacklist.prev with upstream_new so the next run has a fresh baseline.
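A worked example may make the three steps easier to follow. The function name `three_way_merge` and all entries below are made-up sample data for illustration; only the three set operations come from the text above.

```python
def three_way_merge(upstream_prev, upstream_new, local, whitelist):
    """Apply the three set operations described above and return the result."""
    custom = local - upstream_prev   # manual local additions
    merged = upstream_new | custom   # fresh upstream plus those additions
    return merged - whitelist        # strip whitelisted entries


# Illustrative data: upstream dropped *.lnk and added *.scr since the last sync.
upstream_prev = {"*.exe", "*.lnk", "*.srt"}
upstream_new = {"*.exe", "*.srt", "*.scr"}
# The committed file: last sync's output (with *.srt stripped) plus a manual *.zipx.
local = {"*.exe", "*.lnk", "*.zipx"}
whitelist = {"*.srt"}

result = three_way_merge(upstream_prev, upstream_new, local, whitelist)
print(sorted(result))  # ['*.exe', '*.scr', '*.zipx']
```

In this run `*.scr` flows in, `*.lnk` flows out, the manual `*.zipx` survives, and `*.srt` stays stripped.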
Why a three-way merge
A simpler design would be result = upstream_new - whitelist, with no
.prev file and no custom tracking. That works for the common case but
drops an escape hatch: if you spot something upstream missed (a new
malware extension, a tracker-specific junk file) and add it directly to
blacklist, the next sync would silently drop it.
The three-way merge preserves those manual additions without requiring
them to live in a separate "additions" file. If you never add anything
directly, the custom set is empty on every run and the merge reduces to
upstream_new - whitelist. The overhead is one extra file (blacklist.prev)
and two set operations.
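The difference between the two designs can be checked with a short sketch. The function names and entries are hypothetical, chosen only to illustrate the argument above.

```python
def simple_sync(upstream_new, whitelist):
    """The simpler design: no .prev file, no custom tracking."""
    return upstream_new - whitelist


def three_way_sync(upstream_prev, upstream_new, local, whitelist):
    """The three-way merge: re-apply manual additions before stripping."""
    custom = local - upstream_prev
    return (upstream_new | custom) - whitelist


upstream_prev = {"*.exe", "*.lnk"}
upstream_new = {"*.exe", "*.lnk", "*.scr"}
whitelist = {"*.lnk"}

untouched = upstream_prev - whitelist   # last sync's output, never edited
edited = untouched | {"*.zipx"}         # same file plus one manual addition

same_when_untouched = (
    three_way_sync(upstream_prev, upstream_new, untouched, whitelist)
    == simple_sync(upstream_new, whitelist)
)
kept = three_way_sync(upstream_prev, upstream_new, edited, whitelist)
dropped = simple_sync(upstream_new, whitelist)

print(same_when_untouched)   # True: with no manual edits the designs agree
print("*.zipx" in kept)      # True: the three-way merge keeps the addition
print("*.zipx" in dropped)   # False: the simple design loses it
```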
Whitelist exclusion
The whitelist is subtracted with exact-string set subtraction, not pattern matching. This has two important consequences:
Exact entries are stripped
*.srt in whitelist strips exactly *.srt from the blacklist. Same for
*.webm, *.mkv, etc.
Sample patterns are preserved
The upstream blacklist contains entries like *sample.srt, *sample.webm,
and *sample.mkv that block files with "sample" in the name regardless of
extension. These are separate string entries from *.srt or *.webm, so
whitelisting the plain extension does not remove the sample-file variant.
Sample files continue to be blocked.
This is almost always the behaviour you want: subtitle files shipped inside a release are kept, but standalone "sample.srt" clutter is still filtered.
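A minimal sketch of the exact-string behaviour, using a small made-up blacklist rather than the real upstream contents:

```python
# Illustrative entries only; the real upstream list is much longer.
blacklist = {"*.srt", "*sample.srt", "*.webm", "*sample.webm", "*.exe"}
whitelist = {"*.srt", "*.webm"}

# Plain set subtraction compares whole strings; "*" is not treated as a wildcard.
result = blacklist - whitelist
print(sorted(result))  # ['*.exe', '*sample.srt', '*sample.webm']
```

The sample patterns remain because `"*sample.srt"` and `"*.srt"` are different strings, even though a glob matcher would consider them related.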
The .prev file
blacklist.prev is a plain text snapshot of whatever upstream_new was on
the previous successful run. It has no special format, no metadata, and is
never edited manually. The sync script rewrites it at the end of every run.
It exists purely as the baseline for the local - upstream_prev step in
the three-way merge. Without it, the script could not distinguish "this
entry was in local because upstream had it" from "this entry was in local
because someone added it manually."
If blacklist.prev is missing or empty (first run, or manually deleted), the
script treats the current upstream_new as the baseline. Entries in blacklist
that are not in the current upstream are then classified as custom and kept.
The cost is precision: an entry that upstream has dropped since blacklist
was last written cannot be told apart from a manual addition, so it stays in
blacklist until you remove it by hand.
Edge cases
First run
blacklist.prev does not exist; blacklist may or may not exist. The script
sets upstream_prev = upstream_new, so custom = local - upstream_new
(anything in local that is not in upstream). After the run, .prev exists
and subsequent runs use the normal path.
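The first-run path can be sketched as follows. The helper name `load_baseline` is hypothetical and the entries are sample data; the fallback itself mirrors the behaviour described above.

```python
def load_baseline(upstream_prev, upstream_new):
    """Use the previous snapshot if there is one, else fall back to upstream_new."""
    return upstream_prev if upstream_prev else set(upstream_new)


upstream_new = {"*.exe", "*.scr"}
local = {"*.exe", "*.zipx"}   # pre-existing file that includes one non-upstream entry

# No blacklist.prev yet, so the snapshot is read as an empty set.
baseline = load_baseline(set(), upstream_new)
custom = local - baseline
print(sorted(custom))  # ['*.zipx']
```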
Empty or missing whitelist
If whitelist is missing or empty, whitelist = set() and the subtraction
is a no-op. The merge degenerates to a plain upstream sync with local
additions preserved.
Empty or missing blacklist
If blacklist is missing, local = set(), custom = set(), and
result = upstream_new - whitelist. Equivalent to a fresh install.
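The missing-file case can be reproduced with a temporary directory. This sketch reuses the `read_lines` helper from the full script; the upstream and whitelist entries are made-up sample data, and no network fetch is performed.

```python
import os
import tempfile


def read_lines(path):
    """Return non-empty stripped lines as a set, or an empty set if the file is missing."""
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()


with tempfile.TemporaryDirectory() as tmp:
    # The file is never created, so read_lines takes the FileNotFoundError path.
    local = read_lines(os.path.join(tmp, "blacklist"))

upstream_new = {"*.exe", "*.scr"}   # stand-in for the fetched upstream list
whitelist = {"*.scr"}
upstream_prev = set(upstream_new)   # first-run fallback

custom = local - upstream_prev
result = (upstream_new | custom) - whitelist
print(local, custom, sorted(result))  # set() set() ['*.exe']
```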
Upstream removes an entry that is also in the whitelist
Harmless. upstream_new does not contain it, so merged does not contain
it, and the whitelist subtraction removes nothing (the entry was already
absent). The whitelist entry stays as a harmless no-op for future syncs.
An entry appears in both whitelist and blacklist custom additions
You manually added *.foo to blacklist and also added *.foo to
whitelist. The whitelist wins: *.foo is in custom, survives the
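union, then gets stripped by the final subtraction. The committed
blacklist will not contain *.foo. Because the next run reads local from
that committed file, removing *.foo from whitelist later does not bring
the entry back; add it to blacklist again if you still want it blocked.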
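A single run of this case, traced with sample data (the `*.exe` entry is illustrative):

```python
upstream_prev = {"*.exe"}
upstream_new = {"*.exe"}
local = {"*.exe", "*.foo"}   # *.foo was added to blacklist by hand
whitelist = {"*.foo"}        # ...and also added to whitelist

custom = local - upstream_prev    # *.foo is recognised as a manual addition
merged = upstream_new | custom    # it survives the union
result = merged - whitelist       # and is stripped by the final subtraction

print(sorted(custom))   # ['*.foo']
print(sorted(merged))   # ['*.exe', '*.foo']
print(sorted(result))   # ['*.exe']
```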
Reporting
Each sync run logs four lines to the workflow output:
    [blacklist] Upstream added: [...]
    [blacklist] Upstream removed: [...]
    [blacklist] Custom preserved: [...]
    [blacklist] Whitelist stripped: [...]
These are sorted lists showing exactly what changed. Check the Actions run log after any sync to see what happened, especially if a consumer reports unexpected behaviour.
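The logging code is not shown in this document, so the following is only a guess at how the four lists could be derived from the merge inputs. The function name `build_report` and the exact set expressions are assumptions; the real script may compute them differently.

```python
def build_report(upstream_prev, upstream_new, local, whitelist):
    """Derive the four report lists (assumed definitions, see lead-in)."""
    custom = local - upstream_prev
    merged = upstream_new | custom
    return {
        "Upstream added": sorted(upstream_new - upstream_prev),
        "Upstream removed": sorted(upstream_prev - upstream_new),
        "Custom preserved": sorted(custom - whitelist),
        "Whitelist stripped": sorted(merged & whitelist),
    }


# Illustrative sample data, not real upstream contents.
report = build_report(
    upstream_prev={"*.exe", "*.lnk", "*.srt"},
    upstream_new={"*.exe", "*.srt", "*.scr"},
    local={"*.exe", "*.lnk", "*.zipx"},
    whitelist={"*.srt"},
)
for label, entries in report.items():
    print(f"[blacklist] {label}: {entries}")
```

With this sample data the sketch reports `*.scr` as added, `*.lnk` as removed, `*.zipx` as preserved, and `*.srt` as stripped.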
Full script
    import urllib.request

    UPSTREAM_URL = "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist"
    BLACKLIST = "blacklist"
    BLACKLIST_PREV = "blacklist.prev"
    WHITELIST = "whitelist"


    def read_lines(path):
        """Return the non-empty, stripped lines of path as a set (empty if missing)."""
        try:
            with open(path) as f:
                return set(line.strip() for line in f if line.strip())
        except FileNotFoundError:
            return set()


    def main():
        # Fetch the current upstream list over HTTPS.
        with urllib.request.urlopen(UPSTREAM_URL) as r:
            upstream_new = set(
                line.strip() for line in r.read().decode().splitlines() if line.strip()
            )

        # Baseline: the previous upstream snapshot, or upstream_new if it is
        # missing or empty (first run).
        upstream_prev = read_lines(BLACKLIST_PREV)
        if not upstream_prev:
            upstream_prev = upstream_new.copy()

        local = read_lines(BLACKLIST)
        whitelist = read_lines(WHITELIST)

        custom = local - upstream_prev   # manual local additions
        merged = upstream_new | custom   # fresh upstream plus preserved additions
        result = merged - whitelist      # strip whitelisted entries

        # Write the merged blacklist, then refresh the baseline for the next run.
        with open(BLACKLIST, "w") as f:
            f.write("\n".join(sorted(result)) + "\n")
        with open(BLACKLIST_PREV, "w") as f:
            f.write("\n".join(sorted(upstream_new)) + "\n")
Logging and the __main__ guard are omitted above for clarity. See
scripts/merge_blocklists.py in the repository for the full source.