Simplify to whitelist/blacklist model

- Rewrite merge_blocklists.py to sync a single blacklist from upstream and subtract the locally-maintained whitelist
- Replace whitelist contents with subtitle + webm seed
- Remove blacklist_permissive, whitelist_with_subtitles, and all .prev files that are no longer needed
- Rewrite README to reflect the two-file model and link to wiki
@@ -1,40 +1,120 @@
# arr/blocklists

Curated blacklist and whitelist for the ARR media stack. The blacklist is
synced automatically from upstream Cleanuparr and stripped of anything
listed in the locally-maintained whitelist, so consumers like qBittorrent
and Cleanuparr can point at a single raw URL per list and stay in sync.

See the wiki for full technical reference:

- [Sync](https://git.hisp.no/arr/blocklists/wiki/Sync) -- three-way merge, whitelist exclusion, `.prev` snapshot, edge cases
- [Lists](https://git.hisp.no/arr/blocklists/wiki/Lists) -- the two-file model, pattern semantics, maintaining the whitelist
- [Consumers](https://git.hisp.no/arr/blocklists/wiki/Consumers) -- qBittorrent and Cleanuparr integration, raw URLs, recommended modes
- [CI and Workflow](https://git.hisp.no/arr/blocklists/wiki/CI-and-Workflow) -- scheduled Gitea Actions job, manual dispatch, commit behaviour

## How it works

The repository contains two data files:

| File | Role | Source |
|---|---|---|
| `blacklist` | Extensions blocked by downloaders and file cleaners | Synced from upstream, with the whitelist subtracted |
| `whitelist` | Extensions that must never be blocked or deleted | Locally maintained |

On every scheduled run the sync script:

1. Fetches the current upstream blacklist from Cleanuparr.
2. Detects any manual additions made directly to `blacklist` (three-way merge against `blacklist.prev`).
3. Subtracts every entry listed in `whitelist`.
4. Writes the result back to `blacklist` and updates `blacklist.prev`.
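
The four steps can be sketched as plain set operations. This is a minimal illustration, not the contents of `scripts/merge_blocklists.py`. The function names `parse_patterns` and `sync_blacklist` are hypothetical, and the sketch assumes that any entry present in `blacklist` but absent from `blacklist.prev` counts as a manual addition.

```python
def parse_patterns(text: str) -> set[str]:
    """Return the non-empty, stripped lines of a pattern file as a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def sync_blacklist(
    upstream: set[str],   # step 1: freshly fetched upstream blacklist
    prev: set[str],       # blacklist.prev: snapshot of the last upstream fetch
    current: set[str],    # committed blacklist, possibly edited by hand
    whitelist: set[str],  # locally maintained allow list
) -> tuple[list[str], list[str]]:
    """Hypothetical sketch: return (new blacklist, new blacklist.prev), sorted."""
    # Step 2: entries in the committed blacklist that upstream never supplied
    # are treated as manual additions and carried forward.
    manual_additions = current - prev
    # Upstream additions arrive via `upstream`. Upstream removals drop out,
    # because entries already in `prev` are never counted as manual additions.
    merged = upstream | manual_additions
    # Step 3: exact-string subtraction of the whitelist.
    new_blacklist = merged - whitelist
    # Step 4: the new snapshot is the raw upstream fetch, not the merged output.
    return sorted(new_blacklist), sorted(upstream)


blacklist, snapshot = sync_blacklist(
    upstream={"*.exe", "*.lnk", "*.srt"},              # *.lnk is new, *.bat was dropped
    prev={"*.exe", "*.bat", "*.srt"},
    current=parse_patterns("*.exe\n*.bat\n*.scr\n"),   # *.scr was added by hand
    whitelist={"*.srt"},
)
print(blacklist)  # ['*.exe', '*.lnk', '*.scr']
print(snapshot)   # ['*.exe', '*.lnk', '*.srt']
```

In this sketch, deleting an upstream entry from `blacklist` by hand does not persist, because the entry returns from `upstream` on the next run. To keep something out of the blacklist, add it to the whitelist instead.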

The whitelist is the single source of truth for "what I want kept." Adding
an extension to `whitelist` removes it from `blacklist` on the next sync
and prevents consumers from blocking or deleting it. See
[Sync](https://git.hisp.no/arr/blocklists/wiki/Sync) for the full algorithm.

## Prerequisites

- A consumer that reads a remote text file of glob patterns (qBittorrent
  excluded file names, Cleanuparr blacklist/whitelist sync, etc.)
- Network access from that consumer to `git.hisp.no`

## File structure

| Path | Purpose |
|---|---|
| `blacklist` | Merged output: upstream blacklist minus the whitelist. Consumer-facing |
| `blacklist.prev` | Snapshot of the last upstream fetch. Baseline for the three-way merge. Do not edit |
| `whitelist` | Locally-maintained allow list. Edit directly to add or remove entries |
| `scripts/merge_blocklists.py` | Sync script executed by the scheduled workflow |
| `.gitea/workflows/sync.yml` | Scheduled Gitea Actions workflow |

## Usage

Point your consumer at the raw URL of the file it should use.
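
A minimal sketch of what a consumer does with a raw URL: it downloads the file over HTTP and splits it into one pattern per line. The sketch uses only the Python standard library. `RAW_BASE`, `fetch_patterns` and `split_patterns` are hypothetical names, and real consumers such as Cleanuparr implement their own fetching.

```python
import urllib.request

# Raw-file base URL used throughout this README.
RAW_BASE = "https://git.hisp.no/arr/blocklists/raw/branch/main/"


def split_patterns(text: str) -> list[str]:
    """One glob pattern per line; blank lines and surrounding whitespace are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_patterns(name: str, timeout: float = 10.0) -> list[str]:
    """Download the list called `name` ("blacklist" or "whitelist") and parse it."""
    with urllib.request.urlopen(RAW_BASE + name, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return split_patterns(body)
```

Calling `fetch_patterns("blacklist")` returns the patterns a downloader would load. Passing `"whitelist"` returns the allow list.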

### qBittorrent

qBittorrent has no whitelist feature, so it consumes the blacklist directly.
Set the excluded file names list (Options -> Downloads -> Excluded file
names) to:

```
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
```

Because the whitelist is already subtracted from this file, any extension
you add to `whitelist` stops being blocked by qBittorrent on the next sync.

### Cleanuparr

Cleanuparr supports both blacklist and whitelist modes. Use whichever
matches your setup:

- **Blacklist mode** -- point at the same `blacklist` raw URL as qBittorrent.
- **Whitelist mode** -- point at the `whitelist` raw URL:

```
https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist
```
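
The two modes make opposite decisions for each file name, as the sketch below shows. It is only an approximation. It assumes `fnmatch`-style glob semantics and case-insensitive matching, which may differ from Cleanuparr's own matcher. `matches_any` and `should_remove` are hypothetical names.

```python
from fnmatch import fnmatchcase


def matches_any(filename: str, patterns: list[str]) -> bool:
    """True if the filename matches any glob pattern (assumed case-insensitive)."""
    name = filename.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


def should_remove(filename: str, patterns: list[str], mode: str) -> bool:
    """Decide whether a file would be blocked or deleted.

    mode="blacklist": remove files that match a pattern.
    mode="whitelist": remove files that match no pattern.
    """
    if mode == "blacklist":
        return matches_any(filename, patterns)
    if mode == "whitelist":
        return not matches_any(filename, patterns)
    raise ValueError(f"unknown mode: {mode!r}")


blacklist = ["*.exe", "*.lnk"]
whitelist = ["*.mkv", "*.srt"]
for name in ["Movie.2024.mkv", "Movie.2024.en.srt", "setup.exe", "readme.txt"]:
    print(name,
          should_remove(name, blacklist, "blacklist"),
          should_remove(name, whitelist, "whitelist"))
```

`readme.txt` is kept in blacklist mode but removed in whitelist mode. Whitelist mode therefore needs every wanted file type listed.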

See [Consumers](https://git.hisp.no/arr/blocklists/wiki/Consumers) for
recommended mode per feature.

## Maintaining the whitelist

Edit `whitelist` directly in Gitea or via a local clone. Keep one glob
pattern per line, sorted, with no blank lines. Whitelist entries are
removed from the blacklist by exact-string set subtraction, not by glob
matching:

- `*.srt` in `whitelist` removes `*.srt` from `blacklist`.
- `*sample.srt` in `blacklist` is not affected by `*.srt` in `whitelist`.

Sample-file patterns such as `*sample.srt` therefore stay blocked, because
exact-string subtraction only removes identical entries.
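
The distinction matters because `*.srt` would match the string `*sample.srt` if it were treated as a glob. The short check below makes the difference concrete. It uses the standard-library `fnmatch` module purely for contrast and is not part of the sync script.

```python
from fnmatch import fnmatchcase

blacklist = {"*.exe", "*.srt", "*sample.srt"}
whitelist = {"*.srt"}

# What the sync does: remove only entries that are string-identical.
exact = sorted(blacklist - whitelist)

# What glob matching would do instead: drop every blacklist entry that a
# whitelist pattern matches when the entry is read as a file name.
globbed = sorted(
    entry for entry in blacklist
    if not any(fnmatchcase(entry, pattern) for pattern in whitelist)
)

print(exact)    # ['*.exe', '*sample.srt']
print(globbed)  # ['*.exe']
```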

See [Lists](https://git.hisp.no/arr/blocklists/wiki/Lists) for the full
pattern rules and examples.
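
The format rules are simple enough to check before committing. Below is a minimal lint sketch. It checks the stated rules (no blank lines, sorted) and adds two assumptions the README does not specify: "sorted" means plain code-point order, as Python sorts, and duplicate entries are flagged. `lint_whitelist` and `lint_file` are hypothetical helpers, not scripts that exist in the repository.

```python
from pathlib import Path


def lint_whitelist(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the file passes."""
    problems: list[str] = []
    lines = text.splitlines()
    # Rule: no blank lines (a single trailing newline is fine).
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
    entries = [line for line in lines if line.strip()]
    # Rule: sorted. Assumed to mean plain code-point order.
    if entries != sorted(entries):
        problems.append("entries are not sorted")
    # Assumed extra rule: each pattern appears only once.
    if len(entries) != len(set(entries)):
        problems.append("duplicate entries")
    return problems


def lint_file(path: str = "whitelist") -> list[str]:
    """Lint the whitelist file at `path`, relative to the repository root."""
    return lint_whitelist(Path(path).read_text(encoding="utf-8"))
```

Run from the repository root, `lint_file()` returns an empty list when the whitelist follows the rules.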

## Sync schedule

The Gitea Actions workflow runs every 7 days at 04:00 UTC and on manual
dispatch. Each run:

1. Executes `scripts/merge_blocklists.py`.
2. Commits `blacklist` and `blacklist.prev` if either changed.
3. Pushes the commit to `main`.
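
Step 2 only commits when the output changed. Below is a minimal sketch of one way a sync script could support that. It assumes, rather than takes from the actual workflow, that the script rewrites each output file only when its content differs and reports which files it touched. The workflow can then skip the commit when the list is empty. `write_if_changed` and `update_outputs` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def write_if_changed(path: Path, entries: list[str]) -> bool:
    """Write one entry per line only if the file content would change.

    Returns True when the file was created or modified.
    """
    new_text = "".join(f"{entry}\n" for entry in entries)
    if path.exists() and path.read_text(encoding="utf-8") == new_text:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True


def update_outputs(root: Path, blacklist: list[str], snapshot: list[str]) -> list[str]:
    """Update blacklist and blacklist.prev under `root`; return the names that changed."""
    changed: list[str] = []
    for name, entries in (("blacklist", blacklist), ("blacklist.prev", snapshot)):
        if write_if_changed(root / name, entries):
            changed.append(name)
    return changed


# Demonstration in a throwaway directory: an identical second run changes nothing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first_run = update_outputs(root, ["*.exe"], ["*.exe", "*.srt"])
    second_run = update_outputs(root, ["*.exe"], ["*.exe", "*.srt"])
    third_run = update_outputs(root, ["*.exe", "*.lnk"], ["*.exe", "*.srt"])

print(first_run, second_run, third_run)
```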

See [CI and Workflow](https://git.hisp.no/arr/blocklists/wiki/CI-and-Workflow)
for workflow details and manual dispatch instructions.

## Upstream source

The blacklist is sourced from:

```
https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist
```