Script Deep Dive: generate-rss.ps1
RSS 2.0 · XML Escaping · Plain Text Summaries
What RSS Is and Why It Matters
RSS (Really Simple Syndication) first appeared in 1999 as a content syndication format. It uses XML to describe a "channel" and its "items." A feed reader periodically fetches this XML, discovers new entries, and notifies you. No algorithmic recommendations, no account registration: you subscribe, you receive.
In 2026, most content is locked behind walled gardens, making RSS all the more precious. A 90s-style website offering RSS serves both a functional need and a philosophical alignment — RSS itself is a product of that era.
generate-rss.ps1 has the smallest code footprint of the five scripts (about 108 lines), but it implements a complete RSS 2.0 generator: XML escaping, title extraction, plaintext conversion, GUID management — each step is straightforward.
The Structure of RSS 2.0
The core of RSS 2.0 is a <channel> containing multiple <item> elements. The channel describes basic site information (title, link, language), and each item corresponds to an article (title, link, summary, GUID, publication date).
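As a rough illustration of that shape, here is a minimal feed written as a PowerShell here-string. The site title, URLs, and article values are placeholders and are not taken from generate-rss.ps1. The channel-level <description> is included because the RSS 2.0 specification requires it alongside <title> and <link>.

```powershell
# Hypothetical placeholder values; the real channel metadata is not shown in this article.
$feed = @"
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>https://example.com/</link>
    <description>Example channel description</description>
    <language>en</language>
    <item>
      <title>Example Article</title>
      <link>https://example.com/blog/example-article.html</link>
      <description>Plain text summary of the article.</description>
      <guid isPermaLink="false">/blog/example-article.html</guid>
      <pubDate>Tue, 28 Apr 2026 12:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"@

# Casting to [xml] throws if the text is not well-formed, which makes a cheap sanity check.
$doc = [xml]$feed
$doc.rss.channel.item.title   # Example Article
```

Parsing the generated text back with [xml] is an easy way to catch a missing closing tag or an unescaped character before the feed is published.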
This site chose RSS 2.0 over Atom (the 2005 alternative standard) because RSS 2.0 is the simplest, most prevalent in podcasts and traditional blogs, and compatible with the most feed readers.
XML Escaping: Five Characters That Must Be Handled
The first step in generating XML is ensuring that article content doesn't break the XML structure. The script defines an Escape-Xml function that replaces five special characters with XML entities: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &apos;.
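Although the XML 1.0 specification only mandates escaping < and & in character data, escaping all five characters is the safer approach, especially within attribute values. The & character must be replaced first. If it runs later, it re-escapes the ampersands inside entities that earlier replacements produced, so an escaped < would end up as &amp;lt; instead of &lt;. This is a common pitfall.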
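A function along these lines could look like the sketch below. Only the name Escape-Xml comes from the script; the body is an assumption about how it might be written, using the literal String.Replace method so that no regex escaping is involved.

```powershell
# Sketch of an Escape-Xml function; the actual body in generate-rss.ps1 may differ.
function Escape-Xml {
    param([string]$Text)
    # String.Replace is a literal (non-regex) replacement.
    $Text = $Text.Replace('&', '&amp;')    # must run first, or it would re-escape the entities below
    $Text = $Text.Replace('<', '&lt;')
    $Text = $Text.Replace('>', '&gt;')
    $Text = $Text.Replace('"', '&quot;')
    $Text = $Text.Replace("'", '&apos;')
    return $Text
}

Escape-Xml 'Tom & Jerry <"quoted"> it''s'
# Tom &amp; Jerry &lt;&quot;quoted&quot;&gt; it&apos;s
```

Moving the ampersand line to the end of the function would turn the same input into text full of &amp;lt; and &amp;quot; sequences, which is the failure the ordering avoids.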
Publication Date Derivation
RSS pubDate uses the RFC 822 date format, which RFC 2822 later updated; RSS 2.0 allows the four-digit year (e.g., Tue, 28 Apr 2026 12:00:00 GMT). The script uses a three-tier fallback mechanism to derive dates:
First tier: Extract year-month-day from the article title (e.g., "April 28, 2026")
Second tier: Extract year-month from the title (e.g., "April 2026"), defaulting to the 1st of that month
Third tier: Use file modification time as the fallback
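In PowerShell, the .NET call DateTime.ToString('R') produces this format directly (.NET calls it the RFC 1123 pattern), so no manual string concatenation is needed. One caveat: 'R' always appends a literal GMT and does not convert the value, so a local DateTime should go through ToUniversalTime() before formatting.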
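The three tiers can be sketched as follows. Get-PubDate is a hypothetical helper name, and the regexes assume English titles in the style of the examples above; the original script's patterns are not shown in this article and may differ. Dates parsed from titles are treated as midnight UTC, which is also an assumption.

```powershell
# Hypothetical helper: Get-PubDate is not a name taken from generate-rss.ps1.
function Get-PubDate {
    param(
        [string]$Title,
        [datetime]$FileTime   # fallback, e.g. (Get-Item $path).LastWriteTimeUtc
    )
    $inv    = [System.Globalization.CultureInfo]::InvariantCulture
    $styles = [System.Globalization.DateTimeStyles]'AssumeUniversal, AdjustToUniversal'
    $months = 'January|February|March|April|May|June|July|August|September|October|November|December'
    $date   = [datetime]::MinValue

    # Tier 1: full date such as "April 28, 2026"
    if ($Title -match "\b($months) (\d{1,2}), (\d{4})\b") {
        $text = '{0} {1}, {2}' -f $Matches[1], $Matches[2], $Matches[3]
        if ([datetime]::TryParseExact($text, 'MMMM d, yyyy', $inv, $styles, [ref]$date)) { return $date }
    }
    # Tier 2: month and year such as "April 2026", defaulting to the 1st
    if ($Title -match "\b($months) (\d{4})\b") {
        $text = '{0} 1, {1}' -f $Matches[1], $Matches[2]
        if ([datetime]::TryParseExact($text, 'MMMM d, yyyy', $inv, $styles, [ref]$date)) { return $date }
    }
    # Tier 3: file modification time, normalized to UTC
    return $FileTime.ToUniversalTime()
}

(Get-PubDate -Title 'Notes for April 28, 2026' -FileTime (Get-Date)).ToString('R')
# Tue, 28 Apr 2026 00:00:00 GMT
(Get-PubDate -Title 'April 2026 roundup' -FileTime (Get-Date)).ToString('R')
# Wed, 01 Apr 2026 00:00:00 GMT
```

Every branch returns a UTC value, so the literal GMT that 'R' appends is accurate regardless of which tier supplied the date.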
Plain Text Summaries
The RSS <description> field in this script uses plain text rather than HTML. Although RSS supports wrapping HTML content in <![CDATA[]]>, not all readers render it correctly — especially older readers and command-line RSS tools.
The summary generation is a three-step cleanup: strip tags with a regex → replace &nbsp; entities with ordinary spaces → compress consecutive whitespace. The first 200 characters are then taken, providing enough preview without bloating the RSS file.
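A minimal sketch of that cleanup is shown below. Get-PlainSummary is a hypothetical name, the regexes are one plausible reading of the three steps, and the second step is assumed to target &nbsp; entities.

```powershell
# Hypothetical helper; generate-rss.ps1 may use different patterns.
function Get-PlainSummary {
    param([string]$Html, [int]$MaxLength = 200)
    $text = $Html -replace '<[^>]+>', ' '        # 1. strip tags
    $text = $text -replace '&nbsp;', ' '         # 2. replace non-breaking-space entities (assumed)
    $text = ($text -replace '\s+', ' ').Trim()   # 3. collapse runs of whitespace
    # Keep only the first $MaxLength characters as the preview.
    if ($text.Length -gt $MaxLength) { $text = $text.Substring(0, $MaxLength) }
    return $text
}

Get-PlainSummary '<p>Hello&nbsp;<b>world</b></p>   <p>Second   paragraph.</p>'
# Hello world Second paragraph.
```

Replacing each tag with a space rather than an empty string keeps words from adjacent paragraphs apart; the whitespace pass then removes the extra spaces this introduces.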
GUID and Known Limitations
Each article uses its page path as the GUID (isPermaLink="false"). RSS readers use the GUID to determine whether an article has already been read — if an article's slug changes, the reader treats it as a new article, which is usually the desired behavior.
The current version has some areas for improvement: it only scans the Chinese blog directory (English articles need a separate RSS), and lastBuildDate uses the script runtime rather than the latest article's publication date (causing updates on every build even without new content). These are "version two" to-do items.
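The lastBuildDate item could be addressed roughly as sketched here. The $items list and its PubDate property are hypothetical stand-ins for whatever structure the script builds while scanning articles.

```powershell
# Hypothetical data: $items stands in for the list of articles the script collects.
$items = @(
    [pscustomobject]@{ Title = 'Older post'; PubDate = [datetime]::new(2026, 3, 1, 0, 0, 0, [System.DateTimeKind]::Utc) }
    [pscustomobject]@{ Title = 'Newer post'; PubDate = [datetime]::new(2026, 4, 28, 0, 0, 0, [System.DateTimeKind]::Utc) }
)

# Use the newest publication date instead of the current time,
# so the value changes only when the content does.
$latest = ($items | Sort-Object PubDate -Descending | Select-Object -First 1).PubDate
$lastBuildDate = $latest.ToString('R')
$lastBuildDate   # Tue, 28 Apr 2026 00:00:00 GMT
```

With this change, two builds over the same set of articles would produce the same lastBuildDate value.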
Why This Script Deserves to Exist
In 2026, what this 108-line PowerShell script accomplishes is what many platforms need entire engineering teams to build. RSS isn't cutting-edge technology, but it represents an important principle — open standards are superior to closed platforms, users own their subscription lists, and no company controls them.
Every article on this site appears in the RSS feed. No login required, no payment required, no recommendation algorithm to accept. Add it to your reader and you'll know when new articles arrive. Just like 1999.