Script Deep Dive: generate-archive.ps1

Archive · Tag Cloud · Latest Posts · Search Index

What This Script Does
It must run before [...]

Metadata Scanning

The first thing the script does is traverse the blog directory, extracting four pieces of metadata from HTML comments in each article:

- Date (date): compatible with both the 2026-04-28 and 2026-04-28 23:30 formats
- Title (title): the display name of the article
- Tags (tags): comma-separated keywords used for the tag cloud
- Draft flag (draft: true): if present, the entire article is skipped and excluded from all outputs

All article information is stored in a PowerShell object array, sorted by date in descending order. This step forms the data foundation for the entire script; all four subsequent outputs are derived from this array.

Output A: Latest Posts Component

Takes the top 5 articles and generates the latest-posts component from them.

There's a noteworthy detail here: title truncation uses pixel-width estimation rather than character count. Chinese characters are estimated at 11px and ASCII characters at 6px; titles whose total estimated width exceeds 110px are truncated to 10 characters. Character-count truncation works fine for pure Chinese, but mixed Chinese-English titles can vary wildly: five ASCII letters such as "AAAAA" and five Chinese characters are both 5 characters long, yet at these estimates they come to about 30px and 55px, nearly a factor of two apart. Before CSS [...]

The output file is encoded as UTF-8 without a BOM. Because it gets injected into pages that already have a BOM, including another BOM would introduce garbled characters at the injection point.

Output B: Article Archive Page

The core of the archive page is year-month grouping. Dates are standardized to [...] Each month is rendered as a heading ("April 2026") followed by a two-column table: the date on the left (yellow, 130px wide) and the article title on the right (cyan links). This page has no sidebar; it uses a two-column layout with blog articles on the left and other pages (download center, FAQ, etc.) on the right, centered in a 980px table. The archive page outputs to [...]

Output C: Tag Cloud Page

The tag cloud is the most complex of the four outputs.
It needs to build a reverse index from tags to articles: the script iterates through every article's tags and uses a hash table to map each tag to the list of articles that carry it. Tag font sizes are tiered by article count: tags with 3 or more articles get the large size (size="4"), tags with 2 articles get medium (size="3"), and tags with 1 article get small (size="2"). Three levels are sufficient for differentiation without needing more sophisticated popularity algorithms. Below the tag cloud is the article list for each tag, using [...]

Output D: Search Index

The search index is a plain text file [...] The text extraction process is straightforward: read the HTML → strip all tags with a regex → replace [...]

The search is completely JavaScript-free: the search box is an HTML form that submits to CGI, the server performs full-text matching, and it returns a results page with highlighting. The entire process works the same way search engines did in 2002.

Bilingual Support

The script uses the [...]

Design Philosophy

If this script were split into four separate scripts, the code would be more modular, but each full-site build would require scanning the blog directory four times. The one-scan, four-outputs design reduces I/O. On a site with only a few dozen articles, the difference may be just a few hundred milliseconds, but it reflects an engineering habit of "thinking through the data flow ahead of time": knowing where data comes from, where it goes, and how many transformations it passes through in between. PowerShell's [...]
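
The pixel-width estimate described under Output A can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names Get-EstimatedTitleWidth and Get-TruncatedTitle are hypothetical, and treating every character above code point 127 as an 11px glyph is an assumption about how "Chinese" and "ASCII" are told apart.

```powershell
# Hypothetical helper: estimate the rendered width of a title in pixels.
# Assumption: any character above code point 127 counts as a wide (11px) glyph,
# everything else as a narrow (6px) glyph.
function Get-EstimatedTitleWidth {
    param([string]$Title)

    $width = 0
    foreach ($ch in $Title.ToCharArray()) {
        if ([int]$ch -gt 127) { $width += 11 } else { $width += 6 }
    }
    return $width
}

# Hypothetical helper: cut a title to $MaxChars characters, but only when its
# estimated width exceeds the $MaxWidthPx budget (110px and 10 characters in the article).
function Get-TruncatedTitle {
    param(
        [string]$Title,
        [int]$MaxWidthPx = 110,
        [int]$MaxChars = 10
    )

    if ((Get-EstimatedTitleWidth $Title) -le $MaxWidthPx) {
        return $Title
    }
    return $Title.Substring(0, [Math]::Min($MaxChars, $Title.Length))
}

# Example calls
Get-EstimatedTitleWidth 'AAAAA'
Get-TruncatedTitle 'generate-archive.ps1 deep dive'
```

With these estimates, "AAAAA" comes to 30px while five Chinese characters come to 55px, which is the gap that a plain character count would miss.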
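
The reverse index and three-tier sizing described under Output C might look something like the sketch below. The sample articles, the Title and Tags property names, the Get-TagFontSize helper, and the use of a font element to carry the size attribute are all assumptions made for illustration; only the hash-table mapping and the 3+/2/1 thresholds come from the description above.

```powershell
# Made-up sample data standing in for the scanned article array.
$articles = @(
    [pscustomobject]@{ Title = 'Post A'; Tags = 'powershell, build' }
    [pscustomobject]@{ Title = 'Post B'; Tags = 'powershell' }
    [pscustomobject]@{ Title = 'Post C'; Tags = 'powershell, cgi, build' }
)

# Reverse index: tag name -> list of titles of the articles carrying that tag.
$tagIndex = @{}
foreach ($article in $articles) {
    foreach ($tag in ($article.Tags -split ',')) {
        $name = $tag.Trim()
        if (-not $name) { continue }   # skip empty entries such as a trailing comma
        if (-not $tagIndex.ContainsKey($name)) {
            $tagIndex[$name] = New-Object 'System.Collections.Generic.List[string]'
        }
        $tagIndex[$name].Add($article.Title)
    }
}

# Hypothetical helper: map an article count to one of the three size tiers.
function Get-TagFontSize {
    param([int]$Count)

    if ($Count -ge 3) { return 4 }   # large
    if ($Count -eq 2) { return 3 }   # medium
    return 2                         # small
}

# Emit one sized entry per tag, in alphabetical order.
$cloud = foreach ($name in ($tagIndex.Keys | Sort-Object)) {
    $size = Get-TagFontSize ($tagIndex[$name].Count)
    '<font size="{0}">{1}</font>' -f $size, $name
}
$cloud
```

With the sample data, powershell appears in three articles and gets size 4, build appears in two and gets size 3, and cgi appears in one and gets size 2. The same $tagIndex table can then drive the per-tag article lists below the cloud, so the tags are only walked once.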