Script Deep Dive: generate-sitemap.ps1
Sitemap Protocol 0.9 · changefreq · priority
What This Script Does
generate-sitemap.ps1 is the final step in the build pipeline. It generates an XML file conforming to the Sitemap Protocol 0.9 standard, telling search engines like Google and Bing which pages exist on this site, when each page was last updated, and which pages are more important.
A sitemap doesn't guarantee indexing — it's merely a suggestion. Search engines have the right to ignore certain pages and to crawl pages not listed in the sitemap. But providing an accurate, up-to-date sitemap can significantly speed up the discovery of new content, especially for new websites and infrequently updated pages.
The Four Fields of the Sitemap Protocol
Each <url> element contains four fields (one required, three optional); a sample entry follows the list:
<loc> (required) — the full URL of the page, including protocol and domain
<lastmod> — last modified date, in YYYY-MM-DD format. Search engines use it to prioritize crawling recently updated pages
<changefreq> — update frequency hint (daily / weekly / monthly), helping crawlers allocate their crawl budget
<priority> — relative priority (0.0-1.0). Homepage is 1.0, blog posts are 0.7, tool pages are 0.3-0.5
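For illustration, a single entry built with a PowerShell here-string might look like the snippet below; the path, date, and values are placeholders, not the script's actual output:

```powershell
# Hypothetical illustration of one <url> entry; the path, date, and values
# are placeholders rather than anything the real script emits.
$entry = @"
  <url>
    <loc>https://www.dragonrster.cn/blog/example-post.html</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>
"@
```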
Honestly, modern search engines pay less attention to changefreq and priority — they have their own algorithms for assessing page importance. But adding these two fields costs nothing, and it's better to have them than not.
What Pages Are Scanned
The current version scans the following sources, generating approximately 40 URLs; a sketch of the scan follows the list:
Homepage (Chinese and English) — hardcoded, since the homepage has no corresponding content file
All blog posts (Chinese and English) — scanned from src/content/blog/ and blog/en/, automatically skipping drafts
All standalone pages (Chinese and English) — scanned from src/content/page/ and page/en/
Static pages — archive.html, tags.html, stats.html
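A minimal sketch of that scan, assuming the sources are Markdown files and that drafts are marked by a draft: true line in the front matter (the real script's field names and filters may differ):

```powershell
# Sketch: collect candidate source files from the content directories and
# skip drafts. Assumes Markdown sources and a "draft: true" front-matter flag;
# these are assumptions, not details confirmed by the script itself.
$contentDirs = @(
    'src/content/blog', 'src/content/blog/en',
    'src/content/page', 'src/content/page/en'
)
$sources = foreach ($dir in $contentDirs) {
    Get-ChildItem -Path $dir -Filter '*.md' -File -ErrorAction SilentlyContinue |
        Where-Object {
            # Keep only non-draft files.
            -not (Select-String -Path $_.FullName -Pattern '^draft:\s*true' -Quiet)
        }
}
```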
lastmod uses the source file's LastWriteTime, which is more accurate than the date in metadata — it reflects when the content was actually last modified, rather than the article's byline date.
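Deriving that date from the file system is a one-liner. A sketch, continuing the scan above:

```powershell
# <lastmod> comes from the file system, not from front-matter metadata,
# formatted as YYYY-MM-DD per the Sitemap protocol.
foreach ($file in $sources) {
    $lastmod = $file.LastWriteTime.ToString('yyyy-MM-dd')
    "<lastmod>$lastmod</lastmod>"   # goes into this page's <url> entry
}
```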
Why Static Generation Instead of Dynamic
Some websites use CGI or PHP to dynamically generate sitemaps (scanning the database on each request). This site chose static generation. The reasons are straightforward:
With a small number of articles, generating a static XML in 0.1 seconds during build is perfectly acceptable
Static files can be cached by the web server without consuming CGI processes
Search engine crawlers may visit the sitemap daily; dynamic generation would cause unnecessary server load
Consistent with the site's "static-first" philosophy — whenever possible, pre-generate rather than compute at runtime
This is a consistent design philosophy: static first, CGI as a last resort.
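Concretely, "pre-generate" means collecting the per-page <url> fragments, wrapping them in the Sitemap 0.9 envelope, and writing one file at build time. A sketch, assuming a $urlEntries collection assembled from the scan above:

```powershell
# Sketch: wrap the collected <url> entries in the standard Sitemap 0.9
# envelope and write a single static file during the build.
$sitemap = @"
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
$($urlEntries -join "`n")
</urlset>
"@
Set-Content -Path 'sitemap.xml' -Value $sitemap -Encoding UTF8   # project root, not dist/
```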
Output Location and Auto-Discovery
The sitemap is written to the project root (not dist/) because sitemap.xml must, by convention, live in the website's root directory. Search engine crawlers look for /robots.txt and /sitemap.xml early when crawling a site.
This site's robots.txt also includes a Sitemap: https://www.dragonrster.cn/sitemap.xml line, providing a second discovery path. Belt and suspenders.
Search Engine Discovery Flow
A complete search engine discovery flow goes like this: the crawler first reads robots.txt to check crawl restrictions, then reads sitemap.xml to get every URL and its last modification time, crawls pages starting with the most recently modified (per lastmod), parses the HTML to extract titles and body text, and finally adds the pages to its index.
In 2026, a 90s-style website with no JavaScript and pure table-based layout can still be found by Google, thanks in no small part to the sitemap and solid SEO metadata.