Guestbook Architecture Notes
Plain text storage + CGI processing
Overall Flow
The complete path after a guestbook submission is as follows:
[Visitor fills form] → [CGI script receives] → [Writes to guestbook.txt]
↓
[Triggers rebuild-all] → [build.ps1 reads data] → [Injects into sidebar HTML]
↓
[Deploy new pages] → [Visitor sees message]
Messages do not appear immediately after submission; they are compiled into the static pages at the next build (I feel real-time updates are needed)[2].
1. Front-end Form
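The form is written directly in sidebar-left.html, in the guestbook area of the left sidebar. It is a plain form in the style of HTML 3.2 (the placeholder attribute is an HTML5 addition that older browsers simply ignore):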
<form action="/cgi-bin/guestbook.py" method="post">
Name: <input type="text" name="name">
Email: <input type="text" name="email" placeholder="Optional">
Content: <textarea name="content"></textarea>
[x] Show IP <input type="checkbox" name="show_ip" value="yes">
<input type="submit" value="Send Message">
</form>
Four fields: name (required), email (optional; if provided, the username becomes a mailto link), content (required), and show_ip (controls whether the IP is publicly displayed). There is no CAPTCHA, and none is planned for now.
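The field rules above could be checked on the server roughly as follows. This is a minimal sketch and not the site's actual code: the function name validate_submission and the error strings are hypothetical, and only the field names come from the form.

```python
def validate_submission(form):
    """Return (cleaned_fields, errors) for a parsed form dict.

    Hypothetical helper: the field names match the form above,
    everything else is an assumption made for illustration.
    """
    errors = []
    name = form.get("name", "").strip()
    content = form.get("content", "").strip()
    email = form.get("email", "").strip()    # optional
    show_ip = form.get("show_ip") == "yes"   # checkbox is absent when unticked

    if not name:
        errors.append("name is required")
    if not content:
        errors.append("content is required")

    cleaned = {"name": name, "email": email,
               "content": content, "show_ip": show_ip}
    return cleaned, errors
```

Returning the error list instead of raising lets the script decide how to report a bad submission to the visitor.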
2. CGI Backend
The form posts to /cgi-bin/guestbook.py, a CGI[1] script that processes the submission.
Input Handling
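import os
import sys
from urllib.parse import parse_qs

def parse_form_data():
    method = os.environ.get('REQUEST_METHOD', 'GET')
    if method == 'GET':
        # GET: the fields arrive in the query string
        qs = os.environ.get('QUERY_STRING', '')
        form_data = parse_qs(qs)
    else:
        # POST: the URL-encoded body arrives on stdin
        # ("or 0" also covers a CONTENT_LENGTH that is set but empty)
        length = int(os.environ.get('CONTENT_LENGTH') or 0)
        body = sys.stdin.read(length)
        form_data = parse_qs(body)
    # Keep only the first value of each field
    return {k: v[0] for k, v in form_data.items()}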
CGI data sources are very primitive: GET requests read from the QUERY_STRING environment variable, POST requests read from stdin. The script parses URL-encoded form data, then performs sanitization:
import html

def sanitize(text):
    text = text.replace('\n', ' ').replace('\r', ' ')  # Replace newlines with spaces
    text = text.replace('|', ' ')                      # Replace the field delimiter
    return html.escape(text)                           # Escape HTML
Replacing | is necessary because it is the field delimiter of the data file; replacing newlines keeps each message on a single line; HTML escaping prevents XSS attacks.
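To make the two steps concrete, the sketch below feeds a URL-encoded POST body through parse_qs and then through a sanitizer with the same three steps as the one above. The sample body is made up for illustration.

```python
import html
from urllib.parse import parse_qs


def sanitize(text):
    # Same three steps as the sanitize() described above
    text = text.replace('\n', ' ').replace('\r', ' ')
    text = text.replace('|', ' ')
    return html.escape(text)


# Made-up request body, as a browser would URL-encode it:
# name is "a|b", content is "<b>hi</b>" + CRLF + "bye"
body = "name=a%7Cb&content=%3Cb%3Ehi%3C%2Fb%3E%0D%0Abye&show_ip=yes"

fields = {k: sanitize(v[0]) for k, v in parse_qs(body).items()}
```

After this runs, the delimiter and the line break in the input have become spaces and the tags have become entities, so the value is safe both as a field in guestbook.txt and as text in the generated HTML.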
3. Client IP Detection
Because the deployment sits behind a reverse proxy, directly reading REMOTE_ADDR would only return the proxy server's IP. So web_server.py resolves the real IP on every request and passes it to the CGI via environment variables:
# In CustomHandler._inject_real_ip()
cf_ip = headers.get('CF-Connecting-IP')   # Cloudflare
xff = headers.get('X-Forwarded-For')      # Standard proxy header
real_ip = headers.get('X-Real-IP')        # Commonly used by Nginx
if cf_ip:
    real_client_ip = cf_ip
elif xff:
    real_client_ip = xff.split(',')[0].strip()  # Take the first one
elif real_ip:
    real_client_ip = real_ip
else:
    real_client_ip = client_ip  # Direct connection fallback
os.environ["REAL_CLIENT_IP"] = real_client_ip  # Inject into CGI environment
Priority: Cloudflare > X-Forwarded-For > X-Real-IP > Direct connection. The CGI script obtains the correct client IP via os.environ.get("REAL_CLIENT_IP").
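The priority order can also be written as a small pure function, which makes it easy to test in isolation. This is a sketch only: resolve_client_ip and its parameters are hypothetical names, and headers is assumed to be a mapping with a .get() method.

```python
def resolve_client_ip(headers, peer_ip):
    """Pick the client IP using the priority order described above.

    headers: mapping of header name to value (hypothetical parameter).
    peer_ip: address of the direct TCP peer, used as the fallback.
    """
    cf_ip = headers.get("CF-Connecting-IP")
    xff = headers.get("X-Forwarded-For")
    real_ip = headers.get("X-Real-IP")

    if cf_ip:
        return cf_ip.strip()
    if xff:
        # X-Forwarded-For may hold "client, proxy1, proxy2"
        return xff.split(",")[0].strip()
    if real_ip:
        return real_ip.strip()
    return peer_ip
```

One caveat: all three headers are set by whoever sends the request, so they are only trustworthy when the server is reachable solely through the proxy. A plain dict is also case-sensitive, unlike the header objects in http.server.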
4. Data Storage
Messages are stored in data/runtime/guestbook.txt, one per line, fields delimited by |:
name|email|content|ip|time|show_ip ← New format (6 fields)
name|content|ip|time|show_ip ← Old format (5 fields, backward compatible)
Examples:
DragonRSTER|dragonrster@foxmail.com|Hey, email support is now available|hidden|2026-04-26 18:52:27|no
xintai||This message was sent from win98|180.154.121.226|2026-04-24 23:33:41|yes
If the user chooses not to display their IP, the IP field is written as hidden rather than the actual address. The entire file is essentially a plain-text CSV variant, viewable with any tool. Currently there are over 30 messages, with the earliest dating back to 2020 (a relic from the previous blog).
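A minimal sketch of writing and reading one record in this format follows. The function names format_record and parse_record are hypothetical, and the fields are assumed to have been sanitized already so that none of them contains | or a newline.

```python
def format_record(name, email, content, ip, time_str, show_ip):
    """Build one line in the 6-field format (fields assumed sanitized)."""
    stored_ip = ip if show_ip else "hidden"
    flag = "yes" if show_ip else "no"
    return "|".join([name, email, content, stored_ip, time_str, flag])


def parse_record(line):
    """Split one line into a dict, accepting the 6- and 5-field formats."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) == 6:
        name, email, content, ip, time_str, show_ip = parts
    elif len(parts) == 5:
        # Old format: no email column
        name, content, ip, time_str, show_ip = parts
        email = ""
    else:
        return None  # malformed line; the real build may treat this differently
    return {"name": name, "email": email, "content": content,
            "ip": ip, "time": time_str, "show_ip": show_ip == "yes"}
```

Because the old format differs only by the missing email column, the field count alone is enough to tell the two apart.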
5. Build-time Injection
Every time rebuild-all.ps1 runs, build.ps1 performs the following:
# Read guestbook.txt
$lines = Get-Content $guestbookFile -Encoding UTF8
# Take the last 20 messages, reverse order (newest on top)
# @() keeps the result an array even when the file holds a single line
$lastLines = @($lines | Select-Object -Last 20)
[array]::Reverse($lastLines)
# Generate HTML for each message
foreach ($line in $lastLines) {
    $parts = $line -split '\|'
    # Compatible with both old and new formats...
    # Generates: name (with mailto) + content + IP (optional) + timestamp
}
# Inject at the placeholder in sidebar-left.html
# .Replace() is literal, so a "$" inside a message is not read as a regex substitution
$sidebarLeft = $sidebarLeft.Replace("<!-- GUESTBOOK_MESSAGES -->", $messagesHtml)
Message content is compiled directly into HTML, written at the <!-- GUESTBOOK_MESSAGES --> placeholder in sidebar-left.html. Display logic:
If email is provided → username renders as an <a href="mailto:..."> link
If show_ip is yes → IP address is displayed below the message (small gray text)
All messages are wrapped in a scrollable container, showing at most 20 entries
Because the email field was added to guestbook.txt later, the build handles both the old and the new format, detecting which one a line uses from its field count.
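For readers who do not use PowerShell, the same build step can be sketched in Python. This is not a port of build.ps1: the function names and the HTML markup are hypothetical, and the fields are assumed to have been HTML-escaped when they were written.

```python
def render_messages(lines, limit=20):
    """Render the newest `limit` guestbook lines as HTML, newest first."""
    recent = [ln for ln in lines if ln.strip()][-limit:]
    recent.reverse()

    blocks = []
    for line in recent:
        parts = line.rstrip("\r\n").split("|")
        if len(parts) == 6:
            name, email, content, ip, time_str, show_ip = parts
        elif len(parts) == 5:
            # Old format: no email column
            name, content, ip, time_str, show_ip = parts
            email = ""
        else:
            continue  # skip malformed lines

        who = f'<a href="mailto:{email}">{name}</a>' if email else name
        pieces = [f"<b>{who}</b>: {content}"]
        if show_ip == "yes":
            pieces.append(f"<small>{ip}</small>")
        pieces.append(f"<small>{time_str}</small>")
        blocks.append("<p>" + "<br>".join(pieces) + "</p>")
    return "\n".join(blocks)


def inject(template, messages_html, placeholder="<!-- GUESTBOOK_MESSAGES -->"):
    # str.replace is a literal replacement, no pattern syntax involved
    return template.replace(placeholder, messages_html)
```

A call such as inject(sidebar_html, render_messages(open(path, encoding="utf-8"))) would then produce the final sidebar, where sidebar_html and path stand in for values the real build already has.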
6. Server Side
web_server.py inherits from Python's standard library CGIHTTPRequestHandler, adding several layers of custom logic on top of standard CGI support:
Path mapping: / → index.html, /blog-* → dist/, /assets/* → dist/assets/
Security: /data/, /scripts/, /src/ return 403 directly
CGI directory: /cgi-bin/ uses standard CGI processing
Logging: each request writes to data/logs/YYYY-MM-DD.log, old logs are automatically gzip-compressed
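The routing rules above might look roughly like this as a standalone function. It is a sketch under assumptions: route and its return values are hypothetical, and the exact way /blog-* and /assets/* map into dist/ is a guess based on the one-line description.

```python
BLOCKED_PREFIXES = ("/data/", "/scripts/", "/src/")


def route(path):
    """Classify a request path as ("deny" | "cgi" | "file", target)."""
    if path.startswith(BLOCKED_PREFIXES):
        return "deny", None          # answered with 403
    if path.startswith("/cgi-bin/"):
        return "cgi", path           # handed to standard CGI processing
    if path == "/":
        return "file", "index.html"
    if path.startswith("/blog-") or path.startswith("/assets/"):
        # Assumption: both prefixes keep their relative path under dist/
        return "file", "dist" + path
    return "file", path.lstrip("/")
```

A real handler would normalize the path first (for example, resolving ".." segments) so that the blocked prefixes cannot be sidestepped.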
Startup:
python web_server.py
# Listens on 0.0.0.0:81 (port 81 is the default)
7. Some Defensive Measures
Although entirely human-powered, some basic restrictions are in place:
Field sanitization: removes | and newlines to prevent data format injection
HTML escaping: html.escape() handles all user input to prevent XSS
IP visibility: users can choose not to disclose their IP, in which case hidden is stored instead of the actual address
Directory protection: /data/, /scripts/, /src/ are blocked at the HTTP level with 403
robots.txt: prohibits crawlers from accessing /cgi-bin/ and /data/
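The robots.txt rule can be sanity-checked with Python's standard urllib.robotparser. The file content below is an assumption, since the real robots.txt is not shown in these notes.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content matching the description above
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /data/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

cgi_allowed = parser.can_fetch("*", "/cgi-bin/guestbook.py")
index_allowed = parser.can_fetch("*", "/index.html")
```

Keep in mind that robots.txt is advisory: only well-behaved crawlers honor it, so the 403 rules remain the actual protection for /data/.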
[1] CGI (Common Gateway Interface) was born in 1993, proposed by Rob McCool at NCSA. It was the earliest dynamic content technology standard for the Web. Although it forks a process for every request, it's perfectly adequate for low-traffic sites and requires no framework dependencies.
[2] This has since been changed to automatic background compilation after message submission.