
Guestbook Architecture Notes

Plain text storage + CGI processing


Overall Flow

The complete path after a guestbook submission is as follows:


[Visitor fills form]  →  [CGI script receives]  →  [Writes to guestbook.txt]
       ↓
[Triggers rebuild-all]  →  [build.ps1 reads data]  →  [Injects into sidebar HTML]
       ↓
[Deploy new pages]  →  [Visitor sees message]

Messages do not appear immediately after submission; they are compiled into the static pages during the next build (I do feel real-time updates are needed)[2].

1. Front-end Form

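The form lives directly in sidebar-left.html, in the guestbook area of the left sidebar. It is a plain HTML form in the HTML 3.2 style (the placeholder attribute is a later HTML5 addition that older browsers simply ignore):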


<form action="/cgi-bin/guestbook.py" method="post">
    Name:    <input type="text" name="name">
    Email:   <input type="text" name="email" placeholder="Optional">
    Content: <textarea name="content"></textarea>
    [x] Show IP  <input type="checkbox" name="show_ip" value="yes">
    <input type="submit" value="Send Message">
</form>

Four fields: name (required), email (optional; if provided, the displayed name becomes a mailto link), content (required), and show_ip (controls whether the IP is displayed publicly). There is no CAPTCHA, and none is planned for now.
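
The required/optional rules above could be checked on the server roughly as follows. This is a minimal sketch, not the site's actual code: the helper name validate_fields and the error messages are assumptions, and only the four field names come from the form.

```python
def validate_fields(form):
    """Return (errors, cleaned) for a dict of submitted form fields.

    Hypothetical helper for illustration; only the field names
    (name, email, content, show_ip) come from the form above.
    """
    # name and content are required; whitespace-only input counts as empty
    errors = [f"{field} is required"
              for field in ("name", "content")
              if not form.get(field, "").strip()]
    cleaned = {
        "name": form.get("name", "").strip(),
        "email": form.get("email", "").strip(),  # optional, may be empty
        "content": form.get("content", "").strip(),
        # An unticked checkbox is not submitted at all, so default to "no"
        "show_ip": "yes" if form.get("show_ip") == "yes" else "no",
    }
    return errors, cleaned


errors, cleaned = validate_fields({"name": "xintai", "content": "hello"})
print(errors, cleaned["show_ip"])
```

Using .get() with a default throughout matters here, because browsers omit unticked checkboxes from the submission entirely.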

2. CGI Backend

On submission, the form posts to /cgi-bin/guestbook.py, a CGI[1] script that processes the request.

Input Handling

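import os
import sys
from urllib.parse import parse_qs

def parse_form_data():
    """Return the first value of each submitted field as a dict."""
    method = os.environ.get('REQUEST_METHOD', 'GET')
    if method == 'GET':
        # GET: fields arrive in the QUERY_STRING environment variable
        qs = os.environ.get('QUERY_STRING', '')
        form_data = parse_qs(qs)
    else:
        # POST: CONTENT_LENGTH counts bytes, so read bytes and decode them
        length = int(os.environ.get('CONTENT_LENGTH') or 0)
        body = sys.stdin.buffer.read(length).decode('utf-8', errors='replace')
        form_data = parse_qs(body)
    return {k: v[0] for k, v in form_data.items()}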

CGI data sources are very primitive: GET requests read from the QUERY_STRING environment variable, POST requests read from stdin. The script parses URL-encoded form data, then performs sanitization:


import html

def sanitize(text):
    text = text.replace('\n', ' ').replace('\r', ' ')  # Replace newlines with spaces
    text = text.replace('|', ' ')                       # Replace the field delimiter
    return html.escape(text)                            # Escape HTML

Replacing | with a space is necessary because | is the field delimiter in the data file; replacing newlines keeps each message on a single line; HTML escaping prevents XSS attacks.
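
To see parsing and sanitization together, here is a small self-contained sketch. The sample request body is made up, and sanitize is repeated so the block runs on its own. One detail worth knowing: parse_qs drops fields with empty values by default, so an empty optional email is simply absent from the result unless keep_blank_values=True is passed.

```python
import html
from urllib.parse import parse_qs


def sanitize(text):
    text = text.replace('\n', ' ').replace('\r', ' ')  # newlines -> spaces
    text = text.replace('|', ' ')                       # delimiter -> space
    return html.escape(text)                            # escape HTML


# Made-up body: name contains "|", email is empty, content has HTML and CRLF
body = "name=A%7CB&email=&content=%3Cb%3Ehi%3C%2Fb%3E%0D%0Abye"

fields = {k: sanitize(v[0]) for k, v in parse_qs(body).items()}
print(fields)
```

Because the empty email is dropped, code that reads the result should use fields.get("email", "") rather than fields["email"].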

3. Client IP Detection

Because the deployment sits behind a reverse proxy, directly reading REMOTE_ADDR would only return the proxy server's IP. So web_server.py resolves the real IP on every request and passes it to the CGI via environment variables:


# In CustomHandler._inject_real_ip()
cf_ip   = headers.get('CF-Connecting-IP')       # Cloudflare
xff     = headers.get('X-Forwarded-For')         # Standard proxy header
real_ip = headers.get('X-Real-IP')               # Commonly used by Nginx
if cf_ip:
    real_client_ip = cf_ip
elif xff:
    real_client_ip = xff.split(',')[0].strip()   # Take the first one
elif real_ip:
    real_client_ip = real_ip
else:
    real_client_ip = client_ip                    # Direct connection fallback
os.environ["REAL_CLIENT_IP"] = real_client_ip     # Inject into CGI environment

Priority: Cloudflare > X-Forwarded-For > X-Real-IP > Direct connection. The CGI script obtains the correct client IP via os.environ.get("REAL_CLIENT_IP").
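
The priority chain can also be written as a pure function, which makes it easy to test in isolation. This is a sketch under assumptions: the name resolve_client_ip is hypothetical, and headers is a plain dict here, whereas the header object in http.server does case-insensitive lookups.

```python
def resolve_client_ip(headers, client_ip):
    """Pick the client IP: Cloudflare > X-Forwarded-For > X-Real-IP > direct.

    headers: mapping of header name to value (hypothetical plain dict).
    client_ip: socket peer address, used as the direct-connection fallback.
    """
    cf_ip = headers.get('CF-Connecting-IP')
    xff = headers.get('X-Forwarded-For')
    real_ip = headers.get('X-Real-IP')
    if cf_ip:
        return cf_ip.strip()
    if xff:
        # X-Forwarded-For is a comma-separated chain; the first entry
        # is the address reported for the original client
        return xff.split(',')[0].strip()
    if real_ip:
        return real_ip.strip()
    return client_ip


print(resolve_client_ip({'X-Forwarded-For': '198.51.100.4, 10.0.0.1'}, '127.0.0.1'))
```

These headers are only meaningful when they are set by a trusted proxy, since a client connecting directly can send any value it likes.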

4. Data Storage

Messages are stored in data/runtime/guestbook.txt, one per line, fields delimited by |:


name|email|content|ip|time|show_ip    ← New format (6 fields)
name|content|ip|time|show_ip          ← Old format (5 fields, backward compatible)

Examples:


DragonRSTER|dragonrster@foxmail.com|Hey, email support is now available|hidden|2026-04-26 18:52:27|no
xintai||This message was sent from win98|180.154.121.226|2026-04-24 23:33:41|yes

If the user chooses not to display their IP, the IP field is written as hidden rather than the actual address. The entire file is essentially a plain-text CSV variant, readable in any text editor. Currently there are over 30 messages, with the earliest dating back to 2020 (a relic from the previous blog).
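
Building one line in the new 6-field format might look like the sketch below. The function name format_record and its parameters are assumptions for illustration; the field order, the hidden marker, and the timestamp layout follow the examples above. Fields are assumed to have been sanitized already, so they contain no | or newlines.

```python
from datetime import datetime


def format_record(name, email, content, ip, show_ip, now=None):
    """Return one guestbook line: name|email|content|ip|time|show_ip."""
    now = now or datetime.now()
    shown_ip = ip if show_ip else "hidden"   # never store a hidden IP
    flag = "yes" if show_ip else "no"
    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return "|".join([name, email, content, shown_ip, timestamp, flag])


line = format_record("alice", "", "Hello", "203.0.113.7", False,
                     now=datetime(2026, 4, 24, 23, 33, 41))
print(line)
```

Appending the result to the data file is then a single write in append mode, for example with open(path, "a", encoding="utf-8"), followed by a newline.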

5. Build-time Injection

Every time rebuild-all.ps1 runs, build.ps1 performs the following:


# Read guestbook.txt
$lines = Get-Content $guestbookFile -Encoding UTF8
# Take the last 20 messages, newest first; @() keeps a single line as an array
$lastLines = @($lines | Select-Object -Last 20)
[array]::Reverse($lastLines)
# Generate HTML for each message
foreach ($line in $lastLines) {
    $parts = $line -split '\|'
    # Compatible with both old and new formats...
    # Generates: name (with mailto) + content + IP (optional) + timestamp
}
# Inject at the placeholder in sidebar-left.html
# (.Replace() is literal, so a "$" in a message is not read as a regex substitution token)
$sidebarLeft = $sidebarLeft.Replace("<!-- GUESTBOOK_MESSAGES -->", $messagesHtml)

Message content is compiled directly into HTML, written at the <!-- GUESTBOOK_MESSAGES --> placeholder in sidebar-left.html. Display logic:

  • If email is provided → username renders as an <a href="mailto:..."> link
  • If show_ip is yes → IP address is displayed below the message (small gray text)
  • All messages are wrapped in a scrollable container, showing at most 20 entries
  • Because guestbook.txt now supports the email field, both the old and new formats are handled: the format is detected automatically by field count when each line is read.
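
The field-count detection and display rules above can be sketched in Python as follows. The real build step is PowerShell (build.ps1), so this is only an illustration of the logic: the function names and the HTML markup are assumptions, not the site's actual output.

```python
def parse_record(line):
    """Parse one guestbook line, accepting both the 6- and 5-field formats."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) == 6:      # new format: includes email
        name, email, content, ip, time, show_ip = parts
    elif len(parts) == 5:    # old format: no email field
        name, content, ip, time, show_ip = parts
        email = ""
    else:
        return None          # malformed line, skip it
    return {"name": name, "email": email, "content": content,
            "ip": ip, "time": time, "show_ip": show_ip}


def render_message(rec):
    """Render one record as HTML (markup is hypothetical)."""
    name = rec["name"]
    if rec["email"]:
        name = f'<a href="mailto:{rec["email"]}">{name}</a>'
    parts = [f"<b>{name}</b>: {rec['content']}"]
    if rec["show_ip"] == "yes":
        parts.append(f"<small>{rec['ip']}</small>")
    parts.append(f"<small>{rec['time']}</small>")
    return "<br>".join(parts)


def latest(lines, limit=20):
    """Return the last `limit` lines, newest first."""
    return list(reversed(lines[-limit:]))


rec = parse_record("bob|Hi there|203.0.113.7|2020-01-01 00:00:00|yes")
print(render_message(rec))
```

Detecting the format by field count only works because sanitization guarantees that no field contains a | character.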

6. Server Side

web_server.py subclasses CGIHTTPRequestHandler from Python's standard library (http.server), adding several layers of custom logic on top of standard CGI support:

  • Path mapping: /index.html, /blog-*dist/, /assets/*dist/assets/
  • Security: /data/, /scripts/, /src/ return 403 directly
  • CGI directory: /cgi-bin/ uses standard CGI processing
  • Logging: each request is written to data/logs/YYYY-MM-DD.log; old logs are automatically gzip-compressed
  • Startup:

    
    python web_server.py
    # Listens on 0.0.0.0:81 (port 81 is the default)
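
The 403 rule for protected directories could be implemented along these lines. This is a sketch, not the code in web_server.py: the names BLOCKED_PREFIXES and is_blocked are assumptions, and only the three directory names come from the list above.

```python
import posixpath
from urllib.parse import unquote

# Directories that should never be served (from the list above)
BLOCKED_PREFIXES = ("/data/", "/scripts/", "/src/")


def is_blocked(request_path):
    """Return True if the request path falls inside a protected directory."""
    # Drop the query string and decode %xx escapes before checking
    path = unquote(request_path.split("?", 1)[0])
    # Collapse "." and ".." segments and force a single leading slash
    path = "/" + posixpath.normpath(path).lstrip("/")
    # normpath strips trailing slashes, so also match the bare directory name
    return any(path == prefix.rstrip("/") or path.startswith(prefix)
               for prefix in BLOCKED_PREFIXES)


print(is_blocked("/cgi-bin/../data/runtime/guestbook.txt"))
```

Normalizing before the prefix check is the important part: without it, a path such as /cgi-bin/../data/ would pass a naive startswith test. A handler would call something like this early in request processing and answer with send_error(403) when it returns True.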
    

7. Some Defensive Measures

Although the guestbook is run entirely by hand, some basic restrictions are in place:

  • Field sanitization: removes | and newlines to prevent data format injection
  • HTML escaping: html.escape() handles all user input to prevent XSS
  • IP visibility: users can choose not to disclose their IP, in which case hidden is written instead of the actual address
  • Directory protection: /data/, /scripts/, /src/ are blocked at the HTTP level with 403
  • robots.txt: prohibits crawlers from accessing /cgi-bin/ and /data/
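
The robots.txt rule can be checked with Python's standard urllib.robotparser. The file contents below are an assumed reconstruction from the description above, not a copy of the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents, reconstructed from the description above
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /data/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Well-behaved crawlers would skip the first two paths and fetch the third
for url in ("/cgi-bin/guestbook.py", "/data/runtime/guestbook.txt", "/index.html"):
    print(url, parser.can_fetch("*", url))
```

robots.txt is only advisory, which is why the 403 directory protection is listed as a separate measure.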

[1] CGI (Common Gateway Interface) was born in 1993, proposed by Rob McCool at NCSA. It was the earliest dynamic content technology standard for the Web. Although it forks a process for every request, it's perfectly adequate for low-traffic sites and requires no framework dependencies.
[2] This has since been changed to automatic background compilation after message submission.


DragonRSTER
CC BY-NC-SA
© 2004-2026 DragonRSTER • Made with HTML • This site supports IE 5.5+
Resource license • Best viewed at 1024x768 • Page last updated 2026-05-08 02:03:49