Screenshot to Code: from screenshot to front end

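A minimal working SOP. Give your IDE agent a UI screenshot or a Figma export, and it replicates the image as a single-file HTML + Tailwind page that opens directly in a browser (or as a React + Tailwind component). It then renders replica.png with a local Playwright headless Chromium at the same viewport, and finally uses Pillow to stitch source.png and replica.png into one before/after comparison image. You (the coding agent in the user's session) are the vision model: the agent reads the image and writes the code itself, so no extra LLM client SDK or API key is needed. Course home page: https://clawvard.school/courses/screenshot-to-code.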

Quick start (minimal working setup)

python3 -m venv .s2cvenv && source .s2cvenv/bin/activate
pip install --upgrade pip
pip install playwright pillow
playwright install chromium
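# Put your screenshot at ./in/source.png (1280×800 or larger recommended)
# Then have the coding agent in your IDE write ./out/index.html following this SOP
# (plus ./out/preview.html from the Appendix D template),
# and run render_and_compose.py (see Appendix C) to produce replica.png + before-after.png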
python3 render_and_compose.py

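Every artifact must be real

Every popularTask run writes five artifacts to ./out/:

index.html: the single-file HTML + Tailwind CDN page the agent writes in one pass (or Component.tsx for task 2). It opens in a browser on double-click, with no build step and no private CDN.
replica.png: a real screenshot of index.html, rendered by Playwright headless Chromium at the 1280×800 viewport. Its resolution must match the viewport used for source.png.
source.png: the user's original image (a copy also goes into ./out/ so the showcase directory is self-contained).
before-after.png: stitched by Pillow with the original on the left and the replica on the right, an 8px gap in between, and SOURCE / REPLICA labels along the top. At least 80 KB.
preview.html: a preview page that embeds index.html in an iframe and shows before-after.png inline. The course detail page renders it directly in an iframe.

The task fails if any artifact is a mock, a placeholder, a grey box, the same image used twice, blank when opened in a browser, or a replica HTML that throws JS errors.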
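
Some of the failure conditions above can be checked mechanically before a human looks at the result. The sketch below is a hypothetical helper (the names check_artifacts and png_size are invented here and are not part of the course scripts). It uses only the standard library, assumes the 1280×800 viewport, and reads "at least 80 KB" as 80 × 1024 bytes. It cannot detect grey boxes or JS errors, so a visual check is still needed.

```python
import hashlib
import struct
from pathlib import Path

REQUIRED = ("index.html", "replica.png", "source.png", "before-after.png", "preview.html")
MIN_BEFORE_AFTER_BYTES = 80 * 1024          # assumption: "80 KB" means 80 * 1024 bytes
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk without decoding the image."""
    header = path.read_bytes()[:24]
    if len(header) < 24 or header[:8] != PNG_MAGIC or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    return struct.unpack(">II", header[16:24])


def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_artifacts(out_dir: Path, viewport: tuple[int, int] = (1280, 800)) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"missing or empty: {name}" for name in REQUIRED
                if not (out_dir / name).is_file() or (out_dir / name).stat().st_size == 0]
    if problems:
        return problems
    if png_size(out_dir / "replica.png") != viewport:
        problems.append("replica.png does not match the target viewport")
    if (out_dir / "before-after.png").stat().st_size < MIN_BEFORE_AFTER_BYTES:
        problems.append("before-after.png is smaller than 80 KB")
    if _digest(out_dir / "source.png") == _digest(out_dir / "replica.png"):
        problems.append("replica.png is byte-identical to source.png")
    return problems
```

A typical call would be check_artifacts(Path("out")), run right after render_and_compose.py, treating any returned problem as a failed task.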

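Key rules

agent-as-LLM: turning the screenshot into HTML is done by you, the coding agent in the user's IDE. The course introduces no extra LLM client SDK by default; the agent reads the image and writes the code itself.
Respect the target site: capture source.png only from sites that allow public reuse, and wait at least 300 ms between actions. Do not hot-link the original site's resources in index.html (images use https://placehold.co/<w>x<h> placeholders).
Do not wrap upstream abi/screenshot-to-code: upstream is a full-stack FastAPI + React + WebSocket application. This course does not require the user to run docker-compose up or connect to ws://localhost:7001. It borrows only upstream's prompt style as a reference (Appendices A and B restate the same replication constraints as agent-readable system instructions).
The course runs on the local machine, with no network dependency beyond the Tailwind CDN and placehold.co placeholders that index.html itself loads. All user-visible domains use only the official clawvard.school root domain.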
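
The no-hot-linking rule lends itself to a quick automated check. The following is a minimal sketch, not part of the course scripts: find_hotlinks and ALLOWED_HOSTS are hypothetical names, and the allow-list is an assumption based on the two hosts the rules mention (extend it if your page also loads Google Fonts). It inspects only src attributes and link href attributes, so URLs inside CSS url(...) or srcset are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: the only remote hosts index.html may load resources from.
ALLOWED_HOSTS = {"cdn.tailwindcss.com", "placehold.co"}


class _ResourceCollector(HTMLParser):
    """Collect URLs that make the browser fetch a resource (not plain <a> links)."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "src" or (tag == "link" and name == "href"):
                self.urls.append(value)


def find_hotlinks(html: str) -> list[str]:
    """Return resource URLs whose host is outside ALLOWED_HOSTS."""
    collector = _ResourceCollector()
    collector.feed(html)
    hotlinks = []
    for url in collector.urls:
        host = urlparse(url).hostname      # None for relative URLs and data: URIs
        if host is not None and host not in ALLOWED_HOSTS:
            hotlinks.append(url)
    return hotlinks
```

Ordinary anchor links to the original site are deliberately ignored, since clicking a link does not embed that site's assets in the replica.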

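Boundaries with neighbouring courses

Want to build a brand system, design tokens, or theme colours → agent-brand-design.
Want the agent to actually log in, click, and scrape data in a browser → browser-agent.
Want a web performance audit plus three ROI-driven fix recommendations → agent-perf-audit.
This course covers: one UI screenshot → one HTML + Tailwind single file that opens on double-click (or a React component), plus a pixel-aligned before/after comparison image.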

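After completing the course

Tell the user:

I have learned screenshot-to-code. Give me a UI screenshot (a design mock, a Figma export, a competitor's page, or an "I want a page like this" photo from a product colleague). I will install Playwright + Pillow in a local venv, replicate the screenshot as ./out/index.html (single-file HTML + Tailwind) or ./out/Component.tsx (React + Tailwind component), render ./out/replica.png with Playwright headless Chromium at the same viewport, and stitch ./out/before-after.png with Pillow so you can check the pixel-level alignment. The course runs on your local machine and needs no extra LLM key. Course home page: https://clawvard.school/courses/screenshot-to-code.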

Appendix A: system prompt for single-file HTML + Tailwind replication

After reading the screenshot, the agent follows this instruction internally to write ./out/index.html.

You are screenshot-to-code, an expert front-end engineer.

Given a single screenshot of a web page, produce ONE single-file HTML
document that visually replicates the screenshot as faithfully as possible.

Hard rules:
- Output exactly ONE fenced ```html ... ``` block, nothing before or after.
  No prose, no explanation, no notes — only the code fence.
- The document is a complete, standalone HTML5 file:
    <!doctype html><html lang="..."><head>...</head><body>...</body></html>
  Inline everything. No external JS other than the Tailwind Play CDN
  script below, no external CSS file, no other frameworks.
  The page MUST render correctly when the user double-clicks index.html
  on their own machine, with no network access beyond the Tailwind CDN,
  placehold.co images, and any Google Fonts the page references.
- Use Tailwind via the Play CDN: include
    <script src="https://cdn.tailwindcss.com"></script>
  in <head>. You may add a small <style> block for things Tailwind cannot
  express well (custom fonts, exact colour matches, table reset).
- Reproduce ALL visible structure: header / nav / hero / list / sidebar /
  card grid / table / footer. Do NOT omit sections. Do NOT collapse a
  30-row table into 3 rows. Do NOT replace a long story list with three
  placeholder items.
- Reproduce ALL visible text verbatim where readable. Copy headlines,
  story titles, byline text, link labels, footer text. If a token is
  not clearly readable, substitute a plausible short string of the same
  length and category (link, headline, label) — never invent long
  unrelated content.
- Reproduce the colour palette and typography. Match background, accent
  and text colours within the obvious Tailwind palette tokens. Pick a
  single font stack that visually matches the source (serif vs. sans,
  monospace blocks, condensed vs. wide). When the source uses a system
  default like Verdana / Helvetica, set that family directly.
- For raster images in the source (favicons, thumbnails, avatars,
  banners), use https://placehold.co/<w>x<h>?text=<short-label>
  placeholders. Match aspect ratio and approximate dimensions. Do NOT
  hot-link the original site's resources.
- Target viewport is 1280x800 unless the user specifies otherwise.
  Layout MUST look correct at the exact target viewport: no horizontal
  scrollbar, hero / nav / content positioned the same way as the source.
- The file must be ASCII-safe outside of the verbatim text content.
  Use real Unicode in the body where the source has it; do not escape it.

Output one ```html ... ``` block and nothing else.
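
When the agent writes the file directly, the "one fence, nothing else" rule is implicit. If the same prompt is ever run in a setup where the reply arrives as text, a small validator can enforce it before anything is written to ./out/index.html. This is a sketch under that assumption; extract_single_html is a hypothetical name, and the check for a leading doctype is an extra guard based on the standalone-HTML5 rule above.

```python
import re

FENCE = "`" * 3   # three backticks, built this way so the sketch itself stays fence-safe


def extract_single_html(reply: str) -> str:
    """Return the HTML inside the reply's single html fence, or raise ValueError."""
    pattern = re.escape(FENCE) + r"html[ \t]*\r?\n(.*)\r?\n" + re.escape(FENCE)
    match = re.fullmatch(pattern, reply.strip(), re.DOTALL)
    if match is None:
        raise ValueError("reply must be exactly one html fence with nothing around it")
    body = match.group(1)
    # A fence marker at the start of any inner line means the reply held several blocks.
    if any(line.startswith(FENCE) for line in body.splitlines()):
        raise ValueError("reply contains more than one fenced block")
    if not body.lstrip().lower().startswith("<!doctype html"):
        raise ValueError("document must start with <!doctype html>")
    return body
```

The returned string can then be saved with Path("out/index.html").write_text(body, encoding="utf-8"); a ValueError is a signal to ask the agent for a clean retry rather than to patch the reply by hand.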

Appendix B: system prompt for React + Tailwind component replication

Task 2 uses this instruction to have the agent write a component that can be copied straight into a Vite / Next.js project.

You are screenshot-to-code, an expert front-end engineer.

Given a single screenshot, produce ONE React functional component that
visually replicates it.

Hard rules:
- Output exactly ONE fenced ```tsx ... ``` block, nothing before or after.
  The user will copy it into ./out/Component.tsx.
- The component is a default export, named in PascalCase, e.g.:
    export default function HomeReplica() { return ( ... ); }
  No props. No state. No hooks. No useEffect / useMemo. Pure markup.
- The component is self-contained: no import beyond `react`. Do NOT
  import any component library (shadcn/ui, MUI, Chakra, Radix, etc.).
  Do NOT import any local file. Do NOT reference any image asset that
  doesn't ship as part of the component file — use placehold.co URLs
  for every raster.
- Styling is Tailwind className only. Assume the user's project already
  ships Tailwind. Do NOT inline <style> blocks. Do NOT import a css file.
- Reproduce ALL visible structure and ALL visible text as in Appendix A
  (same rules: no section omitted, no row collapsing, verbatim where
  readable, plausible-length substitutes where not).
- Target viewport is 1440x900 unless the user specifies otherwise.
  The component must render at desktop width and degrade gracefully on
  narrow viewports (responsive Tailwind utility classes welcome).
- For raster images (favicons, thumbnails, avatars, banners), use
    https://placehold.co/<w>x<h>?text=<short-label>
  Match aspect ratio and approximate dimensions.

Output one ```tsx ... ``` block and nothing else.
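
Several of the hard rules above (imports limited to react, no hooks, no style blocks, a PascalCase default export) can be spot-checked with plain text matching before the component is copied into a project. The sketch below is a hypothetical, regex-based lint (lint_component is an invented name). It is not a TSX parser, so it only handles single-line import statements and will miss unusual formatting.

```python
import re

# Matches the module name of a single-line import statement.
IMPORT_RE = re.compile(r"""^\s*import\b[^;\n]*?['"]([^'"]+)['"]""", re.MULTILINE)
# Matches a call to a common React hook, e.g. useState(0) or useState<number>(0).
HOOK_RE = re.compile(r"\buse(?:State|Effect|Memo|Ref|Callback|Reducer|Context)\s*[(<]")
STYLE_RE = re.compile(r"<style\b", re.IGNORECASE)
DEFAULT_EXPORT_RE = re.compile(r"\bexport\s+default\s+function\s+[A-Z]\w*\s*\(")


def lint_component(tsx: str) -> list[str]:
    """Return rule violations found in a Component.tsx source string."""
    problems = [f"disallowed import: {module}"
                for module in IMPORT_RE.findall(tsx) if module != "react"]
    if HOOK_RE.search(tsx):
        problems.append("hooks are not allowed (pure markup only)")
    if STYLE_RE.search(tsx):
        problems.append("inline <style> blocks are not allowed")
    if not DEFAULT_EXPORT_RE.search(tsx):
        problems.append("missing PascalCase default-export function")
    return problems
```

An empty list means none of these particular rules were tripped; it says nothing about visual fidelity, which still has to be judged against the screenshot.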

Appendix C: Playwright headless screenshot script

render_and_compose.py: run it right after the agent has written ./out/index.html. It makes zero LLM calls.

"""Render index.html → replica.png, then compose before-after.png.

Pure local Playwright + Pillow. NO LLM call here — the agent already wrote
index.html in the previous step.
"""

from pathlib import Path
from PIL import Image, ImageDraw, ImageFont
from playwright.sync_api import sync_playwright

ROOT = Path(__file__).parent
SRC = ROOT / "in" / "source.png"
OUT = ROOT / "out"
OUT.mkdir(parents=True, exist_ok=True)
VIEWPORT = {"width": 1280, "height": 800}


def render_replica(index_path: Path, out_png: Path) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        ctx = browser.new_context(viewport=VIEWPORT, device_scale_factor=1)
        page = ctx.new_page()
        # as_uri() builds a correct file:// URL even with spaces or on Windows
        page.goto(index_path.resolve().as_uri())
        page.wait_for_load_state("networkidle", timeout=20_000)
        page.wait_for_timeout(1500)        # let Tailwind CDN finalize
        page.screenshot(path=str(out_png), clip={"x": 0, "y": 0, **VIEWPORT})
        browser.close()


def _label_font() -> ImageFont.ImageFont:
    for path in (
        "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
        "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
    ):
        if Path(path).exists():
            return ImageFont.truetype(path, 22)
    return ImageFont.load_default()


def compose_before_after(src: Path, rep: Path, out: Path) -> None:
    source = Image.open(src).convert("RGB")
    replica = Image.open(rep).convert("RGB")
    h = min(source.height, replica.height)
    if source.height != h:
        source = source.resize((source.width * h // source.height, h))
    if replica.height != h:
        replica = replica.resize((replica.width * h // replica.height, h))
    gap, band = 8, 56
    canvas = Image.new("RGB", (source.width + gap + replica.width, band + h),
                       (24, 24, 27))
    d = ImageDraw.Draw(canvas)
    font = _label_font()
    s_label, r_label = "SOURCE", "REPLICA"
    sw = d.textbbox((0, 0), s_label, font=font)[2]
    rw = d.textbbox((0, 0), r_label, font=font)[2]
    d.text((max(20, source.width // 2 - sw // 2), 16), s_label,
           fill=(248, 250, 252), font=font)
    d.text((source.width + gap + max(20, replica.width // 2 - rw // 2), 16),
           r_label, fill=(248, 250, 252), font=font)
    canvas.paste(source, (0, band))
    canvas.paste(replica, (source.width + gap, band))
    canvas.save(out, format="PNG", optimize=True)


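if __name__ == "__main__":
    index = OUT / "index.html"
    if not SRC.is_file() or not index.is_file():
        raise SystemExit(f"missing input: need {SRC} and {index}")
    replica = OUT / "replica.png"
    before_after = OUT / "before-after.png"
    # Copy the original into ./out/ so the showcase directory is self-contained.
    (OUT / "source.png").write_bytes(SRC.read_bytes())
    render_replica(index, replica)
    compose_before_after(SRC, replica, before_after)
    print(f"replica={replica.stat().st_size}B   ba={before_after.stat().st_size}B")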
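
To make the stitching arithmetic in compose_before_after concrete, this pure-Python sketch reproduces only the size calculation, without opening any image. before_after_layout is a hypothetical name introduced for illustration; the gap and band defaults are the 8 and 56 used in the script.

```python
def before_after_layout(src_size, rep_size, gap=8, band=56):
    """Mirror the size arithmetic of compose_before_after without touching any image."""
    (src_w, src_h), (rep_w, rep_h) = src_size, rep_size
    h = min(src_h, rep_h)              # both panels are scaled to the shorter height
    src_w = src_w * h // src_h         # integer scaling that keeps the aspect ratio
    rep_w = rep_w * h // rep_h
    return {
        "canvas": (src_w + gap + rep_w, band + h),
        "source_at": (0, band),
        "replica_at": (src_w + gap, band),
    }


# A 2x (retina) source next to a 1280x800 replica:
layout = before_after_layout((2560, 1600), (1280, 800))
print(layout["canvas"])   # (2568, 856)
```

The example shows why a high-DPI source still lines up: it is scaled down to the replica's 800 px height, so both panels end up 1280 px wide and the canvas is 1280 + 8 + 1280 by 56 + 800.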

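Appendix D: preview.html template

After writing index.html, the agent writes this preview.html. The course detail page renders it directly in an iframe, and the user can also open it locally by double-clicking.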

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>screenshot-to-code · live preview</title>
<style>
  body { margin: 0; padding: 14px 16px 28px; font-family: -apple-system,
    BlinkMacSystemFont, "Inter", "Segoe UI", Roboto, sans-serif;
    background: #f7f6f1; color: #171a23; line-height: 1.55; }
  @media (prefers-color-scheme: dark) {
    body { background: #0d1018; color: #e7eaf2; }
  }
  .wrap { max-width: 1180px; margin: 0 auto; display: grid; gap: 14px; }
  h1 { margin: 0; font-size: clamp(20px, 2.4vw, 28px); }
  .panel { background: #fff; border: 1px solid #d9d2bd; border-radius: 12px; }
  @media (prefers-color-scheme: dark) {
    .panel { background: #141823; border-color: #262c39; }
  }
  .panel > .head { padding: 10px 16px; border-bottom: 1px solid #ece6d3; font: 12px/1.4 ui-monospace, Menlo, monospace; }
  .panel > .body { padding: 16px; }
  iframe { width: 100%; height: 820px; border: 0; border-radius: 8px; background: #fff; }
  .ba { width: 100%; display: block; border-radius: 8px; }
  footer { font: 11px/1.4 ui-monospace, Menlo, monospace; color: #52596a; }
</style>
</head>
<body>
<div class="wrap">
  <h1>Replicating a single-file Tailwind page from one screenshot</h1>
  <div class="panel">
    <div class="head"><strong>./out/index.html</strong> · rendered in the browser at the same viewport</div>
    <div class="body"><iframe src="./index.html"></iframe></div>
  </div>
  <div class="panel">
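    <div class="head"><strong>./out/before-after.png</strong> · side-by-side comparison at the same resolution and viewport</div>
    <div class="body"><img class="ba" src="./before-after.png" alt="Source screenshot (left) next to the rendered replica (right)" /></div>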
  </div>
  <footer>built via the screenshot-to-code course on https://clawvard.school/courses/screenshot-to-code · agent sees the screenshot and writes the HTML itself, all locally</footer>
</div>
</body>
</html>

Attribution

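Upstream inspiration and prompt style: abi/screenshot-to-code (MIT). This course neither installs nor bundles the upstream full stack; it borrows only the prompt approach as a reference for Appendices A and B.
Rendering: Playwright headless Chromium, the same local Chromium path used by the agent-perf-audit and browser-agent courses in this repository.
Stitching: Pillow.
Course home page and billing: https://clawvard.school/courses/screenshot-to-code (the only official domain).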