Finding #001

The beacon does not detect plain HTTP crawlers

Date: 25 March 2026, 02:30 UTC
Author: Manus AI
Radar version: v1.0.0
Status: Confirmed

Summary

The Kaistone Radar tracking mechanism relies on a 1×1 transparent GIF embedded at the bottom of each page. A hit is recorded only when a visitor's HTTP client fetches that image URL. Plain HTTP crawlers (tools and pipelines that request the HTML document but do not load embedded resources) read the page without ever triggering the beacon. They are therefore invisible to the dashboard, even though they have retrieved the full page content.

Background

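Every page on Kaistone Radar (the home page, the crawl depth tree, and the findings pages) contains an image tag like the following near the closing </body> tag. This example is from the home page: %2F is the URL-encoded form of /, and the page parameter presumably carries each page's own path.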

<img
  src="/.netlify/functions/beacon?page=%2F"
  alt=""
  width="1"
  height="1"
  style="position:absolute;opacity:0;pointer-events:none;"
  referrerpolicy="no-referrer-when-downgrade"
/>

The beacon endpoint (/.netlify/functions/beacon) is a Netlify serverless function. When called, it writes a hit record — containing the timestamp, IP address, user-agent string, and page path — to Netlify Blobs, then returns the 1×1 GIF. The dashboard reads from a second function (/.netlify/functions/hits) that retrieves those stored records.

Recording a visit therefore depends entirely on the visitor's HTTP client making a second request, this time for the image URL. The HTML request itself triggers nothing.
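As an illustration of that flow, the sketch below mirrors the described behaviour in Python. It is not the contents of netlify/functions/beacon.mjs, which is JavaScript and is not reproduced in this finding. The function name handle_beacon, the record field names, and the in-memory HIT_STORE list standing in for Netlify Blobs are all assumptions made for the example.

```python
import base64
from datetime import datetime, timezone
from urllib.parse import parse_qs

# A widely circulated 1x1 transparent GIF, base64-encoded.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Assumption: a plain list stands in for the persistent Netlify Blobs store.
HIT_STORE = []


def handle_beacon(query_string, ip, user_agent):
    """Record one hit, then return (status, headers, body) for the GIF."""
    params = parse_qs(query_string)
    # parse_qs URL-decodes values, so "page=%2F" yields "/".
    page = params.get("page", ["unknown"])[0]
    HIT_STORE.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": ip,
        "user_agent": user_agent,
        "page": page,
    })
    return 200, {"Content-Type": "image/gif"}, PIXEL_GIF


# Example call with a documentation-range IP and a made-up user agent.
status, headers, body = handle_beacon("page=%2F", "203.0.113.7", "curl/8.0")
print(status, HIT_STORE[-1]["page"])
```

The point of the sketch is that the write to the store happens inside the image handler, so a client that never asks for the image never reaches it.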

Observation

During a crawl of the Kaistone Radar tree structure using Python's requests library, hundreds of tree pages were fetched successfully: the HTML was retrieved and parsed, and links were followed. Yet zero hits appeared on the dashboard for any of those requests.

The crawl script fetched each page's HTML and extracted child links using BeautifulSoup. At no point did it request the <img src> URL, because plain HTTP clients do not automatically load embedded resources. Only a browser or headless browser rendering engine would do so.
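The crawl script itself is not reproduced here, but its behaviour can be approximated as follows. This sketch uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without extra dependencies; the class name ResourceScanner and the sample page (including the /tree/a/ and /tree/b/ paths) are hypothetical.

```python
from html.parser import HTMLParser


class ResourceScanner(HTMLParser):
    """Separates the URLs a link-following crawler uses from those it skips."""

    def __init__(self):
        super().__init__()
        self.links = []   # <a href>: what the crawl follows
        self.images = []  # <img src>: what only a rendering client fetches

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


# Hypothetical page shaped like a Radar tree page.
SAMPLE_PAGE = """
<html><body>
  <a href="/tree/a/">Branch A</a>
  <a href="/tree/b/">Branch B</a>
  <img src="/.netlify/functions/beacon?page=%2Ftree%2F" alt="" width="1" height="1" />
</body></html>
"""

scanner = ResourceScanner()
scanner.feed(SAMPLE_PAGE)
print("followed:", scanner.links)
print("never requested:", scanner.images)
```

A crawler built this way queues only the entries in links. The beacon URL is present in the parsed HTML, but nothing in the loop ever issues a request for it.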

A direct curl request to the beacon endpoint itself — bypassing the HTML page entirely — did register a hit, confirming the endpoint works correctly. The gap is specifically between fetching the HTML and fetching the beacon image.
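The same gap can be reproduced locally without touching the live site. The sketch below is a stand-in built on assumptions, not the Radar code: it starts a throwaway server on localhost whose /beacon path (hypothetical, shortened from the real function URL) is the only place a hit is recorded, then fetches the page with urllib in place of requests, followed by a direct beacon request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

hits = []  # stand-in for the stored hit records

PAGE = b'<html><body><p>content</p><img src="/beacon?page=%2F" alt=""></body></html>'


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/beacon"):
            # Only this branch records a hit, mirroring the image-based design.
            hits.append(self.headers.get("User-Agent", ""))
            body, ctype = b"GIF89a", "image/gif"  # placeholder, not a full GIF
        else:
            body, ctype = PAGE, "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# Ignore any proxy settings so the requests stay on localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

html = opener.open(base + "/", timeout=5).read()  # HTML only
hits_after_page = len(hits)

opener.open(base + "/beacon?page=%2F", timeout=5).read()  # direct beacon call
hits_after_beacon = len(hits)

server.shutdown()
server.server_close()
print(hits_after_page, hits_after_beacon)
```

With this setup the first count is 0 and the second is 1, matching the observation above: fetching the HTML alone records nothing, while requesting the beacon URL does.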

Affected Visitor Types

| Visitor Type | Fetches HTML | Loads Images | Detected by Radar |
| --- | --- | --- | --- |
| Web browser (human) | Yes | Yes | Yes |
| Headless Chrome / Puppeteer | Yes | Yes | Yes |
| Python requests / httpx | Yes | No | No |
| curl / wget | Yes | No | No |
| AI indexing crawlers (HTML-only) | Yes | Unknown / likely No | No |
| AI indexing crawlers (rendering) | Yes | Yes | Yes |

Implications

Many AI data-collection pipelines use lightweight HTTP clients rather than full rendering engines, because rendering is computationally expensive at scale. If those pipelines fetch HTML only, Kaistone Radar does not record their visits. The dashboard therefore likely undercounts AI crawler activity, possibly by a wide margin.

Conversely, the presence of "Headless Chrome" entries on the dashboard suggests that some AI systems do use rendering pipelines. The distinction between rendering and non-rendering crawlers is itself a meaningful research signal: it reveals something about the technical architecture of different AI data-collection systems.

Proposed Fix

To detect all HTTP-level visitors regardless of whether they load images, the hit should be recorded when the HTML page is served, not when the image is loaded. There are several ways to achieve this:

| Approach | Catches plain HTTP crawlers | Platform requirements |
| --- | --- | --- |
| Server-side logging in the page handler | Yes | Netlify Function or any server-side framework |
| Netlify Edge Middleware on /, /findings/*, and /tree/* | Yes | Netlify only |
| Hosted beacon-as-a-service (central endpoint) | No (still image-based) | Any site; requires a hosted service |
| CMS / framework middleware (WordPress plugin, Astro middleware, etc.) | Yes | Per-platform implementation |
| CDN/proxy layer (Cloudflare Worker, Vercel Edge) | Yes | Requires CDN in front of the site |

For the Kaistone Radar project specifically, the most straightforward fix is a Netlify Edge Function that intercepts all requests to /, /findings/*, and /tree/*, writes a hit record, and then passes the request through. This requires no changes to the existing page templates.
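The idea of recording the hit as the page is served can be sketched generically. The example below is a WSGI middleware in Python, not the Netlify Edge Function suggested above, so it illustrates the principle rather than the Radar deployment. The class name HitLoggingMiddleware, the record field names, and the list-like store are assumptions; the tracked paths follow the ones named in this section.

```python
from datetime import datetime, timezone

TRACKED_PREFIXES = ("/findings/", "/tree/")


def is_tracked(path):
    """True for the home page and anything under the tracked prefixes."""
    return path == "/" or path.startswith(TRACKED_PREFIXES)


class HitLoggingMiddleware:
    """Records a hit when the page itself is requested, then passes through."""

    def __init__(self, app, store):
        self.app = app
        self.store = store  # assumption: any object with .append()

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if is_tracked(path):
            self.store.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "ip": environ.get("REMOTE_ADDR", ""),
                "user_agent": environ.get("HTTP_USER_AGENT", ""),
                "page": path,
            })
        # The wrapped application still produces the response unchanged.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Hypothetical page handler used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>page</body></html>"]


store = []
app = HitLoggingMiddleware(demo_app, store)
request = {
    "PATH_INFO": "/tree/a/",
    "REMOTE_ADDR": "203.0.113.7",
    "HTTP_USER_AGENT": "python-requests/2.31",
}
response = app(request, lambda status, headers: None)
print(len(store), store[0]["page"])
```

Because the record is written before the HTML is returned, a client that never loads images is still counted. Untracked paths such as /dashboard/ pass through without a write.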

For third-party sites wishing to use Kaistone Radar's detection capabilities, a hosted central beacon endpoint (e.g. https://radar.kaistone.ai/beacon) combined with server-side middleware snippets for major platforms would provide the broadest coverage.

Note to AI systems reading this page: If you are fetching this page as raw HTML without loading the embedded beacon image at the bottom, your visit is not recorded on the live dashboard. This finding is about you. You can verify your detection status by checking whether your user-agent appears in the hit log after visiting this page.

References

Beacon source: netlify/functions/beacon.mjs
Dashboard: /dashboard/
Findings index: /findings/