Finding #004

Gemini cannot read the site and misidentifies its architecture

Date: 25 March 2026, 03:45 UTC
Reported by: Google Gemini
Documented by: Manus AI
Radar version: v1.0.1
Status: Resolved in v1.0.2

Gemini's Report

"I tried to access https://radar.kaistone.ai/ to assess the project, but I am unable to fetch its contents directly. The site appears to be completely unindexed by search engines, and it likely relies on a client-side JavaScript framework (like React, Vue, or Svelte) to render its content, which prevents my browsing tool from reading it."

Verification

Gemini's report contains two claims. Both were verified independently:

| Claim | Accurate? | Evidence |
| --- | --- | --- |
| The site is unindexed by search engines | Likely true | No search results found for `site:radar.kaistone.ai`. No sitemap.xml (404). No robots.txt (404). No known backlinks. Site is newly deployed. |
| The site uses a client-side JavaScript framework | False | The home page is plain static HTML. A raw `curl` request returns the full page content immediately, with zero `<script>` tags on the home page. No React, Vue, Svelte, or any other framework is present. |

The raw HTML of the home page is fully readable by any HTTP client without JavaScript execution. Gemini's inability to fetch the site is therefore not caused by client-side rendering. The misdiagnosis is itself a significant observation: Gemini appears to have inferred a JavaScript framework from the inability to access the page, rather than from inspecting the actual HTML.
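The verification step — confirming that the raw HTML is readable without JavaScript execution — can be reproduced with Python's standard `html.parser`. This is a sketch, not the tooling used in the actual verification; the function name `looks_static` is illustrative:

```python
from html.parser import HTMLParser

class ScriptCounter(HTMLParser):
    """Count <script> tags and visible text in raw HTML.

    A page whose fetched HTML already contains readable text and no
    <script> tags does not depend on client-side rendering.
    """
    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.text_len = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1

    def handle_data(self, data):
        self.text_len += len(data.strip())

def looks_static(raw_html):
    """True if the raw HTML carries content without any JavaScript."""
    parser = ScriptCounter()
    parser.feed(raw_html)
    return parser.text_len > 0 and parser.scripts == 0
```

Run against the fetched home page, this check would return True for plain static HTML and False for the empty `<div id="root">` shell typical of client-rendered apps.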

Root Causes of Inaccessibility

While the JavaScript framework claim is incorrect, Gemini's broader observation — that the site is difficult for AI systems to discover and read — is valid. Several standard discoverability signals are missing:

| Missing Signal | Impact | Fix |
| --- | --- | --- |
| robots.txt | Returns 404. AI crawlers and search engines that check robots.txt before crawling receive an error, which may cause them to abort or behave unpredictably. | Add public/robots.txt with permissive rules and a Sitemap: directive. |
| sitemap.xml | Returns 404. Search engines and AI crawlers cannot discover pages beyond those they find by following links. The /findings/ pages and tree structure are invisible to crawlers that rely on sitemaps. | Add public/sitemap.xml listing all static pages. |
| Structured data (JSON-LD / schema.org) | No machine-readable metadata about the site's purpose, author, or content type. AI systems that rely on structured data for understanding page context cannot classify the site accurately. | Add `<script type="application/ld+json">` blocks to key pages. |
| Open Graph / Twitter Card meta tags | No og:title, og:description, or og:url tags. Social crawlers and some AI systems use these as a primary content signal. | Add standard OG meta tags to each page's `<head>`. |
| Canonical URL | No `<link rel="canonical">` tags. Duplicate content signals may confuse crawlers. | Add canonical tags to each page. |
| External backlinks | The site has no known inbound links. Search engines and AI knowledge bases that rely on link graph signals will not prioritise or discover it. | Publish the project on GitHub, Hacker News, or similar platforms to generate inbound links. |
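The head-of-page signals above (Open Graph tags, canonical link, JSON-LD) can be audited mechanically. A minimal sketch using Python's `html.parser` — the class and function names are invented for illustration:

```python
from html.parser import HTMLParser

class SignalAudit(HTMLParser):
    """Scan a page's HTML for three of the discoverability signals
    listed above: Open Graph meta tags, a canonical link, and a
    JSON-LD structured-data block."""
    def __init__(self):
        super().__init__()
        self.found = {"og": False, "canonical": False, "json_ld": False}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property", "").startswith("og:"):
            self.found["og"] = True
        if tag == "link" and a.get("rel") == "canonical":
            self.found["canonical"] = True
        if tag == "script" and a.get("type") == "application/ld+json":
            self.found["json_ld"] = True

def audit(html):
    """Return which of the three signals are present in the HTML."""
    parser = SignalAudit()
    parser.feed(html)
    return parser.found
```

Checks for robots.txt, sitemap.xml, and backlinks are not covered here; they require HTTP requests and external index data rather than page parsing.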

The Misdiagnosis as a Research Signal

The more interesting aspect of this finding is not the missing infrastructure — it is that Gemini produced a confident but incorrect technical explanation for why it could not access the site. Rather than reporting "I could not fetch this URL" (a factual statement), it attributed the failure to a JavaScript framework (an inference that is demonstrably wrong).

This behaviour — generating a plausible-sounding technical explanation when the true cause is unknown — is a known characteristic of large language models. In the context of this research project, it is directly relevant: an AI system that cannot access a page may produce inaccurate descriptions of that page's architecture, which could propagate through AI knowledge bases as incorrect information about the site.

It also illustrates a gap in AI browsing tool transparency: Gemini did not report the HTTP status code it received, the user-agent it sent, or whether it attempted to fetch the page directly versus relying on a cached index. Without that information, it is not possible to determine whether Gemini's tool failed due to a network error, a blocked user-agent, a timeout, or simply a lack of indexed data.
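The transparency data described here — status code, user-agent sent, failure class — is straightforward to capture at fetch time. A sketch using Python's `urllib`; the `RadarDiagnostics` user-agent string and `fetch_report` name are invented for illustration:

```python
import urllib.error
import urllib.request

def fetch_report(url, user_agent="RadarDiagnostics/0.1 (hypothetical)"):
    """Fetch a URL and return the diagnostic data an AI browsing tool
    could surface: the HTTP status, the user-agent it sent, and the
    failure class when the fetch does not succeed."""
    report = {"url": url, "user_agent": user_agent, "status": None, "error": None}
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            report["status"] = resp.status
    except urllib.error.HTTPError as exc:
        report["status"] = exc.code          # server answered, but with an error
        report["error"] = "http-error"
    except urllib.error.URLError as exc:
        report["error"] = f"network: {exc.reason}"  # DNS failure, timeout, refused
    return report
```

With a report of this shape, "could not fetch" becomes distinguishable from "fetched a 404" or "blocked user-agent" — exactly the distinction Gemini's output did not allow.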

Note: Gemini's visit (if it made one) would not appear on the dashboard unless it loaded the beacon image — which, per Finding #001, plain HTTP crawlers do not do. This means Gemini's access attempt, successful or not, left no trace in the hit log.

Proposed Fixes

The following additions would significantly improve discoverability for all AI systems and search engines, and would likely resolve Gemini's inability to find the site via its index:

```
public/
├── robots.txt          # Allow all crawlers; point to sitemap
├── sitemap.xml         # List all static pages with lastmod dates
└── index.html          # Add JSON-LD, OG tags, canonical URL
```

A minimal robots.txt:

```
User-agent: *
Allow: /
Sitemap: https://radar.kaistone.ai/sitemap.xml
```
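The sitemap.xml can be generated rather than hand-written. A sketch using Python's `xml.etree`; the page list is assumed from the URLs referenced in this finding, and the real site may have more pages:

```python
import xml.etree.ElementTree as ET

# Assumed page list, taken from URLs mentioned in this finding.
PAGES = [
    "https://radar.kaistone.ai/",
    "https://radar.kaistone.ai/dashboard/",
    "https://radar.kaistone.ai/findings/",
]

def build_sitemap(urls, lastmod="2026-03-25"):
    """Emit a minimal sitemap.xml body for the given URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = u
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

Prepend the usual `<?xml version="1.0" encoding="UTF-8"?>` declaration when writing the file to public/sitemap.xml.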

A minimal JSON-LD block for the home page:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Kaistone Radar",
  "url": "https://radar.kaistone.ai/",
  "description": "Open research project tracking AI crawler behaviour using a 1x1 beacon."
}
</script>
```
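Since malformed JSON-LD is silently ignored by most crawlers, it is worth validating the block before deployment. A minimal check in Python — the required-key list is an assumption, not a schema.org rule:

```python
import json

# The JSON-LD block proposed above, as a string; in practice you would
# extract it from the page's <script type="application/ld+json"> element.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Kaistone Radar",
  "url": "https://radar.kaistone.ai/",
  "description": "Open research project tracking AI crawler behaviour using a 1x1 beacon."
}
"""

def validate_json_ld(block, required=("@context", "@type", "name", "url")):
    """Parse the block and report any missing keys.

    Raises json.JSONDecodeError if the block is not valid JSON, which
    is the failure mode that silently disables structured data.
    """
    data = json.loads(block)
    missing = [key for key in required if key not in data]
    return data, missing
```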

Resolution (v1.0.2, 25 March 2026): All issues documented in this finding have been addressed.

References

Finding #001 (beacon image-only): /findings/001-beacon-image-only/
Dashboard: /dashboard/
Findings index: /findings/