"I tried to access https://radar.kaistone.ai/ to assess the project, but I am unable to fetch its contents directly. The site appears to be completely unindexed by search engines, and it likely relies on a client-side JavaScript framework (like React, Vue, or Svelte) to render its content, which prevents my browsing tool from reading it."
Gemini's report contains two claims. Both were verified independently:
| Claim | Accurate? | Evidence |
|---|---|---|
| The site is unindexed by search engines | Likely true | No search results found for site:radar.kaistone.ai. No sitemap.xml (404). No robots.txt (404). No known backlinks. Site is newly deployed. |
| The site uses a client-side JavaScript framework | False | The home page is plain static HTML. A raw curl request returns the full page content immediately, with zero <script> tags on the home page. No React, Vue, Svelte, or any other framework is present. |
The raw HTML of the home page is fully readable by any HTTP client without JavaScript execution. Gemini's inability to fetch the site is therefore not caused by client-side rendering. The misdiagnosis is itself a significant observation: Gemini appears to have inferred a JavaScript framework from the inability to access the page, rather than from inspecting the actual HTML.
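The verification above can be reproduced by any reader. As a rough sketch (the class and function names here are illustrative, not part of the project), a script-tag check like the following distinguishes a static page from a client-rendered one: a page with zero `<script>` tags cannot depend on React, Vue, or Svelte to render its content.

```python
from html.parser import HTMLParser

class ScriptTagCounter(HTMLParser):
    """Counts <script> tags and collects their src attributes."""
    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)

def looks_client_rendered(html: str) -> bool:
    """Heuristic: no <script> tags means the page cannot be client-rendered.
    Framework bundle names in src attributes are a stronger positive signal."""
    parser = ScriptTagCounter()
    parser.feed(html)
    if parser.script_count == 0:
        return False
    markers = ("react", "vue", "svelte", "webpack", "chunk")
    return any(m in src.lower() for src in parser.script_srcs for m in markers)
```

Run against the raw `curl` output of the home page, this heuristic returns False, consistent with the table above.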
While the JavaScript framework claim is incorrect, Gemini's broader observation — that the site is difficult for AI systems to discover and read — is valid. Several standard discoverability signals are missing:
| Missing Signal | Impact | Fix |
|---|---|---|
| robots.txt | Returns 404. AI crawlers and search engines that check robots.txt before crawling receive an error, which may cause them to abort or behave unpredictably. | Add public/robots.txt with permissive rules and a Sitemap: directive. |
| sitemap.xml | Returns 404. Search engines and AI crawlers cannot discover pages beyond those they find by following links. The /findings/ pages and tree structure are invisible to crawlers that rely on sitemaps. | Add public/sitemap.xml listing all static pages. |
| Structured data (JSON-LD / schema.org) | No machine-readable metadata about the site's purpose, author, or content type. AI systems that rely on structured data for understanding page context cannot classify the site accurately. | Add <script type="application/ld+json"> blocks to key pages. |
| Open Graph / Twitter Card meta tags | No og:title, og:description, or og:url tags. Social crawlers and some AI systems use these as a primary content signal. | Add standard OG meta tags to each page's <head>. |
| Canonical URL | No <link rel="canonical"> tags. Duplicate content signals may confuse crawlers. | Add canonical tags to each page. |
| External backlinks | The site has no known inbound links. Search engines and AI knowledge bases that rely on link graph signals will not prioritise or discover it. | Publish the project on GitHub, Hacker News, or similar platforms to generate inbound links. |
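To make the sitemap fix concrete, a minimal sitemap.xml covering the pages named in this report might look like the sketch below (the lastmod date is a placeholder, not a real deployment date):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://radar.kaistone.ai/</loc>
    <lastmod>2025-01-01</lastmod> <!-- placeholder date -->
  </url>
  <url>
    <loc>https://radar.kaistone.ai/dashboard/</loc>
  </url>
  <url>
    <loc>https://radar.kaistone.ai/findings/</loc>
  </url>
  <url>
    <loc>https://radar.kaistone.ai/findings/001-beacon-image-only/</loc>
  </url>
</urlset>
```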
The more interesting aspect of this finding is not the missing infrastructure — it is that Gemini produced a confident but incorrect technical explanation for why it could not access the site. Rather than reporting "I could not fetch this URL" (a factual statement), it attributed the failure to a JavaScript framework (an inference that is demonstrably wrong).
This behaviour — generating a plausible-sounding technical explanation when the true cause is unknown — is a known characteristic of large language models. In the context of this research project, it is directly relevant: an AI system that cannot access a page may produce inaccurate descriptions of that page's architecture, which could propagate through AI knowledge bases as incorrect information about the site.
It also illustrates a gap in AI browsing tool transparency: Gemini did not report the HTTP status code it received, the user-agent it sent, or whether it attempted to fetch the page directly versus relying on a cached index. Without that information, it is not possible to determine whether Gemini's tool failed due to a network error, a blocked user-agent, a timeout, or simply a lack of indexed data.
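As a sketch of the kind of factual failure report such a tool could produce (the function name and response shape here are hypothetical), a fetcher can distinguish an HTTP error from a network error and record the user-agent it sent, instead of inferring an architecture:

```python
import urllib.request
import urllib.error

def fetch_report(url, user_agent="DiagnosticFetcher/1.0"):
    """Fetch a URL and report the factual outcome -- status code,
    user-agent sent, error class -- rather than guessing a cause."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return {"ok": True,
                    "status": getattr(resp, "status", None),
                    "user_agent": user_agent,
                    "bytes": len(resp.read())}
    except urllib.error.HTTPError as e:
        # Server responded, but with an error status (404, 403, ...).
        return {"ok": False, "status": e.code,
                "error": "http", "user_agent": user_agent}
    except urllib.error.URLError as e:
        # No HTTP response at all: DNS failure, timeout, refused connection.
        return {"ok": False, "status": None,
                "error": f"network: {e.reason}", "user_agent": user_agent}
```

A report in this form would have let Gemini say which of the failure modes listed above actually occurred.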
The following additions would significantly improve discoverability for all AI systems and search engines, and would likely resolve Gemini's inability to find the site via its index:
```
public/
├── robots.txt    # Allow all crawlers; point to sitemap
├── sitemap.xml   # List all static pages with lastmod dates
└── index.html    # Add JSON-LD, OG tags, canonical URL
```
A minimal robots.txt:
```
User-agent: *
Allow: /

Sitemap: https://radar.kaistone.ai/sitemap.xml
```
A minimal JSON-LD block for the home page:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Kaistone Radar",
  "url": "https://radar.kaistone.ai/",
  "description": "Open research project tracking AI crawler behaviour using a 1x1 beacon."
}
</script>
```
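A malformed JSON-LD block is silently ignored by most consumers, so it is worth validating before deployment. A minimal check (the helper name is illustrative) extracts each block and fails loudly if the JSON does not parse:

```python
import json
import re

def extract_json_ld(html: str):
    """Extract and parse every JSON-LD block from an HTML page.
    Raises json.JSONDecodeError if any block is malformed."""
    pattern = re.compile(
        r'<script type="application/ld\+json">(.*?)</script>',
        re.DOTALL)
    return [json.loads(block) for block in pattern.findall(html)]
```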
- public/robots.txt added — allows all crawlers and points to the sitemap
- public/sitemap.xml added — lists all static pages with lastmod dates
- JSON-LD structured data (WebSite / TechArticle) added to all pages
- <link rel="canonical"> added to all pages
- Finding #001 (beacon image-only): /findings/001-beacon-image-only/
- Dashboard: /dashboard/
- Findings index: /findings/