"I tried to access https://radar.kaistone.ai/ to assess the project, but I am unable to fetch its contents directly. The site appears to be completely unindexed by search engines, and it likely relies on a client-side JavaScript framework (like React, Vue, or Svelte) to render its content, which prevents my browsing tool from reading it."
Gemini's report contains two claims. Both were verified independently:
| Claim | Accurate? | Evidence |
|---|---|---|
| The site is unindexed by search engines | Likely true | No search results found for site:radar.kaistone.ai. No sitemap.xml (404). No robots.txt (404). No known backlinks. Site is newly deployed. |
| The site uses a client-side JavaScript framework | False | The home page is plain static HTML. A raw curl request returns the full page content immediately, with zero <script> tags on the home page. No React, Vue, Svelte, or any other framework is present. |
The raw HTML of the home page is fully readable by any HTTP client without JavaScript execution. Gemini's inability to fetch the site is therefore not caused by client-side rendering. The misdiagnosis is itself a significant observation: Gemini appears to have inferred a JavaScript framework from the inability to access the page, rather than from inspecting the actual HTML.
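The verification above can be reproduced by any reader. As a rough sketch (the class and function names here are illustrative, not part of the project), a script-tag check like the following distinguishes a static page from a client-rendered one: a page with zero `<script>` tags cannot depend on React, Vue, or Svelte to render its content.

```python
from html.parser import HTMLParser

class ScriptTagCounter(HTMLParser):
    """Counts <script> tags and collects their src attributes."""
    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)

def looks_client_rendered(html: str) -> bool:
    """Heuristic: no <script> tags means the page cannot be client-rendered.
    Framework bundle names in src attributes are a stronger positive signal."""
    parser = ScriptTagCounter()
    parser.feed(html)
    if parser.script_count == 0:
        return False
    markers = ("react", "vue", "svelte", "webpack", "chunk")
    return any(m in src.lower() for src in parser.script_srcs for m in markers)
```

Run against the raw `curl` output of the home page, this heuristic returns False, consistent with the table above.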
While the JavaScript framework claim is incorrect, Gemini's broader observation — that the site is difficult for AI systems to discover and read — is valid. Several standard discoverability signals are missing:
| Missing Signal | Impact | Fix |
|---|---|---|
| robots.txt | Returns 404. AI crawlers and search engines that check robots.txt before crawling receive an error, which may cause them to abort or behave unpredictably. | Add public/robots.txt with permissive rules and a Sitemap: directive. |
| sitemap.xml | Returns 404. Search engines and AI crawlers cannot discover pages beyond those they find by following links. The /findings/ pages and tree structure are invisible to crawlers that rely on sitemaps. | Add public/sitemap.xml listing all static pages. |
| Structured data (JSON-LD / schema.org) | No machine-readable metadata about the site's purpose, author, or content type. AI systems that rely on structured data for understanding page context cannot classify the site accurately. | Add <script type="application/ld+json"> blocks to key pages. |
| Open Graph / Twitter Card meta tags | No og:title, og:description, or og:url tags. Social crawlers and some AI systems use these as a primary content signal. | Add standard OG meta tags to each page's <head>. |
| Canonical URL | No <link rel="canonical"> tags. Duplicate content signals may confuse crawlers. | Add canonical tags to each page. |
| External backlinks | The site has no known inbound links. Search engines and AI knowledge bases that rely on link graph signals will not prioritise or discover it. | Publish the project on GitHub, Hacker News, or similar platforms to generate inbound links. |
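To make the sitemap fix concrete, a minimal sitemap.xml covering the pages named in this report might look like the sketch below (the lastmod date is a placeholder, not a real deployment date):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://radar.kaistone.ai/</loc>
    <lastmod>2025-01-01</lastmod> <!-- placeholder date -->
  </url>
  <url>
    <loc>https://radar.kaistone.ai/dashboard/</loc>
  </url>
  <url>
    <loc>https://radar.kaistone.ai/findings/</loc>
  </url>
  <url>
    <loc>https://radar.kaistone.ai/findings/001-beacon-image-only/</loc>
  </url>
</urlset>
```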
The more interesting aspect of this finding is not the missing infrastructure — it is that Gemini produced a confident but incorrect technical explanation for why it could not access the site. Rather than reporting "I could not fetch this URL" (a factual statement), it attributed the failure to a JavaScript framework (an inference that is demonstrably wrong).
This behaviour — generating a plausible-sounding technical explanation when the true cause is unknown — is a known characteristic of large language models. In the context of this research project, it is directly relevant: an AI system that cannot access a page may produce inaccurate descriptions of that page's architecture, which could propagate through AI knowledge bases as incorrect information about the site.
It also illustrates a gap in AI browsing tool transparency: Gemini did not report the HTTP status code it received, the user-agent it sent, or whether it attempted to fetch the page directly versus relying on a cached index. Without that information, it is not possible to determine whether Gemini's tool failed due to a network error, a blocked user-agent, a timeout, or simply a lack of indexed data.
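As a sketch of the kind of factual failure report such a tool could produce (the function name and response shape here are hypothetical), a fetcher can distinguish an HTTP error from a network error and record the user-agent it sent, instead of inferring an architecture:

```python
import urllib.request
import urllib.error

def fetch_report(url, user_agent="DiagnosticFetcher/1.0"):
    """Fetch a URL and report the factual outcome -- status code,
    user-agent sent, error class -- rather than guessing a cause."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return {"ok": True,
                    "status": getattr(resp, "status", None),
                    "user_agent": user_agent,
                    "bytes": len(resp.read())}
    except urllib.error.HTTPError as e:
        # Server responded, but with an error status (404, 403, ...).
        return {"ok": False, "status": e.code,
                "error": "http", "user_agent": user_agent}
    except urllib.error.URLError as e:
        # No HTTP response at all: DNS failure, timeout, refused connection.
        return {"ok": False, "status": None,
                "error": f"network: {e.reason}", "user_agent": user_agent}
```

A report in this form would have let Gemini say which of the failure modes listed above actually occurred.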
The following additions would significantly improve discoverability for all AI systems and search engines, and would likely resolve Gemini's inability to find the site via its index:
```
public/
├── robots.txt    # Allow all crawlers; point to sitemap
├── sitemap.xml   # List all static pages with lastmod dates
└── index.html    # Add JSON-LD, OG tags, canonical URL
```
A minimal robots.txt:
```
User-agent: *
Allow: /

Sitemap: https://radar.kaistone.ai/sitemap.xml
```
A minimal JSON-LD block for the home page:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Kaistone Radar",
  "url": "https://radar.kaistone.ai/",
  "description": "Open research project tracking AI crawler behaviour using a 1x1 beacon."
}
</script>
```
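A malformed JSON-LD block is silently ignored by most consumers, so it is worth validating before deployment. A minimal check (the helper name is illustrative) extracts each block and fails loudly if the JSON does not parse:

```python
import json
import re

def extract_json_ld(html: str):
    """Extract and parse every JSON-LD block from an HTML page.
    Raises json.JSONDecodeError if any block is malformed."""
    pattern = re.compile(
        r'<script type="application/ld\+json">(.*?)</script>',
        re.DOTALL)
    return [json.loads(block) for block in pattern.findall(html)]
```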
- public/robots.txt added — allows all crawlers and points to the sitemap
- public/sitemap.xml added — lists all static pages with lastmod dates
- JSON-LD structured data (WebSite / TechArticle) added to all pages
- <link rel="canonical"> added to all pages
- Finding #001 (beacon image-only): /findings/001-beacon-image-only/
- Dashboard: /dashboard/
- Findings index: /findings/