build robots.txt from ai.robots.txt on github

John Bowdre 2024-08-04 17:33:37 -05:00
parent d595e67de7
commit ce1213cd14
5 changed files with 19 additions and 31 deletions

@@ -1,14 +1,14 @@
 ---
 title: "/changelog"
 date: "2024-05-26T21:19:08Z"
-lastmod: "2024-08-02T21:16:14Z"
+lastmod: "2024-08-04T22:30:43Z"
 description: "Maybe I should keep a log of all my site-related tinkering?"
 featured: false
 toc: false
 timeless: true
 categories: slashes
 ---
-*High-level list of config/layout changes to the site. The full changelog is of course [on GitHub](https://github.com/jbowdre/runtimeterror/commits/main/).*
+*Running list of config/layout changes to the site. The full changelog is of course [on GitHub](https://github.com/jbowdre/runtimeterror/commits/main/).*
 **2024-08-02:**
 - Display "pinned" recent track in sidebar using [MusicThread](https://musicthread.app) instead of latest scrobble

@@ -15,7 +15,7 @@ categories: slashes
 - uses the font face [Berkeley Mono](https://berkeleygraphics.com/typefaces/berkeley-mono/) ([details](/using-custom-font-hugo/)), and icons from [Font Awesome](https://fontawesome.com/) and [Fork Awesome](https://forkaweso.me/).
 - performs syntax highlighting with [Torchlight](https://torchlight.dev) ([details](/spotlight-on-torchlight/)).
 - provides site search with [lunr](https://lunrjs.com/) based on an implementation detailed by [Victoria Drake](https://victoria.dev/blog/add-search-to-hugo-static-sites-with-lunr/).
-- uses [Dark Visitors](https://darkvisitors.com/docs/robots-txt)'s API to dynamically generate a [robots.txt](/robots.txt) discouraging AI scrapers with some Hugo code from [Luke Harris](https://github.com/lkhrs/hugo-dark-visitors).
+- fetches [ai.robots.txt](https://github.com/ai-robots-txt/ai.robots.txt) to dynamically generate a [robots.txt](/robots.txt) discouraging AI scrapers with Hugo's [`resources.GetRemote` capability](https://gohugo.io/functions/resources/getremote/).
 - leverages [Cabin](https://withcabin.com) for [privacy-friendly](https://withcabin.com/privacy/runtimeterror.dev) analytics.
 - fetches recently-played music from [MusicThread](https://musicthread.app/).
 - displays my latest status from [omg.lol](https://home.omg.lol/referred-by/jbowdre).

@@ -0,0 +1,15 @@
+{{- $url := "https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/robots.json" -}}
+{{- with resources.GetRemote $url -}}
+{{- with .Err -}}
+{{- errorf "%s" . -}}
+{{- else -}}
+{{- $robots := unmarshal .Content -}}
+{{- range $botname, $props := $robots }}
+{{- printf "User-agent: %s\n" $botname }}
+{{- end }}
+{{- printf "Disallow: /\n" }}
+{{- printf "\n# (bad bots bundled by https://github.com/ai-robots-txt/ai.robots.txt)" }}
+{{- end -}}
+{{- else -}}
+{{- errorf "Unable to get remote resource %q" $url -}}
+{{- end -}}
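The added template (the bad-robots.html partial) reads robots.json and emits one User-agent line for each top-level key. It ignores each key's value (`$props`) and closes the group with a single `Disallow: /`. The Go sketch below illustrates the same logic outside Hugo. It assumes robots.json is a JSON object keyed by bot name, which is what the template's `range $botname, $props` implies. The names `fetchRobotsJSON` and `buildRobots` are hypothetical, and `net/http` stands in for Hugo's `resources.GetRemote`. The sample data in `main` is made up so the sketch runs offline. Keys are sorted explicitly because Go templates visit map keys in sorted order.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
	"strings"
)

// Same URL the template fetches.
const robotsURL = "https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/robots.json"

// fetchRobotsJSON is a hypothetical stand-in for resources.GetRemote:
// it fails on transport errors and non-200 responses, as the template errors out.
func fetchRobotsJSON(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, fmt.Errorf("unable to get remote resource %q: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unable to get remote resource %q: %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// buildRobots writes one "User-agent" line per top-level key, then a single
// "Disallow: /" and the attribution comment, mirroring the template's output.
func buildRobots(data []byte) (string, error) {
	// Values are kept raw and never inspected, like $props in the template.
	var robots map[string]json.RawMessage
	if err := json.Unmarshal(data, &robots); err != nil {
		return "", err
	}

	// Go templates range over map keys in sorted order; do the same here.
	names := make([]string, 0, len(robots))
	for name := range robots {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "User-agent: %s\n", name)
	}
	b.WriteString("Disallow: /\n")
	b.WriteString("\n# (bad bots bundled by https://github.com/ai-robots-txt/ai.robots.txt)")
	return b.String(), nil
}

func main() {
	// Made-up sample so the sketch runs offline; swap in
	// fetchRobotsJSON(robotsURL) to use the live file instead.
	sample := []byte(`{"ExampleBot": {}, "AnotherBot": {}}`)
	out, err := buildRobots(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Every User-agent line sits in front of the one `Disallow: /` rule, so the lines form a single group. Parsers that follow RFC 9309 apply that rule to each listed bot.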

@@ -1,27 +0,0 @@
-{{/* borrowed from Luke Harris @ https://github.com/lkhrs/hugo-dark-visitors */}}
-{{- $url := "https://api.darkvisitors.com/robots-txts" -}}
-{{- $api_key := getenv "HUGO_DARKVISITORS" -}}
-{{- $bearer := printf "Bearer %v" $api_key -}}
-{{- $agent_types := slice -}}
-{{- if .Site.Params.darkVisitors -}}
-{{- range .Site.Params.darkVisitors -}}
-{{- $agent_types = $agent_types | append . -}}
-{{- end -}}
-{{- else -}}
-{{- $agent_types = slice "AI Data Scraper" -}}
-{{- end -}}
-{{- $agent_types := $agent_types | jsonify -}}
-{{- $opts := dict
-"method" "post"
-"headers" (dict "Authorization" (slice $bearer) "Content-Type" "application/json")
-"body" (printf `{"agent_types": %s,"disallow": "/"}` $agent_types)
--}}
-{{- with resources.GetRemote $url $opts -}}
-{{- with .Err -}}
-{{- errorf "%s" . -}}
-{{- else -}}
-{{- .Content -}}
-{{- end -}}
-{{- else -}}
-{{- errorf "Unable to get remote resource %q" $url -}}
-{{- end -}}

@@ -8,4 +8,4 @@ Disallow:
 # except for these bots which are not friends:
-{{ partial "dark-visitors.html" . }}
+{{ partial "bad-robots.html" . }}