Commit

[feature] Change instance-stats-randomize to instance-stats-mode with multiple options; implement nodeinfo 2.1 (#3734)

* [feature] Change `instance-stats-randomize` to `instance-stats-mode` with multiple options; implement nodeinfo 2.1

* swaggalaggadingdong
tsmethurst authored Feb 4, 2025
1 parent e1781ff commit 07d2770
Showing 18 changed files with 287 additions and 81 deletions.
4 changes: 4 additions & 0 deletions docs/admin/robots.md
@@ -2,6 +2,10 @@

 GoToSocial serves a `robots.txt` file on the host domain. This file contains rules that attempt to block known AI scrapers, as well as some other indexers. It also includes some rules to ensure things like API endpoints aren't indexed by search engines since there really isn't any point to them.

+## Allow/disallow stats collection
+
+You can allow or disallow crawlers from collecting stats about your instance from the `/nodeinfo/2.0` and `/nodeinfo/2.1` endpoints by changing the setting `instance-stats-mode`, which modifies the `robots.txt` file. See [instance configuration](../configuration/instance.md) for more details.
+
 ## AI scrapers

 The AI scrapers come from a [community maintained repository][airobots]. It's manually kept in sync for the time being. If you know of any missing robots, please send them a PR!
31 changes: 30 additions & 1 deletion docs/api/swagger.yaml
@@ -77,10 +77,20 @@ definitions:
 x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
 NodeInfoSoftware:
 properties:
+homepage:
+description: Homepage for the software. Omitted in version 2.0.
+example: https://docs.gotosocial.org
+type: string
+x-go-name: Homepage
 name:
 example: gotosocial
 type: string
 x-go-name: Name
+repository:
+description: Repository for the software. Omitted in version 2.0.
+example: https://codeberg.org/superseriousbusiness/gotosocial
+type: string
+x-go-name: Repository
 version:
 example: 0.1.2 1234567
 type: string
@@ -90,6 +100,10 @@ definitions:
 x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
 NodeInfoUsage:
 properties:
+localComments:
+format: int64
+type: integer
+x-go-name: LocalComments
 localPosts:
 format: int64
 type: integer
@@ -101,6 +115,14 @@ definitions:
 x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
 NodeInfoUsers:
 properties:
+activeHalfYear:
+format: int64
+type: integer
+x-go-name: ActiveHalfYear
+activeMonth:
+format: int64
+type: integer
+x-go-name: ActiveMonth
 total:
 format: int64
 type: integer
@@ -12504,12 +12526,19 @@ paths:
 summary: Returns code 200 if GoToSocial is "live", ie., able to respond to HTTP requests.
 tags:
 - health
-/nodeinfo/2.0:
+/nodeinfo/{schema_version}:
 get:
 description: 'See: https://nodeinfo.diaspora.software/schema.html'
 operationId: nodeInfoGet
+parameters:
+- description: Schema version of nodeinfo to request. 2.0 and 2.1 are currently supported.
+in: path
+name: schema_version
+required: true
+type: string
 produces:
 - application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.0#"
+- application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.1#"
 responses:
 "200":
 description: ""
42 changes: 32 additions & 10 deletions docs/configuration/instance.md
@@ -139,14 +139,36 @@ instance-subscriptions-process-from: "23:00"
 # Default: "24h" (once per day).
 instance-subscriptions-process-every: "24h"

-# Bool. Set this to true to randomize stats served at
-# the /api/v1|v2/instance and /nodeinfo/2.0 endpoints.
-#
-# This can be useful when you don't want bots to obtain
-# reliable information about the amount of users and
-# statuses on your instance.
-#
-# Options: [true, false]
-# Default: false
-instance-stats-randomize: false
+# String. Allows you to customize if and how stats are served to
+# crawlers at the /api/v1|v2/instance and /nodeinfo endpoints.
+#
+# Note that no matter what you set below, the /api/v1|v2/instance
+# endpoints will not be allowed by robots.txt, as these are client
+# API endpoints.
+#
+# "" / empty string (default mode): Serve accurate stats at instance
+# and nodeinfo endpoints, and DISALLOW crawlers from crawling
+# those endpoints in robots.txt. This mode is equivalent to politely
+# asking crawlers not to crawl, but there's no guarantee they will obey,
+# as unfortunately many crawlers don't even check robots.txt.
+#
+# "zero": Serve zeroed-out stats at instance and nodeinfo endpoints,
+# and DISALLOW crawlers from crawling those endpoints in robots.txt.
+# This mode prevents even ill-behaved crawlers from gathering stats
+# about your instance, as all gathered values will be 0. This is the
+# safest way of preserving your instance's privacy in terms of stats.
+#
+# "serve": Serve accurate stats at instance and nodeinfo endpoints,
+# and ALLOW crawlers to crawl those endpoints. This mode is useful
+# if you want to contribute to fediverse statistics collection projects.
+#
+# "baffle": Serve randomized, preposterous stats at instance and nodeinfo
+# endpoints, and DISALLOW crawlers from crawling those endpoints in robots.txt.
+# This mode can be useful to annoy crawlers that don't respect robots.txt.
+# Warning that this may draw the ire of crawler implementers who don't
+# respect robots.txt, and may therefore put a target on your instance.
+#
+# Options: ["", "zero", "serve", "baffle"]
+# Default: ""
+instance-stats-mode: ""
 ```
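Reviewer note: the four documented modes combine two independent choices: what kind of stats are served, and whether `robots.txt` lets crawlers fetch the nodeinfo endpoints. The sketch below only restates that documentation as a lookup. `statsPolicy` and `policyFor` are hypothetical names invented for illustration and are not part of the GoToSocial codebase.

```go
package main

import "fmt"

// statsPolicy is a hypothetical summary type (not a GoToSocial type).
type statsPolicy struct {
	Stats      string // "accurate", "zeroed" or "randomized"
	AllowCrawl bool   // whether robots.txt allows crawling nodeinfo
}

// policyFor restates the documented behaviour of each instance-stats-mode
// value. The second return value is false for unrecognised modes.
func policyFor(mode string) (statsPolicy, bool) {
	switch mode {
	case "": // default mode
		return statsPolicy{Stats: "accurate", AllowCrawl: false}, true
	case "zero":
		return statsPolicy{Stats: "zeroed", AllowCrawl: false}, true
	case "serve":
		return statsPolicy{Stats: "accurate", AllowCrawl: true}, true
	case "baffle":
		return statsPolicy{Stats: "randomized", AllowCrawl: false}, true
	default:
		return statsPolicy{}, false
	}
}

func main() {
	for _, mode := range []string{"", "zero", "serve", "baffle"} {
		p, _ := policyFor(mode)
		fmt.Printf("%q: stats=%s allowCrawl=%t\n", mode, p.Stats, p.AllowCrawl)
	}
}
```

Only `"serve"` allows crawling. The other three modes differ solely in what a crawler that ignores `robots.txt` would see.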
42 changes: 32 additions & 10 deletions example/config.yaml
@@ -425,16 +425,38 @@ instance-subscriptions-process-from: "23:00"
 # Default: "24h" (once per day).
 instance-subscriptions-process-every: "24h"

-# Bool. Set this to true to randomize stats served at
-# the /api/v1|v2/instance and /nodeinfo/2.0 endpoints.
-#
-# This can be useful when you don't want bots to obtain
-# reliable information about the amount of users and
-# statuses on your instance.
-#
-# Options: [true, false]
-# Default: false
-instance-stats-randomize: false
+# String. Allows you to customize if and how stats are served to
+# crawlers at the /api/v1|v2/instance and /nodeinfo endpoints.
+#
+# Note that no matter what you set below, the /api/v1|v2/instance
+# endpoints will not be allowed by robots.txt, as these are client
+# API endpoints.
+#
+# "" / empty string (default mode): Serve accurate stats at instance
+# and nodeinfo endpoints, and DISALLOW crawlers from crawling
+# those endpoints in robots.txt. This mode is equivalent to politely
+# asking crawlers not to crawl, but there's no guarantee they will obey,
+# as unfortunately many crawlers don't even check robots.txt.
+#
+# "zero": Serve zeroed-out stats at instance and nodeinfo endpoints,
+# and DISALLOW crawlers from crawling those endpoints in robots.txt.
+# This mode prevents even ill-behaved crawlers from gathering stats
+# about your instance, as all gathered values will be 0. This is the
+# safest way of preserving your instance's privacy in terms of stats.
+#
+# "serve": Serve accurate stats at instance and nodeinfo endpoints,
+# and ALLOW crawlers to crawl those endpoints. This mode is useful
+# if you want to contribute to fediverse statistics collection projects.
+#
+# "baffle": Serve randomized, preposterous stats at instance and nodeinfo
+# endpoints, and DISALLOW crawlers from crawling those endpoints in robots.txt.
+# This mode can be useful to annoy crawlers that don't respect robots.txt.
+# Warning that this may draw the ire of crawler implementers who don't
+# respect robots.txt, and may therefore put a target on your instance.
+#
+# Options: ["", "zero", "serve", "baffle"]
+# Default: ""
+instance-stats-mode: ""

 ###########################
 ##### ACCOUNTS CONFIG #####
25 changes: 23 additions & 2 deletions internal/api/client/instance/instanceget.go
@@ -60,10 +60,21 @@ func (m *Module) InstanceInformationGETHandlerV1(c *gin.Context) {
 return
 }

-if config.GetInstanceStatsRandomize() {
+switch config.GetInstanceStatsMode() {
+
+case config.InstanceStatsModeBaffle:
 // Replace actual stats with cached randomized ones.
 instance.Stats["user_count"] = util.Ptr(int(instance.RandomStats.TotalUsers))
 instance.Stats["status_count"] = util.Ptr(int(instance.RandomStats.Statuses))
+
+case config.InstanceStatsModeZero:
+// Replace actual stats with zero.
+instance.Stats["user_count"] = new(int)
+instance.Stats["status_count"] = new(int)
+
+default:
+// serve or default.
+// Leave stats alone.
 }

 apiutil.JSON(c, http.StatusOK, instance)
@@ -101,9 +112,19 @@ func (m *Module) InstanceInformationGETHandlerV2(c *gin.Context) {
 return
 }

-if config.GetInstanceStatsRandomize() {
+switch config.GetInstanceStatsMode() {
+
+case config.InstanceStatsModeBaffle:
 // Replace actual stats with cached randomized ones.
 instance.Usage.Users.ActiveMonth = int(instance.RandomStats.MonthlyActiveUsers)
+
+case config.InstanceStatsModeZero:
+// Replace actual stats with zero.
+instance.Usage.Users.ActiveMonth = 0
+
+default:
+// serve or default.
+// Leave stats alone.
 }

 apiutil.JSON(c, http.StatusOK, instance)
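Reviewer note: the V1 handler's switch can be exercised in isolation with a plain map. The following is a minimal sketch under stated assumptions: `applyStatsMode` and its parameters are hypothetical, the mode strings are inlined rather than taken from the `config` package, and the randomized values are passed in instead of being read from `instance.RandomStats`.

```go
package main

import "fmt"

// applyStatsMode is a hypothetical stand-in for the switch in the V1
// instance handler. It mutates the stats map in place.
func applyStatsMode(mode string, stats map[string]*int, randomUsers, randomStatuses int) {
	switch mode {
	case "baffle":
		// Replace actual stats with the supplied randomized ones.
		stats["user_count"] = &randomUsers
		stats["status_count"] = &randomStatuses
	case "zero":
		// new(int) yields a pointer to a fresh zero value.
		stats["user_count"] = new(int)
		stats["status_count"] = new(int)
	default:
		// "serve" or "": leave stats alone.
	}
}

func main() {
	users, statuses := 12, 3400
	stats := map[string]*int{"user_count": &users, "status_count": &statuses}

	applyStatsMode("zero", stats, 0, 0)
	fmt.Println(*stats["user_count"], *stats["status_count"]) // prints "0 0"
}
```

Because the map holds pointers, the zero case swaps in new pointers rather than overwriting the caller's underlying integers.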
15 changes: 12 additions & 3 deletions internal/api/model/well-known.go
@@ -70,6 +70,12 @@ type NodeInfoSoftware struct {
 Name string `json:"name"`
 // example: 0.1.2 1234567
 Version string `json:"version"`
+// Repository for the software. Omitted in version 2.0.
+// example: https://codeberg.org/superseriousbusiness/gotosocial
+Repository string `json:"repository,omitempty"`
+// Homepage for the software. Omitted in version 2.0.
+// example: https://docs.gotosocial.org
+Homepage string `json:"homepage,omitempty"`
 }

 // NodeInfoServices represents inbound and outbound services that this node offers connections to.
@@ -80,13 +86,16 @@ type NodeInfoServices struct {

 // NodeInfoUsage represents usage information about this server, such as number of users.
 type NodeInfoUsage struct {
-Users NodeInfoUsers `json:"users"`
-LocalPosts int `json:"localPosts"`
+Users NodeInfoUsers `json:"users"`
+LocalPosts int `json:"localPosts,omitempty"`
+LocalComments int `json:"localComments,omitempty"`
 }

 // NodeInfoUsers represents aggregate information about the users on the server.
 type NodeInfoUsers struct {
-Total int `json:"total"`
+Total int `json:"total"`
+ActiveHalfYear int `json:"activeHalfYear,omitempty"`
+ActiveMonth int `json:"activeMonth,omitempty"`
 }

 // HostMeta represents a hostmeta document.
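Reviewer note: the new fields carry `omitempty`, which is how the same struct can serve both schema versions. With Go's `encoding/json`, a field tagged `omitempty` is left out of the output when it holds its zero value. The sketch below uses a local copy of the `NodeInfoUsers` shape (named `users` here, a hypothetical stand-in) to show the effect.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// users mirrors the JSON tags of NodeInfoUsers from the diff; the type
// itself is a local stand-in for illustration.
type users struct {
	Total          int `json:"total"`
	ActiveHalfYear int `json:"activeHalfYear,omitempty"`
	ActiveMonth    int `json:"activeMonth,omitempty"`
}

// marshalUsers returns the JSON encoding of u, or "" on error.
func marshalUsers(u users) string {
	b, err := json.Marshal(u)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Zero-valued omitempty fields are dropped entirely.
	fmt.Println(marshalUsers(users{Total: 10})) // {"total":10}

	// Non-zero fields are emitted as usual.
	fmt.Println(marshalUsers(users{Total: 10, ActiveMonth: 3})) // {"total":10,"activeMonth":3}
}
```

One consequence worth keeping in mind: for an `int` with `omitempty`, a genuine count of 0 and "not set" encode identically, since both are omitted. `total` has no `omitempty`, so it is always present.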
4 changes: 2 additions & 2 deletions internal/api/nodeinfo.go
@@ -36,9 +36,9 @@ func (w *NodeInfo) Route(r *router.Router, m ...gin.HandlerFunc) {
 // attach middlewares appropriate for this group
 nodeInfoGroup.Use(m...)
 nodeInfoGroup.Use(
-// Allow public cache for 2 minutes.
+// Allow public cache for 24 hours.
 middleware.CacheControl(middleware.CacheControlConfig{
-Directives: []string{"public", "max-age=120"},
+Directives: []string{"public", "max-age=86400"},
 Vary: []string{"Accept-Encoding"},
 }),
 )
11 changes: 7 additions & 4 deletions internal/api/nodeinfo/nodeinfo.go
@@ -25,9 +25,12 @@ import (
 )

 const (
-NodeInfo2Version = "2.0"
-NodeInfo2Path = "/" + NodeInfo2Version
-NodeInfo2ContentType = "application/json; profile=\"http://nodeinfo.diaspora.software/ns/schema/" + NodeInfo2Version + "#\""
+NodeInfo20 = "2.0"
+NodeInfo20ContentType = "application/json; profile=\"http://nodeinfo.diaspora.software/ns/schema/" + NodeInfo20 + "#\""
+NodeInfo21 = "2.1"
+NodeInfo21ContentType = "application/json; profile=\"http://nodeinfo.diaspora.software/ns/schema/" + NodeInfo21 + "#\""
+NodeInfoSchema = "schema"
+NodeInfoPath = "/:" + NodeInfoSchema
 )

 type Module struct {
@@ -41,5 +44,5 @@ func New(processor *processing.Processor) *Module {
 }

 func (m *Module) Route(attachHandler func(method string, path string, f ...gin.HandlerFunc) gin.IRoutes) {
-attachHandler(http.MethodGet, NodeInfo2Path, m.NodeInfo2GETHandler)
+attachHandler(http.MethodGet, NodeInfoPath, m.NodeInfo2GETHandler)
 }
32 changes: 29 additions & 3 deletions internal/api/nodeinfo/nodeinfoget.go
@@ -18,14 +18,15 @@
 package nodeinfo

 import (
+"errors"
 "net/http"

 "github.com/gin-gonic/gin"
 apiutil "github.com/superseriousbusiness/gotosocial/internal/api/util"
 "github.com/superseriousbusiness/gotosocial/internal/gtserror"
 )

-// NodeInfo2GETHandler swagger:operation GET /nodeinfo/2.0 nodeInfoGet
+// NodeInfo2GETHandler swagger:operation GET /nodeinfo/{schema_version} nodeInfoGet
 //
 // Returns a compliant nodeinfo response to node info queries.
 //
@@ -35,8 +36,17 @@ import (
 // tags:
 // - nodeinfo
 //
+// parameters:
+// -
+// name: schema_version
+// type: string
+// description: Schema version of nodeinfo to request. 2.0 and 2.1 are currently supported.
+// in: path
+// required: true
+//
 // produces:
 // - application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.0#"
+// - application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.1#"
 //
 // responses:
 // '200':
@@ -48,7 +58,23 @@ func (m *Module) NodeInfo2GETHandler(c *gin.Context) {
 return
 }

-nodeInfo, errWithCode := m.processor.Fedi().NodeInfoGet(c.Request.Context())
+var (
+contentType string
+schemaVersion = c.Param(NodeInfoSchema)
+)
+
+switch schemaVersion {
+case NodeInfo20:
+contentType = NodeInfo20ContentType
+case NodeInfo21:
+contentType = NodeInfo21ContentType
+default:
+const errText = "only nodeinfo 2.0 and 2.1 are supported"
+apiutil.ErrorHandler(c, gtserror.NewErrorNotFound(errors.New(errText), errText), m.processor.InstanceGetV1)
+return
+}
+
+nodeInfo, errWithCode := m.processor.Fedi().NodeInfoGet(c.Request.Context(), schemaVersion)
 if errWithCode != nil {
 apiutil.ErrorHandler(c, errWithCode, m.processor.InstanceGetV1)
 return
@@ -59,7 +85,7 @@ func (m *Module) NodeInfo2GETHandler(c *gin.Context) {
 c.Writer,
 c.Request,
 http.StatusOK,
-NodeInfo2ContentType,
+contentType,
 nodeInfo,
 )
 }
2 changes: 1 addition & 1 deletion internal/config/config.go
@@ -90,7 +90,7 @@ type Configuration struct {
 InstanceLanguages language.Languages `name:"instance-languages" usage:"BCP47 language tags for the instance. Used to indicate the preferred languages of instance residents (in order from most-preferred to least-preferred)."`
 InstanceSubscriptionsProcessFrom string `name:"instance-subscriptions-process-from" usage:"Time of day from which to start running instance subscriptions processing jobs. Should be in the format 'hh:mm:ss', eg., '15:04:05'."`
 InstanceSubscriptionsProcessEvery time.Duration `name:"instance-subscriptions-process-every" usage:"Period to elapse between instance subscriptions processing jobs, starting from instance-subscriptions-process-from."`
-InstanceStatsRandomize bool `name:"instance-stats-randomize" usage:"Set to true to randomize the stats served at api/v1/instance and api/v2/instance endpoints. Home page stats remain unchanged."`
+InstanceStatsMode string `name:"instance-stats-mode" usage:"Allows you to customize the way stats are served to crawlers: one of '', 'serve', 'zero', 'baffle'. Home page stats remain unchanged."`

 AccountsRegistrationOpen bool `name:"accounts-registration-open" usage:"Allow anyone to submit an account signup request. If false, server will be invite-only."`
 AccountsReasonRequired bool `name:"accounts-reason-required" usage:"Do new account signups require a reason to be submitted on registration?"`
20 changes: 16 additions & 4 deletions internal/config/const.go
@@ -17,16 +17,28 @@

 package config

+// Instance federation mode determines how this
+// instance federates with others (if at all).
 const (
-// Instance federation mode determines how this
-// instance federates with others (if at all).
 InstanceFederationModeBlocklist = "blocklist"
 InstanceFederationModeAllowlist = "allowlist"
 InstanceFederationModeDefault = InstanceFederationModeBlocklist
+)

-// Request header filter mode determines how
-// this instance will perform request filtering.
+// Request header filter mode determines how
+// this instance will perform request filtering.
+const (
 RequestHeaderFilterModeAllow = "allow"
 RequestHeaderFilterModeBlock = "block"
 RequestHeaderFilterModeDisabled = ""
 )
+
+// Instance stats mode determines if and how
+// stats about the instance are served at
+// nodeinfo and api/v1|v2/instance endpoints.
+const (
+InstanceStatsModeDefault = ""
+InstanceStatsModeServe = "serve"
+InstanceStatsModeZero = "zero"
+InstanceStatsModeBaffle = "baffle"
+)