[feature] Change instance-stats-randomize to instance-stats-mode with multiple options; implement nodeinfo 2.1 #3734

Merged 2 commits on Feb 4, 2025
4 changes: 4 additions & 0 deletions docs/admin/robots.md
@@ -2,6 +2,10 @@

GoToSocial serves a `robots.txt` file on the host domain. This file contains rules that attempt to block known AI scrapers, as well as some other indexers. It also includes some rules to ensure things like API endpoints aren't indexed by search engines since there really isn't any point to them.

## Allow/disallow stats collection

You can allow or disallow crawlers from collecting stats about your instance from the `/nodeinfo/2.0` and `/nodeinfo/2.1` endpoints by changing the setting `instance-stats-mode`, which modifies the `robots.txt` file. See [instance configuration](../configuration/instance.md) for more details.

## AI scrapers

The AI scrapers come from a [community maintained repository][airobots]. It's manually kept in sync for the time being. If you know of any missing robots, please send them a PR!
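
As a rough illustration of the stats-collection setting described above, the sketch below renders `robots.txt` lines for the two nodeinfo paths. The function name `nodeinfoRobotsRules` and the exact line format are assumptions made for illustration, not GoToSocial's actual template; only the allow/disallow behaviour per mode follows the configuration docs in this PR.

```go
package main

import "fmt"

// nodeinfoPaths are the two endpoints named in the docs above.
var nodeinfoPaths = []string{"/nodeinfo/2.0", "/nodeinfo/2.1"}

// nodeinfoRobotsRules is a hypothetical helper that returns one
// robots.txt rule per nodeinfo path. Only the "serve" mode allows
// crawling; every other mode disallows it.
func nodeinfoRobotsRules(mode string) []string {
	directive := "Disallow"
	if mode == "serve" {
		directive = "Allow"
	}

	rules := make([]string, 0, len(nodeinfoPaths))
	for _, p := range nodeinfoPaths {
		rules = append(rules, directive+": "+p)
	}
	return rules
}

func main() {
	for _, mode := range []string{"", "serve"} {
		fmt.Printf("mode %q:\n", mode)
		for _, rule := range nodeinfoRobotsRules(mode) {
			fmt.Println("  " + rule)
		}
	}
}
```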
25 changes: 24 additions & 1 deletion docs/api/swagger.yaml
@@ -77,10 +77,20 @@ definitions:
x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
NodeInfoSoftware:
properties:
homepage:
description: Homepage for the software. Omitted in version 2.0.
example: https://docs.gotosocial.org
type: string
x-go-name: Homepage
name:
example: gotosocial
type: string
x-go-name: Name
repository:
description: Repository for the software. Omitted in version 2.0.
example: https://codeberg.org/superseriousbusiness/gotosocial
type: string
x-go-name: Repository
version:
example: 0.1.2 1234567
type: string
@@ -90,6 +100,10 @@ definitions:
x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
NodeInfoUsage:
properties:
localComments:
format: int64
type: integer
x-go-name: LocalComments
localPosts:
format: int64
type: integer
@@ -101,6 +115,14 @@ definitions:
x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
NodeInfoUsers:
properties:
activeHalfYear:
format: int64
type: integer
x-go-name: ActiveHalfYear
activeMonth:
format: int64
type: integer
x-go-name: ActiveMonth
total:
format: int64
type: integer
@@ -12504,12 +12526,13 @@ paths:
summary: Returns code 200 if GoToSocial is "live", ie., able to respond to HTTP requests.
tags:
- health
/nodeinfo/2.0:
/nodeinfo/{schema_version}:
get:
description: 'See: https://nodeinfo.diaspora.software/schema.html'
operationId: nodeInfoGet
produces:
- application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.0#"
- application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.1#"
responses:
"200":
description: ""
42 changes: 32 additions & 10 deletions docs/configuration/instance.md
@@ -139,14 +139,36 @@ instance-subscriptions-process-from: "23:00"
# Default: "24h" (once per day).
instance-subscriptions-process-every: "24h"

# Bool. Set this to true to randomize stats served at
# the /api/v1|v2/instance and /nodeinfo/2.0 endpoints.
#
# This can be useful when you don't want bots to obtain
# reliable information about the amount of users and
# statuses on your instance.
#
# Options: [true, false]
# Default: false
instance-stats-randomize: false
# String. Allows you to customize if and how stats are served to
# crawlers at the /api/v1|v2/instance and /nodeinfo endpoints.
#
# Note that no matter what you set below, the /api/v1|v2/instance
# endpoints will not be allowed by robots.txt, as these are client
# API endpoints.
#
# "" / empty string (default mode): Serve accurate stats at instance
# and nodeinfo endpoints, and DISALLOW crawlers from crawling
# those endpoints in robots.txt. This mode is equivalent to politely
# asking crawlers not to crawl, but there's no guarantee they will obey,
# as unfortunately many crawlers don't even check robots.txt.
#
# "zero": Serve zeroed-out stats at instance and nodeinfo endpoints,
# and DISALLOW crawlers from crawling those endpoints in robots.txt.
# This mode prevents even ill-behaved crawlers from gathering stats
# about your instance, as all gathered values will be 0. This is the
# safest way of preserving your instance's privacy in terms of stats.
#
# "serve": Serve accurate stats at instance and nodeinfo endpoints,
# and ALLOW crawlers to crawl those endpoints. This mode is useful
# if you want to contribute to fediverse statistics collection projects.
#
# "baffle": Serve randomized, preposterous stats at instance and nodeinfo
# endpoints, and DISALLOW crawlers from crawling those endpoints in robots.txt.
# This mode can be useful to annoy crawlers that don't respect robots.txt.
# Warning that this may draw the ire of crawler implementers who don't
# respect robots.txt, and may therefore put a target on your instance.
#
# Options: ["", "zero", "serve", "baffle"]
# Default: ""
instance-stats-mode: ""
```
42 changes: 32 additions & 10 deletions example/config.yaml
@@ -425,16 +425,38 @@ instance-subscriptions-process-from: "23:00"
# Default: "24h" (once per day).
instance-subscriptions-process-every: "24h"

# Bool. Set this to true to randomize stats served at
# the /api/v1|v2/instance and /nodeinfo/2.0 endpoints.
#
# This can be useful when you don't want bots to obtain
# reliable information about the amount of users and
# statuses on your instance.
#
# Options: [true, false]
# Default: false
instance-stats-randomize: false
# String. Allows you to customize if and how stats are served to
# crawlers at the /api/v1|v2/instance and /nodeinfo endpoints.
#
# Note that no matter what you set below, the /api/v1|v2/instance
# endpoints will not be allowed by robots.txt, as these are client
# API endpoints.
#
# "" / empty string (default mode): Serve accurate stats at instance
# and nodeinfo endpoints, and DISALLOW crawlers from crawling
# those endpoints in robots.txt. This mode is equivalent to politely
# asking crawlers not to crawl, but there's no guarantee they will obey,
# as unfortunately many crawlers don't even check robots.txt.
#
# "zero": Serve zeroed-out stats at instance and nodeinfo endpoints,
# and DISALLOW crawlers from crawling those endpoints in robots.txt.
# This mode prevents even ill-behaved crawlers from gathering stats
# about your instance, as all gathered values will be 0. This is the
# safest way of preserving your instance's privacy in terms of stats.
#
# "serve": Serve accurate stats at instance and nodeinfo endpoints,
# and ALLOW crawlers to crawl those endpoints. This mode is useful
# if you want to contribute to fediverse statistics collection projects.
#
# "baffle": Serve randomized, preposterous stats at instance and nodeinfo
# endpoints, and DISALLOW crawlers from crawling those endpoints in robots.txt.
# This mode can be useful to annoy crawlers that don't respect robots.txt.
# Warning that this may draw the ire of crawler implementers who don't
# respect robots.txt, and may therefore put a target on your instance.
#
# Options: ["", "zero", "serve", "baffle"]
# Default: ""
instance-stats-mode: ""

###########################
##### ACCOUNTS CONFIG #####
25 changes: 23 additions & 2 deletions internal/api/client/instance/instanceget.go
@@ -60,10 +60,21 @@ func (m *Module) InstanceInformationGETHandlerV1(c *gin.Context) {
return
}

if config.GetInstanceStatsRandomize() {
switch config.GetInstanceStatsMode() {

case config.InstanceStatsModeBaffle:
// Replace actual stats with cached randomized ones.
instance.Stats["user_count"] = util.Ptr(int(instance.RandomStats.TotalUsers))
instance.Stats["status_count"] = util.Ptr(int(instance.RandomStats.Statuses))

case config.InstanceStatsModeZero:
// Replace actual stats with zero.
instance.Stats["user_count"] = new(int)
instance.Stats["status_count"] = new(int)

default:
// serve or default.
// Leave stats alone.
}

apiutil.JSON(c, http.StatusOK, instance)
@@ -101,9 +112,19 @@ func (m *Module) InstanceInformationGETHandlerV2(c *gin.Context) {
return
}

if config.GetInstanceStatsRandomize() {
switch config.GetInstanceStatsMode() {

case config.InstanceStatsModeBaffle:
// Replace actual stats with cached randomized ones.
instance.Usage.Users.ActiveMonth = int(instance.RandomStats.MonthlyActiveUsers)

case config.InstanceStatsModeZero:
// Replace actual stats with zero.
instance.Usage.Users.ActiveMonth = 0

default:
// serve or default.
// Leave stats alone.
}

apiutil.JSON(c, http.StatusOK, instance)
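
The switch in the v1 handler can be tried in isolation. This is a simplified sketch, not the handler itself: `applyStatsMode`, `ptr`, and the lowercase mode constants are stand-ins invented here for `config.InstanceStatsMode*` and `util.Ptr`, and the stats map is a plain `map[string]*int` assumed for illustration.

```go
package main

import "fmt"

// Stand-ins for the config.InstanceStatsMode* constants.
const (
	modeDefault = ""
	modeServe   = "serve"
	modeZero    = "zero"
	modeBaffle  = "baffle"
)

// ptr returns a pointer to a copy of v (stand-in for util.Ptr).
func ptr[T any](v T) *T { return &v }

// applyStatsMode overwrites the served stats according to mode.
// randomUsers and randomStatuses play the role of the cached
// randomized values used in "baffle" mode.
func applyStatsMode(mode string, stats map[string]*int, randomUsers, randomStatuses int) {
	switch mode {
	case modeBaffle:
		// Replace actual stats with randomized ones.
		stats["user_count"] = ptr(randomUsers)
		stats["status_count"] = ptr(randomStatuses)
	case modeZero:
		// new(int) yields a pointer to 0.
		stats["user_count"] = new(int)
		stats["status_count"] = new(int)
	default:
		// modeServe or modeDefault: leave stats alone.
	}
}

func main() {
	stats := map[string]*int{"user_count": ptr(42), "status_count": ptr(1337)}
	applyStatsMode(modeZero, stats, 9000, 123456)
	fmt.Println(*stats["user_count"], *stats["status_count"]) // prints "0 0"
}
```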
15 changes: 12 additions & 3 deletions internal/api/model/well-known.go
@@ -70,6 +70,12 @@ type NodeInfoSoftware struct {
Name string `json:"name"`
// example: 0.1.2 1234567
Version string `json:"version"`
// Repository for the software. Omitted in version 2.0.
// example: https://codeberg.org/superseriousbusiness/gotosocial
Repository string `json:"repository,omitempty"`
// Homepage for the software. Omitted in version 2.0.
// example: https://docs.gotosocial.org
Homepage string `json:"homepage,omitempty"`
}

// NodeInfoServices represents inbound and outbound services that this node offers connections to.
@@ -80,13 +86,16 @@ type NodeInfoServices struct {

// NodeInfoUsage represents usage information about this server, such as number of users.
type NodeInfoUsage struct {
Users NodeInfoUsers `json:"users"`
LocalPosts int `json:"localPosts"`
Users NodeInfoUsers `json:"users"`
LocalPosts int `json:"localPosts,omitempty"`
LocalComments int `json:"localComments,omitempty"`
}

// NodeInfoUsers represents aggregate information about the users on the server.
type NodeInfoUsers struct {
Total int `json:"total"`
Total int `json:"total"`
ActiveHalfYear int `json:"activeHalfYear,omitempty"`
ActiveMonth int `json:"activeMonth,omitempty"`
}

// HostMeta represents a hostmeta document.
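
To see what the new `omitempty` tags do on the wire, the sketch below marshals trimmed-down copies of the usage structs with `encoding/json`. The lowercase type names and the `encode` helper are local to this example. With `omitempty`, an integer field set to 0 is dropped from the JSON output entirely, so zero-valued counts do not appear as `0`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the struct shapes from the diff above.
type nodeInfoUsers struct {
	Total          int `json:"total"`
	ActiveHalfYear int `json:"activeHalfYear,omitempty"`
	ActiveMonth    int `json:"activeMonth,omitempty"`
}

type nodeInfoUsage struct {
	Users         nodeInfoUsers `json:"users"`
	LocalPosts    int           `json:"localPosts,omitempty"`
	LocalComments int           `json:"localComments,omitempty"`
}

// encode marshals v to a JSON string, or returns the error text.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// All-zero usage: only the fields without omitempty remain.
	fmt.Println(encode(nodeInfoUsage{}))
	// prints {"users":{"total":0}}

	// Non-zero fields are included as usual.
	fmt.Println(encode(nodeInfoUsage{
		Users:      nodeInfoUsers{Total: 3, ActiveMonth: 2},
		LocalPosts: 10,
	}))
	// prints {"users":{"total":3,"activeMonth":2},"localPosts":10}
}
```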
4 changes: 2 additions & 2 deletions internal/api/nodeinfo.go
@@ -36,9 +36,9 @@ func (w *NodeInfo) Route(r *router.Router, m ...gin.HandlerFunc) {
// attach middlewares appropriate for this group
nodeInfoGroup.Use(m...)
nodeInfoGroup.Use(
// Allow public cache for 2 minutes.
// Allow public cache for 24 hours.
middleware.CacheControl(middleware.CacheControlConfig{
Directives: []string{"public", "max-age=120"},
Directives: []string{"public", "max-age=86400"},
Vary: []string{"Accept-Encoding"},
}),
)
11 changes: 7 additions & 4 deletions internal/api/nodeinfo/nodeinfo.go
@@ -25,9 +25,12 @@ import (
)

const (
NodeInfo2Version = "2.0"
NodeInfo2Path = "/" + NodeInfo2Version
NodeInfo2ContentType = "application/json; profile=\"http://nodeinfo.diaspora.software/ns/schema/" + NodeInfo2Version + "#\""
NodeInfo20 = "2.0"
NodeInfo20ContentType = "application/json; profile=\"http://nodeinfo.diaspora.software/ns/schema/" + NodeInfo20 + "#\""
NodeInfo21 = "2.1"
NodeInfo21ContentType = "application/json; profile=\"http://nodeinfo.diaspora.software/ns/schema/" + NodeInfo21 + "#\""
NodeInfoSchema = "schema"
NodeInfoPath = "/:" + NodeInfoSchema
)

type Module struct {
@@ -41,5 +44,5 @@ func New(processor *processing.Processor) *Module {
}

func (m *Module) Route(attachHandler func(method string, path string, f ...gin.HandlerFunc) gin.IRoutes) {
attachHandler(http.MethodGet, NodeInfo2Path, m.NodeInfo2GETHandler)
attachHandler(http.MethodGet, NodeInfoPath, m.NodeInfo2GETHandler)
}
24 changes: 21 additions & 3 deletions internal/api/nodeinfo/nodeinfoget.go
@@ -18,14 +18,15 @@
package nodeinfo

import (
"errors"
"net/http"

"github.com/gin-gonic/gin"
apiutil "github.com/superseriousbusiness/gotosocial/internal/api/util"
"github.com/superseriousbusiness/gotosocial/internal/gtserror"
)

// NodeInfo2GETHandler swagger:operation GET /nodeinfo/2.0 nodeInfoGet
// NodeInfo2GETHandler swagger:operation GET /nodeinfo/{schema_version} nodeInfoGet
//
// Returns a compliant nodeinfo response to node info queries.
//
@@ -37,6 +38,7 @@ import (
//
// produces:
// - application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.0#"
// - application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.1#"
//
// responses:
// '200':
@@ -48,7 +50,23 @@ func (m *Module) NodeInfo2GETHandler(c *gin.Context) {
return
}

nodeInfo, errWithCode := m.processor.Fedi().NodeInfoGet(c.Request.Context())
var (
contentType string
schemaVersion = c.Param(NodeInfoSchema)
)

switch schemaVersion {
case NodeInfo20:
contentType = NodeInfo20ContentType
case NodeInfo21:
contentType = NodeInfo21ContentType
default:
const errText = "only nodeinfo 2.0 and 2.1 are supported"
apiutil.ErrorHandler(c, gtserror.NewErrorNotFound(errors.New(errText), errText), m.processor.InstanceGetV1)
return
}

nodeInfo, errWithCode := m.processor.Fedi().NodeInfoGet(c.Request.Context(), schemaVersion)
if errWithCode != nil {
apiutil.ErrorHandler(c, errWithCode, m.processor.InstanceGetV1)
return
@@ -59,7 +77,7 @@ func (m *Module) NodeInfo2GETHandler(c *gin.Context) {
c.Writer,
c.Request,
http.StatusOK,
NodeInfo2ContentType,
contentType,
nodeInfo,
)
}
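
The version dispatch in the handler above boils down to mapping the `schema` path parameter to a content type and rejecting anything else. A minimal standalone sketch follows; `contentTypeFor` and `profileBase` are hypothetical names, and the boolean return stands in for the 404 that the real handler produces via `gtserror.NewErrorNotFound`.

```go
package main

import "fmt"

const (
	nodeInfo20  = "2.0"
	nodeInfo21  = "2.1"
	profileBase = "http://nodeinfo.diaspora.software/ns/schema/"
)

// contentTypeFor returns the nodeinfo content type for a supported
// schema version, or false if the version is not supported.
func contentTypeFor(schemaVersion string) (string, bool) {
	switch schemaVersion {
	case nodeInfo20, nodeInfo21:
		return `application/json; profile="` + profileBase + schemaVersion + `#"`, true
	default:
		return "", false
	}
}

func main() {
	for _, v := range []string{"2.0", "2.1", "1.0"} {
		if ct, ok := contentTypeFor(v); ok {
			fmt.Printf("%s -> %s\n", v, ct)
		} else {
			fmt.Printf("%s -> not found: only nodeinfo 2.0 and 2.1 are supported\n", v)
		}
	}
}
```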
2 changes: 1 addition & 1 deletion internal/config/config.go
@@ -90,7 +90,7 @@ type Configuration struct {
InstanceLanguages language.Languages `name:"instance-languages" usage:"BCP47 language tags for the instance. Used to indicate the preferred languages of instance residents (in order from most-preferred to least-preferred)."`
InstanceSubscriptionsProcessFrom string `name:"instance-subscriptions-process-from" usage:"Time of day from which to start running instance subscriptions processing jobs. Should be in the format 'hh:mm:ss', eg., '15:04:05'."`
InstanceSubscriptionsProcessEvery time.Duration `name:"instance-subscriptions-process-every" usage:"Period to elapse between instance subscriptions processing jobs, starting from instance-subscriptions-process-from."`
InstanceStatsRandomize bool `name:"instance-stats-randomize" usage:"Set to true to randomize the stats served at api/v1/instance and api/v2/instance endpoints. Home page stats remain unchanged."`
InstanceStatsMode string `name:"instance-stats-mode" usage:"Allows you to customize the way stats are served to crawlers: one of '', 'serve', 'zero', 'baffle'. Home page stats remain unchanged."`

AccountsRegistrationOpen bool `name:"accounts-registration-open" usage:"Allow anyone to submit an account signup request. If false, server will be invite-only."`
AccountsReasonRequired bool `name:"accounts-reason-required" usage:"Do new account signups require a reason to be submitted on registration?"`
20 changes: 16 additions & 4 deletions internal/config/const.go
@@ -17,16 +17,28 @@

package config

// Instance federation mode determines how this
// instance federates with others (if at all).
const (
// Instance federation mode determines how this
// instance federates with others (if at all).
InstanceFederationModeBlocklist = "blocklist"
InstanceFederationModeAllowlist = "allowlist"
InstanceFederationModeDefault = InstanceFederationModeBlocklist
)

// Request header filter mode determines how
// this instance will perform request filtering.
// Request header filter mode determines how
// this instance will perform request filtering.
const (
RequestHeaderFilterModeAllow = "allow"
RequestHeaderFilterModeBlock = "block"
RequestHeaderFilterModeDisabled = ""
)

// Instance stats mode determines if and how
// stats about the instance are served at
// nodeinfo and api/v1|v2/instance endpoints.
const (
InstanceStatsModeDefault = ""
InstanceStatsModeServe = "serve"
InstanceStatsModeZero = "zero"
InstanceStatsModeBaffle = "baffle"
)
2 changes: 1 addition & 1 deletion internal/config/flags.go
@@ -92,7 +92,7 @@ func (s *ConfigState) AddServerFlags(cmd *cobra.Command) {
cmd.Flags().StringSlice(InstanceLanguagesFlag(), cfg.InstanceLanguages.TagStrs(), fieldtag("InstanceLanguages", "usage"))
cmd.Flags().String(InstanceSubscriptionsProcessFromFlag(), cfg.InstanceSubscriptionsProcessFrom, fieldtag("InstanceSubscriptionsProcessFrom", "usage"))
cmd.Flags().Duration(InstanceSubscriptionsProcessEveryFlag(), cfg.InstanceSubscriptionsProcessEvery, fieldtag("InstanceSubscriptionsProcessEvery", "usage"))
cmd.Flags().Bool(InstanceStatsRandomizeFlag(), cfg.InstanceStatsRandomize, fieldtag("InstanceStatsRandomize", "usage"))
cmd.Flags().String(InstanceStatsModeFlag(), cfg.InstanceStatsMode, fieldtag("InstanceStatsMode", "usage"))

// Accounts
cmd.Flags().Bool(AccountsRegistrationOpenFlag(), cfg.AccountsRegistrationOpen, fieldtag("AccountsRegistrationOpen", "usage"))