Repository Types

Note: all repositories are defined in the SolrCloud specification. To use a repository in a SolrBackup resource, it must first be defined in the SolrCloud spec. All YAML examples below are SolrCloud resources, not SolrBackup resources.

The Solr Operator currently supports three backup repository types: Google Cloud Storage ("GCS"), AWS S3 ("S3"), and Volume ("local"). The cloud repositories (GCS and S3) are strongly recommended because they are cloud-native backup solutions; however, they require newer Solr versions (see the version requirement in each section below).

Multiple repositories can be defined under the SolrCloud.spec.backupRepositories field. Give each repository a unique name and exactly one repository type. Type-specific options go under the object named after the repository type (volume, gcs, or s3). Examples can be found in each repository-type section below. Feel free to mix and match multiple repository types, or define multiple repositories of the same type, to fit your use case:

spec:
  backupRepositories:
    - name: "local-collection-backups-1"
      volume:
        ...
    - name: "gcs-collection-backups-1"
      gcs:
        ...
    - name: "s3-collection-backups-1"
      s3:
        ...
    - name: "s3-collection-backups-2"
      s3:
        ...

GCS Backup Repositories

GCS Repositories store backup data remotely in Google Cloud Storage. This repository type is only supported in deployments that use a Solr version >= 8.9.0.

Each repository must specify the GCS bucket to store data in (the bucket property) and, in most cases, the name of a Kubernetes secret containing credentials for accessing GCS (the gcsCredentialSecret property). This secret must have a key service-account-key.json whose value is a JSON service account key, as described in the Google Cloud documentation on service account keys. If you already have your service account key, the secret can be created with a command like the one below.

kubectl create secret generic <secretName> --from-file=service-account-key.json=<path-to-service-account-key>

In some rare cases (e.g. when deploying in GKE and relying on its Workload Identity feature) explicit credentials are not required to talk to GCS. In these cases, the gcsCredentialSecret property may be omitted.

An example of a SolrCloud spec with only one backup repository, with type GCS:

spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "backup-bucket" # Required
        gcsCredentialSecret:
          name: "secretName"
          key: "service-account-key.json"
        baseLocation: "/store/here" # Optional
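
For the credential-free case described above (e.g. GKE with Workload Identity), the same repository might be declared without the gcsCredentialSecret property. This is a minimal sketch that reuses the bucket and repository names from the example above; it assumes the Solr pods already receive GCS access from their environment.

```yaml
spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "backup-bucket" # Required
        # gcsCredentialSecret is omitted here on the assumption that
        # credentials are supplied by the platform (e.g. Workload Identity).
        baseLocation: "/store/here" # Optional
```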

S3 Backup Repositories

S3 Repositories store backup data remotely in AWS S3 (or a supported S3-compatible interface). This repository type is only supported in deployments that use a Solr version >= 8.10.0.

Each repository must specify the S3 bucket and region to store data in (the bucket and region properties). Users will also want to set up credentials so that the SolrCloud can connect to that bucket and region; more information can be found in the S3 Credentials section.

spec:
  backupRepositories:
    - name: "s3-backups-1"
      s3:
        region: "us-west-2" # Required
        bucket: "backup-bucket" # Required
        credentials: {} # Optional
        proxyUrl: "https://proxy-url-for-s3:3242" # Optional
        endpoint: "https://custom-s3-endpoint:3242" # Optional

Users can also optionally set a proxyUrl or endpoint for the S3Repository. More information on these settings can be found in the Solr Reference Guide.

S3 Credentials

The Solr S3Repository module uses the default credential chain for AWS. All of the options below are designed to be utilized by this credential chain.

There are a few options for giving a SolrCloud the credentials for connecting to S3. The two most straightforward options are configured via the spec.backupRepositories.s3.credentials property.

spec:
  backupRepositories:
    - name: "s3-backups-1"
      s3:
        region: "us-west-2"
        bucket: "backup-bucket"
        credentials:
          accessKeyIdSecret: # Optional
            name: aws-secrets
            key: access-key-id
          secretAccessKeySecret: # Optional
            name: aws-secrets
            key: secret-access-key
          sessionTokenSecret: # Optional
            name: aws-secrets
            key: session-token
          credentialsFileSecret: # Optional
            name: aws-credentials
            key: credentials

All options in the credentials property are optional, so users can pick and choose which ones to use. If all of your credentials are set up in an AWS credentials file, then credentialsFileSecret is the only property you need to set. If you don’t have a credentials file, you will likely need to set at least the accessKeyIdSecret and secretAccessKeySecret properties. All of these options require the referenced Kubernetes secrets to exist before the SolrCloud resource is created. (If desired, the options can be combined, e.g. by using accessKeyIdSecret and credentialsFileSecret together. The ordering of the default credential chain determines which options are used.)
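
As an illustration, the secrets referenced in the example above could be created from manifests along these lines. This is a sketch only: the secret names and keys mirror the example, the values are placeholders, and the [default] profile layout is the standard AWS credentials file format rather than something specific to the Solr Operator.

```yaml
# Secret holding the individual key values referenced by
# accessKeyIdSecret and secretAccessKeySecret.
apiVersion: v1
kind: Secret
metadata:
  name: aws-secrets
type: Opaque
stringData:
  access-key-id: "<your-access-key-id>"
  secret-access-key: "<your-secret-access-key>"
---
# Secret holding a whole AWS credentials file, referenced by
# credentialsFileSecret.
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
stringData:
  credentials: |
    [default]
    aws_access_key_id = <your-access-key-id>
    aws_secret_access_key = <your-secret-access-key>
```

Only one of the two secrets is needed if you use a single credential option.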

The options in the credentials property above merely set environment variables on the pod or, in the case of credentialsFileSecret, set an environment variable and add a volume mount. Users can choose not to use the credentials section of the s3 repository config, and instead set these environment variables themselves via spec.customSolrKubeOptions.podOptions.env.
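
A rough sketch of setting the variables by hand is shown below. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard variable names read by the AWS default credential chain; the secret name and keys are the hypothetical ones from the earlier example. The list field is written here as envVars, which is an assumption about the CRD and differs from the path given in the text above, so confirm the exact name against your installed CRD (for example with kubectl explain) before relying on it.

```yaml
spec:
  customSolrKubeOptions:
    podOptions:
      # Assumed field name; verify against your installed SolrCloud CRD.
      envVars:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-secrets
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-secrets
              key: secret-access-key
```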

Lastly, if running in EKS, it is possible to add IAM information to Kubernetes serviceAccounts. If this is done correctly, you will only need to specify the serviceAccount for the SolrCloud pods via spec.customSolrKubeOptions.podOptions.serviceAccount.
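
For the EKS case, a service account carrying IAM information typically looks like the sketch below. The eks.amazonaws.com/role-arn annotation is the one EKS uses for IAM roles for service accounts; the service account name and the role ARN are placeholders, and the IAM role itself must already exist with access to the bucket.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: solr-backup-sa # hypothetical name, referenced from the SolrCloud pod options
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<role-name>"
```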

Because the Solr S3 Repository uses system-wide settings for AWS credentials, you cannot specify different credentials for different S3 repositories. This may be addressed in future Solr versions, but for now use the same credentials for all S3 repositories.

Volume Backup Repositories

Volume repositories store backup data "locally" on a Kubernetes volume mounted to each Solr pod. An example of a SolrCloud spec with only one backup repository, with type Volume:

spec:
  backupRepositories:
    - name: "local-collection-backups-1"
      volume:
        source: # Required
          persistentVolumeClaim:
            claimName: "collection-backup-pvc"
        directory: "store/here" # Optional

All persistent volumes used with Volume Repositories must have accessMode: ReadWriteMany set, otherwise the backups will not succeed.
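
A claim satisfying this requirement might look like the following sketch. The claim name matches the example above; the storage class and size are placeholders, and the chosen storage class must actually support ReadWriteMany (many block-storage classes do not).

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: collection-backup-pvc
spec:
  accessModes:
    - ReadWriteMany # every Solr pod mounts the same backup volume
  storageClassName: "<rwx-capable-storage-class>" # placeholder
  resources:
    requests:
      storage: 10Gi # placeholder size
```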