GitHub

Configuration

The GitHub source connector connects to your GitHub repositories and uses the GitHub REST API to extract and synchronize their data.

It supports syncing repository metadata, directory structures, and code files, with configurable filtering options for branches and file types.

Authentication

This connector uses a custom authentication configuration.

Authentication Configuration

GitHub authentication credentials schema.

personal_access_token
str (required)

GitHub personal access token (PAT) with read access to the repository's code, contents, and metadata

repo_name
str (required)

Repository to sync in owner/repo format (e.g., ‘airweave-ai/airweave’)
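As a minimal sketch of these two fields, the credentials could be modeled as a plain Python dataclass that checks the owner/repo shape before any API call is made. The class name `GitHubAuthConfig` and its validation rules are hypothetical illustrations, not Airweave's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GitHubAuthConfig:
    """Hypothetical stand-in for the connector's credentials schema."""

    personal_access_token: str
    repo_name: str  # expected in "owner/repo" format

    def __post_init__(self) -> None:
        if not self.personal_access_token.strip():
            raise ValueError("personal_access_token must not be empty")
        parts = self.repo_name.split("/")
        # Require exactly one slash, with non-empty owner and repository parts.
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"repo_name must be 'owner/repo', got {self.repo_name!r}")

    @property
    def owner(self) -> str:
        return self.repo_name.split("/")[0]

    @property
    def repo(self) -> str:
        return self.repo_name.split("/")[1]
```

For example, `GitHubAuthConfig("<token>", "airweave-ai/airweave").owner` evaluates to `"airweave-ai"`, while a bare `"airweave"` is rejected with a `ValueError`.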

Configuration Options

The following configuration options are available for this connector:

Configuration Parameters

GitHub configuration schema.

branch
str

Specific branch to sync (e.g., ‘main’, ‘development’). If empty, the repository's default branch is used.
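The empty-branch fallback can be sketched as a small helper. This assumes the repository details have already been fetched from GitHub's `GET /repos/{owner}/{repo}` endpoint, whose response includes a `default_branch` field; the function name `resolve_branch` and the treatment of whitespace-only values as empty are assumptions.

```python
from typing import Any, Mapping, Optional


def resolve_branch(configured: Optional[str], repo_info: Mapping[str, Any]) -> str:
    """Return the branch to sync (hypothetical helper).

    `repo_info` is assumed to be the parsed JSON body of GET /repos/{owner}/{repo}.
    """
    if configured and configured.strip():
        return configured.strip()
    # No branch configured: fall back to the repository's default branch.
    return repo_info["default_branch"]
```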

Data Models

The following data models are available for this connector:

Schema for GitHub repository entity.

Field | Type | Description
name | str | Repository name
full_name | str | Full repository name including owner
description | Optional[str] | Repository description
default_branch | str | Default branch of the repository
created_at | datetime | Creation timestamp
updated_at | datetime | Last update timestamp
language | Optional[str] | Primary language of the repository
fork | bool | Whether the repository is a fork
size | int | Size of the repository in KB
stars_count | Optional[int] | Number of stars
watchers_count | Optional[int] | Number of watchers
forks_count | Optional[int] | Number of forks
open_issues_count | Optional[int] | Number of open issues
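The following sketch shows one plausible way to fill this entity from the JSON returned by `GET /repos/{owner}/{repo}`. The field mapping (in particular `stars_count` taken from GitHub's `stargazers_count`) and the helper names are assumptions for illustration, not the connector's confirmed code.

```python
from datetime import datetime
from typing import Any, Mapping


def parse_github_timestamp(value: str) -> datetime:
    # GitHub returns ISO 8601 UTC timestamps such as "2011-01-26T19:01:12Z";
    # replacing "Z" keeps fromisoformat working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def to_repository_entity(repo: Mapping[str, Any]) -> dict:
    """Map a repository API response onto the entity fields (assumed mapping)."""
    return {
        "name": repo["name"],
        "full_name": repo["full_name"],
        "description": repo.get("description"),
        "default_branch": repo["default_branch"],
        "created_at": parse_github_timestamp(repo["created_at"]),
        "updated_at": parse_github_timestamp(repo["updated_at"]),
        "language": repo.get("language"),
        "fork": repo["fork"],
        "size": repo["size"],  # GitHub reports repository size in KB
        "stars_count": repo.get("stargazers_count"),
        # Note: in GitHub's REST API, watchers_count mirrors stargazers_count;
        # subscribers_count holds the number of people watching the repository.
        "watchers_count": repo.get("watchers_count"),
        "forks_count": repo.get("forks_count"),
        "open_issues_count": repo.get("open_issues_count"),
    }
```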

Schema for GitHub directory entity.

Field | Type | Description
path | str | Path of the directory within the repository
repo_name | str | Name of the repository containing this directory
repo_owner | str | Owner of the repository
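Directory entities can be derived from file paths alone. The sketch below assumes a flat list of file paths (for example, from a recursive tree listing) and that `repo_name` holds only the repository part of `owner/repo`; the function name and this traversal strategy are hypothetical and not necessarily how the connector walks a repository.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def directory_entities(repo_full_name: str, file_paths: Iterable[str]) -> List[dict]:
    """Build one directory record per distinct ancestor directory (hypothetical)."""
    owner, repo = repo_full_name.split("/", 1)
    dirs = set()
    for file_path in file_paths:
        # Every ancestor of a file, except the repository root ".", is a directory.
        for parent in PurePosixPath(file_path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return [
        {"path": d, "repo_name": repo, "repo_owner": owner}
        for d in sorted(dirs)
    ]
```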

Schema for GitHub code file entity.

Field | Type | Description
sha | str | SHA hash of the file content
path | str | Path of the file within the repository
is_binary | bool | Whether the file is binary
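GitHub reports a file's `sha` as its Git blob hash: SHA-1 over the header `blob <size>` and a NUL byte, followed by the raw file bytes. The sketch below recomputes that hash locally, which may help check whether a file changed between syncs. The `looks_binary` helper is a hypothetical heuristic (a NUL byte near the start of the file, similar to Git's own check), not the connector's confirmed rule for `is_binary`.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    # Git hashes a blob as SHA-1 over "blob <size>\0" followed by the raw bytes.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def looks_binary(content: bytes, sniff_bytes: int = 8000) -> bool:
    # Assumed heuristic: a NUL byte in the first few KB suggests binary content.
    return b"\0" in content[:sniff_bytes]
```

For instance, `git_blob_sha(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known hash of Git's empty blob.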

Schema for a GitHub repository.

References: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28

Field | Type | Description
name | Optional[str] | Name of the repository.
full_name | Optional[str] | Full name (including owner) of the repo.
owner_login | Optional[str] | Login/username of the repository owner.
private | bool | Whether the repository is private.
description | Optional[str] | Short description of the repository.
fork | bool | Whether this repository is a fork.
created_at | Optional[datetime] | When the repository was created.
updated_at | Optional[datetime] | When the repository was last updated.
pushed_at | Optional[datetime] | When the repository was last pushed.
homepage | Optional[str] | Homepage URL for the repository.
size | Optional[int] | Size of the repository (in kilobytes).
stargazers_count | int | Number of stars on this repository.
watchers_count | int | Number of people watching this repository.
language | Optional[str] | Primary language of the repository.
forks_count | int | Number of forks for this repository.
open_issues_count | int | Number of open issues on this repository.
topics | List[str] | Topics/tags applied to this repo.
default_branch | Optional[str] | Default branch name of the repository.
archived | bool | Whether the repository is archived.
disabled | bool | Whether the repository is disabled in GitHub.
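The reference above documents `GET /repos/{owner}/{repo}` under API version 2022-11-28. The sketch below only prepares that request with Python's standard library, assuming a fine-grained PAT sent as a bearer token; `build_repo_request` is a hypothetical helper name, not part of Airweave.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_repo_request(token: str, full_name: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET /repos/{owner}/{repo} request."""
    return urllib.request.Request(
        f"{API_ROOT}/repos/{full_name}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            # Pin the REST API version that the reference link above describes.
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="GET",
    )
```

Sending it with `urllib.request.urlopen(req)` and parsing the body with `json.load` yields a dictionary with fields such as `stargazers_count`, `topics`, and `archived`; `owner_login` presumably comes from the nested `owner.login` value.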

Schema for a GitHub repository’s content (file, directory, submodule, etc.).

References: https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28

Field | Type | Description
repo_full_name | Optional[str] | Full name of the parent repository.
path | Optional[str] | Path of the file or directory within the repo.
sha | Optional[str] | SHA identifier for this content item.
item_type | Optional[str] | Type of content. Typically ‘file’, ‘dir’, ‘submodule’, or ‘symlink’.
size | Optional[int] | Size of the content (in bytes).
html_url | Optional[str] | HTML URL for viewing this content on GitHub.
download_url | Optional[str] | Direct download URL if applicable.
content | Optional[str] | File content, if retrieved (base64-encoded in GitHub's default JSON response).
encoding | Optional[str] | Indicates the encoding of the content (e.g., ‘base64’).
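For files, the contents endpoint returns `content` as base64 text with embedded line breaks and sets `encoding` to `base64`. A minimal decoding sketch follows; `decode_content` is a hypothetical helper name, and the plain-text fallback for other encodings is an assumption.

```python
import base64
from typing import Any, Mapping, Optional


def decode_content(item: Mapping[str, Any]) -> Optional[bytes]:
    """Return the raw bytes of a contents-API item, if content was included."""
    content = item.get("content")
    if content is None:
        return None  # e.g. directories, or files fetched without content
    if item.get("encoding") == "base64":
        # GitHub wraps base64 text with newlines; b64decode discards them
        # because its validate argument defaults to False.
        return base64.b64decode(content)
    # Assumption: any other value is already plain text.
    return content.encode("utf-8")
```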

Setting up a GitHub Personal Access Token for Airweave

To connect your GitHub repositories to Airweave, you’ll need to create a Personal Access Token (PAT) with the appropriate permissions. This guide walks you through the process of creating and configuring a fine-grained token for use with Airweave.

Step 1: Access Developer Settings in GitHub

Navigate to your GitHub account settings by clicking on your profile picture in the top right corner, then select “Settings”. From there, scroll down to find and click on “Developer settings” in the left sidebar.

Step 2: Create a New Fine-Grained Token

On the Developer settings page, expand “Personal access tokens” in the left menu, select “Fine-grained tokens”, then click “Generate new token”.

Step 3: Configure Your Token

Fill out the token form with the following details:

  1. Token name: Choose a descriptive name like “Airweave Integration”
  2. Expiration: Select an appropriate expiration date (recommended: 1 year for production use)
  3. Repository access: Choose “All repositories”, or choose “Only select repositories” and pick the repositories you want to connect to Airweave

Step 4: Set Required Permissions

For the GitHub connector to work properly, grant the following permissions:

Under “Repository permissions”:

  • Set “Contents” to “Read-only”. This allows Airweave to read repository files.
  • Leave “Metadata” at “Read-only”. GitHub adds this permission automatically and marks it as mandatory.

Step 5: Generate and Save Your Token

After configuring the permissions, scroll to the bottom of the page and click “Generate token”.

Important: GitHub will display your token only once. Make sure to copy and store it in a secure location, as you won’t be able to view it again.
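To avoid hard-coding the token in scripts, one option is to read it from an environment variable. In this sketch the variable name `GITHUB_PAT` and the helper `load_github_token` are hypothetical; use whatever secret store your deployment already relies on.

```python
import os


def load_github_token(var_name: str = "GITHUB_PAT") -> str:
    """Read the PAT from an environment variable instead of source code."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your GitHub token")
    return token
```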

Step 6: Add Your Token to Airweave

When setting up the GitHub connector in Airweave:

  1. Paste your personal access token in the “Personal Access Token” field
  2. Enter the repository name in the format owner/repo (e.g., airweave-ai/airweave)

Your GitHub repository is now connected to Airweave and ready for synchronization.
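If Airweave cannot connect, the HTTP status of a `GET /repos/{owner}/{repo}` request made with the same token usually points to the cause. The mapping below is a hedged sketch based on GitHub's documented behavior (for example, GitHub returns 404 rather than 403 for private repositories the token cannot see); `explain_repo_status` is a hypothetical helper, not part of Airweave.

```python
def explain_repo_status(status: int) -> str:
    """Map the HTTP status of GET /repos/{owner}/{repo} to a setup hint."""
    if status == 200:
        return "ok: the token can read the repository"
    if status == 401:
        return "bad credentials: the token is invalid, expired, or mistyped"
    if status == 403:
        return "forbidden: check the token's permissions or API rate limits"
    if status == 404:
        # GitHub hides private repositories the token cannot access behind a 404.
        return "not found: check the owner/repo spelling and the token's repository access"
    return f"unexpected status {status}"
```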