What is GitHub? More than Git version control in the cloud

GitHub is at heart a Git repository hosting service, i.e. a cloud-based source code management or version control system, but that’s just the beginning. In addition, GitHub implements features for code review (pull requests, diffs, and review requests), project management (including issue tracking and assignment), integrations with other developer tools, team management, documentation, and “social coding.”

Something like a social networking site for programmers, GitHub is an open environment where programmers can freely share and collaborate (even ad hoc) on open source code. GitHub makes it easy to find useful code, copy repositories for your own use, and submit changes to others’ projects. As a result, GitHub has become home to virtually every open source project of any importance, and countless other projects besides.

Whenever I want to explore an open source project, I start by searching for the project name. Once I find the project website, I look for its code repository link, and nine times out of ten I wind up on GitHub.

Git version control

Before we can understand what GitHub does and how GitHub works, we need to understand Git. Git is a distributed version control system, originally written by Linus Torvalds in 2005 for and with help from the Linux kernel community. I’m not here to sell you on Git, so I’ll spare you the spiel about how fast and small and flexible and popular it is, but you should know that when you clone a Git repository (“repo,” for short) you get the entire version history on your own computer, not just a snapshot from one branch at one time.

Git started as a command-line tool, befitting its origin in the Linux kernel community. You can still use the Git command line, if you like, but you don’t have to. Instead of or in addition to the command line, you can use the free GitHub client on Windows or Mac, or any of a number of other GUIs for Git, or a code editor that integrates with Git. All of these options are initially easier to use than the command line. The Git command line comes pre-installed on most Mac and Linux systems and supports all operations; the GUIs typically support a frequently used subset of Git operations.

Git is different from older version control systems such as Subversion in that it is distributed rather than centralized. It’s also quite fast, especially since most operations happen on your local repository. Nevertheless, using Git adds a level of complexity: committing code to your local repository and pushing your commits to a remote repository are separate steps. When teams forget this (or weren’t taught about it) it can lead to situations where different developers are working with code bases that have diverged.

A remote Git repository can be on a server, or it can be on another developer’s machine. That enables many possible workflows for teams. One common workflow involves using a server repository as the “blessed” repository, to which only reviewed, well-tested code is committed, often through a pull request issued from a developer’s repository.

GitHub functionality

I’ve already noted that GitHub is a cloud-based Git server for code hosting and social coding, and that it implements features for code review (pull requests, diffs, and review requests), project management (including issue tracking and assignment), integrations with other developer tools, team management, and documentation.

One innovation in social coding from GitHub is commit co-authors, which you accomplish by adding one or more “Co-authored-by” trailers to the end of a commit message. This mechanism doesn’t affect the repo core per se, and doesn’t change how the repo looks on plain Git, but on GitHub the chrome will show multiple authors in the commit list, and give each co-author credit in their contribution graph.
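As a minimal sketch of what those trailers look like, the helper below appends one “Co-authored-by” line per co-author to a commit message, separated from the body by a blank line. The function name and the example names and email addresses are hypothetical, chosen only for illustration.

```python
def with_co_authors(message: str, co_authors: list[tuple[str, str]]) -> str:
    """Append one Co-authored-by trailer per (name, email) pair."""
    if not co_authors:
        return message
    trailers = [f"Co-authored-by: {name} <{email}>" for name, email in co_authors]
    # Trailers sit at the very end of the message, after a blank line.
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"


if __name__ == "__main__":
    # Hypothetical co-authors, for illustration only.
    print(with_co_authors(
        "Fix off-by-one error in pagination",
        [("Ada Example", "ada@example.com"), ("Lin Example", "lin@example.com")],
    ))
```

You would pass the resulting text to your usual commit command or paste it into the commit message box of your Git client.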

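If you wish, you can extend GitHub using the GitHub GraphQL API, which lets you ask for exactly the data you need in a single query. This is a significant improvement over GitHub’s earlier API, which was based on REST calls, although the REST API remains available.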
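To give a feel for the GraphQL API, here is a hedged sketch that builds a query for a repository’s star and fork counts using only the Python standard library. The function names are my own, and the token is assumed to be a personal access token you supply; the sketch separates building the request from sending it so that you can inspect the payload without touching the network.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Ask for two fields of one repository, identified by owner and name.
QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    stargazerCount
    forkCount
  }
}
"""


def build_request(token: str, owner: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for one repository."""
    payload = {"query": QUERY, "variables": {"owner": owner, "name": name}}
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_counts(token: str, owner: str, name: str) -> dict:
    """Send the request; this needs network access and a valid token."""
    with urllib.request.urlopen(build_request(token, owner, name)) as response:
        return json.load(response)["data"]["repository"]
```

Calling fetch_counts with your own token and, say, the owner langchain-ai and the name langchain should return a small dictionary containing just the two requested fields.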

Stars, forks, and watches

Every GitHub repository shows counts of its stars, forks, and watches. All reflect the popularity of the repo in some way, but each has a different function.

Starring a repo does several things. A star makes it easy to find a repository or topic again later. You can see all the repositories and topics you have starred by going to your stars page. A star also shows appreciation to the repository maintainer. GitHub ranks repos by their number of stars, so stars are a primary measure of repo popularity.

Forks are important to open source development. A fork is a new GitHub repository that shares code and visibility settings with the original “upstream” repo, and typically gives a coder who doesn’t have commit privilege to a repo the ability to modify its code. To merge changes from a fork to the upstream repo, you submit a pull request. You’ll also hear about forks in the context of creating new open source projects from old ones for various reasons, often because of disputes in the direction of a project or changes to its license.

Watching a repo subscribes you to updates for activity in that repo. You can customize and filter your watches. You can also control whether GitHub will send you notifications by email. If you are participating in a conversation in a repo, you are subscribed to notifications about that conversation; you can control how you are notified, and you can unsubscribe whenever you wish.

Follows

In addition to watching a repo, you can follow people and organizations on GitHub to receive notifications about their activity and discover projects in their communities. An organization’s activity includes its new discussions, sponsorships, and repositories.

You can also view the list of users someone follows, and the list of users that follow someone. Follows are especially useful when you want to find out what’s happening in a specific area and you know the people and organizations who are active in that area. For example, to find out what’s going on with wasmCloud, you could follow the wasmCloud organization and some of the primary contributors to wasmCloud/wasmCloud.

Sections of a repository

GitHub repos have eight sections listed across the top, plus seven sections listed down the right column. The sections listed at the right tend to be self-explanatory, so we’ll concentrate on the ones shown across the top.

A GitHub repository. Note the sections listed across the top and down the right column. This is the main code repo for the Python version of LangChain.

Code

You’re probably familiar with the code section of GitHub repositories if you’ve ever done any software development. With modern high-speed internet, navigating online repos is almost as responsive as navigating code bases on your own computer. You can even work with the code in an online editor or a “codespace” (see below), if you have permission to do so.

Issues

GitHub Issues is essentially a bug- and suggestion-oriented discussion and tracking board for repositories. Some repos use GitHub Issues to great effect for bug tracking, feature requests, and clarifications of pull requests. Open-source repos can also use GitHub Issues to advertise areas that need contributors. Projects sometimes use other systems for bug tracking instead of GitHub Issues, such as Jira, Bugzilla, or Redmine.

Pull requests

The pull requests section of a repo lists the pull requests that have been submitted for approval. A pull request is a proposal to merge a set of changes from one branch of code into another; the source branch may live in a forked repo, which is typical for projects with a small number of committers. There is a “new pull request” button at the top right of this page.

If the repo runs automatic checks (typically via GitHub Actions) on pull requests, these are usually shown as comments to the pull request listings in this section. If the repo has a limited group of committers, they can use this section of the repo as a guide to deciding which pull requests they should review for possible merging.

Discussions

GitHub Discussions is a collaborative communication forum for the community around an open source or internal project. It is separate from GitHub Issues, although there is some overlap in usage between the two. GitHub Discussions is more of a support and general discussion board focused on the users, while GitHub Issues is more of a bug- and suggestion-tracking board focused on the software developers. Projects that don’t use GitHub Discussions might instead use Discord or vBulletin.

Actions

GitHub Actions is a CI/CD (continuous integration and continuous delivery) or devops system; it allows you to automate your build, test, and deployment pipeline. It can also be used for other purposes, such as triaging GitHub Issues by assigning labels when new issues are added.

Events in GitHub Actions, such as creating a pull request, opening an issue, or pushing a commit to a repository, trigger workflows, which run one or more jobs and may include other workflows. A job is a set of steps in a workflow that is executed on the same runner. Each step is either a shell script that will be executed, or an action that will be run. Steps are executed in order and are dependent on each other. An action is a custom application for the GitHub Actions platform that performs a complex but frequently repeated task.

A runner is a server that runs your workflows when they’re triggered. Each workflow run executes in a fresh, newly-provisioned virtual machine, which may use Ubuntu Linux, Microsoft Windows, or macOS.
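The relationship between events, workflows, jobs, and steps can be sketched as a toy model. This is not how GitHub Actions is implemented, and every class and function name here is hypothetical; real workflows are defined in YAML files and real jobs run on runners, in parallel by default. The sketch only shows that an event selects the workflows to run, and that steps within a job run in order and stop at the first failure.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True on success


@dataclass
class Job:
    name: str
    steps: list[Step]

    def execute(self) -> list[str]:
        """Run steps in order; stop at the first failure."""
        log = []
        for step in self.steps:
            ok = step.run()
            log.append(f"{self.name}/{step.name}: {'ok' if ok else 'failed'}")
            if not ok:
                break  # later steps depend on earlier ones
        return log


@dataclass
class Workflow:
    name: str
    on: set[str]  # event names that trigger this workflow
    jobs: list[Job] = field(default_factory=list)


def dispatch(event: str, workflows: list[Workflow]) -> list[str]:
    """Run every workflow that is triggered by the given event."""
    log = []
    for workflow in workflows:
        if event in workflow.on:
            for job in workflow.jobs:  # sequential here, for simplicity
                log.extend(job.execute())
    return log


ci = Workflow(
    name="ci",
    on={"pull_request", "push"},
    jobs=[Job("build", [Step("checkout", lambda: True),
                        Step("test", lambda: False),
                        Step("deploy", lambda: True)])],
)

print(dispatch("pull_request", [ci]))  # the failing test step stops the job
print(dispatch("issues", [ci]))        # no workflow listens for this event
```

In this toy run the deploy step never executes, because the test step before it fails.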

Projects that don’t use GitHub Actions might instead use Jenkins, CircleCI, Bamboo, Azure Pipelines, or other CI/CD products. Despite the alternatives, GitHub Actions is quite popular.

Projects

A GitHub project is an adaptable spreadsheet, task board, and road map that integrates with your issues and pull requests on GitHub to help you plan and track your work effectively.

Security

Public repos can publish security policies and advisories in their Security tab. Vulnerability reporters can collaborate privately to fix a vulnerability in a temporary fork. GitHub can assign Common Vulnerabilities and Exposures (CVE) numbers for new CVEs; you can also provide an existing CVE identification number when you report a security vulnerability.

Insights

The Insights tab of a repository can help you understand the state and health of the repository and its community through charts, graphs, and lists. For example, the code frequency chart below shows code additions and deletions per week. Peaks in this chart typically reflect a new release. An extended lack of activity at the end of a code frequency chart can point to a moribund project. This repository (langchain) looks active and healthy.

The code frequency chart for langchain-ai/langchain. Green areas are additions to the code, and red areas are deletions.

GitHub Codespaces

GitHub Codespaces provides a development environment in a Linux-based Docker container, running on a virtual machine hosted by GitHub in the Microsoft Azure cloud. By default, Codespaces use an Ubuntu Linux image with popular languages and tools installed, but you can customize your dev container environment for specific Linux distros, languages, frameworks, scripts, and tools. Codespaces currently support four code editors: Visual Studio Code Desktop, Visual Studio Code Browser, JetBrains IDEs (there are multiple editions of this for different languages, but they all use the same plug-in for GitHub Codespaces), and command line via SSH.

Alternatives to GitHub Codespaces include Gitpod, Coder, and Visual Studio Online, among others. You can see a green button to launch Gitpod in the screenshot of the LangChain code repo above, next to the green Code button that can, among other things, launch a Codespace.

GitHub Packages

GitHub Packages is a platform for hosting and managing packages, including containers and other dependencies. You can integrate GitHub Packages with GitHub APIs, GitHub Actions, and webhooks to create an end-to-end devops workflow that includes your code, continuous integration, and deployment solutions. GitHub Packages currently supports half a dozen package registries, such as npm, Maven, and NuGet.

Git Large File Storage

GitHub has always limited the size of the files you can check into a repository. There’s a warning for files bigger than 50 MiB, and a hard block for files bigger than 100 MiB. There may be warnings for repositories bigger than 1 GB.
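If you want to catch oversized files before you commit them, a small local check is easy to write. The sketch below uses the 50 MiB and 100 MiB thresholds quoted above; the function names are hypothetical, and the directory scan simply skips anything inside a .git folder.

```python
from pathlib import Path

MIB = 1024 * 1024
WARN_LIMIT = 50 * MIB    # GitHub warns about files bigger than this
BLOCK_LIMIT = 100 * MIB  # GitHub blocks files bigger than this


def classify_file_size(size_bytes: int) -> str:
    """Return 'ok', 'warning', or 'blocked' for a file of the given size."""
    if size_bytes > BLOCK_LIMIT:
        return "blocked"
    if size_bytes > WARN_LIMIT:
        return "warning"
    return "ok"


def oversized_files(root: str) -> list[tuple[str, str]]:
    """List (path, status) for files under root that would warn or be blocked."""
    results = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        status = classify_file_size(path.stat().st_size)
        if status != "ok":
            results.append((str(path), status))
    return results
```

Running oversized_files(".") in a working copy gives you a list of candidates for Git Large File Storage, discussed next.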

Git Large File Storage allows you to handle large files by storing references to the file in the repo, and storing the actual file elsewhere. To enable this in your local Git repos, you need to install Git LFS. Even using LFS, there are limits on the maximum file size; these depend on your GitHub plan and range from 2 GB for individual accounts to 5 GB for enterprise accounts.

Why limit the file size if the file is not actually stored in the repo? GitHub still has to store the large file and serve it to you when you clone the repo.
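The reference that Git LFS stores in the repo is a small text “pointer” file that records a hash and a size for the real content. The sketch below reproduces that pointer layout for a blob of bytes, as I understand the Git LFS pointer format; the function name is hypothetical, and in practice the git lfs tool writes these files for you.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # A tiny stand-in for what would normally be a very large file.
    print(lfs_pointer(b"pretend this is a 3 GB video"))
```

However large the original file is, the pointer stays three short lines, which is why the repo itself stays small.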

GitHub releases and tags

GitHub supports releases, which are deployable packaged software iterations. Releases are based on Git tags. GitHub will automatically include links to download a zip file and a tarball containing the contents of the repository at the point of the tag’s creation. You can create release notes manually, or automatically using a template. You can include links to binary files in a release, or attach the binaries themselves as long as each file stays under the 2 GiB file size limit.
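The automatic zip and tarball links follow a predictable pattern based on the tag name. The sketch below builds them under the assumption that GitHub’s archive URLs take the form shown; the function name and the example tag are hypothetical, and you should confirm the links on the project’s releases page before relying on them.

```python
def release_archive_urls(owner: str, repo: str, tag: str) -> dict[str, str]:
    """Build the source archive URLs for a tag (URL pattern is an assumption)."""
    base = f"https://github.com/{owner}/{repo}/archive/refs/tags/{tag}"
    return {"zip": base + ".zip", "tarball": base + ".tar.gz"}


if __name__ == "__main__":
    # "v1.0.0" is a made-up tag used only for illustration.
    for kind, url in release_archive_urls("octocat", "hello-world", "v1.0.0").items():
        print(kind, url)
```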

GitHub Copilot

GitHub Copilot is an AI pair programmer. It’s available by subscription for individuals and businesses. GitHub Copilot extensions are currently available for five editing environments. These are Visual Studio Code, Visual Studio, JetBrains IDEs, Vim/Neovim, and Azure Data Studio.

GitHub Copilot proper can perform code generation and code completion in your IDE. GitHub Copilot Chat lets you ask questions and receive answers to coding-related topics on GitHub.com and in supported IDEs.

You can’t depend on code produced by GitHub Copilot: Sometimes it’s good, and sometimes it doesn’t even compile. You should treat GitHub Copilot like a junior programmer with a drinking problem, and review and test its code carefully.

GitHub Enterprise

GitHub.com is a cloud hosting service that can handle a range of account types: free developer accounts, teams ($4 per user per month), and enterprises ($21 per user per month). Should you wish to run GitHub Enterprise on-premises or in your own cloud instance on AWS, Microsoft Azure, or Google Cloud Platform, you can.

GitHub vs. Bitbucket

GitHub isn’t the only hosted enhanced Git service, and GitHub Enterprise isn’t the only on-premises product for companies. Atlassian Bitbucket competes with both of them, with slightly lower pricing and with a free five-member team level that includes unlimited private repos and the use of Bitbucket Pipelines for continuous integration. GitHub is a more popular site for open source projects and it has a much larger pool of open source developers. Bitbucket’s pricing used to be more favorable for small startups. Now that GitHub allows unlimited private repos on free and team accounts, that’s no longer the case.

GitHub vs. GitLab

GitLab competes with both GitHub and Bitbucket, both hosted and on-premises. On the surface, GitLab appears to have more lifecycle functionality than the others, but the difference from Bitbucket mostly disappears if you include Jira when you evaluate Bitbucket. GitLab offers Gold-plan cloud features to open-source projects for free, but that additional functionality doesn’t really compensate for the larger open-source developer community on GitHub.

GitHub Desktop

GitHub Desktop, shown below, makes it easy to manage your GitHub.com and GitHub Enterprise repositories. While it doesn’t implement all the features of the Git command line and the GitHub web GUI, it does implement all the operations you’ll do on a daily basis from your desktop while contributing to projects. Typically, you will clone repos from GitHub to GitHub Desktop, sync them as needed, create branches for your work, commit your work, and occasionally revert one or more commits.

To work with repos for which you lack commit and collaborate privileges, you typically start by forking the repo on GitHub and cloning the fork to your desktop. Then you add any branches you need in GitHub Desktop, commit any changes you wish, test your work, push the commits back to your remote forked repo, and finally generate a pull request to the parent project.
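For readers who prefer the command line, the same fork-based flow can be approximated with a handful of Git commands. The sketch below only assembles the commands so that you can read or log them before running anything; the function names, the fork URL, and the branch name are hypothetical, and opening the pull request itself still happens on GitHub.

```python
import subprocess


def start_work_commands(fork_url: str, directory: str, branch: str) -> list[list[str]]:
    """Commands to clone your fork and create a working branch."""
    return [
        ["git", "clone", fork_url, directory],
        ["git", "-C", directory, "checkout", "-b", branch],
    ]


def publish_commands(directory: str, branch: str, message: str) -> list[list[str]]:
    """Commands to commit your tested changes and push them to your fork."""
    return [
        ["git", "-C", directory, "add", "--all"],
        ["git", "-C", directory, "commit", "-m", message],
        # Committing is local; pushing to the remote fork is a separate step.
        ["git", "-C", directory, "push", "--set-upstream", "origin", branch],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the plan instead of running it; the URL is a placeholder.
    plan = start_work_commands("https://github.com/you/project.git", "project", "fix-typo")
    plan += publish_commands("project", "fix-typo", "Fix typo in README")
    for command in plan:
        print(" ".join(command))
```

In real use you would call run_all on the first list, edit and test your changes, and then call run_all on the second list.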

You can see the Pull Request button at the upper right of the GitHub Desktop interface. You can also see many commits in the Neo4j project that were merges of branches or pull requests. That’s typical of open-source projects with few committers and many contributors.

GitHub Desktop gives you a handy GUI for adding or cloning repos, navigating branches, pushing changes, and managing pull requests.

GitHub for open-source projects

Open-source software projects often need ways to enforce quality control while still accepting contributions from outside the core team of committers. The need for contributors is huge, but bringing new contributors into the project while maintaining the integrity of the codebase is a difficult and potentially dangerous undertaking. At the same time, the need for feedback from users of the project is also huge.

GitHub has a number of mechanisms that can help grease the wheels of open source projects. For example, users can add issues to the project on GitHub to report bugs or request features. Some other systems call these tickets. Project managers working with issues can generate task lists, assign issues to specific contributors, mention other interested contributors so that they are notified of changes, add labels, and add milestones.

To contribute to a project, you start from a topic branch (the head branch) that contains the committed changes you want added to the project’s base branch, and open a pull request from that head branch. Any further commits you push to the head branch are added to the pull request. Other contributors can review your proposed changes, add review comments, contribute to the pull request discussion, and add their own commits to the pull request.

Once everyone involved is happy with the proposed changes, a committer can merge the pull request. The merge can preserve all the commits, squash all changes into a single commit, or rebase the commits from the head branch into the base branch. If the merge generates conflicts, you can resolve them on GitHub or using the command line.
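The three merge styles are also exposed through GitHub’s REST API, where they appear as the merge_method values merge, squash, and rebase. The sketch below builds, but does not send, such a merge request using the Python standard library. The function name is hypothetical, the token is assumed to belong to someone with permission to merge, and the endpoint and headers reflect my understanding of the API rather than anything stated in this article.

```python
import json
import urllib.request

MERGE_METHODS = {"merge", "squash", "rebase"}


def build_merge_request(token: str, owner: str, repo: str,
                        pull_number: int, merge_method: str = "merge") -> urllib.request.Request:
    """Build (but do not send) a request to merge a pull request."""
    if merge_method not in MERGE_METHODS:
        raise ValueError(f"merge_method must be one of {sorted(MERGE_METHODS)}")
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull_number}/merge"
    return urllib.request.Request(
        url,
        data=json.dumps({"merge_method": merge_method}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="PUT",
    )
```

Passing the returned object to urllib.request.urlopen would perform the merge, so treat that call with the same care as clicking the merge button.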

Code reviews on GitHub allow a distributed team to collaborate asynchronously. Useful GitHub tools for reviewers include diffs (the lower half of the screenshot below), history (the upper half), and blame view (a way to view the evolution of a file commit by commit). Code discussions on GitHub go into comments that are presented in line with your code changes. If the built-in tools don’t suffice for your project, you can add code review and continuous integration tools from GitHub Marketplace. Marketplace add-ons are often free for open source projects.

GitHub provides a number of useful views into your code including a commit history (top) and a diff view (bottom).

GitHub gists

Gists are special GitHub repositories for sharing your work (public) or for saving work for later reuse (secret). They can contain single files, parts of files, or full applications. You can download gists, clone them, fork them, and embed them.

Public gists can be discovered through search. You can use keywords to narrow down what you find, including prefixes to restrict the results to gists from specific users, gists with at least N stars, gists with specific filenames, and so on.

Secret gists are not searchable, but anyone with the URL can see them. If you really want your code to be protected, use a private repository.

As we’ve seen, GitHub provides Git repositories as a service, along with features for code review, project management, integrations with other developer tools, team management, social coding, and documentation. While GitHub is not the only product in its category, it is the dominant repository for open-source software development.
