Supercharge your research code with Git and GitHub

Tue 23rd May 2023

“Piled Higher and Deeper” by Jorge Cham, PHD Comics

Examples of version control (non-code)

  • 📚 Book editions
  • Wikipedia pages
  • Google docs

What makes a version control system?


  • 📸 Snapshot current version
  • 🏷️ Name specific versions
  • ↩️ Revert to a particular version


Perhaps

  • 📚 Compare and merge versions

Benefits of version control/source code management

Local πŸ’»

e.g. Git

  • Protect against breaking everything
  • Keep at least one working version of the code
  • Snapshot your progress

Remote 🌐

e.g. GitHub, GitLab

  • Work collaboratively
  • Share code easily
  • Remote backup

Without version control


😕 Make changes by making a copy of the entire codebase.


😐 Merging is a manual process.


😨 Lose track of which version contains what functionality.


😭 Collaborating is just emailing zip files and crying.


Version Control == Git

More often than not

How does Git work?

%%{init: { 'logLevel': 'debug', 'theme': 'base', 'gitGraph': {'showCommitLabel': true, 'rotateCommitLabel': true}} }%%
gitGraph
    commit id: "commit 1"
    commit id: "commit 2"
    commit id: "commit 3"
    commit id: "commit 4"
    commit id: "commit 5"
    commit id: "commit 6"
    commit id: "commit 7"

The most important concept in Git is the commit: the name given both to a unit of changes and to the act of recording one.

Repositories (Repos)


Once a directory/folder is initialised with git it becomes a repository.



directory

.
├── src/
├── LICENSE.md
└── README.md

git init →

repository

.
├── .git/
├── src/
├── LICENSE.md
└── README.md
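A minimal sketch of the directory-to-repository step above, assuming Git is installed and you are in a POSIX shell. It builds a throwaway copy of the example folder in a temporary directory; the file names mirror the slide and the files are empty placeholders.

```shell
set -eu

# Build a throwaway copy of the example folder in a temporary directory
project=$(mktemp -d)
cd "$project"
mkdir src
touch LICENSE.md README.md

ls -A        # a plain directory: LICENSE.md, README.md, src
git init     # turn it into a repository by creating the hidden .git/ folder
ls -A        # .git is now listed as well
```

All of the history lives inside .git/, so deleting that folder turns the repository back into an ordinary directory and throws the history away.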

It’s generally a bad idea to keep local repos in a synced location such as Dropbox or Google Drive: the sync client can corrupt the .git/ folder or create conflicting copies of the files inside it.

Use GitHub or GitLab to make remote copies/backups.
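One way to make that remote copy from the command line, sketched under some assumptions: a local "bare" repository stands in for the GitHub/GitLab server so the example runs offline, the user name and email are hypothetical placeholders, and Git is 2.28 or later (for init -b). With a real host you would pass the repository URL it shows you to git remote add instead of "$remote".

```shell
set -eu

# Stand-in "server": an empty bare repository (history only, no working files)
remote="$(mktemp -d)/backup.git"
git init -q --bare "$remote"

# A local repository with one commit
work=$(mktemp -d)
cd "$work"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

# Register the remote under the conventional name "origin" and upload main
git remote add origin "$remote"
git push -q -u origin main               # -u also makes main track origin/main
git ls-remote origin                     # lists the refs the remote now holds
```

After the first push with -u, later backups are just git push from the same branch.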

Demo

https://onlywei.github.io/explain-git-with-d3

or

https://bit.ly/git-sandbox

or

Firstly, a linear commit history



%%{init: { 'logLevel': 'debug', 'theme': 'base', 'gitGraph': {'showCommitLabel': true, 'rotateCommitLabel': true}} }%%
gitGraph
    commit id: "commit 1"
    commit id: "commit 2"
    commit id: "commit 3"
    commit id: "commit 4"
    commit id: "commit 5"
    commit id: "commit 6"
    commit id: "commit 7"

Branches

Used to work on new features/changes/additions to the code.

%%{init: { 'logLevel': 'debug', 'theme': 'base'} }%%
gitGraph
    commit id:"8bc2520"
    commit id:"2a70480"
    branch experiment
    commit id:"089e06b"
    commit id:"bec84f4"
    commit id:"2420edd"

git branch experiment
git checkout experiment
git commit
git commit
git commit

Checkout: switching to a different branch.

Alternatively, use git switch (Git 2.23 and later), which only switches branches.

(Be aware that the checkout command has many jobs, not just switching branches.)

The default branch is now usually called main but is sometimes still master.
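A runnable sketch of the branch commands above plus the one-step alternatives, in a throwaway repository. It assumes Git 2.28 or later (for init -b; git switch itself arrived in 2.23); the file name, the branch name experiment-2 and the identity are hypothetical.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"
echo "v1" > results.txt
git add results.txt
git commit -q -m "Record first results"

# Two-step form, as on the slide
git branch experiment        # create the branch; HEAD stays on main
git checkout -q experiment   # move HEAD onto the new branch

# One-step forms
git switch -q main               # switch back (git switch only switches)
git switch -q -c experiment-2    # -c: create the branch, then switch to it
                                 # (older equivalent: git checkout -b experiment-2)

git branch                   # lists all branches; * marks the current one
```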

Merging


Combine changes from two branches.


%%{init: { 'logLevel': 'debug', 'theme': 'base'} }%%
gitGraph
    commit id:"8bc2520"
    commit id:"2a70480"
    branch experiment
    commit id:"089e06b"
    commit id:"bec84f4"
    commit id:"2420edd"
    checkout main
    merge experiment
    commit id:"60489ec"

git checkout main
git merge experiment
git commit
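A sketch of the merge in a throwaway repository (file names and identity are hypothetical; assumes Git 2.28+). Because main has no new commits of its own here, a plain git merge would just fast-forward main to experiment without creating a merge commit, so --no-ff is used to force the merge commit drawn in the diagram. Note that git merge records the merge commit itself; a separate git commit is only needed for a further change (like 60489ec) or to conclude a merge after resolving conflicts by hand.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"
echo "baseline" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Work on a branch
git switch -q -c experiment
echo "new idea" > idea.txt
git add idea.txt
git commit -q -m "Try new idea"

# Bring the branch's changes back into main
git checkout -q main
git merge -q --no-ff -m "Merge branch 'experiment'" experiment
git log --oneline --graph    # the two lines of history join at the merge commit
```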

Making a commit

    flowchart TB
        A(Edit file) --> B
        B(Save) --> C
        C(Stage changes) --> D(Commit)
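The four steps in the flowchart map onto shell commands roughly as follows. This is a sketch in a throwaway repository: the R script and identity are hypothetical, and "edit and save" is done with echo rather than an editor.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"

# 1-2. Edit and save a file
echo 'x <- 1' > analysis.R
git status --short     # "??": Git sees the file but is not tracking it yet

# 3. Stage the change (choose what goes into the next commit)
git add analysis.R
git status --short     # "A ": new file staged

# 4. Commit the staged changes
git commit -q -m "Add analysis script"
git log --oneline
```

The staging step is what lets you commit only some of your edits, e.g. one finished fix while other files are still in progress.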


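Commits are usually viewed as changes

Git displays each commit as the changes (diff) from its parent commit.

Under the hood, each commit records a snapshot of the whole project, reusing stored copies of files that did not change, so any earlier state can be restored.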
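Git can show a commit both ways, and this sketch makes two commits to compare them (file names and identity are hypothetical). git cat-file -p prints the raw commit object, which names a tree and a parent; git ls-tree lists every file in that tree; git diff lists only what changed from the parent.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"

echo "a, revised" > a.txt
git commit -q -a -m "Revise a"   # -a stages changes to already-tracked files

git cat-file -p HEAD             # raw commit: tree, parent, author, message
git ls-tree HEAD                 # the full tree: both a.txt and b.txt
git diff --stat HEAD~1 HEAD      # the changes view: only a.txt
```

The unchanged b.txt has the same blob hash in both commits, so Git stores its contents only once.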

The commit hash


Git generates a hash string (by default 40 hexadecimal characters from SHA-1) that uniquely identifies each commit.

Git uses a “Merkle tree” under the hood. (Don’t ask me how it works, I have no idea 🤷)
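A small peek at the idea, as a sketch: for a file, Git's ID is the SHA-1 of a short header ("blob", the size in bytes, a NUL byte) followed by the file's contents. Trees and commits are hashed the same way, and their contents include the hashes of the objects they point to (files, sub-folders, parent commits); that chaining is what makes it a Merkle tree, since changing one byte in one file changes every hash above it. Assumes the default SHA-1 object format and GNU coreutils' sha1sum (on macOS, shasum -a 1); hello.txt is a hypothetical file.

```shell
set -eu
cd "$(mktemp -d)"
printf 'hello world\n' > hello.txt      # hypothetical example file

# 1. Ask Git for the file's object ID (works outside a repository)
git_hash=$(git hash-object hello.txt)

# 2. Recompute it by hand: SHA-1 over "blob <size>\0" + contents
size=$(wc -c < hello.txt | tr -d ' ')
manual_hash=$({ printf 'blob %s\0' "$size"; cat hello.txt; } | sha1sum | cut -d' ' -f1)

echo "git:    $git_hash"
echo "manual: $manual_hash"
```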


Hashes look like:

d3dd03f493707256c8528bc83ad280a460f05a56


But are most often seen as the first 7 characters, as this is easier to read/type and is normally enough to identify the commit.

d3dd03f
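A sketch of where you meet full and short hashes on the command line (file name and identity are hypothetical). Git lengthens the short form beyond 7 characters automatically when a repository grows large enough that 7 might be ambiguous.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"
echo "1" > counts.txt
git add counts.txt
git commit -q -m "Add counts"

full=$(git rev-parse HEAD)             # full 40-character hash
short=$(git rev-parse --short HEAD)    # abbreviated form (at least 7 characters)
echo "$full"
echo "$short"
git log --oneline                      # short hash + summary line per commit
git show --stat "$short"               # a unique prefix works wherever a hash does
```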

The commit message

Each commit has a message associated with it.


Summary/Title: ideally under 50 characters, 72 at most

Displayed most frequently.


Detailed description: no character limit.

Can be used to capture more detail. Not used that often.


This commit will…

  • ❌ some stuff
  • ❌ code
  • ❌ updates
  • ✔️ add new module “renderers”
  • ✔️ add new install instructions
  • ✔️ fix bug #17 with package update
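From the command line, each -m flag adds one paragraph to the message: the first becomes the summary, the rest form the detailed description. Running git commit without -m opens your editor instead, where the first line is the summary and a blank line separates it from the description. A sketch with hypothetical file, message text and identity:

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"
echo "# renderers" > renderers.py
git add renderers.py

# First -m: summary line. Each further -m becomes a paragraph of the description.
git commit -q -m "Add new module renderers" \
  -m "Hypothetical description: why the module exists and how to use it."

git log -1 --format=%s    # summary only
git log -1 --format=%b    # detailed description only
```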

How to interact with Git


command line git

via a Unix shell (or Git Bash/WSL on Windows)

$ git add README.md
$ git commit -m 'initial commit'
$ git status


File types

Git is great at handling text files but NOT binary files.

What are text files?

Just plain text, e.g. files containing code such as:

  • Plain text code files - e.g. .R, .py, .m, .cpp
  • Plain text data files - e.g. .csv


NOT documents such as

  • .docx - Microsoft Word documents
  • .pdf - PDF documents

These aren’t text files, even though they primarily contain text.

What are binary files?

Files that need a specific program to read, including:

  • .docx - Microsoft Word documents
  • .xlsx - spreadsheets
  • .pdf - PDF documents
  • .fig - MATLAB figures
  • .jpg, .gif - Images


Note: Git can store binary files, but it can’t see what changed inside them, only that the file has changed. This means Git cannot automatically merge two sets of changes to the same binary file; you have to pick one version.
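The difference is easy to see with git diff. In this sketch (file names and identity are hypothetical) a few bytes written with printf, including NUL bytes, stand in for a real binary file such as an image; Git treats files containing NUL bytes as binary.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Ada Example"       # hypothetical identity
git config user.email "ada@example.com"

printf 'a,b\n1,2\n' > data.csv            # plain text data
printf 'IMG\000\001\002' > plot.bin       # stand-in binary file
git add data.csv plot.bin
git commit -q -m "Add data and plot"

# Change one value in each file
printf 'a,b\n1,3\n' > data.csv
printf 'IMG\000\001\003' > plot.bin

git diff -- data.csv    # shows exactly which line changed
git diff -- plot.bin    # only reports that the binary files differ
```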

Git learning resources


Remember

Learning Git is a process.

Everyone makes mistakes.

Git vs GitHub or GitLab


Git

  • Local client for source code management
  • Interacting with remote git servers

GitHub/GitLab

  • Code hosting
  • Collaboration
  • OSS contribution
  • Project management
  • Automated workflows/continuous delivery

Repositories

Issues

Projects/Kanban Board

Continuous integration/Automated testing