AI GitHub Code Analyzer
It's a command-line tool that clones a public GitHub repository, loads the code into your chosen generative AI model as context, and helps you answer any questions you may have about the code.
Retrieval-augmented generation
To put it simply: when the context you want to provide is too large to be “copy-pasted” into your prompt, you need RAG.
Or, as NVIDIA says:
“Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.”
Here's how it works:
The system first searches a large dataset (like documents, web pages, or code repositories) to retrieve the information most relevant to your question.
After retrieving the relevant data, the system passes it to a generative model, which produces a response that incorporates the retrieved information.
This way the system can produce more accurate and contextually relevant answers by grounding its responses in real, retrieved data rather than relying solely on pre-existing knowledge.
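To make those two steps concrete, here is a minimal, self-contained Python sketch of the retrieve-then-generate pattern. The keyword-overlap scoring and the prompt-building function are my simplifications for illustration; the real analyzer uses vector embeddings and an actual LLM instead.
# Minimal retrieve-then-generate sketch (illustration only, not the real analyzer).
def retrieve(question, documents, top_k=1):
    # Score each document by word overlap with the question and keep the best ones.
    q_words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context):
    # The augmented prompt; a real system sends this to the generative model.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "kdeploy is a deployment tool written in Python.",
    "Configuration lives in config.yaml.",
]
question = "What language is kdeploy written in?"
context = "\n".join(retrieve(question, docs))
print(build_prompt(question, context))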
Enough talk, let’s roll!
Let’s have a look at an example of how my AI GitHub Code Analyzer, written in Python, does its thing.
Prerequisites
You will need Ollama running on the same machine as the analyzer. More information on how to set it up is in my other article: https://bitnirmata.com/p/how-to-run-your-own-chatgpt
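Before going further, you can verify that Ollama is reachable and that your model is pulled by querying its REST API; the /api/tags endpoint lists the locally available models. A quick check with nothing but the standard library:
# Sanity check: list the models your local Ollama instance knows about.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]

print([m["name"] for m in models])  # should include the model you plan to use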
Installation
Obviously, you should clone the repository and install dependencies:
git clone https://github.com/krmeljalen/ai-github-code-analyzer.git
cd ai-github-code-analyzer
pipenv install
Configuration
All the important configuration lives in the config.yaml file:
############
# Defaults #
############
repo: "krmeljalen/kdeploy"
selected_model: "llama3.1:8b"
ollama_endpoint: "http://localhost:11434"
###############
# Llama-Index #
###############
chat_mode: "compact"
#####################
# Advanced Settings #
#####################
num_thread: 12
system_prompt: "You are a sophisticated virtual assistant designed to assist users in comprehensively understanding and extracting insights from a wide range of documents at their disposal. Your expertise lies in tackling complex inquiries and providing insightful analyses based on the information contained within these documents."
embedding_model: "BAAI/bge-large-en-v1.5"
top_k: 3
Parameters you probably should change to fit your needs (a sketch of how they are wired up follows the list):
repo - the GitHub repository to analyze. The string is expanded into https://github.com/{repo}
selected_model - must match the name of a model you have already pulled into your Ollama instance (e.g. with ollama pull llama3.1:8b)
num_thread - if you are running on CPU without a GPU (like my poor laptop, for example), setting this to the number of cores you have can speed things up
top_k - increase this if you are getting bad answers; retrieving more chunks gives the model more to think about before it starts blabbering nonsense
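For the curious, here is a rough sketch of how these settings could be wired together with LlamaIndex and Ollama. This is my illustration, not the actual main.py: it assumes PyYAML plus the llama-index Ollama and HuggingFace integrations, uses cloned_repo as a placeholder path for the checked-out repository, and assumes chat_mode maps onto LlamaIndex's response mode; num_thread and system_prompt are left out for brevity.
# Hypothetical wiring of config.yaml into LlamaIndex (sketch, not the actual main.py).
import yaml
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(f"Analyzing https://github.com/{cfg['repo']}")  # how the repo value is expanded

# Point LlamaIndex at the local Ollama server and the configured embedding model.
Settings.llm = Ollama(model=cfg["selected_model"], base_url=cfg["ollama_endpoint"])
Settings.embed_model = HuggingFaceEmbedding(model_name=cfg["embedding_model"])

# Index the checked-out source tree; "cloned_repo" is a placeholder path.
documents = SimpleDirectoryReader("cloned_repo", recursive=True).load_data()
index = VectorStoreIndex.from_documents(documents)
engine = index.as_query_engine(
    response_mode=cfg["chat_mode"],   # assuming "compact" is a response mode here
    similarity_top_k=cfg["top_k"],    # how many chunks to retrieve per question
)
print(engine.query("What does this repository do?"))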
Running
Simple enough, just run these two commands in the folder where you ran pipenv install:
pipenv shell
python3 main.py
It should look like this:
You are free to ask it questions now, and when you are done, just type: EOF
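If you are curious how the EOF sentinel works, the question loop boils down to something like this simplified sketch (not the literal main.py), where engine is the query engine from the earlier sketch:
# Simplified question loop; typing EOF on its own line exits.
while True:
    question = input("Question: ").strip()
    if question == "EOF":
        break
    print(engine.query(question))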
End of the line
Thanks for your attention. Now go parse some public repos and ask the AI what security vulnerabilities it can find in the code.