Conversational AI That Understands Your Codebase

Imagine having software that understands your code and can answer your questions, offer insights, and even help you debug issues, all through natural language queries. In this article, we'll walk through the process of creating a conversational AI that lets you talk to your code using Chainlit, Qdrant, and OpenAI.

Advantages of Conversational AI for Codebases

  • Streamlined code review: Quickly review specific code modules and understand their context without spending time digging through the files.
  • Efficient debugging: Ask questions about potential issues in the code and get targeted responses, which helps reduce the time spent on troubleshooting.
  • Enhanced learning: New team members can learn how different parts of the code work without having to pair with existing experts on the code.
  • Improved documentation: AI-generated summaries help produce explanations for complex code, making it easier to enhance documentation.

Now let us look at how to make that happen.

Preparing the Codebase for Interaction

The first step is to make sure the codebase is ready for interaction. We do this by vectorizing the code and storing it in a vector database, from which it can be efficiently retrieved.
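Both scripts in this article read their settings from a config.yaml file. The keys below match what the code expects; the values are placeholders, not real endpoints or credentials:

qdrant:
  url: "https://your-qdrant-instance.example.com"  # placeholder Qdrant Cloud URL
  api_key: "your-qdrant-api-key"                   # placeholder
openai:
  api_key: "your-openai-api-key"                   # placeholder
folder:
  path: "./code_files"                             # folder containing the .py files to ingest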

import openai
import yaml
import os
import uuid
from qdrant_client import QdrantClient, models

# Load configuration from config.yaml
with open("config.yaml", "r") as file:
    config = yaml.safe_load(file)

# Extract API keys and URLs from the config
qdrant_cloud_url = config["qdrant"]["url"]
qdrant_api_key = config["qdrant"]["api_key"]
openai_api_key = config["openai"]["api_key"]
code_folder_path = config["folder"]["path"]

# Initialize OpenAI API
openai.api_key = openai_api_key

# Initialize Qdrant client
client = QdrantClient(
    url=qdrant_cloud_url,
    api_key=qdrant_api_key,
)

def chunk_code(code, chunk_size=512):
    """
    Splits the code into chunks, each of a specified size (in lines).
    This helps in generating embeddings for manageable pieces of code.
    """
    lines = code.split('\n')
    for i in range(0, len(lines), chunk_size):
        yield '\n'.join(lines[i:i + chunk_size])

def vectorize_and_store_code(code, filename):
    try:
        # Chunk the code for better embedding representation
        code_chunks = list(chunk_code(code))

        # Generate embeddings for each chunk using the OpenAI API
        embeddings = []
        for chunk in code_chunks:
            response = openai.embeddings.create(
                input=[chunk],  # Input must be a list of strings
                model="text-embedding-ada-002"
            )

            # Access the embedding data correctly
            embedding = response.data[0].embedding
            embeddings.append(embedding)

        if not embeddings:
            return f"{filename}: No content to vectorize."

        # All embeddings from the same model share one dimension;
        # use it to size the collection's vectors
        vector_size = len(embeddings[0])

        # Ensure the collection exists
        try:
            client.create_collection(
                collection_name="talk_to_your_code",
                vectors_config=models.VectorParams(size=vector_size, distance=models.Distance.COSINE)
            )
        except Exception as e:
            print("Collection already exists or other error:", e)

        # Insert each chunk into the collection with associated metadata
        for i, embedding in enumerate(embeddings):
            point_id = str(uuid.uuid4())
            points = [
                models.PointStruct(
                    id=point_id,
                    vector=embedding,
                    payload={
                        "filename": filename,
                        "chunk_index": i,
                        "total_chunks": len(embeddings),
                        "code_snippet": code_chunks[i]
                    }
                )
            ]
            client.upsert(collection_name="talk_to_your_code", points=points)

        return f"{filename}: Code vectorized and stored successfully."

    except Exception as e:
        return f"An error occurred with {filename}: {str(e)}"

def process_files_in_folder(folder_path):
    for filename in os.listdir(folder_path):
        if filename.endswith(".py"):
            file_path = os.path.join(folder_path, filename)
            with open(file_path, 'r', encoding='utf-8') as file:
                code = file.read()
                print(vectorize_and_store_code(code, filename))

if __name__ == "__main__":
    process_files_in_folder(code_folder_path)

Let us look at the noteworthy aspects of the above code.

  • Load your code files and chunk them into manageable pieces.
  • Chunking is a crucial aspect. The chunk size should not be too small, where the function or module you want to learn about is split across multiple chunks, nor too big, where multiple functions or modules are squeezed into a single chunk; both scenarios reduce retrieval quality. One way to soften chunk boundaries is sketched after this list.
  • Used OpenAI's text-embedding-ada-002 model to generate embeddings for each chunk.
  • Processed and stored the embeddings in Qdrant for enhanced retrieval.
  • Adding metadata to the code chunks helps retrieve specific parts and makes the code conversation powerful.
  • For simplicity, I used a folder path where I placed a couple of code files that were used for building this conversational module. This can be enhanced to point to a GitHub URL.
  • Two Python files, namely ragwithknowledgegraph.py and ragwithoutknowledgegraph.py, were used to generate embeddings and store them in a vector DB, over which questions can be asked via the chat interface.
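One common way to reduce the risk of a function being cut in half at a chunk boundary is to let consecutive chunks overlap by a few lines, so the function still appears whole in at least one chunk. The following is a hypothetical variant of the chunk_code function above, not part of the original script:

def chunk_code_with_overlap(code, chunk_size=512, overlap=64):
    """Line-based chunking with overlap: each chunk repeats the last
    `overlap` lines of its predecessor, so a function straddling a
    boundary is more likely to appear intact in some chunk."""
    lines = code.split('\n')
    step = chunk_size - overlap  # advance by less than a full chunk
    for i in range(0, len(lines), step):
        yield '\n'.join(lines[i:i + chunk_size])

The overlap value is a tuning knob: larger overlaps improve recall for boundary-straddling code at the cost of more embeddings to generate and store.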

Building the Conversational Interface

We will now set up a Chainlit interface that takes user input, queries Qdrant, and returns contextually relevant information about your code.

import chainlit as cl
import qdrant_client
import openai
import yaml
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate

# Load configuration from config.yaml
with open("config.yaml", "r") as file:
    config = yaml.safe_load(file)

# Extract API keys and URLs from the config
qdrant_cloud_url = config["qdrant"]["url"]
qdrant_api_key = config["qdrant"]["api_key"]
openai_api_key = config["openai"]["api_key"]

# Initialize OpenAI API
openai.api_key = openai_api_key

# Initialize OpenAI Embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", openai_api_key=openai_api_key)

# Initialize Qdrant client
client = qdrant_client.QdrantClient(
    url=qdrant_cloud_url,
    api_key=qdrant_api_key,
)

# Initialize OpenAI Chat model
chat_model = ChatOpenAI(openai_api_key=openai_api_key, model="gpt-4")

# Define a simple QA prompt template
qa_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="Given the following context:\n{context}\nAnswer the following question:\n{question}"
)

# Chainlit function to handle user input
@cl.on_message
async def handle_message(message: cl.Message):
    try:
        # Extract the actual text content from the message object
        user_input = message.content

        # Generate the query vector using OpenAI Embeddings
        query_vector = embeddings.embed_query(user_input)

        # Manually send the query to Qdrant
        response = client.search(
            collection_name="talk_to_your_code",
            query_vector=query_vector,
            limit=5
        )

        # Process and retrieve the relevant context (code snippets) from the Qdrant response
        context_list = []
        for point in response:
            code_snippet = point.payload.get('code_snippet', '')
            filename = point.payload.get('filename', 'Unknown')
            context_list.append(f"Filename: {filename}\nCode Snippet:\n{code_snippet}\n")

        context = "\n".join(context_list)
        if not context:
            context = "No matching documents found."

        # Generate a response using the LLM with the retrieved context
        prompt = qa_prompt_template.format(context=context, question=user_input)
        response_text = chat_model.predict(prompt)

        # Send the LLM's response
        await cl.Message(content=response_text).send()

    except Exception as e:
        # Log the error
        print(f"Error during message handling: {e}")
        await cl.Message(content=f"An error occurred: {str(e)}").send()

# Launch the app with the Chainlit CLI: chainlit run app.py

Important aspects of the above code:

  • Initialize Chainlit and configure it to interact with OpenAI and Qdrant.
  • Generate query vectors for the input to help retrieve the relevant code snippets from Qdrant.
  • Define a prompt template that combines the context retrieved from Qdrant with the user's question.
  • Make the context and question available to OpenAI's language model and return the generated answer to the user.
  • Please note that I have simplified some of the implementation for better understanding.

Output From the Chat Interface

Let us look at the output generated by the chat interface when we asked it to summarize one of the code files. As mentioned earlier, we loaded the two Python files into the vector DB, and I asked it to summarize one of the scripts.

One of the scripts implements a simple RAG use case using a knowledge graph, and the other does not. The LLM did a good job of summarizing the script in natural language.

Next Steps

  • Improve retrieval by incorporating additional metadata to identify various aspects of the code (a filtered search along these lines is sketched after this list).
  • Integrate the chat interface to take in a GitHub URL and ingest the codebase, which can then be used to ask questions.
  • Test the application by asking both specific and broad questions to see how well it understands the context.
  • Engineer prompts and test retrieval using various different prompts.
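Since each point already carries filename, chunk_index, and total_chunks in its payload, Qdrant's payload filters can scope retrieval to a single file. A minimal sketch, assuming the client and query_vector from the chat handler above (the filename value is just an example):

from qdrant_client import models

# Restrict retrieval to chunks from one file by filtering on payload metadata
hits = client.search(
    collection_name="talk_to_your_code",
    query_vector=query_vector,  # computed with embeddings.embed_query(...)
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="filename",
                match=models.MatchValue(value="ragwithoutknowledgegraph.py"),
            )
        ]
    ),
    limit=5,
)

The same pattern extends to any metadata you add at ingestion time, such as module names or file types.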

Conclusion

Creating a conversational AI that understands your codebase unlocks a new level of efficiency and insight in your development process. Whether you're streamlining code reviews, accelerating debugging, or enhancing team collaboration, this approach offers immense value. With this simple setup, you can transform the way you interact with your code.

Share your experiences and improvements with the community to help shape the future of code interaction.
