Content is user-generated and unverified.

Resume Analyzer & Job Profiler - Complete Code Explanation

1. Import Statements and Dependencies

python
import os
import gradio as gr
from dotenv import load_dotenv
from PyPDF2 import PdfReader
import tempfile
import shutil
  • os: Operating system interface, used here for environment variables and file paths
  • gradio (gr): Web UI framework for creating interactive ML applications
  • dotenv: Loads environment variables from .env files
  • PyPDF2: Library for reading and extracting text from PDF files
  • tempfile & shutil: File-system utilities for temporary file handling (imported here but not actually used in the code shown)
python
from langchain.text_splitter import CharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.vectorstores import FAISS  # community package; the old "langchain.vectorstores" path is deprecated
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.memory import ConversationBufferMemory
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.schema.output_parser import StrOutputParser

LangChain Components:

  • CharacterTextSplitter: Splits large text into smaller chunks
  • HuggingFaceEmbeddings: Converts text to vector embeddings using HuggingFace models
  • ChatPromptTemplate & MessagesPlaceholder: Templates for structuring AI conversations
  • FAISS: Facebook's vector similarity search library
  • ChatGoogleGenerativeAI: Interface to Google's Gemini AI model
  • ConversationBufferMemory: Stores conversation history
  • RunnablePassthrough & RunnableLambda: Chain components for data flow
  • StrOutputParser: Parses AI output to string format
python
from google.colab import userdata
  • userdata: Google Colab's secure way to access API keys and secrets
python
load_dotenv()
  • Loads environment variables from .env file (if present)

2. ResumeAnalyzer Class Definition

python
class ResumeAnalyzer:
    def __init__(self):
        self.vectorstore = None
        self.conversation_chain = None
        self.memory = None
        self.processed_files = []

Class Initialization:

  • vectorstore: Will store the FAISS vector database
  • conversation_chain: Will store the AI conversation pipeline
  • memory: Will store conversation history
  • processed_files: List of successfully processed PDF files

3. PDF Text Extraction Method

python
def extract_pdf_text(self, pdf_files):
    """Extract text from uploaded PDF files"""
    if not pdf_files:
        return ""

    text = ""
    self.processed_files = []

    for pdf_file in pdf_files:
        try:
            # Handle file path (Gradio returns file paths as strings)
            pdf_path = pdf_file if isinstance(pdf_file, str) else pdf_file.name
            self.processed_files.append(os.path.basename(pdf_path))

            pdf_reader = PdfReader(pdf_path)
            for page in pdf_reader.pages:
                page_text = page.extract_text()
                if page_text:
                    text += page_text + "\n"
        except Exception as e:
            # Reference pdf_file here, since pdf_path may be unset if the
            # path lookup itself failed
            print(f"Error processing {pdf_file}: {str(e)}")
            continue

    return text

Function Breakdown:

  • Checks if any files were uploaded
  • Initializes empty text string and clears processed files list
  • Loops through each uploaded PDF file
  • Handles both string paths and file objects from Gradio
  • Extracts filename for tracking processed files
  • Uses PyPDF2 to read each page and extract text
  • Concatenates all text with newlines
  • Per-file error handling: a failure on one PDF is logged and the loop continues with the remaining files

4. Text Chunking Method

python
def create_text_chunks(self, text):
    """Split text into chunks for processing"""
    if not text.strip():
        return []

    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )

    chunks = text_splitter.split_text(text)
    return chunks

Text Chunking Parameters:

  • separator="\n": Splits on newline boundaries so related lines stay together
  • chunk_size=1000: Target maximum of ~1000 characters per chunk (a single piece with no separator can still exceed this)
  • chunk_overlap=200: Consecutive chunks share up to 200 characters for context continuity
  • length_function=len: Measures chunk length in characters
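
The size/overlap interplay can be sketched with a plain sliding window (a simplification: the real CharacterTextSplitter first splits on the separator and then merges pieces back up to chunk_size):

```python
# Simplified sketch of fixed-size chunking with overlap; illustrates the
# chunk_size / chunk_overlap idea, not the exact splitter algorithm.
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    step = chunk_size - chunk_overlap  # advance 800 chars per chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))       # → 4 chunks for 2500 characters
print(len(chunks[0]))    # → 1000 (each chunk is at most chunk_size long)
```

The tail of one chunk repeats as the head of the next, which is what keeps a sentence that straddles a boundary intact in at least one chunk.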

5. Vector Store Creation

python
def create_vectorstore(self, text_chunks):
    """Create FAISS vector store from text chunks"""
    if not text_chunks:
        return None

    try:
        embeddings = HuggingFaceEmbeddings(
            model_name="hkunlp/instructor-xl",
            model_kwargs={"device": "cpu"}
            # model_kwargs={"device": "cuda"}  # For GPU acceleration
        )

        vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
        return vectorstore
    except Exception as e:
        print(f"Error creating vectorstore: {str(e)}")
        return None

Vector Store Setup:

  • Uses the hkunlp/instructor-xl model from the HuggingFace Hub to create embeddings (a large model that is slow on CPU; a lighter model such as sentence-transformers/all-MiniLM-L6-v2 is a common faster alternative)
  • Configured for CPU; pass model_kwargs={"device": "cuda"} for GPU acceleration
  • Creates FAISS vector database from text chunks
  • Each text chunk gets converted to a high-dimensional vector
  • FAISS enables fast similarity search across vectors
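
At small scale, the retrieval FAISS performs is just nearest-vector search. A toy illustration with hand-made 3-d vectors standing in for real instructor-xl embeddings:

```python
import math

# Each resume chunk maps to a vector; a query is matched by distance.
# Vectors here are invented for illustration, not real embeddings.
def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

index = {
    "Python developer, 5 years": [0.9, 0.1, 0.0],
    "Graphic designer, Figma":   [0.0, 0.2, 0.9],
}

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "senior Python engineer"
best = min(index, key=lambda text: l2_distance(index[text], query_vec))
print(best)  # → "Python developer, 5 years"
```

FAISS does the same comparison with optimized index structures, so it stays fast when there are thousands of chunks instead of two.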

6. Conversation Chain Setup

python
def setup_conversation_chain(self, vectorstore):
    """Setup the conversation chain with Gemini LLM"""
    if not vectorstore:
        return None

    try:
        # Initialize Gemini
        llm = ChatGoogleGenerativeAI(
            model="gemini-2.5-flash",
            temperature=0.7,
        )

        # Memory
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )

LLM Configuration:

  • Uses Google's Gemini 2.5 Flash model
  • temperature=0.7: Controls randomness; lower values make output more deterministic, higher values more varied
  • Sets up conversation memory to maintain chat history

6.1 System Prompt Template

python
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an AI assistant for a resume analyzer system.
You MUST ONLY answer questions related to resume analysis, job profiling, and candidate evaluation based on the uploaded resumes.

STRICT RULES:
1. ONLY respond to queries about:
   - Finding candidates for specific job roles
   - Analyzing skills and qualifications from resumes
   - Comparing candidates for positions
   - Extracting contact information from resumes
   - Summarizing candidate profiles
   - Job-related questions about the uploaded resumes

2. If asked about ANYTHING else (history, general knowledge, unrelated topics, etc.), respond with:
   "I can only help with resume analysis and job profiling based on the uploaded resumes. Please ask questions about finding candidates, analyzing skills, or job-related queries."

3. IMPORTANT: If the context shows "No relevant resume information found", it means no candidates in the database match the query. In this case, respond with:
   "❌ No candidates found matching your criteria. This could mean:
   • No resumes in the database match the specified skills/role
   • The job title or skills mentioned aren't present in the uploaded resumes
   • Try broadening your search criteria or using different keywords

   Consider rephrasing your query or checking if the relevant resumes were properly uploaded."

4. For valid resume-related queries with relevant context, provide:
   - Full name
   - Email address (if available)
   - LinkedIn profile link (if available)
   - Phone number (if available)
   - A concise summary of their qualifications and experience
   - Key skills that match the job requirements
   - Years of experience (if mentioned)

5. Present information in a clear, organized format. If contact information is not available, mention "Not provided" for those fields.

6. Never make up or hallucinate information about candidates. Only use information explicitly provided in the context.

Context from uploaded resumes: {context}"""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{question}")
])

Prompt Engineering:

  • Defines the AI's role as a resume analyzer
  • Sets strict boundaries on what topics to discuss
  • Provides specific response formats for different scenarios
  • Includes placeholders for context, chat history, and user questions
  • Discourages hallucination by instructing the model to rely only on information in the provided context (a prompt-level safeguard, not a hard guarantee)
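
How the template's slots get filled at run time can be sketched with plain string formatting (ChatPromptTemplate does this internally; the context and question values below are made up):

```python
# Sketch of prompt assembly: the system text gets the retrieved context,
# MessagesPlaceholder splices in stored history, and the user turn is last.
system_text = "You are an AI assistant...\nContext from uploaded resumes: {context}"
messages = [
    ("system", system_text.format(context="Jane Doe, Python, 4 yrs")),
    # chat_history messages would be spliced in here by MessagesPlaceholder
    ("human", "Who fits a Python role?"),
]
print(messages[0][1])  # system message now carries the retrieved context
```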

6.2 Helper Functions

python
def format_docs(docs):
    if not docs:
        return "No relevant resume information found."

    # Check if documents have meaningful content
    meaningful_docs = []
    for doc in docs:
        if doc.page_content and len(doc.page_content.strip()) > 10:
            meaningful_docs.append(doc)

    if not meaningful_docs:
        return "No relevant resume information found."

    return "\n\n".join(doc.page_content for doc in meaningful_docs)

Document Formatting:

  • Filters out empty or very short documents
  • Joins relevant documents with double newlines
  • Returns a message if no relevant information is found
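
The filtering logic can be exercised on its own with a stand-in for LangChain's Document class:

```python
from dataclasses import dataclass

# Minimal stand-in for langchain's Document, just enough to run format_docs.
@dataclass
class Doc:
    page_content: str

def format_docs(docs):
    meaningful = [d for d in docs if d.page_content and len(d.page_content.strip()) > 10]
    if not meaningful:
        return "No relevant resume information found."
    return "\n\n".join(d.page_content for d in meaningful)

print(format_docs([Doc("   "), Doc("short")]))       # both filtered → fallback message
print(format_docs([Doc("Jane Doe, Python developer")]))  # → the content itself
```

The fallback string matters because the system prompt's rule 3 keys off exactly the text "No relevant resume information found".
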
python
def get_chat_history(inputs):
    return self.memory.chat_memory.messages if self.memory else []

Chat History Retrieval:

  • Returns conversation history from memory
  • Provides empty list if no memory exists

6.3 Enhanced Retriever Function

python
def enhanced_retriever(query):
    """Enhanced retriever with similarity threshold checking"""
    try:
        # Perform similarity search with scores
        docs_with_scores = vectorstore.similarity_search_with_score(query, k=5)

        # Filter documents based on similarity threshold
        # Lower scores indicate higher similarity in FAISS
        similarity_threshold = 1.5  # Adjust based on your needs

        relevant_docs = []
        for doc, score in docs_with_scores:
            if score < similarity_threshold:  # Lower score = more similar
                relevant_docs.append(doc)

        # If no documents meet the threshold, return empty list
        if not relevant_docs:
            return []

        return relevant_docs

    except Exception as e:
        print(f"Retrieval error: {e}")
        return []

Enhanced Retrieval Logic:

  • Searches for top 5 most similar documents
  • Uses similarity threshold to filter relevant results
  • Lower FAISS scores indicate higher similarity
  • Returns empty list if no documents meet the threshold
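
The threshold filter in isolation, with made-up (doc, score) pairs shaped like similarity_search_with_score output:

```python
# FAISS returns (doc, score) pairs where a *lower* L2 score means a
# *closer* match; anything at or above the threshold is discarded.
docs_with_scores = [
    ("chunk about Python",  0.8),
    ("chunk about cooking", 2.3),
    ("chunk about Django",  1.4),
]

similarity_threshold = 1.5
relevant = [doc for doc, score in docs_with_scores if score < similarity_threshold]
print(relevant)  # → ['chunk about Python', 'chunk about Django']
```

The 1.5 cutoff is a tunable heuristic, as the code's own comment notes; raw L2 scores depend on the embedding model, so the right value has to be found empirically.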

6.4 RAG Chain Creation

python
# Create the chain
rag_chain = (
    {
        "context": RunnableLambda(enhanced_retriever) | format_docs,
        "question": RunnablePassthrough(),
        "chat_history": RunnableLambda(get_chat_history)
    }
    | prompt
    | llm
    | StrOutputParser()
)

RAG Pipeline:

  1. Context: Query → Enhanced Retriever → Document Formatting
  2. Question: Passes user query directly through
  3. Chat History: Retrieves conversation history
  4. Prompt: Formats the system prompt with context, question, and history
  5. LLM: Processes through Gemini AI
  6. Output Parser: Converts AI response to string
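
The LCEL pipeline above is essentially function composition. A plain-Python picture, with all three helpers stubbed in place of the real retriever, formatter, and LLM:

```python
# Stubs standing in for the real components, to show the data flow only.
def retrieve(question):
    return ["Jane Doe, Python, 4 yrs"]          # enhanced_retriever stand-in

def format_docs(docs):
    return "\n\n".join(docs)                     # same role as format_docs

def fake_llm(inputs):
    return f"ANSWER based on: {inputs['context']}"  # prompt | llm | parser stand-in

def rag_chain(question):
    inputs = {
        "context": format_docs(retrieve(question)),  # RunnableLambda | format_docs
        "question": question,                        # RunnablePassthrough
        "chat_history": [],                          # RunnableLambda(get_chat_history)
    }
    return fake_llm(inputs)

print(rag_chain("Who knows Python?"))
```

The dict of three keys is built first, then handed down the pipe; that is exactly what the `{...} | prompt | llm | StrOutputParser()` expression expresses declaratively.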

6.5 Memory Management Wrapper

python
def conversation_with_memory(question):
    try:
        response = rag_chain.invoke(question)
        # Save to memory
        if self.memory:
            self.memory.chat_memory.add_user_message(question)
            self.memory.chat_memory.add_ai_message(response)
        return response
    except Exception as e:
        return f"Error processing query: {str(e)}"

return conversation_with_memory

Memory Integration:

  • Invokes the RAG chain with user question
  • Saves both user question and AI response to memory
  • Maintains conversation context across interactions
  • Returns error message if processing fails
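
A minimal stand-in for ConversationBufferMemory shows what the wrapper does: every turn appends both the user message and the AI reply, so later prompts can see the full history.

```python
# Toy buffer memory, mimicking the add_user_message/add_ai_message API used above.
class BufferMemory:
    def __init__(self):
        self.messages = []

    def add_user_message(self, text):
        self.messages.append(("human", text))

    def add_ai_message(self, text):
        self.messages.append(("ai", text))

memory = BufferMemory()

def conversation_with_memory(question, answer_fn):
    response = answer_fn(question)   # rag_chain.invoke stand-in
    memory.add_user_message(question)
    memory.add_ai_message(response)
    return response

conversation_with_memory("Who knows SQL?", lambda q: "Jane Doe")
print(memory.messages)  # both sides of the turn are now stored
```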

7. Resume Processing Method

python
def process_resumes(self, pdf_files, progress=gr.Progress()):
    """Process uploaded resume PDFs"""
    if not pdf_files:
        return "❌ No files uploaded. Please upload PDF resumes.", ""

    try:
        progress(0.1, desc="Extracting text from PDFs...")

        # Extract text from PDFs
        raw_text = self.extract_pdf_text(pdf_files)

        if not raw_text.strip():
            return "❌ No text could be extracted from the uploaded PDFs.", ""

        progress(0.3, desc="Creating text chunks...")

        # Create text chunks
        text_chunks = self.create_text_chunks(raw_text)

        if not text_chunks:
            return "❌ Could not create text chunks from the extracted text.", ""

        progress(0.6, desc="Creating vector database...")

        # Create vector store
        self.vectorstore = self.create_vectorstore(text_chunks)

        if not self.vectorstore:
            return "❌ Failed to create vector database.", ""

        progress(0.8, desc="Setting up AI conversation chain...")

        # Setup conversation chain
        self.conversation_chain = self.setup_conversation_chain(self.vectorstore)

        if not self.conversation_chain:
            return "❌ Failed to setup AI conversation chain.", ""

        progress(1.0, desc="Processing complete!")

        success_msg = f"""✅ **Processing Complete!**

📄 **Files Processed:** {len(self.processed_files)}
📝 **Text Chunks Created:** {len(text_chunks)}
🔍 **Vector Database:** Ready
🤖 **AI System:** Initialized

**Processed Files:**
{chr(10).join(f"• {file}" for file in self.processed_files)}

You can now query for job profiles using the chat interface below."""

        return success_msg, ""

    except Exception as e:
        return f"❌ Error processing resumes: {str(e)}", ""

Processing Pipeline:

  1. Validates file upload
  2. Extracts text from PDFs (10% progress)
  3. Creates text chunks (30% progress)
  4. Builds vector database (60% progress)
  5. Sets up conversation chain (80% progress)
  6. Returns success message with statistics (100% progress)
  7. Each step includes error handling and progress updates

8. Chat Interface Method

python
def chat_with_system(self, message, history):
    """Handle chat interactions with context validation"""
    if not self.conversation_chain:
        return history + [(message, "❌ Please upload and process resumes first.")], ""

    if not message.strip():
        return history, ""

    # Check if the question is resume/job-related
    if not self._is_resume_related_query(message):
        response = "I can only help with resume analysis and job profiling based on the uploaded resumes. Please ask questions about finding candidates, analyzing skills, or job-related queries."
        history.append((message, response))
        return history, ""

    try:
        # Get response from the conversation chain
        response = self.conversation_chain(message)

        # Update chat history
        history.append((message, response))

        return history, ""

    except Exception as e:
        error_msg = f"Error: {str(e)}"
        history.append((message, error_msg))
        return history, ""

Chat Logic:

  • Validates that the system is ready (conversation chain exists)
  • Ignores empty messages
  • Checks if query is resume-related using keyword validation
  • Processes valid queries through the conversation chain
  • Updates chat history and returns response
  • Handles errors gracefully

9. Query Validation Method

python
def _is_resume_related_query(self, query):
    """Check if the query is related to resume analysis or job profiling"""
    query_lower = query.lower()

    # Keywords that indicate resume/job-related queries
    resume_keywords = [
        'candidate', 'candidates', 'resume', 'resumes', 'job', 'position', 'role',
        'skill', 'skills', 'experience', 'qualification', 'qualifications',
        'developer', 'engineer', 'manager', 'analyst', 'designer', 'consultant',
        'hire', 'hiring', 'recruit', 'recruitment', 'interview', 'profile',
        'background', 'expertise', 'competency', 'competencies', 'ability',
        'python', 'java', 'javascript', 'react', 'node', 'sql', 'database',
        'frontend', 'backend', 'fullstack', 'devops', 'data science', 'machine learning',
        'project management', 'leadership', 'team', 'work', 'employment',
        'education', 'degree', 'certification', 'portfolio', 'github',
        'linkedin', 'contact', 'email', 'phone', 'name', 'find', 'search',
        'best', 'suitable', 'match', 'fit', 'senior', 'junior', 'entry level',
        'years of experience', 'cv', 'curriculum vitae'
    ]

    # Check if any resume-related keywords are present
    return any(keyword in query_lower for keyword in resume_keywords)

Keyword Validation:

  • Converts query to lowercase for case-insensitive matching
  • Defines comprehensive list of resume/job-related keywords
  • Includes job titles, skills, technologies, and HR terminology
  • Returns True if any keyword is found in the query
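
A runnable miniature of the gate with a trimmed keyword list:

```python
# Case-insensitive substring gate, same logic as _is_resume_related_query
# but with only a handful of the keywords for brevity.
resume_keywords = ["candidate", "resume", "skill", "python", "hire"]

def is_resume_related(query):
    q = query.lower()
    return any(keyword in q for keyword in resume_keywords)

print(is_resume_related("Find Python candidates"))          # → True
print(is_resume_related("What is the capital of France?"))  # → False
```

Note that plain substring matching is a coarse filter: "skill" also matches "unskilled", and an off-topic question that happens to contain "python" slips through, which is why the system prompt enforces the topic boundary a second time.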

10. Application Initialization

python
# Initialize the resume analyzer
analyzer = ResumeAnalyzer()

Creates a global instance of the ResumeAnalyzer class.

11. Gradio Interface Creation

python
def create_interface():
    with gr.Blocks(
        title="Resume Analyzer & Job Profiler",
        theme=gr.themes.Soft(),
        css="""
        .header { text-align: center; margin-bottom: 20px; }
        .status-box { padding: 15px; border-radius: 10px; margin: 10px 0; }
        .upload-area { border: 2px dashed #ccc; padding: 20px; border-radius: 10px; }
        """
    ) as demo:

Interface Setup:

  • Uses Gradio Blocks for custom layout
  • Applies soft theme for modern appearance
  • Includes custom CSS for styling

11.1 Header Section

python
gr.HTML("""
<div class="header">
    <h1>🎯 Resume Analyzer & Job Profiler</h1>
    <p>Upload resumes and find the best candidates for any job profile using AI</p>
</div>
""")

Creates the main header with title and description.

11.2 Upload Tab

python
with gr.Tab("📤 Upload & Process Resumes"):
    gr.HTML("""
    <div style="padding: 15px; border-radius: 10px; margin-bottom: 20px;">
        <h3>Step 1: Upload Resume PDFs</h3>
        <p>Upload multiple PDF resumes to build your candidate database. The system will extract text and create a searchable vector database.</p>
    </div>
    """)

    with gr.Row():
        with gr.Column(scale=2):
            file_upload = gr.File(
                label="Upload Resume PDFs",
                file_count="multiple",
                file_types=[".pdf"],
                interactive=True
            )

            process_btn = gr.Button(
                "🚀 Process Resumes",
                variant="primary",
                size="lg"
            )

        with gr.Column(scale=1):
            gr.HTML("""
            <div style="padding: 15px; border-radius: 10px;">
                <h4>📋 Requirements</h4>
                <ul>
                    <li>PDF format only</li>
                    <li>Text-based PDFs (not scanned images)</li>
                    <li>Multiple files supported</li>
                    <li>Processing may take a few minutes</li>
                </ul>
            </div>
            """)

    status_output = gr.HTML(label="Processing Status")

Upload Interface:

  • Creates file upload component for multiple PDFs
  • Adds processing button with primary styling
  • Includes requirements and instructions
  • Provides status output area for feedback

11.3 Query Tab

python
with gr.Tab("💬 Query Candidates"):
    gr.HTML("""
    <div style="padding: 15px; border-radius: 10px; margin-bottom: 20px;">
        <h3>Step 2: Find the Best Candidates</h3>
        <p>Ask questions about job profiles to find the most suitable candidates from your uploaded resumes.</p>
    </div>
    """)

    chatbot = gr.Chatbot(
        label="AI Resume Analyzer",
        height=500,
        placeholder="Process resumes first, then start chatting..."
    )

    with gr.Row():
        msg_input = gr.Textbox(
            label="Your Query",
            placeholder="e.g., 'Who are the best candidates for a senior Python developer position?'",
            lines=2,
            scale=4
        )
        send_btn = gr.Button("Send", variant="primary", scale=1)

    gr.Examples(
        examples=[
            "Who are the best candidates for a software engineer position?",
            "Find candidates with React.js and Node.js experience",
            "Show me candidates suitable for a data scientist role",
            "Who has the most experience in machine learning?",
            "Find candidates with project management experience",
            "Show me candidates with both frontend and backend skills"
        ],
        inputs=msg_input,
        label="Example Queries"
    )

Chat Interface:

  • Creates chatbot component for conversation display
  • Adds text input for user queries
  • Includes send button
  • Provides example queries to guide users

11.4 About Tab

python
with gr.Tab("ℹ️ About"):
    gr.HTML("""
    <div style="padding: 20px;">
        <h2>About Resume Analyzer & Job Profiler</h2>

        <h3>🔧 Technology Stack</h3>
        <ul>
            <li><strong>LangChain:</strong> Framework for building AI applications</li>
            <li><strong>FAISS:</strong> Vector database for similarity search</li>
            <li><strong>Google Gemini AI:</strong> Advanced language model</li>
            <li><strong>HuggingFace Embeddings:</strong> Text embedding generation</li>
            <li><strong>Gradio:</strong> Web interface framework</li>
        </ul>

        <h3>📋 How It Works</h3>
        <ol>
            <li><strong>Upload:</strong> Upload multiple PDF resumes</li>
            <li><strong>Process:</strong> System extracts text and creates vector embeddings</li>
            <li><strong>Query:</strong> Ask for candidates matching specific job profiles</li>
            <li><strong>Results:</strong> AI analyzes and returns best matching candidates</li>
        </ol>

        <h3>🎯 Use Cases</h3>
        <ul>
            <li>HR recruitment and candidate screening</li>
            <li>Talent acquisition for specific roles</li>
            <li>Resume database management</li>
            <li>Quick candidate profiling</li>
        </ul>

        <h3>⚙️ Setup Requirements</h3>
        <p>Make sure you have the following API keys configured:</p>
        <ul>
            <li><code>GOOGLE_API_KEY</code> - For Gemini AI</li>
            <li><code>HUGGINGFACEHUB_API_TOKEN</code> - For embeddings</li>
        </ul>
    </div>
    """)

Provides comprehensive documentation about the application, including technology stack, workflow, use cases, and setup requirements.

12. Event Handlers

python
# Event handlers
process_btn.click(
    fn=analyzer.process_resumes,
    inputs=[file_upload],
    outputs=[status_output, msg_input],
    show_progress=True
)

send_btn.click(
    fn=analyzer.chat_with_system,
    inputs=[msg_input, chatbot],
    outputs=[chatbot, msg_input]
)

msg_input.submit(
    fn=analyzer.chat_with_system,
    inputs=[msg_input, chatbot],
    outputs=[chatbot, msg_input]
)

Event Binding:

  • Process Button: Connects to resume processing function with progress bar
  • Send Button: Connects to chat function
  • Text Input Submit: Allows Enter key to send messages

13. Application Launch

python
if __name__ == "__main__":
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HUGGINGFACEHUB_API_TOKEN')
    os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
    
    # Check for required environment variables
    required_vars = ["GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN"]
    missing_vars = [var for var in required_vars if not os.getenv(var)]

    if missing_vars:
        print(f"⚠️  Missing environment variables: {', '.join(missing_vars)}")
        print("Please set these variables in your .env file or environment")

    demo = create_interface()
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=True,
        debug=True
    )

Launch Configuration:

  • Sets API keys from Google Colab's userdata (works only inside Colab; when running locally, the keys should come from the .env file loaded by load_dotenv)
  • Validates required environment variables
  • Creates and launches the Gradio interface
  • server_name="0.0.0.0": Makes server accessible from all network interfaces
  • server_port=7860: Uses port 7860 (Gradio default)
  • share=True: Creates public shareable link
  • debug=True: Enables debug mode for development
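
Outside Colab, the `google.colab` import and the `userdata.get` calls will fail; the `load_dotenv()` call at the top of the script covers that case if a `.env` file like the following sits next to the script (placeholder values shown, not real keys):

```
GOOGLE_API_KEY=your-gemini-api-key
HUGGINGFACEHUB_API_TOKEN=your-huggingface-token
```
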

Application Workflow Summary

  1. Upload Phase: Users upload PDF resumes through the web interface
  2. Processing Phase: System extracts text, creates chunks, builds vector database, and sets up AI chain
  3. Query Phase: Users ask questions about candidates through the chat interface
  4. Retrieval Phase: System finds relevant resume sections using vector similarity search
  5. Generation Phase: AI processes context and generates structured candidate responses
  6. Response Phase: Results are displayed in the chat interface with candidate details

This application effectively combines modern AI technologies to create a powerful resume analysis and candidate matching system.
