Building an AI-Powered Video Segmentation Toolkit
Transform lecture recordings into structured, navigable educational content using LLMs, Python, and modern web technologies.
1. Overview
This tutorial guides you through building a complete toolkit for transforming lecture video transcripts into AI-analyzed pedagogical segments. The system processes SRT subtitle files, uses large language models to identify educational boundaries, maps content to YouTube uploads, and generates interactive viewing experiences for students.
What You'll Build
AI Segmentation Engine
Process transcripts through LLMs to identify introductions, concept explanations, examples, and summaries with pedagogical metadata.
Structured JSON Output
Generate validated JSON with learning objectives, prerequisites, difficulty levels, and engagement tips for each segment.
YouTube Integration
Automatically map segmentation data to YouTube video IDs for embedded playback with timestamp navigation.
Interactive Viewer
Single-file HTML viewer with role-based content, search functionality, and theater mode for immersive learning.
Design Principles
- Bounded Expertise Oracle: AI analysis anchored to transcript content with mandatory timecode citations
- Graceful Degradation: Token truncation with clear markers when transcripts exceed limits
- Human-in-the-Loop: Flask-based annotation tool for timestamp corrections
- Schema Enforcement: Pydantic validation ensures reliable JSON parsing
- Idempotent Operations: Backup files before any mutation
This toolkit was developed for STAT 350 at Purdue University, processing 70+ lecture videos into a searchable, navigable learning platform. The same architecture applies to any course with video content.
2. System Architecture
The toolkit follows a pipeline architecture where each component produces artifacts consumed by downstream processes. This design enables incremental processing and easy debugging.
Component Responsibilities
| Component | Input | Output | Purpose |
|---|---|---|---|
| srt_pedagogical_segmentation.py | SRT files | JSON segments | Core AI analysis engine |
| segmentation_schema.py | JSON data | Validated objects | Schema enforcement |
| estimate_limit.py | SRT file | Token count | Context window sizing |
| create_youtube_mapping_smart.py | Upload log + JSON | Mapping JSON | Video-to-lecture matching |
| generate_segmentation_report_youtube.py | JSON + mapping | HTML viewer | Interactive interface |
| lecture_segment_annotator.py | JSON segments | Corrected JSON | Manual timestamp refinement |
3. Prerequisites
Required Software
# Core dependencies
pandas>=1.5.0
pydantic>=2.0.0
tiktoken>=0.5.0

# LLM providers (install one or both)
openai>=1.0.0
anthropic>=0.18.0

# Web application (for annotation tool)
flask>=3.0.0
flask-cors>=4.0.0

# Optional: video probing
# ffprobe (from ffmpeg) - install via system package manager
API Keys
Configure environment variables for your chosen LLM provider:
# OpenAI (for GPT-4o, o1, o3 models)
OPENAI_API_KEY=sk-your-key-here

# Anthropic (for Claude models)
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Or use a local/custom endpoint
OPENAI_BASE_URL=https://your-endpoint.com/v1
Project Structure
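The tutorial does not mandate a specific layout; the arrangement below is one reasonable setup, assumed from the scripts and output directories used throughout this guide:

project/
├── srt_pedagogical_segmentation.py
├── segmentation_schema.py
├── estimate_limit.py
├── create_youtube_mapping_smart.py
├── generate_segmentation_report_youtube.py
├── lecture_segment_annotator.py
├── transcripts/                  # input SRT files
├── segmentation_reports/
│   └── json/                     # AI-generated segment JSON
├── corrected/                    # human-corrected JSON (annotation tool)
├── youtube_mapping.json
└── video_viewer.html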
4. Schema Design
Reliable LLM output parsing requires strict schema enforcement. We use Pydantic models to validate AI-generated JSON and provide clear error messages when output doesn't conform.
Segment Types
The toolkit recognizes ten pedagogical segment types, each with distinct visual styling and semantic meaning:
| Type | Description | Color |
|---|---|---|
| introduction | Topic introduction and learning objectives | #2E86DE |
| concept_explanation | Core concept explanation and theory | #5F27CD |
| example | Worked examples and demonstrations | #00B894 |
| deep_reasoning | Deep reasoning and intuition building | #D63031 |
| common_mistakes | Common mistakes and misconceptions | #E17055 |
| practice_problem | Practice problems and exercises | #00CEC9 |
| real_world_application | Real-world applications and context | #A29BFE |
| summary | Summary and key takeaways | #6C5CE7 |
| q_and_a | Student questions and answers | #00B894 |
| transition | Topic transitions and administrative content | #636E72 |
Pydantic Models
from typing import Dict, List, Optional
from pydantic import BaseModel, Field, validator
from enum import Enum
class SegmentType(str, Enum):
"""Valid segment types."""
INTRODUCTION = "introduction"
CONCEPT_EXPLANATION = "concept_explanation"
EXAMPLE = "example"
DEEP_REASONING = "deep_reasoning"
COMMON_MISTAKES = "common_mistakes"
PRACTICE_PROBLEM = "practice_problem"
REAL_WORLD_APPLICATION = "real_world_application"
SUMMARY = "summary"
Q_AND_A = "q_and_a"
TRANSITION = "transition"
class DifficultyLevel(str, Enum):
"""Valid difficulty levels."""
EASY = "Easy"
MEDIUM = "Medium"
HARD = "Hard"
class TimeRange(BaseModel):
"""Represents a time range in HH:MM:SS,mmm format."""
start: str = Field(..., pattern=r'^\d{2}:\d{2}:\d{2}[,;]\d{2,3}$')
end: str = Field(..., pattern=r'^\d{2}:\d{2}:\d{2}[,;]\d{2,3}$')
class SegmentSchema(BaseModel):
"""Schema for a single segment."""
time_range: TimeRange
segment_type: SegmentType
title: str = Field(..., min_length=1, max_length=200)
description: str = Field(..., min_length=1, max_length=500)
key_concepts: List[str] = Field(default_factory=list, max_items=10)
learning_objectives: List[str] = Field(default_factory=list, max_items=5)
prerequisites: List[str] = Field(default_factory=list, max_items=5)
difficulty: DifficultyLevel = Field(default=DifficultyLevel.MEDIUM)
engagement_tips: List[str] = Field(default_factory=list, max_items=5)
microlecture_suitable: bool = Field(default=False)
class Config:
use_enum_values = True
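# The three supporting models referenced below are not shown in full in this
# excerpt. The sketches here are assumptions: LectureOverview mirrors the
# overview keys used in the prompt later in this tutorial, and the other two
# are minimal placeholders.
class LectureOverview(BaseModel):
    """High-level summary of the whole lecture (assumed fields)."""
    learning_objectives: List[str] = Field(default_factory=list)
    prerequisites: List[str] = Field(default_factory=list)
    key_takeaways: List[str] = Field(default_factory=list)
class InteractiveOpportunity(BaseModel):
    """Placeholder: a point in the lecture suited to an interactive activity."""
    time_range: Optional[TimeRange] = None
    description: str = ""
class MicrolectureRecommendation(BaseModel):
    """Placeholder: a segment span recommended for extraction as a short clip."""
    time_range: Optional[TimeRange] = None
    rationale: str = ""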
class SegmentationAnalysis(BaseModel):
"""Complete schema for segmentation analysis output."""
overview: LectureOverview
segments: List[SegmentSchema] = Field(..., min_items=1)
interactive_opportunities: List[InteractiveOpportunity] = Field(default_factory=list)
microlecture_recommendations: List[MicrolectureRecommendation] = Field(default_factory=list)
Always validate LLM output before storing. Use validate_analysis(data) to catch malformed responses early. LLMs occasionally produce invalid JSON, especially with complex schemas.
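The validation helper itself is not shown above; a minimal sketch, assuming Pydantic v2's model_validate, might look like this:

from pydantic import ValidationError

def validate_analysis(data: dict) -> SegmentationAnalysis:
    """Validate raw LLM output against the schema, raising a clear error."""
    try:
        return SegmentationAnalysis.model_validate(data)
    except ValidationError as exc:
        # Surface the exact fields that failed so the response can be regenerated
        raise ValueError(f"LLM output failed schema validation:\n{exc}") from exc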
5. Core Segmentation Engine
The segmentation engine processes SRT transcripts through several stages: parsing, enhancement, LLM analysis, and output generation.
Timestamp Handling
SRT files use millisecond timestamps (HH:MM:SS,mmm), while video editing uses frame-based timecodes. The toolkit handles both formats:
import logging
import re

logger = logging.getLogger(__name__)

def tc_to_seconds(tc: str, fps: float = 30.0) -> float:
"""Convert timecode to seconds - handles all formats."""
try:
if ',' in tc: # Millisecond format: HH:MM:SS,mmm
h, m, rest = tc.split(':')
s, ms = rest.split(',')
return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
else: # Frame format: HH:MM:SS:FF or HH:MM:SS;FF
parts = re.split(r'[:;]', tc)
if len(parts) == 4:
h, m, s, ff = parts
return int(h) * 3600 + int(m) * 60 + int(s) + int(ff) / fps
else:
h, m, s = parts
return int(h) * 3600 + int(m) * 60 + int(s)
except Exception as e:
logger.error(f"Error parsing timecode '{tc}': {e}")
return 0.0
def seconds_to_tc(sec: float, fps: float = 30.0, mode: str = "ms") -> str:
"""Convert seconds to timecode string."""
h, rem = divmod(sec, 3600)
m, s_full = divmod(rem, 60)
s = int(s_full)
if mode == "frames":
frames = int(round((s_full - s) * fps))
return f"{int(h):02d}:{int(m):02d}:{int(s):02d}:{frames:02d}"
else: # milliseconds
        ms = int(round((sec - int(sec)) * 1000))  # round, not truncate (185.2 s would otherwise yield 199 ms)
return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{ms:03d}"
Transcript Enhancement
Before sending to the LLM, we enhance transcripts with embedded timestamp markers. This anchors the AI to actual timecodes rather than hallucinating times:
import re
from typing import List, Tuple

# Assumed module-level pattern: matches SRT timing lines such as
# "00:00:01,000 --> 00:00:04,200"
TIMESTAMP_RE = re.compile(
    r'(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})')

def enhance_transcript_with_timestamps(srt_text: str) -> Tuple[str, List]:
"""
Add timestamp markers to transcript for AI reference.
Returns: (enhanced_text, timestamp_mapping)
"""
enhanced_lines = []
timestamp_mapping = []
# Split into caption blocks
blocks = re.split(r'\n\n+', srt_text.strip())
for block in blocks:
lines = block.strip().split('\n')
if len(lines) < 3:
continue
# Extract timestamp
timestamp_match = TIMESTAMP_RE.search(lines[1])
if timestamp_match:
start_tc, end_tc = timestamp_match.groups()
# Get caption text (lines after timestamp)
text = ' '.join(lines[2:])
# Add to enhanced transcript with timestamp marker
enhanced_lines.append(f"[{start_tc}] {text}")
timestamp_mapping.append((start_tc, end_tc, text))
return '\n'.join(enhanced_lines), timestamp_mapping
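Applied to a single caption block, the enhancement produces output like this (illustrative transcript text):

sample = """1
00:00:00,000 --> 00:00:04,200
Welcome to STAT 350. Today we cover sampling distributions."""
enhanced, mapping = enhance_transcript_with_timestamps(sample)
print(enhanced)
# [00:00:00,000] Welcome to STAT 350. Today we cover sampling distributions.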
Token Budget Management
Long lectures may exceed model context windows. Use estimate_limit.py to calculate token requirements:
python estimate_limit.py --srt lecture_42.srt --model gpt-4o

# Output:
# Transcript characters : 45,230
# Exact tokens (gpt-4o) : 12,847
# ---------------------------------------------------------
# Recommended config.token_limit : 19,764 (largest safe model window)
# Minimum-for-current heuristic : 12,848 (fits transcript + 1-token reply)
The toolkit uses a 65/25/10 heuristic: 65% of context for transcript, 25% for model output, 10% safety margin. If transcripts exceed limits, they're truncated with a clear [TRANSCRIPT TRUNCATED] marker.
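As a rough sketch of how such an estimate can be computed with tiktoken (the actual estimate_limit.py may differ in details):

import tiktoken

def recommend_token_limit(srt_text: str, model: str = "gpt-4o",
                          transcript_share: float = 0.65) -> int:
    """Count exact tokens, then size the context so the transcript uses ~65% of it."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # fallback for unknown model names
    n_tokens = len(enc.encode(srt_text))
    # 65% transcript / 25% output / 10% safety margin heuristic
    return int(n_tokens / transcript_share)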
6. LLM Prompting Strategy
Effective prompting is critical for reliable segmentation. The toolkit uses a structured prompt that constrains the LLM to use actual transcript timestamps.
Model Configuration
MODEL_CONFIGS = {
'gpt-4o': {
'temperature': 0.1, # Low temp for consistent output
'max_tokens': 4000,
'token_limit': 128000,
'provider': 'openai'
},
'claude-sonnet-4': {
'temperature': 0.1,
'max_tokens': 4000,
'token_limit': 200000,
'provider': 'claude'
},
'o3': {
'temperature': None, # O-series ignores temperature
'max_completion_tokens': 25000,
'token_limit': 200000,
'provider': 'openai',
'reasoning_effort': 'high'
}
}
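A simplified sketch of how these configs might be dispatched to the two providers (assumed wiring; the real engine also handles retries, truncation, and o-series parameters such as max_completion_tokens):

from openai import OpenAI
import anthropic

def run_analysis(model_name: str, system_prompt: str, transcript: str) -> str:
    """Route one segmentation request to the configured provider."""
    cfg = MODEL_CONFIGS[model_name]
    if cfg['provider'] == 'openai':
        client = OpenAI()  # reads OPENAI_API_KEY / OPENAI_BASE_URL from the environment
        # Note: o-series models expect max_completion_tokens instead of max_tokens (not handled here)
        resp = client.chat.completions.create(
            model=model_name,
            temperature=cfg.get('temperature') or 1.0,
            max_tokens=cfg.get('max_tokens', 4000),
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": transcript}])
        return resp.choices[0].message.content
    else:  # Anthropic
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        resp = client.messages.create(
            model=model_name,
            max_tokens=cfg.get('max_tokens', 4000),
            temperature=cfg.get('temperature', 0.1),
            system=system_prompt,
            messages=[{"role": "user", "content": transcript}])
        return resp.content[0].text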
System Prompt Structure
The prompt follows a specific structure to maximize reliability:
You are an expert educational content analyst specializing in
pedagogical segmentation of lecture videos.
## CRITICAL TIMESTAMP RULES
1. ONLY use timestamps that appear in [HH:MM:SS,mmm] markers in the transcript
2. Segment boundaries MUST align with actual caption timestamps
3. DO NOT interpolate or invent timestamps
4. If uncertain about a boundary, use the nearest visible timestamp
## Segment Types
- introduction: Topic setup and learning objectives
- concept_explanation: Core theory and definitions
- example: Worked problems and demonstrations
- deep_reasoning: Intuition building and "why" explanations
- common_mistakes: Pitfalls and misconceptions
- practice_problem: Student exercises
- real_world_application: Practical applications
- summary: Key takeaways and review
- q_and_a: Student questions
- transition: Topic changes and administrative content
## Output Format
Return valid JSON matching this schema:
{
"overview": {
"learning_objectives": ["..."],
"prerequisites": ["..."],
"key_takeaways": ["..."]
},
"segments": [
{
"time_range": {"start": "HH:MM:SS,mmm", "end": "HH:MM:SS,mmm"},
"segment_type": "concept_explanation",
"title": "Descriptive Title",
"description": "1-2 sentence summary",
"key_concepts": ["concept1", "concept2"],
"difficulty": "Easy|Medium|Hard"
}
]
}
LLMs frequently hallucinate timestamps when not properly constrained. The embedded [HH:MM:SS,mmm] markers in enhanced transcripts and explicit instructions to only use visible timestamps significantly reduce this problem, but human verification remains essential.
7. YouTube Video Mapping
After uploading lectures to YouTube, you need to map video IDs to lecture indices. The create_youtube_mapping_smart.py script automates this using a two-pass matching algorithm.
Input: YouTube Upload Log
Most YouTube upload tools generate a CSV log. The script expects this format:
Video File,YouTube Video ID,Status
"/path/to/STAT 350 - Chapter 6.3.1 Intro.mp4",dQw4w9WgXcQ,UPLOAD SUCCESS
"/path/to/STAT 350 - Chapter 6.3.2 Examples.mp4",9bZkp7q19f0,UPLOAD SUCCESS
"/path/to/STAT 350 - Chapter 7.1 Overview.mp4",kJQP7kiw5Fk,UPLOAD SUCCESS
Matching Algorithm
1. Extract Chapter Information: Parse chapter numbers from filenames using regex patterns like Chapter\s*(\d+(?:\.\d+)*).
2. Pass 1 - Chapter Group Matching: Group videos and lectures by base chapter (e.g., "6.3") and match in order. This handles sub-chapters like 6.3.1, 6.3.2.
3. Pass 2 - Fuzzy Title Matching: For remaining unmatched items, use difflib.SequenceMatcher to find the best title match above a similarity threshold (default 0.7), as sketched below.
4. Generate Mapping: Output youtube_mapping.json with lecture index → video ID pairs.
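A condensed sketch of the two matching primitives (illustrative; the production script adds chapter grouping, ordering, and report generation):

import re
from difflib import SequenceMatcher

CHAPTER_RE = re.compile(r'Chapter\s*(\d+(?:\.\d+)*)', re.IGNORECASE)

def extract_chapter(filename: str):
    """Return the chapter string (e.g. '6.3.1') embedded in a filename, if any."""
    m = CHAPTER_RE.search(filename)
    return m.group(1) if m else None

def best_fuzzy_match(title: str, candidates: list, threshold: float = 0.7):
    """Pass 2: pick the most similar candidate title, or None below the threshold."""
    scored = [(SequenceMatcher(None, title.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None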
python create_youtube_mapping_smart.py \
--log-file youtube_upload_log.csv \
--json-dir segmentation_reports/json \
--output youtube_mapping.json \
--threshold 0.7
# Output files:
# - youtube_mapping.json (primary mapping)
# - youtube_mapping_detailed.csv (for human review)
# - unmatched_videos.csv (failed matches)
Output Format
{
"1": "dQw4w9WgXcQ",
"2": "9bZkp7q19f0",
"3": "kJQP7kiw5Fk",
"38": "abc123xyz",
"39": "def456uvw"
}
8. Interactive Video Viewer
The HTML viewer is a self-contained single-file application that renders all segmentation data with embedded YouTube playback.
Key Features
Role-Based Views
Students see simplified content; instructors see engagement tips, difficulty badges, and analytics.
Full-Text Search
Search across all lectures by title, concept, description, or learning objective.
Theater Mode
Immersive full-width video playback with hidden sidebar.
Visual Timeline
Color-coded segment bars showing lecture structure at a glance.
Generation Command
python generate_segmentation_report_youtube.py \
--json-dir segmentation_reports/json \
--youtube-mapping youtube_mapping.json \
--out video_viewer.html
Data Embedding
Segment data is embedded directly in the HTML as JavaScript objects:
// Embedded in generated HTML
window.segmentData = {
"1": {
"lecture_index": 1,
"lecture_title": "Chapter 1.1: Introduction to Statistics",
"total_duration": 1847.5,
"segments": [
{
"start_time": 0,
"end_time": 185.2,
"start_tc": "00:00:00,000",
"end_tc": "00:03:05,200",
"segment_type": "introduction",
"title": "Course Overview",
"description": "Introduction to the course...",
"key_concepts": ["statistics", "data analysis"],
"difficulty_level": "Easy"
}
// ... more segments
]
}
};
const YOUTUBE_MAPPING = {
"1": "dQw4w9WgXcQ",
"2": "9bZkp7q19f0"
};
The viewer is entirely self-contained: no server required. Host it on GitHub Pages, drop it in an LMS, or open it directly in a browser. All styles, scripts, and data are inline.
9. Manual Annotation Tool
AI-generated timestamps often need refinement. The Flask-based annotation tool provides a side-by-side interface for human correction.
Starting the Server
# Start the Flask server
python lecture_segment_annotator.py

# Server runs at http://localhost:5005
# Open in browser to begin annotation
Workflow
1. Initialize: On first run, the tool copies all original JSONs to a corrected/ directory.
2. Select Lecture: Browse available lectures in the sidebar. Videos with corrections show a badge.
3. Adjust Timestamps: Play the YouTube video and click on segments to adjust start/end times.
4. Save Corrections: Changes are saved to the corrected directory with metadata timestamps.
5. Export: Download all corrected segments as a ZIP for deployment.
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /api/lectures | GET | List all available lectures with correction status |
| /api/segments/<filename> | GET | Get segments for a specific lecture |
| /api/segments/save | POST | Save corrected timestamps |
| /api/reset/<filename> | POST | Reset to original timestamps |
| /api/export | GET | Download all segments as ZIP |
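To illustrate the shape of these endpoints, here is a hedged sketch of the save route; the request payload fields shown are assumptions, and the real handler also records correction metadata:

from pathlib import Path
import json
from flask import Flask, request, jsonify

app = Flask(__name__)
CORRECTED_DIR = Path("corrected")  # assumed location of corrected JSON files

@app.route("/api/segments/save", methods=["POST"])
def save_segments():
    """Persist corrected segment timestamps for one lecture."""
    payload = request.get_json(force=True)
    filename = payload.get("filename")
    segments = payload.get("segments")
    if not filename or segments is None:
        return jsonify({"error": "filename and segments are required"}), 400
    out_path = CORRECTED_DIR / filename
    data = json.loads(out_path.read_text()) if out_path.exists() else {}
    data["segments"] = segments
    out_path.write_text(json.dumps(data, indent=2))
    return jsonify({"status": "saved", "file": filename})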
10. Utility Scripts
LLM output often contains artifacts that need post-processing. These utilities clean and repair JSON files.
fix_json_line_breaks.py
Removes unwanted line breaks and HTML artifacts from text fields:
python fix_json_line_breaks.py --json-dir segmentation_reports/json

# Fixes:
# - Removes \n, \r, \t sequences
# - Cleans stray HTML (e.g. <br/> tags and escaped entities)
# - Normalizes whitespace
# - Creates .bak backups
merge_split_items.py
LLMs sometimes split list items incorrectly. This script merges fragments:
// Before (incorrectly split)
"prerequisites": [
"High",
"school algebra",
"Basic probability"
]
// After (merged)
"prerequisites": [
"High-school algebra",
"Basic probability"
]
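The exact rules live in merge_split_items.py; a minimal illustrative heuristic (assumed, not the script's actual logic) is to re-join any fragment that starts with a lowercase letter:

def merge_split_items(items: list) -> list:
    """Re-join fragments that were split mid-phrase by the LLM."""
    merged = []
    for item in items:
        text = item.strip()
        # Heuristic: a fragment starting lowercase continues the previous item
        if merged and text and text[0].islower():
            merged[-1] = f"{merged[-1]} {text}".strip()
        else:
            merged.append(text)
    return merged

print(merge_split_items(["High", "school algebra", "Basic probability"]))
# ['High school algebra', 'Basic probability']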
fix_lecture_indices.py
Ensures lecture_index fields match filename prefixes:
# Preview changes
python fix_lecture_indices.py --json-dir output/json --dry-run

# Apply changes
python fix_lecture_indices.py --json-dir output/json

# File: 038_STAT_350_Chapter_7_segments.json
# Before: lecture_index: 1
# After: lecture_index: 38
rebuild_course_context.py
After reprocessing individual lectures, rebuild the cross-lecture concept graph:
python rebuild_course_context.py \
--input-dir segmentation_analysis \
--verify \
--enhanced-summary
# Outputs:
# - course_context.json (concept tracking)
# - course_summary.md (human-readable overview)
11. Deployment
Static Hosting (GitHub Pages)
The viewer is a single HTML file that can be hosted anywhere:
# Create a docs folder for GitHub Pages
mkdir -p docs
cp video_viewer.html docs/index.html

# Commit and push
git add docs/
git commit -m "Deploy video viewer"
git push

# Enable GitHub Pages in repository settings
# Source: Deploy from branch → main → /docs
LMS Integration
For Brightspace, Canvas, or Blackboard:
- Upload video_viewer.html to course files
- Create a content item linking to the uploaded file
- Or embed in an iframe if your LMS allows
The YouTube IFrame API requires the page to be served over HTTP/HTTPS. Opening the HTML file directly (file://) may not load videos. Use a local server for development:
# Python 3
python -m http.server 8000

# Then open http://localhost:8000/video_viewer.html
12. Extensions & Future Work
Potential Enhancements
Chatbot Integration
Connect segmentation data to an AI tutor that can answer questions about specific video segments.
Learning Analytics
Track which segments students watch, rewind, or skip to identify difficult content.
Auto-Clip Generation
Automatically extract "microlecture" clips based on segment boundaries using ffmpeg.
Quiz Generation
Use LLMs to generate comprehension questions for each segment.
EDL Export for Video Editing
The toolkit can export Edit Decision Lists for Adobe Premiere Pro:
def export_segmentation_edl(segmentation, output_path, video_title, fps):
"""Export segmentation as EDL file for Adobe Premiere Pro."""
with open(output_path, 'w') as f:
f.write(f"TITLE: {video_title}_segments\n")
f.write("FCM: NON-DROP FRAME\n\n")
for i, segment in enumerate(segmentation.segments, 1):
start_tc = seconds_to_tc(segment.start_time, fps, 'frames')
end_tc = seconds_to_tc(segment.end_time, fps, 'frames')
f.write(f"{i:03d} AX V C ")
f.write(f"{start_tc} {end_tc} {start_tc} {end_tc}\n")
f.write(f"* COMMENT: TYPE={segment.segment_type} | {segment.title}\n\n")
Contributing
This toolkit is designed to be modular and extensible. Key extension points include:
- New segment types: Add entries to the SEGMENT_TYPES dictionary
- Custom LLM providers: Implement provider adapters in the segmentation engine
- Alternative viewers: Generate React, Vue, or native mobile apps from the JSON
- Assessment integration: Connect to LMS gradebooks via LTI
You now have a complete understanding of the pedagogical video segmentation toolkit. Start with a single lecture, validate the output, then scale to your full course library. The human-in-the-loop annotation ensures quality while AI handles the heavy lifting.