RAHGIR VLM System

RAHGIR VLM: Road-scene Analysis for Hazards with Grounded Interactive Risk Reasoning

Project Timeline: 2026-2028 (Ongoing)


RAHGIR is a policy-conditioned, domain-specialized vision–language reasoning system designed to analyze forward-facing ego-vehicle road scenes and produce structured, probabilistic assessments of traffic risk under partial observability.

RAHGIR Web Interface
RAHGIR web interface showing input image, text question, and VLM risk assessment result

What RAHGIR Does

  • Converts a single road image into explicit risk assessments
  • Reasons from the ego-vehicle perspective using traffic-domain constraints
  • Produces calibrated Low / Medium / High risk estimates with justifications

Key Capabilities

  • Policy-conditioned hazard reasoning grounded in visual evidence
  • Explicit modeling of interaction risk between traffic agents
  • Occlusion-aware risk inference under incomplete observability
  • Structured, evaluation-friendly outputs (not free-form captions)

Reasoning Modes

  • Observed-agent risk: Risk estimation conditioned on visible road users
  • Hypothetical-agent risk: Risk estimation under potential agent emergence from occluded regions

RAHGIR is implemented on AWS with explicit reasoning policies, guardrails, calibration layers, and an evaluation harness.
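The structured output described above can be illustrated with a minimal sketch. The field names and example findings below are illustrative assumptions for this write-up, not RAHGIR's actual schema:

```python
from dataclasses import dataclass, field

# Risk levels per the project description: Low / Medium / High.
RISK_LEVELS = ("Low", "Medium", "High")

@dataclass
class RiskFinding:
    """One risk judgment with its justification.

    Field names are illustrative; the deployed schema may differ.
    """
    mode: str           # "observed-agent" or "hypothetical-agent"
    level: str          # one of RISK_LEVELS
    justification: str

    def __post_init__(self):
        if self.level not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.level}")

@dataclass
class RiskAssessment:
    """Structured, evaluation-friendly output (not a free-form caption)."""
    scene_relevant: bool
    findings: list = field(default_factory=list)

# Example: one finding per reasoning mode (contents invented for illustration).
assessment = RiskAssessment(
    scene_relevant=True,
    findings=[
        RiskFinding("observed-agent", "Medium",
                    "Cyclist merging from the bike lane ahead."),
        RiskFinding("hypothetical-agent", "High",
                    "Parked van occludes a crosswalk; a pedestrian may emerge."),
    ],
)
```

Keeping the output this rigid is what makes the downstream evaluation harness possible: each finding can be compared against a gold label instead of parsing free text.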

Two-Stage Query Processing

RAHGIR employs a cost-efficient two-stage architecture to handle user queries:

  1. Stage 1 – Query Relevance Analysis: A lightweight text-only model determines whether the query is relevant to road safety and risk assessment
  2. Stage 2 – Vision-Language Risk Analysis: Only relevant queries proceed to the more expensive VLM for image-based risk assessment

This approach prevents costly VLM calls for off-topic queries while maintaining fast response times for legitimate safety questions.

Design Evolution

Initially, RAHGIR was tested with a single-stage approach where users submitted fully structured queries directly to the VLM. However, this proved inefficient and inflexible. The current two-stage architecture addresses these limitations by:

  • Accepting natural language queries: Users can ask questions in plain English without formatting requirements
  • Extracting key topics: Stage 1 identifies the intent and key topics (e.g., INTERACTION, WORKZONE, PEDESTRIAN, OCCLUSION) from the user's query
  • Structuring queries internally: The system automatically reformats and structures the query with appropriate context before passing it to the VLM
  • Filtering irrelevant queries: Non-relevant queries are rejected at Stage 1, saving VLM costs entirely
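The topic-extraction and query-structuring steps above might look roughly like this sketch. The topic tags come from the project description; the keyword map and output fields are hypothetical stand-ins for what the Stage 1 model actually infers:

```python
# Topic tags named in the project description. The keyword map is a
# hypothetical stand-in for the Stage 1 model's intent/topic inference.
TOPIC_KEYWORDS = {
    "PEDESTRIAN": ["pedestrian", "person crossing", "jaywalk"],
    "WORKZONE": ["work zone", "construction", "cone"],
    "OCCLUSION": ["occlude", "blocked view", "hidden", "behind the truck"],
    "INTERACTION": ["merging", "yield", "overtake", "cut in"],
}

def extract_topics(query: str) -> list[str]:
    """Return the topic tags whose keywords appear in the query."""
    q = query.lower()
    return [topic for topic, words in TOPIC_KEYWORDS.items()
            if any(w in q for w in words)]

def structure_query(query: str) -> dict:
    """Reformat a plain-English query into a structured form for Stage 2.

    The output fields are illustrative assumptions about the internal format.
    """
    return {"question": query, "topics": extract_topics(query)}

print(structure_query("Could a pedestrian step out from behind the truck?"))
```

With this shape, Stage 2 receives both the original question and machine-readable topic tags, so the VLM prompt can be assembled with the appropriate domain context attached.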

Query Filtering Examples

The system intelligently filters non-relevant queries before invoking the VLM:

Query filtering example
Query: "Explain quantum entanglement like I am five"
Stage 1: NOT_RELEVANT (extracted topic: quantum entanglement)
Stage 2: skipped (no image analysis performed)

This pre-filtering is crucial because some queries may superficially appear related to transportation or safety but are actually off-topic. Since VLM queries are expensive, this cheap text-based pre-check significantly reduces operational costs while maintaining system responsiveness.

Testing on AWS Bedrock

In addition to testing via the AWS Bedrock console, the VLM was evaluated through programmatic and workflow-level testing to assess its behavior under realistic usage conditions. We also conducted human-in-the-loop validation, manually reviewing and validating the model’s outputs across 100 representative test cases to assess accuracy, relevance, and safety. We plan to construct a dedicated evaluation dataset and implement automated testing and scoring pipelines to enable repeatable benchmarking, regression testing, and continuous performance monitoring as the system evolves.
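A minimal sketch of what the planned automated scoring pipeline could look like, assuming the human-validated labels are collected as gold risk levels (the labels and numbers below are toy data, not real results):

```python
from collections import Counter

def score_predictions(gold: list[str], pred: list[str]) -> dict:
    """Score predicted risk labels against human-validated gold labels.

    A minimal sketch of an automated scoring pipeline; a real harness
    would also track per-topic accuracy and regressions across releases.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    correct = sum(g == p for g, p in zip(gold, pred))
    confusion = Counter(zip(gold, pred))  # counts of (gold, pred) pairs
    return {"accuracy": correct / len(gold), "confusion": confusion}

# Toy run over four cases (illustrative labels only).
report = score_predictions(
    gold=["High", "Low", "Medium", "High"],
    pred=["High", "Low", "High", "High"],
)
print(f"accuracy = {report['accuracy']:.2f}")  # 3 of 4 correct
```

Because RAHGIR's outputs are structured rather than free-form, this kind of exact-match scoring is feasible without any answer parsing, which is what makes repeatable regression testing practical.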

Bedrock model response
VLM response showing structured risk assessment in Bedrock

Contextual Understanding

RAHGIR demonstrates the ability to:

  • Recognize road-relevant scenes: Correctly identifies work zones with appropriate risk levels
  • Reject non-road scenes: Recognizes when an image doesn't depict a road environment
  • Adapt to transportation contexts: Understands railroad tracks as a transportation risk domain
  • Provide nuanced assessments: Distinguishes between interaction risks and work zone risks with separate justifications

AWS Guardrails

RAHGIR implements comprehensive AWS Bedrock Guardrails to ensure safe and appropriate system behavior:

  • Content filtering: Blocks toxic, harmful, or inappropriate text and image inputs
  • Input validation: Ensures queries and images meet safety and quality standards
  • Output moderation: Validates VLM responses before returning to users
  • Policy enforcement: Maintains domain-specific constraints on reasoning and outputs
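For illustration, a guardrail of this kind could be configured with a request like the one below. The field and filter names follow the Bedrock CreateGuardrail API as we understand it; verify them against current AWS documentation before use, and note that no AWS call is made here:

```python
# Request parameters for creating a Bedrock guardrail. Field and filter
# names follow the CreateGuardrail API as we understand it; check the
# current AWS documentation before relying on them.
guardrail_request = {
    "name": "rahgir-guardrail",
    "description": "Blocks harmful inputs/outputs for the RAHGIR pipeline",
    "contentPolicyConfig": {
        "filtersConfig": [
            # One filter per harm category, applied to input and output.
            {"type": t, "inputStrength": "HIGH", "outputStrength": "HIGH"}
            for t in ("HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT")
        ]
    },
    "blockedInputMessaging": "This request is outside RAHGIR's road-safety scope.",
    "blockedOutputsMessaging": "The generated response was blocked by policy.",
}
# A deployment would pass this to the Bedrock control-plane client, e.g.:
#   boto3.client("bedrock").create_guardrail(**guardrail_request)
```

The blocked-input and blocked-output messages cover the two moderation directions listed above: user inputs are screened before the model runs, and model responses are screened before they reach the user.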

AWS Architecture

RAHGIR is deployed as a serverless application on AWS, leveraging multiple managed services for scalability, reliability, and cost-efficiency:

Core Services

  • Amazon Bedrock: Hosts Claude 3 Haiku, used both for the text-only query-relevance analysis and for the image-based risk assessment
  • AWS Lambda: Executes the two-stage analysis pipeline with automatic scaling and pay-per-use pricing
  • Amazon API Gateway: Provides RESTful API endpoint for client requests with request validation and throttling
  • Amazon S3: Hosts the static web interface with public read access
  • Amazon CloudFront: Content delivery network (CDN) with global edge locations for low-latency access

Benefits of This Architecture

  • Serverless: No infrastructure management, automatic scaling, pay-per-use pricing
  • Cost-efficient: Two-stage filtering prevents expensive VLM calls for irrelevant queries
  • Secure: Bedrock Guardrails, API Gateway throttling, CloudFront HTTPS
  • Scalable: Handles traffic spikes automatically without manual intervention
  • Global: CloudFront CDN provides low-latency access worldwide
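From a client's point of view, calling this stack reduces to POSTing a JSON body to the API Gateway endpoint. The field names ("image", "question") and base64 transport below are assumptions about the API contract, shown only to make the shape of a request concrete:

```python
import base64
import json

def build_request(image_bytes: bytes, question: str) -> str:
    """Build the JSON body a client would POST to the API Gateway endpoint.

    The field names and base64 encoding are assumptions about the API
    contract, not the documented interface.
    """
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    })

body = build_request(b"\x89PNG...", "Is the work zone ahead a risk?")
# POST this body to the CloudFront/API Gateway endpoint with
# Content-Type: application/json; Lambda decodes the image and runs
# the two-stage pipeline.
```

Base64 is the usual way to carry binary image data through a JSON REST API, at the cost of roughly a 33% size overhead per request.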

Contact

  • Address

    Miner School of Computer & Information Sciences,
    1 University Avenue, Lowell MA 01854
    United States