October 2025

NGen 3.1 Pro Multimodal

Overview.

NGen 3.1 Pro is a multimodal model that processes and reasons over both text and visual input. It is built on a transformer architecture with specialized multimodal fusion layers.

Model Performance Benchmarks.

Benchmark	NGen3.1-Pro	GPT-4o	Claude 3.5
College-level Problems
MMMU	57.1	70.3	70.4
MMMU Pro	34.6	54.5	54.7
Document and Diagrams Reading
DocVQA	95.0	91.1	95.2
InfoVQA	80.7	80.7	74.3
OCRBench V2	58.3	46.5	45.2
General Visual Question Answering
MMStar	57.6	64.7	65.1
MMBench 1.1	71.6	82.1	83.4
Math
MathVista	68.1	63.8	65.4
MathVision	20.6	30.4	38.3
Video Understanding
VideoMME	71.8	71.9	60.0
LVBench	63.9	30.8	–
Visual Agent
AITZ	84.4	35.3	–
ScreenSpot	65.1	18.1	83.0
ScreenSpot Pro	43.9	17.1	–
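
The table above can be summarized per category with a few lines of Python. This is a minimal sketch, not the method behind any published chart: it assumes an unweighted mean over the benchmarks in each category and skips missing entries (shown as "–" in the table). The `SCORES` layout and the `category_averages` helper are hypothetical names introduced for illustration.

```python
from statistics import mean

MODELS = ("NGen3.1-Pro", "GPT-4o", "Claude 3.5")

# Scores copied from the benchmark table; None marks a missing entry.
SCORES = {
    "College-level Problems": {
        "MMMU": (57.1, 70.3, 70.4),
        "MMMU Pro": (34.6, 54.5, 54.7),
    },
    "Document and Diagrams Reading": {
        "DocVQA": (95.0, 91.1, 95.2),
        "InfoVQA": (80.7, 80.7, 74.3),
        "OCRBench V2": (58.3, 46.5, 45.2),
    },
    "General Visual Question Answering": {
        "MMStar": (57.6, 64.7, 65.1),
        "MMBench 1.1": (71.6, 82.1, 83.4),
    },
    "Math": {
        "MathVista": (68.1, 63.8, 65.4),
        "MathVision": (20.6, 30.4, 38.3),
    },
    "Video Understanding": {
        "VideoMME": (71.8, 71.9, 60.0),
        "LVBench": (63.9, 30.8, None),
    },
    "Visual Agent": {
        "AITZ": (84.4, 35.3, None),
        "ScreenSpot": (65.1, 18.1, 83.0),
        "ScreenSpot Pro": (43.9, 17.1, None),
    },
}


def category_averages(scores):
    """Return {category: {model: mean score}}, ignoring missing entries."""
    result = {}
    for category, benchmarks in scores.items():
        row = {}
        for i, model in enumerate(MODELS):
            values = [v[i] for v in benchmarks.values() if v[i] is not None]
            # Assumption: plain unweighted mean, rounded to two decimals.
            row[model] = round(mean(values), 2) if values else None
        result[category] = row
    return result


if __name__ == "__main__":
    for category, row in category_averages(SCORES).items():
        print(category, row)
```

Because missing entries are skipped, an average built from a single benchmark (for example Claude 3.5 under Visual Agent) is not directly comparable to one built from three.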

Average Performance by Category.

[Chart: average score per category (College-level Problems, Document and Diagrams Reading, General Visual Question Answering, Math, Video Understanding, Visual Agent) for NGen3.1-Pro, GPT-4o, Claude 3.5, and Gemini-2 Flash.]

Key Capabilities.

Multimodal understanding across text and images
Document intelligence and OCR (95.0 on DocVQA, 58.3 on OCRBench V2)
Visual agent tasks (84.4 on AITZ, 43.9 on ScreenSpot Pro)
Video understanding and analysis (63.9 on LVBench)

Pricing.

₹0.42 per 1K input tokens, ₹0.58 per 1K output tokens, plus ₹1.66 per image
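
To make the pricing concrete, here is a rough cost estimate in Python. It assumes the rates are read as ₹0.42 per 1K input tokens, ₹0.58 per 1K output tokens, and a flat ₹1.66 per image; that reading of the pricing line is an assumption, and `estimate_cost` is a hypothetical helper, not part of any official SDK.

```python
# Rates in INR, taken from the pricing line above.
INPUT_PER_1K = 0.42   # per 1,000 input tokens
OUTPUT_PER_1K = 0.58  # per 1,000 output tokens
PER_IMAGE = 1.66      # assumed flat charge per image


def estimate_cost(input_tokens: int, output_tokens: int, images: int = 0) -> float:
    """Estimate the cost of one request in INR under the assumptions above."""
    token_cost = (
        (input_tokens / 1000) * INPUT_PER_1K
        + (output_tokens / 1000) * OUTPUT_PER_1K
    )
    return round(token_cost + images * PER_IMAGE, 4)


if __name__ == "__main__":
    # 2,000 input tokens, 500 output tokens, one image.
    print(estimate_cost(2000, 500, images=1))
```

Under these assumptions, a request with 2,000 input tokens, 500 output tokens, and one image comes to 0.84 + 0.29 + 1.66 = ₹2.79.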
