October 2025
NGen 3.1 Pro Multimodal

Overview.
NGen 3.1 Pro is a multimodal model that processes and understands both textual and visual information. It is built on a transformer architecture with specialized multimodal fusion layers that combine the two modalities.
Model Performance Benchmarks.
| Benchmark | NGen3.1-Pro | GPT-4o | Claude 3.5 |
|---|---|---|---|
| **College-level Problems** | | | |
| MMMU | 57.1 | 70.3 | 70.4 |
| MMMU Pro | 34.6 | 54.5 | 54.7 |
| **Document and Diagrams Reading** | | | |
| DocVQA | 95.0 | 91.1 | 95.2 |
| InfoVQA | 80.7 | 80.7 | 74.3 |
| OCRBench V2 | 58.3 | 46.5 | 45.2 |
| **General Visual Question Answering** | | | |
| MMStar | 57.6 | 64.7 | 65.1 |
| MMBench 1.1 | 71.6 | 82.1 | 83.4 |
| **Math** | | | |
| MathVista | 68.1 | 63.8 | 65.4 |
| MathVision | 20.6 | 30.4 | 38.3 |
| **Video Understanding** | | | |
| VideoMME | 71.8 | 71.9 | 60.0 |
| LVBench | 63.9 | 30.8 | – |
| **Visual Agent** | | | |
| AITZ | 84.4 | 35.3 | – |
| ScreenSpot | 65.1 | 18.1 | 83.0 |
| ScreenSpot Pro | 43.9 | 17.1 | – |
Average Performance by Category.
| Category | NGen3.1-Pro | GPT-4o | Claude 3.5 | Gemini-2 Flash |
|---|---|---|---|---|
| College-level Problems | 47 | 62 | 62 | 63 |
| Document and Diagrams Reading | 81 | 76 | 72 | 79 |
| General Visual Question Answering | 54 | 67 | 67 | 69 |
| Math | 52 | 53 | 55 | 57 |
| Video Understanding | 43 | 31 | 34 | – |
| Visual Agent | 65 | 83 | 9 | 44 |
Key Capabilities.
- True multimodal understanding with text and image processing
- Advanced document intelligence and OCR capabilities
- Strong performance on visual agent tasks (84.4 on AITZ, 43.9 on ScreenSpot Pro)
- Video understanding and analysis
Pricing.
₹0.42 per 1K input tokens, ₹0.58 per 1K output tokens, plus ₹1.66 per image
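
As a rough illustration of how these rates combine, the sketch below estimates the cost of a single request. It assumes the token rates apply per 1,000 tokens and the image fee is a flat charge per image; the function name `estimate_cost_inr` and its parameters are hypothetical and not part of any published NGen API.

```python
from decimal import Decimal

# Rates copied from the pricing line. The reading "per 1K tokens for text,
# flat fee per image" is an assumption, not a confirmed billing rule.
INPUT_RATE_PER_1K = Decimal("0.42")   # INR per 1,000 input tokens
OUTPUT_RATE_PER_1K = Decimal("0.58")  # INR per 1,000 output tokens
IMAGE_RATE = Decimal("1.66")          # INR per image


def estimate_cost_inr(input_tokens: int, output_tokens: int, images: int = 0) -> Decimal:
    """Hypothetical helper: estimate the cost of one request in rupees."""
    if min(input_tokens, output_tokens, images) < 0:
        raise ValueError("counts must be non-negative")
    # Token charges scale linearly with usage, billed per 1,000 tokens.
    token_cost = (
        Decimal(input_tokens) * INPUT_RATE_PER_1K
        + Decimal(output_tokens) * OUTPUT_RATE_PER_1K
    ) / 1000
    # Image charges are added as a flat fee per image.
    return token_cost + Decimal(images) * IMAGE_RATE


if __name__ == "__main__":
    # 2,000 input tokens, 500 output tokens, one image:
    # 0.84 + 0.29 + 1.66 = 2.79
    print(estimate_cost_inr(2000, 500, images=1))
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when totalled over many requests.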