The landscape of large language models has evolved dramatically since 2018, with parameter counts ranging from tens of millions to, by some estimates, trillions. The data compiled here shows a clear pattern in model transparency: early models typically shipped with officially confirmed parameter counts, while newer models increasingly keep this information proprietary.
Transparency shifts across the industry. Companies like OpenAI, Google, and Anthropic have moved from openly sharing parameter counts in their early models to keeping this information confidential for competitive reasons. OpenAI disclosed full details for GPT-1 through GPT-3 but has remained silent on GPT-4 and newer models. Similarly, Google openly shared PaLM's 540B parameters but keeps Gemini specifications under wraps.
The rise of efficient architectures. Many companies now use Mixture of Experts (MoE) architectures that activate only a fraction of total parameters per token. DeepSeek-V3, for instance, has 671B total parameters but uses only 37B per token, achieving remarkable efficiency. This architectural innovation allows models to scale without proportionally increasing inference costs.
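To make the total-versus-active distinction concrete, the sketch below shows a minimal top-k expert-routing layer in PyTorch. It is purely illustrative: the dimensions, expert count, and `top_k` value are placeholder choices, not the configuration of DeepSeek-V3 or any other model mentioned here.

```python
# Illustrative Mixture-of-Experts layer with top-k routing.
# All sizes below are arbitrary placeholders, not any real model's configuration.
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                          # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # each token picks top_k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out


layer = TopKMoE()
y = layer(torch.randn(4, 512))  # each of the 4 tokens passes through only 2 of the 8 experts

total = sum(p.numel() for p in layer.parameters())
active = sum(p.numel() for p in layer.router.parameters()) + \
    layer.top_k * sum(p.numel() for p in layer.experts[0].parameters())
print(f"total parameters: {total:,}, active per token: {active:,}")
```

Every expert's weights must be stored, but each token only pays the compute cost of the experts it is routed to, which is why total and active parameter counts diverge so sharply in the table below.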
Parameter count isn't everything. Google's PaLM 2 demonstrates this principle, achieving better performance than its 540B-parameter predecessor with only 340B parameters. Companies increasingly focus on training efficiency, data quality, and architectural innovations rather than raw parameter count.
| Company | Model | Parameters | Release Date | Architecture | Source |
|---|---|---|---|---|---|
| OpenAI | GPT-1 | 117M | June 2018 | Transformer | Official paper |
| | GPT-2 Small | 124M | Feb 2019 | Transformer | GitHub repo |
| | GPT-2 Medium | 355M | Feb 2019 | Transformer | GitHub repo |
| | GPT-2 Large | 774M | Feb 2019 | Transformer | GitHub repo |
| | GPT-2 XL | 1.5B | Feb 2019 | Transformer | GitHub repo |
| | GPT-3 Ada | ~350M | May 2020 | Transformer | EleutherAI analysis |
| | GPT-3 Babbage | ~1.3B | May 2020 | Transformer | EleutherAI analysis |
| | GPT-3 Curie | ~6.7B | May 2020 | Transformer | EleutherAI analysis |
| | GPT-3 Davinci | 175B | May 2020 | Transformer | Official paper |
| Meta | Llama 1 (7B) | 7B | Feb 2023 | Transformer | Official release |
| | Llama 1 (13B) | 13B | Feb 2023 | Transformer | Official release |
| | Llama 1 (30B) | 30B | Feb 2023 | Transformer | Official release |
| | Llama 1 (65B) | 65B | Feb 2023 | Transformer | Official release |
| | Llama 2 (7B) | 7B | July 2023 | Transformer | Official release |
| | Llama 2 (13B) | 13B | July 2023 | Transformer | Official release |
| | Llama 2 (70B) | 70B | July 2023 | Transformer | Official release |
| | Code Llama (all sizes) | 7B, 13B, 34B, 70B | Aug 2023-Jan 2024 | Transformer | Official release |
| | Llama 3 (8B, 70B) | 8B, 70B | April 2024 | Transformer | Official release |
| | Llama 3.1 (8B, 70B, 405B) | 8B, 70B, 405B | July 2024 | Transformer | Official release |
| | Llama 3.2 | 1B, 3B, 11B, 90B | Sept 2024 | Transformer/Vision | Official release |
| | Llama 3.3 | 70B | Dec 2024 | Transformer | Official release |
| | Llama 4 Scout | 109B total, 17B active | April 2025 | MoE | Official release |
| | Llama 4 Maverick | 400B total, 17B active | April 2025 | MoE | Official release |
| Google | T5 (all sizes) | 77M to 11B | 2019 | Transformer | Official paper |
| | LaMDA | 137B | 2021 | Transformer | Official blog |
| | PaLM | 540B | April 2022 | Transformer | Official blog |
| | PaLM 2 | 340B | May 2023 | Transformer | CNBC report |
| | Gemini Nano | 1.8B, 3.25B | Dec 2023 | Transformer | Technical docs |
| DeepSeek | DeepSeek-LLM | 7B, 67B | Dec 2023 | Transformer | GitHub repo |
| | DeepSeek-Coder | 1.3B, 6.7B, 33B | Nov 2023 | Transformer | arXiv paper |
| | DeepSeek-V2 | 236B total, 21B active | May 2024 | MoE | arXiv paper |
| | DeepSeek-V3 | 671B total, 37B active | Dec 2024 | MoE | arXiv paper |
| | DeepSeek-R1 | 671B total, 37B active | Jan 2025 | MoE | GitHub repo |
| xAI | Grok-1 | 314B | Nov 2023 | MoE (8 experts) | GitHub repo |
| Mistral | Mistral 7B | 7.3B | Sept 2023 | Transformer | Official blog |
| | Mixtral 8x7B | 46.7B total, 12.9B active | Dec 2023 | MoE | Official blog |
| | Mixtral 8x22B | 141B total, 39B active | April 2024 | MoE | Official blog |
| | Codestral | 22B | May 2024 | Transformer | Official docs |
| Others | AI21 Jurassic-1 | 7B, 178B | Aug 2021 | Transformer | Official blog |
| | Inflection-2 | 175B | Nov 2023 | Transformer | Company announcement |
| | Cohere Command R+ | 104B | April 2024 | Transformer | HuggingFace |
| | Alibaba Qwen series | 0.5B to 235B | 2023-2024 | Transformer/MoE | GitHub repos |
| | 01.AI Yi series | 6B, 9B, 34B | Nov 2023 | Transformer | HuggingFace |
| | Stability StableLM | 1.6B to 7B | 2023-2024 | Transformer | GitHub repos |
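For the open-weight models in the table above, the reported figures can be checked directly against the released checkpoints. A minimal sketch using the Hugging Face transformers library (assuming it is installed; GPT-2 Small is used here simply because it is quick to download):

```python
# Count the parameters of an open-weight checkpoint to verify a reported figure.
# Requires: pip install torch transformers
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # GPT-2 Small
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 124M, matching the table above
```

The next table collects third-party estimates for models whose parameter counts have never been officially disclosed; these figures should be read as informed guesses rather than confirmed specifications.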
| Company | Model | Estimated Parameters | Release Date | Source of Estimate |
|---|---|---|---|---|
| Anthropic | Claude 3 Haiku | ~20B | March 2024 | Alan D. Thompson (AI researcher) |
| | Claude 3 Sonnet | ~70B | March 2024 | Alan D. Thompson |
| | Claude 3 Opus | ~2T | March 2024 | Alan D. Thompson |
| | Claude 3.5 Sonnet | >175B | June 2024 | Third-party analysis |
| OpenAI | GPT-3.5-Turbo | ~20B | Nov 2022 | Speculation based on performance |
| | GPT-4 | ~1.7T total | March 2023 | Industry estimates |
| | GPT-4o | ~200B | May 2024 | Third-party analysis |
| | GPT-4o Mini | ~8B | July 2024 | Third-party analysis |
| | o1-preview | ~300B | Sept 2024 | Third-party estimates |
| | o1-mini | ~100B | Sept 2024 | Third-party estimates |
| Google | Gemini Ultra | 30-65T (speculated) | Dec 2023 | Unconfirmed rumors |
Beyond these estimates, several of the newest frontier models have no reliable public parameter-count information at all.
DeepSeek pioneered cost-efficient MoE architectures, with DeepSeek-V3 reported to reach GPT-4-level performance at roughly one-twentieth of GPT-4's estimated training cost. Its 671B-parameter model activates only 37B parameters per token, demonstrating how architectural innovation can dramatically improve efficiency.
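A back-of-the-envelope check on the active fraction and per-token compute, using the common ~2 × N FLOPs-per-parameter-per-token approximation for a forward pass (an approximation chosen here for illustration, not DeepSeek's published accounting):

```python
# Back-of-the-envelope: DeepSeek-V3 active fraction and forward-pass FLOPs per token.
# The 2 * N FLOPs/token rule of thumb is an approximation, not an official figure.
total_params = 671e9
active_params = 37e9

print(f"active fraction: {active_params / total_params:.1%}")      # ~5.5%
print(f"MoE forward FLOPs/token:      {2 * active_params:.2e}")    # ~7.4e10
print(f"dense-equivalent FLOPs/token: {2 * total_params:.2e}")     # ~1.3e12
```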
While early models like GPT-2 supported a context window of only 1,024 tokens, modern models reach unprecedented context lengths. Llama 4 Scout supports a 10-million-token context window, while DeepSeek and Gemini models commonly support 128K tokens or more. This expansion enables entirely new use cases, such as reasoning over whole codebases or book-length documents in a single prompt.
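To give a sense of what such context lengths imply for serving, here is a rough KV-cache memory estimate. The layer count, head configuration, and precision below are generic placeholder values for a large transformer, not any specific model's published architecture:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value * tokens.
# All architecture numbers are illustrative placeholders.
def kv_cache_bytes(tokens, layers=60, kv_heads=8, head_dim=128, bytes_per_value=2):
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

for tokens in (1_024, 128_000, 10_000_000):
    print(f"{tokens:>10,} tokens -> ~{kv_cache_bytes(tokens) / 2**30:,.1f} GiB of KV cache")
```

Under these placeholder assumptions, a 10-million-token context implies a KV cache in the terabyte range, which is why long-context serving leans heavily on attention and cache optimizations.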
The latest generation includes native multimodal capabilities. Llama 3.2 introduced vision models with 11B and 90B parameters, while Llama 4 builds multimodality into its core architecture. This trend reflects the industry's move beyond text-only models.
Training costs vary dramatically. DeepSeek-V3's 671B parameter model cost approximately $5.6 million to train using 2.788M H800 GPU hours, while estimates suggest GPT-4 cost over $100 million. xAI's Grok-3 reportedly used 200,000 GPUs with an estimated $6-8 billion training cost, highlighting the resource intensity of frontier model development.
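The DeepSeek figures are internally consistent, which is easy to sanity-check; the per-GPU-hour rate below is simply derived from the two reported numbers, not an independently sourced rental price:

```python
# Sanity check: reported H800 GPU-hours vs. reported training cost for DeepSeek-V3.
gpu_hours = 2.788e6      # reported H800 GPU-hours
reported_cost = 5.6e6    # reported training cost, USD

rate = reported_cost / gpu_hours
print(f"implied rate: ${rate:.2f} per GPU-hour")                       # about $2/hour
print(f"2.788M GPU-hours x ${rate:.2f}/h = ${gpu_hours * rate / 1e6:.1f}M")
```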
The parameter count data reveals three critical industry trends. First, strategic opacity has become the norm, with leading companies treating model specifications as trade secrets. Second, efficiency over scale drives innovation, as companies achieve better performance with fewer parameters through architectural improvements. Third, open-source momentum continues, with Meta's Llama series and DeepSeek's models providing transparent alternatives to proprietary systems.
This comprehensive analysis demonstrates that while parameter counts provide useful benchmarks, they represent just one dimension of model capability. The future of AI development lies not in raw parameter scaling but in architectural innovation, training efficiency, and multimodal integration.