
Google's Gemini Signals Start of Multimodal AI Revolution Beyond Text Bots

Original Article by:
Will Knight
Published on:
November 6, 2024

Google’s Gemini Signals the Real Start of the Generative AI Boom

Google recently released Gemini, a powerful new AI model that represents a major advance in generative AI. This launch signals that the current boom in AI is just getting started, as companies like Google and OpenAI work on radical new approaches beyond simply scaling up existing models.

Gemini Goes Beyond Text to Images, Audio and Video

Gemini is a “multimodal” AI model, meaning it learns from more than text: it is also trained on images, audio and video. This allows Gemini to understand the physical world better than text-based models like ChatGPT.

While ChatGPT shows how impressive AI can be with enough text data, there’s only so much these models can learn about reality from text alone. Gemini points towards AI that can perceive the world more like humans do.

Fresh Competition for ChatGPT and GPT-4

Gemini is fresh competition for ChatGPT, which took the world by storm when it launched in November 2022. It showed surprising versatility at tasks like writing poetry and answering coding problems.

OpenAI has continued advancing with GPT-4, but Gemini demonstrates that Google aims to push AI even further. Although they are competitors, both companies agree that new approaches beyond giant text models are needed to make significant progress.

Radical Ideas Needed to Move Beyond "Era of Giant Models"

OpenAI's CEO Sam Altman declared recently "we're at the end of the era where it's going to be these, like, giant, giant models."

Google's Gemini shows the company pursuing a radically different kind of AI, one that combines modalities like vision and language. OpenAI's mysterious Project Q* likewise suggests the company is going beyond merely scaling up GPT-4.

The last year has been huge for AI, but moves by industry leaders suggest the boom is just beginning. As they explore ideas like multimodal learning, 2023 and beyond may take AI in radically new directions.

Hot Take

Google's Gemini launch makes clear that the AI revolution is still in its early days. Companies are rapidly innovating to take AI beyond today's oversized text bots into new modalities like computer vision and audio processing. This next wave of AI could soon power technologies that understand and interact with the world on a far more human level.
