AI Papers Podcast

By: PocketPod
  • Summary

  • A daily update on the latest AI research papers. We provide a high-level overview of a handful of papers each day and link all papers in the description for further reading. This podcast is created entirely with AI by PocketPod. Head over to https://pocketpod.app to learn more.
Episodes
  • Improving Agent Design, JPEG-LM's Visual Breakthrough, TurboEdit's Real-Time Image Edits, Video Segmentation Advances, LLMs Learning Like Humans, RL Benchmarks
    Aug 21 2024
      • xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
      • JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
      • Automated Design of Agentic Systems
      • TurboEdit: Instant text-based image editing
      • Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
      • Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering
      • D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
    16 mins
  • Science & Clinical LLMs Leaps, Enhancing Small Model Reasoning, New Frontiers in Controlled Media Generation
    Aug 16 2024
      • The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
      • Med42-v2: A Suite of Clinical LLMs
      • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
      • ControlNeXt: Powerful and Efficient Control for Image and Video Generation
      • CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
      • FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
      • VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
    14 mins
  • Multimodal Benchmarks, Visual Task Transfer, and 3D Object Generation
    Aug 8 2024
      • MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
      • LLaVA-OneVision: Easy Visual Task Transfer
      • An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
      • MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
      • IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
      • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
      • Diffusion Models as Data Mining Tools
    14 mins
