[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url]
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-12 | MUSCLE: A Model Update Strategy for Compatible LLM Evolution | Jessica Echterhoff et.al. | 2407.09435v1 | null |
2024-07-12 | Transformer Layers as Painters | Qi Sun et.al. | 2407.09298v1 | null |
2024-07-12 | Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion | Linhan Xia et.al. | 2407.09157v1 | null |
2024-07-12 | Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control | Huayu Chen et.al. | 2407.09024v1 | null |
2024-07-12 | Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT | Jie Zheng et.al. | 2407.08961v1 | null |
2024-07-12 | Symmetry Awareness Encoded Deep Learning Framework for Brain Imaging Analysis | Yang Ma et.al. | 2407.08948v1 | link |
2024-07-11 | Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing | Huanqian Wang et.al. | 2407.08770v1 | link |
2024-07-11 | Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data | Cherie Ho et.al. | 2407.08726v1 | null |
2024-07-11 | A Taxonomy for Data Contamination in Large Language Models | Medha Palavalli et.al. | 2407.08716v1 | null |
2024-07-11 | Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | Anton Alexandrov et.al. | 2407.08699v1 | null |
2024-07-11 | Jet Tagging with More-Interaction Particle Transformer | Yifan Wu et.al. | 2407.08682v1 | null |
2024-07-11 | Emergent Visual-Semantic Hierarchies in Image-Text Representations | Morris Alper et.al. | 2407.08521v1 | null |
2024-07-11 | Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization | Jinlong Li et.al. | 2407.08374v1 | null |
2024-07-11 | E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors | Jinxiu Liang et.al. | 2407.08231v1 | null |
2024-07-11 | Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People | Zain Merchant et.al. | 2407.08219v1 | null |
2024-07-10 | Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models | Yuji Zhang et.al. | 2407.08039v1 | null |
2024-07-10 | Training on the Test Task Confounds Evaluation and Emergence | Ricardo Dominguez-Olmedo et.al. | 2407.07890v1 | link |
2024-07-10 | Learning Spatial-Semantic Features for Robust Video Object Segmentation | Xin Li et.al. | 2407.07760v1 | null |
2024-07-10 | VEnhancer: Generative Space-Time Enhancement for Video Generation | Jingwen He et.al. | 2407.07667v1 | null |
2024-07-10 | Machine Unlearning for Medical Imaging | Reza Nasirigerdeh et.al. | 2407.07539v1 | null |
2024-07-10 | IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection | Mingjin Zhang et.al. | 2407.07520v1 | link |
2024-07-10 | Bucket Pre-training is All You Need | Hongtao Liu et.al. | 2407.07495v1 | null |
2024-07-10 | Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining | Tianfang Sun et.al. | 2407.07465v1 | null |
2024-07-10 | Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification | Zhenyu Kuang et.al. | 2407.07351v1 | null |
2024-07-10 | Micro-Expression Recognition by Motion Feature Extraction based on Pre-training | Ruolin Li et.al. | 2407.07345v1 | null |
2024-07-10 | ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting | Luoxiao Yang et.al. | 2407.07311v1 | link |
2024-07-09 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma et.al. | 2407.07093v1 | link |
2024-07-09 | ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction | Shaozhe Hao et.al. | 2407.07077v1 | link |
2024-07-09 | CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM | Aditya Murali et.al. | 2407.06795v1 | null |
2024-07-09 | CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection | Shuang Hao et.al. | 2407.06780v1 | link |
2024-07-09 | Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions | Wenxin Zhou et.al. | 2407.06779v1 | null |
2024-07-09 | Pretraining-finetuning Framework for Efficient Co-design: A Case Study on Quadruped Robot Parkour | Ci Chen et.al. | 2407.06770v1 | null |
2024-07-09 | PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations | Zhanhong Ye et.al. | 2407.06664v1 | null |
2024-07-09 | Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition | Mingfang Zhang et.al. | 2407.06628v1 | null |
2024-07-09 | F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection | Chengyu Tao et.al. | 2407.06519v1 | null |
2024-07-09 | A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models | Gabriele Campanella et.al. | 2407.06508v1 | null |
2024-07-08 | 4D Contrastive Superflows are Dense 3D Representation Learners | Xiang Xu et.al. | 2407.06190v1 | link |
2024-07-08 | Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design | Boshen Zeng et.al. | 2407.06152v1 | null |
2024-07-08 | 3D Vision and Language Pretraining with Large-Scale Synthetic Data | Dejie Yang et.al. | 2407.06084v1 | link |
2024-07-08 | From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty | Maor Ivgi et.al. | 2407.06071v1 | link |
2024-07-08 | MST5 -- Multilingual Question Answering over Knowledge Graphs | Nikit Srivastava et.al. | 2407.06041v1 | link |
2024-07-08 | Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian | Tommaso Mario Buonocore et.al. | 2407.06011v1 | null |
2024-07-08 | Pseudo-triplet Guided Few-shot Composed Image Retrieval | Bohan Hou et.al. | 2407.06001v1 | null |
2024-07-08 | Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals | Moritz Reuss et.al. | 2407.05996v1 | null |
2024-07-08 | Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning | Bin Ren et.al. | 2407.05862v1 | null |
2024-07-08 | An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models | Nandini Mundra et.al. | 2407.05841v1 | link |
2024-07-05 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Rudolf Laine et.al. | 2407.04694v1 | null |
2024-07-05 | Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Bolaji Yusuf et.al. | 2407.04652v1 | link |
2024-07-05 | Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect | Salima Mdhaffar et.al. | 2407.04533v1 | null |
2024-07-05 | Few-Shot Airway-Tree Modeling using Data-Driven Sparse Priors | Ali Keshavarzi et.al. | 2407.04507v1 | null |
2024-07-05 | Using LLMs to label medical papers according to the CIViC evidence model | Markus Hisch et.al. | 2407.04466v1 | null |
2024-07-05 | Generalists vs. Specialists: Evaluating Large Language Models for Urdu | Samee Arif et.al. | 2407.04459v1 | null |
2024-07-05 | Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning | Saeed Shurrab et.al. | 2407.04449v1 | link |
2024-07-05 | XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models | Shashi Kumar et.al. | 2407.04439v1 | null |
2024-07-05 | Understanding the Role of Invariance in Transfer Learning | Till Speicher et.al. | 2407.04325v1 | link |
2024-07-05 | Smart Vision-Language Reasoners | Denisa Roberts et.al. | 2407.04212v1 | link |
2024-07-03 | STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data | Kheir Eddine Daouadi et.al. | 2407.03253v1 | null |
2024-07-03 | CATT: Character-based Arabic Tashkeel Transformer | Faris Alasmary et.al. | 2407.03236v1 | link |
2024-07-03 | SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | Weitai Kang et.al. | 2407.03200v1 | link |
2024-07-03 | On the Client Preference of LLM Fine-tuning in Federated Learning | Feijie Wu et.al. | 2407.03038v1 | null |
2024-07-03 | Strategies for Arabic Readability Modeling | Juan Piñeros Liberato et.al. | 2407.03032v1 | null |
2024-07-03 | Exploiting Dialect Identification in Automatic Dialectal Text Normalization | Bashar Alhafni et.al. | 2407.03020v1 | null |
2024-07-03 | Large language models, physics-based modeling, experimental measurements: the trinity of data-scarce learning of polymer properties | Ning Liu et.al. | 2407.02770v1 | null |
2024-07-02 | Magic Insert: Style-Aware Drag-and-Drop | Nataniel Ruiz et.al. | 2407.02489v1 | null |
2024-07-02 | GCF: Graph Convolutional Networks for Facial Expression Recognition | Hozaifa Kassab et.al. | 2407.02361v1 | null |
2024-07-02 | Parameter-Selective Continual Test-Time Adaptation | Jiaxu Tian et.al. | 2407.02253v1 | null |
2024-07-02 | MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations | Akash Dutta et.al. | 2407.02238v1 | null |
2024-07-02 | Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale | Wenzhen Zheng et.al. | 2407.02118v1 | null |
2024-07-02 | DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection | Kaixin Xu et.al. | 2407.02098v1 | null |
2024-07-02 | ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation | Zhiyuan Ma et.al. | 2407.02040v1 | link |
2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014v1 | link |
2024-07-02 | Unleash the Power of Local Representations for Few-Shot Classification | Shi Tang et.al. | 2407.01967v1 | null |
2024-07-02 | Text-Aware Diffusion for Policy Learning | Calvin Luo et.al. | 2407.01903v1 | null |
2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098v1 | link |
2024-06-28 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Xiang Li et.al. | 2406.20095v1 | link |
2024-06-28 | BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5 | Zhehuai Chen et.al. | 2406.19954v1 | null |
2024-06-28 | Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment | Orgest Xhelili et.al. | 2406.19759v1 | null |
2024-06-28 | Deep Fusion Model for Brain Tumor Classification Using Fine-Grained Gradient Preservation | Niful Islam et.al. | 2406.19690v1 | null |
2024-06-28 | PopAlign: Population-Level Alignment for Fair Text-to-Image Generation | Shufan Li et.al. | 2406.19668v1 | link |
2024-06-27 | Subtractive Training for Music Stem Insertion using Latent Diffusion Models | Ivan Villa-Renteria et.al. | 2406.19328v1 | null |
2024-06-27 | SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images | Ming Chen et.al. | 2406.19055v1 | link |
2024-06-27 | Fine-tuned network relies on generic representation to solve unseen cognitive task | Dongyan Lin et.al. | 2406.18926v1 | null |
2024-06-27 | Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets | Melanie Walsh et.al. | 2406.18906v1 | null |
2024-06-27 | Learning Modality Knowledge Alignment for Cross-Modality Transfer | Wenxuan Ma et.al. | 2406.18864v1 | null |
2024-06-27 | LICO: Large Language Models for In-Context Molecular Optimization | Tung Nguyen et.al. | 2406.18851v1 | null |
2024-06-26 | Learn it or Leave it: Module Composition and Pruning for Continual Learning | Mingyang Wang et.al. | 2406.18708v1 | null |
2024-06-26 | Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer | Liming Wang et.al. | 2406.18625v1 | null |
2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505v1 | null |
2024-06-26 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Yuan Gao et.al. | 2406.18453v1 | link |
2024-06-27 | Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Lei Zhang et.al. | 2406.18294v2 | link |
2024-06-26 | Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks | I. R. Slootweg et.al. | 2406.18247v1 | null |
2024-06-26 | 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation | Shengyi Qian et.al. | 2406.18158v1 | null |
2024-06-26 | Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps | Dicong Qiu et.al. | 2406.18115v1 | null |
2024-06-26 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Meinardus Boris et.al. | 2406.18113v1 | link |
2024-06-26 | Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints | Ran Song et.al. | 2406.18085v1 | link |
2024-06-26 | Few-Shot Medical Image Segmentation with High-Fidelity Prototypes | Song Tang et.al. | 2406.18074v1 | link |
2024-06-27 | EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Baoqi Pei et.al. | 2406.18070v2 | null |
2024-06-25 | Data curation via joint example selection further accelerates multimodal learning | Talfan Evans et.al. | 2406.17711v1 | null |
2024-06-25 | This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach | Lukas Christ et.al. | 2406.17667v1 | null |
2024-06-25 | Transformer-based segmentation of adnexal lesions and ovarian implants in CT images | Aneesh Rangnekar et.al. | 2406.17666v1 | null |
2024-06-25 | Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | Aashiq Muhamed et.al. | 2406.17660v1 | link |
2024-06-26 | Minimal Interaction Edge Tuning: A New Paradigm for Visual Adaptation | Ningyuan Tang et.al. | 2406.17559v2 | null |
2024-06-25 | The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale | Guilherme Penedo et.al. | 2406.17557v1 | null |
2024-06-25 | Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification | Huiyao Chen et.al. | 2406.17534v1 | null |
2024-06-25 | Towards Federated Low-Rank Adaptation with Rank-Heterogeneous Communication | Yuji Byun et.al. | 2406.17477v1 | null |
2024-06-25 | Investigating Self-Supervised Methods for Label-Efficient Learning | Srinivasa Rao Nandam et.al. | 2406.17460v1 | null |
2024-06-25 | Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance | Manon Reusens et.al. | 2406.17385v1 | null |
2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Junbang Liang et.al. | 2406.16862v1 | null |
2024-06-24 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et.al. | 2406.16758v1 | null |
2024-06-24 | Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling | Min-Seop Kwak et.al. | 2406.16695v1 | null |
2024-06-24 | Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation | Markus Frohmann et.al. | 2406.16678v1 | null |
2024-06-24 | CAVE: Controllable Authorship Verification Explanations | Sahana Ramnath et.al. | 2406.16672v1 | link |
2024-06-24 | DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | Aiwen Jiang et.al. | 2406.16477v1 | null |
2024-06-24 | Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation | Yuchen Yang et.al. | 2406.16282v1 | link |
2024-06-24 | Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation | Xueyu Liu et.al. | 2406.16271v1 | null |
2024-06-23 | Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking | Yuwei Zhang et.al. | 2406.16148v1 | link |
2024-06-23 | Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models | Lynn Chua et.al. | 2406.16135v1 | link |
2024-06-21 | Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | Brandon Huang et.al. | 2406.15334v1 | null |
2024-06-21 | GiusBERTo: A Legal Language Model for Personal Data De-identification in Italian Court of Auditors Decisions | Giulio Salierno et.al. | 2406.15032v1 | null |
2024-06-21 | Uni-Mol2: Exploring Molecular Pretraining Model at Scale | Xiaohong Ji et.al. | 2406.14969v1 | null |
2024-06-21 | ICLEval: Evaluating In-Context Learning Ability of Large Language Models | Wentong Chen et.al. | 2406.14955v1 | link |
2024-06-21 | 70B-parameter large language models in Japanese medical question-answering | Issey Sukeda et.al. | 2406.14882v1 | null |
2024-06-20 | Understanding Finetuning for Factual Knowledge Extraction | Gaurav Ghosal et.al. | 2406.14785v1 | null |
2024-06-20 | Factual Dialogue Summarization via Learning from Large Language Models | Rongxin Zhu et.al. | 2406.14709v1 | null |
2024-06-20 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Sachit Menon et.al. | 2406.14562v1 | null |
2024-06-20 | V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Rotem Shalev-Arkushin et.al. | 2406.14510v1 | null |
2024-06-20 | Data-Centric AI in the Age of Large Language Models | Xinyi Xu et.al. | 2406.14473v1 | null |
2024-06-20 | Decoding Vocal Articulations from Acoustic Latent Representations | Mateo Cámara et.al. | 2406.14379v1 | null |
2024-06-20 | Infusing clinical knowledge into tokenisers for language models | Abul Hasan et.al. | 2406.14312v1 | null |
2024-06-20 | On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations? | Rochelle Choenni et.al. | 2406.14267v1 | null |
2024-06-20 | Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs | Michail Chatzianastasis et.al. | 2406.14142v1 | null |
2024-06-20 | Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation | Yanwei Zheng et.al. | 2406.14103v1 | null |
2024-06-20 | Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models | Dohyun Lee et.al. | 2406.14091v1 | null |
2024-06-20 | Information Guided Regularization for Fine-tuning Language Models | Mandar Sharma et.al. | 2406.14005v1 | link |
2024-06-18 | GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping | Angel Daruna et.al. | 2406.12756v1 | null |
2024-06-18 | BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity | Zahra Gharaee et.al. | 2406.12723v1 | link |
2024-06-18 | GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Yongtao Ge et.al. | 2406.12671v1 | link |
2024-06-18 | News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | Andreea Iana et.al. | 2406.12634v1 | link |
2024-06-18 | From Instance Training to Instruction Learning: Task Adapters Generation from Instructions | Huanxuan Liao et.al. | 2406.12382v1 | null |
2024-06-18 | Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models | Minseok Choi et.al. | 2406.12354v1 | null |
2024-06-18 | JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning | Boyu Chen et.al. | 2406.12292v1 | null |
2024-06-18 | VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation | Yu-hsuan Chen et.al. | 2406.12286v1 | null |
2024-06-18 | LLMs Are Prone to Fallacies in Causal Inference | Nitish Joshi et.al. | 2406.12158v1 | null |
2024-06-17 | Efficient Sequential Decision Making with Large Language Models | Dingyang Chen et.al. | 2406.12125v1 | null |
2024-06-17 | Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations | Kazusato Oko et.al. | 2406.11828v1 | null |
2024-06-17 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? | Hoyeon Chang et.al. | 2406.11813v1 | null |
2024-06-17 | DataComp-LM: In search of the next generation of training sets for language models | Jeffrey Li et.al. | 2406.11794v1 | null |
2024-06-17 | A Brief Survey on Leveraging Large Scale Vision Models for Enhanced Robot Grasping | Abhi Kamboj et.al. | 2406.11786v1 | null |
2024-06-17 | Input Conditioned Graph Generation for Language Agents | Lukas Vierling et.al. | 2406.11555v1 | link |
2024-06-17 | BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM | Zhewen Shen et.al. | 2406.11418v1 | null |
2024-06-17 | CodeGemma: Open Code Models Based on Gemma | CodeGemma Team et.al. | 2406.11409v1 | null |
2024-06-17 | Preserving Knowledge in Large Language Model: A Model-Agnostic Self-Decompression Approach | Zilun Zhang et.al. | 2406.11354v1 | null |
2024-06-18 | BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models | Xuefeng Hu et.al. | 2406.11309v2 | null |
2024-06-17 | MiniConGTS: A Near Ultimate Minimalist Contrastive Grid Tagging Scheme for Aspect Sentiment Triplet Extraction | Qiao Sun et.al. | 2406.11234v1 | null |
2024-06-14 | Quantifying Variance in Evaluation Benchmarks | Lovish Madaan et.al. | 2406.10229v1 | null |
2024-06-14 | PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | Alex Hanson et.al. | 2406.10219v1 | null |
2024-06-14 | AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators | Jaden Pieper et.al. | 2406.10205v1 | null |
2024-06-14 | Improving rule mining via embedding-based link prediction | N'Dah Jean Kouagou et.al. | 2406.10144v1 | link |
2024-06-14 | Training-free Camera Control for Video Generation | Chen Hou et.al. | 2406.10126v1 | null |
2024-06-14 | Intepretative Deep Learning using Domain Adaptation for Fluorescence Spectroscopy | Umberto Michelucci et.al. | 2406.10031v1 | null |
2024-06-14 | Group and Shuffle: Efficient Structured Orthogonal Parametrization | Mikhail Gorbunov et.al. | 2406.10019v1 | null |
2024-06-14 | OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control | Yuzhong Huang et.al. | 2406.10000v1 | null |
2024-06-14 | TabularFM: An Open Framework For Tabular Foundational Models | Quan M. Tran et.al. | 2406.09837v1 | null |
2024-06-14 | HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning | Heejun Lee et.al. | 2406.09827v1 | null |
2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412v1 | link |
2024-06-13 | Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models | Lukas Thede et.al. | 2406.09384v1 | null |
2024-06-13 | Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations | Rylan Schaeffer et.al. | 2406.09366v1 | null |
2024-06-13 | End-to-end Streaming model for Low-Latency Speech Anonymization | Waris Quamer et.al. | 2406.09277v1 | null |
2024-06-13 | OpenVLA: An Open-Source Vision-Language-Action Model | Moo Jin Kim et.al. | 2406.09246v1 | null |
2024-06-13 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't | Chihiro Taguchi et.al. | 2406.09202v1 | null |
2024-06-13 | SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution | Soufiane Belharbi et.al. | 2406.09168v1 | link |
2024-06-13 | MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Hanqing Wang et.al. | 2406.09044v1 | null |
2024-06-13 | Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation | Lincan Cai et.al. | 2406.09003v1 | null |
2024-06-13 | Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning | Arnav Goel et.al. | 2406.08931v1 | link |
2024-06-12 | On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models | Hashmat Shadab Malik et.al. | 2406.08486v1 | link |
2024-06-12 | Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | Ting-Ji Huang et.al. | 2406.08477v1 | null |
2024-06-12 | Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models | Yuxuan Xue et.al. | 2406.08475v1 | null |
2024-06-12 | Strategies for Pretraining Neural Operators | Anthony Zhou et.al. | 2406.08473v1 | link |
2024-06-12 | PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences | Daiwei Chen et.al. | 2406.08469v1 | null |
2024-06-12 | The Impact of Initialization on LoRA Finetuning Dynamics | Soufiane Hayou et.al. | 2406.08447v1 | null |
2024-06-12 | State Soup: In-Context Skill Learning, Retrieval and Mixing | Maciej Pióro et.al. | 2406.08423v1 | null |
2024-06-12 | WMAdapter: Adding WaterMark Control to Latent Diffusion Models | Hai Ci et.al. | 2406.08337v1 | null |
2024-06-12 | Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation | Tsun-An Hsieh et.al. | 2406.08328v1 | null |
2024-06-12 | Is Programming by Example solved by LLMs? | Wen-Ding Li et.al. | 2406.08316v1 | null |
2024-06-11 | Autoregressive Pretraining with Mamba in Vision | Sucheng Ren et.al. | 2406.07537v1 | null |
2024-06-11 | CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Qingkai Fang et.al. | 2406.07330v1 | link |
2024-06-11 | Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Qingkai Fang et.al. | 2406.07289v1 | null |
2024-06-11 | ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks | Xin Jing et.al. | 2406.07203v1 | null |
2024-06-11 | Translating speech with just images | Dan Oneata et.al. | 2406.07133v1 | null |
2024-06-11 | Reading Miscue Detection in Primary School through Automatic Speech Recognition | Lingyun Gao et.al. | 2406.07060v1 | null |
2024-06-11 | Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models | Sooyeon Go et.al. | 2406.07008v1 | null |
2024-06-11 | UVIS: Unsupervised Video Instance Segmentation | Shuaiyi Huang et.al. | 2406.06908v1 | null |
2024-06-10 | BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification | June-Woo Kim et.al. | 2406.06786v1 | null |
2024-06-10 | Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network | Manvik Pasula et.al. | 2406.06703v1 | null |
2024-06-10 | Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation | Oishi Banerjee et.al. | 2406.06496v1 | null |
2024-06-10 | AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | Zhen Xing et.al. | 2406.06465v1 | null |
2024-06-10 | Foundation Inference Models for Markov Jump Processes | David Berghaus et.al. | 2406.06419v1 | null |
2024-06-10 | Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Florian Lux et.al. | 2406.06403v1 | link |
2024-06-10 | Towards Lifelong Learning of Large Language Models: A Survey | Junhao Zheng et.al. | 2406.06391v1 | link |
2024-06-10 | Low-Rank Quantization-Aware Training for LLMs | Yelysei Bondarenko et.al. | 2406.06385v1 | null |
2024-06-10 | Tx-LLM: A Large Language Model for Therapeutics | Juan Manuel Zambrano Chaves et.al. | 2406.06316v1 | null |
2024-06-10 | iMotion-LLM: Motion Prediction Instruction Tuning | Abdulwahab Felemban et.al. | 2406.06211v1 | null |
2024-06-10 | DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection | Donggeun Ko et.al. | 2406.06134v1 | null |
2024-06-10 | EXPIL: Explanatory Predicate Invention for Learning in Games | Jingyuan Sha et.al. | 2406.06107v1 | null |
2024-06-07 | Hibou: A Family of Foundational Vision Transformers for Pathology | Dmitry Nechaev et.al. | 2406.05074v1 | null |
2024-06-07 | Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning | Subhojyoti Mukherjee et.al. | 2406.05064v1 | null |
2024-06-07 | Scenarios and Approaches for Situated Natural Language Explanations | Pengshuo Qiu et.al. | 2406.05035v1 | null |
2024-06-07 | Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment | Venkanna Babu Guthula et.al. | 2406.04949v1 | null |
2024-06-07 | Stochastic full waveform inversion with deep generative prior for uncertainty quantification | Yuke Xie et.al. | 2406.04859v1 | null |
2024-06-07 | Uncertainty Aware Learning for Language Model Alignment | Yikun Wang et.al. | 2406.04854v1 | null |
2024-06-07 | Predicting Polymer Properties Based on Multimodal Multitask Pretraining | Fanmeng Wang et.al. | 2406.04727v1 | null |
2024-06-07 | Evaluating and Mitigating IP Infringement in Visual Generative AI | Zhenting Wang et.al. | 2406.04662v1 | link |
2024-06-07 | STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting | Zenghao Chai et.al. | 2406.04629v1 | link |
2024-06-07 | Camera-Pose Robust Crater Detection from Chang'e 5 | Matthew Rodda et.al. | 2406.04569v1 | null |
2024-06-06 | Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment | Jiayi Guo et.al. | 2406.04295v1 | link |
2024-06-06 | Solving Inverse Problems in Protein Space Using Diffusion-Based Priors | Axel Levy et.al. | 2406.04239v1 | null |
2024-06-06 | Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness | Guangliang Liu et.al. | 2406.04146v1 | null |
2024-06-06 | UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood Mapping | Jie Zhao et.al. | 2406.04111v1 | null |
2024-06-06 | Weight-based Decomposition: A Case for Bilinear MLPs | Michael T. Pearce et.al. | 2406.03947v1 | null |
2024-06-06 | BLSP-Emo: Towards Empathetic Large Speech-Language Models | Chen Wang et.al. | 2406.03872v1 | link |
2024-06-06 | MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition | Stefan Gerd Fritsch et.al. | 2406.03857v1 | null |
2024-06-07 | Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge | Nan Zhang et.al. | 2406.03799v2 | link |
2024-06-06 | Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining | Jinlong Xue et.al. | 2406.03714v1 | null |
2024-06-06 | Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jinlong Xue et.al. | 2406.03706v1 | null |
2024-06-05 | Does your data spark joy? Performance gains from domain upsampling at the end of training | Cody Blakeney et.al. | 2406.03476v1 | null |
2024-06-05 | LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Qiang Chen et.al. | 2406.03459v1 | link |
2024-06-05 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space | Mona Ahmadian et.al. | 2406.03447v1 | null |
2024-06-05 | Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input | Joachim Ott et.al. | 2406.03439v1 | null |
2024-06-05 | SuperFormer: Volumetric Transformer Architectures for MRI Super-Resolution | Cristhian Forigua et.al. | 2406.03359v1 | link |
2024-06-05 | Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need | Martin Wistuba et.al. | 2406.03216v1 | null |
2024-06-05 | Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models | Jerry Yao-Chieh Hu et.al. | 2406.03136v1 | null |
2024-06-05 | DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays | Bo Xia et.al. | 2406.03102v1 | null |
2024-06-05 | Population Transformer: Learning Population-level Representations of Intracranial Activity | Geeling Chau et.al. | 2406.03044v1 | null |
2024-06-05 | GraphAlign: Pretraining One Graph Neural Network on Multiple Graphs via Feature Alignment | Zhenyu Hou et.al. | 2406.02953v1 | null |
2024-06-04 | Landscape-Aware Growing: The Power of a Little LAG | Stefani Karp et.al. | 2406.02469v1 | null |
2024-06-04 | An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Scott C. Lowe et.al. | 2406.02465v1 | link |
2024-06-04 | CADE: Cosine Annealing Differential Evolution for Spiking Neural Network | Runhua Jiang et.al. | 2406.02349v1 | link |
2024-06-04 | Probing the Category of Verbal Aspect in Transformer Language Models | Anisia Katinskaia et.al. | 2406.02335v1 | null |
2024-06-04 | SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining | Andi Han et.al. | 2406.02214v1 | link |
2024-06-04 | Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations | Sarthak Yadav et.al. | 2406.02178v1 | null |
2024-06-05 | Multimodal Reasoning with Multimodal Knowledge Graph | Junlin Lee et.al. | 2406.02030v2 | null |
2024-06-04 | Zyda: A 1.3T Dataset for Open Language Modeling | Yury Tokpanov et.al. | 2406.01981v1 | null |
2024-06-04 | GOMAA-Geo: GOal Modality Agnostic Active Geo-localization | Anindya Sarkar et.al. | 2406.01917v1 | null |
2024-06-04 | ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization | Chen Mao et.al. | 2406.01906v1 | link |
2024-05-31 | Code Pretraining Improves Entity Tracking Abilities of Language Models | Najoung Kim et.al. | 2405.21068v1 | null |
2024-05-31 | Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models | Xinxi Zhang et.al. | 2405.21050v1 | null |
2024-05-31 | Improving Reward Models with Synthetic Critiques | Zihuiwen Ye et.al. | 2405.20850v1 | null |
2024-05-31 | Conditioning GAN Without Training Dataset | Kidist Amde Mekonnen et.al. | 2405.20687v1 | link |
2024-05-31 | Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization | Richard Luo et.al. | 2405.20648v1 | null |
2024-05-30 | Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | Zachary Ankner et.al. | 2405.20541v1 | null |
2024-05-30 | Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning | Xinlu Zhang et.al. | 2405.20535v1 | null |
2024-05-30 | Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining | Yi Wang et.al. | 2405.20462v1 | null |
2024-05-30 | Scalable Detection of Salient Entities in News Articles | Eliyar Asgarieh et.al. | 2405.20461v1 | null |
2024-05-30 | Enhancing Antibiotic Stewardship using a Natural Language Approach for Better Feature Representation | Simon A. Lee et.al. | 2405.20419v1 | null |
2024-05-31 | KerasCV and KerasNLP: Vision and Language Power-Ups | Matthew Watson et.al. | 2405.20247v2 | null |
2024-05-30 | Jina CLIP: Your CLIP Model Is Also Your Text Retriever | Andreas Koukounas et.al. | 2405.20204v1 | null |
2024-05-30 | Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks | Xiaoyu Wu et.al. | 2405.19931v1 | null |
2024-05-30 | From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems | Jianliang He et.al. | 2405.19883v1 | null |
2024-05-30 | Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian | Wei Sun et.al. | 2405.19657v1 | null |
2024-05-29 | CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning | Yiping Wang et.al. | 2405.19547v1 | null |
2024-05-29 | Posterior Sampling via Autoregressive Generation | Kelly W Zhang et.al. | 2405.19466v1 | null |
2024-05-29 | Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice | Jian-Qiao Zhu et.al. | 2405.19313v1 | null |
2024-05-29 | Poseidon: Efficient Foundation Models for PDEs | Maximilian Herde et.al. | 2405.19101v1 | link |
2024-05-29 | BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation | Chen Wang et.al. | 2405.19041v1 | null |
2024-05-29 | Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization | Zhiwei Tang et.al. | 2405.18881v1 | null |
2024-05-29 | Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts | Ruipeng Zhang et.al. | 2405.18861v1 | link |
2024-05-29 | LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping | Nikhil Gosala et.al. | 2405.18852v1 | null |
2024-05-29 | LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image Registration | Mingrui Ma et.al. | 2405.18774v1 | null |
2024-05-29 | Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation | Jiawei Fu et.al. | 2405.18757v1 | null |
2024-05-29 | To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability | Joonhyung Lee et.al. | 2405.18710v1 | null |
2024-05-29 | Rejection via Learning Density Ratios | Alexander Soen et.al. | 2405.18686v1 | null |
2024-05-28 | WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | Jiawei Ma et.al. | 2405.18405v1 | null |
2024-05-28 | Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Yixiao Zhang et.al. | 2405.18386v1 | link |
2024-05-28 | Computing hydration free energies of small molecules with first principles accuracy | J. Harry Moore et.al. | 2405.18171v1 | null |
2024-05-28 | Time Series Representation Models | Robert Leppich et.al. | 2405.18165v1 | link |
2024-05-28 | An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates | Albin Soutif-Cormerais et.al. | 2405.18069v1 | null |
2024-05-28 | Visualizing the loss landscape of Self-supervised Vision Transformer | Youngwan Lee et.al. | 2405.18042v1 | null |
2024-05-28 | fMRI predictors based on language models of increasing complexity recover brain left lateralization | Laurent Bonnasse-Gahot et.al. | 2405.17992v1 | null |
2024-05-28 | Cross-Context Backdoor Attacks against Graph Prompt Learning | Xiaoting Lyu et.al. | 2405.17984v1 | null |
2024-05-28 | Knowledge Circuits in Pretrained Transformers | Yunzhi Yao et.al. | 2405.17969v1 | link |
2024-05-28 | Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment | Keming Lu et.al. | 2405.17931v1 | null |
2024-05-27 | Privacy-Aware Visual Language Models | Laurens Samson et.al. | 2405.17423v1 | null |
2024-05-28 | Controllable Longer Image Animation with Diffusion Models | Qiang Wang et.al. | 2405.17306v2 | null |
2024-05-27 | Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling | Cristian Rodriguez-Opazo et.al. | 2405.17139v1 | null |
2024-05-27 | Position: Foundation Agents as the Paradigm Shift for Decision Making | Xiaoqian Liu et.al. | 2405.17009v1 | null |
2024-05-27 | Vision-and-Language Navigation Generative Pretrained Transformer | Wen Hanlin et.al. | 2405.16994v1 | null |
2024-05-27 | Exploring the LLM Journey from Cognition to Expression with Linear Representations | Yuzi Yan et.al. | 2405.16964v1 | null |
2024-05-27 | Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation | Liang Shi et.al. | 2405.16895v1 | null |
2024-05-27 | Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning | Wangyang Ying et.al. | 2405.16879v1 | null |
2024-05-27 | CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild | Xingqun Qi et.al. | 2405.16874v1 | null |
2024-05-27 | TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction | Yinda Chen et.al. | 2405.16847v1 | null |
2024-05-24 | ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models | Chunjiang Ge et.al. | 2405.15738v1 | link |
2024-05-24 | Disease-informed Adaptation of Vision-Language Models | Jiajin Zhang et.al. | 2405.15728v1 | link |
2024-05-24 | GECKO: Generative Language Model for English, Code and Korean | Sungwoo Oh et.al. | 2405.15640v1 | null |
2024-05-24 | SEP: Self-Enhanced Prompt Tuning for Visual-Language Model | Hantao Yao et.al. | 2405.15549v1 | link |
2024-05-24 | Polyp Segmentation Generalisability of Pretrained Backbones | Edward Sanderson et.al. | 2405.15524v1 | null |
2024-05-24 | Detection and Positive Reconstruction of Cognitive Distortion sentences: Mandarin Dataset and Evaluation | Shuya Lin et.al. | 2405.15334v1 | null |
2024-05-24 | StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models | Chengming Xu et.al. | 2405.15287v1 | null |
2024-05-24 | MindShot: Brain Decoding Framework Using Only One Image | Shuai Jiang et.al. | 2405.15278v1 | null |
2024-05-24 | Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search | Marie Al Ghossein et.al. | 2405.15190v1 | link |
2024-05-24 | From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks | Jacob Russin et.al. | 2405.15164v1 | null |
2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862v1 | null |
2024-05-23 | Semantica: An Adaptable Image-Conditioned Diffusion Model | Manoj Kumar et.al. | 2405.14857v1 | null |
2024-05-23 | Analysis of Atom-level pretraining with QM data for Graph Neural Networks Molecular property models | Jose Arjona-Medina et.al. | 2405.14837v1 | null |
2024-05-23 | Masked Image Modelling for retinal OCT understanding | Theodoros Pissas et.al. | 2405.14788v1 | null |
2024-05-23 | EditWorld: Simulating World Dynamics for Instruction-Following Image Editing | Ling Yang et.al. | 2405.14785v1 | null |
2024-05-23 | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Peng Wang et.al. | 2405.14768v1 | link |
2024-05-23 | Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval | Young Kyun Jang et.al. | 2405.14726v1 | null |
2024-05-23 | Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models | Young Kyun Jang et.al. | 2405.14715v1 | null |
2024-05-23 | Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models | Alejo Lopez-Avila et.al. | 2405.14437v1 | link |
2024-05-23 | Look into the Future: Deep Contextualized Sequential Recommendation | Lei Zheng et.al. | 2405.14359v1 | null |
2024-05-21 | Personalized Residuals for Concept-Driven Text-to-Image Generation | Cusuh Ham et.al. | 2405.12978v1 | null |
2024-05-21 | Transparency Distortion Robustness for SOTA Image Segmentation Tasks | Volker Knauthe et.al. | 2405.12864v1 | null |
2024-05-21 | DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | Hong Chen et.al. | 2405.12796v1 | null |
2024-05-21 | EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics | Jan Steckel et.al. | 2405.12573v1 | null |
2024-05-21 | ProtT3: Protein-to-Text Generation for Text-based Protein Understanding | Zhiyuan Liu et.al. | 2405.12564v1 | link |
2024-05-20 | Octo: An Open-Source Generalist Robot Policy | Octo Model Team et.al. | 2405.12213v1 | null |
2024-05-20 | Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices | Nathaniel Cohen et.al. | 2405.12211v1 | null |
2024-05-20 | MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | Ting Jiang et.al. | 2405.12130v1 | link |
2024-05-21 | Sheet Music Transformer ++: End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music | Antonio Ríos-Vila et.al. | 2405.12105v2 | link |
2024-05-20 | Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining | Neena Aloysius et.al. | 2405.12018v1 | null |
2024-05-20 | Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus | Fons Hartendorp et.al. | 2405.11941v1 | link |
2024-05-20 | Depth Prompting for Sensor-Agnostic Depth Estimation | Jin-Hwi Park et.al. | 2405.11867v1 | null |
2024-05-20 | SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | Siavash Shams et.al. | 2405.11831v1 | null |
2024-05-20 | MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise | Ruiqi Wu et.al. | 2405.11793v1 | link |
2024-05-20 | TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models | Junlong Jia et.al. | 2405.11788v1 | link |
2024-05-17 | FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation | Fei Wang et.al. | 2405.10885v1 | link |
2024-05-17 | Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation | Yixing Huang et.al. | 2405.10870v1 | null |
2024-05-17 | Improving face generation quality and prompt following with synthetic captions | Michail Tarasiou et.al. | 2405.10864v1 | null |
2024-05-17 | Open-Vocabulary Spatio-Temporal Action Detection | Tao Wu et.al. | 2405.10832v1 | null |
2024-05-17 | Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction | Jiayun Pang et.al. | 2405.10625v1 | null |
2024-05-17 | UniCL: A Universal Contrastive Learning Framework for Large Time Series Models | Jiawei Li et.al. | 2405.10597v1 | null |
2024-05-17 | A Deep Learning Approach to Heterogeneous Consumer Aesthetics in Retail Fashion | Pranjal Rawat et.al. | 2405.10498v1 | null |
2024-05-16 | Data Selection for Transfer Unlearning | Nazanin Mohammadi Sepahvand et.al. | 2405.10425v1 | null |
2024-05-16 | Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model | Zheng Gu et.al. | 2405.10316v1 | null |
2024-05-16 | Libra: Building Decoupled Vision System on Large Language Models | Yifan Xu et.al. | 2405.10140v1 | link |
2024-05-16 | Continuous Transfer Learning for UAV Communication-aware Trajectory Design | Chenrui Sun et.al. | 2405.10087v1 | null |
2024-05-16 | HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition | Kun Yuan et.al. | 2405.10075v1 | null |
2024-05-16 | Natural Language Can Help Bridge the Sim2Real Gap | Albert Yu et.al. | 2405.10020v1 | null |
2024-05-16 | Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification | Jack Breen et.al. | 2405.09990v1 | link |
2024-05-16 | Cross-sensor self-supervised training and alignment for remote sensing | Valerio Marsocci et.al. | 2405.09922v1 | null |
2024-05-16 | TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data | Yihong Liu et.al. | 2405.09913v1 | link |
2024-05-16 | IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining | Dawei Feng et.al. | 2405.09857v1 | null |
2024-05-15 | LoRA Learns Less and Forgets Less | Dan Biderman et.al. | 2405.09673v1 | null |
2024-05-15 | Time-Equivariant Contrastive Learning for Degenerative Disease Progression in Retinal OCT | Taha Emre et.al. | 2405.09404v1 | null |
2024-05-15 | Matching domain experts by training from scratch on domain knowledge | Xiaoliang Luo et.al. | 2405.09395v1 | null |
2024-05-15 | HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants | Milan Gritta et.al. | 2405.09186v1 | null |
2024-05-14 | Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis | Alexandre Englebert et.al. | 2405.08932v1 | null |
2024-05-14 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Pavan Kumar Anasosalu Vasu et.al. | 2405.08911v1 | null |
2024-05-14 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Zhimin Li et.al. | 2405.08748v1 | link |
2024-05-14 | Self-supervised learning improves robustness of deep learning lung tumor segmentation to CT imaging differences | Jue Jiang et.al. | 2405.08657v1 | null |
2024-05-14 | Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation | Jared Mejia et.al. | 2405.08576v1 | null |
2024-05-14 | Improving Transformers with Dynamically Composable Multi-Head Attention | Da Xiao et.al. | 2405.08553v1 | link |
2024-05-14 | Self-Distillation Improves DNA Sequence Inference | Tong Yu et.al. | 2405.08538v1 | link |
2024-05-14 | Parameter-Efficient Instance-Adaptive Neural Video Compression | Hyunmo Yang et.al. | 2405.08530v1 | null |
2024-05-14 | Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining | Valentin Vielzeuf et.al. | 2405.08402v1 | null |
2024-05-14 | Could Chemical LLMs benefit from Message Passing | Jiaqing Xie et.al. | 2405.08334v1 | null |
2024-05-13 | Rethinking Histology Slide Digitization Workflows for Low-Resource Settings | Talat Zehra et.al. | 2405.08169v1 | link |
2024-05-13 | Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging | Chi-en Amy Tai et.al. | 2405.07861v1 | null |
2024-05-13 | SAR Image Synthesis with Diffusion Models | Denisa Qosja et.al. | 2405.07776v1 | null |
2024-05-13 | LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language | Cagri Toraman et.al. | 2405.07745v1 | link |
2024-05-13 | Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection | Dehong Kong et.al. | 2405.07595v1 | null |
2024-05-13 | Thai Universal Dependency Treebank | Panyut Sriwirote et.al. | 2405.07586v1 | null |
2024-05-13 | Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation | Aaditya Prasad et.al. | 2405.07503v1 | null |
2024-05-13 | CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | Yuanyuan Jiang et.al. | 2405.07451v1 | null |
2024-05-13 | Sakuga-42M Dataset: Scaling Up Cartoon Research | Zhenglin Pan et.al. | 2405.07425v1 | link |
2024-05-13 | MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks | Haijiang Tian et.al. | 2405.07411v1 | null |
2024-05-12 | Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP) | Saaketh Koundinya Gundavarapu et.al. | 2405.07284v1 | null |
2024-05-10 | Federated Document Visual Question Answering: A Pilot Study | Khanh Nguyen et.al. | 2405.06636v1 | null |
2024-05-10 | LMD3: Language Model Data Density Dependence | John Kirchenbauer et.al. | 2405.06331v1 | null |
2024-05-10 | Decoding Emotions in Abstract Art: Cognitive Plausibility of CLIP in Recognizing Color-Emotion Associations | Hanna-Sophia Widhoelzl et.al. | 2405.06319v1 | null |
2024-05-10 | SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora | Faisal Qarah et.al. | 2405.06239v1 | null |
2024-05-10 | VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks | Manish Dhakal et.al. | 2405.06196v1 | null |
2024-05-10 | ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI | Yuqi Fang et.al. | 2405.06178v1 | null |
2024-05-09 | UnSegGNet: Unsupervised Image Segmentation using Graph Neural Networks | Kovvuri Sai Gopal Reddy et.al. | 2405.06057v1 | link |
2024-05-09 | Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation | Chen Chen et.al. | 2405.05745v1 | null |
2024-05-09 | Parameter-Efficient Fine-Tuning With Adapters | Keyu Chen et.al. | 2405.05493v1 | null |
2024-05-09 | PLLM-CS: Pre-trained Large Language Model (LLM) for Cyber Threat Detection in Satellite Networks | Mohammed Hassanin et.al. | 2405.05469v1 | null |
2024-05-08 | Deep Learning Method to Predict Wound Healing Progress Based on Collagen Fibers in Wound Tissue | Juan He et.al. | 2405.05297v1 | null |
2024-05-08 | Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming | Tommaso Pasini et.al. | 2405.05176v1 | null |
2024-05-08 | Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources | Lasse Hyldig Hansen et.al. | 2405.05049v1 | null |
2024-05-08 | M²DNeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields | Ning Wang et.al. | 2405.05010v1 | null |
2024-05-08 | ChuXin: 1.6B Technical Report | Xiaomin Zhuang et.al. | 2405.04828v1 | null |
2024-05-07 | Remote Diffusion | Kunal Sunil Kasodekar et.al. | 2405.04717v1 | null |
2024-05-07 | Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | Emre Can Acikgoz et.al. | 2405.04685v1 | null |
2024-05-07 | TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation | Hritik Bansal et.al. | 2405.04682v1 | null |
2024-05-07 | S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling | Minh Tran et.al. | 2405.04489v1 | null |
2024-05-08 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | DeepSeek-AI et.al. | 2405.04434v2 | link |
2024-05-07 | Cross-IQA: Unsupervised Learning for Image Quality Assessment | Zhen Zhang et.al. | 2405.04311v1 | null |
2024-05-07 | Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation | Ryan Wong et.al. | 2405.04164v1 | null |
2024-05-07 | Locally Differentially Private In-Context Learning | Chunyan Zheng et.al. | 2405.04032v1 | null |
2024-05-07 | SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing | Yuying Ge et.al. | 2405.04007v1 | null |
2024-05-07 | Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | Jian Jia et.al. | 2405.03988v1 | null |
2024-05-07 | Contextualization with SPLADE for High Recall Retrieval | Eugene Yang et.al. | 2405.03972v1 | link |
2024-05-07 | AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion | Adeesh Kolluru et.al. | 2405.03962v1 | null |
2024-05-06 | Provable Preconditioned Plug-and-Play Approach for Compressed Sensing MRI Reconstruction | Tao Hong et.al. | 2405.03854v1 | null |
2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689v1 | null |
2024-05-06 | AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design | Kamal Choudhary et.al. | 2405.03680v1 | null |
2024-05-06 | Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | Abhinav Agarwalla et.al. | 2405.03594v1 | null |
2024-05-06 | Whispy: Adapting STT Whisper Models to Real-Time Environments | Antonio Bevilacqua et.al. | 2405.03484v1 | null |
2024-05-06 | Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval | Jiacheng Cheng et.al. | 2405.03190v1 | null |
2024-05-06 | GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | Nil Biescas et.al. | 2405.03104v1 | null |
2024-05-06 | SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition | Adarsh Tiwari et.al. | 2405.03099v1 | null |
2024-05-05 | RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification | June-Woo Kim et.al. | 2405.02996v1 | null |
2024-05-05 | Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction | Xiaoyu Qiao et.al. | 2405.02958v1 | null |
2024-05-05 | IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs | Yuzhen Mao et.al. | 2405.02842v1 | null |
2024-05-03 | Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets | Xuelong Geng et.al. | 2405.02132v1 | null |
2024-05-03 | A Mutual Information Perspective on Federated Contrastive Learning | Christos Louizos et.al. | 2405.02081v1 | null |
2024-05-03 | SATO: Stable Text-to-Motion Framework | Wenshuo Chen et.al. | 2405.01461v2 | link |
2024-05-02 | StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | Yupeng Zhou et.al. | 2405.01434v1 | link |
2024-05-02 | CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation | Chenying Liu et.al. | 2405.01217v1 | null |
2024-05-02 | Language Fairness in Multilingual Information Retrieval | Eugene Yang et.al. | 2405.00978v1 | link |
2024-05-02 | PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval | Dawn Lawrie et.al. | 2405.00975v1 | link |
2024-05-01 | Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin | K. Yeh et.al. | 2405.00908v1 | null |
2024-05-01 | SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models | Burak Can Biner et.al. | 2405.00878v1 | null |
2024-05-01 | Adapting Pretrained Networks for Image Quality Assessment on High Dynamic Range Displays | Andrei Chubarau et.al. | 2405.00670v1 | null |
2024-05-01 | Are Models Biased on Text without Gender-related Language? | Catarina G Belém et.al. | 2405.00588v1 | link |
2024-05-01 | Self-supervised Pre-training of Text Recognizers | Martin Kišš et.al. | 2405.00420v1 | link |
2024-05-01 | Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation | Zhichuan Wang et.al. | 2405.00344v1 | null |
2024-04-30 | PAODING: A High-fidelity Data-free Pruning Toolkit for Debloating Pre-trained Neural Networks | Mark Huasong Meng et.al. | 2405.00074v1 | null |
2024-04-30 | Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model | Denys Godwin et.al. | 2404.19609v1 | null |
2024-04-30 | Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets | Andrés Bell-Navas et.al. | 2404.19579v1 | null |
2024-04-30 | CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Weiquan Huang et.al. | 2404.19394v1 | link |
2024-04-30 | Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget | Minh Duc Bui et.al. | 2404.19319v1 | null |
2024-04-30 | Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank | Sungjune Park et.al. | 2404.19299v1 | null |
2024-04-30 | Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective | Wanqi Zhou et.al. | 2404.19287v1 | null |
2024-04-30 | Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information | Toshimitsu Uesaka et.al. | 2404.19228v1 | null |
2024-04-29 | What Drives Performance in Multilingual Language Models? | Sina Bagheri Nezhad et.al. | 2404.19159v1 | link |
2024-04-29 | Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing | Leonardo Rossi et.al. | 2404.18924v1 | null |
2024-04-29 | Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models | Xingyuan Zhang et.al. | 2404.18896v1 | null |
2024-04-29 | It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments | Petter Mæhlum et.al. | 2404.18832v1 | null |
2024-04-30 | PatentGPT: A Large Language Model for Intellectual Property | Zilong Bai et.al. | 2404.18255v2 | null |
2024-04-28 | Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Tengjun Huang et.al. | 2404.18253v1 | link |
2024-04-28 | TextGram: Towards a better domain-adaptive pretraining | Sharayu Hiwarkhedkar et.al. | 2404.18228v1 | null |
2024-04-28 | Can Perplexity Predict Fine-Tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali | Nishant Luitel et.al. | 2404.18071v1 | null |
2024-04-28 | Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model | Xiaolong Li et.al. | 2404.18065v1 | null |
2024-04-27 | Critical Review for One-class Classification: recent advances and the reality behind them | Toshitaka Hayashi et.al. | 2404.17931v1 | null |
2024-04-27 | T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining | Yi Yuan et.al. | 2404.17806v1 | null |
2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546v1 | null |
2024-04-26 | Low Cost Machine Vision for Insect Classification | Danja Brandt et.al. | 2404.17488v1 | null |
2024-04-26 | SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval | Marco Peer et.al. | 2404.17221v1 | link |
2024-04-26 | Self-supervised visual learning in the low-data regime: a comparative evaluation | Sotirios Konstantakos et.al. | 2404.17202v1 | null |
2024-04-26 | Few-shot Calligraphy Style Learning | Fangda Chen et.al. | 2404.17199v1 | link |
2024-04-26 | TIGQA:An Expert Annotated Question Answering Dataset in Tigrinya | Hailay Teklehaymanot et.al. | 2404.17194v1 | null |
2024-04-25 | Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models | Eren Dogan et.al. | 2404.17010v1 | null |
2024-04-25 | Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Mehmet Kerem Turkcan et.al. | 2404.16944v1 | link |
2024-04-25 | A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs | Christian N. Mayemba et.al. | 2404.16921v1 | null |
2024-04-25 | Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding | Mostafa Elhoushi et.al. | 2404.16710v1 | null |
2024-04-25 | Road Surface Friction Estimation for Winter Conditions Utilising General Visual Features | Risto Ojala et.al. | 2404.16578v1 | null |
2024-04-25 | Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand | Davide Liconti et.al. | 2404.16483v1 | null |
2024-04-25 | Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Ben Williams et.al. | 2404.16436v1 | null |
2024-04-25 | TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | Haomiao Ni et.al. | 2404.16306v1 | null |
2024-04-24 | Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall | Jiaqing Yuan et.al. | 2404.16164v1 | null |
2024-04-24 | FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication | Eric Slyman et.al. | 2404.16123v1 | null |
2024-04-24 | MoDE: CLIP Data Experts via Clustering | Jiawei Ma et.al. | 2404.16030v1 | link |
2024-04-24 | Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision | Mohammad Reza Hosseinzadeh Taher et.al. | 2404.15672v1 | null |
2024-04-24 | HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts | Xinlei Niu et.al. | 2404.15637v1 | null |
2024-04-24 | Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? | Hossein Salami et.al. | 2404.15578v1 | null |
2024-04-24 | Retrieval Head Mechanistically Explains Long-Context Factuality | Wenhao Wu et.al. | 2404.15574v1 | null |
2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276v1 | link |
2024-04-23 | CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios | Jingyang Lin et.al. | 2404.15272v1 | null |
2024-04-23 | Setting up the Data Printer with Improved English to Ukrainian Machine Translation | Yurii Paniv et.al. | 2404.15196v1 | link |
2024-04-23 | Combating Missing Modalities in Egocentric Videos at Test Time | Merey Ramazanova et.al. | 2404.15161v1 | null |
2024-04-23 | DP-Net: Learning Discriminative Parts for image recognition | Ronan Sicre et.al. | 2404.15037v1 | null |
2024-04-23 | IPAD: Industrial Process Anomaly Detection Dataset | Jinfan Liu et.al. | 2404.15033v1 | null |
2024-04-23 | Multi-Modal Prompt Learning on Blind Image Quality Assessment | Wensheng Pan et.al. | 2404.14949v1 | null |
2024-04-23 | Driver Activity Classification Using Generalizable Representations from Vision-Language Models | Ross Greer et.al. | 2404.14906v1 | null |
2024-04-23 | FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model | Zezheng Song et.al. | 2404.14688v1 | null |
2024-04-23 | Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers | Elijah Pelofske et.al. | 2404.14680v1 | null |
2024-04-22 | PARAMANU-GANITA: Language Model with Mathematical Capabilities | Mitodru Niyogi et.al. | 2404.14395v1 | null |
2024-04-22 | Calc-CMU at SemEval-2024 Task 7: Pre-Calc -- Learning to Use the Calculator Improves Numeracy in Language Models | Vishruth Veerendranath et.al. | 2404.14355v1 | link |
2024-04-22 | Automatic Discovery of Visual Circuits | Achyuta Rajaram et.al. | 2404.14349v1 | link |
2024-04-22 | Heterogeneous Face Recognition Using Domain Invariant Units | Anjith George et.al. | 2404.14343v1 | null |
2024-04-22 | Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels | Jan-Philipp Fränken et.al. | 2404.14313v1 | link |
2024-04-22 | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | Sophia Sirko-Galouchenko et.al. | 2404.14027v1 | null |
2024-04-22 | EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning | Mingjie Ma et.al. | 2404.13847v1 | null |
2024-04-21 | FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Zhaopeng Gu et.al. | 2404.13671v1 | null |
2024-04-21 | PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure | Feiqi Cao et.al. | 2404.13645v1 | link |
2024-04-21 | Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | Georgios Pantazopoulos et.al. | 2404.13594v1 | link |
2024-04-19 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context | Zhuofan Zong et.al. | 2404.13046v1 | link |
2024-04-19 | Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing | Teng-Fang Hsiao et.al. | 2404.12900v1 | link |
2024-04-19 | Grasper: A Generalist Pursuer for Pursuit-Evasion Problems | Pengdeng Li et.al. | 2404.12626v1 | link |
2024-04-18 | Towards Large Language Models as Copilots for Theorem Proving in Lean | Peiyang Song et.al. | 2404.12534v1 | link |
2024-04-18 | Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis | Yufan Li et.al. | 2404.12481v1 | null |
2024-04-18 | mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models? | Tianze Hua et.al. | 2404.12444v1 | null |
2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372v1 | null |
2024-04-18 | AniClipart: Clipart Animation with Text-to-Video Priors | Ronghuan Wu et.al. | 2404.12347v1 | null |
2024-04-18 | GraFIQs: Face Image Quality Assessment Using Gradient Magnitudes | Jan Niklas Kolf et.al. | 2404.12203v1 | link |
2024-04-18 | OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data | Chandeepa Dissanayake et.al. | 2404.12195v1 | link |
2024-04-18 | How to Benchmark Vision Foundation Models for Semantic Segmentation? | Tommie Kerssies et.al. | 2404.12172v1 | null |
2024-04-18 | Aligning language models with human preferences | Tomasz Korbak et.al. | 2404.12150v1 | link |
2024-04-18 | MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification | Weikang Yu et.al. | 2404.12081v1 | link |
2024-04-18 | Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition | Xunsong Li et.al. | 2404.11903v1 | null |
2024-04-17 | How often are errors in natural language reasoning due to paraphrastic variability? | Neha Srikanth et.al. | 2404.11717v1 | null |
2024-04-17 | Pretraining Billion-scale Geospatial Foundational Models on Frontier | Aristeidis Tsaris et.al. | 2404.11706v1 | null |
2024-04-17 | On the Scalability of GNNs for Molecular Graphs | Maciej Sypetkowski et.al. | 2404.11568v1 | null |
2024-04-17 | Predicting Long-horizon Futures by Conditioning on Geometry and Time | Tarasha Khurana et.al. | 2404.11554v1 | null |
2024-04-17 | ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours | Feiwen Zhu et.al. | 2404.11068v1 | null |
2024-04-17 | Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model | Hao Yan et.al. | 2404.11046v1 | null |
2024-04-17 | Many-Shot In-Context Learning | Rishabh Agarwal et.al. | 2404.11018v1 | null |
2024-04-17 | MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training | Jiayang Li et.al. | 2404.11016v1 | null |
2024-04-16 | More Room for Language: Investigating the Effect of Retrieval on Language Models | David Samuel et.al. | 2404.10939v1 | null |
2024-04-16 | Retrieval Augmented Verification : Unveiling Disinformation with Structured Representations for Zero-Shot Real-Time Evidence-guided Fact-Checking of Multi-modal Social media posts | Arka Ujjal Dey et.al. | 2404.10702v1 | null |
2024-04-17 | Do Counterfactual Examples Complicate Adversarial Training? | Eric Yeats et.al. | 2404.10588v2 | null |
2024-04-17 | Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models | Enming Zhang et.al. | 2404.10357v2 | null |
2024-04-16 | From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search | Jintao Sun et.al. | 2404.10292v1 | null |
2024-04-16 | Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology | Oren Kraus et.al. | 2404.10242v1 | link |
2024-04-16 | Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression | Jixiang Luo et.al. | 2404.10234v1 | null |
2024-04-15 | Self-Supervised Learning Featuring Small-Scale Image Dataset for Treatable Retinal Diseases Classification | Luffina C. Huang et.al. | 2404.10166v1 | null |
2024-04-15 | NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer | Sai Kumar Reddy Manne et.al. | 2404.10130v1 | link |
2024-04-15 | Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stres | Aswini Kumar Patra et.al. | 2404.10073v1 | null |
2024-04-15 | EgoPet: Egomotion and Interaction Data from an Animal's Perspective | Amir Bar et.al. | 2404.09991v1 | null |
2024-04-15 | Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes | Ivica Obadic et.al. | 2404.09768v1 | null |
2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632v1 | link |
2024-04-15 | Magic Clothing: Controllable Garment-Driven Image Synthesis | Weifeng Chen et.al. | 2404.09512v1 | link |
2024-04-15 | Leveraging Temporal Contextualization for Video Action Recognition | Minji Kim et.al. | 2404.09490v1 | null |
2024-04-15 | RankCLIP: Ranking-Consistent Language-Image Pretraining | Yiming Zhang et.al. | 2404.09387v1 | null |
2024-04-16 | Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment | Zhiqing Hong et.al. | 2404.09313v2 | null |
2024-04-13 | MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images | Yingjie Xi et.al. | 2404.09000v1 | link |
2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928v1 | link |
2024-04-13 | Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension | Mengnan Qi et.al. | 2404.08885v1 | null |
2024-04-12 | BERT-LSH: Reducing Absolute Compute For Attention | Zezheng Li et.al. | 2404.08836v1 | null |
2024-04-12 | Probing the 3D Awareness of Visual Foundation Models | Mohamed El Banani et.al. | 2404.08636v1 | link |
2024-04-12 | Pre-training Small Base LMs with Fewer Tokens | Sunny Sanyal et.al. | 2404.08634v1 | link |
2024-04-12 | Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation | Haozhe Zhao et.al. | 2404.08491v1 | link |
2024-04-12 | OTTER: Improving Zero-Shot Classification via Optimal Transport | Changho Shin et.al. | 2404.08461v1 | null |
2024-04-12 | AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees | William Fleshman et.al. | 2404.08417v1 | null |
2024-04-12 | Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain | Kosuke Takahashi et.al. | 2404.08262v1 | null |
2024-04-12 | Improving Continuous Sign Language Recognition with Adapted Image Models | Lianyu Hu et.al. | 2404.08226v1 | link |
2024-04-12 | Measuring Cross-lingual Transfer in Bytes | Leandro Rodrigues de Souza et.al. | 2404.08191v1 | link |
2024-04-11 | Self-supervised Dataset Distillation: A Good Compression Is All You Need | Muxin Zhou et.al. | 2404.07976v1 | link |
2024-04-11 | Rho-1: Not All Tokens Are What You Need | Zhenghao Lin et.al. | 2404.07965v1 | link |
2024-04-11 | MindBridge: A Cross-Subject Brain Decoding Framework | Shizun Wang et.al. | 2404.07850v1 | link |
2024-04-11 | Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck | Nathan Godey et.al. | 2404.07647v1 | null |
2024-04-11 | Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Minkuk Kim et.al. | 2404.07610v1 | link |
2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603v1 | null |
2024-04-11 | A fine-tuning workflow for automatic first-break picking with deep learning | Amir Mardan et.al. | 2404.07400v1 | link |
2024-04-10 | Accurate Tennis Court Line Detection on Amateur Recorded Matches | Sameer Agrawal et.al. | 2404.06977v1 | null |
2024-04-10 | GraSAME: Injecting Token-Level Structural Information to Pretrained Language Models via Graph-guided Self-Attention Mechanism | Shuzhou Yuan et.al. | 2404.06911v1 | null |
2024-04-10 | Text-Based Reasoning About Vector Graphics | Zhenhailong Wang et.al. | 2404.06479v2 | null |
2024-04-10 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393v2 | null |
2024-04-11 | On adversarial training and the 1 Nearest Neighbor classifier | Amir Hagai et.al. | 2404.06313v2 | link |
2024-04-09 | ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization | Yixin Yang et.al. | 2404.06251v1 | link |
2024-04-09 | Anchor-based Robust Finetuning of Vision-Language Models | Jinwei Han et.al. | 2404.06244v1 | null |
2024-04-09 | [Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus | Leshem Choshen et.al. | 2404.06214v1 | null |
2024-04-09 | OmniFusion Technical Report | Elizaveta Goncharova et.al. | 2404.06212v1 | link |
2024-04-09 | Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning | Yupei Zhang et.al. | 2404.06057v1 | link |
2024-04-09 | Online/Offline Learning to Enable Robust Beamforming: Limited Feedback Meets Deep Generative Models | Ying Li et.al. | 2404.06055v1 | null |
2024-04-08 | Language-Independent Representations Improve Zero-Shot Summarization | Vladimir Solovyev et.al. | 2404.05720v1 | null |
2024-04-08 | MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning | Matteo Farina et.al. | 2404.05621v1 | null |
2024-04-08 | Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining | Nikola Ljubešić et.al. | 2404.05428v1 | link |
2024-04-08 | Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations | Yiming Li et.al. | 2404.05415v1 | null |
2024-04-07 | StockGPT: A GenAI Model for Stock Prediction and Trading | Dat Mai et.al. | 2404.05101v1 | null |
2024-04-07 | AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement | Shiwei Jin et.al. | 2404.05063v1 | null |
2024-04-07 | PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer | Xingyu Su et.al. | 2404.04886v1 | link |
2024-04-07 | Msmsfnet: a multi-stream and multi-scale fusion net for edge detection | Chenguang Liu et.al. | 2404.04856v1 | null |
2024-04-07 | F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation | Junhong Wu et.al. | 2404.04846v1 | null |
2024-04-07 | Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead | Irene Pagliai et.al. | 2404.04838v1 | null |
2024-04-05 | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | Xinrun Du et.al. | 2404.04167v1 | null |
2024-04-04 | No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Vishaal Udandarao et.al. | 2404.04125v1 | link |
2024-04-05 | Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation | Mingyuan Zhou et.al. | 2404.04057v1 | null |
2024-04-05 | Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer | Hele-Andra Kuulmets et.al. | 2404.04042v1 | null |
2024-04-05 | Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds | Annerose Eichel et.al. | 2404.04031v1 | null |
2024-04-04 | Layerwise Early Stopping for Test Time Adaptation | Sabyasachi Sahoo et.al. | 2404.03784v1 | null |
2024-04-04 | DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | Yiming Zhang et.al. | 2404.03642v1 | null |
2024-04-04 | Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach | Chengkai Huang et.al. | 2404.03514v1 | null |
2024-04-04 | A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation | Jifan Yu et.al. | 2404.03491v1 | null |
2024-04-04 | Scaling Up Video Summarization Pretraining with Large Language Models | Dawit Mureja Argaw et.al. | 2404.03398v1 | null |
2024-04-03 | Scaling Laws for Galaxy Images | Mike Walmsley et.al. | 2404.02973v1 | link |
2024-04-03 | MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation | Petru-Daniel Tudosiu et.al. | 2404.02790v1 | null |
2024-04-03 | CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Jaehyeon Kim et.al. | 2404.02781v1 | null |
2024-04-03 | DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Hao Wu et.al. | 2404.02755v1 | null |
2024-04-03 | Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers | Sehyun Choi et.al. | 2404.02684v1 | null |
2024-04-03 | Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages | Jakub Hoscilowicz et.al. | 2404.02588v1 | link |
2024-04-03 | The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education | Paiheng Xu et.al. | 2404.02444v1 | null |
2024-04-03 | What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases | Anthony Meng Huat Tiong et.al. | 2404.02415v1 | link |
2024-04-02 | Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models | Zeyu Yang et.al. | 2404.02148v1 | link |
2024-04-02 | Iterated Learning Improves Compositionality in Large Vision-Language Models | Chenhao Zheng et.al. | 2404.02145v1 | null |
2024-04-03 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Jieneng Chen et.al. | 2404.02132v2 | link |
2024-04-02 | FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning | Joel Niklaus et.al. | 2404.02127v1 | link |
2024-04-02 | Adaptive Feature Fusion Neural Network for Glaucoma Segmentation on Unseen Fundus Images | Jiyuan Zhong et.al. | 2404.02084v1 | null |
2024-04-02 | Noise Masking Attacks and Defenses for Pretrained Speech Models | Matthew Jagielski et.al. | 2404.02052v1 | null |
2024-04-02 | Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models | Stephan Linzbach et.al. | 2404.01992v1 | null |
2024-04-02 | Activation Steering for Robust Type Prediction in CodeLLMs | Francesca Lucchetti et.al. | 2404.01903v1 | null |
2024-04-02 | Poro 34B and the Blessing of Multilinguality | Risto Luukkonen et.al. | 2404.01856v1 | null |
2024-04-02 | Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation | Shanshan Feng et.al. | 2404.01855v1 | null |
2024-03-29 | Convolutional Prompting meets Language Models for Continual Learning | Anurag Roy et.al. | 2403.20317v1 | null |
2024-03-29 | Latxa: An Open Language Model and Evaluation Suite for Basque | Julen Etxaniz et.al. | 2403.20266v1 | link |
2024-03-29 | Long-Tailed Anomaly Detection with Learnable Class Names | Chih-Hui Ho et.al. | 2403.20236v1 | null |
2024-03-29 | StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation | Sidi Wu et.al. | 2403.20142v1 | null |
2024-03-29 | FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models | Barbara Toniella Corradini et.al. | 2403.20105v1 | null |
2024-03-29 | Negative Label Guided OOD Detection with Pretrained Vision-Language Models | Xue Jiang et.al. | 2403.20078v1 | link |
2024-03-28 | Siamese Vision Transformers are Scalable Audio-visual Learners | Yan-Bo Lin et.al. | 2403.19638v1 | link |
2024-03-28 | SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing | Xiaowei Song et.al. | 2403.19615v1 | link |
2024-03-28 | LocCa: Visual Pretraining with Location-aware Captioners | Bo Wan et.al. | 2403.19596v1 | null |
2024-03-28 | Situation Awareness for Driver-Centric Driving Style Adaptation | Johann Haselberger et.al. | 2403.19595v1 | link |
2024-03-28 | Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | Norman Di Palo et.al. | 2403.19578v1 | null |
2024-03-28 | Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment | Alireza Ganjdanesh et.al. | 2403.19490v1 | null |
2024-03-28 | Checkpoint Merging via Bayesian Optimization in LLM Pretraining | Deyuan Liu et.al. | 2403.19390v1 | null |
2024-03-28 | NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data | Manuel Tonneau et.al. | 2403.19260v1 | link |
2024-03-29 | STaR-GATE: Teaching Language Models to Ask Clarifying Questions | Chinmaya Andukuri et.al. | 2403.19154v2 | null |
2024-03-28 | Instruction-based Hypergraph Pretraining | Mingdai Yang et.al. | 2403.19063v1 | null |
2024-03-27 | Bringing Textual Prompt to AI-Generated Image Quality Assessment | Bowen Qu et.al. | 2403.18714v1 | null |
2024-03-27 | Noise-Robust Keyword Spotting through Self-supervised Pretraining | Jacob Mørk et.al. | 2403.18560v1 | null |
2024-03-27 | OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning | Noor Ahmed et.al. | 2403.18550v1 | null |
2024-03-27 | Enhanced Generative Recommendation via Content and Collaboration Integration | Yidan Wang et.al. | 2403.18480v1 | null |
2024-03-27 | NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation | Jingyang Huo et.al. | 2403.18211v1 | null |
2024-03-26 | Juru: Legal Brazilian Large Language Model from Reputable Sources | Roseval Malaquias Junior et.al. | 2403.18140v1 | null |
2024-03-26 | The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation | Nicolas Guerin et.al. | 2403.18031v1 | null |
2024-03-26 | The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov et.al. | 2403.17887v1 | null |
2024-03-26 | GenesisTex: Adapting Image Denoising Diffusion to Texture Space | Chenjian Gao et.al. | 2403.17782v1 | null |
2024-03-26 | Leave No Patient Behind: Enhancing Medication Recommendation for Rare Disease Patients | Zihao Zhao et.al. | 2403.17745v1 | null |
2024-03-26 | Masked Autoencoders are PDE Learners | Anthony Zhou et.al. | 2403.17728v1 | null |
2024-03-26 | REFeREE: A REference-FREE Model-Based Metric for Text Simplification | Yichen Huang et.al. | 2403.17640v1 | link |
2024-03-25 | Exploring CausalWorld: Enhancing robotic manipulation via knowledge transfer and curriculum learning | Xinrui Wang et.al. | 2403.17266v1 | null |
2024-03-25 | Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability | Zirui Qiu et.al. | 2403.16970v1 | null |
2024-03-25 | Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance | Jiasheng Ye et.al. | 2403.16952v1 | link |
2024-03-25 | Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text | Junshu Tang et.al. | 2403.16897v1 | null |
2024-03-25 | Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning? | Shaoxiong Ji et.al. | 2403.16777v1 | null |
2024-03-25 | ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search | Zehan Li et.al. | 2403.16702v1 | null |
2024-03-25 | A comparative analysis of embedding models for patent similarity | Grazia Sveva Ascione et.al. | 2403.16630v1 | null |
2024-03-25 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558v1 | link |
2024-03-25 | An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models | Zizhao Hu et.al. | 2403.16530v1 | null |
2024-03-25 | Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes | Tianwei Zhang et.al. | 2403.16499v1 | null |
2024-03-25 | PathoTune: Adapting Visual Foundation Model to Pathological Specialists | Jiaxuan Lu et.al. | 2403.16497v1 | null |
2024-03-25 | LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting | Qinyao Luo et.al. | 2403.16495v1 | null |
2024-03-25 | DeepMachining: Online Prediction of Machining Errors of Lathe Machines | Xiang-Li Lu et.al. | 2403.16451v1 | null |
2024-03-25 | KIT-19: A Comprehensive Korean Instruction Toolkit on 19 Tasks for Fine-Tuning Korean Large Language Models | Dongjun Jang et.al. | 2403.16444v1 | null |
2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378v1 | null |
2024-03-22 | CoLLEGe: Concept Embedding Generation for Large Language Models | Ryan Teehan et.al. | 2403.15362v1 | null |
2024-03-22 | Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities | Zhitong Xiong et.al. | 2403.15356v1 | null |
2024-03-22 | SFOD: Spiking Fusion Object Detector | Yimeng Fan et.al. | 2403.15192v1 | link |
2024-03-22 | Brain-grounding of semantic vectors improves neural decoding of visual stimuli | Shirin Vafaei et.al. | 2403.15176v1 | null |
2024-03-22 | LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Nicholas Lee et.al. | 2403.15042v1 | null |
2024-03-22 | Risk and Response in Large Language Models: Evaluating Key Threat Categories | Bahareh Harandizadeh et.al. | 2403.14988v1 | null |
2024-03-22 | CLIP-VQDiffusion: Language Free Training of Text To Image generation using CLIP and vector quantized diffusion model | Seungdae Han et.al. | 2403.14944v1 | null |
2024-03-21 | VidLA: Video-Language Alignment at Scale | Mamshad Nayeem Rizve et.al. | 2403.14870v1 | null |
2024-03-21 | TAMS: Translation-Assisted Morphological Segmentation | Enora Rice et.al. | 2403.14840v1 | null |
2024-03-21 | ReNoise: Real Image Inversion Through Iterative Noising | Daniel Garibi et.al. | 2403.14602v1 | null |
2024-03-21 | Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images | Yujian Liu et.al. | 2403.14346v1 | null |
2024-03-21 | Beyond Surface Similarity: Detecting Subtle Semantic Shifts in Financial Narratives | Jiaxin Liu et.al. | 2403.14341v1 | null |
2024-03-21 | Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition | Sihyun Yu et.al. | 2403.14148v1 | null |
2024-03-21 | Text-Enhanced Data-free Approach for Federated Class-Incremental Learning | Minh-Tuan Tran et.al. | 2403.14101v1 | link |
2024-03-20 | Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings | Gaifan Zhang et.al. | 2403.14001v1 | null |
2024-03-20 | Visually Grounded Speech Models have a Mutual Exclusivity Bias | Leanne Nortje et.al. | 2403.13922v1 | null |
2024-03-20 | Leveraging Linguistically Enhanced Embeddings for Open Information Extraction | Fauzan Farooqui et.al. | 2403.13903v1 | null |
2024-03-20 | On Pretraining Data Diversity for Self-Supervised Learning | Hasan Abed Al Kader Hammoud et.al. | 2403.13808v1 | link |
2024-03-20 | Learning from Models and Data for Visual Grounding | Ruozhen He et.al. | 2403.13804v1 | null |
2024-03-20 | RewardBench: Evaluating Reward Models for Language Modeling | Nathan Lambert et.al. | 2403.13787v1 | link |
2024-03-20 | When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather | Giulia Rizzoli et.al. | 2403.13762v1 | null |
2024-03-20 | PARAMANU-AYN: An Efficient Novel Generative and Instruction-tuned Language Model for Indian Legal Case Documents | Mitodru Niyogi et.al. | 2403.13681v1 | null |
2024-03-20 | Grounding Spatial Relations in Text-Only Language Models | Gorka Azkune et.al. | 2403.13666v1 | link |
2024-03-20 | Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese | Meet Doshi et.al. | 2403.13638v1 | null |
2024-03-20 | Bayesian Physics-informed Neural Networks for System Identification of Inverter-dominated Power Systems | Simon Stock et.al. | 2403.13602v1 | null |
2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600v1 | null |
2024-03-20 | ReGround: Improving Textual and Spatial Grounding at No Cost | Yuseung Lee et.al. | 2403.13589v1 | null |
2024-03-19 | Zero-Reference Low-Light Enhancement via Physical Quadruple Priors | Wenjing Wang et.al. | 2403.12933v1 | null |
2024-03-19 | Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts | Sai Ashish Somayajula et.al. | 2403.12918v1 | link |
2024-03-19 | Yell At Your Robot: Improving On-the-Fly from Language Corrections | Lucy Xiaoyang Shi et.al. | 2403.12910v1 | null |
2024-03-20 | MEDBind: Unifying Language and Multimodal Medical Data Embeddings | Yuan Gao et.al. | 2403.12894v2 | null |
2024-03-19 | Automated Data Curation for Robust Language Model Fine-Tuning | Jiuhai Chen et.al. | 2403.12776v1 | null |
2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728v1 | link |
2024-03-19 | Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service | Mirza Alim Mutasodirin et.al. | 2403.12563v1 | null |
2024-03-19 | Equity through Access: A Case for Small-scale Deep Learning | Raghavendra Selvan et.al. | 2403.12562v1 | link |
2024-03-19 | Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs | Md Ashiqur Rahman et.al. | 2403.12553v1 | null |
2024-03-19 | TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer | Eunjee Choi et.al. | 2403.12481v1 | null |
2024-03-18 | Urban Scene Diffusion through Semantic Occupancy Map | Junge Zhang et.al. | 2403.11697v1 | null |
2024-03-18 | Prioritized Semantic Learning for Zero-shot Instance Navigation | Xander Sun et.al. | 2403.11650v1 | null |
2024-03-18 | Arc2Face: A Foundation Model of Human Faces | Foivos Paraperas Papantoniou et.al. | 2403.11641v1 | null |
2024-03-18 | End-to-end multi-modal product matching in fashion e-commerce | Sándor Tóth et.al. | 2403.11593v1 | null |
2024-03-18 | CasSR: Activating Image Power for Real-World Image Super-Resolution | Haolan Chen et.al. | 2403.11451v1 | null |
2024-03-18 | Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge | Jiahe Wang et.al. | 2403.11450v1 | null |
2024-03-18 | Boosting Continuous Emotion Recognition with Self-Pretraining using Masked Autoencoders, Temporal Convolutional Networks, and Transformers | Weiwei Zhou et.al. | 2403.11440v1 | null |
2024-03-18 | X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment | Dongjae Shin et.al. | 2403.11399v1 | null |
2024-03-17 | Ensembling and Test Augmentation for Covid-19 Detection and Covid-19 Domain Adaptation from 3D CT-Scans | Fares Bougourzi et.al. | 2403.11338v1 | null |
2024-03-17 | Stylized Face Sketch Extraction via Generative Prior with Limited Data | Kwan Yun et.al. | 2403.11263v1 | null |
2024-03-15 | Frozen Feature Augmentation for Few-Shot Image Classification | Andreas Bär et.al. | 2403.10519v1 | null |
2024-03-15 | Approximate Nullspace Augmented Finetuning for Robust Vision Transformers | Haoyang Liu et.al. | 2403.10476v1 | null |
2024-03-15 | Using an LLM to Turn Sign Spottings into Spoken Language Sentences | Ozge Mercanoglu Sincan et.al. | 2403.10434v1 | null |
2024-03-15 | Monotonic Representation of Numeric Properties in Language Models | Benjamin Heinzerling et.al. | 2403.10381v1 | null |
2024-03-15 | Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder | Jinseok Kim et.al. | 2403.10255v1 | null |
2024-03-15 | Generative Region-Language Pretraining for Open-Ended Object Detection | Chuang Lin et.al. | 2403.10191v1 | link |
2024-03-15 | RAFT: Adapting Language Model to Domain Specific RAG | Tianjun Zhang et.al. | 2403.10131v1 | link |
2024-03-15 | Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling | Baoquan Zhang et.al. | 2403.10071v1 | null |
2024-03-15 | Boundary Matters: A Bi-Level Active Finetuning Framework | Han Lu et.al. | 2403.10069v1 | null |
2024-03-14 | Adapting OC20-trained EquiformerV2 Models for High-Entropy Materials | Christian M. Clausen et.al. | 2403.09811v1 | null |
2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634v1 | null |
2024-03-14 | Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image | Yiqun Mei et.al. | 2403.09632v1 | null |
2024-03-14 | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Eric Zelikman et.al. | 2403.09629v1 | null |
2024-03-14 | Counterfactual contrastive learning: robust representations via causal image synthesis | Melanie Roschewitz et.al. | 2403.09605v1 | link |
2024-03-14 | uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures | Afrina Tabassum et.al. | 2403.09579v1 | link |
2024-03-14 | Unsupervised Modality-Transferable Video Highlight Detection with Representation Activation Sequence Learning | Tingtian Li et.al. | 2403.09401v1 | null |
2024-03-14 | PreConfig: A Pretrained Model for Automating Network Configuration | Fuliang Li et.al. | 2403.09369v1 | null |
2024-03-14 | HeadEvolver: Text to Head Avatars via Locally Learnable Mesh Deformation | Duotun Wang et.al. | 2403.09326v1 | null |
2024-03-14 | Annotation Free Semantic Segmentation with Vision Foundation Models | Soroush Seifi et.al. | 2403.09307v1 | null |
2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Yiming Ma et.al. | 2403.09281v1 | null |
2024-03-13 | DAM: Dynamic Adapter Merging for Continual Video QA Learning | Feng Cheng et.al. | 2403.08755v1 | link |
2024-03-13 | Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | Renjie Pi et.al. | 2403.08730v1 | null |
2024-03-13 | Data-Efficient Sleep Staging with Synthetic Time Series Pretraining | Niklas Grieger et.al. | 2403.08592v1 | null |
2024-03-13 | Gaussian Splatting in Style | Abhishek Saroha et.al. | 2403.08498v1 | null |
2024-03-13 | Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model | Ruibin Zhang et.al. | 2403.08460v1 | null |
2024-03-13 | Gemma: Open Models Based on Gemini Research and Technology | Gemma Team et.al. | 2403.08295v1 | null |
2024-03-13 | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | Xiang Hu et.al. | 2403.08293v1 | null |
2024-03-13 | GPT, Ontology, and CAABAC: A Tripartite Personalized Access Control Model Anchored by Compliance, Context and Attribute | Raza Nowrozy et.al. | 2403.08264v1 | null |
2024-03-13 | LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition | Zhonglin Sun et.al. | 2403.08161v1 | link |
2024-03-12 | Learning Data Association for Multi-Object Tracking using Only Coordinates | Mehdi Miah et.al. | 2403.08018v1 | null |
2024-03-12 | 12 mJ per Class On-Device Online Few-Shot Class-Incremental Learning | Yoga Esa Wibowo et.al. | 2403.07851v1 | link |
2024-03-12 | Chronos: Learning the Language of Time Series | Abdul Fatir Ansari et.al. | 2403.07815v1 | link |
2024-03-12 | Boosting keyword spotting through on-device learnable user speech characteristics | Cristian Cioflan et.al. | 2403.07802v1 | null |
2024-03-12 | Fine-tuning Neural Network Quantum States | Riccardo Rende et.al. | 2403.07795v1 | null |
2024-03-12 | Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | Sahand Sharifzadeh et.al. | 2403.07750v1 | null |
2024-03-12 | MoralBERT: Detecting Moral Values in Social Discourse | Vjosa Preniqi et.al. | 2403.07678v1 | null |
2024-03-12 | Characterization of Large Language Model Development in the Datacenter | Qinghao Hu et.al. | 2403.07648v1 | link |
2024-03-12 | Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation | Francois Meyer et.al. | 2403.07567v1 | link |
2024-03-12 | Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning | Yao Liang et.al. | 2403.07440v1 | null |
2024-03-12 | In-context learning enables multimodal large language models to classify cancer pathology images | Dyke Ferber et.al. | 2403.07407v1 | null |
2024-03-11 | VideoMamba: State Space Model for Efficient Video Understanding | Kunchang Li et.al. | 2403.06977v1 | link |
2024-03-11 | MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning | Yichuan Li et.al. | 2403.06914v1 | null |
2024-03-11 | FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks | Muhammad Saif Ullah Khan et.al. | 2403.06904v1 | null |
2024-03-11 | On the Generalization Ability of Unsupervised Pretraining | Yuyang Deng et.al. | 2403.06871v1 | null |
2024-03-11 | Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection | Chuangchuang Tan et.al. | 2403.06803v1 | link |
2024-03-11 | PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor | Jaewon Jung et.al. | 2403.06668v1 | null |
2024-03-11 | Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers | Alexander H. Berger et.al. | 2403.06601v1 | null |
2024-03-11 | SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Yuxuan Li et.al. | 2403.06534v1 | link |
2024-03-11 | FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications | Yuki Tatsukawa et.al. | 2403.06453v1 | null |
2024-03-11 | Can LLMs' Tuning Methods Work in Medical Multimodal Domain? | Jiawei Chen et.al. | 2403.06407v1 | null |
2024-03-08 | DeepSeek-VL: Towards Real-World Vision-Language Understanding | Haoyu Lu et.al. | 2403.05525v1 | link |
2024-03-08 | Self-Supervised Multiple Instance Learning for Acute Myeloid Leukemia Classification | Salome Kazeminia et.al. | 2403.05379v1 | null |
2024-03-08 | ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications | Sotaro Takeshita et.al. | 2403.05303v1 | link |
2024-03-08 | CommitBench: A Benchmark for Commit Message Generation | Maximilian Schall et.al. | 2403.05188v1 | link |
2024-03-08 | GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting | Francesco Palandra et.al. | 2403.05154v1 | null |
2024-03-08 | Face2Diffusion for Fast and Editable Face Personalization | Kaede Shiohara et.al. | 2403.05094v1 | link |
2024-03-08 | Agile Multi-Source-Free Domain Adaptation | Xinyao Li et.al. | 2403.05062v1 | link |
2024-03-07 | An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control | Aosong Feng et.al. | 2403.04880v1 | null |
2024-03-07 | I Can't Believe It's Not Scene Flow! | Ishan Khatri et.al. | 2403.04739v1 | link |
2024-03-07 | Masked Capsule Autoencoders | Miles Everett et.al. | 2403.04724v1 | null |
2024-03-07 | Yi: Open Foundation Models by 01.AI | 01. AI et.al. | 2403.04652v1 | link |
2024-03-07 | Teaching Large Language Models to Reason with Reinforcement Learning | Alex Havrilla et.al. | 2403.04642v1 | null |
2024-03-07 | Pix2Gif: Motion-Guided Diffusion for GIF Generation | Hitesh Kandala et.al. | 2403.04634v1 | null |
2024-03-07 | CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Ibrahim Alabdulmohsin et.al. | 2403.04547v1 | null |
2024-03-07 | Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging | Dovile Juodelyte et.al. | 2403.04484v1 | link |
2024-03-07 | Enhancing Court View Generation with Knowledge Injection and Guidance | Ang Li et.al. | 2403.04366v1 | link |
2024-03-07 | Federated Recommendation via Hybrid Retrieval Augmented Generation | Huimin Zeng et.al. | 2403.04256v1 | link |
2024-03-07 | DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning | Xingwei Qu et.al. | 2403.04233v1 | null |
2024-03-06 | Bridging Language and Items for Retrieval and Recommendation | Yupeng Hou et.al. | 2403.03952v1 | link |
2024-03-06 | The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models | Adithya Bhaskar et.al. | 2403.03942v1 | link |
2024-03-06 | Designing Informative Metrics for Few-Shot Example Selection | Rishabh Adiga et.al. | 2403.03861v1 | null |
2024-03-06 | MeaCap: Memory-Augmented Zero-shot Image Captioning | Zequn Zeng et.al. | 2403.03715v1 | null |
2024-03-06 | On Transfer in Classification: How Well do Subsets of Classes Generalize? | Raphael Baena et.al. | 2403.03569v1 | null |
2024-03-06 | Low-Dose CT Image Reconstruction by Fine-Tuning a UNet Pretrained for Gaussian Denoising for the Downstream Task of Image Enhancement | Tim Selig et.al. | 2403.03551v1 | null |
2024-03-06 | CNN-based End-to-End Adaptive Controller with Stability Guarantees | Myeongseok Ryu et.al. | 2403.03499v1 | null |
2024-03-06 | Multi-modal Deep Learning | Chen Yuhua et.al. | 2403.03385v1 | null |
2024-03-05 | XAI-Based Detection of Adversarial Attacks on Deepfake Detectors | Ben Pinhasov et.al. | 2403.02955v1 | null |
2024-03-05 | Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | Philipp J. Rösch et.al. | 2403.02875v1 | null |
2024-03-05 | Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models | Sang T. Truong et.al. | 2403.02715v1 | null |
2024-03-05 | Breeze-7B Technical Report | Chan-Jan Hsu et.al. | 2403.02712v1 | null |
2024-03-04 | A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing | Yu Wang et.al. | 2403.02504v1 | null |
2024-03-04 | Encodings for Prediction-based Neural Architecture Search | Yash Akhauri et.al. | 2403.02484v1 | link |
2024-03-04 | Transformers Provably Learn Feature-Position Correlations in Masked Image Modeling | Yu Huang et.al. | 2403.02233v1 | null |
2024-03-04 | TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models | Yilong Ren et.al. | 2403.02221v1 | null |
2024-03-04 | What has LeBenchmark Learnt about French Syntax? | Zdravko Dugonjić et.al. | 2403.02173v1 | null |
2024-03-04 | Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning | Huali Xu et.al. | 2403.01966v1 | link |
2024-03-02 | Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning | Shuo Yang et.al. | 2403.01209v1 | null |
2024-03-01 | Tree-Regularized Tabular Embeddings | Xuan Li et.al. | 2403.00963v1 | link |
2024-03-01 | G3DR: Generative 3D Reconstruction in ImageNet | Pradyumna Reddy et.al. | 2403.00939v1 | null |
2024-03-01 | Word Order and World Knowledge | Qinghua Zhao et.al. | 2403.00876v1 | null |
2024-03-01 | Hierarchical Indexing for Retrieval-Augmented Opinion Summarization | Tom Hosking et.al. | 2403.00435v1 | null |
2024-03-01 | Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs | Nishanth Chandran et.al. | 2403.00393v1 | null |
2024-03-01 | MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis | Takahiko Furuya et.al. | 2403.00206v1 | link |
2024-02-29 | Ask Your Distribution Shift if Pre-Training is Right for You | Benjamin Cohen-Wang et.al. | 2403.00194v1 | link |
2024-02-29 | Non-Invasive Medical Digital Twins using Physics-Informed Self-Supervised Learning | Keying Kuang et.al. | 2403.00177v1 | link |
2024-02-29 | UniTS: Building a Unified Time Series Model | Shanghua Gao et.al. | 2403.00131v1 | link |
2024-02-29 | SeD: Semantic-Aware Discriminator for Image Super-Resolution | Bingchen Li et.al. | 2402.19387v1 | null |
2024-02-29 | OzMAC: An Energy-Efficient Sparsity-Exploiting Multiply-Accumulate-Unit Design for DL Inference | Harideep Nair et.al. | 2402.19376v1 | null |
2024-02-29 | Compact Speech Translation Models via Discrete Speech Units Pretraining | Tsz Kin Lam et.al. | 2402.19333v1 | null |
2024-02-29 | Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting | Lawrence Yunliang Chen et.al. | 2402.19249v1 | null |
2024-02-29 | PeLLE: Encoder-based language models for Brazilian Portuguese based on open data | Guilherme Lamartine de Mello et.al. | 2402.19204v1 | null |
2024-02-29 | VIXEN: Visual Text Comparison Network for Image Difference Captioning | Alexander Black et.al. | 2402.19119v1 | null |
2024-02-29 | Improving Group Connectivity for Generalization of Federated Deep Learning | Zexi Li et.al. | 2402.18949v1 | null |
2024-02-29 | Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data | Takaaki Saeki et.al. | 2402.18932v1 | null |
2024-02-29 | Reducing Hallucinations in Entity Abstract Summarization with Facts-Template Decomposition | Fangwei Zhu et.al. | 2402.18873v1 | link |
2024-02-29 | Dual Operating Modes of In-Context Learning | Ziqian Lin et.al. | 2402.18819v1 | null |
2024-02-28 | Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Nihal V. Nayak et.al. | 2402.18334v1 | link |
2024-02-28 | How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning | Subhabrata Dutta et.al. | 2402.18312v1 | link |
2024-02-28 | Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks | Joanne Lin et.al. | 2402.18307v1 | null |
2024-02-28 | Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis | Bashir Kazimi et.al. | 2402.18286v1 | null |
2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196v1 | null |
2024-02-28 | Diffusion-based Neural Network Weights Generation | Bedionita Soro et.al. | 2402.18153v1 | null |
2024-02-28 | DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning | Jianxiong Li et.al. | 2402.18137v1 | null |
2024-02-28 | Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization | Han Guo et.al. | 2402.18128v1 | link |
2024-02-28 | Collaborative decoding of critical tokens for boosting factuality of large language models | Lifeng Jin et.al. | 2402.17982v1 | null |
2024-02-27 | Acquiring Linguistic Knowledge from Multimodal Input | Theodor Amariucai et.al. | 2402.17936v1 | null |
2024-02-27 | Tower: An Open Multilingual Large Language Model for Translation-Related Tasks | Duarte M. Alves et.al. | 2402.17733v1 | null |
2024-02-27 | MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation | Hanan Gani et.al. | 2402.17725v1 | link |
2024-02-27 | NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents | Tamara Czinczoll et.al. | 2402.17682v1 | null |
2024-02-27 | SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation | Shuangrui Ding et.al. | 2402.17645v1 | null |
2024-02-27 | Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Xiao Liu et.al. | 2402.17644v1 | link |
2024-02-27 | Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation | Jonas Herzog et.al. | 2402.17614v1 | null |
2024-02-27 | A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | David Torpey et.al. | 2402.17611v1 | null |
2024-02-27 | Training-Free Long-Context Scaling of Large Language Models | Chenxin An et.al. | 2402.17463v1 | link |
2024-02-27 | Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder | Jiaqi Wang et.al. | 2402.17433v1 | null |
2024-02-27 | Investigating Continual Pretraining in Large Language Models: Insights and Implications | Çağatay Yıldız et.al. | 2402.17400v1 | null |
2024-02-26 | Immunization against harmful fine-tuning attacks | Domenic Rosati et.al. | 2402.16382v1 | null |
2024-02-26 | An Integrated Data Processing Framework for Pretraining Foundation Models | Yiding Sun et.al. | 2402.16358v1 | link |
2024-02-26 | MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | Zimu Lu et.al. | 2402.16352v1 | null |
2024-02-26 | BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM | Li Zhang et.al. | 2402.16338v1 | null |
2024-02-26 | Learning Translations: Emergent Communication Pretraining for Cooperative Language Acquisition | Dylan Cope et.al. | 2402.16247v1 | null |
2024-02-26 | High-Frequency-aware Hierarchical Contrastive Selective Coding for Representation Learning on Text-attributed Graphs | Peiyan Zhang et.al. | 2402.16240v1 | null |
2024-02-25 | Task Specific Pretraining with Noisy Labels for Remote sensing Image Segmentation | Chenying Liu et.al. | 2402.16164v1 | null |
2024-02-25 | StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention | Seungwon Seo et.al. | 2402.16092v1 | link |
2024-02-25 | LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding | Yuxuan Wang et.al. | 2402.16050v1 | link |
2024-02-25 | Adversarial-Robust Transfer Learning for Medical Imaging via Domain Assimilation | Xiaohui Chen et.al. | 2402.16005v1 | null |
2024-02-23 | Repetition Improves Language Model Embeddings | Jacob Mitchell Springer et.al. | 2402.15449v1 | link |
2024-02-23 | PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning | Simon Holk et.al. | 2402.15420v1 | null |
2024-02-23 | United We Pretrain, Divided We Fail! Representation Learning for Time Series by Pretraining on 75 Datasets at Once | Maurice Kraus et.al. | 2402.15404v1 | null |
2024-02-23 | Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control | Masatoshi Uehara et.al. | 2402.15194v1 | null |
2024-02-23 | The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling | Jiajun Ma et.al. | 2402.15170v1 | null |
2024-02-23 | Self-Adaptive Reconstruction with Contrastive Learning for Unsupervised Sentence Embeddings | Junlong Liu et.al. | 2402.15153v1 | null |
2024-02-23 | ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval | Antoine Louis et.al. | 2402.15059v1 | null |
2024-02-23 | CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean | Dongjun Jang et.al. | 2402.15046v1 | null |
2024-02-22 | Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning | Zhuoyan Xu et.al. | 2402.15017v1 | link |
2024-02-22 | Zero-shot cross-lingual transfer in instruction tuning of large language model | Nadezhda Chirkova et.al. | 2402.14778v1 | null |
2024-02-22 | Prompting a Pretrained Transformer Can Be a Universal Approximator | Aleksandar Petrov et.al. | 2402.14753v1 | null |
2024-02-22 | Dependency Annotation of Ottoman Turkish with Multilingual BERT | Şaziye Betül Özateş et.al. | 2402.14743v1 | null |
2024-02-22 | Cleaner Pretraining Corpus Curation with Neural Web Scraping | Zhipeng Xu et.al. | 2402.14652v1 | link |
2024-02-22 | Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark | Xiuying Chen et.al. | 2402.14359v1 | null |
2024-02-22 | GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints | Anqi Cheng et.al. | 2402.14354v1 | null |
2024-02-22 | MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion | Xin-Yang Zheng et.al. | 2402.14253v1 | null |
2024-02-22 | Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding | Yu-Qi Yang et.al. | 2402.14215v1 | link |
2024-02-22 | BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay | Catherine Weaver et.al. | 2402.14194v1 | null |
2024-02-21 | T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching | Zizheng Pan et.al. | 2402.14167v1 | link |
2024-02-21 | User-LLM: Efficient LLM Contextualization with User Embeddings | Lin Ning et.al. | 2402.13598v1 | null |
2024-02-21 | Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment | Yunxin Li et.al. | 2402.13561v1 | null |
2024-02-21 | LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs | Yunxin Li et.al. | 2402.13546v1 | null |
2024-02-21 | FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing | Xiao-Yang Liu et.al. | 2402.13533v1 | null |
2024-02-21 | How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction? | Aviv Brokman et.al. | 2402.13470v1 | null |
2024-02-20 | Investigating Cultural Alignment of Large Language Models | Badr AlKhamissi et.al. | 2402.13231v1 | link |
2024-02-20 | RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian | Adrian Cosma et.al. | 2402.13222v1 | link |
2024-02-20 | VideoPrism: A Foundational Visual Encoder for Video Understanding | Long Zhao et.al. | 2402.13217v1 | null |
2024-02-20 | Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables | Haisong Gong et.al. | 2402.13028v1 | link |
2024-02-20 | Cell Graph Transformer for Nuclei Classification | Wei Lou et.al. | 2402.12946v1 | link |
2024-02-20 | More Discriminative Sentence Embeddings via Semantic Graph Smoothing | Chakib Fettal et.al. | 2402.12890v1 | link |
2024-02-20 | ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Fajri Koto et.al. | 2402.12840v1 | link |
2024-02-20 | Equivariant Pretrained Transformer for Unified Geometric Learning on Multi-Domain 3D Molecules | Rui Jiao et.al. | 2402.12714v1 | null |
2024-02-20 | PDEformer: Towards a Foundation Model for One-Dimensional Partial Differential Equations | Zhanhong Ye et.al. | 2402.12652v1 | null |
2024-02-19 | GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Jinhao Duan et.al. | 2402.12348v1 | link |
2024-02-19 | Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks | Nadezhda Chirkova et.al. | 2402.12279v1 | null |
2024-02-19 | High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models | Michela Lorandi et.al. | 2402.12267v1 | link |
2024-02-19 | Is It a Free Lunch for Removing Outliers during Pretraining? | Baohao Liao et.al. | 2402.12102v1 | null |
2024-02-19 | Direct Consistency Optimization for Compositional Text-to-Image Personalization | Kyungmin Lee et.al. | 2402.12004v1 | null |
2024-02-19 | DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation | Chong Zeng et.al. | 2402.11929v1 | null |
2024-02-19 | MRKE: The Multi-hop Reasoning Evaluation of LLMs by Knowledge Edition | Jian Wu et.al. | 2402.11924v1 | null |
2024-02-19 | ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image | Yan Hong et.al. | 2402.11849v1 | null |
2024-02-19 | UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction | Yuan Yuan et.al. | 2402.11838v1 | null |
2024-02-19 | LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs | Kai Wang et.al. | 2402.11804v1 | null |
2024-02-16 | Proving membership in LLM pretraining data via data watermarks | Johnny Tian-Zheng Wei et.al. | 2402.10892v1 | null |
2024-02-16 | Enhancement-Driven Pretraining for Robust Fingerprint Representation Learning | Ekta Gavas et.al. | 2402.10847v1 | null |
2024-02-16 | Associative Memories in the Feature Space | Tommaso Salvatori et.al. | 2402.10814v1 | null |
2024-02-16 | BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion | Raktim Kumar Mondol et.al. | 2402.10717v1 | null |
2024-02-16 | Are ID Embeddings Necessary? Whitening Pre-trained Text Embeddings for Effective Sequential Recommendation | Lingzi Zhang et.al. | 2402.10602v1 | null |
2024-02-16 | SPAR: Personalized Content-Based Recommendation via Long Engagement Attention | Chiyu Zhang et.al. | 2402.10555v1 | null |
2024-02-16 | MFBind: a Multi-Fidelity Approach for Evaluating Drug Compounds in Practical Generative Modeling | Peter Eckmann et.al. | 2402.10387v1 | null |
2024-02-15 | BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains | Yanis Labrak et.al. | 2402.10373v1 | null |
2024-02-15 | Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning | Euclid Collaboration et.al. | 2402.10187v1 | link |
2024-02-15 | Data Engineering for Scaling Language Models to 128K Context | Yao Fu et.al. | 2402.10171v1 | link |
2024-02-15 | Towards Safer Large Language Models through Machine Unlearning | Zheyuan Liu et.al. | 2402.10058v1 | null |
2024-02-15 | LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Jinyuan Li et.al. | 2402.09989v1 | null |
2024-02-15 | Data Augmentation and Transfer Learning Approaches Applied to Facial Expressions Recognition | Enrico Randellini et.al. | 2402.09982v1 | null |
2024-02-15 | All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining | Haihong Zhao et.al. | 2402.09834v1 | null |
2024-02-15 | Knowledge of Pretrained Language Models on Surface Information of Tokens | Tatsuya Hiraoka et.al. | 2402.09808v1 | null |
2024-02-14 | Towards Privacy-Aware Sign Language Translation at Scale | Phillip Rust et.al. | 2402.09611v1 | null |
2024-02-14 | DeepATLAS: One-Shot Localization for Biomedical Data | Peter D. Chang et.al. | 2402.09587v1 | null |
2024-02-14 | Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge | Jiancheng Yang et.al. | 2402.09372v1 | null |
2024-02-14 | Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Yi Fung et.al. | 2402.09369v1 | null |
2024-02-14 | HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference | Yashas Samaga B L et.al. | 2402.09360v1 | null |
2024-02-14 | Few-Shot Object Detection with Sparse Context Transformers | Jie Mei et.al. | 2402.09315v1 | null |
2024-02-14 | Embracing the black box: Heading towards foundation models for causal discovery from time series data | Gideon Stein et.al. | 2402.09305v1 | null |
2024-02-14 | Spectral Filters, Dark Signals, and Attention Sinks | Nicola Cancedda et.al. | 2402.09221v1 | null |
2024-02-14 | MPIrigen: MPI Code Generation through Domain-Specific Language Models | Nadav Schneider et.al. | 2402.09126v1 | link |
2024-02-14 | I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic Encryption | Prajwal Panzade et.al. | 2402.09059v1 | null |
2024-02-14 | Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays | Yeongjae Cho et.al. | 2402.08966v1 | null |
2024-02-14 | Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation | Ge Shi et.al. | 2402.08882v1 | null |
2024-02-13 | Human Curriculum Effects Emerge with In-Context Learning in Neural Networks | Jacob Russin et.al. | 2402.08674v1 | null |
2024-02-13 | Tandem Transformers for Inference Efficient LLMs | Aishwarya P S et.al. | 2402.08644v1 | null |
2024-02-13 | Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | Jason Tang et.al. | 2402.08532v1 | null |
2024-02-13 | Concept-1K: A Novel Benchmark for Instance Incremental Learning | Junhao Zheng et.al. | 2402.08526v1 | link |
2024-02-13 | Pixel Sentence Representation Learning | Chenghao Xiao et.al. | 2402.08183v1 | null |
2024-02-12 | Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? | Andrew Bai et.al. | 2402.08096v1 | null |
2024-02-12 | Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Siddharth Karamcheti et.al. | 2402.07865v1 | link |
2024-02-12 | Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning | Z Liu et.al. | 2402.07818v1 | null |
2024-02-12 | AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts | Yifan Zhang et.al. | 2402.07625v1 | link |
2024-02-12 | Foundational Inference Models for Dynamical Systems | Patrick Seifner et.al. | 2402.07594v1 | null |
2024-02-12 | Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape Prediction | Cheng Feng et.al. | 2402.07570v1 | link |
2024-02-12 | MAFIA: Multi-Adapter Fused Inclusive LanguAge Models | Prachi Jain et.al. | 2402.07519v1 | null |
2024-02-12 | SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning | Hang Zhao et.al. | 2402.07485v1 | null |
2024-02-12 | Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | Jon Saad-Falcon et.al. | 2402.07440v1 | null |
2024-02-12 | SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy Adaptation | Sangwoo Shin et.al. | 2402.07418v1 | null |
2024-02-11 | Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers | Minoo Shayaninasab et.al. | 2402.07327v1 | null |
2024-02-09 | Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows | Evan D. Cook et.al. | 2402.06537v1 | null |
2024-02-09 | GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data | Haoyuan Li et.al. | 2402.06198v1 | null |
2024-02-09 | Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss | Ruijie Zheng et.al. | 2402.06187v1 | null |
2024-02-09 | MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models | Yixiao Zhang et.al. | 2402.06178v1 | null |
2024-02-08 | Early Fusion of Features for Semantic Segmentation | Anupam Gupta et.al. | 2402.06091v1 | null |
2024-02-08 | Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing | Yong Cao et.al. | 2402.06015v1 | null |
2024-02-08 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | Xing Han Lù et.al. | 2402.05930v1 | null |
2024-02-08 | Collaborative Control for Geometry-Conditioned PBR Image Generation | Shimon Vainer et.al. | 2402.05919v1 | null |
2024-02-08 | Efficient Stagewise Pretraining via Progressive Subnetworks | Abhishek Panigrahi et.al. | 2402.05913v1 | null |
2024-02-08 | SpiRit-LM: Interleaved Spoken and Written Language Model | Tu Anh Nguyen et.al. | 2402.05755v1 | null |
2024-02-08 | Unified Speech-Text Pretraining for Spoken Dialog Modeling | Heeseung Kim et.al. | 2402.05706v1 | null |
2024-02-08 | Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks | Ben Fauber et.al. | 2402.05616v1 | null |
2024-02-08 | Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models | Maxime Fily et.al. | 2402.05581v1 | null |
2024-02-07 | BIKED++: A Multimodal Dataset of 1.4 Million Bicycle Image and Parametric CAD Designs | Lyle Regenwetter et.al. | 2402.05301v1 | null |
2024-02-07 | SPAD : Spatially Aware Multiview Diffusers | Yash Kant et.al. | 2402.05235v1 | null |
2024-02-07 | A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? | Agustinus Kristiadi et.al. | 2402.05015v1 | link |
2024-02-07 | Personalized Text Generation with Fine-Grained Linguistic Control | Bashar Alhafni et.al. | 2402.04914v1 | link |
2024-02-07 | OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding | Guibiao Liao et.al. | 2402.04648v1 | null |
2024-02-06 | PreGIP: Watermarking the Pretraining of Graph Neural Networks for Deep Intellectual Property Protection | Enyan Dai et.al. | 2402.04435v1 | null |
2024-02-06 | Fine-Tuned Language Models Generate Stable Inorganic Materials as Text | Nate Gruver et.al. | 2402.04379v1 | link |
2024-02-06 | The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry | Michael Zhang et.al. | 2402.04347v1 | null |
2024-02-06 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | Quan Sun et.al. | 2402.04252v1 | link |
2024-02-06 | MusicRL: Aligning Music Generation to Human Preferences | Geoffrey Cideron et.al. | 2402.04229v1 | null |
2024-02-06 | Scaling Laws for Downstream Task Performance of Large Language Models | Berivan Isik et.al. | 2402.04177v1 | null |
2024-02-06 | Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains | Ashok Vardhan Makkuva et.al. | 2402.04161v1 | link |
2024-02-06 | A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation | Zhengbo Wang et.al. | 2402.04087v1 | link |
2024-02-06 | Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models | Zhengbo Wang et.al. | 2402.04050v1 | null |
2024-02-06 | Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation | Zolnamar Dorjsembe et.al. | 2402.04031v1 | link |
2024-02-06 | Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning | Ningyuan Tang et.al. | 2402.04009v1 | null |
2024-02-06 | Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought | Alex Havrilla et.al. | 2402.04004v1 | null |
2024-02-06 | Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time | Netta Ollikka et.al. | 2402.03973v1 | null |
2024-02-05 | Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining | Jiarun Liu et.al. | 2402.03302v1 | link |
2024-02-05 | Training-Free Consistent Text-to-Image Generation | Yoad Tewel et.al. | 2402.03286v1 | null |
2024-02-05 | CLIP Can Understand Depth | Dunam Kim et.al. | 2402.03251v1 | null |
2024-02-05 | FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition | Xiaohu Huang et.al. | 2402.03241v1 | null |
2024-02-05 | Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms | Ethan Wilson et.al. | 2402.03188v1 | null |
2024-02-05 | Time-, Memory- and Parameter-Efficient Visual Adaptation | Otniel-Bogdan Mercea et.al. | 2402.02887v1 | null |
2024-02-05 | Enhancing Compositional Generalization via Compositional Feature Alignment | Haoxiang Wang et.al. | 2402.02851v1 | link |
2024-02-04 | Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Haoyi Zhu et.al. | 2402.02500v1 | null |
2024-02-04 | A Graph is Worth | Zhangyang Gao et.al. | 2402.02464v1 | null |
2024-02-04 | BECLR: Batch Enhanced Contrastive Few-Shot Learning | Stylianos Poulakakis-Daktylidis et.al. | 2402.02444v1 | link |
2024-02-02 | From Words to Molecules: A Survey of Large Language Models in Chemistry | Chang Liao et.al. | 2402.01439v1 | null |
2024-02-02 | Continual Learning for Large Language Models: A Survey | Tongtong Wu et.al. | 2402.01364v1 | null |
2024-02-02 | Describing Images | Ece Takmaz et.al. | 2402.01352v1 | null |
2024-02-02 | Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion | Zexi Li et.al. | 2402.01342v1 | null |
2024-02-02 | On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification | Calum Heggan et.al. | 2402.01274v1 | null |
2024-02-02 | Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion? | Cristian Sbrolli et.al. | 2402.01241v1 | null |
2024-02-02 | In-Context Learning for Few-Shot Nested Named Entity Recognition | Meishan Zhang et.al. | 2402.01182v1 | null |
2024-02-02 | Interpretation of Intracardiac Electrograms Through Textual Representations | William Jongwon Han et.al. | 2402.01115v1 | null |
2024-02-02 | Double-Dip: Thwarting Label-Only Membership Inference Attacks with Transfer Learning and Randomization | Arezoo Rajabi et.al. | 2402.01114v1 | null |
2024-02-02 | Specialized Language Models with Cheap Inference from Limited Domain Data | David Grangier et.al. | 2402.01093v1 | null |
2024-02-01 | Can Large Language Models Understand Context? | Yilun Zhu et.al. | 2402.00858v1 | null |
2024-02-01 | LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law | Toni J. B. Liu et.al. | 2402.00795v1 | null |
2024-02-01 | CroissantLLM: A Truly Bilingual French-English Language Model | Manuel Faysse et.al. | 2402.00786v1 | link |
2024-02-01 | AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning | Fu-Yun Wang et.al. | 2402.00769v1 | link |
2024-02-01 | Unlearnable Algorithms for In-context Learning | Andrei Muresanu et.al. | 2402.00751v1 | null |
2024-02-01 | Approximating Optimal Morphing Attacks using Template Inversion | Laurent Colbois et.al. | 2402.00695v1 | null |
2024-02-01 | Improving Critical Node Detection Using Neural Network-based Initialization in a Genetic Algorithm | Chanjuan Liu et.al. | 2402.00404v1 | null |
2024-02-01 | Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure | Masahito Togami et.al. | 2402.00337v1 | null |
2024-02-01 | Towards AI-Assisted Synthesis of Verified Dafny Methods | Md Rakib Hossain Misu et.al. | 2402.00247v1 | link |
2024-01-31 | Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research | Luca Soldaini et.al. | 2402.00159v1 | link |
2024-01-31 | Binding Touch to Everything: Learning Unified Multimodal Tactile Representations | Fengyu Yang et.al. | 2401.18084v1 | null |
2024-01-31 | Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models | Mitodru Niyogi et.al. | 2401.18034v1 | null |
2024-01-31 | Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers | Shan Zhao et.al. | 2401.17870v1 | null |
2024-01-31 | Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model | Zihan Zhong et.al. | 2401.17868v1 | null |
2024-01-31 | Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction | Xueyuan Chen et.al. | 2401.17796v1 | null |
2024-01-31 | EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Jaeyeon Kim et.al. | 2401.17690v1 | link |
2024-01-31 | Towards Efficient and Reliable LLM Serving: A Real-World Workload Study | Yuxin Wang et.al. | 2401.17644v1 | null |
2024-01-31 | Local and Global Contexts for Conversation | Zuoquan Lin et.al. | 2401.17588v1 | link |
2024-01-30 | Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens | Jiacheng Liu et.al. | 2401.17377v1 | null |
2024-01-30 | Transfer Learning for Text Diffusion Models | Kehang Han et.al. | 2401.17181v1 | null |
2024-01-29 | Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled | Shengchao Liu et.al. | 2401.17123v1 | link |
2024-01-30 | Finetuning Large Language Models for Vulnerability Detection | Alexey Shestov et.al. | 2401.17010v1 | null |
2024-01-30 | Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution | Gaspard Michel et.al. | 2401.16968v1 | link |
2024-01-30 | PBSCSR: The Piano Bootleg Score Composer Style Recognition Dataset | Arhan Jain et.al. | 2401.16803v1 | link |
2024-01-30 | MolPLA: A Molecular Pretraining Framework for Learning Cores, R-Groups and their Linker Joints | Mogan Gim et.al. | 2401.16771v1 | null |
2024-01-30 | Gradient-Based Language Model Red Teaming | Nevan Wichers et.al. | 2401.16656v1 | link |
2024-01-30 | IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion | Bolun Li et.al. | 2401.16637v1 | link |
2024-01-29 | ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks | Bolei Ma et.al. | 2401.16589v1 | link |
2024-01-29 | Massively Multilingual Text Translation For Low-Resource Languages | Zhong Zhou et.al. | 2401.16582v1 | null |
2024-01-29 | Scaling Sparse Fine-Tuning to Large Language Models | Alan Ansell et.al. | 2401.16405v1 | null |
2024-01-29 | PICL: Physics Informed Contrastive Learning for Partial Differential Equations | Cooper Lorsung et.al. | 2401.16327v1 | null |
2024-01-29 | Enhancing Molecular Property Prediction with Auxiliary Learning and Task-Specific Adaptation | Vishal Dey et.al. | 2401.16299v1 | null |
2024-01-29 | Textual Entailment for Effective Triple Validation in Object Prediction | Andrés García-Silva et.al. | 2401.16293v1 | null |
2024-01-29 | Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model | Till Grutschus et.al. | 2401.16280v1 | null |
2024-01-29 | Type-based Neural Link Prediction Adapter for Complex Query Answering | Lingning Song et.al. | 2401.16045v1 | null |
2024-01-29 | Finding Challenging Metaphors that Confuse Pretrained Language Models | Yucheng Li et.al. | 2401.16012v1 | null |
2024-01-29 | StableIdentity: Inserting Anybody into Anywhere at First Sight | Qinghe Wang et.al. | 2401.15975v1 | null |
2024-01-29 | Masked Audio Modeling with CLAP and Multi-Objective Learning | Yifei Xin et.al. | 2401.15953v1 | null |
2024-01-29 | HICH Image/Text (HICH-IT): Comprehensive Text and Image Datasets for Hypertensive Intracerebral Hemorrhage Research | Jie Li et.al. | 2401.15934v1 | null |
2024-01-26 | RESPRECT: Speeding-up Multi-fingered Grasping with Residual Reinforcement Learning | Federico Ceola et.al. | 2401.14858v1 | null |
2024-01-26 | Endowing Protein Language Models with Structural Knowledge | Dexiong Chen et.al. | 2401.14819v1 | null |
2024-01-26 | MaLLaM -- Malaysia Large Language Model | Husein Zolkepli et.al. | 2401.14680v1 | null |
2024-01-26 | An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check Models | Xi Wang et.al. | 2401.14630v1 | null |
2024-01-26 | Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning | Tao He et.al. | 2401.14626v1 | null |
2024-01-25 | MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models | Saumya Saxena et.al. | 2401.14502v1 | null |
2024-01-25 | Rethinking Patch Dependence for Masked Autoencoders | Letian Fu et.al. | 2401.14391v1 | null |
2024-01-25 | TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation | Gökçe Uludoğan et.al. | 2401.14373v1 | link |
2024-01-25 | Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation | Minglin Chen et.al. | 2401.14257v1 | null |
2024-01-25 | Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods | Mohammed Sabry et.al. | 2401.14228v1 | null |
2024-01-25 | BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models | Senthil Purushwalkam et.al. | 2401.13974v1 | null |
2024-01-24 | S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction | Sathira Silva et.al. | 2401.13785v1 | null |
2024-01-24 | Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP Mode | Naresh Kumar Lahajal et.al. | 2401.13613v1 | null |
2024-01-24 | Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | Husein Zolkepli et.al. | 2401.13565v1 | null |
2024-01-25 | Finetuning Foundation Models for Joint Analysis Optimization | Matthias Vigl et.al. | 2401.13536v2 | null |
2024-01-24 | Generative Human Motion Stylization in Latent Space | Chuan Guo et.al. | 2401.13505v1 | null |
2024-01-24 | MaLA-500: Massive Language Adaptation of Large Language Models | Peiqin Lin et.al. | 2401.13303v1 | null |
2024-01-24 | Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics | Pengcheng Zhao et.al. | 2401.13270v1 | null |
2024-01-24 | Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation | Saiyang Na et.al. | 2401.13220v1 | null |
2024-01-24 | AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation | Lulan Shen et.al. | 2401.13212v1 | null |
2024-01-23 | The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts | Lingfeng Shen et.al. | 2401.13136v1 | null |
2024-01-23 | Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems | Michelle R. Greene et.al. | 2401.13097v1 | null |
2024-01-23 | GALA: Generating Animatable Layered Assets from a Single Scan | Taeksoo Kim et.al. | 2401.12979v1 | null |
2024-01-23 | Pretraining and the Lasso | Erin Craig et.al. | 2401.12911v1 | null |
2024-01-23 | PSDF: Prior-Driven Neural Implicit Surface Learning for Multi-view Reconstruction | Wanjuan Su et.al. | 2401.12751v1 | null |
2024-01-23 | Evaluation of large language models for assessing code maintainability | Marc Dillmann et.al. | 2401.12714v1 | null |
2024-01-23 | Persona-centric Metamorphic Relation guided Robustness Evaluation for Multi-turn Dialogue Modelling | Yanbing Chen et.al. | 2401.12483v1 | null |
2024-01-23 | The Neglected Tails of Vision-Language Models | Shubham Parashar et.al. | 2401.12425v1 | null |
2024-01-22 | OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection | Fatema-E Jannat et.al. | 2401.12344v1 | null |
2024-01-22 | Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation | Shoaib Meraj Sami et.al. | 2401.12340v1 | null |
2024-01-22 | APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference | Bowen Zhao et.al. | 2401.12200v1 | null |
2024-01-22 | An Empirical Analysis of In-context Learning Abilities of LLMs for MT | Pranjal A. Chitale et.al. | 2401.12097v1 | null |
2024-01-22 | Multi-level Cross-modal Alignment for Image Clustering | Liping Qiu et.al. | 2401.11740v1 | null |
2024-01-22 | M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition | Mengmeng Wang et.al. | 2401.11649v1 | null |
2024-01-21 | MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts | Haoqiang Guo et.al. | 2401.11403v1 | link |
2024-01-21 | LLMRA: Multi-modal Large Language Model based Restoration Assistant | Xiaoyu Jin et.al. | 2401.11401v1 | null |
2024-01-19 | Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition | Ismail Rasim Ulgen et.al. | 2401.11017v1 | null |
2024-01-19 | Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment | Fanqi Wan et.al. | 2401.10768v1 | link |
2024-01-19 | DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval | Xiangpeng Yang et.al. | 2401.10588v1 | null |
2024-01-19 | Name Tagging Under Domain Shift via Metric Learning for Life Sciences | Hongyi Liu et.al. | 2401.10472v1 | null |
2024-01-19 | Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition | Yu Yu et.al. | 2401.10447v1 | null |
2024-01-18 | Supervised Fine-tuning in turn Improves Visual Foundation Models | Xiaohu Jiang et.al. | 2401.10222v1 | link |
2024-01-18 | Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap | Xingyu Wu et.al. | 2401.10034v1 | null |
2024-01-18 | Gender Bias in Machine Translation and The Era of Large Language Models | Eva Vanmassenhove et.al. | 2401.10016v1 | null |
2024-01-18 | Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations | Prince Jha et.al. | 2401.09899v1 | link |
2024-01-18 | Improving fine-grained understanding in image-text pre-training | Ioana Bica et.al. | 2401.09865v1 | null |
2024-01-18 | Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training | Corey Lammie et.al. | 2401.09859v1 | null |
2024-01-18 | Simple and effective data augmentation for compositional generalization | Yuekun Yao et.al. | 2401.09815v1 | null |
2024-01-18 | Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | Zesen Cheng et.al. | 2401.09732v1 | link |
2024-01-17 | CT Liver Segmentation via PVT-based Encoding and Refined Decoding | Debesh Jha et.al. | 2401.09630v1 | link |
2024-01-17 | Aligning Large Language Models with Counterfactual DPO | Bradley Butcher et.al. | 2401.09566v1 | null |
2024-01-17 | Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text | Mazal Bethany et.al. | 2401.09407v1 | null |
2024-01-17 | Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora | Diana Davila Gordillo et.al. | 2401.09333v1 | null |
2024-01-17 | An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation | Yinuo Zhao et.al. | 2401.09258v1 | null |
2024-01-17 | Preparing Lessons for Progressive Training on Language Models | Yu Pan et.al. | 2401.09192v1 | null |
2024-01-17 | Visual Robotic Manipulation with Depth-Aware Pretraining | Wanying Wang et.al. | 2401.09038v1 | null |
2024-01-16 | Fast Dynamic 3D Object Generation from a Single-view Video | Zijie Pan et.al. | 2401.08742v1 | null |
2024-01-16 | Fixed Point Diffusion Models | Xingjian Bai et.al. | 2401.08741v1 | null |
2024-01-16 | Tuning Language Models by Proxy | Alisa Liu et.al. | 2401.08565v1 | null |
2024-01-16 | GATS: Gather-Attend-Scatter | Konrad Zolna et.al. | 2401.08525v1 | null |
2024-01-17 | Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models | Jianhui Pang et.al. | 2401.08350v2 | null |
2024-01-16 | MCRPL: A Pretrain, Prompt & Fine-tune Paradigm for Non-overlapping Many-to-one Cross-domain Recommendation | Hao Liu et.al. | 2401.08228v1 | null |
2024-01-16 | SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation | Zhixuan Liu et.al. | 2401.08053v1 | null |
2024-01-15 | How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets? | Bidur Khanal et.al. | 2401.07990v1 | null |
2024-01-15 | Word Boundary Information Isn't Useful for Encoder Language Models | Edward Gow-Smith et.al. | 2401.07923v1 | null |
2024-01-15 | EMBRE: Entity-aware Masking for Biomedical Relation Extraction | Mingjie Li et.al. | 2401.07877v1 | null |
2024-01-15 | VeCAF: VLM-empowered Collaborative Active Finetuning with Training Objective Awareness | Rongyu Zhang et.al. | 2401.07853v1 | null |
2024-01-15 | Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification | Nathan Painchaud et.al. | 2401.07796v1 | null |
2024-01-15 | On the importance of Data Scale in Pretraining Arabic Language Models | Abbas Ghaddar et.al. | 2401.07760v1 | link |
2024-01-15 | HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation | Antoine Mercier et.al. | 2401.07727v1 | null |
2024-01-12 | Scalable 3D Panoptic Segmentation With Superpoint Graph Clustering | Damien Robert et.al. | 2401.06704v1 | link |
2024-01-12 | TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models | Yihong Liu et.al. | 2401.06620v1 | null |
2024-01-12 | BOK-VQA: Bilingual Outside Knowledge-based Visual Question Answering via Graph Representation Pretraining | Minjun Kim et.al. | 2401.06443v1 | null |
2024-01-12 | AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters | Li Lucy et.al. | 2401.06408v1 | link |
2024-01-12 | AffordanceLLM: Grounding Affordance from Vision Language Models | Shengyi Qian et.al. | 2401.06341v1 | null |
2024-01-12 | Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity | Banafshe Felfeliyan et.al. | 2401.06331v1 | null |
2024-01-11 | A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy | Edward Sanderson et.al. | 2401.06278v1 | null |
2024-01-11 | Transformers are Multi-State RNNs | Matanel Oren et.al. | 2401.06104v1 | null |
2024-01-11 | Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models | K M Sajjadul Islam et.al. | 2401.06088v1 | null |
2024-01-11 | LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization | Muhammad Farid Adilazuarda et.al. | 2401.06034v1 | null |
2024-01-11 | DiffDA: a diffusion model for weather-scale data assimilation | Langwen Huang et.al. | 2401.05932v1 | null |
2024-01-11 | Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models | Pengzhi Gao et.al. | 2401.05861v1 | link |
2024-01-11 | Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations | Zhihui Xie et.al. | 2401.05792v1 | link |
2024-01-11 | Zero Resource Cross-Lingual Part Of Speech Tagging | Sahil Chopra et.al. | 2401.05727v1 | null |
2024-01-10 | Diffusion Priors for Dynamic View Synthesis from Monocular Videos | Chaoyang Wang et.al. | 2401.05583v1 | null |
2024-01-10 | Siamese Networks with Soft Labels for Unsupervised Lesion Detection and Patch Pretraining on Screening Mammograms | Kevin Van Vorst et.al. | 2401.05570v1 | null |
2024-01-10 | Physics guided dual Self-supervised learning for structure-based materials property prediction | Nihang Fu et.al. | 2401.05223v1 | link |
2024-01-10 | Pre-trained Large Language Models for Financial Sentiment Analysis | Wei Luo et.al. | 2401.05215v1 | null |
2024-01-10 | MISS: A Generative Pretraining and Finetuning Approach for Med-VQA | Jiawei Chen et.al. | 2401.05163v1 | null |
2024-01-09 | Phishing Website Detection through Multi-Model Analysis of HTML Content | Furkan Çolhak et.al. | 2401.04820v1 | null |
2024-01-10 | RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Mahdi Nikdan et.al. | 2401.04679v2 | null |
2024-01-09 | DepressionEmo: A novel dataset for multilabel classification of depression emotions | Abu Bakar Siddiqur Rahman et.al. | 2401.04655v1 | link |
2024-01-09 | Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example | Kwan Yun et.al. | 2401.04362v1 | null |
2024-01-09 | Private Fine-tuning of Large Language Models with Zeroth-order Optimization | Xinyu Tang et.al. | 2401.04343v1 | null |
2024-01-08 | Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Chen Zhao et.al. | 2401.04105v1 | null |
2024-01-08 | FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference | Zirui Liu et.al. | 2401.04044v1 | null |
2024-01-08 | TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series | Vijay Ekambaram et.al. | 2401.03955v1 | null |
2024-01-08 | TeleChat Technical Report | Zihan Wang et.al. | 2401.03804v1 | null |
2024-01-08 | Anatomy of Neural Language Models | Majd Saleh et.al. | 2401.03797v1 | link |
2024-01-07 | Transfer the linguistic representations from TTS to accent conversion with non-parallel data | Xi Chen et.al. | 2401.03538v1 | null |
2024-01-05 | Locally Adaptive Neural 3D Morphable Models | Michail Tarasiou et.al. | 2401.02937v1 | link |
2024-01-05 | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | Renjie Pi et.al. | 2401.02906v1 | link |
2024-01-05 | Pheme: Efficient and Conversational Speech Generation | Paweł Budzianowski et.al. | 2401.02839v1 | null |
2024-01-05 | Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing | Hugo Chan-To-Hing et.al. | 2401.02764v1 | null |
2024-01-05 | Detection and Classification of Diabetic Retinopathy using Deep Learning Algorithms for Segmentation to Facilitate Referral Recommendation for Test and Treatment Prediction | Manoj S H et.al. | 2401.02759v1 | link |
2024-01-05 | MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning | Alfirsa Damasyifa Fauzulhaq et.al. | 2401.02744v1 | null |
2024-01-05 | Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning | Hong-Gi Shin et.al. | 2401.02710v1 | null |
2024-01-05 | Benchmarking PathCLIP for Pathology Image Analysis | Sunyi Zheng et.al. | 2401.02651v1 | null |
2024-01-05 | MOODv2: Masked Image Modeling for Out-of-Distribution Detection | Jingyao Li et.al. | 2401.02611v1 | null |
2024-01-04 | Vulnerabilities Unveiled: Adversarially Attacking a Multimodal Vision Langauge Model for Pathology Imaging | Jai Prakash Veerla et.al. | 2401.02565v1 | null |
2024-01-04 | LLaMA Pro: Progressive LLaMA with Block Expansion | Chengyue Wu et.al. | 2401.02415v1 | link |
2024-01-04 | TinyLlama: An Open-Source Small Language Model | Peiyuan Zhang et.al. | 2401.02385v1 | link |
2024-01-04 | DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models | Songbo Hu et.al. | 2401.02208v1 | null |
2024-01-04 | Location Aware Modular Biencoder for Tourism Question Answering | Haonan Li et.al. | 2401.02187v1 | link |
2024-01-04 | SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Ziping Ma et.al. | 2401.02137v1 | null |
2024-01-04 | Text2MDT: Extracting Medical Decision Trees from Medical Texts | Wei Zhu et.al. | 2401.02034v1 | null |
2024-01-03 | Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias | Anshuman Chhabra et.al. | 2401.01989v1 | link |
2024-01-03 | The Power of Training: How Different Neural Network Setups Influence the Energy Demand | Daniel Geißler et.al. | 2401.01851v1 | null |
2024-01-03 | FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision Transformers | Zheng Yuan et.al. | 2401.01752v1 | null |
2024-01-03 | De-Confusing Pseudo-Labels in Source-Free Domain Adaptation | Idit Diamant et.al. | 2401.01650v1 | null |
2024-01-03 | Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences | Piotr Skalski et.al. | 2401.01641v1 | link |
2024-01-02 | Deep-ELA: Deep Exploratory Landscape Analysis with Self-Supervised Pretrained Transformers for Single- and Multi-Objective Continuous Optimization Problems | Moritz Vinzent Seiler et.al. | 2401.01192v1 | null |
2024-01-02 | Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label Classification | Xuelin Zhu et.al. | 2401.01181v1 | null |
2024-01-02 | Quokka: An Open-source Large Language Model ChatBot for Material Science | Xianjun Yang et.al. | 2401.01089v1 | link |
2024-01-02 | LLaMA Beyond English: An Empirical Study on Language Capability Transfer | Jun Zhao et.al. | 2401.01055v1 | null |
2024-01-02 | Cheetah: Natural Language Generation for 517 African Languages | Ife Adebara et.al. | 2401.01053v1 | null |
2024-01-01 | Multi-Lattice Sampling of Quantum Field Theories via Neural Operators | Bálint Máté et.al. | 2401.00828v1 | null |
2024-01-01 | Self-supervised learning for skin cancer diagnosis with limited training data | Hamish Haggerty et.al. | 2401.00692v1 | null |
2024-01-01 | Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models | Guangji Bai et.al. | 2401.00625v1 | null |
2023-12-31 | Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets | Payam Karisani et.al. | 2401.00575v1 | link |
2023-12-31 | A Generalist FaceX via Learning Unified Facial Representation | Yue Han et.al. | 2401.00551v1 | link |
2023-12-29 | MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | Jacob Portes et.al. | 2312.17482v1 | null |
2023-12-29 | FerKD: Surgical Label Adaptation for Efficient Distillation | Zhiqiang Shen et.al. | 2312.17473v1 | link |
2023-12-29 | Video Understanding with Large Language Models: A Survey | Yunlong Tang et.al. | 2312.17432v1 | link |
2023-12-28 | The LLM Surgeon | Tycho F. A. van der Ouderaa et.al. | 2312.17244v1 | null |
2023-12-28 | Unsupervised Universal Image Segmentation | Dantong Niu et.al. | 2312.17243v1 | link |
2023-12-28 | Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution | Ying Wang et.al. | 2312.17174v1 | link |
2023-12-28 | Non-Vacuous Generalization Bounds for Large Language Models | Sanae Lotfi et.al. | 2312.17173v1 | null |
2023-12-28 | Restoration by Generation with Constrained Priors | Zheng Ding et.al. | 2312.17161v1 | null |
2023-12-28 | Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Zengzhi Wang et.al. | 2312.17120v1 | link |
2023-12-29 | Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding | Liang Zhao et.al. | 2312.17044v2 | null |
2023-12-28 | 3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTs | Taha Emre et.al. | 2312.16980v1 | null |
2023-12-27 | I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models | Xun Guo et.al. | 2312.16693v1 | null |
2023-12-27 | LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization | Sai Shubodh Puligilla et.al. | 2312.16648v1 | null |
2023-12-22 | DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images | Yevgeniy Men et.al. | 2312.14891v1 | null |
2023-12-22 | Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models | Alan Chan et.al. | 2312.14751v1 | null |
2023-12-22 | Harnessing Diffusion Models for Visual Perception with Meta Prompts | Qiang Wan et.al. | 2312.14733v1 | link |
2023-12-22 | Inclusive normalization of face images to passport format | Hongliu Cao et.al. | 2312.14544v1 | null |
2023-12-22 | ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection | Junwei He et.al. | 2312.14535v1 | null |
2023-12-22 | Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection | Ze Yu Zhao et.al. | 2312.14406v1 | null |
2023-12-22 | Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances | Cristian Rodriguez-Opazo et.al. | 2312.14400v1 | null |
2023-12-22 | StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors | Wanchao Su et.al. | 2312.14389v1 | null |
2023-12-21 | Crystal Growth Characterization of WSe$_2$ Thin Film Using Machine Learning | Isaiah A. Moses et.al. | 2312.14311v1 | null |
2023-12-21 | DUSt3R: Geometric 3D Vision Made Easy | Shuzhe Wang et.al. | 2312.14132v1 | null |
2023-12-21 | VideoPoet: A Large Language Model for Zero-Shot Video Generation | Dan Kondratyuk et.al. | 2312.14125v1 | null |
2023-12-21 | Typhoon: Thai Large Language Models | Kunat Pipatanakul et.al. | 2312.13951v1 | null |
2023-12-21 | TinySAM: Pushing the Envelope for Efficient Segment Anything Model | Han Shu et.al. | 2312.13789v1 | link |
2023-12-21 | DreamTuner: Single Image is Enough for Subject-Driven Generation | Miao Hua et.al. | 2312.13691v1 | null |
2023-12-21 | DyBluRF: Dynamic Deblurring Neural Radiance Fields for Blurry Monocular Video | Minh-Quan Viet Bui et.al. | 2312.13528v1 | null |
2023-12-20 | Time is Encoded in the Weights of Finetuned Language Models | Kai Nylund et.al. | 2312.13401v1 | null |
2023-12-20 | Conditional Image Generation with Pretrained Generative Model | Rajesh Shrestha et.al. | 2312.13253v1 | null |
2023-12-20 | A 3D super-resolution of wind fields via physics-informed pixel-wise self-attention generative adversarial network | Takuya Kurihana et.al. | 2312.13212v1 | null |
2023-12-21 | Molecular Hypergraph Neural Networks | Junwu Chen et.al. | 2312.13136v2 | link |
2023-12-19 | Value Explicit Pretraining for Goal-Based Transfer Learning | Kiran Lekkala et.al. | 2312.12339v1 | null |
2023-12-19 | Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment | Lingling Xu et.al. | 2312.12148v1 | null |
2023-12-19 | ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields | Xiang Feng et.al. | 2312.12122v1 | null |
2023-12-19 | DMT: Comprehensive Distillation with Multiple Self-supervised Teachers | Yuang Liu et.al. | 2312.11938v1 | null |
2023-12-19 | Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery | Pengwei Yan et.al. | 2312.11927v1 | link |
2023-12-18 | Ultrasound Image Enhancement using CycleGAN and Perceptual Loss | Shreeram Athreya et.al. | 2312.11748v1 | link |
2023-12-18 | Evaluating Language-Model Agents on Realistic Autonomous Tasks | Megan Kinniment et.al. | 2312.11671v1 | null |
2023-12-18 | Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain | Hsiu-Yu Yang et.al. | 2312.11345v1 | null |
2023-12-18 | UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts | Chenlu Zhan et.al. | 2312.11171v1 | null |
2023-12-19 | Split and Rephrase with Large Language Models | David Ponce et.al. | 2312.11075v2 | null |
2023-12-17 | CEIR: Concept-based Explainable Image Representation Learning | Yan Cui et.al. | 2312.10747v1 | null |
2023-12-17 | Addressing Sample Inefficiency in Multi-View Representation Learning | Kumar Krishna Agrawal et.al. | 2312.10725v1 | null |
2023-12-17 | T2M-HiFiGPT: Generating High Quality Human Motion from Textual Descriptions with Residual Discrete Representations | Congyi Wang et.al. | 2312.10628v1 | null |
2023-12-17 | Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization | Xuan Long Do et.al. | 2312.10610v1 | null |
2023-12-16 | Paloma: A Benchmark for Evaluating Language Model Fit | Ian Magnusson et.al. | 2312.10523v1 | null |
2023-12-16 | Enhancing Person Re-Identification through Tensor Feature Fusion | Akram Abderraouf Gharbi et.al. | 2312.10470v1 | null |
2023-12-16 | RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a single GPU for Zero-shot retail product image classification | Muktabh Mayank Srivastava et.al. | 2312.10282v1 | null |
2023-12-15 | Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning | Wei Tan et.al. | 2312.10116v1 | null |
2023-12-15 | PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains | Shengyi Hua et.al. | 2312.09894v1 | link |
2023-12-15 | Probing Pretrained Language Models with Hierarchy Properties | Jesús Lovón-Melgarejo et.al. | 2312.09670v1 | null |
2023-12-15 | Vectorizing string entries for data processing on tables: when are larger language models better? | Léo Grinsztajn et.al. | 2312.09634v1 | null |
2023-12-15 | Image Deblurring using GAN | Zhengdong Li et.al. | 2312.09496v1 | null |
2023-12-14 | Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Collin Burns et.al. | 2312.09390v1 | null |
2023-12-14 | Weight subcloning: direct initialization of transformers using larger pretrained ones | Mohammad Samragh et.al. | 2312.09299v1 | null |
2023-12-14 | ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining | Ruoxi Shi et.al. | 2312.09249v1 | null |
2023-12-14 | Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking | Jacob Eisenstein et.al. | 2312.09244v1 | null |
2023-12-14 | OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields | Chubin Zhang et.al. | 2312.09243v1 | link |
2023-12-14 | Reliability in Semantic Segmentation: Can We Use Synthetic Data? | Thibaut Loiseau et.al. | 2312.09231v1 | null |
2023-12-14 | WIT-UAS: A Wildland-fire Infrared Thermal Dataset to Detect Crew Assets From Aerial Views | Andrew Jong et.al. | 2312.09159v1 | link |
2023-12-14 | Exploring Transferability for Randomized Smoothing | Kai Qiu et.al. | 2312.09020v1 | null |
2023-12-14 | OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers | Han Liang et.al. | 2312.08985v1 | null |
2023-12-14 | BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials | Xingrun Xing et.al. | 2312.08937v1 | link |
2023-12-14 | Guided Diffusion from Self-Supervised Diffusion Features | Vincent Tao Hu et.al. | 2312.08825v1 | null |
2023-12-14 | Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention | Kaiqiang Song et.al. | 2312.08618v1 | null |
2023-12-13 | Enhancing Robot Program Synthesis Through Environmental Context | Tianyi Chen et.al. | 2312.08250v1 | null |
2023-12-13 | Patch-wise Graph Contrastive Learning for Image Translation | Chanyong Jung et.al. | 2312.08223v1 | null |
2023-12-13 | Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision | Shengguang Wu et.al. | 2312.08056v1 | null |
2023-12-13 | SLJP: Semantic Extraction based Legal Judgment Prediction | Prameela Madambakam et.al. | 2312.07979v1 | null |
2023-12-13 | CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation | Zhenduo Zhang et.al. | 2312.07879v1 | null |
2023-12-13 | Foundation Models in Robotics: Applications, Challenges, and the Future | Roya Firoozi et.al. | 2312.07843v1 | null |
2023-12-13 | A Foundational Multimodal Vision Language AI Assistant for Human Pathology | Ming Y. Lu et.al. | 2312.07814v1 | null |
2023-12-12 | Tell, don't show: Declarative facts influence how LLMs generalize | Alexander Meinke et.al. | 2312.07779v1 | null |
2023-12-12 | A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning | Yinmin Zhang et.al. | 2312.07685v1 | null |
2023-12-12 | GMTalker: Gaussian Mixture based Emotional talking video Portraits | Yibo Xia et.al. | 2312.07669v1 | null |
2023-12-12 | Double-Flow GAN model for the reconstruction of perceived faces from brain activities | Zihao Wang et.al. | 2312.07478v1 | null |
2023-12-12 | Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval | Love Panta et.al. | 2312.07435v1 | null |
2023-12-12 | How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Zhongyi Han et.al. | 2312.07424v1 | null |
2023-12-12 | ICL Markup: Structuring In-Context Learning using Soft-Token Tags | Marc-Etienne Brunet et.al. | 2312.07405v1 | null |
2023-12-12 | Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | Tuan Truong et.al. | 2312.07273v1 | null |
2023-12-12 | ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open Vocabulary Object Detection | Joonhyun Jeong et.al. | 2312.07266v1 | null |
2023-12-12 | Dynamic Corrective Self-Distillation for Better Fine-Tuning of Pretrained Models | Ibtihel Amara et.al. | 2312.07028v1 | null |
2023-12-12 | CCM: Adding Conditional Controls to Text-to-Image Consistency Models | Jie Xiao et.al. | 2312.06971v1 | null |
2023-12-12 | READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling | Thong Nguyen et.al. | 2312.06950v1 | null |
2023-12-11 | DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers | Sarin Chandy et.al. | 2312.06881v1 | link |
2023-12-11 | De novo Design of Polymer Electrolytes with High Conductivity using GPT-based and Diffusion-based Generative Models | Zhenze Yang et.al. | 2312.06470v1 | null |
2023-12-11 | PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation | Zhiyu Pan et.al. | 2312.06409v1 | null |
2023-12-11 | MMDesign: Multi-Modality Transfer Learning for Generative Protein Design | Jiangbin Zheng et.al. | 2312.06297v1 | null |
2023-12-11 | Medical Vision Language Pretraining: A survey | Prashant Shrestha et.al. | 2312.06224v1 | null |
2023-12-10 | NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation | Peter West et.al. | 2312.05979v1 | null |
2023-12-10 | A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis | Linxi Zhao et.al. | 2312.05930v1 | link |
2023-12-10 | Building Variable-sized Models via Learngene Pool | Boyu Shi et.al. | 2312.05743v1 | null |
2023-12-10 | Initialization Matters for Adversarial Transfer Learning | Andong Hua et.al. | 2312.05716v1 | null |
2023-12-09 | Understanding the Effect of Model Compression on Social Bias in Large Language Models | Gustavo Gonçalves et.al. | 2312.05662v1 | link |
2023-12-09 | Enhancing Medical Specialty Assignment to Patients using NLP Techniques | Chris Solomou et.al. | 2312.05585v1 | null |
2023-12-08 | SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation | Thuan Hoang Nguyen et.al. | 2312.05239v1 | null |
2023-12-08 | Datasets, Models, and Algorithms for Multi-Sensor, Multi-agent Autonomy Using AVstack | R. Spencer Hallyburton et.al. | 2312.04970v1 | null |
2023-12-08 | Zoology: Measuring and Improving Recall in Efficient Language Models | Simran Arora et.al. | 2312.04927v1 | link |
2023-12-08 | Cross-BERT for Point Cloud Pretraining | Xin Li et.al. | 2312.04891v1 | null |
2023-12-08 | Adapting Vision Transformer for Efficient Change Detection | Yang Zhao et.al. | 2312.04869v1 | null |
2023-12-08 | HuRef: HUman-REadable Fingerprint for Large Language Models | Boyi Zeng et.al. | 2312.04828v1 | null |
2023-12-07 | STraceBERT: Source Code Retrieval using Semantic Application Traces | Claudio Spiess et.al. | 2312.04731v1 | null |
2023-12-07 | Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models | Victor Agostinelli et.al. | 2312.04691v1 | null |
2023-12-07 | ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations | Haoming Cai et.al. | 2312.04679v1 | null |
2023-12-07 | On Sarcasm Detection with OpenAI GPT-based Models | Montgomery Gole et.al. | 2312.04642v1 | null |
2023-12-07 | Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning | Yongqi Dong et.al. | 2312.04398v1 | null |
2023-12-07 | Multi-View Unsupervised Image Generation with Cross Attention Guidance | Llukman Cerkezi et.al. | 2312.04337v1 | null |
2023-12-07 | Diffusing Colors: Image Colorization with Text Guided Diffusion | Nir Zabari et.al. | 2312.04145v1 | null |
2023-12-07 | Instance Tracking in 3D Scenes from Egocentric Videos | Yunhan Zhao et.al. | 2312.04117v1 | link |
2023-12-07 | Enhancing the Rationale-Input Alignment for Self-explaining Rationalization | Wei Liu et.al. | 2312.04103v1 | null |
2023-12-05 | DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing | Shao-Yu Chang et.al. | 2312.03772v1 | null |
2023-12-06 | Blueprinting the Future: Automatic Item Categorization using Hierarchical Zero-Shot and Few-Shot Classifiers | Ting Wang et.al. | 2312.03561v1 | null |
2023-12-06 | PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis | Meiyue Song et.al. | 2312.03490v1 | link |
2023-12-06 | Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion | Weitao Du et.al. | 2312.03475v1 | null |
2023-12-05 | Leveraging Laryngograph Data for Robust Voicing Detection in Speech | Yixuan Zhang et.al. | 2312.03129v1 | link |
2023-12-05 | DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control | Yuru Jia et.al. | 2312.03048v1 | null |
2023-12-05 | MagicStick: Controllable Video Editing via Control Handle Transformations | Yue Ma et.al. | 2312.03047v1 | link |
2023-12-05 | Zero-Shot Point Cloud Registration | Weijie Wang et.al. | 2312.03032v1 | null |
2023-12-05 | WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words | Lukas Wolf et.al. | 2312.02931v1 | null |
2023-12-05 | Rare Galaxy Classes Identified In Foundation Model Representations | Mike Walmsley et.al. | 2312.02910v1 | null |
2023-12-05 | Large Knowledge Model: Perspectives and Challenges | Huajun Chen et.al. | 2312.02706v1 | null |
2023-12-05 | Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent | Jianmeng Liu et.al. | 2312.02568v1 | null |
2023-12-05 | Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation | Shanshan Zhong et.al. | 2312.02439v1 | link |
2023-12-05 | Visually Grounded Language Learning: a review of language games, datasets, tasks, and models | Alessandro Suglia et.al. | 2312.02431v1 | null |
2023-12-05 | Efficient Online Data Mixing For Language Model Pre-Training | Alon Albalak et.al. | 2312.02406v1 | null |
2023-12-04 | FaultFormer: Transformer-based Prediction of Bearing Faults | Anthony Zhou et.al. | 2312.02380v1 | null |
2023-12-04 | Rejuvenating image-GPT as Strong Visual Representation Learners | Sucheng Ren et.al. | 2312.02147v1 | link |
2023-12-04 | Object Recognition as Next Token Prediction | Kaiyu Yue et.al. | 2312.02142v1 | link |
2023-12-04 | TPPoet: Transformer-Based Persian Poem Generation using Minimal Data and Advanced Decoding Techniques | Amir Panahandeh et.al. | 2312.02125v1 | null |
2023-12-04 | Open-DDVM: A Reproduction and Extension of Diffusion Model for Optical Flow Estimation | Qiaole Dong et.al. | 2312.01746v1 | link |
2023-12-04 | Data Management For Large Language Models: A Survey | Zige Wang et.al. | 2312.01700v1 | null |
2023-12-04 | SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference | Feng Wang et.al. | 2312.01597v1 | null |
2023-12-04 | APoLLo: Unified Adapter and Prompt Learning for Vision Language Models | Sanjoy Chowdhury et.al. | 2312.01564v1 | null |
2023-12-03 | Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation | Yuzhe Lu et.al. | 2312.01504v1 | null |
2023-12-03 | Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts | Tianqi Chen et.al. | 2312.01408v1 | null |
2023-12-03 | Stable Messenger: Steganography for Message-Concealed Image Generation | Quang Nguyen et.al. | 2312.01284v1 | null |
2023-12-01 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Albert Gu et.al. | 2312.00752v1 | null |
2023-12-01 | GIFT: Generative Interpretable Fine-Tuning Transformers | Chinmay Savadikar et.al. | 2312.00700v1 | link |
2023-12-01 | Nonparametric Variational Regularisation of Pretrained Transformers | Fabio Fehr et.al. | 2312.00662v1 | null |
2023-12-01 | Summarization-based Data Augmentation for Document Classification | Yueguan Wang et.al. | 2312.00513v1 | link |
2023-12-01 | PyraTrans: Learning Attention-Enriched Multi-Scale Pyramid Network from Pre-Trained Transformers for Effective Malicious URL Detection | Ruitong Liu et.al. | 2312.00508v1 | null |
2023-12-01 | On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals | Aristotelis Ballas et.al. | 2312.00502v1 | link |
2023-12-01 | Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence | Yajie Liu et.al. | 2312.00452v1 | null |
2023-12-01 | Dolphins: Multimodal Language Model for Driving | Yingzi Ma et.al. | 2312.00438v1 | null |
2023-12-01 | PEFTDebias : Capturing debiasing information using PEFTs | Sumit Agarwal et.al. | 2312.00434v1 | null |
2023-12-01 | SynFundus: Generating a synthetic fundus images dataset with millions of samples and multi-disease annotations | Fangxin Shang et.al. | 2312.00377v1 | null |
2023-11-30 | Initializing Models with Larger Ones | Zhiqiu Xu et.al. | 2311.18823v1 | link |
2023-11-30 | ElasticDiffusion: Training-free Arbitrary Size Image Generation | Moayed Haji-Ali et.al. | 2311.18822v1 | link |
2023-11-30 | ArthModel: Enhance Arithmetic Skills to Large Language Model | Yingdi Guo et.al. | 2311.18609v1 | null |
2023-11-30 | Dataset Distillation via the Wasserstein Metric | Haoyang Liu et.al. | 2311.18531v1 | null |
2023-11-30 | MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition | Dan Song et.al. | 2311.18402v1 | null |
2023-11-30 | Transfer Learning across Different Chemical Domains: Virtual Screening of Organic Materials with Deep Learning Models Pretrained on Small Molecule and Chemical Reaction Data | Chengwei Zhang et.al. | 2311.18377v1 | null |
2023-11-30 | Hubness Reduction Improves Sentence-BERT Semantic Spaces | Beatrix M. G. Nielsen et.al. | 2311.18364v1 | link |
2023-11-30 | OmniMotionGPT: Animal Motion Generation with Limited Data | Zhangsihao Yang et.al. | 2311.18303v1 | null |
2023-11-30 | HKUST at SemEval-2023 Task 1: Visual Word Sense Disambiguation with Context Augmentation and Visual Assistance | Zhuohao Yin et.al. | 2311.18273v1 | link |
2023-11-30 | LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News | Yongjun Zhang et.al. | 2311.18241v1 | link |
2023-11-29 | A Simple Recipe for Language-guided Domain Generalized Segmentation | Mohammad Fahes et.al. | 2311.17922v1 | null |
2023-11-29 | Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation | Shuangrui Ding et.al. | 2311.17893v1 | null |
2023-11-29 | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Rishabh Kabra et.al. | 2311.17851v1 | null |
2023-11-29 | SPiC-E : Structural Priors in 3D Diffusion Models using Cross Entity Attention | Etai Sella et.al. | 2311.17834v1 | null |
2023-11-29 | DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation | Ting Liu et.al. | 2311.17812v1 | null |
2023-11-29 | PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection | Weixin Mao et.al. | 2311.17770v1 | null |
2023-11-29 | SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation | Mutian Xu et.al. | 2311.17707v1 | null |
2023-11-29 | Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes | Pavel Korshunov et.al. | 2311.17655v1 | null |
2023-11-29 | Continual Learning with Low Rank Adaptation | Martin Wistuba et.al. | 2311.17601v1 | null |
2023-11-29 | HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models | Shen Zhang et.al. | 2311.17528v1 | null |
2023-11-28 | Self-Supervised Motion Magnification by Backpropagating Through Optical Flow | Zhaoying Pan et.al. | 2311.17056v1 | null |
2023-11-28 | Surf-D: High-Quality Surface Generation for Arbitrary Topologies using Diffusion Models | Zhengming Yu et.al. | 2311.17050v1 | null |
2023-11-28 | MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training | Pavan Kumar Anasosalu Vasu et.al. | 2311.17049v1 | null |
2023-11-28 | Debiasing Multimodal Models via Causal Information Minimization | Vaidehi Patil et.al. | 2311.16941v1 | link |
2023-11-28 | LLaFS: When Large-Language Models Meet Few-Shot Segmentation | Lanyun Zhu et.al. | 2311.16926v1 | link |
2023-11-28 | The Falcon Series of Open Language Models | Ebtesam Almazrouei et.al. | 2311.16867v1 | null |
2023-11-28 | As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors | Seungwoo Yoo et.al. | 2311.16739v1 | null |
2023-11-28 | CLAP: Contrastive Learning with Augmented Prompts for Robustness on Pretrained Vision-Language Models | Yichao Cai et.al. | 2311.16445v1 | null |
2023-11-28 | Text-Driven Image Editing via Learnable Regions | Yuanze Lin et.al. | 2311.16432v1 | link |
2023-11-28 | Manifold Preserving Guided Diffusion | Yutong He et.al. | 2311.16424v1 | null |
2023-11-27 | On Bringing Robots Home | Nur Muhammad Mahi Shafiullah et.al. | 2311.16098v1 | link |
2023-11-27 | ViT-Lens-2: Gateway to Omni-modal Intelligence | Weixian Lei et.al. | 2311.16081v1 | link |
2023-11-27 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | Zeming Chen et.al. | 2311.16079v1 | link |
2023-11-27 | Exploring Attribute Variations in Style-based GANs using Diffusion Models | Rishubh Parihar et.al. | 2311.16052v1 | null |
2023-11-27 | Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers | Yilun Liu et.al. | 2311.15983v1 | null |
2023-11-27 | YUAN 2.0: A Large Language Model with Localized Filtering-based Attention | Shaohua Wu et.al. | 2311.15786v1 | link |
2023-11-27 | Enhancing Diffusion Models with Text-Encoder Reinforcement Learning | Chaofeng Chen et.al. | 2311.15657v1 | link |
2023-11-27 | ET3D: Efficient Text-to-3D Generation via Multi-View Distillation | Yiming Chen et.al. | 2311.15561v1 | null |
2023-11-27 | Dataset Distillation in Latent Space | Yuxuan Duan et.al. | 2311.15547v1 | null |
2023-11-27 | AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image | Divya Kothandaraman et.al. | 2311.15478v1 | null |
2023-11-24 | Calibrated Language Models Must Hallucinate | Adam Tauman Kalai et.al. | 2311.14648v1 | null |
2023-11-24 | tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models | Francesco Paissan et.al. | 2311.14517v1 | null |
2023-11-24 | LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design | Niklas Dobberstein et.al. | 2311.14407v1 | null |
2023-11-24 | ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution | Milan Straka et.al. | 2311.14391v1 | null |
2023-11-24 | Binarized 3D Whole-body Human Mesh Recovery | Zhiteng Li et.al. | 2311.14323v1 | link |
2023-11-24 | ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation | Yuheng Xue et.al. | 2311.14262v1 | null |
2023-11-23 | Learning to Solve Inverse Problems for Perceptual Sound Matching | Han Han et.al. | 2311.14213v1 | null |
2023-11-23 | Hardware Resilience Properties of Text-Guided Image Classifiers | Syed Talal Wasim et.al. | 2311.14062v1 | link |
2023-11-23 | Dialogue Quality and Emotion Annotations for Customer Support Conversations | John Mendonça et.al. | 2311.13910v1 | link |
2023-11-23 | General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level | Bingkang Shi et.al. | 2311.13892v1 | link |
2023-11-22 | Medical Image Retrieval Using Pretrained Embeddings | Farnaz Khun Jush et.al. | 2311.13547v1 | null |
2023-11-22 | Revisiting Machine Learning based Test Case Prioritization for Continuous Integration | Yifan Zhao et.al. | 2311.13413v1 | link |
2023-11-22 | High-Quality Face Caricature via Style Translation | Lamyanba Laishram et.al. | 2311.13338v1 | null |
2023-11-22 | FedFN: Feature Normalization for Alleviating Data Heterogeneity Problem in Federated Learning | Seongyoon Kim et.al. | 2311.13267v1 | null |
2023-11-22 | On the Calibration of Large Language Models and Alignment | Chiwei Zhu et.al. | 2311.13240v1 | null |
2023-11-22 | Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI | Annesha Ghosh et.al. | 2311.13177v1 | link |
2023-11-22 | GENET: Unleashing the Power of Side Information for Recommendation via Hypergraph Pre-training | Yang Li et.al. | 2311.13121v1 | null |
2023-11-21 | Diffusion Model Alignment Using Direct Preference Optimization | Bram Wallace et.al. | 2311.12908v1 | null |
2023-11-21 | Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks | Samyak Jain et.al. | 2311.12786v1 | null |
2023-11-21 | Oasis: Data Curation and Assessment System for Pretraining of Large Language Models | Tong Zhou et.al. | 2311.12537v1 | link |
2023-11-21 | Adapting pretrained speech model for Mandarin lyrics transcription and alignment | Jun-You Wang et.al. | 2311.12488v1 | null |
2023-11-21 | PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords | Panyut Sriwirote et.al. | 2311.12475v1 | null |
2023-11-21 | Malicious URL Detection via Pretrained Language Model Guided Multi-Level Feature Attention Network | Ruitong Liu et.al. | 2311.12372v1 | null |
2023-11-21 | A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series | Trang H. Tran et.al. | 2311.12290v1 | null |
2023-11-21 | ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science | Sai Munikoti et.al. | 2311.12289v1 | null |
2023-11-21 | Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls | Weihan Xu et.al. | 2311.12257v1 | null |
2023-11-20 | LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning | Han Guo et.al. | 2311.12023v1 | link |
2023-11-20 | Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues | Sumire Honda et.al. | 2311.11976v1 | link |
2023-11-20 | Adaptive Training Distributions with Scalable Online Bilevel Optimization | David Grangier et.al. | 2311.11973v1 | null |
2023-11-20 | LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Gongwei Chen et.al. | 2311.11860v1 | null |
2023-11-20 | Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule | Andrey Bout et.al. | 2311.11813v1 | null |
2023-11-20 | KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model | Lei Geng et.al. | 2311.11564v1 | link |
2023-11-20 | A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records | Lin Lawrence Guo et.al. | 2311.11483v1 | null |
2023-11-19 | Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks | Abdalgader Abubaker et.al. | 2311.11368v1 | null |
2023-11-19 | Pair-wise Layer Attention with Spatial Masking for Video Prediction | Ping Li et.al. | 2311.11289v1 | link |
2023-11-19 | Uncertainty quantification for noisy inputs-outputs in physics-informed neural networks and neural operators | Zongren Zou et.al. | 2311.11262v1 | null |
2023-11-17 | Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 | Hamish Ivison et.al. | 2311.10702v1 | null |
2023-11-17 | Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads | Yi Yang et.al. | 2311.10395v1 | null |
2023-11-17 | Leveraging Function Space Aggregation for Federated Learning at Scale | Nikita Dhawan et.al. | 2311.10291v1 | null |
2023-11-17 | Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2 | Ambri Ma et.al. | 2311.10266v1 | null |
2023-11-16 | Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study | Maike Züfle et.al. | 2311.10236v1 | link |
2023-11-16 | Self-supervised learning of multi-omics embeddings in the low-label, high-data regime | Christian John Hurry et.al. | 2311.09962v1 | null |
2023-11-16 | Overcoming Data Scarcity in Biomedical Imaging with a Foundational Multi-Task Model | Raphael Schäfer et.al. | 2311.09847v1 | null |
2023-11-16 | Investigating Data Contamination in Modern Benchmarks for Large Language Models | Chunyuan Deng et.al. | 2311.09783v1 | null |
2023-11-16 | Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders | Hyunji Lee et.al. | 2311.09765v1 | link |
2023-11-16 | Translation Aligned Sentence Embeddings for Turkish Language | Eren Unlu et.al. | 2311.09748v1 | null |
2023-11-16 | Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness | Ashim Gupta et.al. | 2311.09694v1 | null |
2023-11-16 | Augmenting Unsupervised Reinforcement Learning with Self-Reference | Andrew Zhao et.al. | 2311.09692v1 | null |
2023-11-16 | Evolving Domain Adaptation of Pretrained Language Models for Text Classification | Yun-Shiuan Chuang et.al. | 2311.09661v1 | null |
2023-11-16 | Efficient End-to-End Visual Document Understanding with Rationale Distillation | Wang Zhu et.al. | 2311.09612v1 | null |
2023-11-16 | Enchancing Semi-Supervised Learning for Extractive Summarization with an LLM-based pseudolabeler | Gaurav Sahu et.al. | 2311.09559v1 | null |
2023-11-15 | Single-Image 3D Human Digitization with Shape-Guided Diffusion | Badour AlBahar et.al. | 2311.09221v1 | null |
2023-11-15 | TableLlama: Towards Open Large Generalist Models for Tables | Tianshu Zhang et.al. | 2311.09206v1 | null |
2023-11-15 | Do Localization Methods Actually Localize Memorized Data in LLMs? | Ting-Yun Chang et.al. | 2311.09060v1 | null |
2023-11-15 | Data Similarity is Not Enough to Explain Language Model Performance | Gregory Yauney et.al. | 2311.09006v1 | link |
2023-11-15 | Incremental Object-Based Novelty Detection with Feedback Loop | Simone Caldarella et.al. | 2311.09004v1 | null |
2023-11-15 | OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining | Yihong Liu et.al. | 2311.08849v1 | null |
2023-11-15 | An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping | Keith Harrigian et.al. | 2311.08687v1 | null |
2023-11-15 | Multistage Collaborative Knowledge Distillation from Large Language Models | Jiachen Zhao et.al. | 2311.08640v1 | null |
2023-11-14 | Unsupervised segmentation of irradiation-induced order-disorder phase transitions in electron microscopy | Arman H Ter-Petrosyan et.al. | 2311.08585v1 | null |
2023-11-14 | UT5: Pretraining Non autoregressive T5 with unrolled denoising | Mahmoud G. Salem et.al. | 2311.08552v1 | null |
2023-11-14 | Open-vocabulary keyword spotting in any language through multilingual contrastive speech-phoneme pretraining | Jian Zhu et.al. | 2311.08323v1 | null |
2023-11-14 | ARTEMIS: Using GANs with Multiple Discriminators to Generate Art | James Baker et.al. | 2311.08278v1 | null |
2023-11-14 | Investigating the Encoding of Words in BERT's Neurons using Feature Textualization | Tanja Baeumel et.al. | 2311.08240v1 | null |
2023-11-14 | Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models | Xinwei Li et.al. | 2311.08213v1 | null |
2023-11-14 | A Survey on Language Models for Code | Ziyin Zhang et.al. | 2311.07989v1 | link |
2023-11-14 | Test-Time Training for Semantic Segmentation with Output Contrastive Loss | Yunlong Zhang et.al. | 2311.07877v1 | link |
2023-11-14 | Probing clustering in neural network representations | Thao Nguyen et.al. | 2311.07864v1 | null |
2023-11-14 | Overview of the TREC 2023 Product Product Search Track | Daniel Campos et.al. | 2311.07861v1 | null |
2023-11-14 | Learning Mutually Informed Representations for Characters and Subwords | Yilin Wang et.al. | 2311.07853v1 | null |
2023-11-13 | IruMozhi: Automatically classifying diglossia in Tamil | Kabilan Prasanna et.al. | 2311.07804v1 | null |
2023-11-13 | Masked Face Dataset Generation and Masked Face Recognition | Rui Cai et.al. | 2311.07475v1 | link |
2023-11-13 | Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse | Ang Lv et.al. | 2311.07468v1 | null |
2023-11-13 | Language Grounded QFormer for Efficient Vision Language Understanding | Moulik Choraria et.al. | 2311.07449v1 | null |
2023-11-13 | Hallucination Augmented Recitations for Language Models | Abdullatif Köksal et.al. | 2311.07424v1 | null |
2023-11-13 | Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning | Felix den Breejen et.al. | 2311.07343v1 | null |
2023-11-13 | On Elastic Language Models | Chen Zhang et.al. | 2311.07204v1 | null |
2023-11-13 | Developing a Named Entity Recognition Dataset for Tagalog | Lester James V. Miranda et.al. | 2311.07161v1 | null |
2023-11-13 | SpectralGPT: Spectral Foundation Model | Danfeng Hong et.al. | 2311.07113v1 | null |
2023-11-13 | ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models | Ilker Kesen et.al. | 2311.07022v1 | null |
2023-11-12 | Concept-wise Fine-tuning Matters in Preventing Negative Transfer | Yunqiao Yang et.al. | 2311.06868v1 | null |
2023-11-10 | BanglaBait: Semi-Supervised Adversarial Approach for Clickbait Detection on Bangla Clickbait Dataset | Md. Motahar Mahtab et.al. | 2311.06204v1 | link |
2023-11-10 | FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores | Daniel Y. Fu et.al. | 2311.05908v1 | null |
2023-11-10 | AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems | Sasu Tarkoma et.al. | 2311.05842v1 | null |
2023-11-09 | Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter | Georgios Tziafas et.al. | 2311.05779v1 | link |
2023-11-09 | Efficiently Adapting Pretrained Language Models To New Languages | Zoltan Csaki et.al. | 2311.05741v1 | null |
2023-11-09 | Window Attention is Bugged: How not to Interpolate Position Embeddings | Daniel Bolya et.al. | 2311.05613v1 | null |
2023-11-09 | Mirror: A Universal Framework for Various Information Extraction Tasks | Tong Zhu et.al. | 2311.05419v1 | link |
2023-11-09 | Improving Hand Recognition in Uncontrolled and Uncooperative Environments using Multiple Spatial Transformers and Loss Functions | Wojciech Michal Matkowski et.al. | 2311.05383v1 | null |
2023-11-09 | ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image | Senthil Purushwalkam et.al. | 2311.05230v1 | null |
2023-11-08 | Interpreting Pretrained Language Models via Concept Bottlenecks | Zhen Tan et.al. | 2311.05014v1 | null |
2023-11-08 | Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search | Siao Tang et.al. | 2311.04950v1 | null |
2023-11-08 | Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models | Rocktim Jyoti Das et.al. | 2311.04902v1 | link |
2023-11-08 | Towards Few-Annotation Learning in Computer Vision: Application to Image Classification and Object Detection tasks | Quentin Bouniot et.al. | 2311.04888v1 | null |
2023-11-08 | DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining | Martin Kuo et.al. | 2311.04799v1 | null |
2023-11-08 | Training CLIP models on Data from Scientific Papers | Calvin Metzger et.al. | 2311.04711v1 | link |
2023-11-08 | Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection | Akshit Jindal et.al. | 2311.04588v1 | link |
2023-11-08 | Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures | Julius Steuer et.al. | 2311.04547v1 | null |
2023-11-07 | 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features | Chenfeng Xu et.al. | 2311.04391v1 | null |
2023-11-07 | Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning | Sai Munikoti et.al. | 2311.04348v1 | null |
2023-11-07 | Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Rishabh Jain et.al. | 2311.04313v1 | null |
2023-11-07 | Selective Visual Representations Improve Convergence and Generalization for Embodied AI | Ainaz Eftekhar et.al. | 2311.04193v1 | null |
2023-11-07 | Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection | Benjamin Steenhoek et.al. | 2311.04109v1 | null |
2023-11-07 | Language Representation Projection: Can We Transfer Factual Knowledge across Languages in Multilingual Language Models? | Shaoyang Xu et.al. | 2311.03788v1 | null |
2023-11-07 | Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning | Sarkar Snigdha Sarathi Das et.al. | 2311.03748v1 | link |
2023-11-07 | Analysis of the User Perception of Chatbots in Education Using A Partial Least Squares Structural Equation Modeling Approach | Md Rabiul Hasan et.al. | 2311.03636v1 | null |
2023-11-06 | Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization | Kun Lei et.al. | 2311.03351v1 | null |
2023-11-06 | A Foundation Model for Music Informatics | Minz Won et.al. | 2311.03318v1 | link |
2023-11-07 | S-LoRA: Serving Thousands of Concurrent LoRA Adapters | Ying Sheng et.al. | 2311.03285v2 | link |
2023-11-06 | An Efficient Self-Supervised Cross-View Training For Sentence Embedding | Peerat Limkonchotiwat et.al. | 2311.03228v1 | link |
2023-11-06 | LDM3D-VR: Latent Diffusion Model for 3D VR | Gabriela Ben Melech Stan et.al. | 2311.03226v1 | null |
2023-11-06 | A Simple yet Efficient Ensemble Approach for AI-generated Text Detection | Harika Abburi et.al. | 2311.03084v1 | null |
2023-11-06 | CogVLM: Visual Expert for Pretrained Language Models | Weihan Wang et.al. | 2311.03079v1 | link |
2023-11-06 | SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet | Maurice Günder et.al. | 2311.03076v1 | null |
2023-11-06 | Masking Hyperspectral Imaging Data with Pretrained Models | Elias Arbash et.al. | 2311.03053v1 | null |
2023-11-06 | The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning | Artyom Gadetsky et.al. | 2311.02940v1 | link |
2023-11-03 | Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | Shichao Dong et.al. | 2311.01989v1 | null |
2023-11-03 | Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials | Xiaoxiao Wang et.al. | 2311.01987v1 | null |
2023-11-03 | The language of prompting: What linguistic properties make a prompt successful? | Alina Leidinger et.al. | 2311.01967v1 | null |
2023-11-03 | ForecastPFN: Synthetically-Trained Zero-Shot Forecasting | Samuel Dooley et.al. | 2311.01933v1 | link |
2023-11-03 | Towards Concept-Aware Large Language Models | Chen Shani et.al. | 2311.01866v1 | null |
2023-11-03 | TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine | Guoxing Yang et.al. | 2311.01786v1 | null |
2023-11-03 | Data-Free Distillation of Language Model by Text-to-Text Transfer | Zheyuan Bai et.al. | 2311.01689v1 | null |
2023-11-02 | Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | Andy Zhou et.al. | 2311.01441v1 | link |
2023-11-02 | Recognize Any Regions | Haosen Yang et.al. | 2311.01373v1 | null |
2023-11-02 | Collaborative Large Language Model for Recommender Systems | Yaochen Zhu et.al. | 2311.01343v1 | link |
2023-11-02 | Terrain-Informed Self-Supervised Learning: Enhancing Building Footprint Extraction from LiDAR Data with Limited Annotations | Anuja Vats et.al. | 2311.01188v1 | null |
2023-11-02 | Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance | Song Wang et.al. | 2311.01108v1 | null |
2023-11-02 | Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning | Jiwan Hur et.al. | 2311.01018v1 | null |
2023-11-02 | VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning | Hong Chen et.al. | 2311.00990v1 | null |
2023-11-02 | MAAIG: Motion Analysis And Instruction Generation | Wei-Hsin Yeh et.al. | 2311.00980v1 | null |
2023-11-02 | Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation | Wanyu Du et.al. | 2311.00953v1 | null |
2023-11-02 | Learning Defect Prediction from Unrealistic Data | Kamel Alrashedy et.al. | 2311.00931v1 | null |
2023-11-01 | Crosslingual Retrieval Augmented In-context Learning for Bangla | Xiaoqian Li et.al. | 2311.00587v1 | null |
2023-11-01 | MNN: Mixed Nearest-Neighbors for Self-Supervised Learning | Chen Peng et.al. | 2311.00562v1 | link |
2023-11-01 | Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements | Peter A. Zachares et.al. | 2311.00444v1 | null |
2023-11-01 | An analysis of large speech models-based representations for speech emotion recognition | Adrian Bogdan Stânea et.al. | 2311.00394v1 | null |
2023-11-01 | fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding | Xuelin Qian et.al. | 2311.00342v1 | null |
2023-11-01 | Syntactic Inductive Bias in Transformer Language Models: Especially Helpful for Low-Resource Languages? | Luke Gessler et.al. | 2311.00268v1 | null |
2023-11-01 | ChatGPT-Powered Hierarchical Comparisons for Image Classification | Zhiyuan Ren et.al. | 2311.00206v1 | link |
2023-10-31 | Object-centric Video Representation for Long-term Action Anticipation | Ce Zhang et.al. | 2311.00180v1 | link |
2023-10-31 | ChipNeMo: Domain-Adapted LLMs for Chip Design | Mingjie Liu et.al. | 2311.00176v1 | null |
2023-10-31 | Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data | Antonis Antoniades et.al. | 2311.00136v1 | null |
2023-10-31 | Vanishing Gradients in Reinforcement Finetuning of Language Models | Noam Razin et.al. | 2310.20703v1 | null |
2023-10-31 | The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation | Evgeniia Tokarchuk et.al. | 2310.20620v1 | null |
2023-10-31 | Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building | Omar Momen et.al. | 2310.20589v1 | null |
2023-10-31 | Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT | Aman Jaiswal et.al. | 2310.20558v1 | null |
2023-10-31 | CapsFusion: Rethinking Image-Text Data at Scale | Qiying Yu et.al. | 2310.20550v1 | null |
2023-10-31 | HWD: A Novel Evaluation Score for Styled Handwritten Text Generation | Vittorio Pippi et.al. | 2310.20316v1 | link |
2023-10-31 | AutoMixer for Improved Multivariate Time-Series Forecasting on BizITOps Data | Santosh Palaskar et.al. | 2310.20280v1 | null |
2023-10-31 | DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models | Xinwei Wu et.al. | 2310.20138v1 | null |
2023-10-31 | Improving Prompt Tuning with Learned Prompting Layers | Wei Zhu et.al. | 2310.20127v1 | null |
2023-10-30 | GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models | Mianchu Wang et.al. | 2310.20025v1 | null |
2023-10-30 | LitCab: Lightweight Calibration of Language Models on Outputs of Varied Lengths | Xin Liu et.al. | 2310.19208v1 | link |
2023-10-29 | Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics | Valerio Francesco Puglisi et.al. | 2310.19081v1 | null |
2023-10-29 | Prompt-Engineering and Transformer-based Question Generation and Evaluation | Rubaba Amyeen et.al. | 2310.18867v1 | null |
2023-10-28 | Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data | Pramit Saha et.al. | 2310.18815v1 | null |
2023-10-28 | ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting | Abdellah El Mekki et.al. | 2310.18778v1 | link |
2023-10-28 | Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation | Jiahui Chen et.al. | 2310.18760v1 | null |
2023-10-28 | Probing LLMs for Joint Encoding of Linguistic Categories | Giulio Starace et.al. | 2310.18696v1 | link |
2023-10-28 | Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing | Yi Wang et.al. | 2310.18653v1 | link |
2023-10-28 | Local-Global Self-Supervised Visual Representation Learning | Ali Javidani et.al. | 2310.18651v1 | link |
2023-10-28 | Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots | Ruixiang Tang et.al. | 2310.18633v1 | null |
2023-10-27 | Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN | Neeraj Kumar et.al. | 2310.18169v1 | null |
2023-10-27 | Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots | Xintao Wang et.al. | 2310.17976v1 | link |
2023-10-27 | FaultSeg Swin-UNETR: Transformer-Based Self-Supervised Pretraining Model for Fault Recognition | Zeren Zhang et.al. | 2310.17974v1 | null |
2023-10-27 | Multivessel Coronary Artery Segmentation and Stenosis Localisation using Ensemble Learning | Muhammad Bilal et.al. | 2310.17954v1 | null |
2023-10-27 | Transformers as Graph-to-Graph Models | James Henderson et.al. | 2310.17936v1 | link |
2023-10-27 | Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering | Zijie Song et.al. | 2310.17869v1 | null |
2023-10-26 | StyleBART: Decorate Pretrained Model with Style Adapters for Unsupervised Stylistic Headline Generation | Hanqing Wang et.al. | 2310.17743v1 | null |
2023-10-26 | Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model | Karsten Roth et.al. | 2310.17653v1 | null |
2023-10-26 | InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Language Model-based Instruction Operators | Heng Yang et.al. | 2310.17630v1 | link |
2023-10-26 | Proving Test Set Contamination in Black Box Language Models | Yonatan Oren et.al. | 2310.17623v1 | null |
2023-10-26 | Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways | Venkata S Govindarajan et.al. | 2310.17591v1 | link |
2023-10-26 | PAC-tuning:Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent | Guangliang Liu et.al. | 2310.17588v1 | null |
2023-10-26 | Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models | Laura Cabello et.al. | 2310.17530v1 | link |
2023-10-26 | Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge | Tanel Alumäe et.al. | 2310.17448v1 | null |
2023-10-26 | AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors | You-Ming Chang et.al. | 2310.17419v1 | null |
2023-10-26 | CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling | Seyedmorteza Sadat et.al. | 2310.17347v1 | null |
2023-10-26 | Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification | Jiachen Li et.al. | 2310.17218v1 | null |
2023-10-25 | Proposal-Contrastive Pretraining for Object Detection from Fewer Data | Quentin Bouniot et.al. | 2310.16835v1 | null |
2023-10-25 | Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation | Yongxin Shi et.al. | 2310.16809v1 | link |
2023-10-25 | Detecting Pretraining Data from Large Language Models | Weijia Shi et.al. | 2310.16789v1 | null |
2023-10-25 | BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories? | Xingmeng Zhao et.al. | 2310.16681v1 | link |
2023-10-25 | Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models | Oren Barkan et.al. | 2310.16584v1 | link |
2023-10-25 | The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining | Ting-Rui Chiang et.al. | 2310.16261v1 | null |
2023-10-24 | Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation | AbdelRahim Elmadany et.al. | 2310.16127v1 | null |
2023-10-24 | Locally Differentially Private Document Generation Using Zero Shot Prompting | Saiteja Utpala et.al. | 2310.16111v1 | null |
2023-10-24 | A Unified, Scalable Framework for Neural Population Decoding | Mehdi Azabou et.al. | 2310.16046v1 | null |
2023-10-24 | Finetuning Offline World Models in the Real World | Yunhai Feng et.al. | 2310.16029v1 | null |
2023-10-24 | Characterizing Mechanisms for Factual Recall in Language Models | Qinan Yu et.al. | 2310.15910v1 | null |
2023-10-24 | Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition | Alan Cowap et.al. | 2310.15904v1 | link |
2023-10-24 | Automatic Aorta Segmentation with Heavily Augmented, High-Resolution 3-D ResUNet: Contribution to the SEG.A Challenge | Marek Wodzinski et.al. | 2310.15827v1 | null |
2023-10-24 | Discriminator Guidance for Autoregressive Diffusion Models | Filip Ekström Kelvinius et.al. | 2310.15817v1 | null |
2023-10-24 | Improving generalization in large language models by learning prefix subspaces | Louis Falissard et.al. | 2310.15793v1 | null |
2023-10-24 | Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework | Weixi Weng et.al. | 2310.15646v1 | null |
2023-10-24 | Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks | Sunit Bhattacharya et.al. | 2310.15552v1 | null |
2023-10-24 | Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling | Pritom Saha Akash et.al. | 2310.15420v1 | null |
2023-10-23 | FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling | Haonan Qiu et.al. | 2310.15169v1 | null |
2023-10-23 | Novel-View Acoustic Synthesis from 3D Reconstructed Rooms | Byeongjoo Ahn et.al. | 2310.15130v1 | link |
2023-10-23 | Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model | Ruoxi Shi et.al. | 2310.15110v1 | link |
2023-10-23 | E4S: Fine-grained Face Swapping via Editing With Regional GAN Inversion | Maomao Li et.al. | 2310.15081v1 | link |
2023-10-23 | SLOG: A Structural Generalization Benchmark for Semantic Parsing | Bingzhi Li et.al. | 2310.15040v1 | null |
2023-10-23 | Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time Projection Chamber Data | Yi Huang et.al. | 2310.15026v1 | null |
2023-10-23 | Once Upon a | Sen Yang et.al. | 2310.14709v1 | null |
2023-10-23 | Extending Input Contexts of Language Models through Training on Segmented Sequences | Petros Karypis et.al. | 2310.14633v1 | null |
2023-10-23 | Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering | Tam Minh Vo et.al. | 2310.14602v1 | null |
2023-10-23 | The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages | Chiyu Zhang et.al. | 2310.14557v1 | null |
2023-10-20 | Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation | Damian Sójka et.al. | 2310.13533v1 | null |
2023-10-20 | Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models | Ilias Stogiannidis et.al. | 2310.13395v1 | null |
2023-10-20 | SILC: Improving Vision Language Pretraining with Self-Distillation | Muhammad Ferjad Naeem et.al. | 2310.13355v1 | null |
2023-10-20 | Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models | Jaeyoung Choe et.al. | 2310.13312v1 | null |
2023-10-20 | Unified Pretraining for Recommendation via Task Hypergraphs | Mingdai Yang et.al. | 2310.13286v1 | link |
2023-10-20 | DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics | Kaiwen Zheng et.al. | 2310.13268v1 | link |
2023-10-20 | On the Language Encoder of Contrastive Cross-modal Models | Mengjie Zhao et.al. | 2310.13267v1 | null |
2023-10-20 | Anomaly Detection of Command Shell Sessions based on DistilBERT: Unsupervised and Supervised Approaches | Zefang Liu et.al. | 2310.13247v1 | null |
2023-10-19 | A Car Model Identification System for Streamlining the Automobile Sales Process | Said Togru et.al. | 2310.13198v1 | null |
2023-10-19 | Do Language Models Learn about Legal Entity Types during Pretraining? | Claire Barale et.al. | 2310.13092v1 | link |
2023-10-19 | A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models | Yi Zhou et.al. | 2310.12936v1 | null |
2023-10-19 | Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Juan Rocamonde et.al. | 2310.12921v1 | null |
2023-10-19 | A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems | Songbo Hu et.al. | 2310.12892v1 | null |
2023-10-19 | Predicting Ovarian Cancer Treatment Response in Histopathology using Hierarchical Vision Transformers and Multiple Instance Learning | Jack Breen et.al. | 2310.12866v1 | link |
2023-10-19 | Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning | Han Zhou et.al. | 2310.12774v1 | link |
2023-10-19 | Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding | Yuanxing Xu et.al. | 2310.12724v1 | null |
2023-10-19 | Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining | Yuxin Wang et.al. | 2310.12670v1 | null |
2023-10-19 | Pretraining Language Models with Text-Attributed Heterogeneous Graphs | Tao Zou et.al. | 2310.12580v1 | link |
2023-10-19 | Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models | Wenxuan Wang et.al. | 2310.12481v1 | null |
2023-10-19 | Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping | Zijie Pan et.al. | 2310.12474v1 | link |
2023-10-18 | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | Daniel Y. Fu et.al. | 2310.12109v1 | null |
2023-10-18 | Evaluating the Fairness of Discriminative Foundation Models in Computer Vision | Junaid Ali et.al. | 2310.11867v1 | null |
2023-10-18 | Masked Pretraining for Multi-Agent Decision Making | Jie Liu et.al. | 2310.11846v1 | null |
2023-10-18 | Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features | Hangbin Lee et.al. | 2310.11654v1 | null |
2023-10-18 | Systematic Assessment of Factual Knowledge in Large Language Models | Linhao Luo et.al. | 2310.11638v1 | null |
2023-10-17 | GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Dhruba Ghosh et.al. | 2310.11513v1 | link |
2023-10-17 | Hybrid quantum-classical graph neural networks for tumor classification in digital pathology | Anupama Ray et.al. | 2310.11353v1 | null |
2023-10-17 | Elucidating The Design Space of Classifier-Guided Diffusion Generation | Jiajun Ma et.al. | 2310.11311v1 | null |
2023-10-17 | Utilizing Weak Supervision To Generate Indonesian Conservation Dataset | Mega Fransiska et.al. | 2310.11258v1 | null |
2023-10-17 | Query2Triple: Unified Query Encoding for Answering Diverse Complex Queries over Knowledge Graphs | Yao Xu et.al. | 2310.11246v1 | link |
2023-10-17 | Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion | Xueyao Zhang et.al. | 2310.11160v1 | null |
2023-10-17 | MeKB-Rec: Personal Knowledge Graph Learning for Cross-Domain Recommendation | Xin Su et.al. | 2310.11088v1 | null |
2023-10-17 | Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters | Gyuseong Lee et.al. | 2310.11031v1 | null |
2023-10-16 | SD-HuBERT: Self-Distillation Induces Syllabic Organization in HuBERT | Cheol Jun Cho et.al. | 2310.10803v1 | null |
2023-10-16 | LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Ruiqi Wu et.al. | 2310.10769v1 | null |
2023-10-16 | BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys | Yu Gu et.al. | 2310.10765v1 | null |
2023-10-16 | Interactive Task Planning with Language Models | Boyi Li et.al. | 2310.10645v1 | null |
2023-10-16 | Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models | Kevin Black et.al. | 2310.10639v1 | null |
2023-10-16 | In-Context Pretraining: Language Modeling Beyond Document Boundaries | Weijia Shi et.al. | 2310.10638v1 | null |
2023-10-16 | Llemma: An Open Language Model For Mathematics | Zhangir Azerbayev et.al. | 2310.10631v1 | link |
2023-10-16 | Video Language Planning | Yilun Du et.al. | 2310.10625v1 | null |
2023-10-16 | One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer | Fabian David Schmidt et.al. | 2310.10532v1 | link |
2023-10-16 | Unifying Image Processing as Visual Prompting Question Answering | Yihao Liu et.al. | 2310.10513v1 | null |
2023-10-16 | Can Word Sense Distribution Detect Semantic Changes of Words? | Xiaohang Tang et.al. | 2310.10400v1 | link |
2023-10-16 |  | Taichi Aida et.al. | 2310.10397v1 | link |
2023-10-16 | Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models | Jirui Qi et.al. | 2310.10378v1 | link |
2023-10-13 | PromptRE: Weakly-Supervised Document-Level Relation Extraction via Prompting-Based Data Programming | Chufan Gao et.al. | 2310.09265v1 | null |
2023-10-13 | Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy | Anton Baryshnikov et.al. | 2310.09247v1 | link |
2023-10-13 | ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction | Jianghao Lin et.al. | 2310.09234v1 | null |
2023-10-13 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | Xi Chen et.al. | 2310.09199v1 | null |
2023-10-13 | Jointly-Learned Exit and Inference for a Dynamic Neural Network : JEI-DNN | Florence Regol et.al. | 2310.09163v1 | null |
2023-10-13 | UniParser: Multi-Human Parsing with Unified Correlation Representation Learning | Jiaming Chu et.al. | 2310.08984v1 | link |
2023-10-13 | Exploration with Principles for Diverse AI Supervision | Hao Liu et.al. | 2310.08899v1 | null |
2023-10-13 | Open X-Embodiment: Robotic Learning Datasets and RT-X Models | Abhishek Padalkar et.al. | 2310.08864v1 | null |
2023-10-13 | Speaking rate attention-based duration prediction for speed control TTS | Jesuraj Bandekar et.al. | 2310.08846v1 | null |
2023-10-13 | From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models | Dongsheng Jiang et.al. | 2310.08825v1 | link |
2023-10-12 | Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video | Shashanka Venkataramanan et.al. | 2310.08584v1 | null |
2023-10-12 | Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Licong Lin et.al. | 2310.08566v1 | null |
2023-10-12 | Do pretrained Transformers Really Learn In-context by Gradient Descent? | Lingfeng Shen et.al. | 2310.08540v1 | null |
2023-10-12 | "SegLoc": Study on Novel Visual Self-supervised Learning Scheme (Segment Localization) Tailored for Dense Prediction Tasks of Security Inspection X-ray Images | Shervin Halat et.al. | 2310.08421v1 | null |
2023-10-12 | How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? | Jingfeng Wu et.al. | 2310.08391v1 | null |
2023-10-12 | Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment | Boyang Xue et.al. | 2310.08372v1 | null |
2023-10-12 | GePSAn: Generative Procedure Step Anticipation in Cooking Videos | Mohamed Ashraf Abdelsalam et.al. | 2310.08312v1 | null |
2023-10-12 | CHIP: Contrastive Hierarchical Image Pretraining | Arpit Mittal et.al. | 2310.08304v1 | null |
2023-10-12 | Expanding the Vocabulary of BERT for Knowledge Base Construction | Dong Yang et.al. | 2310.08291v1 | link |
2023-10-12 | Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting | Zijie Chen et.al. | 2310.08129v1 | null |
2023-10-11 | InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | Boxin Wang et.al. | 2310.07713v1 | null |
2023-10-12 | Rethinking the BERT-like Pretraining for DNA Sequences | Chaoqi Liang et.al. | 2310.07644v2 | null |
2023-10-12 | Multimodal Graph Learning for Generative Tasks | Minji Yoon et.al. | 2310.07478v2 | link |
2023-10-12 | NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining | Chenguo Lin et.al. | 2310.07402v2 | null |
2023-10-11 | CLIP for Lightweight Semantic Segmentation | Ke Jin et.al. | 2310.07394v1 | null |
2023-10-11 | Beyond Memorization: Violating Privacy Via Inference with Large Language Models | Robin Staab et.al. | 2310.07298v1 | null |
2023-10-12 | Score Regularized Policy Optimization through Diffusion Behavior | Huayu Chen et.al. | 2310.07297v2 | link |
2023-10-11 | IBoxCLA: Towards Robust Box-supervised Segmentation of Polyp via Improved Box-dice and Contrastive Latent-anchors | Zhiwei Wang et.al. | 2310.07248v1 | null |
2023-10-11 | Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs | Muhammad Asif Khan et.al. | 2310.07245v1 | null |
2023-10-11 | Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment | Bowen Gao et.al. | 2310.07229v1 | null |
2023-10-10 | OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text | Keiran Paster et.al. | 2310.06786v1 | null |
2023-10-10 | Uni3D: Exploring Unified 3D Representation at Scale | Junsheng Zhou et.al. | 2310.06773v1 | link |
2023-10-10 | Tweedie Moment Projected Diffusions For Inverse Problems | Benjamin Boys et.al. | 2310.06721v1 | null |
2023-10-10 | Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder | Bowen Jin et.al. | 2310.06684v1 | null |
2023-10-10 | Self-Supervised Representation Learning for Online Handwriting Text Classification | Pouya Mehralian et.al. | 2310.06645v1 | null |
2023-10-10 | SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network | Tianlong Li et.al. | 2310.06488v1 | null |
2023-10-10 | CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | Peng Di et.al. | 2310.06266v1 | null |
2023-10-10 | Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction | Cheng Peng et.al. | 2310.06239v1 | null |
2023-10-10 | Domain Expansion via Network Adaptation for Solving Inverse Problems | Nebiyou Yismaw et.al. | 2310.06235v1 | null |
2023-10-10 | GeoLLM: Extracting Geospatial Knowledge from Large Language Models | Rohin Manvi et.al. | 2310.06213v1 | null |
2023-10-09 | TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models | Zuxin Liu et.al. | 2310.05905v1 | null |
2023-10-09 | Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning | Trevor McInroe et.al. | 2310.05723v1 | null |
2023-10-09 | A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics | Kai He et.al. | 2310.05694v1 | link |
2023-10-09 | No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling | Xuwei Xu et.al. | 2310.05654v1 | null |
2023-10-09 | UAVs and Neural Networks for search and rescue missions | Hartmut Surmann et.al. | 2310.05512v1 | null |
2023-10-09 | Sentence-level Prompts Benefit Composed Image Retrieval | Yang Bai et.al. | 2310.05473v1 | link |
2023-10-09 | Augmented Embeddings for Custom Retrievals | Anirudh Khatry et.al. | 2310.05380v1 | null |
2023-10-08 | Visual Storytelling with Question-Answer Plans | Danyang Liu et.al. | 2310.05295v1 | null |
2023-10-08 | Do Large Language Models Know about Facts? | Xuming Hu et.al. | 2310.05177v1 | null |
2023-10-08 | UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Jiabo Ye et.al. | 2310.05126v1 | link |
2023-10-06 | Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection | Ziyun Cui et.al. | 2310.04358v1 | null |
2023-10-06 | A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks | Israt Jahan et.al. | 2310.04270v1 | null |
2023-10-06 | Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning | Yinda Chen et.al. | 2310.04148v1 | link |
2023-10-06 | Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation | Md Kaykobad Reza et.al. | 2310.03986v1 | null |
2023-10-05 | Hard View Selection for Contrastive Learning | Fabio Ferreira et.al. | 2310.03940v1 | null |
2023-10-05 | Bridging Low-level Geometry to High-level Concepts in Visual Servoing of Robot Manipulation Task Using Event Knowledge Graphs and Vision-Language Models | Chen Jiang et.al. | 2310.03932v1 | null |
2023-10-05 | Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks | Xu Luo et.al. | 2310.03843v1 | null |
2023-10-05 | PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction | Saifullah Saifullah et.al. | 2310.03777v1 | null |
2023-10-05 | Stylist: Style-Driven Feature Ranking for Robust Novelty Detection | Stefan Smeu et.al. | 2310.03738v1 | null |
2023-10-05 | Tik-to-Tok: Translating Language Models One Token at a Time: An Embedding Initialization Strategy for Efficient Language Adaptation | François Remy et.al. | 2310.03477v1 | null |
2023-10-05 | FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators | Haiping Wang et.al. | 2310.03420v1 | null |
2023-10-05 | Procedural Text Mining with Large Language Models | Anisa Rula et.al. | 2310.03376v1 | link |
2023-10-05 | Benchmarking Large Language Models As AI Research Agents | Qian Huang et.al. | 2310.03302v1 | link |
2023-10-05 | SimVLG: Simple and Efficient Pretraining of Visual Language Generative Models | Yiren Jian et.al. | 2310.03291v1 | null |
2023-10-05 | Fragment-based Pretraining and Finetuning on Molecular Graphs | Kha-Dinh Luong et.al. | 2310.03274v1 | null |
2023-10-04 | On the Performance of Multimodal Language Models | Utsav Garg et.al. | 2310.03211v1 | null |
2023-10-04 | Enhancing Accuracy in Deep Learning Using Random Matrix Theory | Leonid Berlyand et.al. | 2310.03165v1 | null |
2023-10-04 | OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials | Peter Eastman et.al. | 2310.03121v1 | null |
2023-10-04 | Retrieval meets Long Context Large Language Models | Peng Xu et.al. | 2310.03025v1 | null |
2023-10-04 | AstroCLIP: Cross-Modal Pre-Training for Astronomical Foundation Models | Francois Lanusse et.al. | 2310.03024v1 | null |
2023-10-04 | Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions | Satwik Bhattamishra et.al. | 2310.03016v1 | null |
2023-10-04 | Multiple Physics Pretraining for Physical Surrogate Models | Michael McCabe et.al. | 2310.02994v1 | null |
2023-10-04 | Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors | Ido Amos et.al. | 2310.02980v1 | null |
2023-10-04 | T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation | Yuze He et.al. | 2310.02977v1 | null |
2023-10-04 | Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation | Chen Dun et.al. | 2310.02842v1 | null |
2023-10-03 | Implicit regularization of multi-task learning and finetuning in overparameterized neural networks | Jack W. Lindsey et.al. | 2310.02396v1 | null |
2023-10-04 | Who's Harry Potter? Approximate Unlearning in LLMs | Ronen Eldan et.al. | 2310.02238v2 | null |
2023-10-03 | Think before you speak: Training Language Models With Pause Tokens | Sachin Goyal et.al. | 2310.02226v1 | null |
2023-10-03 | SIEVE: Multimodal Dataset Pruning Using Image Captioning Models | Anas Mahmoud et.al. | 2310.02110v1 | null |
2023-10-03 | Understanding Masked Autoencoders From a Local Contrastive Perspective | Xiaoyu Yue et.al. | 2310.01994v1 | null |
2023-10-03 | Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Long Chen et.al. | 2310.01957v1 | link |
2023-10-03 | MFOS: Model-Free & One-Shot Object Pose Estimation | JongMin Lee et.al. | 2310.01897v1 | null |
2023-10-04 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Bin Zhu et.al. | 2310.01852v2 | link |
2023-10-03 | MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields | Takuhiro Kaneko et.al. | 2310.01821v1 | null |
2023-10-03 | SEA: Sparse Linear Attention with Estimated Attention Mask | Heejun Lee et.al. | 2310.01777v1 | null |
2023-10-03 | Backdiff: a diffusion model for generalized transferable protein backmapping | Yikai Liu et.al. | 2310.01768v1 | null |
2023-10-02 | L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | Ansong Ni et.al. | 2309.17446v2 | null |
2023-09-29 | Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | Vaidehi Patil et.al. | 2309.17410v1 | link |
2023-09-29 | Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | Zhihan Liu et.al. | 2309.17382v1 | null |
2023-09-29 | Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation | Shih-Lun Wu et.al. | 2309.17352v1 | null |
2023-09-29 | Scaling Experiments in Self-Supervised Cross-Table Representation Learning | Maximilian Schambach et.al. | 2309.17339v1 | null |
2023-09-29 | Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors | Yukang Lin et.al. | 2309.17261v1 | null |
2023-09-29 | Glioma subtype classification from histopathological images using in-domain and out-of-domain transfer learning: An experimental study | Vladimir Despotovic et.al. | 2309.17223v1 | null |
2023-09-29 | Reconstruction of Patient-Specific Confounders in AI-based Radiologic Image Interpretation using Generative Pretraining | Tianyu Han et.al. | 2309.17123v1 | link |
2023-09-28 | Qwen Technical Report | Jinze Bai et.al. | 2309.16609v1 | link |
2023-09-28 | Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection | Manish Sharma et.al. | 2309.16592v1 | null |
2023-09-28 | Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects | Hui Zheng et.al. | 2309.16457v1 | null |
2023-09-28 | Predicting performance difficulty from piano sheet music images | Pedro Ramoneda et.al. | 2309.16287v1 | null |
2023-09-28 | Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR | Xugang Lu et.al. | 2309.16093v1 | null |
2023-09-27 | Effective Long-Context Scaling of Foundation Models | Wenhan Xiong et.al. | 2309.16039v1 | null |
2023-09-27 | Graph-level Representation Learning with Joint-Embedding Predictive Architectures | Geri Skenderi et.al. | 2309.16014v1 | null |
2023-09-27 | Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts | Deniz Engin et.al. | 2309.15915v1 | null |
2023-09-27 | One For All: Video Conversation is Feasible Without Video Instruction Tuning | Ruyang Liu et.al. | 2309.15785v1 | null |
2023-09-27 | Question answering using deep learning in low resource Indian language Marathi | Dhiraj Amin et.al. | 2309.15779v1 | null |
2023-09-27 | ChatGPT-BCI: Word-Level Neural State Classification Using GPT, EEG, and Eye-Tracking Biomarkers in Semantic Inference Reading Comprehension | Yuhong Zhang et.al. | 2309.15714v1 | null |
2023-09-27 | Jointly Training Large Autoregressive Multimodal Models | Emanuele Aiello et.al. | 2309.15564v1 | null |
2023-09-27 | High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models | Chunyu Qiang et.al. | 2309.15512v1 | null |
2023-09-27 | DreamCom: Finetuning Text-guided Inpainting Model for Image Composition | Lingxiao Lu et.al. | 2309.15508v1 | null |
2023-09-27 | VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning | Yanan Wang et.al. | 2309.15494v1 | null |
2023-09-27 | Tackling VQA with Pretrained Foundation Models without Further Training | Alvin De Jun Tan et.al. | 2309.15487v1 | null |
2023-09-27 | Towards Foundation Models Learned from Anatomy in Medical Imaging via Self-Supervision | Mohammad Reza Hosseinzadeh Taher et.al. | 2309.15358v1 | null |
2023-09-26 | SEPT: Towards Efficient Scene Representation Learning for Motion Prediction | Zhiqian Lan et.al. | 2309.15289v1 | null |
2023-09-26 | Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding | Christina Kassab et.al. | 2309.15065v1 | null |
2023-09-25 | When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs | Marialena Bevilacqua et.al. | 2309.14488v1 | null |
2023-09-25 | SINCERE: Supervised Information Noise-Contrastive Estimation REvisited | Patrick Feeney et.al. | 2309.14277v1 | link |
2023-09-26 | Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition | Wei He et.al. | 2309.14183v2 | null |
2023-09-25 | VidChapters-7M: Video Chapters at Scale | Antoine Yang et.al. | 2309.13952v1 | link |
2023-09-25 | TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning | Jing Zhu et.al. | 2309.13885v1 | null |
2023-09-24 | Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR) | Guo-qing Jiang et.al. | 2309.13681v1 | null |
2023-09-24 | VoiceLDM: Text-to-Speech with Environmental Context | Yeonghyeon Lee et.al. | 2309.13664v1 | null |
2023-09-24 | Cross-modal Alignment with Optimal Transport for CTC-based ASR | Xugang Lu et.al. | 2309.13650v1 | null |
2023-09-24 | Robust data driven discovery of a seismic wave equation | Shijun Cheng et.al. | 2309.13645v1 | null |
2023-09-24 | Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset | Arthur Zhang et.al. | 2309.13549v1 | null |
2023-09-24 | InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation | Cho-Ying Wu et.al. | 2309.13516v1 | null |
2023-09-22 | A matter of attitude: Focusing on positive and active gradients to boost saliency maps | Oscar Llorente et.al. | 2309.12913v1 | link |
2023-09-22 | SRFNet: Monocular Depth Estimation with Fine-grained Structure via Spatial Reliability-oriented Fusion of Frames and Events | Tianbo Pan et.al. | 2309.12842v1 | null |
2023-09-22 | Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography | Rabin Adhikari et.al. | 2309.12829v1 | link |
2023-09-22 | Unsupervised Representations Improve Supervised Learning in Speech Emotion Recognition | Amirali Soltani Tehrani et.al. | 2309.12714v1 | null |
2023-09-21 | Studying and improving reasoning in humans and machines | Nicolas Yax et.al. | 2309.12485v1 | null |
2023-09-21 | Environment-biased Feature Ranking for Novelty Detection Robustness | Stefan Smeu et.al. | 2309.12301v1 | null |
2023-09-21 | Weakly-supervised Automated Audio Captioning via text only training | Theodoros Kouzelis et.al. | 2309.12242v1 | link |
2023-09-21 | Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval | Alberto Baldrati et.al. | 2309.12110v1 | null |
2023-09-21 | Accelerating Thematic Investment with Prompt Tuned Pretrained Language Models | Valentin Leonhard Buchner et.al. | 2309.12075v1 | null |
2023-09-21 | BELT:Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision | Jinzhao Zhou et.al. | 2309.12056v1 | null |
2023-09-21 | Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition | Xiaoyu Liu et.al. | 2309.12042v1 | link |
2023-09-21 | DEYOv3: DETR with YOLO for Real-time Object Detection | Haodong Ouyang et.al. | 2309.11851v1 | null |
2023-09-21 | Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues | Norbert Braunschweiler et.al. | 2309.11838v1 | null |
2023-09-21 | Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction | Yu Tian et.al. | 2309.11811v1 | link |
2023-09-21 | SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features | Zhaoyi Wang et.al. | 2309.11791v1 | null |
2023-09-20 | Galaxy Zoo DESI: Detailed Morphology Measurements for 8.7M Galaxies in the DESI Legacy Imaging Surveys | Mike Walmsley et.al. | 2309.11425v1 | link |
2023-09-20 | GECTurk: Grammatical Error Correction and Detection Dataset for Turkish | Atakan Kara et.al. | 2309.11346v1 | link |
2023-09-21 | Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism | Chengcheng Wang et.al. | 2309.11331v2 | link |
2023-09-20 | Uncovering the effects of model initialization on deep model generalization: A study with adult and pediatric Chest X-ray images | Sivaramakrishnan Rajaraman et.al. | 2309.11318v1 | null |
2023-09-20 | Using Artificial Intelligence for the Automation of Knitting Patterns | Uduak Uboh et.al. | 2309.11202v1 | null |
2023-09-20 | Assessment of Pre-Trained Models Across Languages and Grammars | Alberto Muñoz-Ortiz et.al. | 2309.11165v1 | link |
2023-09-20 | Hyperspectral Benchmark: Bridging the Gap between HSI Applications through Comprehensive Dataset and Pretraining | Hannah Frank et.al. | 2309.11122v1 | link |
2023-09-20 | Visual Question Answering in the Medical Domain | Louisa Canepa et.al. | 2309.11080v1 | null |
2023-09-20 | Weak Supervision for Label Efficient Visual Bug Detection | Farrukh Rahman et.al. | 2309.11077v1 | null |
2023-09-20 | 3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images | Yifu Zhang et.al. | 2309.11015v1 | null |
2023-09-19 | Motif-Centric Representation Learning for Symbolic Music | Yuxuan Wu et.al. | 2309.10597v1 | null |
2023-09-19 | A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings | Danushka Bollegala et.al. | 2309.10551v1 | null |
2023-09-19 | OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement | Yang Gao et.al. | 2309.10539v1 | link |
2023-09-19 | FoleyGen: Visually-Guided Audio Generation | Xinhao Mei et.al. | 2309.10537v1 | null |
2023-09-19 | Improving CLIP Robustness with Knowledge Distillation and Self-Training | Clement Laroudie et.al. | 2309.10361v1 | null |
2023-09-19 | KoBigBird-large: Transformation of Transformer for Korean Language Understanding | Kisu Yang et.al. | 2309.10339v1 | null |
2023-09-19 | Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi | Md Nishat Raihan et.al. | 2309.10272v1 | null |
2023-09-18 | Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties | Wei Lu et.al. | 2309.10170v1 | null |
2023-09-18 | Understanding Catastrophic Forgetting in Language Models via Implicit Inference | Suhas Kotha et.al. | 2309.10105v1 | link |
2023-09-18 | Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents | Ziyi Yang et.al. | 2309.09919v1 | null |
2023-09-19 | Harnessing Collective Intelligence Under a Lack of Cultural Consensus | Necdet Gürkan et.al. | 2309.09787v2 | null |
2023-09-18 | DGM-DR: Domain Generalization with Mutual Information Regularized Diabetic Retinopathy Classification | Aleksandr Matsun et.al. | 2309.09670v1 | null |
2023-09-18 | DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | Bowen Yin et.al. | 2309.09668v1 | link |
2023-09-18 | Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders | Lester Phillip Violeta et.al. | 2309.09627v1 | null |
2023-09-18 | PromptST: Prompt-Enhanced Spatio-Temporal Multi-Attribute Prediction | Zijian Zhang et.al. | 2309.09500v1 | null |
2023-09-18 | Self-supervised TransUNet for Ultrasound regional segmentation of the distal radius in children | Yuyue Zhou et.al. | 2309.09490v1 | null |
2023-09-18 | Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment | Zheng-Yan Sheng et.al. | 2309.09470v1 | null |
2023-09-18 | Investigating Zero- and Few-shot Generalization in Fact Verification | Liangming Pan et.al. | 2309.09444v1 | link |
2023-09-18 | Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information | Tianjun Mao et.al. | 2309.09421v1 | null |
2023-09-15 | How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models? | Danni Liu et.al. | 2309.08565v1 | link |
2023-09-15 | Breathing New Life into 3D Assets with Generative Repainting | Tianfu Wang et.al. | 2309.08523v1 | link |
2023-09-15 | Scaling Laws for Sparsely-Connected Foundation Models | Elias Frantar et.al. | 2309.08520v1 | null |
2023-09-15 | Audio-free Prompt Tuning for Language-Audio Models | Yiming Li et.al. | 2309.08357v1 | null |
2023-09-15 | Headless Language Models: Learning without Predicting with Contrastive Weight Tying | Nathan Godey et.al. | 2309.08351v1 | null |
2023-09-15 | Leveraging the Power of Data Augmentation for Transformer-based Tracking | Jie Zhao et.al. | 2309.08264v1 | null |
2023-09-15 | BROW: Better featuRes fOr Whole slide image based on self-distillation | Yuanfeng Wu et.al. | 2309.08259v1 | null |
2023-09-15 | Fine-tune the pretrained ATST model for sound event detection | Nian Shao et.al. | 2309.08153v1 | link |
2023-09-15 | Multi-Scale Estimation for Omni-Directional Saliency Maps Using Learnable Equator Bias | Takao Yamanaka et.al. | 2309.08139v1 | link |
2023-09-15 | AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT | Fangbo Qin et.al. | 2309.08134v1 | null |
2023-09-14 | Physically Plausible Full-Body Hand-Object Interaction Synthesis | Jona Braun et.al. | 2309.07907v1 | null |
2023-09-15 | Virchow: A Million-Slide Digital Pathology Foundation Model | Eugene Vorontsov et.al. | 2309.07778v2 | null |
2023-09-14 | PerPLM: Personalized Fine-tuning of Pretrained Language Models via Writer-specific Intermediate Learning and Prompts | Daisuke Oba et.al. | 2309.07727v1 | null |
2023-09-14 | L1-aware Multilingual Mispronunciation Detection Framework | Yassine El Kheir et.al. | 2309.07719v1 | null |
2023-09-14 | NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches | Chi-en Amy Tai et.al. | 2309.07704v1 | null |
2023-09-14 | SwitchGPT: Adapting Large Language Models for Non-Text Outputs | Xinyu Wang et.al. | 2309.07623v1 | null |
2023-09-14 | VerilogEval: Evaluating Large Language Models for Verilog Code Generation | Mingjie Liu et.al. | 2309.07544v1 | null |
2023-09-14 | DePT: Decoupled Prompt Tuning | Ji Zhang et.al. | 2309.07439v1 | link |
2023-09-14 | Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images | Zhiyun Song et.al. | 2309.07394v1 | link |
2023-09-14 | Training Audio Captioning Models without Audio | Soham Deshmukh et.al. | 2309.07372v1 | link |
2023-09-13 | TransNet: A Transfer Learning-Based Network for Human Action Recognition | K. Alomar et.al. | 2309.06951v1 | null |
2023-09-13 | Enhancing the Performance of Multi-Agent Reinforcement Learning for Controlling HVAC Systems | Daniel Bayer et.al. | 2309.06940v1 | null |
2023-09-14 | VEATIC: Video-based Emotion and Affect Tracking in Context Dataset | Zhihang Ren et.al. | 2309.06745v2 | null |
2023-09-13 | VLSlice: Interactive Vision-and-Language Slice Discovery | Eric Slyman et.al. | 2309.06703v1 | link |
2023-09-13 | STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning | Palaash Agrawal et.al. | 2309.06680v1 | null |
2023-09-12 | Zero-Shot Visual Classification with Guided Cropping | Piyapat Saranrittichai et.al. | 2309.06581v1 | null |
2023-09-12 | Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning | Saarthak Kapse et.al. | 2309.06439v1 | null |
2023-09-12 | Learning to Predict Concept Ordering for Common Sense Generation | Tianhui Zhang et.al. | 2309.06363v1 | link |
2023-09-12 | 360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation | Laurenz Reichardt et.al. | 2309.06197v1 | null |
2023-09-12 | Active Label Refinement for Semantic Segmentation of Satellite Images | Tuan Pham Minh et.al. | 2309.06159v1 | null |
2023-09-12 | Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection | Sophia Althammer et.al. | 2309.06131v1 | null |
2023-09-12 | Do PLMs Know and Understand Ontological Knowledge? | Weiqi Wu et.al. | 2309.05936v1 | link |
2023-09-12 | Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals | Ran Liu et.al. | 2309.05927v1 | null |
2023-09-11 | Natural Language Supervision for General-Purpose Audio Representations | Benjamin Elizalde et.al. | 2309.05767v1 | null |
2023-09-11 | Learning the Geodesic Embedding with Graph Neural Networks | Bo Pang et.al. | 2309.05613v1 | null |
2023-09-11 | Temporal Action Localization with Enhanced Instant Discriminability | Dingfeng Shi et.al. | 2309.05590v1 | link |
2023-09-11 | Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation | Anna Deichler et.al. | 2309.05455v1 | null |
2023-09-11 | Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP | Jinzuomu Zhong et.al. | 2309.05423v1 | null |
2023-09-11 | Towards generalisable and calibrated synthetic speech detection with self-supervised representations | Dan Oneata et.al. | 2309.05384v1 | null |
2023-09-11 | DeCUR: decoupling common & unique representations for multimodal self-supervision | Yi Wang et.al. | 2309.05300v1 | link |
2023-09-11 | Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis | Li Du et.al. | 2309.05217v1 | null |
2023-09-11 | SIM-Sync: From Certifiably Optimal Synchronization over the 3D Similarity Group to Scene Reconstruction with Learned Depth | Xihang Yu et.al. | 2309.05184v1 | null |
2023-09-10 | Anatomy Completor: A Multi-class Completion Framework for 3D Anatomy Reconstruction | Jianning Li et.al. | 2309.04956v1 | null |
2023-09-10 | Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | Yuan Gan et.al. | 2309.04946v1 | link |
2023-09-08 | Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning | David Yunis et.al. | 2309.04459v1 | null |
2023-09-08 | Zero-Shot Robustification of Zero-Shot Models With Foundation Models | Dyah Adila et.al. | 2309.04344v1 | null |
2023-09-08 | Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration | Xin Yu et.al. | 2309.04071v1 | null |
2023-09-08 | 3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation | Sungjun Cho et.al. | 2309.04062v1 | null |
2023-09-07 | Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems | Takuma Udagawa et.al. | 2309.04031v1 | null |
2023-09-07 | Multimodal Transformer for Material Segmentation | Md Kaykobad Reza et.al. | 2309.04001v1 | link |
2023-09-07 | DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Yung-Sung Chuang et.al. | 2309.03883v1 | link |
2023-09-07 | Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation | Ting Liu et.al. | 2309.03661v1 | null |
2023-09-08 | All Labels Together: Low-shot Intent Detection with an Efficient Label Semantic Encoding Paradigm | Jiangshu Du et.al. | 2309.03563v2 | null |
2023-09-07 | SyncDreamer: Generating Multiview-consistent Images from a Single-view Image | Yuan Liu et.al. | 2309.03453v1 | null |
2023-09-06 | Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation | Arvind Krishna Sridhar et.al. | 2309.03340v1 | null |
2023-09-06 | EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System | Chengjie Lu et.al. | 2309.03246v1 | null |
2023-09-06 | Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation | Danwei Cai et.al. | 2309.03019v1 | null |
2023-09-07 | HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models | Guijin Son et.al. | 2309.02706v2 | null |
2023-09-05 | Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks | Blake VanBerlo et.al. | 2309.02596v1 | null |
2023-09-05 | Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | Lili Yu et.al. | 2309.02591v1 | null |
2023-09-05 | A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images | Blake VanBerlo et.al. | 2309.02555v1 | null |
2023-09-05 | Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach | Vimal K B et.al. | 2309.02429v1 | null |
2023-09-05 | Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization | Helena Bonaldi et.al. | 2309.02311v1 | null |
2023-09-05 | Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition | Patrick Eickhoff et.al. | 2309.02145v1 | null |
2023-09-04 | Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images | Jamiu Idowu et.al. | 2309.01850v1 | null |
2023-09-06 | An Empirical Analysis for Zero-Shot Multi-Label Classification on COVID-19 CT Scans and Uncurated Reports | Ethan Dack et.al. | 2309.01740v2 | null |
2023-09-04 | A Comparative Analysis of Pretrained Language Models for Text-to-Speech | Marcel Granero-Moya et.al. | 2309.01576v1 | null |
2023-09-04 | DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion | Yunhong Lou et.al. | 2309.01372v1 | null |
2023-09-04 | Can I Trust Your Answer? Visually Grounded Video Question Answering | Junbin Xiao et.al. | 2309.01327v1 | null |
2023-09-03 | COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers | Julien Denize et.al. | 2309.01270v1 | link |
2023-09-03 | Optimizing Mobile-Edge AI-Generated Everything (AIGX) Services by Prompt Engineering: Fundamental, Framework, and Case Study | Yinqiu Liu et.al. | 2309.01065v1 | null |
2023-09-01 | Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models | Janghoon Ock et.al. | 2309.00563v1 | link |
2023-09-01 | Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering | Shiqi Yang et.al. | 2309.00528v1 | null |
2023-09-01 | CPSP: Learning Speech Concepts From Phoneme Supervision | Chunyu Qiang et.al. | 2309.00424v1 | null |
2023-09-01 | FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking | Tsun-Hin Cheung et.al. | 2309.00240v1 | null |
2023-08-31 | A Sequential Framework for Detection and Classification of Abnormal Teeth in Panoramic X-rays | Tudor Dascalu et.al. | 2309.00027v1 | link |
2023-08-31 | StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation | Yuhan Wang et.al. | 2308.16909v1 | link |
2023-08-31 | The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants | Lucas Bandarkar et.al. | 2308.16884v1 | link |
2023-08-31 | Towards Multilingual Automatic Dialogue Evaluation | John Mendonça et.al. | 2308.16795v1 | null |
2023-08-31 | Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps | Miguel Espinosa et.al. | 2308.16648v1 | link |
2023-08-31 | Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception | Riley Tavassoli et.al. | 2308.16493v1 | null |
2023-08-31 | Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff | Satoshi Suzuki et.al. | 2308.16454v1 | null |
2023-08-30 | ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding | Omer Veysel Cagatan et.al. | 2308.16336v1 | null |
2023-08-30 | Can Prompt Learning Benefit Radiology Report Generation? | Jun Wang et.al. | 2308.16269v1 | null |
2023-08-30 | SAM-Med2D | Junlong Cheng et.al. | 2308.16184v1 | link |
2023-08-30 | Quantifying Uncertainty in Answers from any Language Model via Intrinsic and Extrinsic Confidence Assessment | Jiuhai Chen et.al. | 2308.16175v1 | null |
2023-08-30 | Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models | Neha Sengupta et.al. | 2308.16149v1 | null |
2023-08-30 | MerA: Merging Pretrained Adapters For Few-Shot Learning | Shwai He et.al. | 2308.15982v1 | null |
2023-08-29 | A General-Purpose Self-Supervised Model for Computational Pathology | Richard J. Chen et.al. | 2308.15474v1 | null |
2023-08-29 | DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior | Xinqi Lin et.al. | 2308.15070v1 | link |
2023-08-29 | Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints | Wenxing Xu et.al. | 2308.15003v1 | null |
2023-08-28 | SynthDistill: Face Recognition with Knowledge Distillation from Synthetic Data | Hatef Otroshi Shahreza et.al. | 2308.14852v1 | null |
2023-08-28 | VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | Xudong Wang et.al. | 2308.14710v1 | link |
2023-08-28 | Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts | Thanh Thi Nguyen et.al. | 2308.14683v1 | null |
2023-08-28 | Adversarial Attacks on Foundational Vision Models | Nathan Inkawhich et.al. | 2308.14597v1 | null |
2023-08-28 | Multimodal Detection of Social Spambots in Twitter using Transformers | Loukas Ilias et.al. | 2308.14484v1 | null |
2023-08-28 | Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities | Leman Akoglu et.al. | 2308.14380v1 | null |
2023-08-28 | FonMTL: Towards Multitask Learning for the Fon Language | Bonaventure F. P. Dossou et.al. | 2308.14280v1 | link |
2023-08-28 | Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks | Hongye Liu et.al. | 2308.14274v1 | null |
2023-08-27 | SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation | Zhiyu Qu et.al. | 2308.14191v1 | null |
2023-08-27 | Only Encode Once: Making Content-based News Recommender Greener | Qijiong Liu et.al. | 2308.14155v1 | null |
2023-08-27 | Situated Natural Language Explanations | Zining Zhu et.al. | 2308.14115v1 | null |
2023-08-25 | In-context learning for model-free system identification | Marco Forgione et.al. | 2308.13380v1 | link |
2023-08-25 | Refine Neutrino Events Reconstruction with BEiT-3 | Chen Li et.al. | 2308.13285v1 | link |
2023-08-25 | Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions | Yibo Wang et.al. | 2308.13178v1 | null |
2023-08-25 | Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? | Fei Wang et.al. | 2308.12898v2 | link |
2023-08-24 | A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions | Jiawei Lin et.al. | 2308.12700v1 | null |
2023-08-25 | Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition | Dimitrios Daskalakis et.al. | 2308.12673v2 | null |
2023-08-24 | A Small and Fast BERT for Chinese Medical Punctuation Restoration | Tongtao Ling et.al. | 2308.12568v1 | link |
2023-08-24 | Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Yuan Yuan et.al. | 2308.12509v1 | null |
2023-08-24 | Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis | Yuqi Fang et.al. | 2308.12495v1 | link |
2023-08-23 | D4: Improving LLM Pretraining via Document De-Duplication and Diversification | Kushal Tirumala et.al. | 2308.12284v1 | null |
2023-08-23 | Language Reward Modulation for Pretraining Reinforcement Learning | Ademi Adeniji et.al. | 2308.12270v1 | link |
2023-08-23 | Prompt2Model: Generating Deployable Models from Natural Language Instructions | Vijay Viswanathan et.al. | 2308.12261v1 | link |
2023-08-25 | Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | Jiasheng Ye et.al. | 2308.12219v2 | link |
2023-08-23 | DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration | Nan Zhou et.al. | 2308.12058v1 | link |
2023-08-23 | Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages | Jinyi Hu et.al. | 2308.12038v1 | link |
2023-08-23 | Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment | Kangmin Xu et.al. | 2308.12001v1 | null |
2023-08-23 | Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields | Hyeonseop Song et.al. | 2308.11974v1 | null |
2023-08-23 | CED: Consistent ensemble distillation for audio tagging | Heinrich Dinkel et.al. | 2308.11957v1 | link |
2023-08-22 | Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations | Mohammadreza Salehi et.al. | 2308.11796v1 | link |
2023-08-22 | Open Set Synthetic Image Source Attribution | Shengbang Fang et.al. | 2308.11557v1 | null |
2023-08-22 | Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding | Jiantao Wu et.al. | 2308.11448v1 | null |
2023-08-22 | Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Shansong Liu et.al. | 2308.11276v1 | null |
2023-08-21 | UnLoc: A Unified Framework for Video Localization Tasks | Shen Yan et.al. | 2308.11062v1 | link |
2023-08-21 | SupEuclid: Extremely Simple, High Quality OoD Detection with Supervised Contrastive Learning and Euclidean Distance | Jarrod Haas et.al. | 2308.10973v1 | null |
2023-08-21 | DocPrompt: Large-scale continue pretrain for zero-shot and few-shot document question answering | Sijin Wu et.al. | 2308.10959v1 | null |
2023-08-21 | EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery | Chenyuan Zhang et.al. | 2308.10759v1 | link |
2023-08-23 | Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models | Peiyan Zhang et.al. | 2308.10632v2 | null |
2023-08-21 | When Prompt-based Incremental Learning Does Not Meet Strong Pretraining | Yu-Ming Tang et.al. | 2308.10445v1 | link |
2023-08-21 | Turning a CLIP Model into a Scene Text Spotter | Wenwen Yu et.al. | 2308.10408v1 | link |
2023-08-20 | cantnlp@LT-EDI@RANLP-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models | Sidney G. -J. Wong et.al. | 2308.10370v1 | null |
2023-08-22 | Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting | Qidong Huang et.al. | 2308.10315v2 | link |
2023-08-20 | Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image | Liao Shen et.al. | 2308.10257v1 | null |
2023-08-20 | From Global to Local: Multi-scale Out-of-distribution Detection | Ji Zhang et.al. | 2308.10239v1 | link |
2023-08-20 | ViT-Lens: Towards Omni-modal Representations | Weixian Lei et.al. | 2308.10185v1 | link |
2023-08-19 | Efficient Representation Learning for Healthcare with Cross-Architectural Self-Supervision | Pranav Singh et.al. | 2308.10064v1 | link |
2023-08-18 | Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning | Yeming Chen et.al. | 2308.09455v1 | null |
2023-08-18 | Accelerated materials language processing enabled by GPT | Jaewoong Choi et.al. | 2308.09354v1 | null |
2023-08-18 | DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability | Runhui Huang et.al. | 2308.09306v1 | null |
2023-08-18 | V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models | Heng Wang et.al. | 2308.09300v1 | null |
2023-08-18 | Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model | Ryandhimas E. Zezario et.al. | 2308.09262v1 | null |
2023-08-17 | Semantic Consistency for Assuring Reliability of Large Language Models | Harsh Raj et.al. | 2308.09138v1 | null |
2023-08-17 | Edit Temporal-Consistent Videos with Image Diffusion Model | Yuanzhi Wang et.al. | 2308.09091v1 | null |
2023-08-17 | On the Evaluation of Neural Code Translation: Taxonomy and Benchmark | Mingsheng Jiao et.al. | 2308.08961v1 | null |
2023-08-17 | Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays | Feng Hong et.al. | 2308.08853v1 | null |
2023-08-16 | Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer | Guangyi Chen et.al. | 2308.08414v1 | null |
2023-08-16 | Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation | Jingrui Hou et.al. | 2308.08378v1 | null |
2023-08-16 | Boosting Commit Classification with Contrastive Learning | Jiajun Tong et.al. | 2308.08263v1 | null |
2023-08-16 | Is Self-Supervised Pretraining Good for Extrapolation in Molecular Property Prediction? | Shun Takashige et.al. | 2308.08129v1 | null |
2023-08-15 | End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations | Bolaji Yusuf et.al. | 2308.08027v1 | null |
2023-08-15 | RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models | Jie Huang et.al. | 2308.07922v1 | null |
2023-08-15 | Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model | Bosheng Qin et.al. | 2308.07749v1 | null |
2023-08-16 | SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search | Wen Zan et.al. | 2308.07711v2 | null |
2023-08-15 | Self-supervised Hypergraphs for Learning Multiple World Interpretations | Alina Marcu et.al. | 2308.07615v1 | null |
2023-08-15 | SGDiff: A Style Guided Diffusion Model for Fashion Synthesis | Zhengwentai Sun et.al. | 2308.07605v1 | link |
2023-08-15 | AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model | Jeong Hun Yeo et.al. | 2308.07593v1 | null |
2023-08-14 | Semantic Similarity Loss for Neural Source Code Summarization | Chia-Yi Su et.al. | 2308.07429v1 | link |
2023-08-14 | Platypus: Quick, Cheap, and Powerful Refinement of LLMs | Ariel N. Lee et.al. | 2308.07317v1 | link |
2023-08-15 | SEMI-CenterNet: A Machine Learning Facilitated Approach for Semiconductor Defect Inspection | Vic De Ridder et.al. | 2308.07180v2 | null |
2023-08-14 | CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation | Hongguang Zhu et.al. | 2308.07146v1 | link |
2023-08-14 | On the Importance of Spatial Relations for Few-shot Action Recognition | Yilun Zhang et.al. | 2308.07119v1 | null |
2023-08-14 | A One Stop 3D Target Reconstruction and multilevel Segmentation Method | Jiexiong Xu et.al. | 2308.06974v1 | link |
2023-08-14 | Robustness Stress Testing in Medical Image Classification | Mobarakol Islam et.al. | 2308.06889v1 | link |
2023-08-14 | Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization | Jungsoo Lee et.al. | 2308.06879v1 | null |
2023-08-13 | Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | David Junhao Zhang et.al. | 2308.06739v1 | null |
2023-08-13 | IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models | Hu Ye et.al. | 2308.06721v1 | null |
2023-08-12 | GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Youliang Yuan et.al. | 2308.06463v1 | link |
2023-08-11 | Zero-shot Text-driven Physically Interpretable Face Editing | Yapeng Meng et.al. | 2308.05976v1 | null |
2023-08-10 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Haohe Liu et.al. | 2308.05734v1 | null |
2023-08-10 | Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling | Ushnish Sengupta et.al. | 2308.05583v1 | null |
2023-08-10 | Fine-grained building roof instance segmentation based on domain adapted pretraining and composite dual-backbone | Guozhang Liu et.al. | 2308.05358v1 | null |
2023-08-10 | Multimodal Pretrained Models for Sequential Decision-Making: Synthesis, Verification, Grounding, and Perception | Yunhao Yang et.al. | 2308.05295v1 | null |
2023-08-09 | Deep Learning Model Transfer in Forest Mapping using Multi-source Satellite SAR and Optical Images | Shaojia Ge et.al. | 2308.05005v1 | null |
2023-08-09 | Transferable Models for Bioacoustics with Human Language Supervision | David Robinson et.al. | 2308.04978v1 | link |
2023-08-09 | JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition | Lucian Bicsi et.al. | 2308.04934v1 | null |
2023-08-09 | Deep Generative Networks for Heterogeneous Augmentation of Cranial Defects | Kamil Kwarciak et.al. | 2308.04883v1 | null |
2023-08-09 | Optimizing a Transformer-based network for a deep learning seismic processing workflow | Randy Harsuko et.al. | 2308.04739v1 | null |
2023-08-08 | Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction | Izzeddin Teeti et.al. | 2308.04589v1 | null |
2023-08-08 | Semi-Supervised Semantic Segmentation of Cell Nuclei via Diffusion-based Large-Scale Pre-Training and Collaborative Learning | Zhuchen Shao et.al. | 2308.04578v1 | null |
2023-08-08 | Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining | Bidur Khanal et.al. | 2308.04551v1 | link |
2023-08-08 | Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra (Developing a Model to Detect Coral Reef Damage Using Image Classification) | Fadhil Muhammad et.al. | 2308.04337v1 | null |
2023-08-08 | In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning | Xiaochuang Han et.al. | 2308.04275v1 | link |
2023-08-08 | Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning | Jiahang Zhang et.al. | 2308.03975v1 | null |
2023-08-07 | AdaptiveSAM: Towards Efficient Tuning of SAM for Surgical Scene Segmentation | Jay N. Paranjape et.al. | 2308.03726v1 | link |
2023-08-07 | Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods | Ya Jing et.al. | 2308.03620v1 | null |
2023-08-07 | When GPT Meets Program Analysis: Towards Intelligent Detection of Smart Contract Logic Vulnerabilities in GPTScan | Yuqiang Sun et.al. | 2308.03314v1 | null |
2023-08-06 | Introducing Feature Attention Module on Convolutional Neural Network for Diabetic Retinopathy Detection | Susmita Ghosh et.al. | 2308.02985v1 | null |
2023-08-05 | DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation | Qiaosong Qi et.al. | 2308.02915v1 | null |
2023-08-05 | Improving Generalization of Image Captioning with Unsupervised Prompt Learning | Hongchen Wei et.al. | 2308.02862v1 | null |
2023-08-05 | Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement | Huake Wang et.al. | 2308.02776v1 | null |
2023-08-04 | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | Qihang Yu et.al. | 2308.02487v1 | link |
2023-08-04 | A Parameter-efficient Multi-subject Model for Predicting fMRI Activity | Connor Lane et.al. | 2308.02351v1 | link |
2023-08-04 | Explaining Relation Classification Models with Semantic Extents | Lars Klöser et.al. | 2308.02193v1 | link |
2023-08-03 | DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations | Ping Hu et.al. | 2308.01890v1 | null |
2023-08-03 | MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction | Jianghao Lin et.al. | 2308.01737v1 | link |
2023-08-03 | Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models | Zheyu Zhang et.al. | 2308.01684v1 | link |
2023-08-03 | MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | Ke Chen et.al. | 2308.01546v1 | null |
2023-08-03 | Multimodal Neurons in Pretrained Text-Only Transformers | Sarah Schwettmann et.al. | 2308.01544v1 | null |
2023-08-03 | MFIM: Megapixel Facial Identity Manipulation | Sanghyeon Na et.al. | 2308.01536v1 | null |
2023-08-02 | Teaching Smaller Language Models To Generalise To Unseen Compositional Questions | Tim Hartill et.al. | 2308.00946v1 | null |
2023-08-01 | Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment | Hongbo Liu et.al. | 2308.00729v1 | null |
2023-08-01 | Adaptive Semantic Consistency for Cross-domain Few-shot Classification | Hengchu Lu et.al. | 2308.00727v1 | null |
2023-08-01 | CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code | Nadezhda Chirkova et.al. | 2308.00683v1 | null |
2023-08-01 | An L2-Normalized Spatial Attention Network For Accurate And Fast Classification Of Brain Tumors In 2D T1-Weighted CE-MRI Images | Grace Billingsley et.al. | 2308.00491v1 | link |
2023-08-01 | DINO-CXR: A self supervised method based on vision transformer for chest X-ray classification | Mohammadreza Shakouri et.al. | 2308.00475v1 | null |
2023-08-01 | ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data | Ruiqi Yang et.al. | 2308.00454v1 | link |
2023-08-01 | Fountain -- an intelligent contextual assistant combining knowledge representation and language models for manufacturing risk identification | Saurabh Kumar et.al. | 2308.00364v1 | null |
2023-08-01 | Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models | Jiaao Chen et.al. | 2308.00304v1 | null |
2023-08-01 | The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution | Xuan-Bac Nguyen et.al. | 2308.00262v1 | link |
2023-08-01 | Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias | Itay Itzhak et.al. | 2308.00225v1 | null |
2023-07-31 | Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? | Ari Holtzman et.al. | 2308.00189v1 | null |
2023-07-31 | Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity | Charlie Hou et.al. | 2308.00177v1 | null |
2023-07-31 | Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives | Haoyang Liu et.al. | 2307.16851v1 | null |
2023-07-31 | UniVTG: Towards Unified Video-Language Temporal Grounding | Kevin Qinghong Lin et.al. | 2307.16715v1 | link |
2023-07-31 | DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization | Xiaojun Tang et.al. | 2307.16415v1 | link |
2023-07-31 | MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text | Junchen Zhu et.al. | 2307.16371v1 | null |
2023-07-31 | AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? | Qi Zhao et.al. | 2307.16368v1 | null |
2023-07-30 | Unified Model for Image, Video, Audio and Language Tasks | Mustafa Shukor et.al. | 2307.16184v1 | link |
2023-07-30 | HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation | Jinbo Wu et.al. | 2307.16183v1 | null |
2023-08-01 | Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar | Yusheng Wang et.al. | 2307.16160v2 | null |
2023-07-29 | Instance-Wise Adaptive Tuning and Caching for Vision-Language Models | Chunjin Yang et.al. | 2307.15983v1 | null |
2023-07-29 | GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning | Soumyadeep Roy et.al. | 2307.15933v1 | link |
2023-07-28 | SimDETR: Simplifying self-supervised pretraining for DETR | Ioannis Maniadis Metaxas et.al. | 2307.15697v1 | null |
2023-07-28 | The FlySpeech Audio-Visual Speaker Diarization System for MISP Challenge 2022 | Li Zhang et.al. | 2307.15400v1 | null |
2023-07-28 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | Yifei Xin et.al. | 2307.15344v1 | null |
2023-07-28 | ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation | Cheng Wen et.al. | 2307.15290v1 | link |
2023-07-28 | Multilingual Lexical Simplification via Paraphrase Generation | Kang Liu et.al. | 2307.15286v1 | link |
2023-07-28 | AC-Norm: Effective Tuning for Medical Image Analysis via Affine Collaborative Normalization | Chuyan Zhang et.al. | 2307.15282v1 | link |
2023-07-28 | A deep transfer learning network for structural condition identification with limited real-world training data | Nengxin Bao et.al. | 2307.15249v1 | null |
2023-07-27 | Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields | Xiangyu Wang et.al. | 2307.15131v1 | link |
2023-07-27 | Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration | Harry Cheng et.al. | 2307.14866v1 | null |
2023-07-27 | Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining | Benjia Zhou et.al. | 2307.14768v1 | null |
2023-07-27 | A Weakly Supervised Segmentation Network Embedding Cross-scale Attention Guidance and Noise-sensitive Constraint for Detecting Tertiary Lymphoid Structures of Pancreatic Tumors | Bingxue Wang et.al. | 2307.14603v1 | null |
2023-07-26 | MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation | Reiner Birkl et.al. | 2307.14460v1 | link |
2023-07-26 | Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking | Angela Ramirez et.al. | 2307.14440v1 | link |
2023-07-26 | Visual Instruction Inversion: Image Editing via Visual Prompting | Thao Nguyen et.al. | 2307.14331v1 | null |
2023-07-26 | Comparative Analysis of Libraries for the Sentimental Analysis | Wendy Ccoya et.al. | 2307.14311v1 | null |
2023-07-27 | RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition | Lei Shen et.al. | 2307.14016v2 | null |
2023-07-26 | ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution | Mingjin Zhang et.al. | 2307.14010v1 | null |
2023-07-26 | Tracking Anything in High Quality | Jiawen Zhu et.al. | 2307.13974v1 | link |
2023-07-26 | How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data? | Huazheng Wang et.al. | 2307.13949v1 | link |
2023-07-26 | FinTree: Financial Dataset Pretrain Transformer Encoder for Relation Extraction | Hyunjong Ok et.al. | 2307.13900v1 | null |
2023-07-25 | Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT | Taha Emre et.al. | 2307.13865v1 | null |
2023-07-25 | E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning | Cheng Han et.al. | 2307.13770v1 | link |
2023-07-25 | QuickQual: Lightweight, convenient retinal image quality scoring with off-the-shelf pretrained models | Justin Engelmann et.al. | 2307.13646v1 | link |
2023-07-25 | XDLM: Cross-lingual Diffusion Language Model for Machine Translation | Linyao Chen et.al. | 2307.13560v1 | null |
2023-07-25 | Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction | Gabriele Picco et.al. | 2307.13497v1 | null |
2023-07-24 | DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction | Zihao Li et.al. | 2307.13004v1 | null |
2023-07-25 | Towards a Visual-Language Foundation Model for Computational Pathology | Ming Y. Lu et.al. | 2307.12914v2 | null |
2023-07-24 | Multiscale Video Pretraining for Long-Term Activity Forecasting | Reuben Tan et.al. | 2307.12854v1 | null |
2023-07-24 | Predicting Ordinary Differential Equations with Transformers | Sören Becker et.al. | 2307.12617v1 | null |
2023-07-25 | TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition | Shilin Lu et.al. | 2307.12493v2 | link |
2023-07-23 | CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models | Xingbo Wang et.al. | 2307.12382v1 | null |
2023-07-23 | Self-Supervised Learning for Audio-Based Emotion Recognition | Peranut Nimitsurachat et.al. | 2307.12343v1 | null |
2023-07-23 | Geometry-Aware Adaptation for Pretrained Models | Nicholas Roberts et.al. | 2307.12226v1 | null |
2023-07-22 | Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction | Kexin Ding et.al. | 2307.11952v1 | link |
2023-07-21 | Bibliometric Analysis of Publisher and Journal Instructions to Authors on Generative-AI in Academic and Scientific Publishing | Conner Ganjavi et.al. | 2307.11918v1 | null |
2023-07-21 | Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts | Mayug Maniparambil et.al. | 2307.11661v1 | null |
2023-07-21 | Advancing Visual Grounding with Scene Knowledge: Benchmark and Method | Zhihong Chen et.al. | 2307.11558v1 | link |
2023-07-21 | Generating Image-Specific Text Improves Fine-grained Image Classification | Emily Mu et.al. | 2307.11315v1 | null |
2023-07-20 | Heuristic Hyperparameter Choice for Image Anomaly Detection | Zeyu Jiang et.al. | 2307.11197v1 | null |
2023-07-20 | Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding | Siddhant Arora et.al. | 2307.11005v1 | null |
2023-07-20 | PASTA: Pretrained Action-State Transformer Agents | Raphael Boige et.al. | 2307.10936v1 | null |
2023-07-20 | BlendFace: Re-designing Identity Encoders for Face-Swapping | Kaede Shiohara et.al. | 2307.10854v1 | link |
2023-07-20 | HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces | Stella Bounareli et.al. | 2307.10797v1 | link |
2023-07-20 | Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition | Weidong Chen et.al. | 2307.10757v1 | link |
2023-07-20 | Learning Discriminative Visual-Text Representation for Polyp Re-Identification | Suncheng Xiang et.al. | 2307.10625v1 | link |
2023-07-20 | Deep fused flow and topology features for botnet detection basing on pretrained GCN | Meng Xiaoyuan et.al. | 2307.10583v1 | null |
2023-07-20 | SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Daegyeom Kim et.al. | 2307.10550v1 | null |
2023-07-19 | Interpreting and Correcting Medical Image Classification with PIP-Net | Meike Nauta et.al. | 2307.10404v1 | null |
2023-07-19 | Gradient Sparsification For Masked Fine-Tuning of Transformers | James O' Neill et.al. | 2307.10098v1 | null |
2023-07-20 | Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition | Jia-Xin Zhuang et.al. | 2307.10036v2 | null |
2023-07-19 | An analysis on the effects of speaker embedding choice in non auto-regressive TTS | Adriana Stan et.al. | 2307.09898v1 | null |
2023-07-19 | Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers | Jaeyoung Kim et.al. | 2307.09455v2 | null |
2023-07-19 | Llama 2: Open Foundation and Fine-Tuned Chat Models | Hugo Touvron et.al. | 2307.09288v2 | link |
2023-07-18 | UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data | Yazheng Yang et.al. | 2307.09249v1 | null |
2023-07-18 | Division Gets Better: Learning Brightness-Aware and Detail-Sensitive Representations for Low-Light Image Enhancement | Huake Wang et.al. | 2307.09104v1 | null |
2023-07-18 | Multimodal Machine Learning for Extraction of Theorems and Proofs in the Scientific Literature | Shrey Mishra et.al. | 2307.09047v1 | link |
2023-07-18 | Accuracy versus time frontiers of semi-supervised and self-supervised learning on medical images | Zhe Huang et.al. | 2307.08919v1 | link |
2023-07-17 | Flow Matching in Latent Space | Quan Dao et.al. | 2307.08698v1 | link |
2023-07-17 | Deficiency-Aware Masked Transformer for Video Inpainting | Yongsheng Yu et.al. | 2307.08629v1 | link |
2023-07-17 | Scale-Aware Modulation Meet Transformer | Weifeng Lin et.al. | 2307.08579v1 | link |
2023-07-17 | Does Visual Pretraining Help End-to-End Reasoning? | Chen Sun et.al. | 2307.08506v1 | null |
2023-07-17 | Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts | Rebekka Hubert et.al. | 2307.08426v1 | link |
2023-07-18 | CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing | Ahmet Canberk Baykal et.al. | 2307.08397v2 | null |
2023-07-17 | Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Shaoshi Ling et.al. | 2307.08234v1 | null |
2023-07-17 | Zero-Shot Image Harmonization with Generative Model Prior | Jianqi Chen et.al. | 2307.08182v1 | link |
2023-07-16 | Diffusion to Confusion: Naturalistic Adversarial Patch Generation Based on Diffusion Model for Object Detector | Shuo-Yen Lin et.al. | 2307.08076v1 | null |
2023-07-16 | Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling | Longyue Wang et.al. | 2307.08074v1 | null |
2023-07-14 | DreamTeacher: Pretraining Image Backbones with Deep Generative Models | Daiqing Li et.al. | 2307.07487v1 | null |
2023-07-14 | Towards spoken dialect identification of Irish | Liam Lonergan et.al. | 2307.07436v1 | null |
2023-07-14 | Improving Zero-Shot Generalization for CLIP with Synthesized Prompts | Zhengbo Wang et.al. | 2307.07397v1 | link |
2023-07-14 | Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs | Agnes Axelsson et.al. | 2307.07312v1 | null |
2023-07-13 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling | He Huang et.al. | 2307.07057v1 | null |
2023-07-13 | In-context Autoencoder for Context Compression in a Large Language Model | Tao Ge et.al. | 2307.06945v1 | null |
2023-07-13 | mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs | Gregor Geigle et.al. | 2307.06930v1 | link |
2023-07-13 | Explainable 2D Vision Models for 3D Medical Data | Alexander Ziller et.al. | 2307.06614v1 | null |
2023-07-12 | T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation | Kaiyi Huang et.al. | 2307.06350v1 | null |
2023-07-12 | Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Mostafa Dehghani et.al. | 2307.06304v1 | null |
2023-07-12 | Instruction Mining: High-Quality Instruction Data Selection for Large Language Models | Yihan Cao et.al. | 2307.06290v1 | null |
2023-07-12 | Pluggable Neural Machine Translation Models via Memory-augmented Adapters | Yuzhuang Xu et.al. | 2307.06029v1 | link |
2023-07-12 | What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation | Gabriele Merlin et.al. | 2307.06006v1 | null |
2023-07-13 | PIGEON: Predicting Image Geolocations | Lukas Haas et.al. | 2307.05845v2 | null |
2023-07-11 | EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video | Matthias De Lange et.al. | 2307.05784v1 | link |
2023-07-13 | Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting | Chantal Pellegrini et.al. | 2307.05766v2 | link |
2023-07-11 | Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering | Pengfei Li et.al. | 2307.05314v1 | null |
2023-07-11 | Attribute Controlled Dialogue Prompting | Runcheng Liu et.al. | 2307.05228v1 | null |
2023-07-11 | Generative Pretraining in Multimodality | Quan Sun et.al. | 2307.05222v1 | link |
2023-07-11 | ExFaceGAN: Exploring Identity Directions in GAN's Learned Latent Space for Synthetic Identity Generation | Fadi Boutros et.al. | 2307.05151v1 | null |
2023-07-11 | Uni-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images | Yongheng Zhang et.al. | 2307.05075v1 | null |
2023-07-10 | FedYolo: Augmenting Federated Learning with Pretrained Transformers | Xuechen Zhang et.al. | 2307.04905v1 | null |
2023-07-10 | Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos | Sagnik Majumder et.al. | 2307.04760v1 | null |
2023-07-10 | Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback | Jaskirat Singh et.al. | 2307.04749v1 | null |
2023-07-12 | Weakly-supervised positional contrastive learning: application to cirrhosis classification | Emma Sarfati et.al. | 2307.04617v2 | link |
2023-07-10 | Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception | Chi-Chih Chang et.al. | 2307.04537v1 | null |
2023-07-06 | Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning | Ke Liang et.al. | 2307.03591v1 | null |
2023-07-07 | Derivative Free Weight-space Ensembling | Dean Ninalga et.al. | 2307.03506v1 | null |
2023-07-07 | Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation | Dahyun Kang et.al. | 2307.03407v1 | null |
2023-07-07 | Teaching Arithmetic to Small Transformers | Nayoung Lee et.al. | 2307.03381v1 | null |
2023-07-06 | Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays | Rajib Bhattacharjea et.al. | 2307.03327v1 | null |
2023-07-06 | To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology | Tushar Kataria et.al. | 2307.03275v1 | null |
2023-07-06 | Vision Language Transformers: A Survey | Clayton Fields et.al. | 2307.03254v1 | null |
2023-07-06 | VideoGLUE: Video General Understanding Evaluation of Foundation Models | Liangzhe Yuan et.al. | 2307.03166v1 | null |
2023-07-06 | Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain | Aryo Gema et.al. | 2307.03042v1 | null |
2023-07-06 | A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task | Shiqi Yang et.al. | 2307.02862v1 | null |
2023-07-06 | Large Language Models Empowered Autonomous Edge AI for Connected Intelligence | Yifei Shen et.al. | 2307.02779v1 | null |
2023-07-05 | ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection | Sunjae Kwon et.al. | 2307.02591v1 | null |
2023-07-05 | Named Entity Inclusion in Abstractive Text Summarization | Sergey Berezin et.al. | 2307.02570v1 | null |
2023-07-05 | Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | Zhaofeng Wu et.al. | 2307.02477v1 | null |
2023-07-05 | Interactive Image Segmentation with Cross-Modality Vision Transformers | Kun Li et.al. | 2307.02280v1 | link |
2023-07-05 | LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation | Longshen Ou et.al. | 2307.02146v1 | null |
2023-07-05 | Prompting Diffusion Representations for Cross-Domain Semantic Segmentation | Rui Gong et.al. | 2307.02138v1 | null |
2023-07-05 | EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models | Michael Wornow et.al. | 2307.02028v1 | link |
2023-07-05 | A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis | Jiaxiang Liu et.al. | 2307.01981v1 | null |
2023-07-04 | KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation | Weijie Xu et.al. | 2307.01878v1 | null |
2023-07-04 | Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification | Lucas Mahler et.al. | 2307.01759v1 | link |
2023-07-04 | Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure | Yikang Wang et.al. | 2307.01546v1 | null |
2023-07-04 | Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation | Jian Guan et.al. | 2307.01542v1 | null |
2023-07-03 | Don't freeze: Finetune encoders for better Self-Supervised HAR | Vitor Fortes Rey et.al. | 2307.01168v1 | null |
2023-07-04 | Improving Language Plasticity via Pretraining with Active Forgetting | Yihong Chen et.al. | 2307.01163v2 | null |
2023-07-03 | Generating Reliable Pixel-Level Labels for Source Free Domain Adaptation | Gabriel Tjio et.al. | 2307.00893v1 | null |
2023-07-03 | Augmenting Deep Learning Adaptation for Wearable Sensor Data through Combined Temporal-Frequency Image Encoding | Yidong Zhu et.al. | 2307.00883v1 | null |
2023-07-03 | Analysis of Task Transferability in Large Pre-trained Classifiers | Akshay Mehra et.al. | 2307.00823v1 | link |
2023-07-01 | Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model | Jiong Cai et.al. | 2307.00370v1 | null |
2023-07-01 | Improving Multitask Retrieval by Promoting Task Specialization | Wenzheng Zhang et.al. | 2307.00342v1 | null |
2023-07-01 | Hierarchical Pretraining for Biomedical Term Embeddings | Bryan Cai et.al. | 2307.00266v1 | null |
2023-06-30 | Multiscale Progressive Text Prompt Network for Medical Image Segmentation | Xianjun Han et.al. | 2307.00174v1 | null |
2023-06-30 | Stitched ViTs are Flexible Vision Backbones | Zizheng Pan et.al. | 2307.00154v1 | link |
2023-06-30 | Class-Incremental Learning using Diffusion Model for Distillation and Replay | Quentin Jodelet et.al. | 2306.17560v1 | null |
2023-06-30 | Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries | Frederic Jonske et.al. | 2306.17555v1 | null |
2023-06-30 | MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems | Markus J. Buehler et.al. | 2306.17525v1 | null |
2023-07-03 | LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection | Zijian Cai et.al. | 2306.17408v2 | null |
2023-06-29 | Towards Open-Domain Topic Classification | Hantian Ding et.al. | 2306.17290v1 | null |
2023-06-29 | Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models | Simian Luo et.al. | 2306.17203v1 | link |
2023-06-29 | An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | Zitian Chen et.al. | 2306.17165v1 | null |
2023-06-29 | Classifying Crime Types using Judgment Documents from Social Media | Haoxuan Xu et.al. | 2306.17020v1 | null |
2023-06-29 | MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset | Guotai Wang et.al. | 2306.16925v1 | link |
2023-07-03 | Probabilistic Linguistic Knowledge and Token-level Text Augmentation | Zhengxiang Wang et.al. | 2306.16644v2 | null |
2023-06-29 | Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images | Sungho Chun et.al. | 2306.16615v1 | null |
2023-06-28 | Multi-Site Clinical Federated Learning using Recursive and Attentive Models and NVFlare | Won Joon Yun et.al. | 2306.16367v1 | null |
2023-06-28 | S2SNet: A Pretrained Neural Network for Superconductivity Discovery | Ke Liu et.al. | 2306.16270v1 | link |
2023-06-28 | Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection | Zhewei Chen et.al. | 2306.16186v1 | null |
2023-06-27 | Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data | Kai Chieh Chang et.al. | 2306.15808v1 | null |
2023-06-27 | ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis | Yakun Yu et.al. | 2306.15796v1 | null |
2023-06-27 | HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution | Eric Nguyen et.al. | 2306.15794v1 | null |
2023-06-27 | Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System | Xue-Feng Zhu et.al. | 2306.15767v1 | null |
2023-06-27 | Semi-supervised Multimodal Representation Learning through a Global Workspace | Benjamin Devillers et.al. | 2306.15711v1 | link |
2023-06-28 | Extending Context Window of Large Language Models via Positional Interpolation | Shouyuan Chen et.al. | 2306.15595v2 | null |
2023-06-28 | TrickVOS: A Bag of Tricks for Video Object Segmentation | Evangelos Skartados et.al. | 2306.15377v2 | null |
2023-06-27 | Gender Bias in BERT -- Measuring and Analysing Biases through Sentiment Rating in a Realistic Downstream Classification Task | Sophie Jentzsch et.al. | 2306.15298v1 | null |
2023-06-27 | Can Pretrained Language Models Derive Correct Semantics from Corrupt Subwords under Noise? | Xinzhe Li et.al. | 2306.15268v1 | link |
2023-06-28 | Wespeaker baselines for VoxSRC2023 | Shuai Wang et.al. | 2306.15161v2 | null |
2023-06-28 | MIMIC: Masked Image Modeling with Image Correspondences | Kalyani Marathe et.al. | 2306.15128v2 | link |
2023-06-26 | Understanding In-Context Learning via Supportive Pretraining Data | Xiaochuang Han et.al. | 2306.15091v1 | null |
2023-06-26 | Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression | Allan Raventós et.al. | 2306.15063v1 | link |
2023-06-26 | Supervised Pretraining Can Learn In-Context Reinforcement Learning | Jonathan N. Lee et.al. | 2306.14892v1 | null |
2023-06-26 | Composing Parameter-Efficient Modules with Arithmetic Operations | Jinghan Zhang et.al. | 2306.14870v1 | link |
2023-06-27 | Kosmos-2: Grounding Multimodal Large Language Models to the World | Zhiliang Peng et.al. | 2306.14824v2 | null |
2023-06-26 | DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models | Ximing Xing et.al. | 2306.14685v1 | null |
2023-06-26 | Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition | Meena Jagadeesan et.al. | 2306.14670v1 | null |
2023-06-26 | Localized Text-to-Image Generation for Free via Cross Attention Control | Yutong He et.al. | 2306.14636v1 | null |
2023-06-26 | Transfer Learning across Several Centuries: Machine and Historian Integrated Method to Decipher Royal Secretary's Diary | Sojung Lucia Kim et.al. | 2306.14592v1 | null |
2023-06-26 | A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis | Aishwarya Agarwal et.al. | 2306.14544v1 | null |
2023-06-26 | ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks | Kai Han et.al. | 2306.14525v1 | null |
2023-06-27 | DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | Yujun Shi et.al. | 2306.14435v2 | null |
2023-06-23 | Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation | Massimiliano Patacchiola et.al. | 2306.13554v1 | link |
2023-06-23 | DreamEditor: Text-Driven 3D Scene Editing with Neural Fields | Jingyu Zhuang et.al. | 2306.13455v1 | null |
2023-06-23 | Long-range Language Modeling with Self-retrieval | Ohad Rubin et.al. | 2306.13421v1 | null |
2023-06-23 | Variance-Covariance Regularization Improves Representation Learning | Jiachen Zhu et.al. | 2306.13292v1 | null |
2023-06-22 | PromptIR: Prompting for All-in-One Blind Image Restoration | Vaishnav Potlapalli et.al. | 2306.13090v1 | link |
2023-06-22 | Can a single image processing algorithm work equally well across all phases of DCE-MRI? | Adam G. Tattersall et.al. | 2306.12988v1 | null |
2023-06-22 | AudioPaLM: A Large Language Model That Can Speak and Listen | Paul K. Rubenstein et.al. | 2306.12925v1 | null |
2023-06-22 | Learning from Visual Observation via Offline Pretrained State-to-Go Transformer | Bohan Zhou et.al. | 2306.12860v1 | null |
2023-06-23 | Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery | Hoang Thanh Lam et.al. | 2306.12802v2 | link |
2023-06-22 | Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields | Ori Gordon et.al. | 2306.12760v1 | null |
2023-06-22 | Restoration of the JPEG Maximum Lossy Compressed Face Images with Hourglass Block based on Early Stopping Discriminator | Jongwook Si et.al. | 2306.12757v1 | null |
2023-06-22 | FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping | Yu Zhang et.al. | 2306.12686v1 | null |
2023-06-22 | Identifying and Disentangling Spurious Features in Pretrained Image Representations | Rafayel Darbinyan et.al. | 2306.12673v1 | null |
2023-06-21 | Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images | Mohsen Ahmadi et.al. | 2306.12510v1 | null |
2023-06-21 | LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models | Shizhe Diao et.al. | 2306.12420v1 | link |
2023-06-21 | Introspective Action Advising for Interpretable Transfer Learning | Joseph Campbell et.al. | 2306.12314v1 | null |
2023-06-21 | ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining | Dezhi Peng et.al. | 2306.12106v1 | null |
2023-06-20 | Exploring New Frontiers in Agricultural NLP: Investigating the Potential of Large Language Models for Food Applications | Saed Rezayi et.al. | 2306.11892v1 | null |
2023-06-20 | Unsupervised Deep Unfolded PGD for Transmit Power Allocation in Wireless Systems | Ramoni Adeogun et.al. | 2306.11865v1 | null |
2023-06-20 | A Simple and Effective Pruning Approach for Large Language Models | Mingjie Sun et.al. | 2306.11695v1 | link |
2023-06-20 | Inter-Cell Network Slicing With Transfer Learning Empowered Multi-Agent Deep Reinforcement Learning | Tianlun Hu et.al. | 2306.11552v1 | null |
2023-06-20 | MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Willy Fitra Hendria et.al. | 2306.11341v1 | link |
2023-06-19 | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Fan Liu et.al. | 2306.11029v1 | null |
2023-06-19 | Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment | Shivam Pande et.al. | 2306.10955v1 | null |
2023-06-19 | Detailed retinal vessel segmentation without human annotations using simulated optical coherence tomography angiographs | Linus Kreitner et.al. | 2306.10941v1 | link |
2023-06-19 | Vocal Timbre Effects with Differentiable Digital Signal Processing | David Südholt et.al. | 2306.10886v1 | link |
2023-06-19 | A deep dive into explainable self-supervised transformers for point clouds | Ioannis Romanelis et.al. | 2306.10798v1 | link |
2023-06-19 | Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference | Junhao Zheng et.al. | 2306.10790v1 | null |
2023-06-18 | Point-Cloud Completion with Pretrained Text-to-image Diffusion Models | Yoni Kasten et.al. | 2306.10533v1 | null |
2023-06-16 | CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search | Fahad Shamshad et.al. | 2306.10008v1 | link |
2023-06-16 | Robot Learning with Sensorimotor Pre-training | Ilija Radosavovic et.al. | 2306.10007v1 | null |
2023-06-16 | SLACK: Stable Learning of Augmentations with Cold-start and KL regularization | Juliette Marrie et.al. | 2306.09998v1 | null |
2023-06-16 | LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning | Jifan Zhang et.al. | 2306.09910v1 | link |
2023-06-16 | Revealing the impact of social circumstances on the selection of cancer therapy through natural language processing of social work notes | Shenghuan Sun et.al. | 2306.09877v1 | null |
2023-06-16 | MixedTeacher : Knowledge Distillation for fast inference textural anomaly detection | Simon Thomine et.al. | 2306.09859v1 | null |
2023-06-16 | The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models | Roy Voetman et.al. | 2306.09762v1 | null |
2023-06-16 | Scaling Open-Vocabulary Object Detection | Matthias Minderer et.al. | 2306.09683v1 | null |
2023-06-16 | CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models | Hao-Wen Dong et.al. | 2306.09635v1 | null |
2023-06-16 | CMLM-CSE: Based on Conditional MLM Contrastive Learning for Sentence Embeddings | Wei Zhang et.al. | 2306.09594v1 | null |
2023-06-15 | Segment Any Point Cloud Sequences by Distilling Vision Foundation Models | Youquan Liu et.al. | 2306.09347v1 | link |
2023-06-15 | Semantic HELM: An Interpretable Memory for Reinforcement Learning | Fabian Paischer et.al. | 2306.09312v1 | link |
2023-06-15 | Text Promptable Surgical Instrument Segmentation with Vision-Language Models | Zijian Zhou et.al. | 2306.09244v1 | null |
2023-06-15 | SCALE: Scaling up the Complexity for Advanced Language Model Evaluation | Vishvaksenan Rasiah et.al. | 2306.09237v1 | null |
2023-06-15 | Audio Tagging on an Embedded Hardware Platform | Gabriel Bibbo et.al. | 2306.09106v1 | null |
2023-06-15 | Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | Chenyang Lyu et.al. | 2306.09093v1 | link |
2023-06-15 | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Sihan Chen et.al. | 2306.09085v1 | link |
2023-06-15 | Behavioral Cloning via Search in Embedded Demonstration Dataset | Federico Malato et.al. | 2306.09082v1 | null |
2023-06-15 | When Hyperspectral Image Classification Meets Diffusion Models: An Unsupervised Feature Learning Framework | Jingyi Zhou et.al. | 2306.08964v1 | null |
2023-06-15 | A Comparison of Self-Supervised Pretraining Approaches for Predicting Disease Risk from Chest Radiograph Images | Yanru Chen et.al. | 2306.08955v1 | null |
2023-06-13 | Image Captioners Are Scalable Vision Learners Too | Michael Tschannen et.al. | 2306.07915v1 | null |
2023-06-13 | GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition | Yu Pan et.al. | 2306.07848v1 | null |
2023-06-13 | Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images | Ming Y. Lu et.al. | 2306.07831v1 | null |
2023-06-13 | Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification | Dmitry Karpov et.al. | 2306.07797v1 | null |
2023-06-13 | Multi-objective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex | Hongsong Feng et.al. | 2306.07484v1 | null |
2023-06-13 | Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard | Ehsan Kamalloo et.al. | 2306.07471v1 | null |
2023-06-12 | Scalable 3D Captioning with Pretrained Models | Tiange Luo et.al. | 2306.07279v1 | null |
2023-06-12 | MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images | Junchen Zhu et.al. | 2306.07257v1 | null |
2023-06-13 | Fair Learning to Rank with Distribution-free Risk Control | Ruocheng Guo et.al. | 2306.07188v2 | null |
2023-06-12 | Gradient Ascent Post-training Enhances Language Model Generalization | Dongkeun Yoon et.al. | 2306.07052v1 | link |
2023-06-12 | Generating Synthetic Datasets by Interpolating along Generalized Geodesics | Jiaojiao Fan et.al. | 2306.06866v1 | null |
2023-06-11 | Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability | Jiacheng Ye et.al. | 2306.06688v1 | null |
2023-06-10 | Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment | Anh T. V. Dau et.al. | 2306.06347v1 | null |
2023-06-10 | Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC | Shen-sian Syu et.al. | 2306.06345v1 | null |
2023-06-09 | DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Fuxiao Liu et.al. | 2306.06306v1 | link |
2023-06-09 | | Abhilash Nandy et.al. | 2306.06190v1 | null |
2023-06-09 | Virtual Node Tuning for Few-shot Node Classification | Zhen Tan et.al. | 2306.06063v1 | null |
2023-06-09 | Benchmarking self-supervised video representation learning | Akash Kumar et.al. | 2306.06010v1 | null |
2023-06-09 | Exploring Effective Mask Sampling Modeling for Neural Image Compression | Lin Liu et.al. | 2306.05704v1 | null |
2023-06-09 | Embodied Executable Policy Learning with Language-based Scene Summarization | Jielin Qiu et.al. | 2306.05696v1 | null |
2023-06-09 | On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning | Hojoon Lee et.al. | 2306.05637v1 | link |
2023-06-08 | Hexatagging: Projective Dependency Parsing as Tagging | Afra Amini et.al. | 2306.05477v1 | null |
2023-06-08 | Tracking Objects with 3D Representation from Videos | Jiawei He et.al. | 2306.05416v1 | null |
2023-06-08 | Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories | Shizhe Diao et.al. | 2306.05406v1 | link |
2023-06-08 | RDumb: A simple approach that questions our progress in continual test-time adaptation | Ori Press et.al. | 2306.05401v1 | link |
2023-06-08 | Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction | Simone Scaboro et.al. | 2306.05276v1 | link |
2023-06-09 | Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models | Tianzhe Chu et.al. | 2306.05272v2 | link |
2023-06-08 | SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions | Yuseung Lee et.al. | 2306.05178v1 | null |
2023-06-08 | Variable Radiance Field for Real-Life Category-Specifc Reconstruction from Single Image | Kun Wang et.al. | 2306.05145v1 | null |
2023-06-08 | DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Amr Keleg et.al. | 2306.05076v1 | null |
2023-06-08 | Improving Visual Prompt Tuning for Self-supervised Vision Transformers | Seungryong Yoo et.al. | 2306.05067v1 | link |
2023-06-08 | Learning A Foundation Language Model for Geoscience Knowledge Understanding and Utilization | Cheng Deng et.al. | 2306.05064v1 | link |
2023-06-07 | Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection | Yu Bai et.al. | 2306.04637v1 | null |
2023-06-07 | Proximity-Informed Calibration for Deep Neural Networks | Miao Xiong et.al. | 2306.04590v1 | link |
2023-06-07 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages | Claytone Sikasote et.al. | 2306.04428v1 | link |
2023-06-07 | SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory | Han Sun et.al. | 2306.04385v1 | null |
2023-06-07 | Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Haiyang Xu et.al. | 2306.04362v1 | link |
2023-06-08 | GPT Self-Supervision for a Better Data Annotator | Xiaohuan Pei et.al. | 2306.04349v2 | null |
2023-06-08 | Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images | Ke Zhu et.al. | 2306.04244v2 | null |
2023-06-07 | Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction | Fréjus A. A. Laleye et.al. | 2306.04203v1 | null |
2023-06-07 | From the One, Judge of the Whole: Typed Entailment Graph Construction with Predicate Generation | Zhibin Chen et.al. | 2306.04170v1 | link |
2023-06-07 | Matte Anything: Interactive Natural Image Matting with Segment Anything Models | Jingfeng Yao et.al. | 2306.04121v1 | null |
2023-06-06 | Learning Human Mesh Recovery in 3D Scenes | Zehong Shen et.al. | 2306.03847v1 | null |
2023-06-06 | Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How | Sebastian Pineda Arango et.al. | 2306.03828v1 | null |
2023-06-06 | On the Difference of BERT-style and CLIP-style Text Encoders | Zhihong Chen et.al. | 2306.03678v1 | link |
2023-06-06 | BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs | Daniel Daza et.al. | 2306.03606v1 | link |
2023-06-06 | LegoNet: Alternating Model Blocks for Medical Image Segmentation | Ikboljon Sobirov et.al. | 2306.03494v1 | null |
2023-06-06 | Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses | Lucía Gómez-Zaragozá et.al. | 2306.03443v1 | null |
2023-06-06 | Quantifying the Variability Collapse of Neural Networks | Jing Xu et.al. | 2306.03440v1 | null |
2023-06-06 | Towards Alleviating the Object Bias in Prompt Tuning-based Factual Knowledge Extraction | Yuhang Wang et.al. | 2306.03378v1 | link |
2023-06-06 | Identifying Shared Decodable Concepts in the Human Brain Using Image-Language Foundation Models | Cory Efird et.al. | 2306.03375v1 | null |
2023-06-07 | Vid2Act: Activate Offline Videos for Visual RL | Minting Pan et.al. | 2306.03360v2 | null |
2023-06-05 | SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | Tim Dettmers et.al. | 2306.03078v1 | null |
2023-06-05 | Sensitivity-Aware Finetuning for Accuracy Recovery on Deep Learning Hardware | Lakshmi Nair et.al. | 2306.03076v1 | null |
2023-06-05 | Continual Learning with Pretrained Backbones by Tuning in the Input Space | Simone Marullo et.al. | 2306.02947v1 | null |
2023-06-05 | Second Language Acquisition of Neural Language Models | Miyu Oba et.al. | 2306.02920v1 | null |
2023-06-05 | SelfEvolve: A Code Evolution Framework via Large Language Models | Shuyang Jiang et.al. | 2306.02907v1 | null |
2023-06-05 | Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance | Jinwoo Kim et.al. | 2306.02866v1 | link |
2023-06-05 | Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents | David Kreuzer et.al. | 2306.02815v1 | null |
2023-06-05 | Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization | Yimeng Chen et.al. | 2306.02595v1 | null |
2023-06-05 | Improved Active Multi-Task Representation Learning via Lasso | Yiping Wang et.al. | 2306.02556v1 | null |
2023-06-04 | RadLing: Towards Efficient Radiology Report Understanding | Rikhiya Ghosh et.al. | 2306.02492v1 | null |
2023-06-02 | Distilling Efficient Language-Specific Models for Cross-Lingual Transfer | Alan Ansell et.al. | 2306.01709v1 | link |
2023-06-02 | Towards In-context Scene Understanding | Ivana Balažević et.al. | 2306.01667v1 | null |
2023-06-02 | Pretrained Language Model based Web Search Ranking: From Relevance to Satisfaction | Canjia Li et.al. | 2306.01599v1 | null |
2023-06-02 | Evaluating The Robustness of Self-Supervised Representations to Background/Foreground Removal | Xavier F. Cadet et.al. | 2306.01398v1 | null |
2023-06-02 | Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23 | Ioannis Tsiamas et.al. | 2306.01327v1 | null |
2023-06-01 | Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation | Adithya V Ganesan et.al. | 2306.01183v1 | null |
2023-06-01 | TMI! Finetuned Models Leak Private Information from their Pretraining Data | John Abascal et.al. | 2306.01181v1 | null |
2023-06-01 | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | Guilherme Penedo et.al. | 2306.01116v1 | null |
2023-06-01 | Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease Classification | Cara Van Uden et.al. | 2306.01111v1 | null |
2023-06-01 | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Chaitanya Ryali et.al. | 2306.00989v1 | link |
2023-06-01 | Continual Learning for Abdominal Multi-Organ and Tumor Segmentation | Yixiao Zhang et.al. | 2306.00988v1 | link |
2023-06-01 | StyleGAN knows Normal, Depth, Albedo, and More | Anand Bhattad et.al. | 2306.00987v1 | null |
2023-06-02 | Diffusion Self-Guidance for Controllable Image Generation | Dave Epstein et.al. | 2306.00986v2 | null |
2023-06-01 | Train Offline, Test Online: A Real Robot Learning Benchmark | Gaoyue Zhou et.al. | 2306.00942v1 | link |
2023-06-01 | STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Shalev Lifshitz et.al. | 2306.00937v1 | null |
2023-06-01 | "Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | Abisek Rajakumar Kalarani et.al. | 2306.00931v1 | null |
2023-06-01 | Inserting Anybody in Diffusion Models via Celeb Basis | Ge Yuan et.al. | 2306.00926v1 | link |
2023-06-01 | Adapting a ConvNeXt model to audio classification on AudioSet | Thomas Pellegrini et.al. | 2306.00830v1 | null |
2023-06-01 | In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation | Julian Bitterwolf et.al. | 2306.00826v1 | link |
2023-06-01 | Too Large; Data Reduction for Vision-Language Pre-Training | Alex Jinpeng Wang et.al. | 2305.20087v2 | link |
2023-05-31 | Efficient Shapley Values Estimation by Amortization for Text Classification | Chenghao Yang et.al. | 2305.19998v1 | link |
2023-06-01 | A Global Context Mechanism for Sequence Labeling | Conglei Xu et.al. | 2305.19928v2 | link |
2023-05-31 | Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data | Xinze Li et.al. | 2305.19912v1 | link |
2023-05-31 | How Does Pretraining Improve Discourse-Aware Translation? | Zhihong Huang et.al. | 2305.19847v1 | null |
2023-05-31 | A Survey of Label-Efficient Deep Learning for 3D Point Clouds | Aoran Xiao et.al. | 2305.19812v1 | link |
2023-05-31 | Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios | Malina Chichirau et.al. | 2305.19757v1 | null |
2023-05-31 | Investigation of the Robustness of Neural Density Fields | Jonas Schuhmacher et.al. | 2305.19698v1 | null |
2023-05-31 | End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization | Shohei Taniguchi et.al. | 2305.19684v1 | link |
2023-05-31 | LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction | Jeremiah Milbauer et.al. | 2305.19585v1 | null |
2023-05-30 | Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning | Umang Gupta et.al. | 2305.19264v1 | link |
2023-05-30 | DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation | Jiuhn Song et.al. | 2305.19201v1 | null |
2023-05-30 | Strategic Reasoning with Language Models | Kanishk Gandhi et.al. | 2305.19165v1 | null |
2023-05-30 | LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images | Viraj Prabhu et.al. | 2305.19164v1 | null |
2023-05-30 | Together We Make Sense -- Learning Meta-Sense Embeddings from Pretrained Static Sense Embeddings | Haochen Luo et.al. | 2305.19092v1 | null |
2023-05-30 | Nested Diffusion Processes for Anytime Image Generation | Noam Elata et.al. | 2305.19066v1 | link |
2023-05-30 | Voice Conversion With Just Nearest Neighbors | Matthew Baas et.al. | 2305.18975v1 | link |
2023-05-30 | Prompt-based Tuning of Transformer Models for Multi-Center Medical Image Segmentation | Numan Saeed et.al. | 2305.18948v1 | null |
2023-05-30 | Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures | Jakob Prange et.al. | 2305.18915v1 | link |
2023-05-30 | Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs | Yingcong Li et.al. | 2305.18869v1 | null |
2023-05-29 | CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice | Juan Zuluaga-Gomez et.al. | 2305.18283v1 | link |
2023-05-29 | Concept Decomposition for Visual Exploration and Inspiration | Yael Vinker et.al. | 2305.18203v1 | null |
2023-05-29 | Multiscale Positive-Unlabeled Detection of AI-Generated Texts | Yuchuan Tian et.al. | 2305.18149v1 | link |
2023-05-29 | Conditional Score Guidance for Text-Driven Image-to-Image Translation | Hyunsoo Lee et.al. | 2305.18007v1 | null |
2023-05-29 | Data Augmentation for Low-Resource Keyphrase Generation | Krishna Garg et.al. | 2305.17968v1 | link |
2023-05-28 | Transfer Learning for Power Outage Detection Task with Limited Training Data | Olukunle Owolabi et.al. | 2305.17817v1 | null |
2023-05-28 | Adapting Language-Audio Models as Few-Shot Audio Learners | Jinhua Liang et.al. | 2305.17719v1 | null |
2023-05-28 | Z-GMOT: Zero-shot Generic Multiple Object Tracking | Kim Hoang Tran et.al. | 2305.17648v1 | null |
2023-05-30 | Learning from Children: Improving Image-Caption Pretraining via Curriculum | Hammad A. Ayyubi et.al. | 2305.17540v2 | link |
2023-05-27 | Text-to-image Editing by Image Information Removal | Zhongping Zhang et.al. | 2305.17489v1 | null |
2023-05-26 | BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks | Kai Zhang et.al. | 2305.17100v1 | link |
2023-05-26 | Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models | Daman Arora et.al. | 2305.17077v1 | null |
2023-05-26 | Exploiting Abstract Meaning Representation for Open-Domain Question Answering | Cunxiang Wang et.al. | 2305.17050v1 | null |
2023-05-26 | Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering | Siwei Wu et.al. | 2305.17019v1 | null |
2023-05-26 | D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias | Sabit Hassan et.al. | 2305.17013v1 | null |
2023-05-29 | Three Towers: Flexible Contrastive Learning with Pretrained Image Models | Jannik Kossen et.al. | 2305.16999v2 | null |
2023-05-26 | Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation | David Brandfonbrener et.al. | 2305.16985v1 | null |
2023-05-26 | Compositional Generalization without Trees using Multiset Tagging and Latent Permutations | Matthias Lindemann et.al. | 2305.16954v1 | null |
2023-05-26 | On Evaluating Adversarial Robustness of Large Vision-Language Models | Yunqing Zhao et.al. | 2305.16934v1 | link |
2023-05-26 | Calibration of Transformer-based Models for Identifying Stress and Depression in Social Media | Loukas Ilias et.al. | 2305.16797v1 | null |
2023-05-25 | Parallel Sampling of Diffusion Models | Andy Shih et.al. | 2305.16317v1 | link |
2023-05-25 | Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages | Shivanshu Gupta et.al. | 2305.16302v1 | null |
2023-05-25 | Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation | Lisa Dunlap et.al. | 2305.16289v1 | link |
2023-05-25 | ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation | Zhengyi Wang et.al. | 2305.16213v1 | link |
2023-05-26 | Diversity-Aware Coherence Loss for Improving Neural Topic Models | Raymond Li et.al. | 2305.16199v2 | link |
2023-05-25 | Explainability Techniques for Chemical Language Models | Stefan Hödl et.al. | 2305.16192v1 | link |
2023-05-25 | Language Models Implement Simple Word2Vec-style Vector Arithmetic | Jack Merullo et.al. | 2305.16130v1 | link |
2023-05-25 | Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | Takafumi Moriya et.al. | 2305.15971v1 | null |
2023-05-25 | Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7 | Yi Yuan et.al. | 2305.15905v1 | null |
2023-05-25 | On Architectural Compression of Text-to-Image Diffusion Models | Bo-Kyeong Kim et.al. | 2305.15798v1 | null |
2023-05-24 | What can generic neural networks learn from a child's visual experience? | A. Emin Orhan et.al. | 2305.15372v1 | null |
2023-05-24 | Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution | Yiyang Ma et.al. | 2305.15357v1 | null |
2023-05-24 | Visual Programming for Text-to-Image Generation and Evaluation | Jaemin Cho et.al. | 2305.15328v1 | null |
2023-05-24 | Self-Evolution Learning for Discriminative Language Model Pretraining | Qihuang Zhong et.al. | 2305.15275v1 | null |
2023-05-24 | Revisiting Token Dropping Strategy in Efficient BERT Pretraining | Qihuang Zhong et.al. | 2305.15273v1 | null |
2023-05-24 | ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers | Jingfeng Yao et.al. | 2305.15272v1 | link |
2023-05-24 | Rethinking the Evaluation Protocol of Domain Generalization | Han Yu et.al. | 2305.15253v1 | null |
2023-05-24 | L-CAD: Language-based Colorization with Any-level Descriptions | Zheng Chang et.al. | 2305.15217v1 | null |
2023-05-24 | Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator | Ziwei He et.al. | 2305.15099v1 | null |
2023-05-24 | Dynamic Masking Rate Schedules for MLM Pretraining | Zachary Ankner et.al. | 2305.15096v1 | null |
2023-05-23 | Video Prediction Models as Rewards for Reinforcement Learning | Alejandro Escontrela et.al. | 2305.14343v1 | null |
2023-05-23 | ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings | William Brannon et.al. | 2305.14321v1 | link |
2023-05-23 | QLoRA: Efficient Finetuning of Quantized LLMs | Tim Dettmers et.al. | 2305.14314v1 | link |
2023-05-23 | Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining | Emanuele Bugliarello et.al. | 2305.14281v1 | null |
2023-05-23 | Masked Path Modeling for Vision-and-Language Navigation | Zi-Yi Dou et.al. | 2305.14268v1 | null |
2023-05-24 | DUBLIN -- Document Understanding By Language-Image Network | Kriti Aggarwal et.al. | 2305.14218v2 | null |
2023-05-23 | Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks | Tiedong Liu et.al. | 2305.14201v1 | null |
2023-05-23 | Accessing Higher Dimensions for Unsupervised Word Translation | Sida I. Wang et.al. | 2305.14200v1 | null |
2023-05-23 | Evaluating Factual Consistency of Summaries with Large Language Models | Shiqi Chen et.al. | 2305.14069v1 | link |
2023-05-23 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | Sangmin Bae et.al. | 2305.14032v1 | link |
2023-05-22 | Language-Agnostic Bias Detection in Language Models | Abdullatif Köksal et.al. | 2305.13302v1 | null |
2023-05-22 | U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech | Xin Jing et.al. | 2305.13195v1 | null |
2023-05-22 | A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity | Shayne Longpre et.al. | 2305.13169v1 | null |
2023-05-22 | LMGQS: A Large-scale Dataset for Query-focused Summarization | Ruochen Xu et.al. | 2305.13086v1 | null |
2023-05-22 | Textually Pretrained Speech Language Models | Michael Hassid et.al. | 2305.13009v1 | null |
2023-05-22 | Rethinking Semi-supervised Learning with Language Models | Zhengxiang Shi et.al. | 2305.13002v1 | link |
2023-05-22 | Text-based Person Search without Parallel Image-Text Data | Yang Bai et.al. | 2305.12964v1 | null |
2023-05-22 | Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model | Xiao Wang et.al. | 2305.12816v1 | null |
2023-05-22 | In-Context Learning of Large Language Models Explained as Kernel Regression | Chi Han et.al. | 2305.12766v1 | null |
2023-05-22 | LEAN: Light and Efficient Audio Classification Network | Shwetank Choudhary et.al. | 2305.12712v1 | null |
2023-05-19 | Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes | Aran Nayebi et.al. | 2305.11772v1 | null |
2023-05-19 | Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition | Siyuan Feng et.al. | 2305.11569v1 | null |
2023-05-19 | JOINEDTrans: Prior Guided Multi-task Transformer for Joint Optic Disc/Cup Segmentation and Fovea Detection | Huaqing He et.al. | 2305.11504v1 | null |
2023-05-19 | TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding | Chenchi Zhang et.al. | 2305.11497v1 | null |
2023-05-19 | ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection | Shiwei Jin et.al. | 2305.11452v1 | null |
2023-05-18 | CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models | Jiaxu Zhao et.al. | 2305.11262v1 | null |
2023-05-18 | Comparing Biases and the Impact of Multilingual Training across Multiple Languages | Sharon Levy et.al. | 2305.11242v1 | null |
2023-05-18 | LIMA: Less Is More for Alignment | Chunting Zhou et.al. | 2305.11206v1 | null |
2023-05-18 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Peng Wang et.al. | 2305.11172v1 | link |
2023-05-18 | Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository Mining Study | Joel Castaño et.al. | 2305.11164v1 | null |
2023-05-18 | UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild | Can Qin et.al. | 2305.11147v1 | null |
2023-05-18 | mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences | David Uthus et.al. | 2305.11129v1 | null |
2023-05-18 | Generalized Planning in PDDL Domains with Pretrained Large Language Models | Tom Silver et.al. | 2305.11014v1 | link |
2023-05-18 | The Web Can Be Your Oyster for Improving Large Language Models | Junyi Li et.al. | 2305.10998v1 | null |
2023-05-18 | How does the task complexity of masked pretraining objectives affect downstream performance? | Atsuki Yamaguchi et.al. | 2305.10992v1 | link |
2023-05-18 | FLIGHT Mode On: A Feather-Light Network for Low-Light Image Enhancement | Mustafa Ozcan et.al. | 2305.10889v1 | null |
2023-05-18 | VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | Wenjing Wang et.al. | 2305.10874v1 | null |
2023-05-18 | Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning | Wenhao Li et.al. | 2305.10865v1 | null |
2023-05-17 | DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining | Sang Michael Xie et.al. | 2305.10429v1 | null |
2023-05-17 | What You See is What You Read? Improving Text-Image Alignment Evaluation | Michal Yarom et.al. | 2305.10400v1 | link |
2023-05-17 | OpenSLU: A Unified, Modularized, and Extensible Toolkit for Spoken Language Understanding | Libo Qin et.al. | 2305.10231v1 | link |
2023-05-17 | Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks | Alon Jacovi et.al. | 2305.10160v1 | null |
2023-05-17 | Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models | Alvin Heng et.al. | 2305.10120v1 | null |
2023-05-17 | CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision Agriculture | Talha Ilyas et.al. | 2305.10084v1 | null |
2023-05-17 | Dynamic Structural Brain Network Construction by Hierarchical Prototype Embedding GCN using T1-MRI | Yilin Leng et.al. | 2305.10077v1 | null |
2023-05-17 | Equivariant Few-Shot Learning from Pretrained Models | Sourya Basu et.al. | 2305.09900v1 | null |
2023-05-16 | The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation | Mutian He et.al. | 2305.09652v1 | null |
2023-05-16 | Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via Energy-Based Normalizing Flow | Denis Gudovskiy et.al. | 2305.09610v1 | link |
2023-05-16 | An Empirical Study on Google Research Football Multi-agent Scenarios | Yan Song et.al. | 2305.09458v1 | link |
2023-05-16 | Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification | Jiasheng Si et.al. | 2305.09400v1 | null |
2023-05-16 | Deep Ensembling for Perceptual Image Quality Assessment | Nisar Ahmed et.al. | 2305.09141v1 | null |
2023-05-15 | Self-Supervised Pretraining on Paired Sequences of fMRI Data for Transfer Learning to Brain Decoding Tasks | Sean Paulsen et.al. | 2305.09057v1 | null |
2023-05-15 | CLIP-VG: Self-paced Curriculum Adapting of CLIP via Exploiting Pseudo-Language Labels for Visual Grounding | Linhui Xiao et.al. | 2305.08685v1 | null |
2023-05-15 | DarkBERT: A Language Model for the Dark Side of the Internet | Youngjin Jin et.al. | 2305.08596v1 | null |
2023-05-15 | What's the Meaning of Superhuman Performance in Today's NLU? | Simone Tedeschi et.al. | 2305.08414v1 | null |
2023-05-15 | TESS: Text-to-Text Self-Conditioned Simplex Diffusion | Rabeeh Karimi Mahabadi et.al. | 2305.08379v1 | null |
2023-05-15 | "Nothing Abnormal": Disambiguating Medical Reports via Contrastive Knowledge Infusion | Zexue He et.al. | 2305.08300v1 | null |
2023-05-15 | From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models | Shangbin Feng et.al. | 2305.08283v1 | null |
2023-05-14 | FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge | Shangbin Feng et.al. | 2305.08281v1 | null |
2023-05-14 | MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling | Yu Song et.al. | 2305.08264v1 | link |
2023-05-14 | Evaluating the roughness of structure-property relationships using pretrained molecular representations | David E. Graff et.al. | 2305.08238v1 | null |
2023-05-14 | DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | Hendrik Schröter et.al. | 2305.08227v1 | null |
2023-05-12 | Measuring Progress in Fine-grained Vision-and-Language Understanding | Emanuele Bugliarello et.al. | 2305.07558v1 | link |
2023-05-12 | Comprehensive Solution Program Centric Pretraining for Table-and-Text Hybrid Numerical Reasoning | Qianying Liu et.al. | 2305.07475v1 | null |
2023-05-12 | CLIP-Count: Towards Text-Guided Zero-Shot Object Counting | Ruixiang Jiang et.al. | 2305.07304v1 | link |
2023-05-11 | Simple Token-Level Confidence Improves Caption Correctness | Suzanne Petryk et.al. | 2305.07021v1 | null |
2023-05-11 | A General-Purpose Multilingual Document Encoder | Onur Galoğlu et.al. | 2305.07016v1 | link |
2023-05-11 | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | Dahun Kim et.al. | 2305.07011v1 | null |
2023-05-11 | Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks | Eshaan Nichani et.al. | 2305.06986v1 | null |
2023-05-11 | IUST_NLP at SemEval-2023 Task 10: Explainable Detecting Sexism with Transformers and Task-adaptive Pretraining | Hadiseh Mahmoudi et.al. | 2305.06892v1 | null |
2023-05-11 | Extending Audio Masked Autoencoders Toward Audio Restoration | Zhi Zhong et.al. | 2305.06701v1 | null |
2023-05-11 | WeditGAN: Few-shot Image Generation via Latent Space Relocation | Yuxuan Duan et.al. | 2305.06671v1 | null |
2023-05-11 | A First Look at LLM-Powered Generative News Recommendation | Qijiong Liu et.al. | 2305.06566v1 | link |
2023-05-11 | Undercover Deepfakes: Detecting Fake Segments in Videos | Sanjay Saha et.al. | 2305.06564v1 | link |
2023-05-11 | How Good are Commercial Large Language Models on African Languages? | Jessica Ojo et.al. | 2305.06530v1 | null |
2023-05-10 | Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs | Roei Herzig et.al. | 2305.06343v1 | null |
2023-05-10 | XTab: Cross-table Pretraining for Tabular Transformers | Bingzhao Zhu et.al. | 2305.06090v1 | link |
2023-05-10 | A Survey of Deep Code Search | Yutao Xie et.al. | 2305.05959v1 | null |
2023-05-10 | Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection | Juan Hu et.al. | 2305.05943v1 | null |
2023-05-10 | SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds | Qing Li et.al. | 2305.05873v1 | link |
2023-05-10 | Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks | Xianzhi Li et.al. | 2305.05862v1 | null |
2023-05-10 | Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages | Rahul Aralikatte et.al. | 2305.05858v1 | link |
2023-05-09 | Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query | Ho Hin Lee et.al. | 2305.05598v1 | null |
2023-05-09 | Recursions Are All You Need: Towards Efficient Deep Unfolding Networks | Rawwad Alhejaili et.al. | 2305.05505v1 | link |
2023-05-09 | BadCS: A Backdoor Attack Framework for Code search | Shiyi Qi et.al. | 2305.05503v1 | null |
2023-05-09 | Exploiting Pseudo Image Captions for Multimodal Summarization | Chaoya Jiang et.al. | 2305.05496v1 | link |
2023-05-09 | What is the best recipe for character-level encoder-only modelling? | Kris Cao et.al. | 2305.05461v1 | null |
2023-05-09 | MSVQ: Self-Supervised Learning with Multiple Sample Views and Queues | Chen Peng et.al. | 2305.05370v1 | link |
2023-05-09 | A Framework for Designing Foundation Model based Systems | Qinghua Lu et.al. | 2305.05352v1 | null |
2023-05-09 | Application of Artificial Intelligence in the Classification of Microscopical Starch Images for Drug Formulation | Marvellous Ajala et.al. | 2305.05321v1 | null |
2023-05-09 | Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition | Xuandi Fu et.al. | 2305.05271v1 | null |
2023-05-09 | Boosting Visual-Language Models by Exploiting Hard Samples | Haonan Wang et.al. | 2305.05208v1 | null |
2023-05-08 | Toeplitz Neural Network for Sequence Modeling | Zhen Qin et.al. | 2305.04749v1 | link |
2023-05-08 | Enhancing Knowledge Graph Construction Using Large Language Models | Milena Trajanoska et.al. | 2305.04676v1 | null |
2023-05-08 | MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset | Leonhard Hennig et.al. | 2305.04582v1 | link |
2023-05-08 | A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues | Yunxin Li et.al. | 2305.04530v1 | link |
2023-05-08 | SNT: Sharpness-Minimizing Network Transformation for Fast Compression-friendly Pretraining | Jung Hwan Heo et.al. | 2305.04526v1 | null |
2023-05-08 | Retriever and Ranker Framework with Probabilistic Hard Negative Sampling for Code Search | Hande Dong et.al. | 2305.04508v1 | null |
2023-05-08 | Token-level Fitting Issues of Seq2seq Models | Guangsheng Bao et.al. | 2305.04493v1 | null |
2023-05-09 | Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation | Chaoya Jiang et.al. | 2305.04474v2 | null |
2023-05-08 | Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting | Zhicheng Wang et.al. | 2305.04440v1 | null |
2023-05-08 | Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt | Han Zhou et.al. | 2305.04430v1 | link |
2023-05-05 | Otter: A Multi-Modal Model with In-Context Instruction Tuning | Bo Li et.al. | 2305.03726v1 | null |
2023-05-05 | COLA: How to adapt vision-language models to Compose Objects Localized with Attributes? | Arijit Ray et.al. | 2305.03689v1 | null |
2023-05-05 | Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models | Mercy Ranjit et.al. | 2305.03660v1 | null |
2023-05-05 | Data Curation for Image Captioning with Text-to-Image Generative Models | Wenyan Li et.al. | 2305.03610v1 | null |
2023-05-05 | DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation | Hong Chen et.al. | 2305.03374v1 | null |
2023-05-05 | HiPool: Modeling Long Documents Using Graph Neural Networks | Irene Li et.al. | 2305.03319v1 | link |
2023-05-04 | Chain-of-Skills: A Configurable Model for Open-domain Question Answering | Kaixin Ma et.al. | 2305.03130v1 | null |
2023-05-04 | Adversarially-Guided Portrait Matting | Sergej Chicherin et.al. | 2305.02981v1 | link |
2023-05-04 | End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders | Jixuan Wang et.al. | 2305.02937v1 | null |
2023-05-04 | Forward-Forward Contrastive Learning | Md. Atik Ahamed et.al. | 2305.02927v1 | null |
2023-05-04 | DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning | Daniil Homskiy et.al. | 2305.02607v1 | null |
2023-05-04 | How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning | Vittorio Pippi et.al. | 2305.02593v1 | null |
2023-05-03 | Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations | Vasudha Kowtha et.al. | 2305.02382v1 | null |
2023-05-03 | PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives | Silin Gao et.al. | 2305.02364v1 | link |
2023-05-03 | Entity Tracking in Language Models | Najoung Kim et.al. | 2305.02363v1 | null |
2023-05-03 | Real-Time Radiance Fields for Single-Image Portrait View Synthesis | Alex Trevithick et.al. | 2305.02310v1 | null |
2023-05-05 | A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text | Yunxin Li et.al. | 2305.02265v2 | link |
2023-05-03 | Explaining Language Models' Predictions with High-Impact Concepts | Ruochen Zhao et.al. | 2305.02160v1 | null |
2023-05-02 | KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness | Yichuan Li et.al. | 2305.01810v1 | null |
2023-05-02 | Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner | Zhengxiang Shi et.al. | 2305.01711v1 | link |
2023-05-02 | SIA-FTP: A Spoken Instruction Aware Flight Trajectory Prediction Framework | Dongyue Guo et.al. | 2305.01661v1 | null |
2023-05-02 | Unlimiformer: Long-Range Transformers with Unlimited Length Input | Amanda Bertsch et.al. | 2305.01625v1 | link |
2023-05-02 | A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge | Siddhant Arora et.al. | 2305.01620v1 | null |
2023-05-02 | RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models | Dave Van Veen et.al. | 2305.01146v1 | null |
2023-05-01 | Interpreting Pretrained Source-code Models using Neuron Redundancy Analyses | Arushi Sharma et.al. | 2305.00875v1 | null |
2023-04-30 | Transfer of knowledge among instruments in automatic music transcription | Michał Leś et.al. | 2305.00426v1 | null |
2023-04-30 | Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI | Yuheng Li et.al. | 2305.00385v1 | null |
2023-04-29 | LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization | Emmanuel Martinez et.al. | 2305.00132v1 | link |
2023-04-28 | Towards Better Domain Adaptation for Self-supervised Models: A Case Study of Child ASR | Ruchao Fan et.al. | 2305.00115v1 | null |
2023-04-28 | NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis | Mingyang Wang et.al. | 2305.00090v1 | null |
2023-04-28 | Unsupervised Discovery of 3D Hierarchical Structure with Generative Diffusion Features | Nurislam Tursynbek et.al. | 2305.00067v1 | null |
2023-04-28 | CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Michał Turski et.al. | 2304.14953v1 | link |
2023-04-28 | Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain | Annerose Eichel et.al. | 2304.14745v1 | link |
2023-04-28 | DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation | Yousef Yeganeh et.al. | 2304.14571v1 | null |
2023-04-27 | Greybox Penetration Testing on Cloud Access Control with IAM Modeling and Deep Reinforcement Learning | Yang Hu et.al. | 2304.14540v1 | null |
2023-04-27 | Gradient-based Maximally Interfered Retrieval for Domain Incremental 3D Object Detection | Barza Nisar et.al. | 2304.14460v1 | link |
2023-04-27 | We're Afraid Language Models Aren't Modeling Ambiguity | Alisa Liu et.al. | 2304.14399v1 | link |
2023-04-27 | UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment classification in low-resource languages | Egil Rønningstad et.al. | 2304.14189v1 | null |
2023-04-27 | Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Gabriel Tseng et.al. | 2304.14065v1 | link |
2023-04-27 | Retrieval-based Knowledge Augmented Vision Language Pre-training | Jiahua Rao et.al. | 2304.13923v1 | null |
2023-04-27 | Neural Keyphrase Generation: Analysis and Evaluation | Tuhin Kundu et.al. | 2304.13883v1 | null |
2023-04-26 | highway2vec -- representing OpenStreetMap microregions with respect to their road network characteristics | Kacper Leśniara et.al. | 2304.13865v1 | link |
2023-04-26 | A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation | Enrique Dehaerne et.al. | 2304.13840v1 | null |
2023-04-26 | Programmatically Grounded, Compositionally Generalizable Robotic Manipulation | Renhao Wang et.al. | 2304.13826v1 | null |
2023-04-26 | Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models | Haoqiang Kang et.al. | 2304.13803v1 | null |
2023-04-26 | Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation | Lukas Hoyer et.al. | 2304.13615v1 | link |
2023-04-26 | Tissue Classification During Needle Insertion Using Self-Supervised Contrastive Learning and Optical Coherence Tomography | Debayan Bhattacharya et.al. | 2304.13574v1 | null |
2023-04-26 | Self-Supervised Multi-Modal Sequential Recommendation | Kunzhe Song et.al. | 2304.13277v1 | null |
2023-04-25 | Towards Compute-Optimal Transfer Learning | Massimo Caccia et.al. | 2304.13164v1 | null |
2023-04-25 | Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | Giacomo Nebbia et.al. | 2304.13130v1 | null |
2023-04-25 | Pretrain on just structure: Understanding linguistic inductive biases using transfer learning | Isabel Papadimitriou et.al. | 2304.13060v1 | null |
2023-04-25 | On the Generalization of Learned Structured Representations | Andrea Dittadi et.al. | 2304.13001v1 | null |
2023-04-25 | CitePrompt: Using Prompts to Identify Citation Intent in Scientific Papers | Avishek Lahiri et.al. | 2304.12730v1 | link |
2023-04-26 | Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation | Junde Wu et.al. | 2304.12620v2 | null |
2023-04-26 | OFAR: A Multimodal Evidence Retrieval Framework for Illegal Live-streaming Identification | Lin Dengtian et.al. | 2304.12608v2 | null |
2023-04-25 | Model Conversion via Differentially Private Data-Free Distillation | Bochao Liu et.al. | 2304.12528v1 | null |
2023-04-25 | Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning | Zhongzhi Yu et.al. | 2304.12520v1 | null |
2023-04-25 | RenderDiffusion: Text Generation as Image Generation | Junyi Li et.al. | 2304.12519v1 | null |
2023-04-24 | PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques | Mohammed Sabry et.al. | 2304.12410v1 | null |
2023-04-24 | Generative Discovery of Novel Chemical Designs using Diffusion Modeling and Transformer Deep Neural Networks with Application to Deep Eutectic Solvents | Rachel K. Luu et.al. | 2304.12400v1 | null |
2023-04-24 | Uni-QSAR: an Auto-ML Tool for Molecular Property Prediction | Zhifeng Gao et.al. | 2304.12239v1 | null |
2023-04-24 | Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models | Xiangming Gu et.al. | 2304.12082v1 | null |
2023-04-24 | Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning | Yonggan Fu et.al. | 2304.11834v1 | null |
2023-04-22 | Incomplete Multimodal Learning for Remote Sensing Data Fusion | Yuxing Chen et.al. | 2304.11381v1 | null |
2023-04-22 | Single-stage Multi-human Parsing via Point Sets and Center-based Offsets | Jiaming Chu et.al. | 2304.11356v1 | null |
2023-04-22 | Self-supervised Learning by View Synthesis | Shaoteng Liu et.al. | 2304.11330v1 | null |
2023-04-22 | EEE, Remediating the failure of machine learning models via a network-based optimization patch | Ruiyuan Kang et.al. | 2304.11321v1 | null |
2023-04-21 | Factored Neural Representation for Scene Understanding | Yu-Shiang Wong et.al. | 2304.10950v1 | null |
2023-04-24 | Text2Time: Transformer-based Article Time Period Prediction | Karthick Prasad Gunasekaran et.al. | 2304.10859v2 | null |
2023-04-21 | Rethinking Benchmarks for Cross-modal Image-text Retrieval | Weijing Chen et.al. | 2304.10824v1 | link |
2023-04-21 | Deep Multiview Clustering by Contrasting Cluster Assignments | Jie Chen et.al. | 2304.10769v1 | link |
2023-04-20 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Deyao Zhu et.al. | 2304.10592v1 | link |
2023-04-20 | Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Shuyuan Tu et.al. | 2304.10465v1 | link |
2023-04-20 | Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health | Shaoxiong Ji et.al. | 2304.10447v1 | null |
2023-04-20 | Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining | Qin Chao et.al. | 2304.10311v1 | null |
2023-04-20 | OptoGPT: A Foundation Model for Inverse Design in Optical Multilayer Thin Film Structures | Taigao Ma et.al. | 2304.10294v1 | null |
2023-04-20 | PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image | Jianhui Li et.al. | 2304.10263v1 | null |
2023-04-20 | Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages | Verena Blaschke et.al. | 2304.10158v1 | link |
2023-04-19 | DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification | Di Wang et.al. | 2304.09915v1 | link |
2023-04-19 | Domain Adaptable Self-supervised Representation Learning on Remote Sensing Satellite Imagery | Muskaan Chopra et.al. | 2304.09874v1 | link |
2023-04-19 | NetGPT: Generative Pretrained Transformer for Network Traffic | Xuying Meng et.al. | 2304.09513v1 | null |
2023-04-20 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | Simran Arora et.al. | 2304.09433v2 | link |
2023-04-18 | UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining | Hyung Won Chung et.al. | 2304.09151v1 | null |
2023-04-18 | Decoding Neural Activity to Assess Individual Latent State in Ecologically Valid Contexts | Stephen M. Gordon et.al. | 2304.09050v1 | null |
2023-04-18 | Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases | Wentao Zhang et.al. | 2304.09042v1 | null |
2023-04-18 | D2CSE: Difference-aware Deep continuous prompts for Contrastive Sentence Embeddings | Hyunjae Lee et.al. | 2304.08991v1 | null |
2023-04-18 | Deep Collective Knowledge Distillation | Jihyeon Seo et.al. | 2304.08878v1 | null |
2023-04-18 | Romanization-based Large-scale Adaptation of Multilingual Language Models | Sukannya Purkayastha et.al. | 2304.08865v1 | null |
2023-04-19 | Self-Supervised 3D Action Representation Learning with Skeleton Cloud Colorization | Siyuan Yang et.al. | 2304.08799v2 | null |
2023-04-18 | Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services | Minrui Xu et.al. | 2304.08782v1 | null |
2023-04-17 | Delving into Shape-aware Zero-shot Semantic Segmentation | Xinyu Liu et.al. | 2304.08491v1 | link |
2023-04-17 | BenchMD: A Benchmark for Modality-Agnostic Learning on Medical Images and Sensors | Kathryn Wantlin et.al. | 2304.08486v1 | link |
2023-04-18 | Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation | Jie An et.al. | 2304.08477v2 | null |
2023-04-18 | Inverse design of next-generation superconductors using data-driven deep generative models | Daniel Wines et.al. | 2304.08446v2 | null |
2023-04-17 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Sihan Chen et.al. | 2304.08345v1 | link |
2023-04-17 | Human Pose Estimation in Monocular Omnidirectional Top-View Images | Jingrui Yu et.al. | 2304.08186v1 | null |
2023-04-17 | DETRs Beat YOLOs on Real-time Object Detection | Wenyu Lv et.al. | 2304.08069v1 | link |
2023-04-17 | Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture | Taeho Kim Jong-Min Lee et.al. | 2304.08014v1 | null |
2023-04-17 | Learning to "Segment Anything" in Thermal Infrared Images through Knowledge Distillation with a Large Scale Dataset SATIR | Junzhang Chen et.al. | 2304.07969v1 | link |
2023-04-16 | Sabiá: Portuguese Large Language Models | Ramon Pires et.al. | 2304.07880v1 | null |
2023-04-14 | DINOv2: Learning Robust Visual Features without Supervision | Maxime Oquab et.al. | 2304.07193v1 | link |
2023-04-14 | The Second Monocular Depth Estimation Challenge | Jaime Spencer et.al. | 2304.07051v1 | null |
2023-04-14 | MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation | Jie Guo et.al. | 2304.06957v1 | null |
2023-04-14 | Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text | Wanrong Zhu et.al. | 2304.06939v1 | link |
2023-04-14 | 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining | Siming Yan et.al. | 2304.06911v1 | null |
2023-04-14 | Generating Adversarial Examples with Better Transferability via Masking Unimportant Parameters of Surrogate Model | Dingcheng Yang et.al. | 2304.06908v1 | null |
2023-04-14 | Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding | Yu-Qi Yang et.al. | 2304.06906v1 | null |
2023-04-17 | A Contrastive Method Based on Elevation Data for Remote Sensing with Scarce and High Level Semantic Labels | Omar A. Castaño-Idarraga et.al. | 2304.06857v2 | null |
2023-04-13 | Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study | Boxin Wang et.al. | 2304.06762v1 | link |
2023-04-13 | Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction | Hansheng Chen et.al. | 2304.06714v1 | null |
2023-04-13 | Verbs in Action: Improving verb understanding in video-language models | Liliane Momeni et.al. | 2304.06708v1 | null |
2023-04-14 | G2T: A Simple but Effective Framework for Topic Modeling based on Pretrained Language Model and Community Detection | Leihang Zhang et.al. | 2304.06653v2 | null |
2023-04-13 | Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation | Mohit Sharma et.al. | 2304.06600v1 | null |
2023-04-12 | RECLIP: Resource-efficient CLIP by Training with Small Images | Runze Li et.al. | 2304.06028v1 | null |
2023-04-14 | DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion | Johanna Karras et.al. | 2304.06025v2 | null |
2023-04-12 | HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments | Stefan Grushko et.al. | 2304.05826v1 | null |
2023-04-12 | Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance | Robin Schön et.al. | 2304.05716v1 | null |
2023-04-12 | Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning | Nikhil Singh et.al. | 2304.05600v1 | null |
2023-04-11 | A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation | Florian Bordes et.al. | 2304.05369v1 | null |
2023-04-11 | A Billion-scale Foundation Model for Remote Sensing Images | Keumgang Cha et.al. | 2304.05215v1 | null |
2023-04-11 | MRVM-NeRF: Mask-Based Pretraining for Neural Radiance Fields | Ganlin Yang et.al. | 2304.04962v1 | null |
2023-04-11 | Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference | Tao Lei et.al. | 2304.04947v1 | null |
2023-04-10 | Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | Shuhuai Ren et.al. | 2304.04704v1 | link |
2023-04-10 | Transfer Learning for Low-Resource Sentiment Analysis | Razhan Hameed et.al. | 2304.04703v1 | link |
2023-04-10 | Attention at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) | Debashish Roy et.al. | 2304.04610v1 | link |
2023-04-10 | hist2RNA: An efficient deep learning architecture to predict gene expression from breast cancer histopathology images | Raktim Kumar Mondol et.al. | 2304.04507v1 | null |
2023-04-10 | Instance Neural Radiance Field | Benran Hu et.al. | 2304.04395v1 | null |
2023-04-10 | Leveraging Neural Representations for Audio Manipulation | Scott H. Hawley et.al. | 2304.04394v1 | null |
2023-04-10 | Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models | Nikita Starodubcev et.al. | 2304.04344v1 | link |
2023-04-09 | Pretrained Embeddings for E-commerce Machine Learning: When it Fails and Why? | Da Xu et.al. | 2304.04330v1 | null |
2023-04-08 | Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding | Susik Yoon et.al. | 2304.04099v1 | null |
2023-04-08 | WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition | Dávid Šuba et.al. | 2304.04026v1 | link |
2023-04-07 | Zero-shot CT Field-of-view Completion with Unconditional Generative Diffusion Prior | Kaiwen Xu et.al. | 2304.03760v1 | null |
2023-04-10 | Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining | Jian Guan et.al. | 2304.03588v2 | null |
2023-04-10 | Graph Attention for Automated Audio Captioning | Feiyang Xiao et.al. | 2304.03586v2 | link |
2023-04-07 | Language-aware Multiple Datasets Detection Pretraining for DETRs | Jing Hao et.al. | 2304.03580v1 | null |
2023-04-07 | Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 | Hanmeng Liu et.al. | 2304.03439v1 | link |
2023-04-06 | RoSteALS: Robust Steganography using Autoencoder Latent Space | Tu Bui et.al. | 2304.03400v1 | link |
2023-04-06 | Self-Supervised Video Similarity Learning | Giorgos Kordopatis-Zilos et.al. | 2304.03378v1 | link |
2023-04-06 | Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting | Syed Talal Wasim et.al. | 2304.03307v1 | link |
2023-04-06 | Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention | Mingyu Ding et.al. | 2304.03282v1 | link |
2023-04-06 | When do you need Chain-of-Thought Prompting for ChatGPT? | Jiuhai Chen et.al. | 2304.03262v1 | null |
2023-04-06 | Zero-Shot Next-Item Recommendation using Large Pretrained Language Models | Lei Wang et.al. | 2304.03153v1 | null |
2023-04-07 | Geometric-aware Pretraining for Vision-centric 3D Object Detection | Linyan Huang et.al. | 2304.03105v2 | link |
2023-04-06 | Convolutional neural networks for crack detection on flexible road pavements | Hermann Tapamo et.al. | 2304.02933v1 | null |
2023-04-06 | Mask Detection and Classification in Thermal Face Images | Natalia Kowalczyk et.al. | 2304.02931v1 | link |
2023-04-06 | Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce | Yang Jin et.al. | 2304.02853v1 | null |
2023-04-06 | Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification | Thomas Z. Li et.al. | 2304.02836v1 | null |
2023-04-05 | Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks | Md. Tanvir Rouf Shawon et.al. | 2304.02739v1 | null |
2023-04-05 | Exploring the Utility of Self-Supervised Pretraining Strategies for the Detection of Absent Lung Sliding in M-Mode Lung Ultrasound | Blake VanBerlo et.al. | 2304.02724v1 | null |
2023-04-05 | VicTR: Video-conditioned Text Representations for Activity Recognition | Kumara Kahatapitiya et.al. | 2304.02560v1 | null |
2023-04-05 | Deep Perceptual Similarity is Adaptable to Ambiguous Contexts | Gustav Grund Pihlgren et.al. | 2304.02265v1 | null |
2023-04-05 | Towards Efficient Task-Driven Model Reprogramming with Foundation Models | Shoukai Xu et.al. | 2304.02263v1 | null |
2023-04-04 | Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT | Ke Chen et.al. | 2304.02160v1 | null |
2023-04-04 | Optimal operating MR contrast for brain ventricle parcellation | Savannah P. Hays et.al. | 2304.02056v1 | null |
2023-04-04 | Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation | Ethan K. Gordon et.al. | 2304.02052v1 | null |
2023-04-04 | AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation | Jheng-Hong Yang et.al. | 2304.01961v1 | link |
2023-04-04 | Unsupervised Improvement of Factual Knowledge in Language Models | Nafis Sadeq et.al. | 2304.01597v1 | link |
2023-04-03 | Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks | Andrew Halterman et.al. | 2304.01331v1 | link |
2023-04-03 | Burstormer: Burst Image Restoration and Enhancement Transformer | Akshay Dudhane et.al. | 2304.01194v1 | link |
2023-04-03 | ScandEval: A Benchmark for Scandinavian Natural Language Processing | Dan Saattrup Nielsen et.al. | 2304.00906v1 | link |
2023-04-03 | GreekBART: The First Pretrained Greek Sequence-to-Sequence Model | Iakovos Evdaimon et.al. | 2304.00869v1 | null |
2023-04-03 | Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation | Suho Lee et.al. | 2304.00792v1 | link |
2023-04-03 | Multi-Modal Representation Learning with Text-Driven Soft Masks | Jaeyoo Park et.al. | 2304.00719v1 | null |
2023-04-03 | A Post-Training Framework for Improving Heterogeneous Graph Neural Networks | Cheng Yang et.al. | 2304.00698v1 | null |
2023-04-02 | PK-Chat: Pointer Network Guided Knowledge Driven Generative Dialogue Model | Cheng Deng et.al. | 2304.00592v1 | link |
2023-04-02 | DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks | Qiangqiang Wu et.al. | 2304.00571v1 | link |
2023-04-02 | Video Pretraining Advances 3D Deep Learning on Chest CT Tasks | Alexander Ke et.al. | 2304.00546v1 | link |
2023-04-02 | Instance-level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Yuwei Sun et.al. | 2304.00436v1 | null |
2023-03-31 | Procedure-Aware Pretraining for Instructional Video Understanding | Honglu Zhou et.al. | 2303.18230v1 | link |
2023-03-31 | Siamese DETR | Zeren Chen et.al. | 2303.18144v1 | null |
2023-03-31 | INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields | Julia Hindel et.al. | 2303.18101v1 | null |
2023-03-31 | LaCViT: A Label-aware Contrastive Training Framework for Vision Transformers | Zijun Long et.al. | 2303.18013v1 | null |
2023-03-31 | Knowledge Distillation for Feature Extraction in Underwater VSLAM | Jinghe Yang et.al. | 2303.17981v1 | link |
2023-03-31 | Exploring the Limits of Deep Image Clustering using Pretrained Models | Nikolas Adaloglou et.al. | 2303.17896v1 | null |
2023-03-30 | Learning Garment DensePose for Robust Warping in Virtual Try-On | Aiyu Cui et.al. | 2303.17688v1 | null |
2023-03-30 | Whether and When does Endoscopy Domain Pretraining Make Sense? | Dominik Batić et.al. | 2303.17636v1 | null |
2023-03-30 | Anatomically aware dual-hop learning for pulmonary embolism detection in CT pulmonary angiograms | Florin Condrea et.al. | 2303.17593v1 | null |
2023-03-30 | DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder | Chenpng Du et.al. | 2303.17550v1 | null |
2023-03-30 | Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions | Yicheng Luo et.al. | 2303.17396v1 | null |
2023-03-30 | A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision | Lucas Beyer et.al. | 2303.17376v1 | null |
2023-03-30 | PMatch: Paired Masked Image Modeling for Dense Geometric Matching | Shengjie Zhu et.al. | 2303.17342v1 | null |
2023-03-30 | Discriminative Class Tokens for Text-to-Image Diffusion Models | Idan Schwartz et.al. | 2303.17155v1 | null |
2023-03-29 | Transductive few-shot adapters for medical image segmentation | Julio Silva-Rodríguez et.al. | 2303.17051v1 | link |
2023-03-29 | AutoAD: Movie Description in Context | Tengda Han et.al. | 2303.16899v1 | link |
2023-03-29 | Towards Understanding the Effect of Pretraining Label Granularity | Guan Zhe Hong et.al. | 2303.16887v1 | null |
2023-03-28 | Training Language Models with Language Feedback at Scale | Jérémy Scheurer et.al. | 2303.16755v1 | null |
2023-03-29 | Visibility Aware Human-Object Interaction Tracking from Single RGB Camera | Xianghui Xie et.al. | 2303.16479v1 | null |
2023-03-28 | Variational Distribution Learning for Unsupervised Text-to-Image Generation | Minsoo Kang et.al. | 2303.16105v1 | null |
2023-03-28 | Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes | Auke Elfrink et.al. | 2303.15846v1 | null |
2023-03-28 | Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion | Hiromichi Kamata et.al. | 2303.15780v1 | null |
2023-03-28 | SVD-DIP: Overcoming the Overfitting Problem in DIP-based CT Reconstruction | Marco Nittscher et.al. | 2303.15748v1 | link |
2023-03-28 | Large-scale pretraining on pathological images for fine-tuning of small pathological benchmarks | Masataka Kawai et.al. | 2303.15693v1 | null |
2023-03-28 | Pre-training Transformers for Knowledge Graph Completion | Sanxing Chen et.al. | 2303.15682v1 | null |
2023-03-28 | StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | Senmao Li et.al. | 2303.15649v1 | null |
2023-03-27 | Training-free Style Transfer Emerges from h-space in Diffusion models | Jaeseok Jeong et.al. | 2303.15403v1 | null |
2023-03-27 | Generalizable Neural Voxels for Fast Human Radiance Fields | Taoran Yi et.al. | 2303.15387v1 | null |
2023-03-27 | Improving Neural Topic Models with Wasserstein Knowledge Distillation | Suman Adhya et.al. | 2303.15350v1 | link |
2023-03-27 | Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features | Fumiaki Sato et.al. | 2303.15167v1 | null |
2023-03-27 | Parameter Efficient Local Implicit Image Function Network for Face Segmentation | Mausoom Sarkar et.al. | 2303.15122v1 | null |
2023-03-27 | Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record | Christopher McMaster et.al. | 2303.14920v1 | null |
2023-03-27 | Seer: Language Instructed Video Prediction with Latent Diffusion Models | Xianfan Gu et.al. | 2303.14897v1 | null |
2023-03-25 | Indian Language Summarization using Pretrained Sequence-to-Sequence Models | Ashok Urlana et.al. | 2303.14461v1 | null |
2023-03-25 | Sem4SAP: Synonymous Expression Mining From Open Knowledge Graph For Language Model Synonym-Aware Pretraining | Zhouhong Gu et.al. | 2303.14425v1 | null |
2023-03-25 | Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression | Denis Kuznedelev et.al. | 2303.14409v1 | null |
2023-03-27 | Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data | Paul Hager et.al. | 2303.14080v2 | link |
2023-03-24 | Accelerating Vision-Language Pretraining with Free Language Modeling | Teng Wang et.al. | 2303.14038v1 | link |
2023-03-24 | SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization | Yi-Syuan Chen et.al. | 2303.14011v1 | null |
2023-03-24 | Robust Test-Time Adaptation in Dynamic Scenarios | Longhui Yuan et.al. | 2303.13899v1 | link |
2023-03-23 | Three ways to improve feature alignment for open vocabulary detection | Relja Arandjelović et.al. | 2303.13518v1 | null |
2023-03-23 | Ablating Concepts in Text-to-Image Diffusion Models | Nupur Kumari et.al. | 2303.13516v1 | link |
2023-03-23 | A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias | Puja Trivedi et.al. | 2303.13500v1 | null |
2023-03-23 | The effectiveness of MAE pre-pretraining for billion-scale pretraining | Mannat Singh et.al. | 2303.13496v1 | null |
2023-03-23 | Increasing Textual Context Size Boosts Medical Image-Text Matching | Idan Glassberg et.al. | 2303.13340v1 | null |
2023-03-23 | Parameter-Efficient Sparse Retrievers and Rerankers using Adapters | Vaishali Pal et.al. | 2303.13220v1 | link |
2023-03-23 | Retrieval-Augmented Classification with Decoupled Representation | Xinnian Liang et.al. | 2303.13065v1 | link |
2023-03-23 | gDoc: Automatic Generation of Structured API Documentation | Shujun Wang et.al. | 2303.13041v1 | null |
2023-03-23 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Dohwan Ko et.al. | 2303.13009v1 | link |
2023-03-22 | JaCoText: A Pretrained Model for Java Code-Text Generation | Jessica López Espejel et.al. | 2303.12869v1 | null |
2023-03-21 | Affordance Diffusion: Synthesizing Hand-Object Interactions | Yufei Ye et.al. | 2303.12538v1 | null |
2023-03-21 | Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding | Morris Alper et.al. | 2303.12513v1 | link |
2023-03-22 | CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data | Yihan Zeng et.al. | 2303.12417v1 | null |
2023-03-21 | Prompt-MIL: Boosting Multi-Instance Learning Schemes via Task-specific Prompt Tuning | Jingwei Zhang et.al. | 2303.12214v1 | null |
2023-03-21 | Toward Accurate Interpretable Predictions of Materials Properties within Transformer Language Models | Vadim Korolev et.al. | 2303.12188v1 | null |
2023-03-21 | MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation | Vitaliy Kinakh et.al. | 2303.12130v1 | link |
2023-03-21 | Logical Reasoning over Natural Language as Knowledge Representation: A Survey | Zonglin Yang et.al. | 2303.12023v1 | null |
2023-03-21 | A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? | Chaoning Zhang et.al. | 2303.11717v1 | null |
2023-03-21 | Manipulating Transfer Learning for Property Inference | Yulong Tian et.al. | 2303.11643v1 | link |
2023-03-21 | Large AI Models in Health Informatics: Applications, Challenges, and the Future | Jianing Qiu et.al. | 2303.11568v1 | null |
2023-03-20 | eP-ALM: Efficient Perceptual Augmentation of Language Models | Mustafa Shukor et.al. | 2303.11403v1 | link |
2023-03-20 | Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding | Jihao Liu et.al. | 2303.11325v1 | null |
2023-03-20 | Conversation Modeling to Predict Derailment | Jiaqing Yuan et.al. | 2303.11184v1 | null |
2023-03-20 | Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning | Sungnyun Kim et.al. | 2303.11101v1 | null |
2023-03-20 | Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models | René Haas et.al. | 2303.11073v1 | null |
2023-03-20 | Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization | Fida Mohammad Thoker et.al. | 2303.11003v1 | null |
2023-03-20 | EMC2-Net: Joint Equalization and Modulation Classification based on Constellation Network | Hyun Ryu et.al. | 2303.10934v1 | link |
2023-03-20 | Exploring Representation Learning for Small-Footprint Keyword Spotting | Fan Cui et.al. | 2303.10912v1 | null |
2023-03-21 | Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition | Lilang Lin et.al. | 2303.10904v2 | null |
2023-03-20 | Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models | Xinnian Liang et.al. | 2303.10893v1 | null |
2023-03-20 | A Global Model Approach to Robust Few-Shot SAR Automatic Target Recognition | Nathan Inkawhich et.al. | 2303.10800v1 | null |
2023-03-17 | Enhancing the Role of Context in Region-Word Alignment for Object Detection | Kyle Buettner et.al. | 2303.10093v1 | null |
2023-03-17 | DialogPaint: A Dialog-based Image Editing Model | Jingxuan Wei et.al. | 2303.10073v1 | null |
2023-03-17 | Breast Cancer Histopathology Image based Gene Expression Prediction using Spatial Transcriptomics data and Deep Learning | Md Mamunur Rahaman et.al. | 2303.09987v1 | null |
2023-03-17 | Dual-path Adaptation from Image to Video Transformers | Jungin Park et.al. | 2303.09857v1 | link |
2023-03-17 | CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos | Seungju Han et.al. | 2303.09713v1 | null |
2023-03-16 | VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection | Arushi Rai et.al. | 2303.09608v1 | null |
2023-03-16 | DiffIR: Efficient Diffusion Model for Image Restoration | Bin Xia et.al. | 2303.09472v1 | null |
2023-03-16 | Team SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification | Ben Wu et.al. | 2303.09421v1 | null |
2023-03-16 | 3D Masked Autoencoding and Pseudo-labeling for Domain Adaptive Segmentation of Heterogeneous Infant Brain MRI | Xuzhe Zhang et.al. | 2303.09373v1 | null |
2023-03-16 | StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model | Zipeng Xu et.al. | 2303.09268v1 | link |
2023-03-16 | GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning | Jiayi Lin et.al. | 2303.09252v1 | null |
2023-03-16 | Emotional Reaction Intensity Estimation Based on Multimodal Data | Shangfei Wang et.al. | 2303.09167v1 | null |
2023-03-15 | Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting | Yitzchak Shmalo et.al. | 2303.08986v1 | link |
2023-03-15 | Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement | Fartash Faghri et.al. | 2303.08983v1 | null |
2023-03-15 | PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining | Garrett Thomas et.al. | 2303.08789v1 | null |
2023-03-15 | 2D and 3D CNN-Based Fusion Approach for COVID-19 Severity Prediction from 3D CT-Scans | Fares Bougourzi et.al. | 2303.08740v1 | link |
2023-03-15 | Mapping Urban Population Growth from Sentinel-2 MSI and Census Data Using Deep Learning: A Case Study in Kigali, Rwanda | Sebastian Hafner et.al. | 2303.08511v1 | link |
2023-03-15 | Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification | Honglin Li et.al. | 2303.08446v1 | null |
2023-03-15 | Lana: A Language-Capable Navigator for Instruction Following and Generation | Xiaohan Wang et.al. | 2303.08409v1 | link |
2023-03-15 | SegPrompt: Using Segmentation Map as a Better Prompt to Finetune Deep Models for Kidney Stone Classification | Wei Zhu et.al. | 2303.08303v1 | null |
2023-03-14 | Contextualized Medication Information Extraction Using Transformer-based Deep Learning Architectures | Aokun Chen et.al. | 2303.08259v1 | null |
2023-03-14 | Diversity-Aware Meta Visual Prompting | Qidong Huang et.al. | 2303.08138v1 | link |
2023-03-15 | Eliciting Latent Predictions from Transformers with the Tuned Lens | Nora Belrose et.al. | 2303.08112v2 | link |
2023-03-14 | Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection | Jinchao Li et.al. | 2303.08019v1 | null |
2023-03-14 | A Theory of Emergent In-Context Learning as Implicit Structure Induction | Michael Hahn et.al. | 2303.07971v1 | null |
2023-03-14 | Edit-A-Video: Single Video Editing with Object-Aware Consistency | Chaehun Shin et.al. | 2303.07945v1 | null |
2023-03-15 | Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation | Junyoung Seo et.al. | 2303.07937v2 | null |
2023-03-14 | The Learnability of In-Context Learning | Noam Wies et.al. | 2303.07895v1 | null |
2023-03-14 | Geolocation Predicting of Tweets Using BERT-Based Models | Kateryna Lutsai et.al. | 2303.07865v1 | null |
2023-03-14 | Feature representations useful for predicting image memorability | Takumi Harada et.al. | 2303.07679v1 | null |
2023-03-14 | Variation of Gender Biases in Visual Recognition Models Before and After Finetuning | Jaspreet Ranjit et.al. | 2303.07615v1 | null |
2023-03-13 | Model-tuning Via Prompts Makes NLP Models Adversarially Robust | Mrigank Raman et.al. | 2303.07320v1 | null |
2023-03-13 | Vision-Language Models as Success Detectors | Yuqing Du et.al. | 2303.07280v1 | null |
2023-03-13 | InferFix: End-to-End Program Repair with LLMs | Matthew Jin et.al. | 2303.07263v1 | null |
2023-03-13 | PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Weixiong Lin et.al. | 2303.07240v1 | null |
2023-03-13 | AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments | Hao Wen et.al. | 2303.07129v1 | null |
2023-03-13 | Generating multiple-choice questions for medical question answering with distractors and cue-masking | Damien Sileo et.al. | 2303.07069v1 | null |
2023-03-14 | Pretrained ViTs Yield Versatile Representations For Medical Images | Christos Matsoukas et.al. | 2303.07034v2 | link |
2023-03-13 | Self-supervised based general laboratory progress pretrained model for cardiovascular event detection | Li-Chin Chen et.al. | 2303.06980v1 | null |
2023-03-14 | Uni-RXN: A Unified Framework Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation | Bo Qiang et.al. | 2303.06965v2 | link |
2023-03-13 | Contextually-rich human affect perception using multimodal scene information | Digbalay Bose et.al. | 2303.06904v1 | link |
2023-03-10 | Rewarding Chatbots for Real-World Engagement with Millions of Users | Robert Irvine et.al. | 2303.06135v1 | null |
2023-03-13 | Improving Domain-Invariance in Self-Supervised Learning via Batch Styles Standardization | Marin Scalbert et.al. | 2303.06088v2 | null |
2023-03-10 | MVImgNet: A Large-scale Dataset of Multi-view Images | Xianggang Yu et.al. | 2303.06042v1 | null |
2023-03-10 | Marginalia and machine learning: Handwritten text recognition for Marginalia Collections | Adam Axelsson et.al. | 2303.05929v1 | link |
2023-03-10 | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | Luting Wang et.al. | 2303.05892v1 | null |
2023-03-10 | 3D Masked Autoencoders with Application to Anomaly Detection in Non-Contrast Enhanced Breast MRI | Daniel M. Lang et.al. | 2303.05861v1 | null |
2023-03-10 | Contrastive Language-Image Pretrained (CLIP) Models are Powerful Out-of-Distribution Detectors | Felix Michels et.al. | 2303.05828v1 | null |
2023-03-10 | Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation | Ho Hin Lee et.al. | 2303.05785v1 | null |
2023-03-10 | CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Jiangbin Zheng et.al. | 2303.05725v1 | null |
2023-03-10 | MuLTI: Efficient Video-and-Language Understanding with MultiWay-Sampler and Multiple Choice Modeling | Jiaqi Xu et.al. | 2303.05707v1 | null |
2023-03-09 | FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Kazi Injamamul Haque et.al. | 2303.05416v1 | link |
2023-03-09 | Greener yet Powerful: Taming Large Code Generation Models with Quantization | Xiaokai Wei et.al. | 2303.05378v1 | null |
2023-03-09 | Can a Frozen Pretrained Language Model be used for Zero-shot Neural Retrieval on Entity-centric Questions? | Yasuto Hoshi et.al. | 2303.05153v1 | null |
2023-03-08 | Enhancing Low-resolution Face Recognition with Feature Similarity Knowledge Distillation | Sungho Shin et.al. | 2303.04681v1 | null |
2023-03-08 | Aberration-Aware Depth-from-Focus | Xinge Yang et.al. | 2303.04654v1 | null |
2023-03-08 | FastSurf: Fast Neural RGB-D Surface Reconstruction using Per-Frame Intrinsic Refinement and TSDF Fusion Prior Learning | Seunghwan Lee et.al. | 2303.04508v1 | null |
2023-03-08 | Onsets and Velocities: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks | Andres Fernandez et.al. | 2303.04485v1 | link |
2023-03-07 | PSDNet: Determination of Particle Size Distributions Using Synthetic Soil Images and Convolutional Neural Networks | Javad Manashti et.al. | 2303.04269v1 | null |
2023-03-07 | Comparing PSDNet, pretrained networks, and traditional feature extraction for predicting the particle size distribution of granular materials from photographs | Javad Manashti et.al. | 2303.04265v1 | null |
2023-03-09 | Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors | Raz Lapid et.al. | 2303.04238v2 | null |
2023-03-07 | Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models? | Boris Knyazev et.al. | 2303.04143v1 | link |
2023-03-07 | Foundation Models for Decision Making: Problems, Methods, and Opportunities | Sherry Yang et.al. | 2303.04129v1 | null |
2023-03-07 | CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization | Ruochen Zhang et.al. | 2303.04092v1 | null |
2023-03-07 | Larger language models do in-context learning differently | Jerry Wei et.al. | 2303.03846v1 | null |
2023-03-07 | Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding | Jiacheng Li et.al. | 2303.03800v1 | null |
2023-03-07 | Prediction of transonic flow over supercritical airfoils using geometric-encoding and deep-learning strategies | Zhiwen Deng et.al. | 2303.03695v1 | null |
2023-03-07 | AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer | Kang Li et.al. | 2303.03689v1 | null |
2023-03-06 | Structured Kernel Estimation for Photon-Limited Deconvolution | Yash Sanghvi et.al. | 2303.03472v1 | link |
2023-03-06 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | Hritik Bansal et.al. | [2303.03323v1](http://arxiv.org/abs/2303.03 |