GitHub - stoneyang/cv-arxiv-daily: 🎓 Automatically Update CV Papers Daily using GitHub Actions (Update Every 24 Hours)

Updated on 2024.07.16

Table of Contents

pretrain
downstream
adaptor
object detection

pretrain

Publish Date	Title	Authors	PDF	Code
2024-07-12	MUSCLE: A Model Update Strategy for Compatible LLM Evolution	Jessica Echterhoff et.al.	2407.09435v1	null
2024-07-12	Transformer Layers as Painters	Qi Sun et.al.	2407.09298v1	null
2024-07-12	Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion	Linhan Xia et.al.	2407.09157v1	null
2024-07-12	Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control	Huayu Chen et.al.	2407.09024v1	null
2024-07-12	Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT	Jie Zheng et.al.	2407.08961v1	null
2024-07-12	Symmetry Awareness Encoded Deep Learning Framework for Brain Imaging Analysis	Yang Ma et.al.	2407.08948v1	link
2024-07-11	Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing	Huanqian Wang et.al.	2407.08770v1	link
2024-07-11	Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data	Cherie Ho et.al.	2407.08726v1	null
2024-07-11	A Taxonomy for Data Contamination in Large Language Models	Medha Palavalli et.al.	2407.08716v1	null
2024-07-11	Mitigating Catastrophic Forgetting in Language Transfer via Model Merging	Anton Alexandrov et.al.	2407.08699v1	null
2024-07-11	Jet Tagging with More-Interaction Particle Transformer	Yifan Wu et.al.	2407.08682v1	null
2024-07-11	Emergent Visual-Semantic Hierarchies in Image-Text Representations	Morris Alper et.al.	2407.08521v1	null
2024-07-11	Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization	Jinlong Li et.al.	2407.08374v1	null
2024-07-11	E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors	Jinxiu Liang et.al.	2407.08231v1	null
2024-07-11	Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People	Zain Merchant et.al.	2407.08219v1	null
2024-07-10	Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models	Yuji Zhang et.al.	2407.08039v1	null
2024-07-10	Training on the Test Task Confounds Evaluation and Emergence	Ricardo Dominguez-Olmedo et.al.	2407.07890v1	link
2024-07-10	Learning Spatial-Semantic Features for Robust Video Object Segmentation	Xin Li et.al.	2407.07760v1	null
2024-07-10	VEnhancer: Generative Space-Time Enhancement for Video Generation	Jingwen He et.al.	2407.07667v1	null
2024-07-10	Machine Unlearning for Medical Imaging	Reza Nasirigerdeh et.al.	2407.07539v1	null
2024-07-10	IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection	Mingjin Zhang et.al.	2407.07520v1	link
2024-07-10	Bucket Pre-training is All You Need	Hongtao Liu et.al.	2407.07495v1	null
2024-07-10	Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining	Tianfang Sun et.al.	2407.07465v1	null
2024-07-10	Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification	Zhenyu Kuang et.al.	2407.07351v1	null
2024-07-10	Micro-Expression Recognition by Motion Feature Extraction based on Pre-training	Ruolin Li et.al.	2407.07345v1	null
2024-07-10	ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting	Luoxiao Yang et.al.	2407.07311v1	link
2024-07-09	FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation	Liqun Ma et.al.	2407.07093v1	link
2024-07-09	ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction	Shaozhe Hao et.al.	2407.07077v1	link
2024-07-09	CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM	Aditya Murali et.al.	2407.06795v1	null
2024-07-09	CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection	Shuang Hao et.al.	2407.06780v1	link
2024-07-09	Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions	Wenxin Zhou et.al.	2407.06779v1	null
2024-07-09	Pretraining-finetuning Framework for Efficient Co-design: A Case Study on Quadruped Robot Parkour	Ci Chen et.al.	2407.06770v1	null
2024-07-09	PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations	Zhanhong Ye et.al.	2407.06664v1	null
2024-07-09	Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition	Mingfang Zhang et.al.	2407.06628v1	null
2024-07-09	F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection	Chengyu Tao et.al.	2407.06519v1	null
2024-07-09	A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models	Gabriele Campanella et.al.	2407.06508v1	null
2024-07-08	4D Contrastive Superflows are Dense 3D Representation Learners	Xiang Xu et.al.	2407.06190v1	link
2024-07-08	Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design	Boshen Zeng et.al.	2407.06152v1	null
2024-07-08	3D Vision and Language Pretraining with Large-Scale Synthetic Data	Dejie Yang et.al.	2407.06084v1	link
2024-07-08	From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty	Maor Ivgi et.al.	2407.06071v1	link
2024-07-08	MST5 -- Multilingual Question Answering over Knowledge Graphs	Nikit Srivastava et.al.	2407.06041v1	link
2024-07-08	Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian	Tommaso Mario Buonocore et.al.	2407.06011v1	null
2024-07-08	Pseudo-triplet Guided Few-shot Composed Image Retrieval	Bohan Hou et.al.	2407.06001v1	null
2024-07-08	Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals	Moritz Reuss et.al.	2407.05996v1	null
2024-07-08	Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning	Bin Ren et.al.	2407.05862v1	null
2024-07-08	An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models	Nandini Mundra et.al.	2407.05841v1	link
2024-07-05	Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs	Rudolf Laine et.al.	2407.04694v1	null
2024-07-05	Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units	Bolaji Yusuf et.al.	2407.04652v1	link
2024-07-05	Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect	Salima Mdhaffar et.al.	2407.04533v1	null
2024-07-05	Few-Shot Airway-Tree Modeling using Data-Driven Sparse Priors	Ali Keshavarzi et.al.	2407.04507v1	null
2024-07-05	Using LLMs to label medical papers according to the CIViC evidence model	Markus Hisch et.al.	2407.04466v1	null
2024-07-05	Generalists vs. Specialists: Evaluating Large Language Models for Urdu	Samee Arif et.al.	2407.04459v1	null
2024-07-05	Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning	Saeed Shurrab et.al.	2407.04449v1	link
2024-07-05	XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models	Shashi Kumar et.al.	2407.04439v1	null
2024-07-05	Understanding the Role of Invariance in Transfer Learning	Till Speicher et.al.	2407.04325v1	link
2024-07-05	Smart Vision-Language Reasoners	Denisa Roberts et.al.	2407.04212v1	link
2024-07-03	STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data	Kheir Eddine Daouadi et.al.	2407.03253v1	null
2024-07-03	CATT: Character-based Arabic Tashkeel Transformer	Faris Alasmary et.al.	2407.03236v1	link
2024-07-03	SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding	Weitai Kang et.al.	2407.03200v1	link
2024-07-03	On the Client Preference of LLM Fine-tuning in Federated Learning	Feijie Wu et.al.	2407.03038v1	null
2024-07-03	Strategies for Arabic Readability Modeling	Juan Piñeros Liberato et.al.	2407.03032v1	null
2024-07-03	Exploiting Dialect Identification in Automatic Dialectal Text Normalization	Bashar Alhafni et.al.	2407.03020v1	null
2024-07-03	Large language models, physics-based modeling, experimental measurements: the trinity of data-scarce learning of polymer properties	Ning Liu et.al.	2407.02770v1	null
2024-07-02	Magic Insert: Style-Aware Drag-and-Drop	Nataniel Ruiz et.al.	2407.02489v1	null
2024-07-02	GCF: Graph Convolutional Networks for Facial Expression Recognition	Hozaifa Kassab et.al.	2407.02361v1	null
2024-07-02	Parameter-Selective Continual Test-Time Adaptation	Jiaxu Tian et.al.	2407.02253v1	null
2024-07-02	MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations	Akash Dutta et.al.	2407.02238v1	null
2024-07-02	Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale	Wenzhen Zheng et.al.	2407.02118v1	null
2024-07-02	DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection	Kaixin Xu et.al.	2407.02098v1	null
2024-07-02	ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation	Zhiyuan Ma et.al.	2407.02040v1	link
2024-07-02	Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning	Chengchao Shen et.al.	2407.02014v1	link
2024-07-02	Unleash the Power of Local Representations for Few-Shot Classification	Shi Tang et.al.	2407.01967v1	null
2024-07-02	Text-Aware Diffusion for Policy Learning	Calvin Luo et.al.	2407.01903v1	null
2024-06-28	Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs	Sukmin Yun et.al.	2406.20098v1	link
2024-06-28	LLaRA: Supercharging Robot Learning Data for Vision-Language Policy	Xiang Li et.al.	2406.20095v1	link
2024-06-28	BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5	Zhehuai Chen et.al.	2406.19954v1	null
2024-06-28	Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment	Orgest Xhelili et.al.	2406.19759v1	null
2024-06-28	Deep Fusion Model for Brain Tumor Classification Using Fine-Grained Gradient Preservation	Niful Islam et.al.	2406.19690v1	null
2024-06-28	PopAlign: Population-Level Alignment for Fair Text-to-Image Generation	Shufan Li et.al.	2406.19668v1	link
2024-06-27	Subtractive Training for Music Stem Insertion using Latent Diffusion Models	Ivan Villa-Renteria et.al.	2406.19328v1	null
2024-06-27	SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images	Ming Chen et.al.	2406.19055v1	link
2024-06-27	Fine-tuned network relies on generic representation to solve unseen cognitive task	Dongyan Lin et.al.	2406.18926v1	null
2024-06-27	Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets	Melanie Walsh et.al.	2406.18906v1	null
2024-06-27	Learning Modality Knowledge Alignment for Cross-Modality Transfer	Wenxuan Ma et.al.	2406.18864v1	null
2024-06-27	LICO: Large Language Models for In-Context Molecular Optimization	Tung Nguyen et.al.	2406.18851v1	null
2024-06-26	Learn it or Leave it: Module Composition and Pruning for Continual Learning	Mingyang Wang et.al.	2406.18708v1	null
2024-06-26	Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer	Liming Wang et.al.	2406.18625v1	null
2024-06-26	Mental Modeling of Reinforcement Learning Agents by Language Models	Wenhao Lu et.al.	2406.18505v1	null
2024-06-26	Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference	Yuan Gao et.al.	2406.18453v1	link
2024-06-27	Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs	Lei Zhang et.al.	2406.18294v2	link
2024-06-26	Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks	I. R. Slootweg et.al.	2406.18247v1	null
2024-06-26	3D-MVP: 3D Multiview Pretraining for Robotic Manipulation	Shengyi Qian et.al.	2406.18158v1	null
2024-06-26	Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps	Dicong Qiu et.al.	2406.18115v1	null
2024-06-26	The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval	Meinardus Boris et.al.	2406.18113v1	link
2024-06-26	Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints	Ran Song et.al.	2406.18085v1	link
2024-06-26	Few-Shot Medical Image Segmentation with High-Fidelity Prototypes	Song Tang et.al.	2406.18074v1	link
2024-06-27	EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation	Baoqi Pei et.al.	2406.18070v2	null
2024-06-25	Data curation via joint example selection further accelerates multimodal learning	Talfan Evans et.al.	2406.17711v1	null
2024-06-25	This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach	Lukas Christ et.al.	2406.17667v1	null
2024-06-25	Transformer-based segmentation of adnexal lesions and ovarian implants in CT images	Aneesh Rangnekar et.al.	2406.17666v1	null
2024-06-25	Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients	Aashiq Muhamed et.al.	2406.17660v1	link
2024-06-26	Minimal Interaction Edge Tuning: A New Paradigm for Visual Adaptation	Ningyuan Tang et.al.	2406.17559v2	null
2024-06-25	The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale	Guilherme Penedo et.al.	2406.17557v1	null
2024-06-25	Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification	Huiyao Chen et.al.	2406.17534v1	null
2024-06-25	Towards Federated Low-Rank Adaptation with Rank-Heterogeneous Communication	Yuji Byun et.al.	2406.17477v1	null
2024-06-25	Investigating Self-Supervised Methods for Label-Efficient Learning	Srinivasa Rao Nandam et.al.	2406.17460v1	null
2024-06-25	Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance	Manon Reusens et.al.	2406.17385v1	null
2024-06-24	Dreamitate: Real-World Visuomotor Policy Learning via Video Generation	Junbang Liang et.al.	2406.16862v1	null
2024-06-24	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758v1	null
2024-06-24	Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling	Min-Seop Kwak et.al.	2406.16695v1	null
2024-06-24	Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation	Markus Frohmann et.al.	2406.16678v1	null
2024-06-24	CAVE: Controllable Authorship Verification Explanations	Sahana Ramnath et.al.	2406.16672v1	link
2024-06-24	DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution	Aiwen Jiang et.al.	2406.16477v1	null
2024-06-24	Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation	Yuchen Yang et.al.	2406.16282v1	link
2024-06-24	Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation	Xueyu Liu et.al.	2406.16271v1	null
2024-06-23	Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking	Yuwei Zhang et.al.	2406.16148v1	link
2024-06-23	Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models	Lynn Chua et.al.	2406.16135v1	link
2024-06-21	Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning	Brandon Huang et.al.	2406.15334v1	null
2024-06-21	GiusBERTo: A Legal Language Model for Personal Data De-identification in Italian Court of Auditors Decisions	Giulio Salierno et.al.	2406.15032v1	null
2024-06-21	Uni-Mol2: Exploring Molecular Pretraining Model at Scale	Xiaohong Ji et.al.	2406.14969v1	null
2024-06-21	ICLEval: Evaluating In-Context Learning Ability of Large Language Models	Wentong Chen et.al.	2406.14955v1	link
2024-06-21	70B-parameter large language models in Japanese medical question-answering	Issey Sukeda et.al.	2406.14882v1	null
2024-06-20	Understanding Finetuning for Factual Knowledge Extraction	Gaurav Ghosal et.al.	2406.14785v1	null
2024-06-20	Factual Dialogue Summarization via Learning from Large Language Models	Rongxin Zhu et.al.	2406.14709v1	null
2024-06-20	Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities	Sachit Menon et.al.	2406.14562v1	null
2024-06-20	V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data	Rotem Shalev-Arkushin et.al.	2406.14510v1	null
2024-06-20	Data-Centric AI in the Age of Large Language Models	Xinyi Xu et.al.	2406.14473v1	null
2024-06-20	Decoding Vocal Articulations from Acoustic Latent Representations	Mateo Cámara et.al.	2406.14379v1	null
2024-06-20	Infusing clinical knowledge into tokenisers for language models	Abul Hasan et.al.	2406.14312v1	null
2024-06-20	On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?	Rochelle Choenni et.al.	2406.14267v1	null
2024-06-20	Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs	Michail Chatzianastasis et.al.	2406.14142v1	null
2024-06-20	Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation	Yanwei Zheng et.al.	2406.14103v1	null
2024-06-20	Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models	Dohyun Lee et.al.	2406.14091v1	null
2024-06-20	Information Guided Regularization for Fine-tuning Language Models	Mandar Sharma et.al.	2406.14005v1	link
2024-06-18	GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping	Angel Daruna et.al.	2406.12756v1	null
2024-06-18	BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity	Zahra Gharaee et.al.	2406.12723v1	link
2024-06-18	GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models	Yongtao Ge et.al.	2406.12671v1	link
2024-06-18	News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation	Andreea Iana et.al.	2406.12634v1	link
2024-06-18	From Instance Training to Instruction Learning: Task Adapters Generation from Instructions	Huanxuan Liao et.al.	2406.12382v1	null
2024-06-18	Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models	Minseok Choi et.al.	2406.12354v1	null
2024-06-18	JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning	Boyu Chen et.al.	2406.12292v1	null
2024-06-18	VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation	Yu-hsuan Chen et.al.	2406.12286v1	null
2024-06-18	LLMs Are Prone to Fallacies in Causal Inference	Nitish Joshi et.al.	2406.12158v1	null
2024-06-17	Efficient Sequential Decision Making with Large Language Models	Dingyang Chen et.al.	2406.12125v1	null
2024-06-17	Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations	Kazusato Oko et.al.	2406.11828v1	null
2024-06-17	How Do Large Language Models Acquire Factual Knowledge During Pretraining?	Hoyeon Chang et.al.	2406.11813v1	null
2024-06-17	DataComp-LM: In search of the next generation of training sets for language models	Jeffrey Li et.al.	2406.11794v1	null
2024-06-17	A Brief Survey on Leveraging Large Scale Vision Models for Enhanced Robot Grasping	Abhi Kamboj et.al.	2406.11786v1	null
2024-06-17	Input Conditioned Graph Generation for Language Agents	Lukas Vierling et.al.	2406.11555v1	link
2024-06-17	BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM	Zhewen Shen et.al.	2406.11418v1	null
2024-06-17	CodeGemma: Open Code Models Based on Gemma	CodeGemma Team et.al.	2406.11409v1	null
2024-06-17	Preserving Knowledge in Large Language Model: A Model-Agnostic Self-Decompression Approach	Zilun Zhang et.al.	2406.11354v1	null
2024-06-18	BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models	Xuefeng Hu et.al.	2406.11309v2	null
2024-06-17	MiniConGTS: A Near Ultimate Minimalist Contrastive Grid Tagging Scheme for Aspect Sentiment Triplet Extraction	Qiao Sun et.al.	2406.11234v1	null
2024-06-14	Quantifying Variance in Evaluation Benchmarks	Lovish Madaan et.al.	2406.10229v1	null
2024-06-14	PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting	Alex Hanson et.al.	2406.10219v1	null
2024-06-14	AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators	Jaden Pieper et.al.	2406.10205v1	null
2024-06-14	Improving rule mining via embedding-based link prediction	N'Dah Jean Kouagou et.al.	2406.10144v1	link
2024-06-14	Training-free Camera Control for Video Generation	Chen Hou et.al.	2406.10126v1	null
2024-06-14	Intepretative Deep Learning using Domain Adaptation for Fluorescence Spectroscopy	Umberto Michelucci et.al.	2406.10031v1	null
2024-06-14	Group and Shuffle: Efficient Structured Orthogonal Parametrization	Mikhail Gorbunov et.al.	2406.10019v1	null
2024-06-14	OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control	Yuzhong Huang et.al.	2406.10000v1	null
2024-06-14	TabularFM: An Open Framework For Tabular Foundational Models	Quan M. Tran et.al.	2406.09837v1	null
2024-06-14	HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning	Heejun Lee et.al.	2406.09827v1	null
2024-06-13	Explore the Limits of Omni-modal Pretraining at Scale	Yiyuan Zhang et.al.	2406.09412v1	link
2024-06-13	Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models	Lukas Thede et.al.	2406.09384v1	null
2024-06-13	Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations	Rylan Schaeffer et.al.	2406.09366v1	null
2024-06-13	End-to-end Streaming model for Low-Latency Speech Anonymization	Waris Quamer et.al.	2406.09277v1	null
2024-06-13	OpenVLA: An Open-Source Vision-Language-Action Model	Moo Jin Kim et.al.	2406.09246v1	null
2024-06-13	Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't	Chihiro Taguchi et.al.	2406.09202v1	null
2024-06-13	SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution	Soufiane Belharbi et.al.	2406.09168v1	link
2024-06-13	MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning	Hanqing Wang et.al.	2406.09044v1	null
2024-06-13	Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation	Lincan Cai et.al.	2406.09003v1	null
2024-06-13	Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning	Arnav Goel et.al.	2406.08931v1	link
2024-06-12	On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models	Hashmat Shadab Malik et.al.	2406.08486v1	link
2024-06-12	Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens	Ting-Ji Huang et.al.	2406.08477v1	null
2024-06-12	Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models	Yuxuan Xue et.al.	2406.08475v1	null
2024-06-12	Strategies for Pretraining Neural Operators	Anthony Zhou et.al.	2406.08473v1	link
2024-06-12	PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences	Daiwei Chen et.al.	2406.08469v1	null
2024-06-12	The Impact of Initialization on LoRA Finetuning Dynamics	Soufiane Hayou et.al.	2406.08447v1	null
2024-06-12	State Soup: In-Context Skill Learning, Retrieval and Mixing	Maciej Pióro et.al.	2406.08423v1	null
2024-06-12	WMAdapter: Adding WaterMark Control to Latent Diffusion Models	Hai Ci et.al.	2406.08337v1	null
2024-06-12	Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation	Tsun-An Hsieh et.al.	2406.08328v1	null
2024-06-12	Is Programming by Example solved by LLMs?	Wen-Ding Li et.al.	2406.08316v1	null
2024-06-11	Autoregressive Pretraining with Mamba in Vision	Sucheng Ren et.al.	2406.07537v1	null
2024-06-11	CTC-based Non-autoregressive Textless Speech-to-Speech Translation	Qingkai Fang et.al.	2406.07330v1	link
2024-06-11	Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?	Qingkai Fang et.al.	2406.07289v1	null
2024-06-11	ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks	Xin Jing et.al.	2406.07203v1	null
2024-06-11	Translating speech with just images	Dan Oneata et.al.	2406.07133v1	null
2024-06-11	Reading Miscue Detection in Primary School through Automatic Speech Recognition	Lingyun Gao et.al.	2406.07060v1	null
2024-06-11	Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models	Sooyeon Go et.al.	2406.07008v1	null
2024-06-11	UVIS: Unsupervised Video Instance Segmentation	Shuaiyi Huang et.al.	2406.06908v1	null
2024-06-10	BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification	June-Woo Kim et.al.	2406.06786v1	null
2024-06-10	Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network	Manvik Pasula et.al.	2406.06703v1	null
2024-06-10	Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation	Oishi Banerjee et.al.	2406.06496v1	null
2024-06-10	AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction	Zhen Xing et.al.	2406.06465v1	null
2024-06-10	Foundation Inference Models for Markov Jump Processes	David Berghaus et.al.	2406.06419v1	null
2024-06-10	Meta Learning Text-to-Speech Synthesis in over 7000 Languages	Florian Lux et.al.	2406.06403v1	link
2024-06-10	Towards Lifelong Learning of Large Language Models: A Survey	Junhao Zheng et.al.	2406.06391v1	link
2024-06-10	Low-Rank Quantization-Aware Training for LLMs	Yelysei Bondarenko et.al.	2406.06385v1	null
2024-06-10	Tx-LLM: A Large Language Model for Therapeutics	Juan Manuel Zambrano Chaves et.al.	2406.06316v1	null
2024-06-10	iMotion-LLM: Motion Prediction Instruction Tuning	Abdulwahab Felemban et.al.	2406.06211v1	null
2024-06-10	DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection	Donggeun Ko et.al.	2406.06134v1	null
2024-06-10	EXPIL: Explanatory Predicate Invention for Learning in Games	Jingyuan Sha et.al.	2406.06107v1	null
2024-06-07	Hibou: A Family of Foundational Vision Transformers for Pathology	Dmitry Nechaev et.al.	2406.05074v1	null
2024-06-07	Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning	Subhojyoti Mukherjee et.al.	2406.05064v1	null
2024-06-07	Scenarios and Approaches for Situated Natural Language Explanations	Pengshuo Qiu et.al.	2406.05035v1	null
2024-06-07	Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment	Venkanna Babu Guthula et.al.	2406.04949v1	null
2024-06-07	Stochastic full waveform inversion with deep generative prior for uncertainty quantification	Yuke Xie et.al.	2406.04859v1	null
2024-06-07	Uncertainty Aware Learning for Language Model Alignment	Yikun Wang et.al.	2406.04854v1	null
2024-06-07	Predicting Polymer Properties Based on Multimodal Multitask Pretraining	Fanmeng Wang et.al.	2406.04727v1	null
2024-06-07	Evaluating and Mitigating IP Infringement in Visual Generative AI	Zhenting Wang et.al.	2406.04662v1	link
2024-06-07	STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting	Zenghao Chai et.al.	2406.04629v1	link
2024-06-07	Camera-Pose Robust Crater Detection from Chang'e 5	Matthew Rodda et.al.	2406.04569v1	null
2024-06-06	Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment	Jiayi Guo et.al.	2406.04295v1	link
2024-06-06	Solving Inverse Problems in Protein Space Using Diffusion-Based Priors	Axel Levy et.al.	2406.04239v1	null
2024-06-06	Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness	Guangliang Liu et.al.	2406.04146v1	null
2024-06-06	UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood Mapping	Jie Zhao et.al.	2406.04111v1	null
2024-06-06	Weight-based Decomposition: A Case for Bilinear MLPs	Michael T. Pearce et.al.	2406.03947v1	null
2024-06-06	BLSP-Emo: Towards Empathetic Large Speech-Language Models	Chen Wang et.al.	2406.03872v1	link
2024-06-06	MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition	Stefan Gerd Fritsch et.al.	2406.03857v1	null
2024-06-07	Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge	Nan Zhang et.al.	2406.03799v2	link
2024-06-06	Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining	Jinlong Xue et.al.	2406.03714v1	null
2024-06-06	Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model	Jinlong Xue et.al.	2406.03706v1	null
2024-06-05	Does your data spark joy? Performance gains from domain upsampling at the end of training	Cody Blakeney et.al.	2406.03476v1	null
2024-06-05	LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection	Qiang Chen et.al.	2406.03459v1	link
2024-06-05	FILS: Self-Supervised Video Feature Prediction In Semantic Language Space	Mona Ahmadian et.al.	2406.03447v1	null
2024-06-05	Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input	Joachim Ott et.al.	2406.03439v1	null
2024-06-05	SuperFormer: Volumetric Transformer Architectures for MRI Super-Resolution	Cristhian Forigua et.al.	2406.03359v1	link
2024-06-05	Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need	Martin Wistuba et.al.	2406.03216v1	null
2024-06-05	Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models	Jerry Yao-Chieh Hu et.al.	2406.03136v1	null
2024-06-05	DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays	Bo Xia et.al.	2406.03102v1	null
2024-06-05	Population Transformer: Learning Population-level Representations of Intracranial Activity	Geeling Chau et.al.	2406.03044v1	null
2024-06-05	GraphAlign: Pretraining One Graph Neural Network on Multiple Graphs via Feature Alignment	Zhenyu Hou et.al.	2406.02953v1	null
2024-06-04	Landscape-Aware Growing: The Power of a Little LAG	Stefani Karp et.al.	2406.02469v1	null
2024-06-04	An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders	Scott C. Lowe et.al.	2406.02465v1	link
2024-06-04	CADE: Cosine Annealing Differential Evolution for Spiking Neural Network	Runhua Jiang et.al.	2406.02349v1	link
2024-06-04	Probing the Category of Verbal Aspect in Transformer Language Models	Anisia Katinskaia et.al.	2406.02335v1	null
2024-06-04	SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining	Andi Han et.al.	2406.02214v1	link
2024-06-04	Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations	Sarthak Yadav et.al.	2406.02178v1	null
2024-06-05	Multimodal Reasoning with Multimodal Knowledge Graph	Junlin Lee et.al.	2406.02030v2	null
2024-06-04	Zyda: A 1.3T Dataset for Open Language Modeling	Yury Tokpanov et.al.	2406.01981v1	null
2024-06-04	GOMAA-Geo: GOal Modality Agnostic Active Geo-localization	Anindya Sarkar et.al.	2406.01917v1	null
2024-06-04	ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization	Chen Mao et.al.	2406.01906v1	link
2024-05-31	Code Pretraining Improves Entity Tracking Abilities of Language Models	Najoung Kim et.al.	2405.21068v1	null
2024-05-31	Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models	Xinxi Zhang et.al.	2405.21050v1	null
2024-05-31	Improving Reward Models with Synthetic Critiques	Zihuiwen Ye et.al.	2405.20850v1	null
2024-05-31	Conditioning GAN Without Training Dataset	Kidist Amde Mekonnen et.al.	2405.20687v1	link
2024-05-31	Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization	Richard Luo et.al.	2405.20648v1	null
2024-05-30	Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models	Zachary Ankner et.al.	2405.20541v1	null
2024-05-30	Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning	Xinlu Zhang et.al.	2405.20535v1	null
2024-05-30	Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining	Yi Wang et.al.	2405.20462v1	null
2024-05-30	Scalable Detection of Salient Entities in News Articles	Eliyar Asgarieh et.al.	2405.20461v1	null
2024-05-30	Enhancing Antibiotic Stewardship using a Natural Language Approach for Better Feature Representation	Simon A. Lee et.al.	2405.20419v1	null
2024-05-31	KerasCV and KerasNLP: Vision and Language Power-Ups	Matthew Watson et.al.	2405.20247v2	null
2024-05-30	Jina CLIP: Your CLIP Model Is Also Your Text Retriever	Andreas Koukounas et.al.	2405.20204v1	null
2024-05-30	Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks	Xiaoyu Wu et.al.	2405.19931v1	null
2024-05-30	From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems	Jianliang He et.al.	2405.19883v1	null
2024-05-30	Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian	Wei Sun et.al.	2405.19657v1	null
2024-05-29	CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning	Yiping Wang et.al.	2405.19547v1	null
2024-05-29	Posterior Sampling via Autoregressive Generation	Kelly W Zhang et.al.	2405.19466v1	null
2024-05-29	Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice	Jian-Qiao Zhu et.al.	2405.19313v1	null
2024-05-29	Poseidon: Efficient Foundation Models for PDEs	Maximilian Herde et.al.	2405.19101v1	link
2024-05-29	BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation	Chen Wang et.al.	2405.19041v1	null
2024-05-29	Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization	Zhiwei Tang et.al.	2405.18881v1	null
2024-05-29	Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts	Ruipeng Zhang et.al.	2405.18861v1	link
2024-05-29	LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping	Nikhil Gosala et.al.	2405.18852v1	null
2024-05-29	LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image Registration	Mingrui Ma et.al.	2405.18774v1	null
2024-05-29	Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation	Jiawei Fu et.al.	2405.18757v1	null
2024-05-29	To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability	Joonhyung Lee et.al.	2405.18710v1	null
2024-05-29	Rejection via Learning Density Ratios	Alexander Soen et.al.	2405.18686v1	null
2024-05-28	WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization	Jiawei Ma et.al.	2405.18405v1	null
2024-05-28	Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning	Yixiao Zhang et.al.	2405.18386v1	link
2024-05-28	Computing hydration free energies of small molecules with first principles accuracy	J. Harry Moore et.al.	2405.18171v1	null
2024-05-28	Time Series Representation Models	Robert Leppich et.al.	2405.18165v1	link
2024-05-28	An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates	Albin Soutif-Cormerais et.al.	2405.18069v1	null
2024-05-28	Visualizing the loss landscape of Self-supervised Vision Transformer	Youngwan Lee et.al.	2405.18042v1	null
2024-05-28	fMRI predictors based on language models of increasing complexity recover brain left lateralization	Laurent Bonnasse-Gahot et.al.	2405.17992v1	null
2024-05-28	Cross-Context Backdoor Attacks against Graph Prompt Learning	Xiaoting Lyu et.al.	2405.17984v1	null
2024-05-28	Knowledge Circuits in Pretrained Transformers	Yunzhi Yao et.al.	2405.17969v1	link
2024-05-28	Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment	Keming Lu et.al.	2405.17931v1	null
2024-05-27	Privacy-Aware Visual Language Models	Laurens Samson et.al.	2405.17423v1	null
2024-05-28	Controllable Longer Image Animation with Diffusion Models	Qiang Wang et.al.	2405.17306v2	null
2024-05-27	Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling	Cristian Rodriguez-Opazo et.al.	2405.17139v1	null
2024-05-27	Position: Foundation Agents as the Paradigm Shift for Decision Making	Xiaoqian Liu et.al.	2405.17009v1	null
2024-05-27	Vision-and-Language Navigation Generative Pretrained Transformer	Wen Hanlin et.al.	2405.16994v1	null
2024-05-27	Exploring the LLM Journey from Cognition to Expression with Linear Representations	Yuzi Yan et.al.	2405.16964v1	null
2024-05-27	Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation	Liang Shi et.al.	2405.16895v1	null
2024-05-27	Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning	Wangyang Ying et.al.	2405.16879v1	null
2024-05-27	CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild	Xingqun Qi et.al.	2405.16874v1	null
2024-05-27	TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction	Yinda Chen et.al.	2405.16847v1	null
2024-05-24	ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models	Chunjiang Ge et.al.	2405.15738v1	link
2024-05-24	Disease-informed Adaptation of Vision-Language Models	Jiajin Zhang et.al.	2405.15728v1	link
2024-05-24	GECKO: Generative Language Model for English, Code and Korean	Sungwoo Oh et.al.	2405.15640v1	null
2024-05-24	SEP: Self-Enhanced Prompt Tuning for Visual-Language Model	Hantao Yao et.al.	2405.15549v1	link
2024-05-24	Polyp Segmentation Generalisability of Pretrained Backbones	Edward Sanderson et.al.	2405.15524v1	null
2024-05-24	Detection and Positive Reconstruction of Cognitive Distortion sentences: Mandarin Dataset and Evaluation	Shuya Lin et.al.	2405.15334v1	null
2024-05-24	StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models	Chengming Xu et.al.	2405.15287v1	null
2024-05-24	MindShot: Brain Decoding Framework Using Only One Image	Shuai Jiang et.al.	2405.15278v1	null
2024-05-24	Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search	Marie Al Ghossein et.al.	2405.15190v1	link
2024-05-24	From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks	Jacob Russin et.al.	2405.15164v1	null
2024-05-23	Bitune: Bidirectional Instruction-Tuning	Dawid J. Kopiczko et.al.	2405.14862v1	null
2024-05-23	Semantica: An Adaptable Image-Conditioned Diffusion Model	Manoj Kumar et.al.	2405.14857v1	null
2024-05-23	Analysis of Atom-level pretraining with QM data for Graph Neural Networks Molecular property models	Jose Arjona-Medina et.al.	2405.14837v1	null
2024-05-23	Masked Image Modelling for retinal OCT understanding	Theodoros Pissas et.al.	2405.14788v1	null
2024-05-23	EditWorld: Simulating World Dynamics for Instruction-Following Image Editing	Ling Yang et.al.	2405.14785v1	null
2024-05-23	WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models	Peng Wang et.al.	2405.14768v1	link
2024-05-23	Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval	Young Kyun Jang et.al.	2405.14726v1	null
2024-05-23	Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models	Young Kyun Jang et.al.	2405.14715v1	null
2024-05-23	Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models	Alejo Lopez-Avila et.al.	2405.14437v1	link
2024-05-23	Look into the Future: Deep Contextualized Sequential Recommendation	Lei Zheng et.al.	2405.14359v1	null
2024-05-21	Personalized Residuals for Concept-Driven Text-to-Image Generation	Cusuh Ham et.al.	2405.12978v1	null
2024-05-21	Transparency Distortion Robustness for SOTA Image Segmentation Tasks	Volker Knauthe et.al.	2405.12864v1	null
2024-05-21	DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control	Hong Chen et.al.	2405.12796v1	null
2024-05-21	EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics	Jan Steckel et.al.	2405.12573v1	null
2024-05-21	ProtT3: Protein-to-Text Generation for Text-based Protein Understanding	Zhiyuan Liu et.al.	2405.12564v1	link
2024-05-20	Octo: An Open-Source Generalist Robot Policy	Octo Model Team et.al.	2405.12213v1	null
2024-05-20	Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices	Nathaniel Cohen et.al.	2405.12211v1	null
2024-05-20	MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning	Ting Jiang et.al.	2405.12130v1	link
2024-05-21	Sheet Music Transformer ++: End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music	Antonio Ríos-Vila et.al.	2405.12105v2	link
2024-05-20	Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining	Neena Aloysius et.al.	2405.12018v1	null
2024-05-20	Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus	Fons Hartendorp et.al.	2405.11941v1	link
2024-05-20	Depth Prompting for Sensor-Agnostic Depth Estimation	Jin-Hwi Park et.al.	2405.11867v1	null
2024-05-20	SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model	Siavash Shams et.al.	2405.11831v1	null
2024-05-20	MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise	Ruiqi Wu et.al.	2405.11793v1	link
2024-05-20	TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models	Junlong Jia et.al.	2405.11788v1	link
2024-05-17	FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation	Fei Wang et.al.	2405.10885v1	link
2024-05-17	Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation	Yixing Huang et.al.	2405.10870v1	null
2024-05-17	Improving face generation quality and prompt following with synthetic captions	Michail Tarasiou et.al.	2405.10864v1	null
2024-05-17	Open-Vocabulary Spatio-Temporal Action Detection	Tao Wu et.al.	2405.10832v1	null
2024-05-17	Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction	Jiayun Pang et.al.	2405.10625v1	null
2024-05-17	UniCL: A Universal Contrastive Learning Framework for Large Time Series Models	Jiawei Li et.al.	2405.10597v1	null
2024-05-17	A Deep Learning Approach to Heterogeneous Consumer Aesthetics in Retail Fashion	Pranjal Rawat et.al.	2405.10498v1	null
2024-05-16	Data Selection for Transfer Unlearning	Nazanin Mohammadi Sepahvand et.al.	2405.10425v1	null
2024-05-16	Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model	Zheng Gu et.al.	2405.10316v1	null
2024-05-16	Libra: Building Decoupled Vision System on Large Language Models	Yifan Xu et.al.	2405.10140v1	link
2024-05-16	Continuous Transfer Learning for UAV Communication-aware Trajectory Design	Chenrui Sun et.al.	2405.10087v1	null
2024-05-16	HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition	Kun Yuan et.al.	2405.10075v1	null
2024-05-16	Natural Language Can Help Bridge the Sim2Real Gap	Albert Yu et.al.	2405.10020v1	null
2024-05-16	Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification	Jack Breen et.al.	2405.09990v1	link
2024-05-16	Cross-sensor self-supervised training and alignment for remote sensing	Valerio Marsocci et.al.	2405.09922v1	null
2024-05-16	TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data	Yihong Liu et.al.	2405.09913v1	link
2024-05-16	IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining	Dawei Feng et.al.	2405.09857v1	null
2024-05-15	LoRA Learns Less and Forgets Less	Dan Biderman et.al.	2405.09673v1	null
2024-05-15	Time-Equivariant Contrastive Learning for Degenerative Disease Progression in Retinal OCT	Taha Emre et.al.	2405.09404v1	null
2024-05-15	Matching domain experts by training from scratch on domain knowledge	Xiaoliang Luo et.al.	2405.09395v1	null
2024-05-15	HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants	Milan Gritta et.al.	2405.09186v1	null
2024-05-14	Self-supervised vision-language alignment of deep learning representations for bone X-rays analysis	Alexandre Englebert et.al.	2405.08932v1	null
2024-05-14	CLIP with Quality Captions: A Strong Pretraining for Vision Tasks	Pavan Kumar Anasosalu Vasu et.al.	2405.08911v1	null
2024-05-14	Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	Zhimin Li et.al.	2405.08748v1	link
2024-05-14	Self-supervised learning improves robustness of deep learning lung tumor segmentation to CT imaging differences	Jue Jiang et.al.	2405.08657v1	null
2024-05-14	Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation	Jared Mejia et.al.	2405.08576v1	null
2024-05-14	Improving Transformers with Dynamically Composable Multi-Head Attention	Da Xiao et.al.	2405.08553v1	link
2024-05-14	Self-Distillation Improves DNA Sequence Inference	Tong Yu et.al.	2405.08538v1	link
2024-05-14	Parameter-Efficient Instance-Adaptive Neural Video Compression	Hyunmo Yang et.al.	2405.08530v1	null
2024-05-14	Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining	Valentin Vielzeuf et.al.	2405.08402v1	null
2024-05-14	Could Chemical LLMs benefit from Message Passing	Jiaqing Xie et.al.	2405.08334v1	null
2024-05-13	Rethinking Histology Slide Digitization Workflows for Low-Resource Settings	Talat Zehra et.al.	2405.08169v1	link
2024-05-13	Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging	Chi-en Amy Tai et.al.	2405.07861v1	null
2024-05-13	SAR Image Synthesis with Diffusion Models	Denisa Qosja et.al.	2405.07776v1	null
2024-05-13	LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language	Cagri Toraman et.al.	2405.07745v1	link
2024-05-13	Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection	Dehong Kong et.al.	2405.07595v1	null
2024-05-13	Thai Universal Dependency Treebank	Panyut Sriwirote et.al.	2405.07586v1	null
2024-05-13	Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation	Aaditya Prasad et.al.	2405.07503v1	null
2024-05-13	CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering	Yuanyuan Jiang et.al.	2405.07451v1	null
2024-05-13	Sakuga-42M Dataset: Scaling Up Cartoon Research	Zhenglin Pan et.al.	2405.07425v1	link
2024-05-13	MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks	Haijiang Tian et.al.	2405.07411v1	null
2024-05-12	Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)	Saaketh Koundinya Gundavarapu et.al.	2405.07284v1	null
2024-05-10	Federated Document Visual Question Answering: A Pilot Study	Khanh Nguyen et.al.	2405.06636v1	null
2024-05-10	LMD3: Language Model Data Density Dependence	John Kirchenbauer et.al.	2405.06331v1	null
2024-05-10	Decoding Emotions in Abstract Art: Cognitive Plausibility of CLIP in Recognizing Color-Emotion Associations	Hanna-Sophia Widhoelzl et.al.	2405.06319v1	null
2024-05-10	SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora	Faisal Qarah et.al.	2405.06239v1	null
2024-05-10	VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks	Manish Dhakal et.al.	2405.06196v1	null
2024-05-10	ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI	Yuqi Fang et.al.	2405.06178v1	null
2024-05-09	UnSegGNet: Unsupervised Image Segmentation using Graph Neural Networks	Kovvuri Sai Gopal Reddy et.al.	2405.06057v1	link
2024-05-09	Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation	Chen Chen et.al.	2405.05745v1	null
2024-05-09	Parameter-Efficient Fine-Tuning With Adapters	Keyu Chen et.al.	2405.05493v1	null
2024-05-09	PLLM-CS: Pre-trained Large Language Model (LLM) for Cyber Threat Detection in Satellite Networks	Mohammed Hassanin et.al.	2405.05469v1	null
2024-05-08	Deep Learning Method to Predict Wound Healing Progress Based on Collagen Fibers in Wound Tissue	Juan He et.al.	2405.05297v1	null
2024-05-08	Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming	Tommaso Pasini et.al.	2405.05176v1	null
2024-05-08	Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources	Lasse Hyldig Hansen et.al.	2405.05049v1	null
2024-05-08	${M^2D}$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields	Ning Wang et.al.	2405.05010v1	null
2024-05-08	ChuXin: 1.6B Technical Report	Xiaomin Zhuang et.al.	2405.04828v1	null
2024-05-07	Remote Diffusion	Kunal Sunil Kasodekar et.al.	2405.04717v1	null
2024-05-07	Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking	Emre Can Acikgoz et.al.	2405.04685v1	null
2024-05-07	TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation	Hritik Bansal et.al.	2405.04682v1	null
2024-05-07	S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling	Minh Tran et.al.	2405.04489v1	null
2024-05-08	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	DeepSeek-AI et.al.	2405.04434v2	link
2024-05-07	Cross-IQA: Unsupervised Learning for Image Quality Assessment	Zhen Zhang et.al.	2405.04311v1	null
2024-05-07	Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation	Ryan Wong et.al.	2405.04164v1	null
2024-05-07	Locally Differentially Private In-Context Learning	Chunyan Zheng et.al.	2405.04032v1	null
2024-05-07	SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing	Yuying Ge et.al.	2405.04007v1	null
2024-05-07	Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application	Jian Jia et.al.	2405.03988v1	null
2024-05-07	Contextualization with SPLADE for High Recall Retrieval	Eugene Yang et.al.	2405.03972v1	link
2024-05-07	AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion	Adeesh Kolluru et.al.	2405.03962v1	null
2024-05-06	Provable Preconditioned Plug-and-Play Approach for Compressed Sensing MRI Reconstruction	Tao Hong et.al.	2405.03854v1	null
2024-05-06	Pose Priors from Language Models	Sanjay Subramanian et.al.	2405.03689v1	null
2024-05-06	AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design	Kamal Choudhary et.al.	2405.03680v1	null
2024-05-06	Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment	Abhinav Agarwalla et.al.	2405.03594v1	null
2024-05-06	Whispy: Adapting STT Whisper Models to Real-Time Environments	Antonio Bevilacqua et.al.	2405.03484v1	null
2024-05-06	Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval	Jiacheng Cheng et.al.	2405.03190v1	null
2024-05-06	GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding	Nil Biescas et.al.	2405.03104v1	null
2024-05-06	SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition	Adarsh Tiwari et.al.	2405.03099v1	null
2024-05-05	RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification	June-Woo Kim et.al.	2405.02996v1	null
2024-05-05	Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction	Xiaoyu Qiao et.al.	2405.02958v1	null
2024-05-05	IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs	Yuzhen Mao et.al.	2405.02842v1	null
2024-05-03	Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets	Xuelong Geng et.al.	2405.02132v1	null
2024-05-03	A Mutual Information Perspective on Federated Contrastive Learning	Christos Louizos et.al.	2405.02081v1	null
2024-05-03	SATO: Stable Text-to-Motion Framework	Wenshuo Chen et.al.	2405.01461v2	link
2024-05-02	StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation	Yupeng Zhou et.al.	2405.01434v1	link
2024-05-02	CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation	Chenying Liu et.al.	2405.01217v1	null
2024-05-02	Language Fairness in Multilingual Information Retrieval	Eugene Yang et.al.	2405.00978v1	link
2024-05-02	PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval	Dawn Lawrie et.al.	2405.00975v1	link
2024-05-01	Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin	K. Yeh et.al.	2405.00908v1	null
2024-05-01	SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models	Burak Can Biner et.al.	2405.00878v1	null
2024-05-01	Adapting Pretrained Networks for Image Quality Assessment on High Dynamic Range Displays	Andrei Chubarau et.al.	2405.00670v1	null
2024-05-01	Are Models Biased on Text without Gender-related Language?	Catarina G Belém et.al.	2405.00588v1	link
2024-05-01	Self-supervised Pre-training of Text Recognizers	Martin Kišš et.al.	2405.00420v1	link
2024-05-01	Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation	Zhichuan Wang et.al.	2405.00344v1	null
2024-04-30	PAODING: A High-fidelity Data-free Pruning Toolkit for Debloating Pre-trained Neural Networks	Mark Huasong Meng et.al.	2405.00074v1	null
2024-04-30	Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model	Denys Godwin et.al.	2404.19609v1	null
2024-04-30	Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets	Andrés Bell-Navas et.al.	2404.19579v1	null
2024-04-30	CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation	Weiquan Huang et.al.	2404.19394v1	link
2024-04-30	Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget	Minh Duc Bui et.al.	2404.19319v1	null
2024-04-30	Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank	Sungjune Park et.al.	2404.19299v1	null
2024-04-30	Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective	Wanqi Zhou et.al.	2404.19287v1	null
2024-04-30	Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information	Toshimitsu Uesaka et.al.	2404.19228v1	null
2024-04-29	What Drives Performance in Multilingual Language Models?	Sina Bagheri Nezhad et.al.	2404.19159v1	link
2024-04-29	Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing	Leonardo Rossi et.al.	2404.18924v1	null
2024-04-29	Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models	Xingyuan Zhang et.al.	2404.18896v1	null
2024-04-29	It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments	Petter Mæhlum et.al.	2404.18832v1	null
2024-04-30	PatentGPT: A Large Language Model for Intellectual Property	Zilong Bai et.al.	2404.18255v2	null
2024-04-28	Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment	Tengjun Huang et.al.	2404.18253v1	link
2024-04-28	TextGram: Towards a better domain-adaptive pretraining	Sharayu Hiwarkhedkar et.al.	2404.18228v1	null
2024-04-28	Can Perplexity Predict Fine-Tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali	Nishant Luitel et.al.	2404.18071v1	null
2024-04-28	Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model	Xiaolong Li et.al.	2404.18065v1	null
2024-04-27	Critical Review for One-class Classification: recent advances and the reality behind them	Toshitaka Hayashi et.al.	2404.17931v1	null
2024-04-27	T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining	Yi Yuan et.al.	2404.17806v1	null
2024-04-26	Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo	Stephen Zhao et.al.	2404.17546v1	null
2024-04-26	Low Cost Machine Vision for Insect Classification	Danja Brandt et.al.	2404.17488v1	null
2024-04-26	SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval	Marco Peer et.al.	2404.17221v1	link
2024-04-26	Self-supervised visual learning in the low-data regime: a comparative evaluation	Sotirios Konstantakos et.al.	2404.17202v1	null
2024-04-26	Few-shot Calligraphy Style Learning	Fangda Chen et.al.	2404.17199v1	link
2024-04-26	TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya	Hailay Teklehaymanot et.al.	2404.17194v1	null
2024-04-25	Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models	Eren Dogan et.al.	2404.17010v1	null
2024-04-25	Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Mehmet Kerem Turkcan et.al.	2404.16944v1	link
2024-04-25	A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs	Christian N. Mayemba et.al.	2404.16921v1	null
2024-04-25	Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding	Mostafa Elhoushi et.al.	2404.16710v1	null
2024-04-25	Road Surface Friction Estimation for Winter Conditions Utilising General Visual Features	Risto Ojala et.al.	2404.16578v1	null
2024-04-25	Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand	Davide Liconti et.al.	2404.16483v1	null
2024-04-25	Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics	Ben Williams et.al.	2404.16436v1	null
2024-04-25	TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models	Haomiao Ni et.al.	2404.16306v1	null
2024-04-24	Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall	Jiaqing Yuan et.al.	2404.16164v1	null
2024-04-24	FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication	Eric Slyman et.al.	2404.16123v1	null
2024-04-24	MoDE: CLIP Data Experts via Clustering	Jiawei Ma et.al.	2404.16030v1	link
2024-04-24	Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision	Mohammad Reza Hosseinzadeh Taher et.al.	2404.15672v1	null
2024-04-24	HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts	Xinlei Niu et.al.	2404.15637v1	null
2024-04-24	Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?	Hossein Salami et.al.	2404.15578v1	null
2024-04-24	Retrieval Head Mechanistically Explains Long-Context Factuality	Wenhao Wu et.al.	2404.15574v1	null
2024-04-23	SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation	Xiangyu Xu et.al.	2404.15276v1	link
2024-04-23	CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios	Jingyang Lin et.al.	2404.15272v1	null
2024-04-23	Setting up the Data Printer with Improved English to Ukrainian Machine Translation	Yurii Paniv et.al.	2404.15196v1	link
2024-04-23	Combating Missing Modalities in Egocentric Videos at Test Time	Merey Ramazanova et.al.	2404.15161v1	null
2024-04-23	DP-Net: Learning Discriminative Parts for image recognition	Ronan Sicre et.al.	2404.15037v1	null
2024-04-23	IPAD: Industrial Process Anomaly Detection Dataset	Jinfan Liu et.al.	2404.15033v1	null
2024-04-23	Multi-Modal Prompt Learning on Blind Image Quality Assessment	Wensheng Pan et.al.	2404.14949v1	null
2024-04-23	Driver Activity Classification Using Generalizable Representations from Vision-Language Models	Ross Greer et.al.	2404.14906v1	null
2024-04-23	FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model	Zezheng Song et.al.	2404.14688v1	null
2024-04-23	Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers	Elijah Pelofske et.al.	2404.14680v1	null
2024-04-22	PARAMANU-GANITA: Language Model with Mathematical Capabilities	Mitodru Niyogi et.al.	2404.14395v1	null
2024-04-22	Calc-CMU at SemEval-2024 Task 7: Pre-Calc -- Learning to Use the Calculator Improves Numeracy in Language Models	Vishruth Veerendranath et.al.	2404.14355v1	link
2024-04-22	Automatic Discovery of Visual Circuits	Achyuta Rajaram et.al.	2404.14349v1	link
2024-04-22	Heterogeneous Face Recognition Using Domain Invariant Units	Anjith George et.al.	2404.14343v1	null
2024-04-22	Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels	Jan-Philipp Fränken et.al.	2404.14313v1	link
2024-04-22	OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks	Sophia Sirko-Galouchenko et.al.	2404.14027v1	null
2024-04-22	EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning	Mingjie Ma et.al.	2404.13847v1	null
2024-04-21	FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization	Zhaopeng Gu et.al.	2404.13671v1	null
2024-04-21	PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure	Feiqi Cao et.al.	2404.13645v1	link
2024-04-21	Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers	Georgios Pantazopoulos et.al.	2404.13594v1	link
2024-04-19	MoVA: Adapting Mixture of Vision Experts to Multimodal Context	Zhuofan Zong et.al.	2404.13046v1	link
2024-04-19	Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing	Teng-Fang Hsiao et.al.	2404.12900v1	link
2024-04-19	Grasper: A Generalist Pursuer for Pursuit-Evasion Problems	Pengdeng Li et.al.	2404.12626v1	link
2024-04-18	Towards Large Language Models as Copilots for Theorem Proving in Lean	Peiyang Song et.al.	2404.12534v1	link
2024-04-18	Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis	Yufan Li et.al.	2404.12481v1	null
2024-04-18	mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models?	Tianze Hua et.al.	2404.12444v1	null
2024-04-18	MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale	Xiaotang Gai et.al.	2404.12372v1	null
2024-04-18	AniClipart: Clipart Animation with Text-to-Video Priors	Ronghuan Wu et.al.	2404.12347v1	null
2024-04-18	GraFIQs: Face Image Quality Assessment Using Gradient Magnitudes	Jan Niklas Kolf et.al.	2404.12203v1	link
2024-04-18	OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data	Chandeepa Dissanayake et.al.	2404.12195v1	link
2024-04-18	How to Benchmark Vision Foundation Models for Semantic Segmentation?	Tommie Kerssies et.al.	2404.12172v1	null
2024-04-18	Aligning language models with human preferences	Tomasz Korbak et.al.	2404.12150v1	link
2024-04-18	MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification	Weikang Yu et.al.	2404.12081v1	link
2024-04-18	Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition	Xunsong Li et.al.	2404.11903v1	null
2024-04-17	How often are errors in natural language reasoning due to paraphrastic variability?	Neha Srikanth et.al.	2404.11717v1	null
2024-04-17	Pretraining Billion-scale Geospatial Foundational Models on Frontier	Aristeidis Tsaris et.al.	2404.11706v1	null
2024-04-17	On the Scalability of GNNs for Molecular Graphs	Maciej Sypetkowski et.al.	2404.11568v1	null
2024-04-17	Predicting Long-horizon Futures by Conditioning on Geometry and Time	Tarasha Khurana et.al.	2404.11554v1	null
2024-04-17	ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours	Feiwen Zhu et.al.	2404.11068v1	null
2024-04-17	Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model	Hao Yan et.al.	2404.11046v1	null
2024-04-17	Many-Shot In-Context Learning	Rishabh Agarwal et.al.	2404.11018v1	null
2024-04-17	MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training	Jiayang Li et.al.	2404.11016v1	null
2024-04-16	More Room for Language: Investigating the Effect of Retrieval on Language Models	David Samuel et.al.	2404.10939v1	null
2024-04-16	Retrieval Augmented Verification : Unveiling Disinformation with Structured Representations for Zero-Shot Real-Time Evidence-guided Fact-Checking of Multi-modal Social media posts	Arka Ujjal Dey et.al.	2404.10702v1	null
2024-04-17	Do Counterfactual Examples Complicate Adversarial Training?	Eric Yeats et.al.	2404.10588v2	null
2024-04-17	Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models	Enming Zhang et.al.	2404.10357v2	null
2024-04-16	From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search	Jintao Sun et.al.	2404.10292v1	null
2024-04-16	Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology	Oren Kraus et.al.	2404.10242v1	link
2024-04-16	Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression	Jixiang Luo et.al.	2404.10234v1	null
2024-04-15	Self-Supervised Learning Featuring Small-Scale Image Dataset for Treatable Retinal Diseases Classification	Luffina C. Huang et.al.	2404.10166v1	null
2024-04-15	NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer	Sai Kumar Reddy Manne et.al.	2404.10130v1	link
2024-04-15	Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stress	Aswini Kumar Patra et.al.	2404.10073v1	null
2024-04-15	EgoPet: Egomotion and Interaction Data from an Animal's Perspective	Amir Bar et.al.	2404.09991v1	null
2024-04-15	Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes	Ivica Obadic et.al.	2404.09768v1	null
2024-04-15	Bridging Vision and Language Spaces with Assignment Prediction	Jungin Park et.al.	2404.09632v1	link
2024-04-15	Magic Clothing: Controllable Garment-Driven Image Synthesis	Weifeng Chen et.al.	2404.09512v1	link
2024-04-15	Leveraging Temporal Contextualization for Video Action Recognition	Minji Kim et.al.	2404.09490v1	null
2024-04-15	RankCLIP: Ranking-Consistent Language-Image Pretraining	Yiming Zhang et.al.	2404.09387v1	null
2024-04-16	Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment	Zhiqing Hong et.al.	2404.09313v2	null
2024-04-13	MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images	Yingjie Xi et.al.	2404.09000v1	link
2024-04-13	DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector	Johan Edstedt et.al.	2404.08928v1	link
2024-04-13	Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension	Mengnan Qi et.al.	2404.08885v1	null
2024-04-12	BERT-LSH: Reducing Absolute Compute For Attention	Zezheng Li et.al.	2404.08836v1	null
2024-04-12	Probing the 3D Awareness of Visual Foundation Models	Mohamed El Banani et.al.	2404.08636v1	link
2024-04-12	Pre-training Small Base LMs with Fewer Tokens	Sunny Sanyal et.al.	2404.08634v1	link
2024-04-12	Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation	Haozhe Zhao et.al.	2404.08491v1	link
2024-04-12	OTTER: Improving Zero-Shot Classification via Optimal Transport	Changho Shin et.al.	2404.08461v1	null
2024-04-12	AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees	William Fleshman et.al.	2404.08417v1	null
2024-04-12	Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain	Kosuke Takahashi et.al.	2404.08262v1	null
2024-04-12	Improving Continuous Sign Language Recognition with Adapted Image Models	Lianyu Hu et.al.	2404.08226v1	link
2024-04-12	Measuring Cross-lingual Transfer in Bytes	Leandro Rodrigues de Souza et.al.	2404.08191v1	link
2024-04-11	Self-supervised Dataset Distillation: A Good Compression Is All You Need	Muxin Zhou et.al.	2404.07976v1	link
2024-04-11	Rho-1: Not All Tokens Are What You Need	Zhenghao Lin et.al.	2404.07965v1	link
2024-04-11	MindBridge: A Cross-Subject Brain Decoding Framework	Shizun Wang et.al.	2404.07850v1	link
2024-04-11	Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck	Nathan Godey et.al.	2404.07647v1	null
2024-04-11	Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval	Minkuk Kim et.al.	2404.07610v1	link
2024-04-11	GLID: Pre-training a Generalist Encoder-Decoder Vision Model	Jihao Liu et.al.	2404.07603v1	null
2024-04-11	A fine-tuning workflow for automatic first-break picking with deep learning	Amir Mardan et.al.	2404.07400v1	link
2024-04-10	Accurate Tennis Court Line Detection on Amateur Recorded Matches	Sameer Agrawal et.al.	2404.06977v1	null
2024-04-10	GraSAME: Injecting Token-Level Structural Information to Pretrained Language Models via Graph-guided Self-Attention Mechanism	Shuzhou Yuan et.al.	2404.06911v1	null
2024-04-10	Text-Based Reasoning About Vector Graphics	Zhenhailong Wang et.al.	2404.06479v2	null
2024-04-10	MuPT: A Generative Symbolic Music Pretrained Transformer	Xingwei Qu et.al.	2404.06393v2	null
2024-04-11	On adversarial training and the 1 Nearest Neighbor classifier	Amir Hagai et.al.	2404.06313v2	link
2024-04-09	ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization	Yixin Yang et.al.	2404.06251v1	link
2024-04-09	Anchor-based Robust Finetuning of Vision-Language Models	Jinwei Han et.al.	2404.06244v1	null
2024-04-09	[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus	Leshem Choshen et.al.	2404.06214v1	null
2024-04-09	OmniFusion Technical Report	Elizaveta Goncharova et.al.	2404.06212v1	link
2024-04-09	Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning	Yupei Zhang et.al.	2404.06057v1	link
2024-04-09	Online/Offline Learning to Enable Robust Beamforming: Limited Feedback Meets Deep Generative Models	Ying Li et.al.	2404.06055v1	null
2024-04-08	Language-Independent Representations Improve Zero-Shot Summarization	Vladimir Solovyev et.al.	2404.05720v1	null
2024-04-08	MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning	Matteo Farina et.al.	2404.05621v1	null
2024-04-08	Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining	Nikola Ljubešić et.al.	2404.05428v1	link
2024-04-08	Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations	Yiming Li et.al.	2404.05415v1	null
2024-04-07	StockGPT: A GenAI Model for Stock Prediction and Trading	Dat Mai et.al.	2404.05101v1	null
2024-04-07	AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement	Shiwei Jin et.al.	2404.05063v1	null
2024-04-07	PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer	Xingyu Su et.al.	2404.04886v1	link
2024-04-07	Msmsfnet: a multi-stream and multi-scale fusion net for edge detection	Chenguang Liu et.al.	2404.04856v1	null
2024-04-07	F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation	Junhong Wu et.al.	2404.04846v1	null
2024-04-07	Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead	Irene Pagliai et.al.	2404.04838v1	null
2024-04-05	Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model	Xinrun Du et.al.	2404.04167v1	null
2024-04-04	No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance	Vishaal Udandarao et.al.	2404.04125v1	link
2024-04-05	Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation	Mingyuan Zhou et.al.	2404.04057v1	null
2024-04-05	Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer	Hele-Andra Kuulmets et.al.	2404.04042v1	null
2024-04-05	Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds	Annerose Eichel et.al.	2404.04031v1	null
2024-04-04	Layerwise Early Stopping for Test Time Adaptation	Sabyasachi Sahoo et.al.	2404.03784v1	null
2024-04-04	DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior	Yiming Zhang et.al.	2404.03642v1	null
2024-04-04	Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach	Chengkai Huang et.al.	2404.03514v1	null
2024-04-04	A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation	Jifan Yu et.al.	2404.03491v1	null
2024-04-04	Scaling Up Video Summarization Pretraining with Large Language Models	Dawit Mureja Argaw et.al.	2404.03398v1	null
2024-04-03	Scaling Laws for Galaxy Images	Mike Walmsley et.al.	2404.02973v1	link
2024-04-03	MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation	Petru-Daniel Tudosiu et.al.	2404.02790v1	null
2024-04-03	CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech	Jaehyeon Kim et.al.	2404.02781v1	null
2024-04-03	DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement	Hao Wu et.al.	2404.02755v1	null
2024-04-03	Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers	Sehyun Choi et.al.	2404.02684v1	null
2024-04-03	Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages	Jakub Hoscilowicz et.al.	2404.02588v1	link
2024-04-03	The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education	Paiheng Xu et.al.	2404.02444v1	null
2024-04-03	What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases	Anthony Meng Huat Tiong et.al.	2404.02415v1	link
2024-04-02	Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models	Zeyu Yang et.al.	2404.02148v1	link
2024-04-02	Iterated Learning Improves Compositionality in Large Vision-Language Models	Chenhao Zheng et.al.	2404.02145v1	null
2024-04-03	ViTamin: Designing Scalable Vision Models in the Vision-Language Era	Jieneng Chen et.al.	2404.02132v2	link
2024-04-02	FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning	Joel Niklaus et.al.	2404.02127v1	link
2024-04-02	Adaptive Feature Fusion Neural Network for Glaucoma Segmentation on Unseen Fundus Images	Jiyuan Zhong et.al.	2404.02084v1	null
2024-04-02	Noise Masking Attacks and Defenses for Pretrained Speech Models	Matthew Jagielski et.al.	2404.02052v1	null
2024-04-02	Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models	Stephan Linzbach et.al.	2404.01992v1	null
2024-04-02	Activation Steering for Robust Type Prediction in CodeLLMs	Francesca Lucchetti et.al.	2404.01903v1	null
2024-04-02	Poro 34B and the Blessing of Multilinguality	Risto Luukkonen et.al.	2404.01856v1	null
2024-04-02	Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation	Shanshan Feng et.al.	2404.01855v1	null
2024-03-29	Convolutional Prompting meets Language Models for Continual Learning	Anurag Roy et.al.	2403.20317v1	null
2024-03-29	Latxa: An Open Language Model and Evaluation Suite for Basque	Julen Etxaniz et.al.	2403.20266v1	link
2024-03-29	Long-Tailed Anomaly Detection with Learnable Class Names	Chih-Hui Ho et.al.	2403.20236v1	null
2024-03-29	StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation	Sidi Wu et.al.	2403.20142v1	null
2024-03-29	FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models	Barbara Toniella Corradini et.al.	2403.20105v1	null
2024-03-29	Negative Label Guided OOD Detection with Pretrained Vision-Language Models	Xue Jiang et.al.	2403.20078v1	link
2024-03-28	Siamese Vision Transformers are Scalable Audio-visual Learners	Yan-Bo Lin et.al.	2403.19638v1	link
2024-03-28	SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing	Xiaowei Song et.al.	2403.19615v1	link
2024-03-28	LocCa: Visual Pretraining with Location-aware Captioners	Bo Wan et.al.	2403.19596v1	null
2024-03-28	Situation Awareness for Driver-Centric Driving Style Adaptation	Johann Haselberger et.al.	2403.19595v1	link
2024-03-28	Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics	Norman Di Palo et.al.	2403.19578v1	null
2024-03-28	Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment	Alireza Ganjdanesh et.al.	2403.19490v1	null
2024-03-28	Checkpoint Merging via Bayesian Optimization in LLM Pretraining	Deyuan Liu et.al.	2403.19390v1	null
2024-03-28	NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data	Manuel Tonneau et.al.	2403.19260v1	link
2024-03-29	STaR-GATE: Teaching Language Models to Ask Clarifying Questions	Chinmaya Andukuri et.al.	2403.19154v2	null
2024-03-28	Instruction-based Hypergraph Pretraining	Mingdai Yang et.al.	2403.19063v1	null
2024-03-27	Bringing Textual Prompt to AI-Generated Image Quality Assessment	Bowen Qu et.al.	2403.18714v1	null
2024-03-27	Noise-Robust Keyword Spotting through Self-supervised Pretraining	Jacob Mørk et.al.	2403.18560v1	null
2024-03-27	OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning	Noor Ahmed et.al.	2403.18550v1	null
2024-03-27	Enhanced Generative Recommendation via Content and Collaboration Integration	Yidan Wang et.al.	2403.18480v1	null
2024-03-27	NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation	Jingyang Huo et.al.	2403.18211v1	null
2024-03-26	Juru: Legal Brazilian Large Language Model from Reputable Sources	Roseval Malaquias Junior et.al.	2403.18140v1	null
2024-03-26	The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation	Nicolas Guerin et.al.	2403.18031v1	null
2024-03-26	The Unreasonable Ineffectiveness of the Deeper Layers	Andrey Gromov et.al.	2403.17887v1	null
2024-03-26	GenesisTex: Adapting Image Denoising Diffusion to Texture Space	Chenjian Gao et.al.	2403.17782v1	null
2024-03-26	Leave No Patient Behind: Enhancing Medication Recommendation for Rare Disease Patients	Zihao Zhao et.al.	2403.17745v1	null
2024-03-26	Masked Autoencoders are PDE Learners	Anthony Zhou et.al.	2403.17728v1	null
2024-03-26	REFeREE: A REference-FREE Model-Based Metric for Text Simplification	Yichen Huang et.al.	2403.17640v1	link
2024-03-25	Exploring CausalWorld: Enhancing robotic manipulation via knowledge transfer and curriculum learning	Xinrui Wang et.al.	2403.17266v1	null
2024-03-25	Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability	Zirui Qiu et.al.	2403.16970v1	null
2024-03-25	Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance	Jiasheng Ye et.al.	2403.16952v1	link
2024-03-25	Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text	Junshu Tang et.al.	2403.16897v1	null
2024-03-25	Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning?	Shaoxiong Ji et.al.	2403.16777v1	null
2024-03-25	ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search	Zehan Li et.al.	2403.16702v1	null
2024-03-25	A comparative analysis of embedding models for patent similarity	Grazia Sveva Ascione et.al.	2403.16630v1	null
2024-03-25	Elysium: Exploring Object-level Perception in Videos via MLLM	Han Wang et.al.	2403.16558v1	link
2024-03-25	An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models	Zizhao Hu et.al.	2403.16530v1	null
2024-03-25	Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes	Tianwei Zhang et.al.	2403.16499v1	null
2024-03-25	PathoTune: Adapting Visual Foundation Model to Pathological Specialists	Jiaxuan Lu et.al.	2403.16497v1	null
2024-03-25	LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting	Qinyao Luo et.al.	2403.16495v1	null
2024-03-25	DeepMachining: Online Prediction of Machining Errors of Lathe Machines	Xiang-Li Lu et.al.	2403.16451v1	null
2024-03-25	KIT-19: A Comprehensive Korean Instruction Toolkit on 19 Tasks for Fine-Tuning Korean Large Language Models	Dongjun Jang et.al.	2403.16444v1	null
2024-03-22	Long-CLIP: Unlocking the Long-Text Capability of CLIP	Beichen Zhang et.al.	2403.15378v1	null
2024-03-22	CoLLEGe: Concept Embedding Generation for Large Language Models	Ryan Teehan et.al.	2403.15362v1	null
2024-03-22	Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities	Zhitong Xiong et.al.	2403.15356v1	null
2024-03-22	SFOD: Spiking Fusion Object Detector	Yimeng Fan et.al.	2403.15192v1	link
2024-03-22	Brain-grounding of semantic vectors improves neural decoding of visual stimuli	Shirin Vafaei et.al.	2403.15176v1	null
2024-03-22	LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement	Nicholas Lee et.al.	2403.15042v1	null
2024-03-22	Risk and Response in Large Language Models: Evaluating Key Threat Categories	Bahareh Harandizadeh et.al.	2403.14988v1	null
2024-03-22	CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model	Seungdae Han et.al.	2403.14944v1	null
2024-03-21	VidLA: Video-Language Alignment at Scale	Mamshad Nayeem Rizve et.al.	2403.14870v1	null
2024-03-21	TAMS: Translation-Assisted Morphological Segmentation	Enora Rice et.al.	2403.14840v1	null
2024-03-21	ReNoise: Real Image Inversion Through Iterative Noising	Daniel Garibi et.al.	2403.14602v1	null
2024-03-21	Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images	Yujian Liu et.al.	2403.14346v1	null
2024-03-21	Beyond Surface Similarity: Detecting Subtle Semantic Shifts in Financial Narratives	Jiaxin Liu et.al.	2403.14341v1	null
2024-03-21	Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition	Sihyun Yu et.al.	2403.14148v1	null
2024-03-21	Text-Enhanced Data-free Approach for Federated Class-Incremental Learning	Minh-Tuan Tran et.al.	2403.14101v1	link
2024-03-20	Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings	Gaifan Zhang et.al.	2403.14001v1	null
2024-03-20	Visually Grounded Speech Models have a Mutual Exclusivity Bias	Leanne Nortje et.al.	2403.13922v1	null
2024-03-20	Leveraging Linguistically Enhanced Embeddings for Open Information Extraction	Fauzan Farooqui et.al.	2403.13903v1	null
2024-03-20	On Pretraining Data Diversity for Self-Supervised Learning	Hasan Abed Al Kader Hammoud et.al.	2403.13808v1	link
2024-03-20	Learning from Models and Data for Visual Grounding	Ruozhen He et.al.	2403.13804v1	null
2024-03-20	RewardBench: Evaluating Reward Models for Language Modeling	Nathan Lambert et.al.	2403.13787v1	link
2024-03-20	When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather	Giulia Rizzoli et.al.	2403.13762v1	null
2024-03-20	PARAMANU-AYN: An Efficient Novel Generative and Instruction-tuned Language Model for Indian Legal Case Documents	Mitodru Niyogi et.al.	2403.13681v1	null
2024-03-20	Grounding Spatial Relations in Text-Only Language Models	Gorka Azkune et.al.	2403.13666v1	link
2024-03-20	Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese	Meet Doshi et.al.	2403.13638v1	null
2024-03-20	Bayesian Physics-informed Neural Networks for System Identification of Inverter-dominated Power Systems	Simon Stock et.al.	2403.13602v1	null
2024-03-20	VL-Mamba: Exploring State Space Models for Multimodal Learning	Yanyuan Qiao et.al.	2403.13600v1	null
2024-03-20	ReGround: Improving Textual and Spatial Grounding at No Cost	Yuseung Lee et.al.	2403.13589v1	null
2024-03-19	Zero-Reference Low-Light Enhancement via Physical Quadruple Priors	Wenjing Wang et.al.	2403.12933v1	null
2024-03-19	Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts	Sai Ashish Somayajula et.al.	2403.12918v1	link
2024-03-19	Yell At Your Robot: Improving On-the-Fly from Language Corrections	Lucy Xiaoyang Shi et.al.	2403.12910v1	null
2024-03-20	MEDBind: Unifying Language and Multimodal Medical Data Embeddings	Yuan Gao et.al.	2403.12894v2	null
2024-03-19	Automated Data Curation for Robust Language Model Fine-Tuning	Jiuhai Chen et.al.	2403.12776v1	null
2024-03-19	Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation	Jingtao Sun et.al.	2403.12728v1	link
2024-03-19	Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service	Mirza Alim Mutasodirin et.al.	2403.12563v1	null
2024-03-19	Equity through Access: A Case for Small-scale Deep Learning	Raghavendra Selvan et.al.	2403.12562v1	link
2024-03-19	Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs	Md Ashiqur Rahman et.al.	2403.12553v1	null
2024-03-19	TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer	Eunjee Choi et.al.	2403.12481v1	null
2024-03-18	Urban Scene Diffusion through Semantic Occupancy Map	Junge Zhang et.al.	2403.11697v1	null
2024-03-18	Prioritized Semantic Learning for Zero-shot Instance Navigation	Xander Sun et.al.	2403.11650v1	null
2024-03-18	Arc2Face: A Foundation Model of Human Faces	Foivos Paraperas Papantoniou et.al.	2403.11641v1	null
2024-03-18	End-to-end multi-modal product matching in fashion e-commerce	Sándor Tóth et.al.	2403.11593v1	null
2024-03-18	CasSR: Activating Image Power for Real-World Image Super-Resolution	Haolan Chen et.al.	2403.11451v1	null
2024-03-18	Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge	Jiahe Wang et.al.	2403.11450v1	null
2024-03-18	Boosting Continuous Emotion Recognition with Self-Pretraining using Masked Autoencoders, Temporal Convolutional Networks, and Transformers	Weiwei Zhou et.al.	2403.11440v1	null
2024-03-18	X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment	Dongjae Shin et.al.	2403.11399v1	null
2024-03-17	Ensembling and Test Augmentation for Covid-19 Detection and Covid-19 Domain Adaptation from 3D CT-Scans	Fares Bougourzi et.al.	2403.11338v1	null
2024-03-17	Stylized Face Sketch Extraction via Generative Prior with Limited Data	Kwan Yun et.al.	2403.11263v1	null
2024-03-15	Frozen Feature Augmentation for Few-Shot Image Classification	Andreas Bär et.al.	2403.10519v1	null
2024-03-15	Approximate Nullspace Augmented Finetuning for Robust Vision Transformers	Haoyang Liu et.al.	2403.10476v1	null
2024-03-15	Using an LLM to Turn Sign Spottings into Spoken Language Sentences	Ozge Mercanoglu Sincan et.al.	2403.10434v1	null
2024-03-15	Monotonic Representation of Numeric Properties in Language Models	Benjamin Heinzerling et.al.	2403.10381v1	null
2024-03-15	Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder	Jinseok Kim et.al.	2403.10255v1	null
2024-03-15	Generative Region-Language Pretraining for Open-Ended Object Detection	Chuang Lin et.al.	2403.10191v1	link
2024-03-15	RAFT: Adapting Language Model to Domain Specific RAG	Tianjun Zhang et.al.	2403.10131v1	link
2024-03-15	Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling	Baoquan Zhang et.al.	2403.10071v1	null
2024-03-15	Boundary Matters: A Bi-Level Active Finetuning Framework	Han Lu et.al.	2403.10069v1	null
2024-03-14	Adapting OC20-trained EquiformerV2 Models for High-Entropy Materials	Christian M. Clausen et.al.	2403.09811v1	null
2024-03-14	OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning	Lingyi Hong et.al.	2403.09634v1	null
2024-03-14	Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image	Yiqun Mei et.al.	2403.09632v1	null
2024-03-14	Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking	Eric Zelikman et.al.	2403.09629v1	null
2024-03-14	Counterfactual contrastive learning: robust representations via causal image synthesis	Melanie Roschewitz et.al.	2403.09605v1	link
2024-03-14	uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures	Afrina Tabassum et.al.	2403.09579v1	link
2024-03-14	Unsupervised Modality-Transferable Video Highlight Detection with Representation Activation Sequence Learning	Tingtian Li et.al.	2403.09401v1	null
2024-03-14	PreConfig: A Pretrained Model for Automating Network Configuration	Fuliang Li et.al.	2403.09369v1	null
2024-03-14	HeadEvolver: Text to Head Avatars via Locally Learnable Mesh Deformation	Duotun Wang et.al.	2403.09326v1	null
2024-03-14	Annotation Free Semantic Segmentation with Vision Foundation Models	Soroush Seifi et.al.	2403.09307v1	null
2024-03-14	CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification	Yiming Ma et.al.	2403.09281v1	null
2024-03-13	DAM: Dynamic Adapter Merging for Continual Video QA Learning	Feng Cheng et.al.	2403.08755v1	link
2024-03-13	Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization	Renjie Pi et.al.	2403.08730v1	null
2024-03-13	Data-Efficient Sleep Staging with Synthetic Time Series Pretraining	Niklas Grieger et.al.	2403.08592v1	null
2024-03-13	Gaussian Splatting in Style	Abhishek Saroha et.al.	2403.08498v1	null
2024-03-13	Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model	Ruibin Zhang et.al.	2403.08460v1	null
2024-03-13	Gemma: Open Models Based on Gemini Research and Technology	Gemma Team et.al.	2403.08295v1	null
2024-03-13	Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale	Xiang Hu et.al.	2403.08293v1	null
2024-03-13	GPT, Ontology, and CAABAC: A Tripartite Personalized Access Control Model Anchored by Compliance, Context and Attribute	Raza Nowrozy et.al.	2403.08264v1	null
2024-03-13	LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition	Zhonglin Sun et.al.	2403.08161v1	link
2024-03-12	Learning Data Association for Multi-Object Tracking using Only Coordinates	Mehdi Miah et.al.	2403.08018v1	null
2024-03-12	12 mJ per Class On-Device Online Few-Shot Class-Incremental Learning	Yoga Esa Wibowo et.al.	2403.07851v1	link
2024-03-12	Chronos: Learning the Language of Time Series	Abdul Fatir Ansari et.al.	2403.07815v1	link
2024-03-12	Boosting keyword spotting through on-device learnable user speech characteristics	Cristian Cioflan et.al.	2403.07802v1	null
2024-03-12	Fine-tuning Neural Network Quantum States	Riccardo Rende et.al.	2403.07795v1	null
2024-03-12	Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings	Sahand Sharifzadeh et.al.	2403.07750v1	null
2024-03-12	MoralBERT: Detecting Moral Values in Social Discourse	Vjosa Preniqi et.al.	2403.07678v1	null
2024-03-12	Characterization of Large Language Model Development in the Datacenter	Qinghao Hu et.al.	2403.07648v1	link
2024-03-12	Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation	Francois Meyer et.al.	2403.07567v1	link
2024-03-12	Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning	Yao Liang et.al.	2403.07440v1	null
2024-03-12	In-context learning enables multimodal large language models to classify cancer pathology images	Dyke Ferber et.al.	2403.07407v1	null
2024-03-11	VideoMamba: State Space Model for Efficient Video Understanding	Kunchang Li et.al.	2403.06977v1	link
2024-03-11	MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning	Yichuan Li et.al.	2403.06914v1	null
2024-03-11	FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks	Muhammad Saif Ullah Khan et.al.	2403.06904v1	null
2024-03-11	On the Generalization Ability of Unsupervised Pretraining	Yuyang Deng et.al.	2403.06871v1	null
2024-03-11	Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection	Chuangchuang Tan et.al.	2403.06803v1	link
2024-03-11	PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor	Jaewon Jung et.al.	2403.06668v1	null
2024-03-11	Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers	Alexander H. Berger et.al.	2403.06601v1	null
2024-03-11	SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection	Yuxuan Li et.al.	2403.06534v1	link
2024-03-11	FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications	Yuki Tatsukawa et.al.	2403.06453v1	null
2024-03-11	Can LLMs' Tuning Methods Work in Medical Multimodal Domain?	Jiawei Chen et.al.	2403.06407v1	null
2024-03-08	DeepSeek-VL: Towards Real-World Vision-Language Understanding	Haoyu Lu et.al.	2403.05525v1	link
2024-03-08	Self-Supervised Multiple Instance Learning for Acute Myeloid Leukemia Classification	Salome Kazeminia et.al.	2403.05379v1	null
2024-03-08	ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications	Sotaro Takeshita et.al.	2403.05303v1	link
2024-03-08	CommitBench: A Benchmark for Commit Message Generation	Maximilian Schall et.al.	2403.05188v1	link
2024-03-08	GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting	Francesco Palandra et.al.	2403.05154v1	null
2024-03-08	Face2Diffusion for Fast and Editable Face Personalization	Kaede Shiohara et.al.	2403.05094v1	link
2024-03-08	Agile Multi-Source-Free Domain Adaptation	Xinyao Li et.al.	2403.05062v1	link
2024-03-07	An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control	Aosong Feng et.al.	2403.04880v1	null
2024-03-07	I Can't Believe It's Not Scene Flow!	Ishan Khatri et.al.	2403.04739v1	link
2024-03-07	Masked Capsule Autoencoders	Miles Everett et.al.	2403.04724v1	null
2024-03-07	Yi: Open Foundation Models by 01.AI	01.AI et.al.	2403.04652v1	link
2024-03-07	Teaching Large Language Models to Reason with Reinforcement Learning	Alex Havrilla et.al.	2403.04642v1	null
2024-03-07	Pix2Gif: Motion-Guided Diffusion for GIF Generation	Hitesh Kandala et.al.	2403.04634v1	null
2024-03-07	CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?	Ibrahim Alabdulmohsin et.al.	2403.04547v1	null
2024-03-07	Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging	Dovile Juodelyte et.al.	2403.04484v1	link
2024-03-07	Enhancing Court View Generation with Knowledge Injection and Guidance	Ang Li et.al.	2403.04366v1	link
2024-03-07	Federated Recommendation via Hybrid Retrieval Augmented Generation	Huimin Zeng et.al.	2403.04256v1	link
2024-03-07	DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning	Xingwei Qu et.al.	2403.04233v1	null
2024-03-06	Bridging Language and Items for Retrieval and Recommendation	Yupeng Hou et.al.	2403.03952v1	link
2024-03-06	The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models	Adithya Bhaskar et.al.	2403.03942v1	link
2024-03-06	Designing Informative Metrics for Few-Shot Example Selection	Rishabh Adiga et.al.	2403.03861v1	null
2024-03-06	MeaCap: Memory-Augmented Zero-shot Image Captioning	Zequn Zeng et.al.	2403.03715v1	null
2024-03-06	On Transfer in Classification: How Well do Subsets of Classes Generalize?	Raphael Baena et.al.	2403.03569v1	null
2024-03-06	Low-Dose CT Image Reconstruction by Fine-Tuning a UNet Pretrained for Gaussian Denoising for the Downstream Task of Image Enhancement	Tim Selig et.al.	2403.03551v1	null
2024-03-06	CNN-based End-to-End Adaptive Controller with Stability Guarantees	Myeongseok Ryu et.al.	2403.03499v1	null
2024-03-06	Multi-modal Deep Learning	Chen Yuhua et.al.	2403.03385v1	null
2024-03-05	XAI-Based Detection of Adversarial Attacks on Deepfake Detectors	Ben Pinhasov et.al.	2403.02955v1	null
2024-03-05	Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples	Philipp J. Rösch et.al.	2403.02875v1	null
2024-03-05	Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models	Sang T. Truong et.al.	2403.02715v1	null
2024-03-05	Breeze-7B Technical Report	Chan-Jan Hsu et.al.	2403.02712v1	null
2024-03-04	A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing	Yu Wang et.al.	2403.02504v1	null
2024-03-04	Encodings for Prediction-based Neural Architecture Search	Yash Akhauri et.al.	2403.02484v1	link
2024-03-04	Transformers Provably Learn Feature-Position Correlations in Masked Image Modeling	Yu Huang et.al.	2403.02233v1	null
2024-03-04	TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models	Yilong Ren et.al.	2403.02221v1	null
2024-03-04	What has LeBenchmark Learnt about French Syntax?	Zdravko Dugonjić et.al.	2403.02173v1	null
2024-03-04	Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning	Huali Xu et.al.	2403.01966v1	link
2024-03-02	Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning	Shuo Yang et.al.	2403.01209v1	null
2024-03-01	Tree-Regularized Tabular Embeddings	Xuan Li et.al.	2403.00963v1	link
2024-03-01	G3DR: Generative 3D Reconstruction in ImageNet	Pradyumna Reddy et.al.	2403.00939v1	null
2024-03-01	Word Order and World Knowledge	Qinghua Zhao et.al.	2403.00876v1	null
2024-03-01	Hierarchical Indexing for Retrieval-Augmented Opinion Summarization	Tom Hosking et.al.	2403.00435v1	null
2024-03-01	Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs	Nishanth Chandran et.al.	2403.00393v1	null
2024-03-01	MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis	Takahiko Furuya et.al.	2403.00206v1	link
2024-02-29	Ask Your Distribution Shift if Pre-Training is Right for You	Benjamin Cohen-Wang et.al.	2403.00194v1	link
2024-02-29	Non-Invasive Medical Digital Twins using Physics-Informed Self-Supervised Learning	Keying Kuang et.al.	2403.00177v1	link
2024-02-29	UniTS: Building a Unified Time Series Model	Shanghua Gao et.al.	2403.00131v1	link
2024-02-29	SeD: Semantic-Aware Discriminator for Image Super-Resolution	Bingchen Li et.al.	2402.19387v1	null
2024-02-29	OzMAC: An Energy-Efficient Sparsity-Exploiting Multiply-Accumulate-Unit Design for DL Inference	Harideep Nair et.al.	2402.19376v1	null
2024-02-29	Compact Speech Translation Models via Discrete Speech Units Pretraining	Tsz Kin Lam et.al.	2402.19333v1	null
2024-02-29	Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting	Lawrence Yunliang Chen et.al.	2402.19249v1	null
2024-02-29	PeLLE: Encoder-based language models for Brazilian Portuguese based on open data	Guilherme Lamartine de Mello et.al.	2402.19204v1	null
2024-02-29	VIXEN: Visual Text Comparison Network for Image Difference Captioning	Alexander Black et.al.	2402.19119v1	null
2024-02-29	Improving Group Connectivity for Generalization of Federated Deep Learning	Zexi Li et.al.	2402.18949v1	null
2024-02-29	Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data	Takaaki Saeki et.al.	2402.18932v1	null
2024-02-29	Reducing Hallucinations in Entity Abstract Summarization with Facts-Template Decomposition	Fangwei Zhu et.al.	2402.18873v1	link
2024-02-29	Dual Operating Modes of In-Context Learning	Ziqian Lin et.al.	2402.18819v1	null
2024-02-28	Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation	Nihal V. Nayak et.al.	2402.18334v1	link
2024-02-28	How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning	Subhabrata Dutta et.al.	2402.18312v1	link
2024-02-28	Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks	Joanne Lin et.al.	2402.18307v1	null
2024-02-28	Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis	Bashir Kazimi et.al.	2402.18286v1	null
2024-02-28	NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images	Jingrui Yu et.al.	2402.18196v1	null
2024-02-28	Diffusion-based Neural Network Weights Generation	Bedionita Soro et.al.	2402.18153v1	null
2024-02-28	DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning	Jianxiong Li et.al.	2402.18137v1	null
2024-02-28	Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization	Han Guo et.al.	2402.18128v1	link
2024-02-28	Collaborative decoding of critical tokens for boosting factuality of large language models	Lifeng Jin et.al.	2402.17982v1	null
2024-02-27	Acquiring Linguistic Knowledge from Multimodal Input	Theodor Amariucai et.al.	2402.17936v1	null
2024-02-27	Tower: An Open Multilingual Large Language Model for Translation-Related Tasks	Duarte M. Alves et.al.	2402.17733v1	null
2024-02-27	MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation	Hanan Gani et.al.	2402.17725v1	link
2024-02-27	NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents	Tamara Czinczoll et.al.	2402.17682v1	null
2024-02-27	SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation	Shuangrui Ding et.al.	2402.17645v1	null
2024-02-27	Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Xiao Liu et.al.	2402.17644v1	link
2024-02-27	Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation	Jonas Herzog et.al.	2402.17614v1	null
2024-02-27	A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images	David Torpey et.al.	2402.17611v1	null
2024-02-27	Training-Free Long-Context Scaling of Large Language Models	Chenxin An et.al.	2402.17463v1	link
2024-02-27	Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder	Jiaqi Wang et.al.	2402.17433v1	null
2024-02-27	Investigating Continual Pretraining in Large Language Models: Insights and Implications	Çağatay Yıldız et.al.	2402.17400v1	null
2024-02-26	Immunization against harmful fine-tuning attacks	Domenic Rosati et.al.	2402.16382v1	null
2024-02-26	An Integrated Data Processing Framework for Pretraining Foundation Models	Yiding Sun et.al.	2402.16358v1	link
2024-02-26	MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs	Zimu Lu et.al.	2402.16352v1	null
2024-02-26	BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM	Li Zhang et.al.	2402.16338v1	null
2024-02-26	Learning Translations: Emergent Communication Pretraining for Cooperative Language Acquisition	Dylan Cope et.al.	2402.16247v1	null
2024-02-26	High-Frequency-aware Hierarchical Contrastive Selective Coding for Representation Learning on Text-attributed Graphs	Peiyan Zhang et.al.	2402.16240v1	null
2024-02-25	Task Specific Pretraining with Noisy Labels for Remote sensing Image Segmentation	Chenying Liu et.al.	2402.16164v1	null
2024-02-25	StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention	Seungwon Seo et.al.	2402.16092v1	link
2024-02-25	LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding	Yuxuan Wang et.al.	2402.16050v1	link
2024-02-25	Adversarial-Robust Transfer Learning for Medical Imaging via Domain Assimilation	Xiaohui Chen et.al.	2402.16005v1	null
2024-02-23	Repetition Improves Language Model Embeddings	Jacob Mitchell Springer et.al.	2402.15449v1	link
2024-02-23	PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning	Simon Holk et.al.	2402.15420v1	null
2024-02-23	United We Pretrain, Divided We Fail! Representation Learning for Time Series by Pretraining on 75 Datasets at Once	Maurice Kraus et.al.	2402.15404v1	null
2024-02-23	Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control	Masatoshi Uehara et.al.	2402.15194v1	null
2024-02-23	The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling	Jiajun Ma et.al.	2402.15170v1	null
2024-02-23	Self-Adaptive Reconstruction with Contrastive Learning for Unsupervised Sentence Embeddings	Junlong Liu et.al.	2402.15153v1	null
2024-02-23	ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval	Antoine Louis et.al.	2402.15059v1	null
2024-02-23	CARBD-Ko: A Contextually Annotated Review Benchmark Dataset for Aspect-Level Sentiment Classification in Korean	Dongjun Jang et.al.	2402.15046v1	null
2024-02-22	Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning	Zhuoyan Xu et.al.	2402.15017v1	link
2024-02-22	Zero-shot cross-lingual transfer in instruction tuning of large language model	Nadezhda Chirkova et.al.	2402.14778v1	null
2024-02-22	Prompting a Pretrained Transformer Can Be a Universal Approximator	Aleksandar Petrov et.al.	2402.14753v1	null
2024-02-22	Dependency Annotation of Ottoman Turkish with Multilingual BERT	Şaziye Betül Özateş et.al.	2402.14743v1	null
2024-02-22	Cleaner Pretraining Corpus Curation with Neural Web Scraping	Zhipeng Xu et.al.	2402.14652v1	link
2024-02-22	Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark	Xiuying Chen et.al.	2402.14359v1	null
2024-02-22	GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints	Anqi Cheng et.al.	2402.14354v1	null
2024-02-22	MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion	Xin-Yang Zheng et.al.	2402.14253v1	null
2024-02-22	Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding	Yu-Qi Yang et.al.	2402.14215v1	link
2024-02-22	BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay	Catherine Weaver et.al.	2402.14194v1	null
2024-02-21	T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching	Zizheng Pan et.al.	2402.14167v1	link
2024-02-21	User-LLM: Efficient LLM Contextualization with User Embeddings	Lin Ning et.al.	2402.13598v1	null
2024-02-21	Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment	Yunxin Li et.al.	2402.13561v1	null
2024-02-21	LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs	Yunxin Li et.al.	2402.13546v1	null
2024-02-21	FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing	Xiao-Yang Liu et.al.	2402.13533v1	null
2024-02-21	How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction?	Aviv Brokman et.al.	2402.13470v1	null
2024-02-20	Investigating Cultural Alignment of Large Language Models	Badr AlKhamissi et.al.	2402.13231v1	link
2024-02-20	RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian	Adrian Cosma et.al.	2402.13222v1	link
2024-02-20	VideoPrism: A Foundational Visual Encoder for Video Understanding	Long Zhao et.al.	2402.13217v1	null
2024-02-20	Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables	Haisong Gong et.al.	2402.13028v1	link
2024-02-20	Cell Graph Transformer for Nuclei Classification	Wei Lou et.al.	2402.12946v1	link
2024-02-20	More Discriminative Sentence Embeddings via Semantic Graph Smoothing	Chakib Fettal et.al.	2402.12890v1	link
2024-02-20	ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic	Fajri Koto et.al.	2402.12840v1	link
2024-02-20	Equivariant Pretrained Transformer for Unified Geometric Learning on Multi-Domain 3D Molecules	Rui Jiao et.al.	2402.12714v1	null
2024-02-20	PDEformer: Towards a Foundation Model for One-Dimensional Partial Differential Equations	Zhanhong Ye et.al.	2402.12652v1	null
2024-02-19	GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations	Jinhao Duan et.al.	2402.12348v1	link
2024-02-19	Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks	Nadezhda Chirkova et.al.	2402.12279v1	null
2024-02-19	High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models	Michela Lorandi et.al.	2402.12267v1	link
2024-02-19	Is It a Free Lunch for Removing Outliers during Pretraining?	Baohao Liao et.al.	2402.12102v1	null
2024-02-19	Direct Consistency Optimization for Compositional Text-to-Image Personalization	Kyungmin Lee et.al.	2402.12004v1	null
2024-02-19	DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation	Chong Zeng et.al.	2402.11929v1	null
2024-02-19	MRKE: The Multi-hop Reasoning Evaluation of LLMs by Knowledge Edition	Jian Wu et.al.	2402.11924v1	null
2024-02-19	ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image	Yan Hong et.al.	2402.11849v1	null
2024-02-19	UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction	Yuan Yuan et.al.	2402.11838v1	null
2024-02-19	LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs	Kai Wang et.al.	2402.11804v1	null
2024-02-16	Proving membership in LLM pretraining data via data watermarks	Johnny Tian-Zheng Wei et.al.	2402.10892v1	null
2024-02-16	Enhancement-Driven Pretraining for Robust Fingerprint Representation Learning	Ekta Gavas et.al.	2402.10847v1	null
2024-02-16	Associative Memories in the Feature Space	Tommaso Salvatori et.al.	2402.10814v1	null
2024-02-16	BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion	Raktim Kumar Mondol et.al.	2402.10717v1	null
2024-02-16	Are ID Embeddings Necessary? Whitening Pre-trained Text Embeddings for Effective Sequential Recommendation	Lingzi Zhang et.al.	2402.10602v1	null
2024-02-16	SPAR: Personalized Content-Based Recommendation via Long Engagement Attention	Chiyu Zhang et.al.	2402.10555v1	null
2024-02-16	MFBind: a Multi-Fidelity Approach for Evaluating Drug Compounds in Practical Generative Modeling	Peter Eckmann et.al.	2402.10387v1	null
2024-02-15	BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains	Yanis Labrak et.al.	2402.10373v1	null
2024-02-15	Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning	Euclid Collaboration et.al.	2402.10187v1	link
2024-02-15	Data Engineering for Scaling Language Models to 128K Context	Yao Fu et.al.	2402.10171v1	link
2024-02-15	Towards Safer Large Language Models through Machine Unlearning	Zheyuan Liu et.al.	2402.10058v1	null
2024-02-15	LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition	Jinyuan Li et.al.	2402.09989v1	null
2024-02-15	Data Augmentation and Transfer Learning Approaches Applied to Facial Expressions Recognition	Enrico Randellini et.al.	2402.09982v1	null
2024-02-15	All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining	Haihong Zhao et.al.	2402.09834v1	null
2024-02-15	Knowledge of Pretrained Language Models on Surface Information of Tokens	Tatsuya Hiraoka et.al.	2402.09808v1	null
2024-02-14	Towards Privacy-Aware Sign Language Translation at Scale	Phillip Rust et.al.	2402.09611v1	null
2024-02-14	DeepATLAS: One-Shot Localization for Biomedical Data	Peter D. Chang et.al.	2402.09587v1	null
2024-02-14	Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge	Jiancheng Yang et.al.	2402.09372v1	null
2024-02-14	Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking	Yi Fung et.al.	2402.09369v1	null
2024-02-14	HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference	Yashas Samaga B L et.al.	2402.09360v1	null
2024-02-14	Few-Shot Object Detection with Sparse Context Transformers	Jie Mei et.al.	2402.09315v1	null
2024-02-14	Embracing the black box: Heading towards foundation models for causal discovery from time series data	Gideon Stein et.al.	2402.09305v1	null
2024-02-14	Spectral Filters, Dark Signals, and Attention Sinks	Nicola Cancedda et.al.	2402.09221v1	null
2024-02-14	MPIrigen: MPI Code Generation through Domain-Specific Language Models	Nadav Schneider et.al.	2402.09126v1	link
2024-02-14	I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic Encryption	Prajwal Panzade et.al.	2402.09059v1	null
2024-02-14	Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays	Yeongjae Cho et.al.	2402.08966v1	null
2024-02-14	Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation	Ge Shi et.al.	2402.08882v1	null
2024-02-13	Human Curriculum Effects Emerge with In-Context Learning in Neural Networks	Jacob Russin et.al.	2402.08674v1	null
2024-02-13	Tandem Transformers for Inference Efficient LLMs	Aishwarya P S et.al.	2402.08644v1	null
2024-02-13	Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models	Jason Tang et.al.	2402.08532v1	null
2024-02-13	Concept-1K: A Novel Benchmark for Instance Incremental Learning	Junhao Zheng et.al.	2402.08526v1	link
2024-02-13	Pixel Sentence Representation Learning	Chenghao Xiao et.al.	2402.08183v1	null
2024-02-12	Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?	Andrew Bai et.al.	2402.08096v1	null
2024-02-12	Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Siddharth Karamcheti et.al.	2402.07865v1	link
2024-02-12	Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning	Z Liu et.al.	2402.07818v1	null
2024-02-12	AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts	Yifan Zhang et.al.	2402.07625v1	link
2024-02-12	Foundational Inference Models for Dynamical Systems	Patrick Seifner et.al.	2402.07594v1	null
2024-02-12	Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape Prediction	Cheng Feng et.al.	2402.07570v1	link
2024-02-12	MAFIA: Multi-Adapter Fused Inclusive LanguAge Models	Prachi Jain et.al.	2402.07519v1	null
2024-02-12	SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning	Hang Zhao et.al.	2402.07485v1	null
2024-02-12	Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Jon Saad-Falcon et.al.	2402.07440v1	null
2024-02-12	SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy Adaptation	Sangwoo Shin et.al.	2402.07418v1	null
2024-02-11	Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers	Minoo Shayaninasab et.al.	2402.07327v1	null
2024-02-09	Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows	Evan D. Cook et.al.	2402.06537v1	null
2024-02-09	GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data	Haoyuan Li et.al.	2402.06198v1	null
2024-02-09	Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss	Ruijie Zheng et.al.	2402.06187v1	null
2024-02-09	MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models	Yixiao Zhang et.al.	2402.06178v1	null
2024-02-08	Early Fusion of Features for Semantic Segmentation	Anupam Gupta et.al.	2402.06091v1	null
2024-02-08	Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing	Yong Cao et.al.	2402.06015v1	null
2024-02-08	WebLINX: Real-World Website Navigation with Multi-Turn Dialogue	Xing Han Lù et.al.	2402.05930v1	null
2024-02-08	Collaborative Control for Geometry-Conditioned PBR Image Generation	Shimon Vainer et.al.	2402.05919v1	null
2024-02-08	Efficient Stagewise Pretraining via Progressive Subnetworks	Abhishek Panigrahi et.al.	2402.05913v1	null
2024-02-08	SpiRit-LM: Interleaved Spoken and Written Language Model	Tu Anh Nguyen et.al.	2402.05755v1	null
2024-02-08	Unified Speech-Text Pretraining for Spoken Dialog Modeling	Heeseung Kim et.al.	2402.05706v1	null
2024-02-08	Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks	Ben Fauber et.al.	2402.05616v1	null
2024-02-08	Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models	Maxime Fily et.al.	2402.05581v1	null
2024-02-07	BIKED++: A Multimodal Dataset of 1.4 Million Bicycle Image and Parametric CAD Designs	Lyle Regenwetter et.al.	2402.05301v1	null
2024-02-07	SPAD: Spatially Aware Multiview Diffusers	Yash Kant et.al.	2402.05235v1	null
2024-02-07	A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?	Agustinus Kristiadi et.al.	2402.05015v1	link
2024-02-07	Personalized Text Generation with Fine-Grained Linguistic Control	Bashar Alhafni et.al.	2402.04914v1	link
2024-02-07	OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding	Guibiao Liao et.al.	2402.04648v1	null
2024-02-06	PreGIP: Watermarking the Pretraining of Graph Neural Networks for Deep Intellectual Property Protection	Enyan Dai et.al.	2402.04435v1	null
2024-02-06	Fine-Tuned Language Models Generate Stable Inorganic Materials as Text	Nate Gruver et.al.	2402.04379v1	link
2024-02-06	The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry	Michael Zhang et.al.	2402.04347v1	null
2024-02-06	EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters	Quan Sun et.al.	2402.04252v1	link
2024-02-06	MusicRL: Aligning Music Generation to Human Preferences	Geoffrey Cideron et.al.	2402.04229v1	null
2024-02-06	Scaling Laws for Downstream Task Performance of Large Language Models	Berivan Isik et.al.	2402.04177v1	null
2024-02-06	Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains	Ashok Vardhan Makkuva et.al.	2402.04161v1	link
2024-02-06	A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation	Zhengbo Wang et.al.	2402.04087v1	link
2024-02-06	Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models	Zhengbo Wang et.al.	2402.04050v1	null
2024-02-06	Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation	Zolnamar Dorjsembe et.al.	2402.04031v1	link
2024-02-06	Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning	Ningyuan Tang et.al.	2402.04009v1	null
2024-02-06	Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought	Alex Havrilla et.al.	2402.04004v1	null
2024-02-06	Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time	Netta Ollikka et.al.	2402.03973v1	null
2024-02-05	Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining	Jiarun Liu et.al.	2402.03302v1	link
2024-02-05	Training-Free Consistent Text-to-Image Generation	Yoad Tewel et.al.	2402.03286v1	null
2024-02-05	CLIP Can Understand Depth	Dunam Kim et.al.	2402.03251v1	null
2024-02-05	FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition	Xiaohu Huang et.al.	2402.03241v1	null
2024-02-05	Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms	Ethan Wilson et.al.	2402.03188v1	null
2024-02-05	Time-, Memory- and Parameter-Efficient Visual Adaptation	Otniel-Bogdan Mercea et.al.	2402.02887v1	null
2024-02-05	Enhancing Compositional Generalization via Compositional Feature Alignment	Haoxiang Wang et.al.	2402.02851v1	link
2024-02-04	Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning	Haoyi Zhu et.al.	2402.02500v1	null
2024-02-04	A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer	Zhangyang Gao et.al.	2402.02464v1	null
2024-02-04	BECLR: Batch Enhanced Contrastive Few-Shot Learning	Stylianos Poulakakis-Daktylidis et.al.	2402.02444v1	link
2024-02-02	From Words to Molecules: A Survey of Large Language Models in Chemistry	Chang Liao et.al.	2402.01439v1	null
2024-02-02	Continual Learning for Large Language Models: A Survey	Tongtong Wu et.al.	2402.01364v1	null
2024-02-02	Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes	Ece Takmaz et.al.	2402.01352v1	null
2024-02-02	Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion	Zexi Li et.al.	2402.01342v1	null
2024-02-02	On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification	Calum Heggan et.al.	2402.01274v1	null
2024-02-02	Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?	Cristian Sbrolli et.al.	2402.01241v1	null
2024-02-02	In-Context Learning for Few-Shot Nested Named Entity Recognition	Meishan Zhang et.al.	2402.01182v1	null
2024-02-02	Interpretation of Intracardiac Electrograms Through Textual Representations	William Jongwon Han et.al.	2402.01115v1	null
2024-02-02	Double-Dip: Thwarting Label-Only Membership Inference Attacks with Transfer Learning and Randomization	Arezoo Rajabi et.al.	2402.01114v1	null
2024-02-02	Specialized Language Models with Cheap Inference from Limited Domain Data	David Grangier et.al.	2402.01093v1	null
2024-02-01	Can Large Language Models Understand Context?	Yilun Zhu et.al.	2402.00858v1	null
2024-02-01	LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law	Toni J. B. Liu et.al.	2402.00795v1	null
2024-02-01	CroissantLLM: A Truly Bilingual French-English Language Model	Manuel Faysse et.al.	2402.00786v1	link
2024-02-01	AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning	Fu-Yun Wang et.al.	2402.00769v1	link
2024-02-01	Unlearnable Algorithms for In-context Learning	Andrei Muresanu et.al.	2402.00751v1	null
2024-02-01	Approximating Optimal Morphing Attacks using Template Inversion	Laurent Colbois et.al.	2402.00695v1	null
2024-02-01	Improving Critical Node Detection Using Neural Network-based Initialization in a Genetic Algorithm	Chanjuan Liu et.al.	2402.00404v1	null
2024-02-01	Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure	Masahito Togami et.al.	2402.00337v1	null
2024-02-01	Towards AI-Assisted Synthesis of Verified Dafny Methods	Md Rakib Hossain Misu et.al.	2402.00247v1	link
2024-01-31	Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research	Luca Soldaini et.al.	2402.00159v1	link
2024-01-31	Binding Touch to Everything: Learning Unified Multimodal Tactile Representations	Fengyu Yang et.al.	2401.18084v1	null
2024-01-31	Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models	Mitodru Niyogi et.al.	2401.18034v1	null
2024-01-31	Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers	Shan Zhao et.al.	2401.17870v1	null
2024-01-31	Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model	Zihan Zhong et.al.	2401.17868v1	null
2024-01-31	Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction	Xueyuan Chen et.al.	2401.17796v1	null
2024-01-31	EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning	Jaeyeon Kim et.al.	2401.17690v1	link
2024-01-31	Towards Efficient and Reliable LLM Serving: A Real-World Workload Study	Yuxin Wang et.al.	2401.17644v1	null
2024-01-31	Local and Global Contexts for Conversation	Zuoquan Lin et.al.	2401.17588v1	link
2024-01-30	Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens	Jiacheng Liu et.al.	2401.17377v1	null
2024-01-30	Transfer Learning for Text Diffusion Models	Kehang Han et.al.	2401.17181v1	null
2024-01-29	Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled	Shengchao Liu et.al.	2401.17123v1	link
2024-01-30	Finetuning Large Language Models for Vulnerability Detection	Alexey Shestov et.al.	2401.17010v1	null
2024-01-30	Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution	Gaspard Michel et.al.	2401.16968v1	link
2024-01-30	PBSCSR: The Piano Bootleg Score Composer Style Recognition Dataset	Arhan Jain et.al.	2401.16803v1	link
2024-01-30	MolPLA: A Molecular Pretraining Framework for Learning Cores, R-Groups and their Linker Joints	Mogan Gim et.al.	2401.16771v1	null
2024-01-30	Gradient-Based Language Model Red Teaming	Nevan Wichers et.al.	2401.16656v1	link
2024-01-30	IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion	Bolun Li et.al.	2401.16637v1	link
2024-01-29	ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks	Bolei Ma et.al.	2401.16589v1	link
2024-01-29	Massively Multilingual Text Translation For Low-Resource Languages	Zhong Zhou et.al.	2401.16582v1	null
2024-01-29	Scaling Sparse Fine-Tuning to Large Language Models	Alan Ansell et.al.	2401.16405v1	null
2024-01-29	PICL: Physics Informed Contrastive Learning for Partial Differential Equations	Cooper Lorsung et.al.	2401.16327v1	null
2024-01-29	Enhancing Molecular Property Prediction with Auxiliary Learning and Task-Specific Adaptation	Vishal Dey et.al.	2401.16299v1	null
2024-01-29	Textual Entailment for Effective Triple Validation in Object Prediction	Andrés García-Silva et.al.	2401.16293v1	null
2024-01-29	Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model	Till Grutschus et.al.	2401.16280v1	null
2024-01-29	Type-based Neural Link Prediction Adapter for Complex Query Answering	Lingning Song et.al.	2401.16045v1	null
2024-01-29	Finding Challenging Metaphors that Confuse Pretrained Language Models	Yucheng Li et.al.	2401.16012v1	null
2024-01-29	StableIdentity: Inserting Anybody into Anywhere at First Sight	Qinghe Wang et.al.	2401.15975v1	null
2024-01-29	Masked Audio Modeling with CLAP and Multi-Objective Learning	Yifei Xin et.al.	2401.15953v1	null
2024-01-29	HICH Image/Text (HICH-IT): Comprehensive Text and Image Datasets for Hypertensive Intracerebral Hemorrhage Research	Jie Li et.al.	2401.15934v1	null
2024-01-26	RESPRECT: Speeding-up Multi-fingered Grasping with Residual Reinforcement Learning	Federico Ceola et.al.	2401.14858v1	null
2024-01-26	Endowing Protein Language Models with Structural Knowledge	Dexiong Chen et.al.	2401.14819v1	null
2024-01-26	MaLLaM -- Malaysia Large Language Model	Husein Zolkepli et.al.	2401.14680v1	null
2024-01-26	An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check Models	Xi Wang et.al.	2401.14630v1	null
2024-01-26	Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning	Tao He et.al.	2401.14626v1	null
2024-01-25	MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models	Saumya Saxena et.al.	2401.14502v1	null
2024-01-25	Rethinking Patch Dependence for Masked Autoencoders	Letian Fu et.al.	2401.14391v1	null
2024-01-25	TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation	Gökçe Uludoğan et.al.	2401.14373v1	link
2024-01-25	Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation	Minglin Chen et.al.	2401.14257v1	null
2024-01-25	Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods	Mohammed Sabry et.al.	2401.14228v1	null
2024-01-25	BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models	Senthil Purushwalkam et.al.	2401.13974v1	null
2024-01-24	S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction	Sathira Silva et.al.	2401.13785v1	null
2024-01-24	Enhancing Image Retrieval: A Comprehensive Study on Photo Search using the CLIP Mode	Naresh Kumar Lahajal et.al.	2401.13613v1	null
2024-01-24	Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding	Husein Zolkepli et.al.	2401.13565v1	null
2024-01-25	Finetuning Foundation Models for Joint Analysis Optimization	Matthias Vigl et.al.	2401.13536v2	null
2024-01-24	Generative Human Motion Stylization in Latent Space	Chuan Guo et.al.	2401.13505v1	null
2024-01-24	MaLA-500: Massive Language Adaptation of Large Language Models	Peiqin Lin et.al.	2401.13303v1	null
2024-01-24	Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics	Pengcheng Zhao et.al.	2401.13270v1	null
2024-01-24	Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation	Saiyang Na et.al.	2401.13220v1	null
2024-01-24	AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation	Lulan Shen et.al.	2401.13212v1	null
2024-01-23	The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts	Lingfeng Shen et.al.	2401.13136v1	null
2024-01-23	Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems	Michelle R. Greene et.al.	2401.13097v1	null
2024-01-23	GALA: Generating Animatable Layered Assets from a Single Scan	Taeksoo Kim et.al.	2401.12979v1	null
2024-01-23	Pretraining and the Lasso	Erin Craig et.al.	2401.12911v1	null
2024-01-23	PSDF: Prior-Driven Neural Implicit Surface Learning for Multi-view Reconstruction	Wanjuan Su et.al.	2401.12751v1	null
2024-01-23	Evaluation of large language models for assessing code maintainability	Marc Dillmann et.al.	2401.12714v1	null
2024-01-23	Persona-centric Metamorphic Relation guided Robustness Evaluation for Multi-turn Dialogue Modelling	Yanbing Chen et.al.	2401.12483v1	null
2024-01-23	The Neglected Tails of Vision-Language Models	Shubham Parashar et.al.	2401.12425v1	null
2024-01-22	OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection	Fatema-E Jannat et.al.	2401.12344v1	null
2024-01-22	Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation	Shoaib Meraj Sami et.al.	2401.12340v1	null
2024-01-22	APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference	Bowen Zhao et.al.	2401.12200v1	null
2024-01-22	An Empirical Analysis of In-context Learning Abilities of LLMs for MT	Pranjal A. Chitale et.al.	2401.12097v1	null
2024-01-22	Multi-level Cross-modal Alignment for Image Clustering	Liping Qiu et.al.	2401.11740v1	null
2024-01-22	M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition	Mengmeng Wang et.al.	2401.11649v1	null
2024-01-21	MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts	Haoqiang Guo et.al.	2401.11403v1	link
2024-01-21	LLMRA: Multi-modal Large Language Model based Restoration Assistant	Xiaoyu Jin et.al.	2401.11401v1	null
2024-01-19	Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition	Ismail Rasim Ulgen et.al.	2401.11017v1	null
2024-01-19	Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment	Fanqi Wan et.al.	2401.10768v1	link
2024-01-19	DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval	Xiangpeng Yang et.al.	2401.10588v1	null
2024-01-19	Name Tagging Under Domain Shift via Metric Learning for Life Sciences	Hongyi Liu et.al.	2401.10472v1	null
2024-01-19	Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition	Yu Yu et.al.	2401.10447v1	null
2024-01-18	Supervised Fine-tuning in turn Improves Visual Foundation Models	Xiaohu Jiang et.al.	2401.10222v1	link
2024-01-18	Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap	Xingyu Wu et.al.	2401.10034v1	null
2024-01-18	Gender Bias in Machine Translation and The Era of Large Language Models	Eva Vanmassenhove et.al.	2401.10016v1	null
2024-01-18	Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations	Prince Jha et.al.	2401.09899v1	link
2024-01-18	Improving fine-grained understanding in image-text pre-training	Ioana Bica et.al.	2401.09865v1	null
2024-01-18	Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training	Corey Lammie et.al.	2401.09859v1	null
2024-01-18	Simple and effective data augmentation for compositional generalization	Yuekun Yao et.al.	2401.09815v1	null
2024-01-18	Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation	Zesen Cheng et.al.	2401.09732v1	link
2024-01-17	CT Liver Segmentation via PVT-based Encoding and Refined Decoding	Debesh Jha et.al.	2401.09630v1	link
2024-01-17	Aligning Large Language Models with Counterfactual DPO	Bradley Butcher et.al.	2401.09566v1	null
2024-01-17	Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text	Mazal Bethany et.al.	2401.09407v1	null
2024-01-17	Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora	Diana Davila Gordillo et.al.	2401.09333v1	null
2024-01-17	An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation	Yinuo Zhao et.al.	2401.09258v1	null
2024-01-17	Preparing Lessons for Progressive Training on Language Models	Yu Pan et.al.	2401.09192v1	null
2024-01-17	Visual Robotic Manipulation with Depth-Aware Pretraining	Wanying Wang et.al.	2401.09038v1	null
2024-01-16	Fast Dynamic 3D Object Generation from a Single-view Video	Zijie Pan et.al.	2401.08742v1	null
2024-01-16	Fixed Point Diffusion Models	Xingjian Bai et.al.	2401.08741v1	null
2024-01-16	Tuning Language Models by Proxy	Alisa Liu et.al.	2401.08565v1	null
2024-01-16	GATS: Gather-Attend-Scatter	Konrad Zolna et.al.	2401.08525v1	null
2024-01-17	Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models	Jianhui Pang et.al.	2401.08350v2	null
2024-01-16	MCRPL: A Pretrain, Prompt & Fine-tune Paradigm for Non-overlapping Many-to-one Cross-domain Recommendation	Hao Liu et.al.	2401.08228v1	null
2024-01-16	SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation	Zhixuan Liu et.al.	2401.08053v1	null
2024-01-15	How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets?	Bidur Khanal et.al.	2401.07990v1	null
2024-01-15	Word Boundary Information Isn't Useful for Encoder Language Models	Edward Gow-Smith et.al.	2401.07923v1	null
2024-01-15	EMBRE: Entity-aware Masking for Biomedical Relation Extraction	Mingjie Li et.al.	2401.07877v1	null
2024-01-15	VeCAF: VLM-empowered Collaborative Active Finetuning with Training Objective Awareness	Rongyu Zhang et.al.	2401.07853v1	null
2024-01-15	Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification	Nathan Painchaud et.al.	2401.07796v1	null
2024-01-15	On the importance of Data Scale in Pretraining Arabic Language Models	Abbas Ghaddar et.al.	2401.07760v1	link
2024-01-15	HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation	Antoine Mercier et.al.	2401.07727v1	null
2024-01-12	Scalable 3D Panoptic Segmentation With Superpoint Graph Clustering	Damien Robert et.al.	2401.06704v1	link
2024-01-12	TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models	Yihong Liu et.al.	2401.06620v1	null
2024-01-12	BOK-VQA: Bilingual Outside Knowledge-based Visual Question Answering via Graph Representation Pretraining	Minjun Kim et.al.	2401.06443v1	null
2024-01-12	AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters	Li Lucy et.al.	2401.06408v1	link
2024-01-12	AffordanceLLM: Grounding Affordance from Vision Language Models	Shengyi Qian et.al.	2401.06341v1	null
2024-01-12	Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity	Banafshe Felfeliyan et.al.	2401.06331v1	null
2024-01-11	A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy	Edward Sanderson et.al.	2401.06278v1	null
2024-01-11	Transformers are Multi-State RNNs	Matanel Oren et.al.	2401.06104v1	null
2024-01-11	Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models	K M Sajjadul Islam et.al.	2401.06088v1	null
2024-01-11	LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization	Muhammad Farid Adilazuarda et.al.	2401.06034v1	null
2024-01-11	DiffDA: a diffusion model for weather-scale data assimilation	Langwen Huang et.al.	2401.05932v1	null
2024-01-11	Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models	Pengzhi Gao et.al.	2401.05861v1	link
2024-01-11	Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations	Zhihui Xie et.al.	2401.05792v1	link
2024-01-11	Zero Resource Cross-Lingual Part Of Speech Tagging	Sahil Chopra et.al.	2401.05727v1	null
2024-01-10	Diffusion Priors for Dynamic View Synthesis from Monocular Videos	Chaoyang Wang et.al.	2401.05583v1	null
2024-01-10	Siamese Networks with Soft Labels for Unsupervised Lesion Detection and Patch Pretraining on Screening Mammograms	Kevin Van Vorst et.al.	2401.05570v1	null
2024-01-10	Physics guided dual Self-supervised learning for structure-based materials property prediction	Nihang Fu et.al.	2401.05223v1	link
2024-01-10	Pre-trained Large Language Models for Financial Sentiment Analysis	Wei Luo et.al.	2401.05215v1	null
2024-01-10	MISS: A Generative Pretraining and Finetuning Approach for Med-VQA	Jiawei Chen et.al.	2401.05163v1	null
2024-01-09	Phishing Website Detection through Multi-Model Analysis of HTML Content	Furkan Çolhak et.al.	2401.04820v1	null
2024-01-10	RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation	Mahdi Nikdan et.al.	2401.04679v2	null
2024-01-09	DepressionEmo: A novel dataset for multilabel classification of depression emotions	Abu Bakar Siddiqur Rahman et.al.	2401.04655v1	link
2024-01-09	Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example	Kwan Yun et.al.	2401.04362v1	null
2024-01-09	Private Fine-tuning of Large Language Models with Zeroth-order Optimization	Xinyu Tang et.al.	2401.04343v1	null
2024-01-08	Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Chen Zhao et.al.	2401.04105v1	null
2024-01-08	FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference	Zirui Liu et.al.	2401.04044v1	null
2024-01-08	TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series	Vijay Ekambaram et.al.	2401.03955v1	null
2024-01-08	TeleChat Technical Report	Zihan Wang et.al.	2401.03804v1	null
2024-01-08	Anatomy of Neural Language Models	Majd Saleh et.al.	2401.03797v1	link
2024-01-07	Transfer the linguistic representations from TTS to accent conversion with non-parallel data	Xi Chen et.al.	2401.03538v1	null
2024-01-05	Locally Adaptive Neural 3D Morphable Models	Michail Tarasiou et.al.	2401.02937v1	link
2024-01-05	MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance	Renjie Pi et.al.	2401.02906v1	link
2024-01-05	Pheme: Efficient and Conversational Speech Generation	Paweł Budzianowski et.al.	2401.02839v1	null
2024-01-05	Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing	Hugo Chan-To-Hing et.al.	2401.02764v1	null
2024-01-05	Detection and Classification of Diabetic Retinopathy using Deep Learning Algorithms for Segmentation to Facilitate Referral Recommendation for Test and Treatment Prediction	Manoj S H et.al.	2401.02759v1	link
2024-01-05	MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning	Alfirsa Damasyifa Fauzulhaq et.al.	2401.02744v1	null
2024-01-05	Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning	Hong-Gi Shin et.al.	2401.02710v1	null
2024-01-05	Benchmarking PathCLIP for Pathology Image Analysis	Sunyi Zheng et.al.	2401.02651v1	null
2024-01-05	MOODv2: Masked Image Modeling for Out-of-Distribution Detection	Jingyao Li et.al.	2401.02611v1	null
2024-01-04	Vulnerabilities Unveiled: Adversarially Attacking a Multimodal Vision Langauge Model for Pathology Imaging	Jai Prakash Veerla et.al.	2401.02565v1	null
2024-01-04	LLaMA Pro: Progressive LLaMA with Block Expansion	Chengyue Wu et.al.	2401.02415v1	link
2024-01-04	TinyLlama: An Open-Source Small Language Model	Peiyuan Zhang et.al.	2401.02385v1	link
2024-01-04	DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models	Songbo Hu et.al.	2401.02208v1	null
2024-01-04	Location Aware Modular Biencoder for Tourism Question Answering	Haonan Li et.al.	2401.02187v1	link
2024-01-04	SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Ziping Ma et.al.	2401.02137v1	null
2024-01-04	Text2MDT: Extracting Medical Decision Trees from Medical Texts	Wei Zhu et.al.	2401.02034v1	null
2024-01-03	Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias	Anshuman Chhabra et.al.	2401.01989v1	link
2024-01-03	The Power of Training: How Different Neural Network Setups Influence the Energy Demand	Daniel Geißler et.al.	2401.01851v1	null
2024-01-03	FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision Transformers	Zheng Yuan et.al.	2401.01752v1	null
2024-01-03	De-Confusing Pseudo-Labels in Source-Free Domain Adaptation	Idit Diamant et.al.	2401.01650v1	null
2024-01-03	Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences	Piotr Skalski et.al.	2401.01641v1	link
2024-01-02	Deep-ELA: Deep Exploratory Landscape Analysis with Self-Supervised Pretrained Transformers for Single- and Multi-Objective Continuous Optimization Problems	Moritz Vinzent Seiler et.al.	2401.01192v1	null
2024-01-02	Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label Classification	Xuelin Zhu et.al.	2401.01181v1	null
2024-01-02	Quokka: An Open-source Large Language Model ChatBot for Material Science	Xianjun Yang et.al.	2401.01089v1	link
2024-01-02	LLaMA Beyond English: An Empirical Study on Language Capability Transfer	Jun Zhao et.al.	2401.01055v1	null
2024-01-02	Cheetah: Natural Language Generation for 517 African Languages	Ife Adebara et.al.	2401.01053v1	null
2024-01-01	Multi-Lattice Sampling of Quantum Field Theories via Neural Operators	Bálint Máté et.al.	2401.00828v1	null
2024-01-01	Self-supervised learning for skin cancer diagnosis with limited training data	Hamish Haggerty et.al.	2401.00692v1	null
2024-01-01	Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models	Guangji Bai et.al.	2401.00625v1	null
2023-12-31	Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets	Payam Karisani et.al.	2401.00575v1	link
2023-12-31	A Generalist FaceX via Learning Unified Facial Representation	Yue Han et.al.	2401.00551v1	link
2023-12-29	MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining	Jacob Portes et.al.	2312.17482v1	null
2023-12-29	FerKD: Surgical Label Adaptation for Efficient Distillation	Zhiqiang Shen et.al.	2312.17473v1	link
2023-12-29	Video Understanding with Large Language Models: A Survey	Yunlong Tang et.al.	2312.17432v1	link
2023-12-28	The LLM Surgeon	Tycho F. A. van der Ouderaa et.al.	2312.17244v1	null
2023-12-28	Unsupervised Universal Image Segmentation	Dantong Niu et.al.	2312.17243v1	link
2023-12-28	Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution	Ying Wang et.al.	2312.17174v1	link
2023-12-28	Non-Vacuous Generalization Bounds for Large Language Models	Sanae Lotfi et.al.	2312.17173v1	null
2023-12-28	Restoration by Generation with Constrained Priors	Zheng Ding et.al.	2312.17161v1	null
2023-12-28	Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math	Zengzhi Wang et.al.	2312.17120v1	link
2023-12-29	Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding	Liang Zhao et.al.	2312.17044v2	null
2023-12-28	3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTs	Taha Emre et.al.	2312.16980v1	null
2023-12-27	I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models	Xun Guo et.al.	2312.16693v1	null
2023-12-27	LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization	Sai Shubodh Puligilla et.al.	2312.16648v1	null
2023-12-22	DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images	Yevgeniy Men et.al.	2312.14891v1	null
2023-12-22	Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models	Alan Chan et.al.	2312.14751v1	null
2023-12-22	Harnessing Diffusion Models for Visual Perception with Meta Prompts	Qiang Wan et.al.	2312.14733v1	link
2023-12-22	Inclusive normalization of face images to passport format	Hongliu Cao et.al.	2312.14544v1	null
2023-12-22	ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection	Junwei He et.al.	2312.14535v1	null
2023-12-22	Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection	Ze Yu Zhao et.al.	2312.14406v1	null
2023-12-22	Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances	Cristian Rodriguez-Opazo et.al.	2312.14400v1	null
2023-12-22	StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors	Wanchao Su et.al.	2312.14389v1	null
2023-12-21	Crystal Growth Characterization of WSe$_2$ Thin Film Using Machine Learning	Isaiah A. Moses et.al.	2312.14311v1	null
2023-12-21	DUSt3R: Geometric 3D Vision Made Easy	Shuzhe Wang et.al.	2312.14132v1	null
2023-12-21	VideoPoet: A Large Language Model for Zero-Shot Video Generation	Dan Kondratyuk et.al.	2312.14125v1	null
2023-12-21	Typhoon: Thai Large Language Models	Kunat Pipatanakul et.al.	2312.13951v1	null
2023-12-21	TinySAM: Pushing the Envelope for Efficient Segment Anything Model	Han Shu et.al.	2312.13789v1	link
2023-12-21	DreamTuner: Single Image is Enough for Subject-Driven Generation	Miao Hua et.al.	2312.13691v1	null
2023-12-21	DyBluRF: Dynamic Deblurring Neural Radiance Fields for Blurry Monocular Video	Minh-Quan Viet Bui et.al.	2312.13528v1	null
2023-12-20	Time is Encoded in the Weights of Finetuned Language Models	Kai Nylund et.al.	2312.13401v1	null
2023-12-20	Conditional Image Generation with Pretrained Generative Model	Rajesh Shrestha et.al.	2312.13253v1	null
2023-12-20	A 3D super-resolution of wind fields via physics-informed pixel-wise self-attention generative adversarial network	Takuya Kurihana et.al.	2312.13212v1	null
2023-12-21	Molecular Hypergraph Neural Networks	Junwu Chen et.al.	2312.13136v2	link
2023-12-19	Value Explicit Pretraining for Goal-Based Transfer Learning	Kiran Lekkala et.al.	2312.12339v1	null
2023-12-19	Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment	Lingling Xu et.al.	2312.12148v1	null
2023-12-19	ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields	Xiang Feng et.al.	2312.12122v1	null
2023-12-19	DMT: Comprehensive Distillation with Multiple Self-supervised Teachers	Yuang Liu et.al.	2312.11938v1	null
2023-12-19	Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery	Pengwei Yan et.al.	2312.11927v1	link
2023-12-18	Ultrasound Image Enhancement using CycleGAN and Perceptual Loss	Shreeram Athreya et.al.	2312.11748v1	link
2023-12-18	Evaluating Language-Model Agents on Realistic Autonomous Tasks	Megan Kinniment et.al.	2312.11671v1	null
2023-12-18	Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain	Hsiu-Yu Yang et.al.	2312.11345v1	null
2023-12-18	UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts	Chenlu Zhan et.al.	2312.11171v1	null
2023-12-19	Split and Rephrase with Large Language Models	David Ponce et.al.	2312.11075v2	null
2023-12-17	CEIR: Concept-based Explainable Image Representation Learning	Yan Cui et.al.	2312.10747v1	null
2023-12-17	Addressing Sample Inefficiency in Multi-View Representation Learning	Kumar Krishna Agrawal et.al.	2312.10725v1	null
2023-12-17	T2M-HiFiGPT: Generating High Quality Human Motion from Textual Descriptions with Residual Discrete Representations	Congyi Wang et.al.	2312.10628v1	null
2023-12-17	Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization	Xuan Long Do et.al.	2312.10610v1	null
2023-12-16	Paloma: A Benchmark for Evaluating Language Model Fit	Ian Magnusson et.al.	2312.10523v1	null
2023-12-16	Enhancing Person Re-Identification through Tensor Feature Fusion	Akram Abderraouf Gharbi et.al.	2312.10470v1	null
2023-12-16	RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a single GPU for Zero-shot retail product image classification	Muktabh Mayank Srivastava et.al.	2312.10282v1	null
2023-12-15	Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning	Wei Tan et.al.	2312.10116v1	null
2023-12-15	PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains	Shengyi Hua et.al.	2312.09894v1	link
2023-12-15	Probing Pretrained Language Models with Hierarchy Properties	Jesús Lovón-Melgarejo et.al.	2312.09670v1	null
2023-12-15	Vectorizing string entries for data processing on tables: when are larger language models better?	Léo Grinsztajn et.al.	2312.09634v1	null
2023-12-15	Image Deblurring using GAN	Zhengdong Li et.al.	2312.09496v1	null
2023-12-14	Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision	Collin Burns et.al.	2312.09390v1	null
2023-12-14	Weight subcloning: direct initialization of transformers using larger pretrained ones	Mohammad Samragh et.al.	2312.09299v1	null
2023-12-14	ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining	Ruoxi Shi et.al.	2312.09249v1	null
2023-12-14	Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking	Jacob Eisenstein et.al.	2312.09244v1	null
2023-12-14	OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields	Chubin Zhang et.al.	2312.09243v1	link
2023-12-14	Reliability in Semantic Segmentation: Can We Use Synthetic Data?	Thibaut Loiseau et.al.	2312.09231v1	null
2023-12-14	WIT-UAS: A Wildland-fire Infrared Thermal Dataset to Detect Crew Assets From Aerial Views	Andrew Jong et.al.	2312.09159v1	link
2023-12-14	Exploring Transferability for Randomized Smoothing	Kai Qiu et.al.	2312.09020v1	null
2023-12-14	OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers	Han Liang et.al.	2312.08985v1	null
2023-12-14	BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials	Xingrun Xing et.al.	2312.08937v1	link
2023-12-14	Guided Diffusion from Self-Supervised Diffusion Features	Vincent Tao Hu et.al.	2312.08825v1	null
2023-12-14	Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention	Kaiqiang Song et.al.	2312.08618v1	null
2023-12-13	Enhancing Robot Program Synthesis Through Environmental Context	Tianyi Chen et.al.	2312.08250v1	null
2023-12-13	Patch-wise Graph Contrastive Learning for Image Translation	Chanyong Jung et.al.	2312.08223v1	null
2023-12-13	Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision	Shengguang Wu et.al.	2312.08056v1	null
2023-12-13	SLJP: Semantic Extraction based Legal Judgment Prediction	Prameela Madambakam et.al.	2312.07979v1	null
2023-12-13	CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation	Zhenduo Zhang et.al.	2312.07879v1	null
2023-12-13	Foundation Models in Robotics: Applications, Challenges, and the Future	Roya Firoozi et.al.	2312.07843v1	null
2023-12-13	A Foundational Multimodal Vision Language AI Assistant for Human Pathology	Ming Y. Lu et.al.	2312.07814v1	null
2023-12-12	Tell, don't show: Declarative facts influence how LLMs generalize	Alexander Meinke et.al.	2312.07779v1	null
2023-12-12	A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning	Yinmin Zhang et.al.	2312.07685v1	null
2023-12-12	GMTalker: Gaussian Mixture based Emotional talking video Portraits	Yibo Xia et.al.	2312.07669v1	null
2023-12-12	Double-Flow GAN model for the reconstruction of perceived faces from brain activities	Zihao Wang et.al.	2312.07478v1	null
2023-12-12	Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval	Love Panta et.al.	2312.07435v1	null
2023-12-12	How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Zhongyi Han et.al.	2312.07424v1	null
2023-12-12	ICL Markup: Structuring In-Context Learning using Soft-Token Tags	Marc-Etienne Brunet et.al.	2312.07405v1	null
2023-12-12	Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images	Tuan Truong et.al.	2312.07273v1	null
2023-12-12	ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open Vocabulary Object Detection	Joonhyun Jeong et.al.	2312.07266v1	null
2023-12-12	Dynamic Corrective Self-Distillation for Better Fine-Tuning of Pretrained Models	Ibtihel Amara et.al.	2312.07028v1	null
2023-12-12	CCM: Adding Conditional Controls to Text-to-Image Consistency Models	Jie Xiao et.al.	2312.06971v1	null
2023-12-12	READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling	Thong Nguyen et.al.	2312.06950v1	null
2023-12-11	DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers	Sarin Chandy et.al.	2312.06881v1	link
2023-12-11	De novo Design of Polymer Electrolytes with High Conductivity using GPT-based and Diffusion-based Generative Models	Zhenze Yang et.al.	2312.06470v1	null
2023-12-11	PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation	Zhiyu Pan et.al.	2312.06409v1	null
2023-12-11	MMDesign: Multi-Modality Transfer Learning for Generative Protein Design	Jiangbin Zheng et.al.	2312.06297v1	null
2023-12-11	Medical Vision Language Pretraining: A survey	Prashant Shrestha et.al.	2312.06224v1	null
2023-12-10	NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation	Peter West et.al.	2312.05979v1	null
2023-12-10	A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis	Linxi Zhao et.al.	2312.05930v1	link
2023-12-10	Building Variable-sized Models via Learngene Pool	Boyu Shi et.al.	2312.05743v1	null
2023-12-10	Initialization Matters for Adversarial Transfer Learning	Andong Hua et.al.	2312.05716v1	null
2023-12-09	Understanding the Effect of Model Compression on Social Bias in Large Language Models	Gustavo Gonçalves et.al.	2312.05662v1	link
2023-12-09	Enhancing Medical Specialty Assignment to Patients using NLP Techniques	Chris Solomou et.al.	2312.05585v1	null
2023-12-08	SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation	Thuan Hoang Nguyen et.al.	2312.05239v1	null
2023-12-08	Datasets, Models, and Algorithms for Multi-Sensor, Multi-agent Autonomy Using AVstack	R. Spencer Hallyburton et.al.	2312.04970v1	null
2023-12-08	Zoology: Measuring and Improving Recall in Efficient Language Models	Simran Arora et.al.	2312.04927v1	link
2023-12-08	Cross-BERT for Point Cloud Pretraining	Xin Li et.al.	2312.04891v1	null
2023-12-08	Adapting Vision Transformer for Efficient Change Detection	Yang Zhao et.al.	2312.04869v1	null
2023-12-08	HuRef: HUman-REadable Fingerprint for Large Language Models	Boyi Zeng et.al.	2312.04828v1	null
2023-12-07	STraceBERT: Source Code Retrieval using Semantic Application Traces	Claudio Spiess et.al.	2312.04731v1	null
2023-12-07	Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models	Victor Agostinelli et.al.	2312.04691v1	null
2023-12-07	ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations	Haoming Cai et.al.	2312.04679v1	null
2023-12-07	On Sarcasm Detection with OpenAI GPT-based Models	Montgomery Gole et.al.	2312.04642v1	null
2023-12-07	Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning	Yongqi Dong et.al.	2312.04398v1	null
2023-12-07	Multi-View Unsupervised Image Generation with Cross Attention Guidance	Llukman Cerkezi et.al.	2312.04337v1	null
2023-12-07	Diffusing Colors: Image Colorization with Text Guided Diffusion	Nir Zabari et.al.	2312.04145v1	null
2023-12-07	Instance Tracking in 3D Scenes from Egocentric Videos	Yunhan Zhao et.al.	2312.04117v1	link
2023-12-07	Enhancing the Rationale-Input Alignment for Self-explaining Rationalization	Wei Liu et.al.	2312.04103v1	null
2023-12-05	DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing	Shao-Yu Chang et.al.	2312.03772v1	null
2023-12-06	Blueprinting the Future: Automatic Item Categorization using Hierarchical Zero-Shot and Few-Shot Classifiers	Ting Wang et.al.	2312.03561v1	null
2023-12-06	PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis	Meiyue Song et.al.	2312.03490v1	link
2023-12-06	Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion	Weitao Du et.al.	2312.03475v1	null
2023-12-05	Leveraging Laryngograph Data for Robust Voicing Detection in Speech	Yixuan Zhang et.al.	2312.03129v1	link
2023-12-05	DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control	Yuru Jia et.al.	2312.03048v1	null
2023-12-05	MagicStick: Controllable Video Editing via Control Handle Transformations	Yue Ma et.al.	2312.03047v1	link
2023-12-05	Zero-Shot Point Cloud Registration	Weijie Wang et.al.	2312.03032v1	null
2023-12-05	WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words	Lukas Wolf et.al.	2312.02931v1	null
2023-12-05	Rare Galaxy Classes Identified In Foundation Model Representations	Mike Walmsley et.al.	2312.02910v1	null
2023-12-05	Large Knowledge Model: Perspectives and Challenges	Huajun Chen et.al.	2312.02706v1	null
2023-12-05	Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent	Jianmeng Liu et.al.	2312.02568v1	null
2023-12-05	Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation	Shanshan Zhong et.al.	2312.02439v1	link
2023-12-05	Visually Grounded Language Learning: a review of language games, datasets, tasks, and models	Alessandro Suglia et.al.	2312.02431v1	null
2023-12-05	Efficient Online Data Mixing For Language Model Pre-Training	Alon Albalak et.al.	2312.02406v1	null
2023-12-04	FaultFormer: Transformer-based Prediction of Bearing Faults	Anthony Zhou et.al.	2312.02380v1	null
2023-12-04	Rejuvenating image-GPT as Strong Visual Representation Learners	Sucheng Ren et.al.	2312.02147v1	link
2023-12-04	Object Recognition as Next Token Prediction	Kaiyu Yue et.al.	2312.02142v1	link
2023-12-04	TPPoet: Transformer-Based Persian Poem Generation using Minimal Data and Advanced Decoding Techniques	Amir Panahandeh et.al.	2312.02125v1	null
2023-12-04	Open-DDVM: A Reproduction and Extension of Diffusion Model for Optical Flow Estimation	Qiaole Dong et.al.	2312.01746v1	link
2023-12-04	Data Management For Large Language Models: A Survey	Zige Wang et.al.	2312.01700v1	null
2023-12-04	SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference	Feng Wang et.al.	2312.01597v1	null
2023-12-04	APoLLo: Unified Adapter and Prompt Learning for Vision Language Models	Sanjoy Chowdhury et.al.	2312.01564v1	null
2023-12-03	Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation	Yuzhe Lu et.al.	2312.01504v1	null
2023-12-03	Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts	Tianqi Chen et.al.	2312.01408v1	null
2023-12-03	Stable Messenger: Steganography for Message-Concealed Image Generation	Quang Nguyen et.al.	2312.01284v1	null
2023-12-01	Mamba: Linear-Time Sequence Modeling with Selective State Spaces	Albert Gu et.al.	2312.00752v1	null
2023-12-01	GIFT: Generative Interpretable Fine-Tuning Transformers	Chinmay Savadikar et.al.	2312.00700v1	link
2023-12-01	Nonparametric Variational Regularisation of Pretrained Transformers	Fabio Fehr et.al.	2312.00662v1	null
2023-12-01	Summarization-based Data Augmentation for Document Classification	Yueguan Wang et.al.	2312.00513v1	link
2023-12-01	PyraTrans: Learning Attention-Enriched Multi-Scale Pyramid Network from Pre-Trained Transformers for Effective Malicious URL Detection	Ruitong Liu et.al.	2312.00508v1	null
2023-12-01	On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals	Aristotelis Ballas et.al.	2312.00502v1	link
2023-12-01	Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence	Yajie Liu et.al.	2312.00452v1	null
2023-12-01	Dolphins: Multimodal Language Model for Driving	Yingzi Ma et.al.	2312.00438v1	null
2023-12-01	PEFTDebias: Capturing debiasing information using PEFTs	Sumit Agarwal et.al.	2312.00434v1	null
2023-12-01	SynFundus: Generating a synthetic fundus images dataset with millions of samples and multi-disease annotations	Fangxin Shang et.al.	2312.00377v1	null
2023-11-30	Initializing Models with Larger Ones	Zhiqiu Xu et.al.	2311.18823v1	link
2023-11-30	ElasticDiffusion: Training-free Arbitrary Size Image Generation	Moayed Haji-Ali et.al.	2311.18822v1	link
2023-11-30	ArthModel: Enhance Arithmetic Skills to Large Language Model	Yingdi Guo et.al.	2311.18609v1	null
2023-11-30	Dataset Distillation via the Wasserstein Metric	Haoyang Liu et.al.	2311.18531v1	null
2023-11-30	MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition	Dan Song et.al.	2311.18402v1	null
2023-11-30	Transfer Learning across Different Chemical Domains: Virtual Screening of Organic Materials with Deep Learning Models Pretrained on Small Molecule and Chemical Reaction Data	Chengwei Zhang et.al.	2311.18377v1	null
2023-11-30	Hubness Reduction Improves Sentence-BERT Semantic Spaces	Beatrix M. G. Nielsen et.al.	2311.18364v1	link
2023-11-30	OmniMotionGPT: Animal Motion Generation with Limited Data	Zhangsihao Yang et.al.	2311.18303v1	null
2023-11-30	HKUST at SemEval-2023 Task 1: Visual Word Sense Disambiguation with Context Augmentation and Visual Assistance	Zhuohao Yin et.al.	2311.18273v1	link
2023-11-30	LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News	Yongjun Zhang et.al.	2311.18241v1	link
2023-11-29	A Simple Recipe for Language-guided Domain Generalized Segmentation	Mohammad Fahes et.al.	2311.17922v1	null
2023-11-29	Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation	Shuangrui Ding et.al.	2311.17893v1	null
2023-11-29	Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects	Rishabh Kabra et.al.	2311.17851v1	null
2023-11-29	SPiC-E: Structural Priors in 3D Diffusion Models using Cross Entity Attention	Etai Sella et.al.	2311.17834v1	null
2023-11-29	DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation	Ting Liu et.al.	2311.17812v1	null
2023-11-29	PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection	Weixin Mao et.al.	2311.17770v1	null
2023-11-29	SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation	Mutian Xu et.al.	2311.17707v1	null
2023-11-29	Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes	Pavel Korshunov et.al.	2311.17655v1	null
2023-11-29	Continual Learning with Low Rank Adaptation	Martin Wistuba et.al.	2311.17601v1	null
2023-11-29	HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models	Shen Zhang et.al.	2311.17528v1	null
2023-11-28	Self-Supervised Motion Magnification by Backpropagating Through Optical Flow	Zhaoying Pan et.al.	2311.17056v1	null
2023-11-28	Surf-D: High-Quality Surface Generation for Arbitrary Topologies using Diffusion Models	Zhengming Yu et.al.	2311.17050v1	null
2023-11-28	MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training	Pavan Kumar Anasosalu Vasu et.al.	2311.17049v1	null
2023-11-28	Debiasing Multimodal Models via Causal Information Minimization	Vaidehi Patil et.al.	2311.16941v1	link
2023-11-28	LLaFS: When Large-Language Models Meet Few-Shot Segmentation	Lanyun Zhu et.al.	2311.16926v1	link
2023-11-28	The Falcon Series of Open Language Models	Ebtesam Almazrouei et.al.	2311.16867v1	null
2023-11-28	As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors	Seungwoo Yoo et.al.	2311.16739v1	null
2023-11-28	CLAP: Contrastive Learning with Augmented Prompts for Robustness on Pretrained Vision-Language Models	Yichao Cai et.al.	2311.16445v1	null
2023-11-28	Text-Driven Image Editing via Learnable Regions	Yuanze Lin et.al.	2311.16432v1	link
2023-11-28	Manifold Preserving Guided Diffusion	Yutong He et.al.	2311.16424v1	null
2023-11-27	On Bringing Robots Home	Nur Muhammad Mahi Shafiullah et.al.	2311.16098v1	link
2023-11-27	ViT-Lens-2: Gateway to Omni-modal Intelligence	Weixian Lei et.al.	2311.16081v1	link
2023-11-27	MEDITRON-70B: Scaling Medical Pretraining for Large Language Models	Zeming Chen et.al.	2311.16079v1	link
2023-11-27	Exploring Attribute Variations in Style-based GANs using Diffusion Models	Rishubh Parihar et.al.	2311.16052v1	null
2023-11-27	Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers	Yilun Liu et.al.	2311.15983v1	null
2023-11-27	YUAN 2.0: A Large Language Model with Localized Filtering-based Attention	Shaohua Wu et.al.	2311.15786v1	link
2023-11-27	Enhancing Diffusion Models with Text-Encoder Reinforcement Learning	Chaofeng Chen et.al.	2311.15657v1	link
2023-11-27	ET3D: Efficient Text-to-3D Generation via Multi-View Distillation	Yiming Chen et.al.	2311.15561v1	null
2023-11-27	Dataset Distillation in Latent Space	Yuxuan Duan et.al.	2311.15547v1	null
2023-11-27	AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image	Divya Kothandaraman et.al.	2311.15478v1	null
2023-11-24	Calibrated Language Models Must Hallucinate	Adam Tauman Kalai et.al.	2311.14648v1	null
2023-11-24	tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models	Francesco Paissan et.al.	2311.14517v1	null
2023-11-24	LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design	Niklas Dobberstein et.al.	2311.14407v1	null
2023-11-24	ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution	Milan Straka et.al.	2311.14391v1	null
2023-11-24	Binarized 3D Whole-body Human Mesh Recovery	Zhiteng Li et.al.	2311.14323v1	link
2023-11-24	ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation	Yuheng Xue et.al.	2311.14262v1	null
2023-11-23	Learning to Solve Inverse Problems for Perceptual Sound Matching	Han Han et.al.	2311.14213v1	null
2023-11-23	Hardware Resilience Properties of Text-Guided Image Classifiers	Syed Talal Wasim et.al.	2311.14062v1	link
2023-11-23	Dialogue Quality and Emotion Annotations for Customer Support Conversations	John Mendonça et.al.	2311.13910v1	link
2023-11-23	General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level	Bingkang Shi et.al.	2311.13892v1	link
2023-11-22	Medical Image Retrieval Using Pretrained Embeddings	Farnaz Khun Jush et.al.	2311.13547v1	null
2023-11-22	Revisiting Machine Learning based Test Case Prioritization for Continuous Integration	Yifan Zhao et.al.	2311.13413v1	link
2023-11-22	High-Quality Face Caricature via Style Translation	Lamyanba Laishram et.al.	2311.13338v1	null
2023-11-22	FedFN: Feature Normalization for Alleviating Data Heterogeneity Problem in Federated Learning	Seongyoon Kim et.al.	2311.13267v1	null
2023-11-22	On the Calibration of Large Language Models and Alignment	Chiwei Zhu et.al.	2311.13240v1	null
2023-11-22	Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI	Annesha Ghosh et.al.	2311.13177v1	link
2023-11-22	GENET: Unleashing the Power of Side Information for Recommendation via Hypergraph Pre-training	Yang Li et.al.	2311.13121v1	null
2023-11-21	Diffusion Model Alignment Using Direct Preference Optimization	Bram Wallace et.al.	2311.12908v1	null
2023-11-21	Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks	Samyak Jain et.al.	2311.12786v1	null
2023-11-21	Oasis: Data Curation and Assessment System for Pretraining of Large Language Models	Tong Zhou et.al.	2311.12537v1	link
2023-11-21	Adapting pretrained speech model for Mandarin lyrics transcription and alignment	Jun-You Wang et.al.	2311.12488v1	null
2023-11-21	PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords	Panyut Sriwirote et.al.	2311.12475v1	null
2023-11-21	Malicious URL Detection via Pretrained Language Model Guided Multi-Level Feature Attention Network	Ruitong Liu et.al.	2311.12372v1	null
2023-11-21	A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series	Trang H. Tran et.al.	2311.12290v1	null
2023-11-21	ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science	Sai Munikoti et.al.	2311.12289v1	null
2023-11-21	Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls	Weihan Xu et.al.	2311.12257v1	null
2023-11-20	LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning	Han Guo et.al.	2311.12023v1	link
2023-11-20	Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues	Sumire Honda et.al.	2311.11976v1	link
2023-11-20	Adaptive Training Distributions with Scalable Online Bilevel Optimization	David Grangier et.al.	2311.11973v1	null
2023-11-20	LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Gongwei Chen et.al.	2311.11860v1	null
2023-11-20	Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule	Andrey Bout et.al.	2311.11813v1	null
2023-11-20	KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model	Lei Geng et.al.	2311.11564v1	link
2023-11-20	A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records	Lin Lawrence Guo et.al.	2311.11483v1	null
2023-11-19	Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks	Abdalgader Abubaker et.al.	2311.11368v1	null
2023-11-19	Pair-wise Layer Attention with Spatial Masking for Video Prediction	Ping Li et.al.	2311.11289v1	link
2023-11-19	Uncertainty quantification for noisy inputs-outputs in physics-informed neural networks and neural operators	Zongren Zou et.al.	2311.11262v1	null
2023-11-17	Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2	Hamish Ivison et.al.	2311.10702v1	null
2023-11-17	Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads	Yi Yang et.al.	2311.10395v1	null
2023-11-17	Leveraging Function Space Aggregation for Federated Learning at Scale	Nikita Dhawan et.al.	2311.10291v1	null
2023-11-17	Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2	Ambri Ma et.al.	2311.10266v1	null
2023-11-16	Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study	Maike Züfle et.al.	2311.10236v1	link
2023-11-16	Self-supervised learning of multi-omics embeddings in the low-label, high-data regime	Christian John Hurry et.al.	2311.09962v1	null
2023-11-16	Overcoming Data Scarcity in Biomedical Imaging with a Foundational Multi-Task Model	Raphael Schäfer et.al.	2311.09847v1	null
2023-11-16	Investigating Data Contamination in Modern Benchmarks for Large Language Models	Chunyuan Deng et.al.	2311.09783v1	null
2023-11-16	Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders	Hyunji Lee et.al.	2311.09765v1	link
2023-11-16	Translation Aligned Sentence Embeddings for Turkish Language	Eren Unlu et.al.	2311.09748v1	null
2023-11-16	Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness	Ashim Gupta et.al.	2311.09694v1	null
2023-11-16	Augmenting Unsupervised Reinforcement Learning with Self-Reference	Andrew Zhao et.al.	2311.09692v1	null
2023-11-16	Evolving Domain Adaptation of Pretrained Language Models for Text Classification	Yun-Shiuan Chuang et.al.	2311.09661v1	null
2023-11-16	Efficient End-to-End Visual Document Understanding with Rationale Distillation	Wang Zhu et.al.	2311.09612v1	null
2023-11-16	Enchancing Semi-Supervised Learning for Extractive Summarization with an LLM-based pseudolabeler	Gaurav Sahu et.al.	2311.09559v1	null
2023-11-15	Single-Image 3D Human Digitization with Shape-Guided Diffusion	Badour AlBahar et.al.	2311.09221v1	null
2023-11-15	TableLlama: Towards Open Large Generalist Models for Tables	Tianshu Zhang et.al.	2311.09206v1	null
2023-11-15	Do Localization Methods Actually Localize Memorized Data in LLMs?	Ting-Yun Chang et.al.	2311.09060v1	null
2023-11-15	Data Similarity is Not Enough to Explain Language Model Performance	Gregory Yauney et.al.	2311.09006v1	link
2023-11-15	Incremental Object-Based Novelty Detection with Feedback Loop	Simone Caldarella et.al.	2311.09004v1	null
2023-11-15	OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining	Yihong Liu et.al.	2311.08849v1	null
2023-11-15	An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping	Keith Harrigian et.al.	2311.08687v1	null
2023-11-15	Multistage Collaborative Knowledge Distillation from Large Language Models	Jiachen Zhao et.al.	2311.08640v1	null
2023-11-14	Unsupervised segmentation of irradiation-induced order-disorder phase transitions in electron microscopy	Arman H Ter-Petrosyan et.al.	2311.08585v1	null
2023-11-14	UT5: Pretraining Non autoregressive T5 with unrolled denoising	Mahmoud G. Salem et.al.	2311.08552v1	null
2023-11-14	Open-vocabulary keyword spotting in any language through multilingual contrastive speech-phoneme pretraining	Jian Zhu et.al.	2311.08323v1	null
2023-11-14	ARTEMIS: Using GANs with Multiple Discriminators to Generate Art	James Baker et.al.	2311.08278v1	null
2023-11-14	Investigating the Encoding of Words in BERT's Neurons using Feature Textualization	Tanja Baeumel et.al.	2311.08240v1	null
2023-11-14	Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models	Xinwei Li et.al.	2311.08213v1	null
2023-11-14	A Survey on Language Models for Code	Ziyin Zhang et.al.	2311.07989v1	link
2023-11-14	Test-Time Training for Semantic Segmentation with Output Contrastive Loss	Yunlong Zhang et.al.	2311.07877v1	link
2023-11-14	Probing clustering in neural network representations	Thao Nguyen et.al.	2311.07864v1	null
2023-11-14	Overview of the TREC 2023 Product Product Search Track	Daniel Campos et.al.	2311.07861v1	null
2023-11-14	Learning Mutually Informed Representations for Characters and Subwords	Yilin Wang et.al.	2311.07853v1	null
2023-11-13	IruMozhi: Automatically classifying diglossia in Tamil	Kabilan Prasanna et.al.	2311.07804v1	null
2023-11-13	Masked Face Dataset Generation and Masked Face Recognition	Rui Cai et.al.	2311.07475v1	link
2023-11-13	Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse	Ang Lv et.al.	2311.07468v1	null
2023-11-13	Language Grounded QFormer for Efficient Vision Language Understanding	Moulik Choraria et.al.	2311.07449v1	null
2023-11-13	Hallucination Augmented Recitations for Language Models	Abdullatif Köksal et.al.	2311.07424v1	null
2023-11-13	Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning	Felix den Breejen et.al.	2311.07343v1	null
2023-11-13	On Elastic Language Models	Chen Zhang et.al.	2311.07204v1	null
2023-11-13	Developing a Named Entity Recognition Dataset for Tagalog	Lester James V. Miranda et.al.	2311.07161v1	null
2023-11-13	SpectralGPT: Spectral Foundation Model	Danfeng Hong et.al.	2311.07113v1	null
2023-11-13	ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models	Ilker Kesen et.al.	2311.07022v1	null
2023-11-12	Concept-wise Fine-tuning Matters in Preventing Negative Transfer	Yunqiao Yang et.al.	2311.06868v1	null
2023-11-10	BanglaBait: Semi-Supervised Adversarial Approach for Clickbait Detection on Bangla Clickbait Dataset	Md. Motahar Mahtab et.al.	2311.06204v1	link
2023-11-10	FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores	Daniel Y. Fu et.al.	2311.05908v1	null
2023-11-10	AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems	Sasu Tarkoma et.al.	2311.05842v1	null
2023-11-09	Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter	Georgios Tziafas et.al.	2311.05779v1	link
2023-11-09	Efficiently Adapting Pretrained Language Models To New Languages	Zoltan Csaki et.al.	2311.05741v1	null
2023-11-09	Window Attention is Bugged: How not to Interpolate Position Embeddings	Daniel Bolya et.al.	2311.05613v1	null
2023-11-09	Mirror: A Universal Framework for Various Information Extraction Tasks	Tong Zhu et.al.	2311.05419v1	link
2023-11-09	Improving Hand Recognition in Uncontrolled and Uncooperative Environments using Multiple Spatial Transformers and Loss Functions	Wojciech Michal Matkowski et.al.	2311.05383v1	null
2023-11-09	ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image	Senthil Purushwalkam et.al.	2311.05230v1	null
2023-11-08	Interpreting Pretrained Language Models via Concept Bottlenecks	Zhen Tan et.al.	2311.05014v1	null
2023-11-08	Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search	Siao Tang et.al.	2311.04950v1	null
2023-11-08	Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models	Rocktim Jyoti Das et.al.	2311.04902v1	link
2023-11-08	Towards Few-Annotation Learning in Computer Vision: Application to Image Classification and Object Detection tasks	Quentin Bouniot et.al.	2311.04888v1	null
2023-11-08	DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining	Martin Kuo et.al.	2311.04799v1	null
2023-11-08	Training CLIP models on Data from Scientific Papers	Calvin Metzger et.al.	2311.04711v1	link
2023-11-08	Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection	Akshit Jindal et.al.	2311.04588v1	link
2023-11-08	Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures	Julius Steuer et.al.	2311.04547v1	null
2023-11-07	3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features	Chenfeng Xu et.al.	2311.04391v1	null
2023-11-07	Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning	Sai Munikoti et.al.	2311.04348v1	null
2023-11-07	Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning	Rishabh Jain et.al.	2311.04313v1	null
2023-11-07	Selective Visual Representations Improve Convergence and Generalization for Embodied AI	Ainaz Eftekhar et.al.	2311.04193v1	null
2023-11-07	Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection	Benjamin Steenhoek et.al.	2311.04109v1	null
2023-11-07	Language Representation Projection: Can We Transfer Factual Knowledge across Languages in Multilingual Language Models?	Shaoyang Xu et.al.	2311.03788v1	null
2023-11-07	Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning	Sarkar Snigdha Sarathi Das et.al.	2311.03748v1	link
2023-11-07	Analysis of the User Perception of Chatbots in Education Using A Partial Least Squares Structural Equation Modeling Approach	Md Rabiul Hasan et.al.	2311.03636v1	null
2023-11-06	Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization	Kun Lei et.al.	2311.03351v1	null
2023-11-06	A Foundation Model for Music Informatics	Minz Won et.al.	2311.03318v1	link
2023-11-07	S-LoRA: Serving Thousands of Concurrent LoRA Adapters	Ying Sheng et.al.	2311.03285v2	link
2023-11-06	An Efficient Self-Supervised Cross-View Training For Sentence Embedding	Peerat Limkonchotiwat et.al.	2311.03228v1	link
2023-11-06	LDM3D-VR: Latent Diffusion Model for 3D VR	Gabriela Ben Melech Stan et.al.	2311.03226v1	null
2023-11-06	A Simple yet Efficient Ensemble Approach for AI-generated Text Detection	Harika Abburi et.al.	2311.03084v1	null
2023-11-06	CogVLM: Visual Expert for Pretrained Language Models	Weihan Wang et.al.	2311.03079v1	link
2023-11-06	SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet	Maurice Günder et.al.	2311.03076v1	null
2023-11-06	Masking Hyperspectral Imaging Data with Pretrained Models	Elias Arbash et.al.	2311.03053v1	null
2023-11-06	The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning	Artyom Gadetsky et.al.	2311.02940v1	link
2023-11-03	Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation	Shichao Dong et.al.	2311.01989v1	null
2023-11-03	Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials	Xiaoxiao Wang et.al.	2311.01987v1	null
2023-11-03	The language of prompting: What linguistic properties make a prompt successful?	Alina Leidinger et.al.	2311.01967v1	null
2023-11-03	ForecastPFN: Synthetically-Trained Zero-Shot Forecasting	Samuel Dooley et.al.	2311.01933v1	link
2023-11-03	Towards Concept-Aware Large Language Models	Chen Shani et.al.	2311.01866v1	null
2023-11-03	TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine	Guoxing Yang et.al.	2311.01786v1	null
2023-11-03	Data-Free Distillation of Language Model by Text-to-Text Transfer	Zheyuan Bai et.al.	2311.01689v1	null
2023-11-02	Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models	Andy Zhou et.al.	2311.01441v1	link
2023-11-02	Recognize Any Regions	Haosen Yang et.al.	2311.01373v1	null
2023-11-02	Collaborative Large Language Model for Recommender Systems	Yaochen Zhu et.al.	2311.01343v1	link
2023-11-02	Terrain-Informed Self-Supervised Learning: Enhancing Building Footprint Extraction from LiDAR Data with Limited Annotations	Anuja Vats et.al.	2311.01188v1	null
2023-11-02	Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance	Song Wang et.al.	2311.01108v1	null
2023-11-02	Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning	Jiwan Hur et.al.	2311.01018v1	null
2023-11-02	VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning	Hong Chen et.al.	2311.00990v1	null
2023-11-02	MAAIG: Motion Analysis And Instruction Generation	Wei-Hsin Yeh et.al.	2311.00980v1	null
2023-11-02	Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation	Wanyu Du et.al.	2311.00953v1	null
2023-11-02	Learning Defect Prediction from Unrealistic Data	Kamel Alrashedy et.al.	2311.00931v1	null
2023-11-01	Crosslingual Retrieval Augmented In-context Learning for Bangla	Xiaoqian Li et.al.	2311.00587v1	null
2023-11-01	MNN: Mixed Nearest-Neighbors for Self-Supervised Learning	Chen Peng et.al.	2311.00562v1	link
2023-11-01	Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements	Peter A. Zachares et.al.	2311.00444v1	null
2023-11-01	An analysis of large speech models-based representations for speech emotion recognition	Adrian Bogdan Stânea et.al.	2311.00394v1	null
2023-11-01	fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding	Xuelin Qian et.al.	2311.00342v1	null
2023-11-01	Syntactic Inductive Bias in Transformer Language Models: Especially Helpful for Low-Resource Languages?	Luke Gessler et.al.	2311.00268v1	null
2023-11-01	ChatGPT-Powered Hierarchical Comparisons for Image Classification	Zhiyuan Ren et.al.	2311.00206v1	link
2023-10-31	Object-centric Video Representation for Long-term Action Anticipation	Ce Zhang et.al.	2311.00180v1	link
2023-10-31	ChipNeMo: Domain-Adapted LLMs for Chip Design	Mingjie Liu et.al.	2311.00176v1	null
2023-10-31	Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data	Antonis Antoniades et.al.	2311.00136v1	null
2023-10-31	Vanishing Gradients in Reinforcement Finetuning of Language Models	Noam Razin et.al.	2310.20703v1	null
2023-10-31	The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation	Evgeniia Tokarchuk et.al.	2310.20620v1	null
2023-10-31	Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building	Omar Momen et.al.	2310.20589v1	null
2023-10-31	Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT	Aman Jaiswal et.al.	2310.20558v1	null
2023-10-31	CapsFusion: Rethinking Image-Text Data at Scale	Qiying Yu et.al.	2310.20550v1	null
2023-10-31	HWD: A Novel Evaluation Score for Styled Handwritten Text Generation	Vittorio Pippi et.al.	2310.20316v1	link
2023-10-31	AutoMixer for Improved Multivariate Time-Series Forecasting on BizITOps Data	Santosh Palaskar et.al.	2310.20280v1	null
2023-10-31	DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models	Xinwei Wu et.al.	2310.20138v1	null
2023-10-31	Improving Prompt Tuning with Learned Prompting Layers	Wei Zhu et.al.	2310.20127v1	null
2023-10-30	GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models	Mianchu Wang et.al.	2310.20025v1	null
2023-10-30	LitCab: Lightweight Calibration of Language Models on Outputs of Varied Lengths	Xin Liu et.al.	2310.19208v1	link
2023-10-29	Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics	Valerio Francesco Puglisi et.al.	2310.19081v1	null
2023-10-29	Prompt-Engineering and Transformer-based Question Generation and Evaluation	Rubaba Amyeen et.al.	2310.18867v1	null
2023-10-28	Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data	Pramit Saha et.al.	2310.18815v1	null
2023-10-28	ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting	Abdellah El Mekki et.al.	2310.18778v1	link
2023-10-28	Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation	Jiahui Chen et.al.	2310.18760v1	null
2023-10-28	Probing LLMs for Joint Encoding of Linguistic Categories	Giulio Starace et.al.	2310.18696v1	link
2023-10-28	Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing	Yi Wang et.al.	2310.18653v1	link
2023-10-28	Local-Global Self-Supervised Visual Representation Learning	Ali Javidani et.al.	2310.18651v1	link
2023-10-28	Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots	Ruixiang Tang et.al.	2310.18633v1	null
2023-10-27	Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN	Neeraj Kumar et.al.	2310.18169v1	null
2023-10-27	Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots	Xintao Wang et.al.	2310.17976v1	link
2023-10-27	FaultSeg Swin-UNETR: Transformer-Based Self-Supervised Pretraining Model for Fault Recognition	Zeren Zhang et.al.	2310.17974v1	null
2023-10-27	Multivessel Coronary Artery Segmentation and Stenosis Localisation using Ensemble Learning	Muhammad Bilal et.al.	2310.17954v1	null
2023-10-27	Transformers as Graph-to-Graph Models	James Henderson et.al.	2310.17936v1	link
2023-10-27	Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering	Zijie Song et.al.	2310.17869v1	null
2023-10-26	StyleBART: Decorate Pretrained Model with Style Adapters for Unsupervised Stylistic Headline Generation	Hanqing Wang et.al.	2310.17743v1	null
2023-10-26	Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model	Karsten Roth et.al.	2310.17653v1	null
2023-10-26	InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Language Model-based Instruction Operators	Heng Yang et.al.	2310.17630v1	link
2023-10-26	Proving Test Set Contamination in Black Box Language Models	Yonatan Oren et.al.	2310.17623v1	null
2023-10-26	Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways	Venkata S Govindarajan et.al.	2310.17591v1	link
2023-10-26	PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent	Guangliang Liu et.al.	2310.17588v1	null
2023-10-26	Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models	Laura Cabello et.al.	2310.17530v1	link
2023-10-26	Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge	Tanel Alumäe et.al.	2310.17448v1	null
2023-10-26	AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors	You-Ming Chang et.al.	2310.17419v1	null
2023-10-26	CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling	Seyedmorteza Sadat et.al.	2310.17347v1	null
2023-10-26	Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification	Jiachen Li et.al.	2310.17218v1	null
2023-10-25	Proposal-Contrastive Pretraining for Object Detection from Fewer Data	Quentin Bouniot et.al.	2310.16835v1	null
2023-10-25	Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation	Yongxin Shi et.al.	2310.16809v1	link
2023-10-25	Detecting Pretraining Data from Large Language Models	Weijia Shi et.al.	2310.16789v1	null
2023-10-25	BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?	Xingmeng Zhao et.al.	2310.16681v1	link
2023-10-25	Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models	Oren Barkan et.al.	2310.16584v1	link
2023-10-25	The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining	Ting-Rui Chiang et.al.	2310.16261v1	null
2023-10-24	Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation	AbdelRahim Elmadany et.al.	2310.16127v1	null
2023-10-24	Locally Differentially Private Document Generation Using Zero Shot Prompting	Saiteja Utpala et.al.	2310.16111v1	null
2023-10-24	A Unified, Scalable Framework for Neural Population Decoding	Mehdi Azabou et.al.	2310.16046v1	null
2023-10-24	Finetuning Offline World Models in the Real World	Yunhai Feng et.al.	2310.16029v1	null
2023-10-24	Characterizing Mechanisms for Factual Recall in Language Models	Qinan Yu et.al.	2310.15910v1	null
2023-10-24	Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition	Alan Cowap et.al.	2310.15904v1	link
2023-10-24	Automatic Aorta Segmentation with Heavily Augmented, High-Resolution 3-D ResUNet: Contribution to the SEG.A Challenge	Marek Wodzinski et.al.	2310.15827v1	null
2023-10-24	Discriminator Guidance for Autoregressive Diffusion Models	Filip Ekström Kelvinius et.al.	2310.15817v1	null
2023-10-24	Improving generalization in large language models by learning prefix subspaces	Louis Falissard et.al.	2310.15793v1	null
2023-10-24	Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework	Weixi Weng et.al.	2310.15646v1	null
2023-10-24	Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks	Sunit Bhattacharya et.al.	2310.15552v1	null
2023-10-24	Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling	Pritom Saha Akash et.al.	2310.15420v1	null
2023-10-23	FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling	Haonan Qiu et.al.	2310.15169v1	null
2023-10-23	Novel-View Acoustic Synthesis from 3D Reconstructed Rooms	Byeongjoo Ahn et.al.	2310.15130v1	link
2023-10-23	Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model	Ruoxi Shi et.al.	2310.15110v1	link
2023-10-23	E4S: Fine-grained Face Swapping via Editing With Regional GAN Inversion	Maomao Li et.al.	2310.15081v1	link
2023-10-23	SLOG: A Structural Generalization Benchmark for Semantic Parsing	Bingzhi Li et.al.	2310.15040v1	null
2023-10-23	Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time Projection Chamber Data	Yi Huang et.al.	2310.15026v1	null
2023-10-23	Once Upon a $\textit{Time}$ in $\textit{Graph}$: Relative-Time Pretraining for Complex Temporal Reasoning	Sen Yang et.al.	2310.14709v1	null
2023-10-23	Extending Input Contexts of Language Models through Training on Segmented Sequences	Petros Karypis et.al.	2310.14633v1	null
2023-10-23	Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering	Tam Minh Vo et.al.	2310.14602v1	null
2023-10-23	The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages	Chiyu Zhang et.al.	2310.14557v1	null
2023-10-20	Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation	Damian Sójka et.al.	2310.13533v1	null
2023-10-20	Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models	Ilias Stogiannidis et.al.	2310.13395v1	null
2023-10-20	SILC: Improving Vision Language Pretraining with Self-Distillation	Muhammad Ferjad Naeem et.al.	2310.13355v1	null
2023-10-20	Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models	Jaeyoung Choe et.al.	2310.13312v1	null
2023-10-20	Unified Pretraining for Recommendation via Task Hypergraphs	Mingdai Yang et.al.	2310.13286v1	link
2023-10-20	DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics	Kaiwen Zheng et.al.	2310.13268v1	link
2023-10-20	On the Language Encoder of Contrastive Cross-modal Models	Mengjie Zhao et.al.	2310.13267v1	null
2023-10-20	Anomaly Detection of Command Shell Sessions based on DistilBERT: Unsupervised and Supervised Approaches	Zefang Liu et.al.	2310.13247v1	null
2023-10-19	A Car Model Identification System for Streamlining the Automobile Sales Process	Said Togru et.al.	2310.13198v1	null
2023-10-19	Do Language Models Learn about Legal Entity Types during Pretraining?	Claire Barale et.al.	2310.13092v1	link
2023-10-19	A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models	Yi Zhou et.al.	2310.12936v1	null
2023-10-19	Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning	Juan Rocamonde et.al.	2310.12921v1	null
2023-10-19	A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems	Songbo Hu et.al.	2310.12892v1	null
2023-10-19	Predicting Ovarian Cancer Treatment Response in Histopathology using Hierarchical Vision Transformers and Multiple Instance Learning	Jack Breen et.al.	2310.12866v1	link
2023-10-19	Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning	Han Zhou et.al.	2310.12774v1	link
2023-10-19	Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding	Yuanxing Xu et.al.	2310.12724v1	null
2023-10-19	Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining	Yuxin Wang et.al.	2310.12670v1	null
2023-10-19	Pretraining Language Models with Text-Attributed Heterogeneous Graphs	Tao Zou et.al.	2310.12580v1	link
2023-10-19	Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models	Wenxuan Wang et.al.	2310.12481v1	null
2023-10-19	Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping	Zijie Pan et.al.	2310.12474v1	link
2023-10-18	Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture	Daniel Y. Fu et.al.	2310.12109v1	null
2023-10-18	Evaluating the Fairness of Discriminative Foundation Models in Computer Vision	Junaid Ali et.al.	2310.11867v1	null
2023-10-18	Masked Pretraining for Multi-Agent Decision Making	Jie Liu et.al.	2310.11846v1	null
2023-10-18	Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features	Hangbin Lee et.al.	2310.11654v1	null
2023-10-18	Systematic Assessment of Factual Knowledge in Large Language Models	Linhao Luo et.al.	2310.11638v1	null
2023-10-17	GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment	Dhruba Ghosh et.al.	2310.11513v1	link
2023-10-17	Hybrid quantum-classical graph neural networks for tumor classification in digital pathology	Anupama Ray et.al.	2310.11353v1	null
2023-10-17	Elucidating The Design Space of Classifier-Guided Diffusion Generation	Jiajun Ma et.al.	2310.11311v1	null
2023-10-17	Utilizing Weak Supervision To Generate Indonesian Conservation Dataset	Mega Fransiska et.al.	2310.11258v1	null
2023-10-17	Query2Triple: Unified Query Encoding for Answering Diverse Complex Queries over Knowledge Graphs	Yao Xu et.al.	2310.11246v1	link
2023-10-17	Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion	Xueyao Zhang et.al.	2310.11160v1	null
2023-10-17	MeKB-Rec: Personal Knowledge Graph Learning for Cross-Domain Recommendation	Xin Su et.al.	2310.11088v1	null
2023-10-17	Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters	Gyuseong Lee et.al.	2310.11031v1	null
2023-10-16	SD-HuBERT: Self-Distillation Induces Syllabic Organization in HuBERT	Cheol Jun Cho et.al.	2310.10803v1	null
2023-10-16	LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation	Ruiqi Wu et.al.	2310.10769v1	null
2023-10-16	BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys	Yu Gu et.al.	2310.10765v1	null
2023-10-16	Interactive Task Planning with Language Models	Boyi Li et.al.	2310.10645v1	null
2023-10-16	Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models	Kevin Black et.al.	2310.10639v1	null
2023-10-16	In-Context Pretraining: Language Modeling Beyond Document Boundaries	Weijia Shi et.al.	2310.10638v1	null
2023-10-16	Llemma: An Open Language Model For Mathematics	Zhangir Azerbayev et.al.	2310.10631v1	link
2023-10-16	Video Language Planning	Yilun Du et.al.	2310.10625v1	null
2023-10-16	One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer	Fabian David Schmidt et.al.	2310.10532v1	link
2023-10-16	Unifying Image Processing as Visual Prompting Question Answering	Yihao Liu et.al.	2310.10513v1	null
2023-10-16	Can Word Sense Distribution Detect Semantic Changes of Words?	Xiaohang Tang et.al.	2310.10400v1	link
2023-10-16	$\textit{Swap and Predict}$ -- Predicting the Semantic Changes in Words across Corpora by Context Swapping	Taichi Aida et.al.	2310.10397v1	link
2023-10-16	Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models	Jirui Qi et.al.	2310.10378v1	link
2023-10-13	PromptRE: Weakly-Supervised Document-Level Relation Extraction via Prompting-Based Data Programming	Chufan Gao et.al.	2310.09265v1	null
2023-10-13	Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy	Anton Baryshnikov et.al.	2310.09247v1	link
2023-10-13	ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction	Jianghao Lin et.al.	2310.09234v1	null
2023-10-13	PaLI-3 Vision Language Models: Smaller, Faster, Stronger	Xi Chen et.al.	2310.09199v1	null
2023-10-13	Jointly-Learned Exit and Inference for a Dynamic Neural Network: JEI-DNN	Florence Regol et.al.	2310.09163v1	null
2023-10-13	UniParser: Multi-Human Parsing with Unified Correlation Representation Learning	Jiaming Chu et.al.	2310.08984v1	link
2023-10-13	Exploration with Principles for Diverse AI Supervision	Hao Liu et.al.	2310.08899v1	null
2023-10-13	Open X-Embodiment: Robotic Learning Datasets and RT-X Models	Abhishek Padalkar et.al.	2310.08864v1	null
2023-10-13	Speaking rate attention-based duration prediction for speed control TTS	Jesuraj Bandekar et.al.	2310.08846v1	null
2023-10-13	From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models	Dongsheng Jiang et.al.	2310.08825v1	link
2023-10-12	Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video	Shashanka Venkataramanan et.al.	2310.08584v1	null
2023-10-12	Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining	Licong Lin et.al.	2310.08566v1	null
2023-10-12	Do pretrained Transformers Really Learn In-context by Gradient Descent?	Lingfeng Shen et.al.	2310.08540v1	null
2023-10-12	"SegLoc": Study on Novel Visual Self-supervised Learning Scheme (Segment Localization) Tailored for Dense Prediction Tasks of Security Inspection X-ray Images	Shervin Halat et.al.	2310.08421v1	null
2023-10-12	How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?	Jingfeng Wu et.al.	2310.08391v1	null
2023-10-12	Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment	Boyang Xue et.al.	2310.08372v1	null
2023-10-12	GePSAn: Generative Procedure Step Anticipation in Cooking Videos	Mohamed Ashraf Abdelsalam et.al.	2310.08312v1	null
2023-10-12	CHIP: Contrastive Hierarchical Image Pretraining	Arpit Mittal et.al.	2310.08304v1	null
2023-10-12	Expanding the Vocabulary of BERT for Knowledge Base Construction	Dong Yang et.al.	2310.08291v1	link
2023-10-12	Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting	Zijie Chen et.al.	2310.08129v1	null
2023-10-11	InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining	Boxin Wang et.al.	2310.07713v1	null
2023-10-12	Rethinking the BERT-like Pretraining for DNA Sequences	Chaoqi Liang et.al.	2310.07644v2	null
2023-10-12	Multimodal Graph Learning for Generative Tasks	Minji Yoon et.al.	2310.07478v2	link
2023-10-12	NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining	Chenguo Lin et.al.	2310.07402v2	null
2023-10-11	CLIP for Lightweight Semantic Segmentation	Ke Jin et.al.	2310.07394v1	null
2023-10-11	Beyond Memorization: Violating Privacy Via Inference with Large Language Models	Robin Staab et.al.	2310.07298v1	null
2023-10-12	Score Regularized Policy Optimization through Diffusion Behavior	Huayu Chen et.al.	2310.07297v2	link
2023-10-11	IBoxCLA: Towards Robust Box-supervised Segmentation of Polyp via Improved Box-dice and Contrastive Latent-anchors	Zhiwei Wang et.al.	2310.07248v1	null
2023-10-11	Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs	Muhammad Asif Khan et.al.	2310.07245v1	null
2023-10-11	Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment	Bowen Gao et.al.	2310.07229v1	null
2023-10-10	OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text	Keiran Paster et.al.	2310.06786v1	null
2023-10-10	Uni3D: Exploring Unified 3D Representation at Scale	Junsheng Zhou et.al.	2310.06773v1	link
2023-10-10	Tweedie Moment Projected Diffusions For Inverse Problems	Benjamin Boys et.al.	2310.06721v1	null
2023-10-10	Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder	Bowen Jin et.al.	2310.06684v1	null
2023-10-10	Self-Supervised Representation Learning for Online Handwriting Text Classification	Pouya Mehralian et.al.	2310.06645v1	null
2023-10-10	SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network	Tianlong Li et.al.	2310.06488v1	null
2023-10-10	CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model	Peng Di et.al.	2310.06266v1	null
2023-10-10	Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction	Cheng Peng et.al.	2310.06239v1	null
2023-10-10	Domain Expansion via Network Adaptation for Solving Inverse Problems	Nebiyou Yismaw et.al.	2310.06235v1	null
2023-10-10	GeoLLM: Extracting Geospatial Knowledge from Large Language Models	Rohin Manvi et.al.	2310.06213v1	null
2023-10-09	TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models	Zuxin Liu et.al.	2310.05905v1	null
2023-10-09	Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning	Trevor McInroe et.al.	2310.05723v1	null
2023-10-09	A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics	Kai He et.al.	2310.05694v1	link
2023-10-09	No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling	Xuwei Xu et.al.	2310.05654v1	null
2023-10-09	UAVs and Neural Networks for search and rescue missions	Hartmut Surmann et.al.	2310.05512v1	null
2023-10-09	Sentence-level Prompts Benefit Composed Image Retrieval	Yang Bai et.al.	2310.05473v1	link
2023-10-09	Augmented Embeddings for Custom Retrievals	Anirudh Khatry et.al.	2310.05380v1	null
2023-10-08	Visual Storytelling with Question-Answer Plans	Danyang Liu et.al.	2310.05295v1	null
2023-10-08	Do Large Language Models Know about Facts?	Xuming Hu et.al.	2310.05177v1	null
2023-10-08	UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	Jiabo Ye et.al.	2310.05126v1	link
2023-10-06	Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection	Ziyun Cui et.al.	2310.04358v1	null
2023-10-06	A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks	Israt Jahan et.al.	2310.04270v1	null
2023-10-06	Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning	Yinda Chen et.al.	2310.04148v1	link
2023-10-06	Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation	Md Kaykobad Reza et.al.	2310.03986v1	null
2023-10-05	Hard View Selection for Contrastive Learning	Fabio Ferreira et.al.	2310.03940v1	null
2023-10-05	Bridging Low-level Geometry to High-level Concepts in Visual Servoing of Robot Manipulation Task Using Event Knowledge Graphs and Vision-Language Models	Chen Jiang et.al.	2310.03932v1	null
2023-10-05	Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks	Xu Luo et.al.	2310.03843v1	null
2023-10-05	PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction	Saifullah Saifullah et.al.	2310.03777v1	null
2023-10-05	Stylist: Style-Driven Feature Ranking for Robust Novelty Detection	Stefan Smeu et.al.	2310.03738v1	null
2023-10-05	Tik-to-Tok: Translating Language Models One Token at a Time: An Embedding Initialization Strategy for Efficient Language Adaptation	François Remy et.al.	2310.03477v1	null
2023-10-05	FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators	Haiping Wang et.al.	2310.03420v1	null
2023-10-05	Procedural Text Mining with Large Language Models	Anisa Rula et.al.	2310.03376v1	link
2023-10-05	Benchmarking Large Language Models As AI Research Agents	Qian Huang et.al.	2310.03302v1	link
2023-10-05	SimVLG: Simple and Efficient Pretraining of Visual Language Generative Models	Yiren Jian et.al.	2310.03291v1	null
2023-10-05	Fragment-based Pretraining and Finetuning on Molecular Graphs	Kha-Dinh Luong et.al.	2310.03274v1	null
2023-10-04	On the Performance of Multimodal Language Models	Utsav Garg et.al.	2310.03211v1	null
2023-10-04	Enhancing Accuracy in Deep Learning Using Random Matrix Theory	Leonid Berlyand et.al.	2310.03165v1	null
2023-10-04	OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials	Peter Eastman et.al.	2310.03121v1	null
2023-10-04	Retrieval meets Long Context Large Language Models	Peng Xu et.al.	2310.03025v1	null
2023-10-04	AstroCLIP: Cross-Modal Pre-Training for Astronomical Foundation Models	Francois Lanusse et.al.	2310.03024v1	null
2023-10-04	Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions	Satwik Bhattamishra et.al.	2310.03016v1	null
2023-10-04	Multiple Physics Pretraining for Physical Surrogate Models	Michael McCabe et.al.	2310.02994v1	null
2023-10-04	Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors	Ido Amos et.al.	2310.02980v1	null
2023-10-04	T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation	Yuze He et.al.	2310.02977v1	null
2023-10-04	Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation	Chen Dun et.al.	2310.02842v1	null
2023-10-03	Implicit regularization of multi-task learning and finetuning in overparameterized neural networks	Jack W. Lindsey et.al.	2310.02396v1	null
2023-10-04	Who's Harry Potter? Approximate Unlearning in LLMs	Ronen Eldan et.al.	2310.02238v2	null
2023-10-03	Think before you speak: Training Language Models With Pause Tokens	Sachin Goyal et.al.	2310.02226v1	null
2023-10-03	SIEVE: Multimodal Dataset Pruning Using Image Captioning Models	Anas Mahmoud et.al.	2310.02110v1	null
2023-10-03	Understanding Masked Autoencoders From a Local Contrastive Perspective	Xiaoyu Yue et.al.	2310.01994v1	null
2023-10-03	Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving	Long Chen et.al.	2310.01957v1	link
2023-10-03	MFOS: Model-Free & One-Shot Object Pose Estimation	JongMin Lee et.al.	2310.01897v1	null
2023-10-04	LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	Bin Zhu et.al.	2310.01852v2	link
2023-10-03	MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields	Takuhiro Kaneko et.al.	2310.01821v1	null
2023-10-03	SEA: Sparse Linear Attention with Estimated Attention Mask	Heejun Lee et.al.	2310.01777v1	null
2023-10-03	Backdiff: a diffusion model for generalized transferable protein backmapping	Yikai Liu et.al.	2310.01768v1	null
2023-10-02	L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models	Ansong Ni et.al.	2309.17446v2	null
2023-09-29	Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks	Vaidehi Patil et.al.	2309.17410v1	link
2023-09-29	Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency	Zhihan Liu et.al.	2309.17382v1	null
2023-09-29	Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation	Shih-Lun Wu et.al.	2309.17352v1	null
2023-09-29	Scaling Experiments in Self-Supervised Cross-Table Representation Learning	Maximilian Schambach et.al.	2309.17339v1	null
2023-09-29	Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors	Yukang Lin et.al.	2309.17261v1	null
2023-09-29	Glioma subtype classification from histopathological images using in-domain and out-of-domain transfer learning: An experimental study	Vladimir Despotovic et.al.	2309.17223v1	null
2023-09-29	Reconstruction of Patient-Specific Confounders in AI-based Radiologic Image Interpretation using Generative Pretraining	Tianyu Han et.al.	2309.17123v1	link
2023-09-28	Qwen Technical Report	Jinze Bai et.al.	2309.16609v1	link
2023-09-28	Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection	Manish Sharma et.al.	2309.16592v1	null
2023-09-28	Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects	Hui Zheng et.al.	2309.16457v1	null
2023-09-28	Predicting performance difficulty from piano sheet music images	Pedro Ramoneda et.al.	2309.16287v1	null
2023-09-28	Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR	Xugang Lu et.al.	2309.16093v1	null
2023-09-27	Effective Long-Context Scaling of Foundation Models	Wenhan Xiong et.al.	2309.16039v1	null
2023-09-27	Graph-level Representation Learning with Joint-Embedding Predictive Architectures	Geri Skenderi et.al.	2309.16014v1	null
2023-09-27	Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts	Deniz Engin et.al.	2309.15915v1	null
2023-09-27	One For All: Video Conversation is Feasible Without Video Instruction Tuning	Ruyang Liu et.al.	2309.15785v1	null
2023-09-27	Question answering using deep learning in low resource Indian language Marathi	Dhiraj Amin et.al.	2309.15779v1	null
2023-09-27	ChatGPT-BCI: Word-Level Neural State Classification Using GPT, EEG, and Eye-Tracking Biomarkers in Semantic Inference Reading Comprehension	Yuhong Zhang et.al.	2309.15714v1	null
2023-09-27	Jointly Training Large Autoregressive Multimodal Models	Emanuele Aiello et.al.	2309.15564v1	null
2023-09-27	High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models	Chunyu Qiang et.al.	2309.15512v1	null
2023-09-27	DreamCom: Finetuning Text-guided Inpainting Model for Image Composition	Lingxiao Lu et.al.	2309.15508v1	null
2023-09-27	VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning	Yanan Wang et.al.	2309.15494v1	null
2023-09-27	Tackling VQA with Pretrained Foundation Models without Further Training	Alvin De Jun Tan et.al.	2309.15487v1	null
2023-09-27	Towards Foundation Models Learned from Anatomy in Medical Imaging via Self-Supervision	Mohammad Reza Hosseinzadeh Taher et.al.	2309.15358v1	null
2023-09-26	SEPT: Towards Efficient Scene Representation Learning for Motion Prediction	Zhiqian Lan et.al.	2309.15289v1	null
2023-09-26	Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding	Christina Kassab et.al.	2309.15065v1	null
2023-09-25	When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs	Marialena Bevilacqua et.al.	2309.14488v1	null
2023-09-25	SINCERE: Supervised Information Noise-Contrastive Estimation REvisited	Patrick Feeney et.al.	2309.14277v1	link
2023-09-26	Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition	Wei He et.al.	2309.14183v2	null
2023-09-25	VidChapters-7M: Video Chapters at Scale	Antoine Yang et.al.	2309.13952v1	link
2023-09-25	TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning	Jing Zhu et.al.	2309.13885v1	null
2023-09-24	Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)	Guo-qing Jiang et.al.	2309.13681v1	null
2023-09-24	VoiceLDM: Text-to-Speech with Environmental Context	Yeonghyeon Lee et.al.	2309.13664v1	null
2023-09-24	Cross-modal Alignment with Optimal Transport for CTC-based ASR	Xugang Lu et.al.	2309.13650v1	null
2023-09-24	Robust data driven discovery of a seismic wave equation	Shijun Cheng et.al.	2309.13645v1	null
2023-09-24	Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset	Arthur Zhang et.al.	2309.13549v1	null
2023-09-24	InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation	Cho-Ying Wu et.al.	2309.13516v1	null
2023-09-22	A matter of attitude: Focusing on positive and active gradients to boost saliency maps	Oscar Llorente et.al.	2309.12913v1	link
2023-09-22	SRFNet: Monocular Depth Estimation with Fine-grained Structure via Spatial Reliability-oriented Fusion of Frames and Events	Tianbo Pan et.al.	2309.12842v1	null
2023-09-22	Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography	Rabin Adhikari et.al.	2309.12829v1	link
2023-09-22	Unsupervised Representations Improve Supervised Learning in Speech Emotion Recognition	Amirali Soltani Tehrani et.al.	2309.12714v1	null
2023-09-21	Studying and improving reasoning in humans and machines	Nicolas Yax et.al.	2309.12485v1	null
2023-09-21	Environment-biased Feature Ranking for Novelty Detection Robustness	Stefan Smeu et.al.	2309.12301v1	null
2023-09-21	Weakly-supervised Automated Audio Captioning via text only training	Theodoros Kouzelis et.al.	2309.12242v1	link
2023-09-21	Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval	Alberto Baldrati et.al.	2309.12110v1	null
2023-09-21	Accelerating Thematic Investment with Prompt Tuned Pretrained Language Models	Valentin Leonhard Buchner et.al.	2309.12075v1	null
2023-09-21	BELT: Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision	Jinzhao Zhou et.al.	2309.12056v1	null
2023-09-21	Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition	Xiaoyu Liu et.al.	2309.12042v1	link
2023-09-21	DEYOv3: DETR with YOLO for Real-time Object Detection	Haodong Ouyang et.al.	2309.11851v1	null
2023-09-21	Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues	Norbert Braunschweiler et.al.	2309.11838v1	null
2023-09-21	Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction	Yu Tian et.al.	2309.11811v1	link
2023-09-21	SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features	Zhaoyi Wang et.al.	2309.11791v1	null
2023-09-20	Galaxy Zoo DESI: Detailed Morphology Measurements for 8.7M Galaxies in the DESI Legacy Imaging Surveys	Mike Walmsley et.al.	2309.11425v1	link
2023-09-20	GECTurk: Grammatical Error Correction and Detection Dataset for Turkish	Atakan Kara et.al.	2309.11346v1	link
2023-09-21	Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism	Chengcheng Wang et.al.	2309.11331v2	link
2023-09-20	Uncovering the effects of model initialization on deep model generalization: A study with adult and pediatric Chest X-ray images	Sivaramakrishnan Rajaraman et.al.	2309.11318v1	null
2023-09-20	Using Artificial Intelligence for the Automation of Knitting Patterns	Uduak Uboh et.al.	2309.11202v1	null
2023-09-20	Assessment of Pre-Trained Models Across Languages and Grammars	Alberto Muñoz-Ortiz et.al.	2309.11165v1	link
2023-09-20	Hyperspectral Benchmark: Bridging the Gap between HSI Applications through Comprehensive Dataset and Pretraining	Hannah Frank et.al.	2309.11122v1	link
2023-09-20	Visual Question Answering in the Medical Domain	Louisa Canepa et.al.	2309.11080v1	null
2023-09-20	Weak Supervision for Label Efficient Visual Bug Detection	Farrukh Rahman et.al.	2309.11077v1	null
2023-09-20	3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images	Yifu Zhang et.al.	2309.11015v1	null
2023-09-19	Motif-Centric Representation Learning for Symbolic Music	Yuxuan Wu et.al.	2309.10597v1	null
2023-09-19	A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings	Danushka Bollegala et.al.	2309.10551v1	null
2023-09-19	OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement	Yang Gao et.al.	2309.10539v1	link
2023-09-19	FoleyGen: Visually-Guided Audio Generation	Xinhao Mei et.al.	2309.10537v1	null
2023-09-19	Improving CLIP Robustness with Knowledge Distillation and Self-Training	Clement Laroudie et.al.	2309.10361v1	null
2023-09-19	KoBigBird-large: Transformation of Transformer for Korean Language Understanding	Kisu Yang et.al.	2309.10339v1	null
2023-09-19	Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi	Md Nishat Raihan et.al.	2309.10272v1	null
2023-09-18	Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties	Wei Lu et.al.	2309.10170v1	null
2023-09-18	Understanding Catastrophic Forgetting in Language Models via Implicit Inference	Suhas Kotha et.al.	2309.10105v1	link
2023-09-18	Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents	Ziyi Yang et.al.	2309.09919v1	null
2023-09-19	Harnessing Collective Intelligence Under a Lack of Cultural Consensus	Necdet Gürkan et.al.	2309.09787v2	null
2023-09-18	DGM-DR: Domain Generalization with Mutual Information Regularized Diabetic Retinopathy Classification	Aleksandr Matsun et.al.	2309.09670v1	null
2023-09-18	DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation	Bowen Yin et.al.	2309.09668v1	link
2023-09-18	Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders	Lester Phillip Violeta et.al.	2309.09627v1	null
2023-09-18	PromptST: Prompt-Enhanced Spatio-Temporal Multi-Attribute Prediction	Zijian Zhang et.al.	2309.09500v1	null
2023-09-18	Self-supervised TransUNet for Ultrasound regional segmentation of the distal radius in children	Yuyue Zhou et.al.	2309.09490v1	null
2023-09-18	Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment	Zheng-Yan Sheng et.al.	2309.09470v1	null
2023-09-18	Investigating Zero- and Few-shot Generalization in Fact Verification	Liangming Pan et.al.	2309.09444v1	link
2023-09-18	Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information	Tianjun Mao et.al.	2309.09421v1	null
2023-09-15	How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?	Danni Liu et.al.	2309.08565v1	link
2023-09-15	Breathing New Life into 3D Assets with Generative Repainting	Tianfu Wang et.al.	2309.08523v1	link
2023-09-15	Scaling Laws for Sparsely-Connected Foundation Models	Elias Frantar et.al.	2309.08520v1	null
2023-09-15	Audio-free Prompt Tuning for Language-Audio Models	Yiming Li et.al.	2309.08357v1	null
2023-09-15	Headless Language Models: Learning without Predicting with Contrastive Weight Tying	Nathan Godey et.al.	2309.08351v1	null
2023-09-15	Leveraging the Power of Data Augmentation for Transformer-based Tracking	Jie Zhao et.al.	2309.08264v1	null
2023-09-15	BROW: Better featuRes fOr Whole slide image based on self-distillation	Yuanfeng Wu et.al.	2309.08259v1	null
2023-09-15	Fine-tune the pretrained ATST model for sound event detection	Nian Shao et.al.	2309.08153v1	link
2023-09-15	Multi-Scale Estimation for Omni-Directional Saliency Maps Using Learnable Equator Bias	Takao Yamanaka et.al.	2309.08139v1	link
2023-09-15	AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT	Fangbo Qin et.al.	2309.08134v1	null
2023-09-14	Physically Plausible Full-Body Hand-Object Interaction Synthesis	Jona Braun et.al.	2309.07907v1	null
2023-09-15	Virchow: A Million-Slide Digital Pathology Foundation Model	Eugene Vorontsov et.al.	2309.07778v2	null
2023-09-14	PerPLM: Personalized Fine-tuning of Pretrained Language Models via Writer-specific Intermediate Learning and Prompts	Daisuke Oba et.al.	2309.07727v1	null
2023-09-14	L1-aware Multilingual Mispronunciation Detection Framework	Yassine El Kheir et.al.	2309.07719v1	null
2023-09-14	NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches	Chi-en Amy Tai et.al.	2309.07704v1	null
2023-09-14	SwitchGPT: Adapting Large Language Models for Non-Text Outputs	Xinyu Wang et.al.	2309.07623v1	null
2023-09-14	VerilogEval: Evaluating Large Language Models for Verilog Code Generation	Mingjie Liu et.al.	2309.07544v1	null
2023-09-14	DePT: Decoupled Prompt Tuning	Ji Zhang et.al.	2309.07439v1	link
2023-09-14	Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images	Zhiyun Song et.al.	2309.07394v1	link
2023-09-14	Training Audio Captioning Models without Audio	Soham Deshmukh et.al.	2309.07372v1	link
2023-09-13	TransNet: A Transfer Learning-Based Network for Human Action Recognition	K. Alomar et.al.	2309.06951v1	null
2023-09-13	Enhancing the Performance of Multi-Agent Reinforcement Learning for Controlling HVAC Systems	Daniel Bayer et.al.	2309.06940v1	null
2023-09-14	VEATIC: Video-based Emotion and Affect Tracking in Context Dataset	Zhihang Ren et.al.	2309.06745v2	null
2023-09-13	VLSlice: Interactive Vision-and-Language Slice Discovery	Eric Slyman et.al.	2309.06703v1	link
2023-09-13	STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning	Palaash Agrawal et.al.	2309.06680v1	null
2023-09-12	Zero-Shot Visual Classification with Guided Cropping	Piyapat Saranrittichai et.al.	2309.06581v1	null
2023-09-12	Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning	Saarthak Kapse et.al.	2309.06439v1	null
2023-09-12	Learning to Predict Concept Ordering for Common Sense Generation	Tianhui Zhang et.al.	2309.06363v1	link
2023-09-12	360° from a Single Camera: A Few-Shot Approach for LiDAR Segmentation	Laurenz Reichardt et.al.	2309.06197v1	null
2023-09-12	Active Label Refinement for Semantic Segmentation of Satellite Images	Tuan Pham Minh et.al.	2309.06159v1	null
2023-09-12	Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection	Sophia Althammer et.al.	2309.06131v1	null
2023-09-12	Do PLMs Know and Understand Ontological Knowledge?	Weiqi Wu et.al.	2309.05936v1	link
2023-09-12	Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals	Ran Liu et.al.	2309.05927v1	null
2023-09-11	Natural Language Supervision for General-Purpose Audio Representations	Benjamin Elizalde et.al.	2309.05767v1	null
2023-09-11	Learning the Geodesic Embedding with Graph Neural Networks	Bo Pang et.al.	2309.05613v1	null
2023-09-11	Temporal Action Localization with Enhanced Instant Discriminability	Dingfeng Shi et.al.	2309.05590v1	link
2023-09-11	Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation	Anna Deichler et.al.	2309.05455v1	null
2023-09-11	Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP	Jinzuomu Zhong et.al.	2309.05423v1	null
2023-09-11	Towards generalisable and calibrated synthetic speech detection with self-supervised representations	Dan Oneata et.al.	2309.05384v1	null
2023-09-11	DeCUR: decoupling common & unique representations for multimodal self-supervision	Yi Wang et.al.	2309.05300v1	link
2023-09-11	Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis	Li Du et.al.	2309.05217v1	null
2023-09-11	SIM-Sync: From Certifiably Optimal Synchronization over the 3D Similarity Group to Scene Reconstruction with Learned Depth	Xihang Yu et.al.	2309.05184v1	null
2023-09-10	Anatomy Completor: A Multi-class Completion Framework for 3D Anatomy Reconstruction	Jianning Li et.al.	2309.04956v1	null
2023-09-10	Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation	Yuan Gan et.al.	2309.04946v1	link
2023-09-08	Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning	David Yunis et.al.	2309.04459v1	null
2023-09-08	Zero-Shot Robustification of Zero-Shot Models With Foundation Models	Dyah Adila et.al.	2309.04344v1	null
2023-09-08	Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration	Xin Yu et.al.	2309.04071v1	null
2023-09-08	3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation	Sungjun Cho et.al.	2309.04062v1	null
2023-09-07	Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems	Takuma Udagawa et.al.	2309.04031v1	null
2023-09-07	Multimodal Transformer for Material Segmentation	Md Kaykobad Reza et.al.	2309.04001v1	link
2023-09-07	DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models	Yung-Sung Chuang et.al.	2309.03883v1	link
2023-09-07	Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation	Ting Liu et.al.	2309.03661v1	null
2023-09-08	All Labels Together: Low-shot Intent Detection with an Efficient Label Semantic Encoding Paradigm	Jiangshu Du et.al.	2309.03563v2	null
2023-09-07	SyncDreamer: Generating Multiview-consistent Images from a Single-view Image	Yuan Liu et.al.	2309.03453v1	null
2023-09-06	Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation	Arvind Krishna Sridhar et.al.	2309.03340v1	null
2023-09-06	EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System	Chengjie Lu et.al.	2309.03246v1	null
2023-09-06	Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation	Danwei Cai et.al.	2309.03019v1	null
2023-09-07	HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models	Guijin Son et.al.	2309.02706v2	null
2023-09-05	Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks	Blake VanBerlo et.al.	2309.02596v1	null
2023-09-05	Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning	Lili Yu et.al.	2309.02591v1	null
2023-09-05	A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images	Blake VanBerlo et.al.	2309.02555v1	null
2023-09-05	Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach	Vimal K B et.al.	2309.02429v1	null
2023-09-05	Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization	Helena Bonaldi et.al.	2309.02311v1	null
2023-09-05	Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition	Patrick Eickhoff et.al.	2309.02145v1	null
2023-09-04	Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images	Jamiu Idowu et.al.	2309.01850v1	null
2023-09-06	An Empirical Analysis for Zero-Shot Multi-Label Classification on COVID-19 CT Scans and Uncurated Reports	Ethan Dack et.al.	2309.01740v2	null
2023-09-04	A Comparative Analysis of Pretrained Language Models for Text-to-Speech	Marcel Granero-Moya et.al.	2309.01576v1	null
2023-09-04	DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion	Yunhong Lou et.al.	2309.01372v1	null
2023-09-04	Can I Trust Your Answer? Visually Grounded Video Question Answering	Junbin Xiao et.al.	2309.01327v1	null
2023-09-03	COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers	Julien Denize et.al.	2309.01270v1	link
2023-09-03	Optimizing Mobile-Edge AI-Generated Everything (AIGX) Services by Prompt Engineering: Fundamental, Framework, and Case Study	Yinqiu Liu et.al.	2309.01065v1	null
2023-09-01	Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models	Janghoon Ock et.al.	2309.00563v1	link
2023-09-01	Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering	Shiqi Yang et.al.	2309.00528v1	null
2023-09-01	CPSP: Learning Speech Concepts From Phoneme Supervision	Chunyu Qiang et.al.	2309.00424v1	null
2023-09-01	FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking	Tsun-Hin Cheung et.al.	2309.00240v1	null
2023-08-31	A Sequential Framework for Detection and Classification of Abnormal Teeth in Panoramic X-rays	Tudor Dascalu et.al.	2309.00027v1	link
2023-08-31	StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation	Yuhan Wang et.al.	2308.16909v1	link
2023-08-31	The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants	Lucas Bandarkar et.al.	2308.16884v1	link
2023-08-31	Towards Multilingual Automatic Dialogue Evaluation	John Mendonça et.al.	2308.16795v1	null
2023-08-31	Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps	Miguel Espinosa et.al.	2308.16648v1	link
2023-08-31	Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception	Riley Tavassoli et.al.	2308.16493v1	null
2023-08-31	Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff	Satoshi Suzuki et.al.	2308.16454v1	null
2023-08-30	ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding	Omer Veysel Cagatan et.al.	2308.16336v1	null
2023-08-30	Can Prompt Learning Benefit Radiology Report Generation?	Jun Wang et.al.	2308.16269v1	null
2023-08-30	SAM-Med2D	Junlong Cheng et.al.	2308.16184v1	link
2023-08-30	Quantifying Uncertainty in Answers from any Language Model via Intrinsic and Extrinsic Confidence Assessment	Jiuhai Chen et.al.	2308.16175v1	null
2023-08-30	Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models	Neha Sengupta et.al.	2308.16149v1	null
2023-08-30	MerA: Merging Pretrained Adapters For Few-Shot Learning	Shwai He et.al.	2308.15982v1	null
2023-08-29	A General-Purpose Self-Supervised Model for Computational Pathology	Richard J. Chen et.al.	2308.15474v1	null
2023-08-29	DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior	Xinqi Lin et.al.	2308.15070v1	link
2023-08-29	Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints	Wenxing Xu et.al.	2308.15003v1	null
2023-08-28	SynthDistill: Face Recognition with Knowledge Distillation from Synthetic Data	Hatef Otroshi Shahreza et.al.	2308.14852v1	null
2023-08-28	VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation	Xudong Wang et.al.	2308.14710v1	link
2023-08-28	Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts	Thanh Thi Nguyen et.al.	2308.14683v1	null
2023-08-28	Adversarial Attacks on Foundational Vision Models	Nathan Inkawhich et.al.	2308.14597v1	null
2023-08-28	Multimodal Detection of Social Spambots in Twitter using Transformers	Loukas Ilias et.al.	2308.14484v1	null
2023-08-28	Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities	Leman Akoglu et.al.	2308.14380v1	null
2023-08-28	FonMTL: Towards Multitask Learning for the Fon Language	Bonaventure F. P. Dossou et.al.	2308.14280v1	link
2023-08-28	Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks	Hongye Liu et.al.	2308.14274v1	null
2023-08-27	SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation	Zhiyu Qu et.al.	2308.14191v1	null
2023-08-27	Only Encode Once: Making Content-based News Recommender Greener	Qijiong Liu et.al.	2308.14155v1	null
2023-08-27	Situated Natural Language Explanations	Zining Zhu et.al.	2308.14115v1	null
2023-08-25	In-context learning for model-free system identification	Marco Forgione et.al.	2308.13380v1	link
2023-08-25	Refine Neutrino Events Reconstruction with BEiT-3	Chen Li et.al.	2308.13285v1	link
2023-08-25	Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions	Yibo Wang et.al.	2308.13178v1	null
2023-08-25	Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?	Fei Wang et.al.	2308.12898v2	link
2023-08-24	A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions	Jiawei Lin et.al.	2308.12700v1	null
2023-08-25	Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition	Dimitrios Daskalakis et.al.	2308.12673v2	null
2023-08-24	A Small and Fast BERT for Chinese Medical Punctuation Restoration	Tongtao Ling et.al.	2308.12568v1	link
2023-08-24	Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval	Yuan Yuan et.al.	2308.12509v1	null
2023-08-24	Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis	Yuqi Fang et.al.	2308.12495v1	link
2023-08-23	D4: Improving LLM Pretraining via Document De-Duplication and Diversification	Kushal Tirumala et.al.	2308.12284v1	null
2023-08-23	Language Reward Modulation for Pretraining Reinforcement Learning	Ademi Adeniji et.al.	2308.12270v1	link
2023-08-23	Prompt2Model: Generating Deployable Models from Natural Language Instructions	Vijay Viswanathan et.al.	2308.12261v1	link
2023-08-25	Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning	Jiasheng Ye et.al.	2308.12219v2	link
2023-08-23	DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration	Nan Zhou et.al.	2308.12058v1	link
2023-08-23	Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages	Jinyi Hu et.al.	2308.12038v1	link
2023-08-23	Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment	Kangmin Xu et.al.	2308.12001v1	null
2023-08-23	Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields	Hyeonseop Song et.al.	2308.11974v1	null
2023-08-23	CED: Consistent ensemble distillation for audio tagging	Heinrich Dinkel et.al.	2308.11957v1	link
2023-08-22	Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations	Mohammadreza Salehi et.al.	2308.11796v1	link
2023-08-22	Open Set Synthetic Image Source Attribution	Shengbang Fang et.al.	2308.11557v1	null
2023-08-22	Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding	Jiantao Wu et.al.	2308.11448v1	null
2023-08-22	Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning	Shansong Liu et.al.	2308.11276v1	null
2023-08-21	UnLoc: A Unified Framework for Video Localization Tasks	Shen Yan et.al.	2308.11062v1	link
2023-08-21	SupEuclid: Extremely Simple, High Quality OoD Detection with Supervised Contrastive Learning and Euclidean Distance	Jarrod Haas et.al.	2308.10973v1	null
2023-08-21	DocPrompt: Large-scale continue pretrain for zero-shot and few-shot document question answering	Sijin Wu et.al.	2308.10959v1	null
2023-08-21	EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery	Chenyuan Zhang et.al.	2308.10759v1	link
2023-08-23	Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models	Peiyan Zhang et.al.	2308.10632v2	null
2023-08-21	When Prompt-based Incremental Learning Does Not Meet Strong Pretraining	Yu-Ming Tang et.al.	2308.10445v1	link
2023-08-21	Turning a CLIP Model into a Scene Text Spotter	Wenwen Yu et.al.	2308.10408v1	link
2023-08-20	cantnlp@LT-EDI@RANLP-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models	Sidney G.-J. Wong et.al.	2308.10370v1	null
2023-08-22	Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting	Qidong Huang et.al.	2308.10315v2	link
2023-08-20	Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image	Liao Shen et.al.	2308.10257v1	null
2023-08-20	From Global to Local: Multi-scale Out-of-distribution Detection	Ji Zhang et.al.	2308.10239v1	link
2023-08-20	ViT-Lens: Towards Omni-modal Representations	Weixian Lei et.al.	2308.10185v1	link
2023-08-19	Efficient Representation Learning for Healthcare with Cross-Architectural Self-Supervision	Pranav Singh et.al.	2308.10064v1	link
2023-08-18	Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning	Yeming Chen et.al.	2308.09455v1	null
2023-08-18	Accelerated materials language processing enabled by GPT	Jaewoong Choi et.al.	2308.09354v1	null
2023-08-18	DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability	Runhui Huang et.al.	2308.09306v1	null
2023-08-18	V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models	Heng Wang et.al.	2308.09300v1	null
2023-08-18	Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model	Ryandhimas E. Zezario et.al.	2308.09262v1	null
2023-08-17	Semantic Consistency for Assuring Reliability of Large Language Models	Harsh Raj et.al.	2308.09138v1	null
2023-08-17	Edit Temporal-Consistent Videos with Image Diffusion Model	Yuanzhi Wang et.al.	2308.09091v1	null
2023-08-17	On the Evaluation of Neural Code Translation: Taxonomy and Benchmark	Mingsheng Jiao et.al.	2308.08961v1	null
2023-08-17	Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays	Feng Hong et.al.	2308.08853v1	null
2023-08-16	Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer	Guangyi Chen et.al.	2308.08414v1	null
2023-08-16	Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation	Jingrui Hou et.al.	2308.08378v1	null
2023-08-16	Boosting Commit Classification with Contrastive Learning	Jiajun Tong et.al.	2308.08263v1	null
2023-08-16	Is Self-Supervised Pretraining Good for Extrapolation in Molecular Property Prediction?	Shun Takashige et.al.	2308.08129v1	null
2023-08-15	End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations	Bolaji Yusuf et.al.	2308.08027v1	null
2023-08-15	RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models	Jie Huang et.al.	2308.07922v1	null
2023-08-15	Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model	Bosheng Qin et.al.	2308.07749v1	null
2023-08-16	SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search	Wen Zan et.al.	2308.07711v2	null
2023-08-15	Self-supervised Hypergraphs for Learning Multiple World Interpretations	Alina Marcu et.al.	2308.07615v1	null
2023-08-15	SGDiff: A Style Guided Diffusion Model for Fashion Synthesis	Zhengwentai Sun et.al.	2308.07605v1	link
2023-08-15	AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model	Jeong Hun Yeo et.al.	2308.07593v1	null
2023-08-14	Semantic Similarity Loss for Neural Source Code Summarization	Chia-Yi Su et.al.	2308.07429v1	link
2023-08-14	Platypus: Quick, Cheap, and Powerful Refinement of LLMs	Ariel N. Lee et.al.	2308.07317v1	link
2023-08-15	SEMI-CenterNet: A Machine Learning Facilitated Approach for Semiconductor Defect Inspection	Vic De Ridder et.al.	2308.07180v2	null
2023-08-14	CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation	Hongguang Zhu et.al.	2308.07146v1	link
2023-08-14	On the Importance of Spatial Relations for Few-shot Action Recognition	Yilun Zhang et.al.	2308.07119v1	null
2023-08-14	A One Stop 3D Target Reconstruction and multilevel Segmentation Method	Jiexiong Xu et.al.	2308.06974v1	link
2023-08-14	Robustness Stress Testing in Medical Image Classification	Mobarakol Islam et.al.	2308.06889v1	link
2023-08-14	Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization	Jungsoo Lee et.al.	2308.06879v1	null
2023-08-13	Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks	David Junhao Zhang et.al.	2308.06739v1	null
2023-08-13	IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models	Hu Ye et.al.	2308.06721v1	null
2023-08-12	GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher	Youliang Yuan et.al.	2308.06463v1	link
2023-08-11	Zero-shot Text-driven Physically Interpretable Face Editing	Yapeng Meng et.al.	2308.05976v1	null
2023-08-10	AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining	Haohe Liu et.al.	2308.05734v1	null
2023-08-10	Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling	Ushnish Sengupta et.al.	2308.05583v1	null
2023-08-10	Fine-grained building roof instance segmentation based on domain adapted pretraining and composite dual-backbone	Guozhang Liu et.al.	2308.05358v1	null
2023-08-10	Multimodal Pretrained Models for Sequential Decision-Making: Synthesis, Verification, Grounding, and Perception	Yunhao Yang et.al.	2308.05295v1	null
2023-08-09	Deep Learning Model Transfer in Forest Mapping using Multi-source Satellite SAR and Optical Images	Shaojia Ge et.al.	2308.05005v1	null
2023-08-09	Transferable Models for Bioacoustics with Human Language Supervision	David Robinson et.al.	2308.04978v1	link
2023-08-09	JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition	Lucian Bicsi et.al.	2308.04934v1	null
2023-08-09	Deep Generative Networks for Heterogeneous Augmentation of Cranial Defects	Kamil Kwarciak et.al.	2308.04883v1	null
2023-08-09	Optimizing a Transformer-based network for a deep learning seismic processing workflow	Randy Harsuko et.al.	2308.04739v1	null
2023-08-08	Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction	Izzeddin Teeti et.al.	2308.04589v1	null
2023-08-08	Semi-Supervised Semantic Segmentation of Cell Nuclei via Diffusion-based Large-Scale Pre-Training and Collaborative Learning	Zhuchen Shao et.al.	2308.04578v1	null
2023-08-08	Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining	Bidur Khanal et.al.	2308.04551v1	link
2023-08-08	Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra	Fadhil Muhammad et.al.	2308.04337v1	null
2023-08-08	In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning	Xiaochuang Han et.al.	2308.04275v1	link
2023-08-08	Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning	Jiahang Zhang et.al.	2308.03975v1	null
2023-08-07	AdaptiveSAM: Towards Efficient Tuning of SAM for Surgical Scene Segmentation	Jay N. Paranjape et.al.	2308.03726v1	link
2023-08-07	Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods	Ya Jing et.al.	2308.03620v1	null
2023-08-07	When GPT Meets Program Analysis: Towards Intelligent Detection of Smart Contract Logic Vulnerabilities in GPTScan	Yuqiang Sun et.al.	2308.03314v1	null
2023-08-06	Introducing Feature Attention Module on Convolutional Neural Network for Diabetic Retinopathy Detection	Susmita Ghosh et.al.	2308.02985v1	null
2023-08-05	DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation	Qiaosong Qi et.al.	2308.02915v1	null
2023-08-05	Improving Generalization of Image Captioning with Unsupervised Prompt Learning	Hongchen Wei et.al.	2308.02862v1	null
2023-08-05	Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement	Huake Wang et.al.	2308.02776v1	null
2023-08-04	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	Qihang Yu et.al.	2308.02487v1	link
2023-08-04	A Parameter-efficient Multi-subject Model for Predicting fMRI Activity	Connor Lane et.al.	2308.02351v1	link
2023-08-04	Explaining Relation Classification Models with Semantic Extents	Lars Klöser et.al.	2308.02193v1	link
2023-08-03	DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations	Ping Hu et.al.	2308.01890v1	null
2023-08-03	MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction	Jianghao Lin et.al.	2308.01737v1	link
2023-08-03	Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models	Zheyu Zhang et.al.	2308.01684v1	link
2023-08-03	MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies	Ke Chen et.al.	2308.01546v1	null
2023-08-03	Multimodal Neurons in Pretrained Text-Only Transformers	Sarah Schwettmann et.al.	2308.01544v1	null
2023-08-03	MFIM: Megapixel Facial Identity Manipulation	Sanghyeon Na et.al.	2308.01536v1	null
2023-08-02	Teaching Smaller Language Models To Generalise To Unseen Compositional Questions	Tim Hartill et.al.	2308.00946v1	null
2023-08-01	Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment	Hongbo Liu et.al.	2308.00729v1	null
2023-08-01	Adaptive Semantic Consistency for Cross-domain Few-shot Classification	Hengchu Lu et.al.	2308.00727v1	null
2023-08-01	CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code	Nadezhda Chirkova et.al.	2308.00683v1	null
2023-08-01	An L2-Normalized Spatial Attention Network For Accurate And Fast Classification Of Brain Tumors In 2D T1-Weighted CE-MRI Images	Grace Billingsley et.al.	2308.00491v1	link
2023-08-01	DINO-CXR: A self supervised method based on vision transformer for chest X-ray classification	Mohammadreza Shakouri et.al.	2308.00475v1	null
2023-08-01	ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data	Ruiqi Yang et.al.	2308.00454v1	link
2023-08-01	Fountain -- an intelligent contextual assistant combining knowledge representation and language models for manufacturing risk identification	Saurabh Kumar et.al.	2308.00364v1	null
2023-08-01	Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models	Jiaao Chen et.al.	2308.00304v1	null
2023-08-01	The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution	Xuan-Bac Nguyen et.al.	2308.00262v1	link
2023-08-01	Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias	Itay Itzhak et.al.	2308.00225v1	null
2023-07-31	Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?	Ari Holtzman et.al.	2308.00189v1	null
2023-07-31	Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity	Charlie Hou et.al.	2308.00177v1	null
2023-07-31	Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives	Haoyang Liu et.al.	2307.16851v1	null
2023-07-31	UniVTG: Towards Unified Video-Language Temporal Grounding	Kevin Qinghong Lin et.al.	2307.16715v1	link
2023-07-31	DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization	Xiaojun Tang et.al.	2307.16415v1	link
2023-07-31	MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text	Junchen Zhu et.al.	2307.16371v1	null
2023-07-31	AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?	Qi Zhao et.al.	2307.16368v1	null
2023-07-30	Unified Model for Image, Video, Audio and Language Tasks	Mustafa Shukor et.al.	2307.16184v1	link
2023-07-30	HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation	Jinbo Wu et.al.	2307.16183v1	null
2023-08-01	Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar	Yusheng Wang et.al.	2307.16160v2	null
2023-07-29	Instance-Wise Adaptive Tuning and Caching for Vision-Language Models	Chunjin Yang et.al.	2307.15983v1	null
2023-07-29	GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning	Soumyadeep Roy et.al.	2307.15933v1	link
2023-07-28	SimDETR: Simplifying self-supervised pretraining for DETR	Ioannis Maniadis Metaxas et.al.	2307.15697v1	null
2023-07-28	The FlySpeech Audio-Visual Speaker Diarization System for MISP Challenge 2022	Li Zhang et.al.	2307.15400v1	null
2023-07-28	Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions	Yifei Xin et.al.	2307.15344v1	null
2023-07-28	ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation	Cheng Wen et.al.	2307.15290v1	link
2023-07-28	Multilingual Lexical Simplification via Paraphrase Generation	Kang Liu et.al.	2307.15286v1	link
2023-07-28	AC-Norm: Effective Tuning for Medical Image Analysis via Affine Collaborative Normalization	Chuyan Zhang et.al.	2307.15282v1	link
2023-07-28	A deep transfer learning network for structural condition identification with limited real-world training data	Nengxin Bao et.al.	2307.15249v1	null
2023-07-27	Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields	Xiangyu Wang et.al.	2307.15131v1	link
2023-07-27	Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration	Harry Cheng et.al.	2307.14866v1	null
2023-07-27	Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining	Benjia Zhou et.al.	2307.14768v1	null
2023-07-27	A Weakly Supervised Segmentation Network Embedding Cross-scale Attention Guidance and Noise-sensitive Constraint for Detecting Tertiary Lymphoid Structures of Pancreatic Tumors	Bingxue Wang et.al.	2307.14603v1	null
2023-07-26	MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation	Reiner Birkl et.al.	2307.14460v1	link
2023-07-26	Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking	Angela Ramirez et.al.	2307.14440v1	link
2023-07-26	Visual Instruction Inversion: Image Editing via Visual Prompting	Thao Nguyen et.al.	2307.14331v1	null
2023-07-26	Comparative Analysis of Libraries for the Sentimental Analysis	Wendy Ccoya et.al.	2307.14311v1	null
2023-07-27	RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition	Lei Shen et.al.	2307.14016v2	null
2023-07-26	ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution	Mingjin Zhang et.al.	2307.14010v1	null
2023-07-26	Tracking Anything in High Quality	Jiawen Zhu et.al.	2307.13974v1	link
2023-07-26	How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?	Huazheng Wang et.al.	2307.13949v1	link
2023-07-26	FinTree: Financial Dataset Pretrain Transformer Encoder for Relation Extraction	Hyunjong Ok et.al.	2307.13900v1	null
2023-07-25	Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT	Taha Emre et.al.	2307.13865v1	null
2023-07-25	E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning	Cheng Han et.al.	2307.13770v1	link
2023-07-25	QuickQual: Lightweight, convenient retinal image quality scoring with off-the-shelf pretrained models	Justin Engelmann et.al.	2307.13646v1	link
2023-07-25	XDLM: Cross-lingual Diffusion Language Model for Machine Translation	Linyao Chen et.al.	2307.13560v1	null
2023-07-25	Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction	Gabriele Picco et.al.	2307.13497v1	null
2023-07-24	DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction	Zihao Li et.al.	2307.13004v1	null
2023-07-25	Towards a Visual-Language Foundation Model for Computational Pathology	Ming Y. Lu et.al.	2307.12914v2	null
2023-07-24	Multiscale Video Pretraining for Long-Term Activity Forecasting	Reuben Tan et.al.	2307.12854v1	null
2023-07-24	Predicting Ordinary Differential Equations with Transformers	Sören Becker et.al.	2307.12617v1	null
2023-07-25	TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition	Shilin Lu et.al.	2307.12493v2	link
2023-07-23	CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models	Xingbo Wang et.al.	2307.12382v1	null
2023-07-23	Self-Supervised Learning for Audio-Based Emotion Recognition	Peranut Nimitsurachat et.al.	2307.12343v1	null
2023-07-23	Geometry-Aware Adaptation for Pretrained Models	Nicholas Roberts et.al.	2307.12226v1	null
2023-07-22	Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction	Kexin Ding et.al.	2307.11952v1	link
2023-07-21	Bibliometric Analysis of Publisher and Journal Instructions to Authors on Generative-AI in Academic and Scientific Publishing	Conner Ganjavi et.al.	2307.11918v1	null
2023-07-21	Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts	Mayug Maniparambil et.al.	2307.11661v1	null
2023-07-21	Advancing Visual Grounding with Scene Knowledge: Benchmark and Method	Zhihong Chen et.al.	2307.11558v1	link
2023-07-21	Generating Image-Specific Text Improves Fine-grained Image Classification	Emily Mu et.al.	2307.11315v1	null
2023-07-20	Heuristic Hyperparameter Choice for Image Anomaly Detection	Zeyu Jiang et.al.	2307.11197v1	null
2023-07-20	Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding	Siddhant Arora et.al.	2307.11005v1	null
2023-07-20	PASTA: Pretrained Action-State Transformer Agents	Raphael Boige et.al.	2307.10936v1	null
2023-07-20	BlendFace: Re-designing Identity Encoders for Face-Swapping	Kaede Shiohara et.al.	2307.10854v1	link
2023-07-20	HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces	Stella Bounareli et.al.	2307.10797v1	link
2023-07-20	Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition	Weidong Chen et.al.	2307.10757v1	link
2023-07-20	Learning Discriminative Visual-Text Representation for Polyp Re-Identification	Suncheng Xiang et.al.	2307.10625v1	link
2023-07-20	Deep fused flow and topology features for botnet detection basing on pretrained GCN	Meng Xiaoyuan et.al.	2307.10583v1	null
2023-07-20	SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer	Daegyeom Kim et.al.	2307.10550v1	null
2023-07-19	Interpreting and Correcting Medical Image Classification with PIP-Net	Meike Nauta et.al.	2307.10404v1	null
2023-07-19	Gradient Sparsification For Masked Fine-Tuning of Transformers	James O'Neill et.al.	2307.10098v1	null
2023-07-20	Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition	Jia-Xin Zhuang et.al.	2307.10036v2	null
2023-07-19	An analysis on the effects of speaker embedding choice in non auto-regressive TTS	Adriana Stan et.al.	2307.09898v1	null
2023-07-19	Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers	Jaeyoung Kim et.al.	2307.09455v2	null
2023-07-19	Llama 2: Open Foundation and Fine-Tuned Chat Models	Hugo Touvron et.al.	2307.09288v2	link
2023-07-18	UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data	Yazheng Yang et.al.	2307.09249v1	null
2023-07-18	Division Gets Better: Learning Brightness-Aware and Detail-Sensitive Representations for Low-Light Image Enhancement	Huake Wang et.al.	2307.09104v1	null
2023-07-18	Multimodal Machine Learning for Extraction of Theorems and Proofs in the Scientific Literature	Shrey Mishra et.al.	2307.09047v1	link
2023-07-18	Accuracy versus time frontiers of semi-supervised and self-supervised learning on medical images	Zhe Huang et.al.	2307.08919v1	link
2023-07-17	Flow Matching in Latent Space	Quan Dao et.al.	2307.08698v1	link
2023-07-17	Deficiency-Aware Masked Transformer for Video Inpainting	Yongsheng Yu et.al.	2307.08629v1	link
2023-07-17	Scale-Aware Modulation Meet Transformer	Weifeng Lin et.al.	2307.08579v1	link
2023-07-17	Does Visual Pretraining Help End-to-End Reasoning?	Chen Sun et.al.	2307.08506v1	null
2023-07-17	Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts	Rebekka Hubert et.al.	2307.08426v1	link
2023-07-18	CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing	Ahmet Canberk Baykal et.al.	2307.08397v2	null
2023-07-17	Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition	Shaoshi Ling et.al.	2307.08234v1	null
2023-07-17	Zero-Shot Image Harmonization with Generative Model Prior	Jianqi Chen et.al.	2307.08182v1	link
2023-07-16	Diffusion to Confusion: Naturalistic Adversarial Patch Generation Based on Diffusion Model for Object Detector	Shuo-Yen Lin et.al.	2307.08076v1	null
2023-07-16	Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling	Longyue Wang et.al.	2307.08074v1	null
2023-07-14	DreamTeacher: Pretraining Image Backbones with Deep Generative Models	Daiqing Li et.al.	2307.07487v1	null
2023-07-14	Towards spoken dialect identification of Irish	Liam Lonergan et.al.	2307.07436v1	null
2023-07-14	Improving Zero-Shot Generalization for CLIP with Synthesized Prompts	Zhengbo Wang et.al.	2307.07397v1	link
2023-07-14	Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs	Agnes Axelsson et.al.	2307.07312v1	null
2023-07-13	Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling	He Huang et.al.	2307.07057v1	null
2023-07-13	In-context Autoencoder for Context Compression in a Large Language Model	Tao Ge et.al.	2307.06945v1	null
2023-07-13	mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs	Gregor Geigle et.al.	2307.06930v1	link
2023-07-13	Explainable 2D Vision Models for 3D Medical Data	Alexander Ziller et.al.	2307.06614v1	null
2023-07-12	T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation	Kaiyi Huang et.al.	2307.06350v1	null
2023-07-12	Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution	Mostafa Dehghani et.al.	2307.06304v1	null
2023-07-12	Instruction Mining: High-Quality Instruction Data Selection for Large Language Models	Yihan Cao et.al.	2307.06290v1	null
2023-07-12	Pluggable Neural Machine Translation Models via Memory-augmented Adapters	Yuzhuang Xu et.al.	2307.06029v1	link
2023-07-12	What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation	Gabriele Merlin et.al.	2307.06006v1	null
2023-07-13	PIGEON: Predicting Image Geolocations	Lukas Haas et.al.	2307.05845v2	null
2023-07-11	EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video	Matthias De Lange et.al.	2307.05784v1	link
2023-07-13	Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting	Chantal Pellegrini et.al.	2307.05766v2	link
2023-07-11	Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering	Pengfei Li et.al.	2307.05314v1	null
2023-07-11	Attribute Controlled Dialogue Prompting	Runcheng Liu et.al.	2307.05228v1	null
2023-07-11	Generative Pretraining in Multimodality	Quan Sun et.al.	2307.05222v1	link
2023-07-11	ExFaceGAN: Exploring Identity Directions in GAN's Learned Latent Space for Synthetic Identity Generation	Fadi Boutros et.al.	2307.05151v1	null
2023-07-11	Uni-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images	Yongheng Zhang et.al.	2307.05075v1	null
2023-07-10	FedYolo: Augmenting Federated Learning with Pretrained Transformers	Xuechen Zhang et.al.	2307.04905v1	null
2023-07-10	Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos	Sagnik Majumder et.al.	2307.04760v1	null
2023-07-10	Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback	Jaskirat Singh et.al.	2307.04749v1	null
2023-07-12	Weakly-supervised positional contrastive learning: application to cirrhosis classification	Emma Sarfati et.al.	2307.04617v2	link
2023-07-10	Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception	Chi-Chih Chang et.al.	2307.04537v1	null
2023-07-06	Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning	Ke Liang et.al.	2307.03591v1	null
2023-07-07	Derivative Free Weight-space Ensembling	Dean Ninalga et.al.	2307.03506v1	null
2023-07-07	Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation	Dahyun Kang et.al.	2307.03407v1	null
2023-07-07	Teaching Arithmetic to Small Transformers	Nayoung Lee et.al.	2307.03381v1	null
2023-07-06	Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays	Rajib Bhattacharjea et.al.	2307.03327v1	null
2023-07-06	To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology	Tushar Kataria et.al.	2307.03275v1	null
2023-07-06	Vision Language Transformers: A Survey	Clayton Fields et.al.	2307.03254v1	null
2023-07-06	VideoGLUE: Video General Understanding Evaluation of Foundation Models	Liangzhe Yuan et.al.	2307.03166v1	null
2023-07-06	Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain	Aryo Gema et.al.	2307.03042v1	null
2023-07-06	A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task	Shiqi Yang et.al.	2307.02862v1	null
2023-07-06	Large Language Models Empowered Autonomous Edge AI for Connected Intelligence	Yifei Shen et.al.	2307.02779v1	null
2023-07-05	ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection	Sunjae Kwon et.al.	2307.02591v1	null
2023-07-05	Named Entity Inclusion in Abstractive Text Summarization	Sergey Berezin et.al.	2307.02570v1	null
2023-07-05	Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks	Zhaofeng Wu et.al.	2307.02477v1	null
2023-07-05	Interactive Image Segmentation with Cross-Modality Vision Transformers	Kun Li et.al.	2307.02280v1	link
2023-07-05	LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation	Longshen Ou et.al.	2307.02146v1	null
2023-07-05	Prompting Diffusion Representations for Cross-Domain Semantic Segmentation	Rui Gong et.al.	2307.02138v1	null
2023-07-05	EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models	Michael Wornow et.al.	2307.02028v1	link
2023-07-05	A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis	Jiaxiang Liu et.al.	2307.01981v1	null
2023-07-04	KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation	Weijie Xu et.al.	2307.01878v1	null
2023-07-04	Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification	Lucas Mahler et.al.	2307.01759v1	link
2023-07-04	Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure	Yikang Wang et.al.	2307.01546v1	null
2023-07-04	Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation	Jian Guan et.al.	2307.01542v1	null
2023-07-03	Don't freeze: Finetune encoders for better Self-Supervised HAR	Vitor Fortes Rey et.al.	2307.01168v1	null
2023-07-04	Improving Language Plasticity via Pretraining with Active Forgetting	Yihong Chen et.al.	2307.01163v2	null
2023-07-03	Generating Reliable Pixel-Level Labels for Source Free Domain Adaptation	Gabriel Tjio et.al.	2307.00893v1	null
2023-07-03	Augmenting Deep Learning Adaptation for Wearable Sensor Data through Combined Temporal-Frequency Image Encoding	Yidong Zhu et.al.	2307.00883v1	null
2023-07-03	Analysis of Task Transferability in Large Pre-trained Classifiers	Akshay Mehra et.al.	2307.00823v1	link
2023-07-01	Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model	Jiong Cai et.al.	2307.00370v1	null
2023-07-01	Improving Multitask Retrieval by Promoting Task Specialization	Wenzheng Zhang et.al.	2307.00342v1	null
2023-07-01	Hierarchical Pretraining for Biomedical Term Embeddings	Bryan Cai et.al.	2307.00266v1	null
2023-06-30	Multiscale Progressive Text Prompt Network for Medical Image Segmentation	Xianjun Han et.al.	2307.00174v1	null
2023-06-30	Stitched ViTs are Flexible Vision Backbones	Zizheng Pan et.al.	2307.00154v1	link
2023-06-30	Class-Incremental Learning using Diffusion Model for Distillation and Replay	Quentin Jodelet et.al.	2306.17560v1	null
2023-06-30	Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries	Frederic Jonske et.al.	2306.17555v1	null
2023-06-30	MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems	Markus J. Buehler et.al.	2306.17525v1	null
2023-07-03	LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection	Zijian Cai et.al.	2306.17408v2	null
2023-06-29	Towards Open-Domain Topic Classification	Hantian Ding et.al.	2306.17290v1	null
2023-06-29	Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models	Simian Luo et.al.	2306.17203v1	link
2023-06-29	An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training	Zitian Chen et.al.	2306.17165v1	null
2023-06-29	Classifying Crime Types using Judgment Documents from Social Media	Haoxuan Xu et.al.	2306.17020v1	null
2023-06-29	MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset	Guotai Wang et.al.	2306.16925v1	link
2023-07-03	Probabilistic Linguistic Knowledge and Token-level Text Augmentation	Zhengxiang Wang et.al.	2306.16644v2	null
2023-06-29	Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images	Sungho Chun et.al.	2306.16615v1	null
2023-06-28	Multi-Site Clinical Federated Learning using Recursive and Attentive Models and NVFlare	Won Joon Yun et.al.	2306.16367v1	null
2023-06-28	S2SNet: A Pretrained Neural Network for Superconductivity Discovery	Ke Liu et.al.	2306.16270v1	link
2023-06-28	Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specific Knowledge Injection	Zhewei Chen et.al.	2306.16186v1	null
2023-06-27	Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data	Kai Chieh Chang et.al.	2306.15808v1	null
2023-06-27	ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis	Yakun Yu et.al.	2306.15796v1	null
2023-06-27	HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution	Eric Nguyen et.al.	2306.15794v1	null
2023-06-27	Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System	Xue-Feng Zhu et.al.	2306.15767v1	null
2023-06-27	Semi-supervised Multimodal Representation Learning through a Global Workspace	Benjamin Devillers et.al.	2306.15711v1	link
2023-06-28	Extending Context Window of Large Language Models via Positional Interpolation	Shouyuan Chen et.al.	2306.15595v2	null
2023-06-28	TrickVOS: A Bag of Tricks for Video Object Segmentation	Evangelos Skartados et.al.	2306.15377v2	null
2023-06-27	Gender Bias in BERT -- Measuring and Analysing Biases through Sentiment Rating in a Realistic Downstream Classification Task	Sophie Jentzsch et.al.	2306.15298v1	null
2023-06-27	Can Pretrained Language Models Derive Correct Semantics from Corrupt Subwords under Noise?	Xinzhe Li et.al.	2306.15268v1	link
2023-06-28	Wespeaker baselines for VoxSRC2023	Shuai Wang et.al.	2306.15161v2	null
2023-06-28	MIMIC: Masked Image Modeling with Image Correspondences	Kalyani Marathe et.al.	2306.15128v2	link
2023-06-26	Understanding In-Context Learning via Supportive Pretraining Data	Xiaochuang Han et.al.	2306.15091v1	null
2023-06-26	Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression	Allan Raventós et.al.	2306.15063v1	link
2023-06-26	Supervised Pretraining Can Learn In-Context Reinforcement Learning	Jonathan N. Lee et.al.	2306.14892v1	null
2023-06-26	Composing Parameter-Efficient Modules with Arithmetic Operations	Jinghan Zhang et.al.	2306.14870v1	link
2023-06-27	Kosmos-2: Grounding Multimodal Large Language Models to the World	Zhiliang Peng et.al.	2306.14824v2	null
2023-06-26	DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models	Ximing Xing et.al.	2306.14685v1	null
2023-06-26	Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition	Meena Jagadeesan et.al.	2306.14670v1	null
2023-06-26	Localized Text-to-Image Generation for Free via Cross Attention Control	Yutong He et.al.	2306.14636v1	null
2023-06-26	Transfer Learning across Several Centuries: Machine and Historian Integrated Method to Decipher Royal Secretary's Diary	Sojung Lucia Kim et.al.	2306.14592v1	null
2023-06-26	A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis	Aishwarya Agarwal et.al.	2306.14544v1	null
2023-06-26	ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks	Kai Han et.al.	2306.14525v1	null
2023-06-27	DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing	Yujun Shi et.al.	2306.14435v2	null
2023-06-23	Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation	Massimiliano Patacchiola et.al.	2306.13554v1	link
2023-06-23	DreamEditor: Text-Driven 3D Scene Editing with Neural Fields	Jingyu Zhuang et.al.	2306.13455v1	null
2023-06-23	Long-range Language Modeling with Self-retrieval	Ohad Rubin et.al.	2306.13421v1	null
2023-06-23	Variance-Covariance Regularization Improves Representation Learning	Jiachen Zhu et.al.	2306.13292v1	null
2023-06-22	PromptIR: Prompting for All-in-One Blind Image Restoration	Vaishnav Potlapalli et.al.	2306.13090v1	link
2023-06-22	Can a single image processing algorithm work equally well across all phases of DCE-MRI?	Adam G. Tattersall et.al.	2306.12988v1	null
2023-06-22	AudioPaLM: A Large Language Model That Can Speak and Listen	Paul K. Rubenstein et.al.	2306.12925v1	null
2023-06-22	Learning from Visual Observation via Offline Pretrained State-to-Go Transformer	Bohan Zhou et.al.	2306.12860v1	null
2023-06-23	Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery	Hoang Thanh Lam et.al.	2306.12802v2	link
2023-06-22	Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields	Ori Gordon et.al.	2306.12760v1	null
2023-06-22	Restoration of the JPEG Maximum Lossy Compressed Face Images with Hourglass Block based on Early Stopping Discriminator	Jongwook Si et.al.	2306.12757v1	null
2023-06-22	FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping	Yu Zhang et.al.	2306.12686v1	null
2023-06-22	Identifying and Disentangling Spurious Features in Pretrained Image Representations	Rafayel Darbinyan et.al.	2306.12673v1	null
2023-06-21	Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images	Mohsen Ahmadi et.al.	2306.12510v1	null
2023-06-21	LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models	Shizhe Diao et.al.	2306.12420v1	link
2023-06-21	Introspective Action Advising for Interpretable Transfer Learning	Joseph Campbell et.al.	2306.12314v1	null
2023-06-21	ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining	Dezhi Peng et.al.	2306.12106v1	null
2023-06-20	Exploring New Frontiers in Agricultural NLP: Investigating the Potential of Large Language Models for Food Applications	Saed Rezayi et.al.	2306.11892v1	null
2023-06-20	Unsupervised Deep Unfolded PGD for Transmit Power Allocation in Wireless Systems	Ramoni Adeogun et.al.	2306.11865v1	null
2023-06-20	A Simple and Effective Pruning Approach for Large Language Models	Mingjie Sun et.al.	2306.11695v1	link
2023-06-20	Inter-Cell Network Slicing With Transfer Learning Empowered Multi-Agent Deep Reinforcement Learning	Tianlun Hu et.al.	2306.11552v1	null
2023-06-20	MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian	Willy Fitra Hendria et.al.	2306.11341v1	link
2023-06-19	RemoteCLIP: A Vision Language Foundation Model for Remote Sensing	Fan Liu et.al.	2306.11029v1	null
2023-06-19	Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment	Shivam Pande et.al.	2306.10955v1	null
2023-06-19	Detailed retinal vessel segmentation without human annotations using simulated optical coherence tomography angiographs	Linus Kreitner et.al.	2306.10941v1	link
2023-06-19	Vocal Timbre Effects with Differentiable Digital Signal Processing	David Südholt et.al.	2306.10886v1	link
2023-06-19	A deep dive into explainable self-supervised transformers for point clouds	Ioannis Romanelis et.al.	2306.10798v1	link
2023-06-19	Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference	Junhao Zheng et.al.	2306.10790v1	null
2023-06-18	Point-Cloud Completion with Pretrained Text-to-image Diffusion Models	Yoni Kasten et.al.	2306.10533v1	null
2023-06-16	CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search	Fahad Shamshad et.al.	2306.10008v1	link
2023-06-16	Robot Learning with Sensorimotor Pre-training	Ilija Radosavovic et.al.	2306.10007v1	null
2023-06-16	SLACK: Stable Learning of Augmentations with Cold-start and KL regularization	Juliette Marrie et.al.	2306.09998v1	null
2023-06-16	LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning	Jifan Zhang et.al.	2306.09910v1	link
2023-06-16	Revealing the impact of social circumstances on the selection of cancer therapy through natural language processing of social work notes	Shenghuan Sun et.al.	2306.09877v1	null
2023-06-16	MixedTeacher: Knowledge Distillation for fast inference textural anomaly detection	Simon Thomine et.al.	2306.09859v1	null
2023-06-16	The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models	Roy Voetman et.al.	2306.09762v1	null
2023-06-16	Scaling Open-Vocabulary Object Detection	Matthias Minderer et.al.	2306.09683v1	null
2023-06-16	CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models	Hao-Wen Dong et.al.	2306.09635v1	null
2023-06-16	CMLM-CSE: Based on Conditional MLM Contrastive Learning for Sentence Embeddings	Wei Zhang et.al.	2306.09594v1	null
2023-06-15	Segment Any Point Cloud Sequences by Distilling Vision Foundation Models	Youquan Liu et.al.	2306.09347v1	link
2023-06-15	Semantic HELM: An Interpretable Memory for Reinforcement Learning	Fabian Paischer et.al.	2306.09312v1	link
2023-06-15	Text Promptable Surgical Instrument Segmentation with Vision-Language Models	Zijian Zhou et.al.	2306.09244v1	null
2023-06-15	SCALE: Scaling up the Complexity for Advanced Language Model Evaluation	Vishvaksenan Rasiah et.al.	2306.09237v1	null
2023-06-15	Audio Tagging on an Embedded Hardware Platform	Gabriel Bibbo et.al.	2306.09106v1	null
2023-06-15	Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration	Chenyang Lyu et.al.	2306.09093v1	link
2023-06-15	COSA: Concatenated Sample Pretrained Vision-Language Foundation Model	Sihan Chen et.al.	2306.09085v1	link
2023-06-15	Behavioral Cloning via Search in Embedded Demonstration Dataset	Federico Malato et.al.	2306.09082v1	null
2023-06-15	When Hyperspectral Image Classification Meets Diffusion Models: An Unsupervised Feature Learning Framework	Jingyi Zhou et.al.	2306.08964v1	null
2023-06-15	A Comparison of Self-Supervised Pretraining Approaches for Predicting Disease Risk from Chest Radiograph Images	Yanru Chen et.al.	2306.08955v1	null
2023-06-13	Image Captioners Are Scalable Vision Learners Too	Michael Tschannen et.al.	2306.07915v1	null
2023-06-13	GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition	Yu Pan et.al.	2306.07848v1	null
2023-06-13	Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images	Ming Y. Lu et.al.	2306.07831v1	null
2023-06-13	Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification	Dmitry Karpov et.al.	2306.07797v1	null
2023-06-13	Multi-objective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex	Hongsong Feng et.al.	2306.07484v1	null
2023-06-13	Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard	Ehsan Kamalloo et.al.	2306.07471v1	null
2023-06-12	Scalable 3D Captioning with Pretrained Models	Tiange Luo et.al.	2306.07279v1	null
2023-06-12	MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images	Junchen Zhu et.al.	2306.07257v1	null
2023-06-13	Fair Learning to Rank with Distribution-free Risk Control	Ruocheng Guo et.al.	2306.07188v2	null
2023-06-12	Gradient Ascent Post-training Enhances Language Model Generalization	Dongkeun Yoon et.al.	2306.07052v1	link
2023-06-12	Generating Synthetic Datasets by Interpolating along Generalized Geodesics	Jiaojiao Fan et.al.	2306.06866v1	null
2023-06-11	Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability	Jiacheng Ye et.al.	2306.06688v1	null
2023-06-10	Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment	Anh T. V. Dau et.al.	2306.06347v1	null
2023-06-10	Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC	Shen-sian Syu et.al.	2306.06345v1	null
2023-06-09	DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents	Fuxiao Liu et.al.	2306.06306v1	link
2023-06-09	$FPDM$: Domain-Specific Fast Pre-training Technique using Document-Level Metadata	Abhilash Nandy et.al.	2306.06190v1	null
2023-06-09	Virtual Node Tuning for Few-shot Node Classification	Zhen Tan et.al.	2306.06063v1	null
2023-06-09	Benchmarking self-supervised video representation learning	Akash Kumar et.al.	2306.06010v1	null
2023-06-09	Exploring Effective Mask Sampling Modeling for Neural Image Compression	Lin Liu et.al.	2306.05704v1	null
2023-06-09	Embodied Executable Policy Learning with Language-based Scene Summarization	Jielin Qiu et.al.	2306.05696v1	null
2023-06-09	On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning	Hojoon Lee et.al.	2306.05637v1	link
2023-06-08	Hexatagging: Projective Dependency Parsing as Tagging	Afra Amini et.al.	2306.05477v1	null
2023-06-08	Tracking Objects with 3D Representation from Videos	Jiawei He et.al.	2306.05416v1	null
2023-06-08	Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories	Shizhe Diao et.al.	2306.05406v1	link
2023-06-08	RDumb: A simple approach that questions our progress in continual test-time adaptation	Ori Press et.al.	2306.05401v1	link
2023-06-08	Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction	Simone Scaboro et.al.	2306.05276v1	link
2023-06-09	Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models	Tianzhe Chu et.al.	2306.05272v2	link
2023-06-08	SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions	Yuseung Lee et.al.	2306.05178v1	null
2023-06-08	Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image	Kun Wang et.al.	2306.05145v1	null
2023-06-08	DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Amr Keleg et.al.	2306.05076v1	null
2023-06-08	Improving Visual Prompt Tuning for Self-supervised Vision Transformers	Seungryong Yoo et.al.	2306.05067v1	link
2023-06-08	Learning A Foundation Language Model for Geoscience Knowledge Understanding and Utilization	Cheng Deng et.al.	2306.05064v1	link
2023-06-07	Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection	Yu Bai et.al.	2306.04637v1	null
2023-06-07	Proximity-Informed Calibration for Deep Neural Networks	Miao Xiong et.al.	2306.04590v1	link
2023-06-07	Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages	Claytone Sikasote et.al.	2306.04428v1	link
2023-06-07	SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory	Han Sun et.al.	2306.04385v1	null
2023-06-07	Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks	Haiyang Xu et.al.	2306.04362v1	link
2023-06-08	GPT Self-Supervision for a Better Data Annotator	Xiaohuan Pei et.al.	2306.04349v2	null
2023-06-08	Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images	Ke Zhu et.al.	2306.04244v2	null
2023-06-07	Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction	Fréjus A. A. Laleye et.al.	2306.04203v1	null
2023-06-07	From the One, Judge of the Whole: Typed Entailment Graph Construction with Predicate Generation	Zhibin Chen et.al.	2306.04170v1	link
2023-06-07	Matte Anything: Interactive Natural Image Matting with Segment Anything Models	Jingfeng Yao et.al.	2306.04121v1	null
2023-06-06	Learning Human Mesh Recovery in 3D Scenes	Zehong Shen et.al.	2306.03847v1	null
2023-06-06	Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How	Sebastian Pineda Arango et.al.	2306.03828v1	null
2023-06-06	On the Difference of BERT-style and CLIP-style Text Encoders	Zhihong Chen et.al.	2306.03678v1	link
2023-06-06	BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs	Daniel Daza et.al.	2306.03606v1	link
2023-06-06	LegoNet: Alternating Model Blocks for Medical Image Segmentation	Ikboljon Sobirov et.al.	2306.03494v1	null
2023-06-06	Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses	Lucía Gómez-Zaragozá et.al.	2306.03443v1	null
2023-06-06	Quantifying the Variability Collapse of Neural Networks	Jing Xu et.al.	2306.03440v1	null
2023-06-06	Towards Alleviating the Object Bias in Prompt Tuning-based Factual Knowledge Extraction	Yuhang Wang et.al.	2306.03378v1	link
2023-06-06	Identifying Shared Decodable Concepts in the Human Brain Using Image-Language Foundation Models	Cory Efird et.al.	2306.03375v1	null
2023-06-07	Vid2Act: Activate Offline Videos for Visual RL	Minting Pan et.al.	2306.03360v2	null
2023-06-05	SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression	Tim Dettmers et.al.	2306.03078v1	null
2023-06-05	Sensitivity-Aware Finetuning for Accuracy Recovery on Deep Learning Hardware	Lakshmi Nair et.al.	2306.03076v1	null
2023-06-05	Continual Learning with Pretrained Backbones by Tuning in the Input Space	Simone Marullo et.al.	2306.02947v1	null
2023-06-05	Second Language Acquisition of Neural Language Models	Miyu Oba et.al.	2306.02920v1	null
2023-06-05	SelfEvolve: A Code Evolution Framework via Large Language Models	Shuyang Jiang et.al.	2306.02907v1	null
2023-06-05	Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance	Jinwoo Kim et.al.	2306.02866v1	link
2023-06-05	Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents	David Kreuzer et.al.	2306.02815v1	null
2023-06-05	Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization	Yimeng Chen et.al.	2306.02595v1	null
2023-06-05	Improved Active Multi-Task Representation Learning via Lasso	Yiping Wang et.al.	2306.02556v1	null
2023-06-04	RadLing: Towards Efficient Radiology Report Understanding	Rikhiya Ghosh et.al.	2306.02492v1	null
2023-06-02	Distilling Efficient Language-Specific Models for Cross-Lingual Transfer	Alan Ansell et.al.	2306.01709v1	link
2023-06-02	Towards In-context Scene Understanding	Ivana Balažević et.al.	2306.01667v1	null
2023-06-02	Pretrained Language Model based Web Search Ranking: From Relevance to Satisfaction	Canjia Li et.al.	2306.01599v1	null
2023-06-02	Evaluating The Robustness of Self-Supervised Representations to Background/Foreground Removal	Xavier F. Cadet et.al.	2306.01398v1	null
2023-06-02	Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23	Ioannis Tsiamas et.al.	2306.01327v1	null
2023-06-01	Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation	Adithya V Ganesan et.al.	2306.01183v1	null
2023-06-01	TMI! Finetuned Models Leak Private Information from their Pretraining Data	John Abascal et.al.	2306.01181v1	null
2023-06-01	The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only	Guilherme Penedo et.al.	2306.01116v1	null
2023-06-01	Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease Classification	Cara Van Uden et.al.	2306.01111v1	null
2023-06-01	Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles	Chaitanya Ryali et.al.	2306.00989v1	link
2023-06-01	Continual Learning for Abdominal Multi-Organ and Tumor Segmentation	Yixiao Zhang et.al.	2306.00988v1	link
2023-06-01	StyleGAN knows Normal, Depth, Albedo, and More	Anand Bhattad et.al.	2306.00987v1	null
2023-06-02	Diffusion Self-Guidance for Controllable Image Generation	Dave Epstein et.al.	2306.00986v2	null
2023-06-01	Train Offline, Test Online: A Real Robot Learning Benchmark	Gaoyue Zhou et.al.	2306.00942v1	link
2023-06-01	STEVE-1: A Generative Model for Text-to-Behavior in Minecraft	Shalev Lifshitz et.al.	2306.00937v1	null
2023-06-01	"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning	Abisek Rajakumar Kalarani et.al.	2306.00931v1	null
2023-06-01	Inserting Anybody in Diffusion Models via Celeb Basis	Ge Yuan et.al.	2306.00926v1	link
2023-06-01	Adapting a ConvNeXt model to audio classification on AudioSet	Thomas Pellegrini et.al.	2306.00830v1	null
2023-06-01	In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation	Julian Bitterwolf et.al.	2306.00826v1	link
2023-06-01	Too Large; Data Reduction for Vision-Language Pre-Training	Alex Jinpeng Wang et.al.	2305.20087v2	link
2023-05-31	Efficient Shapley Values Estimation by Amortization for Text Classification	Chenghao Yang et.al.	2305.19998v1	link
2023-06-01	A Global Context Mechanism for Sequence Labeling	Conglei Xu et.al.	2305.19928v2	link
2023-05-31	Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data	Xinze Li et.al.	2305.19912v1	link
2023-05-31	How Does Pretraining Improve Discourse-Aware Translation?	Zhihong Huang et.al.	2305.19847v1	null
2023-05-31	A Survey of Label-Efficient Deep Learning for 3D Point Clouds	Aoran Xiao et.al.	2305.19812v1	link
2023-05-31	Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios	Malina Chichirau et.al.	2305.19757v1	null
2023-05-31	Investigation of the Robustness of Neural Density Fields	Jonas Schuhmacher et.al.	2305.19698v1	null
2023-05-31	End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization	Shohei Taniguchi et.al.	2305.19684v1	link
2023-05-31	LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction	Jeremiah Milbauer et.al.	2305.19585v1	null
2023-05-30	Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning	Umang Gupta et.al.	2305.19264v1	link
2023-05-30	DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation	Jiuhn Song et.al.	2305.19201v1	null
2023-05-30	Strategic Reasoning with Language Models	Kanishk Gandhi et.al.	2305.19165v1	null
2023-05-30	LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images	Viraj Prabhu et.al.	2305.19164v1	null
2023-05-30	Together We Make Sense -- Learning Meta-Sense Embeddings from Pretrained Static Sense Embeddings	Haochen Luo et.al.	2305.19092v1	null
2023-05-30	Nested Diffusion Processes for Anytime Image Generation	Noam Elata et.al.	2305.19066v1	link
2023-05-30	Voice Conversion With Just Nearest Neighbors	Matthew Baas et.al.	2305.18975v1	link
2023-05-30	Prompt-based Tuning of Transformer Models for Multi-Center Medical Image Segmentation	Numan Saeed et.al.	2305.18948v1	null
2023-05-30	Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures	Jakob Prange et.al.	2305.18915v1	link
2023-05-30	Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs	Yingcong Li et.al.	2305.18869v1	null
2023-05-29	CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice	Juan Zuluaga-Gomez et.al.	2305.18283v1	link
2023-05-29	Concept Decomposition for Visual Exploration and Inspiration	Yael Vinker et.al.	2305.18203v1	null
2023-05-29	Multiscale Positive-Unlabeled Detection of AI-Generated Texts	Yuchuan Tian et.al.	2305.18149v1	link
2023-05-29	Conditional Score Guidance for Text-Driven Image-to-Image Translation	Hyunsoo Lee et.al.	2305.18007v1	null
2023-05-29	Data Augmentation for Low-Resource Keyphrase Generation	Krishna Garg et.al.	2305.17968v1	link
2023-05-28	Transfer Learning for Power Outage Detection Task with Limited Training Data	Olukunle Owolabi et.al.	2305.17817v1	null
2023-05-28	Adapting Language-Audio Models as Few-Shot Audio Learners	Jinhua Liang et.al.	2305.17719v1	null
2023-05-28	Z-GMOT: Zero-shot Generic Multiple Object Tracking	Kim Hoang Tran et.al.	2305.17648v1	null
2023-05-30	Learning from Children: Improving Image-Caption Pretraining via Curriculum	Hammad A. Ayyubi et.al.	2305.17540v2	link
2023-05-27	Text-to-image Editing by Image Information Removal	Zhongping Zhang et.al.	2305.17489v1	null
2023-05-26	BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks	Kai Zhang et.al.	2305.17100v1	link
2023-05-26	Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models	Daman Arora et.al.	2305.17077v1	null
2023-05-26	Exploiting Abstract Meaning Representation for Open-Domain Question Answering	Cunxiang Wang et.al.	2305.17050v1	null
2023-05-26	Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering	Siwei Wu et.al.	2305.17019v1	null
2023-05-26	D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias	Sabit Hassan et.al.	2305.17013v1	null
2023-05-29	Three Towers: Flexible Contrastive Learning with Pretrained Image Models	Jannik Kossen et.al.	2305.16999v2	null
2023-05-26	Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation	David Brandfonbrener et.al.	2305.16985v1	null
2023-05-26	Compositional Generalization without Trees using Multiset Tagging and Latent Permutations	Matthias Lindemann et.al.	2305.16954v1	null
2023-05-26	On Evaluating Adversarial Robustness of Large Vision-Language Models	Yunqing Zhao et.al.	2305.16934v1	link
2023-05-26	Calibration of Transformer-based Models for Identifying Stress and Depression in Social Media	Loukas Ilias et.al.	2305.16797v1	null
2023-05-25	Parallel Sampling of Diffusion Models	Andy Shih et.al.	2305.16317v1	link
2023-05-25	Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages	Shivanshu Gupta et.al.	2305.16302v1	null
2023-05-25	Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation	Lisa Dunlap et.al.	2305.16289v1	link
2023-05-25	ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation	Zhengyi Wang et.al.	2305.16213v1	link
2023-05-26	Diversity-Aware Coherence Loss for Improving Neural Topic Models	Raymond Li et.al.	2305.16199v2	link
2023-05-25	Explainability Techniques for Chemical Language Models	Stefan Hödl et.al.	2305.16192v1	link
2023-05-25	Language Models Implement Simple Word2Vec-style Vector Arithmetic	Jack Merullo et.al.	2305.16130v1	link
2023-05-25	Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data	Takafumi Moriya et.al.	2305.15971v1	null
2023-05-25	Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7	Yi Yuan et.al.	2305.15905v1	null
2023-05-25	On Architectural Compression of Text-to-Image Diffusion Models	Bo-Kyeong Kim et.al.	2305.15798v1	null
2023-05-24	What can generic neural networks learn from a child's visual experience?	A. Emin Orhan et.al.	2305.15372v1	null
2023-05-24	Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution	Yiyang Ma et.al.	2305.15357v1	null
2023-05-24	Visual Programming for Text-to-Image Generation and Evaluation	Jaemin Cho et.al.	2305.15328v1	null
2023-05-24	Self-Evolution Learning for Discriminative Language Model Pretraining	Qihuang Zhong et.al.	2305.15275v1	null
2023-05-24	Revisiting Token Dropping Strategy in Efficient BERT Pretraining	Qihuang Zhong et.al.	2305.15273v1	null
2023-05-24	ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers	Jingfeng Yao et.al.	2305.15272v1	link
2023-05-24	Rethinking the Evaluation Protocol of Domain Generalization	Han Yu et.al.	2305.15253v1	null
2023-05-24	L-CAD: Language-based Colorization with Any-level Descriptions	Zheng Chang et.al.	2305.15217v1	null
2023-05-24	Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator	Ziwei He et.al.	2305.15099v1	null
2023-05-24	Dynamic Masking Rate Schedules for MLM Pretraining	Zachary Ankner et.al.	2305.15096v1	null
2023-05-23	Video Prediction Models as Rewards for Reinforcement Learning	Alejandro Escontrela et.al.	2305.14343v1	null
2023-05-23	ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings	William Brannon et.al.	2305.14321v1	link
2023-05-23	QLoRA: Efficient Finetuning of Quantized LLMs	Tim Dettmers et.al.	2305.14314v1	link
2023-05-23	Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining	Emanuele Bugliarello et.al.	2305.14281v1	null
2023-05-23	Masked Path Modeling for Vision-and-Language Navigation	Zi-Yi Dou et.al.	2305.14268v1	null
2023-05-24	DUBLIN -- Document Understanding By Language-Image Network	Kriti Aggarwal et.al.	2305.14218v2	null
2023-05-23	Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks	Tiedong Liu et.al.	2305.14201v1	null
2023-05-23	Accessing Higher Dimensions for Unsupervised Word Translation	Sida I. Wang et.al.	2305.14200v1	null
2023-05-23	Evaluating Factual Consistency of Summaries with Large Language Models	Shiqi Chen et.al.	2305.14069v1	link
2023-05-23	Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification	Sangmin Bae et.al.	2305.14032v1	link
2023-05-22	Language-Agnostic Bias Detection in Language Models	Abdullatif Köksal et.al.	2305.13302v1	null
2023-05-22	U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech	Xin Jing et.al.	2305.13195v1	null
2023-05-22	A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity	Shayne Longpre et.al.	2305.13169v1	null
2023-05-22	LMGQS: A Large-scale Dataset for Query-focused Summarization	Ruochen Xu et.al.	2305.13086v1	null
2023-05-22	Textually Pretrained Speech Language Models	Michael Hassid et.al.	2305.13009v1	null
2023-05-22	Rethinking Semi-supervised Learning with Language Models	Zhengxiang Shi et.al.	2305.13002v1	link
2023-05-22	Text-based Person Search without Parallel Image-Text Data	Yang Bai et.al.	2305.12964v1	null
2023-05-22	Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model	Xiao Wang et.al.	2305.12816v1	null
2023-05-22	In-Context Learning of Large Language Models Explained as Kernel Regression	Chi Han et.al.	2305.12766v1	null
2023-05-22	LEAN: Light and Efficient Audio Classification Network	Shwetank Choudhary et.al.	2305.12712v1	null
2023-05-19	Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes	Aran Nayebi et.al.	2305.11772v1	null
2023-05-19	Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition	Siyuan Feng et.al.	2305.11569v1	null
2023-05-19	JOINEDTrans: Prior Guided Multi-task Transformer for Joint Optic Disc/Cup Segmentation and Fovea Detection	Huaqing He et.al.	2305.11504v1	null
2023-05-19	TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding	Chenchi Zhang et.al.	2305.11497v1	null
2023-05-19	ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection	Shiwei Jin et.al.	2305.11452v1	null
2023-05-18	CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models	Jiaxu Zhao et.al.	2305.11262v1	null
2023-05-18	Comparing Biases and the Impact of Multilingual Training across Multiple Languages	Sharon Levy et.al.	2305.11242v1	null
2023-05-18	LIMA: Less Is More for Alignment	Chunting Zhou et.al.	2305.11206v1	null
2023-05-18	ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	Peng Wang et.al.	2305.11172v1	link
2023-05-18	Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository Mining Study	Joel Castaño et.al.	2305.11164v1	null
2023-05-18	UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild	Can Qin et.al.	2305.11147v1	null
2023-05-18	mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences	David Uthus et.al.	2305.11129v1	null
2023-05-18	Generalized Planning in PDDL Domains with Pretrained Large Language Models	Tom Silver et.al.	2305.11014v1	link
2023-05-18	The Web Can Be Your Oyster for Improving Large Language Models	Junyi Li et.al.	2305.10998v1	null
2023-05-18	How does the task complexity of masked pretraining objectives affect downstream performance?	Atsuki Yamaguchi et.al.	2305.10992v1	link
2023-05-18	FLIGHT Mode On: A Feather-Light Network for Low-Light Image Enhancement	Mustafa Ozcan et.al.	2305.10889v1	null
2023-05-18	VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation	Wenjing Wang et.al.	2305.10874v1	null
2023-05-18	Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning	Wenhao Li et.al.	2305.10865v1	null
2023-05-17	DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining	Sang Michael Xie et.al.	2305.10429v1	null
2023-05-17	What You See is What You Read? Improving Text-Image Alignment Evaluation	Michal Yarom et.al.	2305.10400v1	link
2023-05-17	OpenSLU: A Unified, Modularized, and Extensible Toolkit for Spoken Language Understanding	Libo Qin et.al.	2305.10231v1	link
2023-05-17	Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks	Alon Jacovi et.al.	2305.10160v1	null
2023-05-17	Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models	Alvin Heng et.al.	2305.10120v1	null
2023-05-17	CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision Agriculture	Talha Ilyas et.al.	2305.10084v1	null
2023-05-17	Dynamic Structural Brain Network Construction by Hierarchical Prototype Embedding GCN using T1-MRI	Yilin Leng et.al.	2305.10077v1	null
2023-05-17	Equivariant Few-Shot Learning from Pretrained Models	Sourya Basu et.al.	2305.09900v1	null
2023-05-16	The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation	Mutian He et.al.	2305.09652v1	null
2023-05-16	Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via Energy-Based Normalizing Flow	Denis Gudovskiy et.al.	2305.09610v1	link
2023-05-16	An Empirical Study on Google Research Football Multi-agent Scenarios	Yan Song et.al.	2305.09458v1	link
2023-05-16	Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification	Jiasheng Si et.al.	2305.09400v1	null
2023-05-16	Deep Ensembling for Perceptual Image Quality Assessment	Nisar Ahmed et.al.	2305.09141v1	null
2023-05-15	Self-Supervised Pretraining on Paired Sequences of fMRI Data for Transfer Learning to Brain Decoding Tasks	Sean Paulsen et.al.	2305.09057v1	null
2023-05-15	CLIP-VG: Self-paced Curriculum Adapting of CLIP via Exploiting Pseudo-Language Labels for Visual Grounding	Linhui Xiao et.al.	2305.08685v1	null
2023-05-15	DarkBERT: A Language Model for the Dark Side of the Internet	Youngjin Jin et.al.	2305.08596v1	null
2023-05-15	What's the Meaning of Superhuman Performance in Today's NLU?	Simone Tedeschi et.al.	2305.08414v1	null
2023-05-15	TESS: Text-to-Text Self-Conditioned Simplex Diffusion	Rabeeh Karimi Mahabadi et.al.	2305.08379v1	null
2023-05-15	"Nothing Abnormal": Disambiguating Medical Reports via Contrastive Knowledge Infusion	Zexue He et.al.	2305.08300v1	null
2023-05-15	From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models	Shangbin Feng et.al.	2305.08283v1	null
2023-05-14	FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge	Shangbin Feng et.al.	2305.08281v1	null
2023-05-14	MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling	Yu Song et.al.	2305.08264v1	link
2023-05-14	Evaluating the roughness of structure-property relationships using pretrained molecular representations	David E. Graff et.al.	2305.08238v1	null
2023-05-14	DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement	Hendrik Schröter et.al.	2305.08227v1	null
2023-05-12	Measuring Progress in Fine-grained Vision-and-Language Understanding	Emanuele Bugliarello et.al.	2305.07558v1	link
2023-05-12	Comprehensive Solution Program Centric Pretraining for Table-and-Text Hybrid Numerical Reasoning	Qianying Liu et.al.	2305.07475v1	null
2023-05-12	CLIP-Count: Towards Text-Guided Zero-Shot Object Counting	Ruixiang Jiang et.al.	2305.07304v1	link
2023-05-11	Simple Token-Level Confidence Improves Caption Correctness	Suzanne Petryk et.al.	2305.07021v1	null
2023-05-11	A General-Purpose Multilingual Document Encoder	Onur Galoğlu et.al.	2305.07016v1	link
2023-05-11	Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers	Dahun Kim et.al.	2305.07011v1	null
2023-05-11	Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks	Eshaan Nichani et.al.	2305.06986v1	null
2023-05-11	IUST_NLP at SemEval-2023 Task 10: Explainable Detecting Sexism with Transformers and Task-adaptive Pretraining	Hadiseh Mahmoudi et.al.	2305.06892v1	null
2023-05-11	Extending Audio Masked Autoencoders Toward Audio Restoration	Zhi Zhong et.al.	2305.06701v1	null
2023-05-11	WeditGAN: Few-shot Image Generation via Latent Space Relocation	Yuxuan Duan et.al.	2305.06671v1	null
2023-05-11	A First Look at LLM-Powered Generative News Recommendation	Qijiong Liu et.al.	2305.06566v1	link
2023-05-11	Undercover Deepfakes: Detecting Fake Segments in Videos	Sanjay Saha et.al.	2305.06564v1	link
2023-05-11	How Good are Commercial Large Language Models on African Languages?	Jessica Ojo et.al.	2305.06530v1	null
2023-05-10	Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs	Roei Herzig et.al.	2305.06343v1	null
2023-05-10	XTab: Cross-table Pretraining for Tabular Transformers	Bingzhao Zhu et.al.	2305.06090v1	link
2023-05-10	A Survey of Deep Code Search	Yutao Xie et.al.	2305.05959v1	null
2023-05-10	Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection	Juan Hu et.al.	2305.05943v1	null
2023-05-10	SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds	Qing Li et.al.	2305.05873v1	link
2023-05-10	Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks	Xianzhi Li et.al.	2305.05862v1	null
2023-05-10	Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages	Rahul Aralikatte et.al.	2305.05858v1	link
2023-05-09	Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query	Ho Hin Lee et.al.	2305.05598v1	null
2023-05-09	Recursions Are All You Need: Towards Efficient Deep Unfolding Networks	Rawwad Alhejaili et.al.	2305.05505v1	link
2023-05-09	BadCS: A Backdoor Attack Framework for Code search	Shiyi Qi et.al.	2305.05503v1	null
2023-05-09	Exploiting Pseudo Image Captions for Multimodal Summarization	Chaoya Jiang et.al.	2305.05496v1	link
2023-05-09	What is the best recipe for character-level encoder-only modelling?	Kris Cao et.al.	2305.05461v1	null
2023-05-09	MSVQ: Self-Supervised Learning with Multiple Sample Views and Queues	Chen Peng et.al.	2305.05370v1	link
2023-05-09	A Framework for Designing Foundation Model based Systems	Qinghua Lu et.al.	2305.05352v1	null
2023-05-09	Application of Artificial Intelligence in the Classification of Microscopical Starch Images for Drug Formulation	Marvellous Ajala et.al.	2305.05321v1	null
2023-05-09	Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition	Xuandi Fu et.al.	2305.05271v1	null
2023-05-09	Boosting Visual-Language Models by Exploiting Hard Samples	Haonan Wang et.al.	2305.05208v1	null
2023-05-08	Toeplitz Neural Network for Sequence Modeling	Zhen Qin et.al.	2305.04749v1	link
2023-05-08	Enhancing Knowledge Graph Construction Using Large Language Models	Milena Trajanoska et.al.	2305.04676v1	null
2023-05-08	MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset	Leonhard Hennig et.al.	2305.04582v1	link
2023-05-08	A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues	Yunxin Li et.al.	2305.04530v1	link
2023-05-08	SNT: Sharpness-Minimizing Network Transformation for Fast Compression-friendly Pretraining	Jung Hwan Heo et.al.	2305.04526v1	null
2023-05-08	Retriever and Ranker Framework with Probabilistic Hard Negative Sampling for Code Search	Hande Dong et.al.	2305.04508v1	null
2023-05-08	Token-level Fitting Issues of Seq2seq Models	Guangsheng Bao et.al.	2305.04493v1	null
2023-05-09	Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation	Chaoya Jiang et.al.	2305.04474v2	null
2023-05-08	Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting	Zhicheng Wang et.al.	2305.04440v1	null
2023-05-08	Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt	Han Zhou et.al.	2305.04430v1	link
2023-05-05	Otter: A Multi-Modal Model with In-Context Instruction Tuning	Bo Li et.al.	2305.03726v1	null
2023-05-05	COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?	Arijit Ray et.al.	2305.03689v1	null
2023-05-05	Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models	Mercy Ranjit et.al.	2305.03660v1	null
2023-05-05	Data Curation for Image Captioning with Text-to-Image Generative Models	Wenyan Li et.al.	2305.03610v1	null
2023-05-05	DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation	Hong Chen et.al.	2305.03374v1	null
2023-05-05	HiPool: Modeling Long Documents Using Graph Neural Networks	Irene Li et.al.	2305.03319v1	link
2023-05-04	Chain-of-Skills: A Configurable Model for Open-domain Question Answering	Kaixin Ma et.al.	2305.03130v1	null
2023-05-04	Adversarially-Guided Portrait Matting	Sergej Chicherin et.al.	2305.02981v1	link
2023-05-04	End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders	Jixuan Wang et.al.	2305.02937v1	null
2023-05-04	Forward-Forward Contrastive Learning	Md. Atik Ahamed et.al.	2305.02927v1	null
2023-05-04	DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning	Daniil Homskiy et.al.	2305.02607v1	null
2023-05-04	How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning	Vittorio Pippi et.al.	2305.02593v1	null
2023-05-03	Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations	Vasudha Kowtha et.al.	2305.02382v1	null
2023-05-03	PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives	Silin Gao et.al.	2305.02364v1	link
2023-05-03	Entity Tracking in Language Models	Najoung Kim et.al.	2305.02363v1	null
2023-05-03	Real-Time Radiance Fields for Single-Image Portrait View Synthesis	Alex Trevithick et.al.	2305.02310v1	null
2023-05-05	A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text	Yunxin Li et.al.	2305.02265v2	link
2023-05-03	Explaining Language Models' Predictions with High-Impact Concepts	Ruochen Zhao et.al.	2305.02160v1	null
2023-05-02	KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness	Yichuan Li et.al.	2305.01810v1	null
2023-05-02	Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner	Zhengxiang Shi et.al.	2305.01711v1	link
2023-05-02	SIA-FTP: A Spoken Instruction Aware Flight Trajectory Prediction Framework	Dongyue Guo et.al.	2305.01661v1	null
2023-05-02	Unlimiformer: Long-Range Transformers with Unlimited Length Input	Amanda Bertsch et.al.	2305.01625v1	link
2023-05-02	A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge	Siddhant Arora et.al.	2305.01620v1	null
2023-05-02	RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models	Dave Van Veen et.al.	2305.01146v1	null
2023-05-01	Interpreting Pretrained Source-code Models using Neuron Redundancy Analyses	Arushi Sharma et.al.	2305.00875v1	null
2023-04-30	Transfer of knowledge among instruments in automatic music transcription	Michał Leś et.al.	2305.00426v1	null
2023-04-30	Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI	Yuheng Li et.al.	2305.00385v1	null
2023-04-29	LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization	Emmanuel Martinez et.al.	2305.00132v1	link
2023-04-28	Towards Better Domain Adaptation for Self-supervised Models: A Case Study of Child ASR	Ruchao Fan et.al.	2305.00115v1	null
2023-04-28	NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis	Mingyang Wang et.al.	2305.00090v1	null
2023-04-28	Unsupervised Discovery of 3D Hierarchical Structure with Generative Diffusion Features	Nurislam Tursynbek et.al.	2305.00067v1	null
2023-04-28	CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data	Michał Turski et.al.	2304.14953v1	link
2023-04-28	Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain	Annerose Eichel et.al.	2304.14745v1	link
2023-04-28	DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation	Yousef Yeganeh et.al.	2304.14571v1	null
2023-04-27	Greybox Penetration Testing on Cloud Access Control with IAM Modeling and Deep Reinforcement Learning	Yang Hu et.al.	2304.14540v1	null
2023-04-27	Gradient-based Maximally Interfered Retrieval for Domain Incremental 3D Object Detection	Barza Nisar et.al.	2304.14460v1	link
2023-04-27	We're Afraid Language Models Aren't Modeling Ambiguity	Alisa Liu et.al.	2304.14399v1	link
2023-04-27	UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment classification in low-resource languages	Egil Rønningstad et.al.	2304.14189v1	null
2023-04-27	Lightweight, Pre-trained Transformers for Remote Sensing Timeseries	Gabriel Tseng et.al.	2304.14065v1	link
2023-04-27	Retrieval-based Knowledge Augmented Vision Language Pre-training	Jiahua Rao et.al.	2304.13923v1	null
2023-04-27	Neural Keyphrase Generation: Analysis and Evaluation	Tuhin Kundu et.al.	2304.13883v1	null
2023-04-26	highway2vec -- representing OpenStreetMap microregions with respect to their road network characteristics	Kacper Leśniara et.al.	2304.13865v1	link
2023-04-26	A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation	Enrique Dehaerne et.al.	2304.13840v1	null
2023-04-26	Programmatically Grounded, Compositionally Generalizable Robotic Manipulation	Renhao Wang et.al.	2304.13826v1	null
2023-04-26	Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models	Haoqiang Kang et.al.	2304.13803v1	null
2023-04-26	Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation	Lukas Hoyer et.al.	2304.13615v1	link
2023-04-26	Tissue Classification During Needle Insertion Using Self-Supervised Contrastive Learning and Optical Coherence Tomography	Debayan Bhattacharya et.al.	2304.13574v1	null
2023-04-26	Self-Supervised Multi-Modal Sequential Recommendation	Kunzhe Song et.al.	2304.13277v1	null
2023-04-25	Towards Compute-Optimal Transfer Learning	Massimo Caccia et.al.	2304.13164v1	null
2023-04-25	Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining	Giacomo Nebbia et.al.	2304.13130v1	null
2023-04-25	Pretrain on just structure: Understanding linguistic inductive biases using transfer learning	Isabel Papadimitriou et.al.	2304.13060v1	null
2023-04-25	On the Generalization of Learned Structured Representations	Andrea Dittadi et.al.	2304.13001v1	null
2023-04-25	CitePrompt: Using Prompts to Identify Citation Intent in Scientific Papers	Avishek Lahiri et.al.	2304.12730v1	link
2023-04-26	Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation	Junde Wu et.al.	2304.12620v2	null
2023-04-26	OFAR: A Multimodal Evidence Retrieval Framework for Illegal Live-streaming Identification	Lin Dengtian et.al.	2304.12608v2	null
2023-04-25	Model Conversion via Differentially Private Data-Free Distillation	Bochao Liu et.al.	2304.12528v1	null
2023-04-25	Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning	Zhongzhi Yu et.al.	2304.12520v1	null
2023-04-25	RenderDiffusion: Text Generation as Image Generation	Junyi Li et.al.	2304.12519v1	null
2023-04-24	PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques	Mohammed Sabry et.al.	2304.12410v1	null
2023-04-24	Generative Discovery of Novel Chemical Designs using Diffusion Modeling and Transformer Deep Neural Networks with Application to Deep Eutectic Solvents	Rachel K. Luu et.al.	2304.12400v1	null
2023-04-24	Uni-QSAR: an Auto-ML Tool for Molecular Property Prediction	Zhifeng Gao et.al.	2304.12239v1	null
2023-04-24	Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models	Xiangming Gu et.al.	2304.12082v1	null
2023-04-24	Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning	Yonggan Fu et.al.	2304.11834v1	null
2023-04-22	Incomplete Multimodal Learning for Remote Sensing Data Fusion	Yuxing Chen et.al.	2304.11381v1	null
2023-04-22	Single-stage Multi-human Parsing via Point Sets and Center-based Offsets	Jiaming Chu et.al.	2304.11356v1	null
2023-04-22	Self-supervised Learning by View Synthesis	Shaoteng Liu et.al.	2304.11330v1	null
2023-04-22	EEE, Remediating the failure of machine learning models via a network-based optimization patch	Ruiyuan Kang et.al.	2304.11321v1	null
2023-04-21	Factored Neural Representation for Scene Understanding	Yu-Shiang Wong et.al.	2304.10950v1	null
2023-04-24	Text2Time: Transformer-based Article Time Period Prediction	Karthick Prasad Gunasekaran et.al.	2304.10859v2	null
2023-04-21	Rethinking Benchmarks for Cross-modal Image-text Retrieval	Weijing Chen et.al.	2304.10824v1	link
2023-04-21	Deep Multiview Clustering by Contrasting Cluster Assignments	Jie Chen et.al.	2304.10769v1	link
2023-04-20	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	Deyao Zhu et.al.	2304.10592v1	link
2023-04-20	Implicit Temporal Modeling with Learnable Alignment for Video Recognition	Shuyuan Tu et.al.	2304.10465v1	link
2023-04-20	Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health	Shaoxiong Ji et.al.	2304.10447v1	null
2023-04-20	Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining	Qin Chao et.al.	2304.10311v1	null
2023-04-20	OptoGPT: A Foundation Model for Inverse Design in Optical Multilayer Thin Film Structures	Taigao Ma et.al.	2304.10294v1	null
2023-04-20	PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image	Jianhui Li et.al.	2304.10263v1	null
2023-04-20	Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages	Verena Blaschke et.al.	2304.10158v1	link
2023-04-19	DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification	Di Wang et.al.	2304.09915v1	link
2023-04-19	Domain Adaptable Self-supervised Representation Learning on Remote Sensing Satellite Imagery	Muskaan Chopra et.al.	2304.09874v1	link
2023-04-19	NetGPT: Generative Pretrained Transformer for Network Traffic	Xuying Meng et.al.	2304.09513v1	null
2023-04-20	Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes	Simran Arora et.al.	2304.09433v2	link
2023-04-18	UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining	Hyung Won Chung et.al.	2304.09151v1	null
2023-04-18	Decoding Neural Activity to Assess Individual Latent State in Ecologically Valid Contexts	Stephen M. Gordon et.al.	2304.09050v1	null
2023-04-18	Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases	Wentao Zhang et.al.	2304.09042v1	null
2023-04-18	D2CSE: Difference-aware Deep continuous prompts for Contrastive Sentence Embeddings	Hyunjae Lee et.al.	2304.08991v1	null
2023-04-18	Deep Collective Knowledge Distillation	Jihyeon Seo et.al.	2304.08878v1	null
2023-04-18	Romanization-based Large-scale Adaptation of Multilingual Language Models	Sukannya Purkayastha et.al.	2304.08865v1	null
2023-04-19	Self-Supervised 3D Action Representation Learning with Skeleton Cloud Colorization	Siyuan Yang et.al.	2304.08799v2	null
2023-04-18	Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services	Minrui Xu et.al.	2304.08782v1	null
2023-04-17	Delving into Shape-aware Zero-shot Semantic Segmentation	Xinyu Liu et.al.	2304.08491v1	link
2023-04-17	BenchMD: A Benchmark for Modality-Agnostic Learning on Medical Images and Sensors	Kathryn Wantlin et.al.	2304.08486v1	link
2023-04-18	Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation	Jie An et.al.	2304.08477v2	null
2023-04-18	Inverse design of next-generation superconductors using data-driven deep generative models	Daniel Wines et.al.	2304.08446v2	null
2023-04-17	VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset	Sihan Chen et.al.	2304.08345v1	link
2023-04-17	Human Pose Estimation in Monocular Omnidirectional Top-View Images	Jingrui Yu et.al.	2304.08186v1	null
2023-04-17	DETRs Beat YOLOs on Real-time Object Detection	Wenyu Lv et.al.	2304.08069v1	link
2023-04-17	Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture	Taeho Kim Jong-Min Lee et.al.	2304.08014v1	null
2023-04-17	Learning to "Segment Anything" in Thermal Infrared Images through Knowledge Distillation with a Large Scale Dataset SATIR	Junzhang Chen et.al.	2304.07969v1	link
2023-04-16	Sabiá: Portuguese Large Language Models	Ramon Pires et.al.	2304.07880v1	null
2023-04-14	DINOv2: Learning Robust Visual Features without Supervision	Maxime Oquab et.al.	2304.07193v1	link
2023-04-14	The Second Monocular Depth Estimation Challenge	Jaime Spencer et.al.	2304.07051v1	null
2023-04-14	MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation	Jie Guo et.al.	2304.06957v1	null
2023-04-14	Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text	Wanrong Zhu et.al.	2304.06939v1	link
2023-04-14	3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining	Siming Yan et.al.	2304.06911v1	null
2023-04-14	Generating Adversarial Examples with Better Transferability via Masking Unimportant Parameters of Surrogate Model	Dingcheng Yang et.al.	2304.06908v1	null
2023-04-14	Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding	Yu-Qi Yang et.al.	2304.06906v1	null
2023-04-17	A Contrastive Method Based on Elevation Data for Remote Sensing with Scarce and High Level Semantic Labels	Omar A. Castaño-Idarraga et.al.	2304.06857v2	null
2023-04-13	Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study	Boxin Wang et.al.	2304.06762v1	link
2023-04-13	Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction	Hansheng Chen et.al.	2304.06714v1	null
2023-04-13	Verbs in Action: Improving verb understanding in video-language models	Liliane Momeni et.al.	2304.06708v1	null
2023-04-14	G2T: A Simple but Effective Framework for Topic Modeling based on Pretrained Language Model and Community Detection	Leihang Zhang et.al.	2304.06653v2	null
2023-04-13	Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation	Mohit Sharma et.al.	2304.06600v1	null
2023-04-12	RECLIP: Resource-efficient CLIP by Training with Small Images	Runze Li et.al.	2304.06028v1	null
2023-04-14	DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion	Johanna Karras et.al.	2304.06025v2	null
2023-04-12	HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments	Stefan Grushko et.al.	2304.05826v1	null
2023-04-12	Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance	Robin Schön et.al.	2304.05716v1	null
2023-04-12	Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning	Nikhil Singh et.al.	2304.05600v1	null
2023-04-11	A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation	Florian Bordes et.al.	2304.05369v1	null
2023-04-11	A Billion-scale Foundation Model for Remote Sensing Images	Keumgang Cha et.al.	2304.05215v1	null
2023-04-11	MRVM-NeRF: Mask-Based Pretraining for Neural Radiance Fields	Ganlin Yang et.al.	2304.04962v1	null
2023-04-11	Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference	Tao Lei et.al.	2304.04947v1	null
2023-04-10	Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition	Shuhuai Ren et.al.	2304.04704v1	link
2023-04-10	Transfer Learning for Low-Resource Sentiment Analysis	Razhan Hameed et.al.	2304.04703v1	link
2023-04-10	Attention at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS)	Debashish Roy et.al.	2304.04610v1	link
2023-04-10	hist2RNA: An efficient deep learning architecture to predict gene expression from breast cancer histopathology images	Raktim Kumar Mondol et.al.	2304.04507v1	null
2023-04-10	Instance Neural Radiance Field	Benran Hu et.al.	2304.04395v1	null
2023-04-10	Leveraging Neural Representations for Audio Manipulation	Scott H. Hawley et.al.	2304.04394v1	null
2023-04-10	Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models	Nikita Starodubcev et.al.	2304.04344v1	link
2023-04-09	Pretrained Embeddings for E-commerce Machine Learning: When it Fails and Why?	Da Xu et.al.	2304.04330v1	null
2023-04-08	Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding	Susik Yoon et.al.	2304.04099v1	null
2023-04-08	WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition	Dávid Šuba et.al.	2304.04026v1	link
2023-04-07	Zero-shot CT Field-of-view Completion with Unconditional Generative Diffusion Prior	Kaiwen Xu et.al.	2304.03760v1	null
2023-04-10	Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining	Jian Guan et.al.	2304.03588v2	null
2023-04-10	Graph Attention for Automated Audio Captioning	Feiyang Xiao et.al.	2304.03586v2	link
2023-04-07	Language-aware Multiple Datasets Detection Pretraining for DETRs	Jing Hao et.al.	2304.03580v1	null
2023-04-07	Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4	Hanmeng Liu et.al.	2304.03439v1	link
2023-04-06	RoSteALS: Robust Steganography using Autoencoder Latent Space	Tu Bui et.al.	2304.03400v1	link
2023-04-06	Self-Supervised Video Similarity Learning	Giorgos Kordopatis-Zilos et.al.	2304.03378v1	link
2023-04-06	Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting	Syed Talal Wasim et.al.	2304.03307v1	link
2023-04-06	Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention	Mingyu Ding et.al.	2304.03282v1	link
2023-04-06	When do you need Chain-of-Thought Prompting for ChatGPT?	Jiuhai Chen et.al.	2304.03262v1	null
2023-04-06	Zero-Shot Next-Item Recommendation using Large Pretrained Language Models	Lei Wang et.al.	2304.03153v1	null
2023-04-07	Geometric-aware Pretraining for Vision-centric 3D Object Detection	Linyan Huang et.al.	2304.03105v2	link
2023-04-06	Convolutional neural networks for crack detection on flexible road pavements	Hermann Tapamo et.al.	2304.02933v1	null
2023-04-06	Mask Detection and Classification in Thermal Face Images	Natalia Kowalczyk et.al.	2304.02931v1	link
2023-04-06	Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce	Yang Jin et.al.	2304.02853v1	null
2023-04-06	Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification	Thomas Z. Li et.al.	2304.02836v1	null
2023-04-05	Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks	Md. Tanvir Rouf Shawon et.al.	2304.02739v1	null
2023-04-05	Exploring the Utility of Self-Supervised Pretraining Strategies for the Detection of Absent Lung Sliding in M-Mode Lung Ultrasound	Blake VanBerlo et.al.	2304.02724v1	null
2023-04-05	VicTR: Video-conditioned Text Representations for Activity Recognition	Kumara Kahatapitiya et.al.	2304.02560v1	null
2023-04-05	Deep Perceptual Similarity is Adaptable to Ambiguous Contexts	Gustav Grund Pihlgren et.al.	2304.02265v1	null
2023-04-05	Towards Efficient Task-Driven Model Reprogramming with Foundation Models	Shoukai Xu et.al.	2304.02263v1	null
2023-04-04	Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT	Ke Chen et.al.	2304.02160v1	null
2023-04-04	Optimal operating MR contrast for brain ventricle parcellation	Savannah P. Hays et.al.	2304.02056v1	null
2023-04-04	Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation	Ethan K. Gordon et.al.	2304.02052v1	null
2023-04-04	AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation	Jheng-Hong Yang et.al.	2304.01961v1	link
2023-04-04	Unsupervised Improvement of Factual Knowledge in Language Models	Nafis Sadeq et.al.	2304.01597v1	link
2023-04-03	Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks	Andrew Halterman et.al.	2304.01331v1	link
2023-04-03	Burstormer: Burst Image Restoration and Enhancement Transformer	Akshay Dudhane et.al.	2304.01194v1	link
2023-04-03	ScandEval: A Benchmark for Scandinavian Natural Language Processing	Dan Saattrup Nielsen et.al.	2304.00906v1	link
2023-04-03	GreekBART: The First Pretrained Greek Sequence-to-Sequence Model	Iakovos Evdaimon et.al.	2304.00869v1	null
2023-04-03	Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation	Suho Lee et.al.	2304.00792v1	link
2023-04-03	Multi-Modal Representation Learning with Text-Driven Soft Masks	Jaeyoo Park et.al.	2304.00719v1	null
2023-04-03	A Post-Training Framework for Improving Heterogeneous Graph Neural Networks	Cheng Yang et.al.	2304.00698v1	null
2023-04-02	PK-Chat: Pointer Network Guided Knowledge Driven Generative Dialogue Model	Cheng Deng et.al.	2304.00592v1	link
2023-04-02	DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks	Qiangqiang Wu et.al.	2304.00571v1	link
2023-04-02	Video Pretraining Advances 3D Deep Learning on Chest CT Tasks	Alexander Ke et.al.	2304.00546v1	link
2023-04-02	Instance-level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space	Yuwei Sun et.al.	2304.00436v1	null
2023-03-31	Procedure-Aware Pretraining for Instructional Video Understanding	Honglu Zhou et.al.	2303.18230v1	link
2023-03-31	Siamese DETR	Zeren Chen et.al.	2303.18144v1	null
2023-03-31	INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields	Julia Hindel et.al.	2303.18101v1	null
2023-03-31	LaCViT: A Label-aware Contrastive Training Framework for Vision Transformers	Zijun Long et.al.	2303.18013v1	null
2023-03-31	Knowledge Distillation for Feature Extraction in Underwater VSLAM	Jinghe Yang et.al.	2303.17981v1	link
2023-03-31	Exploring the Limits of Deep Image Clustering using Pretrained Models	Nikolas Adaloglou et.al.	2303.17896v1	null
2023-03-30	Learning Garment DensePose for Robust Warping in Virtual Try-On	Aiyu Cui et.al.	2303.17688v1	null
2023-03-30	Whether and When does Endoscopy Domain Pretraining Make Sense?	Dominik Batić et.al.	2303.17636v1	null
2023-03-30	Anatomically aware dual-hop learning for pulmonary embolism detection in CT pulmonary angiograms	Florin Condrea et.al.	2303.17593v1	null
2023-03-30	DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder	Chenpeng Du et.al.	2303.17550v1	null
2023-03-30	Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions	Yicheng Luo et.al.	2303.17396v1	null
2023-03-30	A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision	Lucas Beyer et.al.	2303.17376v1	null
2023-03-30	PMatch: Paired Masked Image Modeling for Dense Geometric Matching	Shengjie Zhu et.al.	2303.17342v1	null
2023-03-30	Discriminative Class Tokens for Text-to-Image Diffusion Models	Idan Schwartz et.al.	2303.17155v1	null
2023-03-29	Transductive few-shot adapters for medical image segmentation	Julio Silva-Rodríguez et.al.	2303.17051v1	link
2023-03-29	AutoAD: Movie Description in Context	Tengda Han et.al.	2303.16899v1	link
2023-03-29	Towards Understanding the Effect of Pretraining Label Granularity	Guan Zhe Hong et.al.	2303.16887v1	null
2023-03-28	Training Language Models with Language Feedback at Scale	Jérémy Scheurer et.al.	2303.16755v1	null
2023-03-29	Visibility Aware Human-Object Interaction Tracking from Single RGB Camera	Xianghui Xie et.al.	2303.16479v1	null
2023-03-28	Variational Distribution Learning for Unsupervised Text-to-Image Generation	Minsoo Kang et.al.	2303.16105v1	null
2023-03-28	Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes	Auke Elfrink et.al.	2303.15846v1	null
2023-03-28	Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion	Hiromichi Kamata et.al.	2303.15780v1	null
2023-03-28	SVD-DIP: Overcoming the Overfitting Problem in DIP-based CT Reconstruction	Marco Nittscher et.al.	2303.15748v1	link
2023-03-28	Large-scale pretraining on pathological images for fine-tuning of small pathological benchmarks	Masataka Kawai et.al.	2303.15693v1	null
2023-03-28	Pre-training Transformers for Knowledge Graph Completion	Sanxing Chen et.al.	2303.15682v1	null
2023-03-28	StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing	Senmao Li et.al.	2303.15649v1	null
2023-03-27	Training-free Style Transfer Emerges from h-space in Diffusion models	Jaeseok Jeong et.al.	2303.15403v1	null
2023-03-27	Generalizable Neural Voxels for Fast Human Radiance Fields	Taoran Yi et.al.	2303.15387v1	null
2023-03-27	Improving Neural Topic Models with Wasserstein Knowledge Distillation	Suman Adhya et.al.	2303.15350v1	link
2023-03-27	Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features	Fumiaki Sato et.al.	2303.15167v1	null
2023-03-27	Parameter Efficient Local Implicit Image Function Network for Face Segmentation	Mausoom Sarkar et.al.	2303.15122v1	null
2023-03-27	Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record	Christopher McMaster et.al.	2303.14920v1	null
2023-03-27	Seer: Language Instructed Video Prediction with Latent Diffusion Models	Xianfan Gu et.al.	2303.14897v1	null
2023-03-25	Indian Language Summarization using Pretrained Sequence-to-Sequence Models	Ashok Urlana et.al.	2303.14461v1	null
2023-03-25	Sem4SAP: Synonymous Expression Mining From Open Knowledge Graph For Language Model Synonym-Aware Pretraining	Zhouhong Gu et.al.	2303.14425v1	null
2023-03-25	Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression	Denis Kuznedelev et.al.	2303.14409v1	null
2023-03-27	Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data	Paul Hager et.al.	2303.14080v2	link
2023-03-24	Accelerating Vision-Language Pretraining with Free Language Modeling	Teng Wang et.al.	2303.14038v1	link
2023-03-24	SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization	Yi-Syuan Chen et.al.	2303.14011v1	null
2023-03-24	Robust Test-Time Adaptation in Dynamic Scenarios	Longhui Yuan et.al.	2303.13899v1	link
2023-03-23	Three ways to improve feature alignment for open vocabulary detection	Relja Arandjelović et.al.	2303.13518v1	null
2023-03-23	Ablating Concepts in Text-to-Image Diffusion Models	Nupur Kumari et.al.	2303.13516v1	link
2023-03-23	A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias	Puja Trivedi et.al.	2303.13500v1	null
2023-03-23	The effectiveness of MAE pre-pretraining for billion-scale pretraining	Mannat Singh et.al.	2303.13496v1	null
2023-03-23	Increasing Textual Context Size Boosts Medical Image-Text Matching	Idan Glassberg et.al.	2303.13340v1	null
2023-03-23	Parameter-Efficient Sparse Retrievers and Rerankers using Adapters	Vaishali Pal et.al.	2303.13220v1	link
2023-03-23	Retrieval-Augmented Classification with Decoupled Representation	Xinnian Liang et.al.	2303.13065v1	link
2023-03-23	gDoc: Automatic Generation of Structured API Documentation	Shujun Wang et.al.	2303.13041v1	null
2023-03-23	MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models	Dohwan Ko et.al.	2303.13009v1	link
2023-03-22	JaCoText: A Pretrained Model for Java Code-Text Generation	Jessica López Espejel et.al.	2303.12869v1	null
2023-03-21	Affordance Diffusion: Synthesizing Hand-Object Interactions	Yufei Ye et.al.	2303.12538v1	null
2023-03-21	Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding	Morris Alper et.al.	2303.12513v1	link
2023-03-22	CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data	Yihan Zeng et.al.	2303.12417v1	null
2023-03-21	Prompt-MIL: Boosting Multi-Instance Learning Schemes via Task-specific Prompt Tuning	Jingwei Zhang et.al.	2303.12214v1	null
2023-03-21	Toward Accurate Interpretable Predictions of Materials Properties within Transformer Language Models	Vadim Korolev et.al.	2303.12188v1	null
2023-03-21	MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation	Vitaliy Kinakh et.al.	2303.12130v1	link
2023-03-21	Logical Reasoning over Natural Language as Knowledge Representation: A Survey	Zonglin Yang et.al.	2303.12023v1	null
2023-03-21	A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?	Chaoning Zhang et.al.	2303.11717v1	null
2023-03-21	Manipulating Transfer Learning for Property Inference	Yulong Tian et.al.	2303.11643v1	link
2023-03-21	Large AI Models in Health Informatics: Applications, Challenges, and the Future	Jianing Qiu et.al.	2303.11568v1	null
2023-03-20	eP-ALM: Efficient Perceptual Augmentation of Language Models	Mustafa Shukor et.al.	2303.11403v1	link
2023-03-20	Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding	Jihao Liu et.al.	2303.11325v1	null
2023-03-20	Conversation Modeling to Predict Derailment	Jiaqing Yuan et.al.	2303.11184v1	null
2023-03-20	Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning	Sungnyun Kim et.al.	2303.11101v1	null
2023-03-20	Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models	René Haas et.al.	2303.11073v1	null
2023-03-20	Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization	Fida Mohammad Thoker et.al.	2303.11003v1	null
2023-03-20	EMC2-Net: Joint Equalization and Modulation Classification based on Constellation Network	Hyun Ryu et.al.	2303.10934v1	link
2023-03-20	Exploring Representation Learning for Small-Footprint Keyword Spotting	Fan Cui et.al.	2303.10912v1	null
2023-03-21	Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition	Lilang Lin et.al.	2303.10904v2	null
2023-03-20	Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models	Xinnian Liang et.al.	2303.10893v1	null
2023-03-20	A Global Model Approach to Robust Few-Shot SAR Automatic Target Recognition	Nathan Inkawhich et.al.	2303.10800v1	null
2023-03-17	Enhancing the Role of Context in Region-Word Alignment for Object Detection	Kyle Buettner et.al.	2303.10093v1	null
2023-03-17	DialogPaint: A Dialog-based Image Editing Model	Jingxuan Wei et.al.	2303.10073v1	null
2023-03-17	Breast Cancer Histopathology Image based Gene Expression Prediction using Spatial Transcriptomics data and Deep Learning	Md Mamunur Rahaman et.al.	2303.09987v1	null
2023-03-17	Dual-path Adaptation from Image to Video Transformers	Jungin Park et.al.	2303.09857v1	link
2023-03-17	CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos	Seungju Han et.al.	2303.09713v1	null
2023-03-16	VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection	Arushi Rai et.al.	2303.09608v1	null
2023-03-16	DiffIR: Efficient Diffusion Model for Image Restoration	Bin Xia et.al.	2303.09472v1	null
2023-03-16	Team SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification	Ben Wu et.al.	2303.09421v1	null
2023-03-16	3D Masked Autoencoding and Pseudo-labeling for Domain Adaptive Segmentation of Heterogeneous Infant Brain MRI	Xuzhe Zhang et.al.	2303.09373v1	null
2023-03-16	StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model	Zipeng Xu et.al.	2303.09268v1	link
2023-03-16	GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning	Jiayi Lin et.al.	2303.09252v1	null
2023-03-16	Emotional Reaction Intensity Estimation Based on Multimodal Data	Shangfei Wang et.al.	2303.09167v1	null
2023-03-15	Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting	Yitzchak Shmalo et.al.	2303.08986v1	link
2023-03-15	Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement	Fartash Faghri et.al.	2303.08983v1	null
2023-03-15	PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining	Garrett Thomas et.al.	2303.08789v1	null
2023-03-15	2D and 3D CNN-Based Fusion Approach for COVID-19 Severity Prediction from 3D CT-Scans	Fares Bougourzi et.al.	2303.08740v1	link
2023-03-15	Mapping Urban Population Growth from Sentinel-2 MSI and Census Data Using Deep Learning: A Case Study in Kigali, Rwanda	Sebastian Hafner et.al.	2303.08511v1	link
2023-03-15	Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification	Honglin Li et.al.	2303.08446v1	null
2023-03-15	Lana: A Language-Capable Navigator for Instruction Following and Generation	Xiaohan Wang et.al.	2303.08409v1	link
2023-03-15	SegPrompt: Using Segmentation Map as a Better Prompt to Finetune Deep Models for Kidney Stone Classification	Wei Zhu et.al.	2303.08303v1	null
2023-03-14	Contextualized Medication Information Extraction Using Transformer-based Deep Learning Architectures	Aokun Chen et.al.	2303.08259v1	null
2023-03-14	Diversity-Aware Meta Visual Prompting	Qidong Huang et.al.	2303.08138v1	link
2023-03-15	Eliciting Latent Predictions from Transformers with the Tuned Lens	Nora Belrose et.al.	2303.08112v2	link
2023-03-14	Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection	Jinchao Li et.al.	2303.08019v1	null
2023-03-14	A Theory of Emergent In-Context Learning as Implicit Structure Induction	Michael Hahn et.al.	2303.07971v1	null
2023-03-14	Edit-A-Video: Single Video Editing with Object-Aware Consistency	Chaehun Shin et.al.	2303.07945v1	null
2023-03-15	Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation	Junyoung Seo et.al.	2303.07937v2	null
2023-03-14	The Learnability of In-Context Learning	Noam Wies et.al.	2303.07895v1	null
2023-03-14	Geolocation Predicting of Tweets Using BERT-Based Models	Kateryna Lutsai et.al.	2303.07865v1	null
2023-03-14	Feature representations useful for predicting image memorability	Takumi Harada et.al.	2303.07679v1	null
2023-03-14	Variation of Gender Biases in Visual Recognition Models Before and After Finetuning	Jaspreet Ranjit et.al.	2303.07615v1	null
2023-03-13	Model-tuning Via Prompts Makes NLP Models Adversarially Robust	Mrigank Raman et.al.	2303.07320v1	null
2023-03-13	Vision-Language Models as Success Detectors	Yuqing Du et.al.	2303.07280v1	null
2023-03-13	InferFix: End-to-End Program Repair with LLMs	Matthew Jin et.al.	2303.07263v1	null
2023-03-13	PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents	Weixiong Lin et.al.	2303.07240v1	null
2023-03-13	AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments	Hao Wen et.al.	2303.07129v1	null
2023-03-13	Generating multiple-choice questions for medical question answering with distractors and cue-masking	Damien Sileo et.al.	2303.07069v1	null
2023-03-14	Pretrained ViTs Yield Versatile Representations For Medical Images	Christos Matsoukas et.al.	2303.07034v2	link
2023-03-13	Self-supervised based general laboratory progress pretrained model for cardiovascular event detection	Li-Chin Chen et.al.	2303.06980v1	null
2023-03-14	Uni-RXN: A Unified Framework Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation	Bo Qiang et.al.	2303.06965v2	link
2023-03-13	Contextually-rich human affect perception using multimodal scene information	Digbalay Bose et.al.	2303.06904v1	link
2023-03-10	Rewarding Chatbots for Real-World Engagement with Millions of Users	Robert Irvine et.al.	2303.06135v1	null
2023-03-13	Improving Domain-Invariance in Self-Supervised Learning via Batch Styles Standardization	Marin Scalbert et.al.	2303.06088v2	null
2023-03-10	MVImgNet: A Large-scale Dataset of Multi-view Images	Xianggang Yu et.al.	2303.06042v1	null
2023-03-10	Marginalia and machine learning: Handwritten text recognition for Marginalia Collections	Adam Axelsson et.al.	2303.05929v1	link
2023-03-10	Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	Luting Wang et.al.	2303.05892v1	null
2023-03-10	3D Masked Autoencoders with Application to Anomaly Detection in Non-Contrast Enhanced Breast MRI	Daniel M. Lang et.al.	2303.05861v1	null
2023-03-10	Contrastive Language-Image Pretrained (CLIP) Models are Powerful Out-of-Distribution Detectors	Felix Michels et.al.	2303.05828v1	null
2023-03-10	Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation	Ho Hin Lee et.al.	2303.05785v1	null
2023-03-10	CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment	Jiangbin Zheng et.al.	2303.05725v1	null
2023-03-10	MuLTI: Efficient Video-and-Language Understanding with MultiWay-Sampler and Multiple Choice Modeling	Jiaqi Xu et.al.	2303.05707v1	null
2023-03-09	FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning	Kazi Injamamul Haque et.al.	2303.05416v1	link
2023-03-09	Greener yet Powerful: Taming Large Code Generation Models with Quantization	Xiaokai Wei et.al.	2303.05378v1	null
2023-03-09	Can a Frozen Pretrained Language Model be used for Zero-shot Neural Retrieval on Entity-centric Questions?	Yasuto Hoshi et.al.	2303.05153v1	null
2023-03-08	Enhancing Low-resolution Face Recognition with Feature Similarity Knowledge Distillation	Sungho Shin et.al.	2303.04681v1	null
2023-03-08	Aberration-Aware Depth-from-Focus	Xinge Yang et.al.	2303.04654v1	null
2023-03-08	FastSurf: Fast Neural RGB-D Surface Reconstruction using Per-Frame Intrinsic Refinement and TSDF Fusion Prior Learning	Seunghwan Lee et.al.	2303.04508v1	null
2023-03-08	Onsets and Velocities: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks	Andres Fernandez et.al.	2303.04485v1	link
2023-03-07	PSDNet: Determination of Particle Size Distributions Using Synthetic Soil Images and Convolutional Neural Networks	Javad Manashti et.al.	2303.04269v1	null
2023-03-07	Comparing PSDNet, pretrained networks, and traditional feature extraction for predicting the particle size distribution of granular materials from photographs	Javad Manashti et.al.	2303.04265v1	null
2023-03-09	Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors	Raz Lapid et.al.	2303.04238v2	null
2023-03-07	Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?	Boris Knyazev et.al.	2303.04143v1	link
2023-03-07	Foundation Models for Decision Making: Problems, Methods, and Opportunities	Sherry Yang et.al.	2303.04129v1	null
2023-03-07	CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization	Ruochen Zhang et.al.	2303.04092v1	null
2023-03-07	Larger language models do in-context learning differently	Jerry Wei et.al.	2303.03846v1	null
2023-03-07	Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding	Jiacheng Li et.al.	2303.03800v1	null
2023-03-07	Prediction of transonic flow over supercritical airfoils using geometric-encoding and deep-learning strategies	Zhiwen Deng et.al.	2303.03695v1	null
2023-03-07	AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer	Kang Li et.al.	2303.03689v1	null
2023-03-06	Structured Kernel Estimation for Photon-Limited Deconvolution	Yash Sanghvi et.al.	2303.03472v1	link
2023-03-06	CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning	Hritik Bansal et.al.	[2303.03323v1](http://arxiv.org/abs/2303.03

