Papers with Code Resources
Trending Papers of 2021
- ADOP: Approximate Differentiable One-Pixel Point Rendering — Rückert et al — https://paperswithcode.com/paper/adop-approximate-differentiable-one-pixel
- The Bayesian Learning Rule — Khan et al https://paperswithcode.com/paper/the-bayesian-learning-rule
- Program Synthesis with Large Language Models — Austin et al https://paperswithcode.com/paper/program-synthesis-with-large-language-models
- Masked Autoencoders Are Scalable Vision Learners — He et al https://paperswithcode.com/paper/masked-autoencoders-are-scalable-vision
- 8-bit Optimizers via Block-wise Quantization — Dettmers et al https://paperswithcode.com/paper/8-bit-optimizers-via-block-wise-quantization
- Revisiting ResNets: Improved Training and Scaling Strategies — Bello et al https://paperswithcode.com/paper/revisiting-resnets-improved-training-and
- Image Super-Resolution via Iterative Refinement — Saharia et al https://paperswithcode.com/paper/image-super-resolution-via-iterative
- Perceiver IO: A General Architecture for Structured Inputs & Outputs — Jaegle et al https://paperswithcode.com/paper/perceiver-io-a-general-architecture-for
- Do Vision Transformers See Like Convolutional Neural Networks? — Raghu et al https://paperswithcode.com/paper/do-vision-transformers-see-like-convolutional
- Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions — Niepert et al https://paperswithcode.com/paper/implicit-mle-backpropagating-through-discrete
Trending Libraries of 2021
- PyTorch Image Models — Ross Wightman — https://github.com/rwightman/pytorch-image-models
- Transformers — Hugging Face — https://github.com/huggingface/transformers
- PyTorch-GAN — Erik Linder-Norén — https://github.com/eriklindernoren/PyTorch-GAN
- MMDetection — OpenMMLab — https://github.com/open-mmlab/mmdetection
- Darknet — AlexeyAB — https://github.com/AlexeyAB/darknet
- Vision Transformer PyTorch — lucidrains — https://github.com/lucidrains/vit-pytorch
- InsightFace — DeepInsight — https://github.com/deepinsight/insightface
- Detectron2 — Meta AI — https://github.com/facebookresearch/detectron2
- PaddleOCR — PaddlePaddle — https://github.com/PaddlePaddle/PaddleOCR
- FairSeq — Meta AI — https://github.com/pytorch/fairseq
Top Datasets of 2021
- MATH — Hendrycks et al https://paperswithcode.com/dataset/math
- UAV-Human — Li et al https://paperswithcode.com/dataset/uav-human
- UPFD (User Preference-aware Fake News Detection) — Dou et al https://paperswithcode.com/dataset/upfd
- OGB-LSC (OGB Large-Scale Challenge) — Hu et al https://paperswithcode.com/dataset/ogb-lsc
- CodeXGLUE — Lu et al https://paperswithcode.com/dataset/codexglue
- AGORA — Patel et al https://paperswithcode.com/dataset/agora
- BEIR (Benchmarking IR) — Thakur et al https://paperswithcode.com/dataset/beir
- WikiGraphs — Wang et al https://paperswithcode.com/dataset/wikigraphs
- Few-NERD — Ding et al https://paperswithcode.com/dataset/few-nerd
- PASS (Pictures without humAns for Self-Supervision) — Asano et al https://paperswithcode.com/dataset/pass
Papers of 2022
- Controllable Animation of Fluid Elements in Still Images
- F-SfT: Shape-From-Template With A Physics-Based Deformation Model
- TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation
- Do Learned Representations Respect Causal Relationships?
- ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
- 3D Moments From Near-Duplicate Photos
- Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
- Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots
- Balanced and Hierarchical Relation Learning for One-Shot Object Detection
- NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
- Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion
- CLRNet: Cross Layer Refinement Network for Lane Detection
- Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging
- DINE: Domain Adaptation From Single and Multiple Black-Box Predictors
- FaceFormer: Speech-Driven 3D Facial Animation With Transformers
- Rotationally Equivariant 3D Object Detection
- Accelerating DETR Convergence Via Semantic-Aligned Matching
- Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification
- GeoNeRF: Generalizing NeRF With Geometry Priors
- ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
- Expanding Low-Density Latent Regions for Open-Set Object Detection
- Uformer: A General U-Shaped Transformer for Image Restoration
- Exploring Dual-Task Correlation for Pose Guided Person Image Generation
- Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data
- Modeling 3D Layout for Group Re-Identification
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement
- Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
- HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network
- Modular Action Concept Grounding in Semantic Video Prediction
- StyleSwin: Transformer-Based GAN for High-Resolution Image Generation
- Discrete Cosine Transform Network for Guided Depth Map Super-Resolution
- Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
- TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization
- Contrastive Boundary Learning for Point Cloud Segmentation
- Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
- CVNet: Contour Vibration Network for Building Extraction
- Swin Transformer V2: Scaling Up Capacity and Resolution
- Projective Manifold Gradient Layer for Deep Rotation Regression
- HCSC: Hierarchical Contrastive Selective Coding
- TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition
- DiSparse: Disentangled Sparsification for Multitask Model Compression
- Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference
- Towards Efficient and Scalable Sharpness-Aware Minimization
- OSSO: Obtaining Skeletal Shape From Outside
- A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models
- Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
- Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
- Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
- Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment
- Enhancing Adversarial Training With Second-Order Statistics of Weights
- Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo
- Moving Window Regression: A Novel Approach to Ordinal Regression
- Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
- Robust Optimization As Data Augmentation for Large-Scale Graphs
- Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
- Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input
- ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
- 360MonoDepth: High-Resolution 360° Monocular Depth Estimation
- POCO: Point Convolution for Surface Reconstruction
- Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
- Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
- ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
- UNIST: Unpaired Neural Implicit Shape Translation Network
- APES: Articulated Part Extraction From Sprite Sheets
- SPAct: Self-Supervised Privacy Preservation for Action Recognition
- De-Rendering 3D Objects in The Wild
- Global Sensing and Measurements Reuse for Image Compressed Sensing
- Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack
- Cross-View Transformers for Real-Time Map-View Semantic Segmentation
- Controllable Dynamic Multi-Task Architectures
- FastDOG: Fast Discrete Optimization on GPU
- Focal and Global Knowledge Distillation for Detectors
- Learning To Prompt for Continual Learning
- Human Mesh Recovery From Multiple Shots
- Convolution of Convolution: Let Kernels Spatially Collaborate
- Make It Move: Controllable Image-to-Video Generation With Text Descriptions
- Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
- Video-Text Representation Learning Via Differentiable Weak Temporal Alignment
- Bi-Directional Object-Context Prioritization Learning for Saliency Ranking
- Vehicle Trajectory Prediction Works, But Not Everywhere
- MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer
- Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning
- Generalized Category Discovery
- Contour-Hugging Heatmaps for Landmark Detection
- Voxel Field Fusion for 3D Object Detection
- DisARM: Displacement Aware Relation Module for 3D Detection
- MixFormer: Mixing Features Across Windows and Dimensions
- FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment
- HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
- Mobile-Former: Bridging MobileNet and Transformer
- CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision
- VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
- Towards End-to-End Unified Scene Text Detection and Layout Analysis
- AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
- ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- End-to-End Referring Video Object Segmentation With Multimodal Transformers
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
- Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds
- Detecting Camouflaged Object in Frequency Domain
- SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video
- Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing
- Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization
- Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
- Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model
- How Well Do Sparse ImageNet Models Transfer?
- REX: Reasoning-Aware and Grounded Explanation
- Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
- Object-Aware Video-Language Pre-Training for Retrieval
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting
- Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
- MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens
- Cross Modal Retrieval With Querybank Normalisation
- Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization
- ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization
- Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs
- End-to-End Multi-Person Pose Estimation With Transformers
- REGTR: End-to-End Point Cloud Correspondences With Transformers
- Neural 3D Scene Reconstruction With The Manhattan-World Assumption
- V2C: Visual Voice Cloning
- Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection
- MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions
- Gait Recognition in The Wild With Dense 3D Representations and A Benchmark
- ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis
- QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
- IDEA-Net: Dynamic 3D Point Cloud Interpolation Via Deep Embedding Alignment
- BEHAVE: Dataset and Method for Tracking Human Object Interactions
- Revisiting Random Channel Pruning for Neural Network Compression
- Generating Diverse and Natural 3D Human Motions From Text
- E-CIR: Event-Enhanced Continuous Intensity Recovery
- Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond
- Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation
- AziNorm: Exploiting The Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
- Weakly Supervised Rotation-Invariant Aerial Object Detection Network
- Surface Reconstruction From Point Clouds By Learning Predictive Context Priors
- IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
- DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
- Weakly Supervised Temporal Action Localization Via Representative Snippet Knowledge Propagation
- E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation
- BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning
- Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation
- Learning Multi-View Aggregation in The Wild for Large-Scale 3D Semantic Segmentation
- PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
- Clothes-Changing Person Re-Identification With RGB Modality Only
- Robust Image Forgery Detection Over Online Social Network Shared Images
- Representation Compensation Networks for Continual Semantic Segmentation
- Tracking People By Predicting 3D Appearance, Location and Pose
- Text2Mesh: Text-Driven Neural Stylization for Meshes
- C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
- Forward Compatible Few-Shot Class-Incremental Learning
- Weakly Supervised Object Localization As Domain Adaption
- Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
- Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching
- Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
- MatteFormer: Transformer-Based Image Matting Via Prior-Tokens
- Video Shadow Detection Via Spatio-Temporal Interpolation Consistency Training
- Robust and Accurate Superquadric Recovery: A Probabilistic Approach
- Grounding Answers for Visual Questions Asked By Visually Impaired People
- Sparse Instance Activation for Real-Time Instance Segmentation
- VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning
- MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
- Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis
- Towards Implicit Text-Guided 3D Shape Generation
- SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage
- Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
- Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
- Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
- Fine-Grained Object Classification Via Self-Supervised Pose Alignment
- Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
- Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization
- Relieving Long-Tailed Instance Segmentation Via Pairwise Class Balance
- Online Convolutional Re-Parameterization
- Mimicking The Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning
- RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
- Personalized Image Aesthetics Assessment With Rich Attributes
- Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification
- HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging
- OW-DETR: Open-World Detection Transformer
- Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds
- Reversible Vision Transformers
- Amodal Panoptic Segmentation
- Correlation Verification for Image Retrieval
- Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation
- Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut
- Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection
- Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing
- Glass: Geometric Latent Augmentation for Shape Spaces
- DPICT: Deep Progressive Image Compression Using Trit-Planes
- Text to Image Generation With Semantic-Spatial Aware GAN
- Generalizable Cross-Modality Medical Image Segmentation Via Style Augmentation and Dual Normalization
- Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model
- Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images
- Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
- Surface Representation for Point Clouds
- Implicit Motion Handling for Video Camouflaged Object Detection
- DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
- Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification
- Optical Flow Estimation for Spiking Camera
- GradViT: Gradient Inversion of Vision Transformers
- Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution Via Cycle-Projected Mutual Learning
- Joint Global and Local Hierarchical Priors for Learned Image Compression
- Knowledge Distillation Via The Target-Aware Transformer
- Subspace Adversarial Training
- 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
- Image Segmentation Using Text and Image Prompts
- AutoMine: An Unmanned Mine Dataset
- Background Activation Suppression for Weakly Supervised Object Localization
- Synthetic Generation of Face Videos With Plethysmograph Physiology
- Hallucinated Neural Radiance Fields in The Wild
- Global Tracking Transformers
- Backdoor Attacks on Self-Supervised Learning
- GMFlow: Learning Optical Flow Via Global Matching
- Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
- Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
- Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction
- Scanline Homographies for Rolling-Shutter Plane Absolute Pose
- AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement
- Recurrent Glimpse-Based Decoder for Detection With Transformer
- SimMIM: A Simple Framework for Masked Image Modeling
- Label Matching Semi-Supervised Object Detection
- RegionCLIP: Region-Based Language-Image Pretraining
- Video Frame Interpolation Transformer
- BCOT: A Markerless High-Precision 3D Object Tracking Benchmark
- Omni-DETR: Omni-Supervised Object Detection With Transformers
- Transferable Sparse Adversarial Attack
- CREAM: Weakly Supervised Object Localization Via Class RE-Activation Mapping
- VALHALLA: Visual Hallucination for Machine Translation
- HINT: Hierarchical Neuron Concept Explainer
- Neural Face Identification in A 2D Wireframe Projection of A Manifold Object
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization Via Generalized Straight-Through Estimation
- An Empirical Study of End-to-End Temporal Action Detection
- Object Localization Under Single Coarse Point Supervision
- Unsupervised Learning of Accurate Siamese Tracking
- Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo
- Equalized Focal Loss for Dense Long-Tailed Object Detection
- DeepDPM: Deep Clustering With An Unknown Number of Clusters
- ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation
- Unsupervised Domain Adaptation for Nighttime Aerial Tracking
- RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
- Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction
- A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
- Not Just Selection, But Exploration: Online Class-Incremental Continual Learning Via Dual View Consistency
- Coupling Vision and Proprioception for Navigation of Legged Robots
- Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation
- EMOCA: Emotion Driven Monocular Face Capture and Animation
- Quarantine: Sparsity Can Uncover The Trojan Attack Trigger for Free
- AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation
- Interactive Multi-Class Tiny-Object Detection
- Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
- Multi-View Depth Estimation By Fusing Single-View Depth Probability With Multi-View Geometry
- Slimmable Domain Adaptation
- High-Resolution Image Harmonization Via Collaborative Dual Transformations
- MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
- Self-Supervised Neural Articulated Shape and Appearance Models
- Topology Preserving Local Road Network Estimation From Single Onboard Camera Image
- Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes
- SwinTextSpotter: Scene Text Spotting Via Better Synergy Between Text Detection and Text Recognition
- Deblur-NeRF: Neural Radiance Fields From Blurry Images
- Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction
- Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
- Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel
- Faithful Extreme Rescaling Via Generative Prior Reciprocated Invertible Representations
- Proto2Proto: Can You Recognize The Car, The Way I Do?
- TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing
- Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution
- Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale
- Simple But Effective: CLIP Embeddings for Embodied AI
- NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
- Collaborative Transformers for Grounded Situation Recognition
- CPPF: Towards Robust Category-Level 9D Pose Estimation in The Wild
- Continual Test-Time Domain Adaptation
- Dynamic MLP for Fine-Grained Image Classification By Leveraging Geographical and Temporal Information
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
- Fair Contrastive Learning for Facial Attribute Classification
- Directional Self-Supervised Learning for Heavy Image Augmentations
- No-Reference Point Cloud Quality Assessment Via Domain Adaptation
- Comprehending and Ordering Semantics for Image Captioning
- A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection
- Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
- HeadNeRF: A Real-Time NeRF-Based Parametric Head Model
- Occlusion-Robust Face Alignment Using A Viewpoint-Invariant Hierarchical Network Architecture
- IDR: Self-Supervised Image Denoising Via Iterative Data Refinement
- MogFace: Towards A Deeper Appreciation on Face Detection
- Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers
- CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
- FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
- Learning To Detect Mobile Objects From LiDAR Scans Without Labels
- WildNet: Learning Domain Generalized Semantic Segmentation From The Wild
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
- Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
- Generating Diverse 3D Reconstructions From A Single Occluded Face Image
- Stand-Alone Inter-Frame Attention in Video Models
- Large-Scale Pre-Training for Person Re-Identification With Noisy Labels
- Semantic Segmentation By Early Region Proxy
- LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition
- HVH: Learning A Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture
- Rethinking Visual Geo-Localization for Large-Scale Applications
- The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
- ViM: Out-of-Distribution With Virtual-Logit Matching
- Class-Aware Contrastive Semi-Supervised Learning
- Ditto: Building Digital Twins of Articulated Objects From Interaction
- Adaptive Early-Learning Correction for Segmentation From Noisy Annotations
- Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation
- RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution
- Partial Class Activation Attention for Semantic Segmentation
- Multi-Scale Memory-Based Video Deblurring
- A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching
- Geometric Structure Preserving Warp for Natural Image Stitching
- GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
- Conditional Prompt Learning for Vision-Language Models
- Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
- Undoing The Damage of Label Shift for Cross-Domain Semantic Segmentation
- FisherMatch: Semi-Supervised Rotation Regression Via Entropy-Based Filtering
- Affine Medical Image Registration With Coarse-To-Fine Vision Transformer
- A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift
- Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
- Restormer: Efficient Transformer for High-Resolution Image Restoration
- IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation
- Large Loss Matters in Weakly Supervised Multi-Label Classification
- Neural Inertial Localization
- GraftNet: Towards Domain Generalized Stereo Matching With A Broad-Spectrum and Task-Oriented Feature
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
- Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection
- MLSLT: Towards Multilingual Sign Language Translation
- Towards An End-to-End Framework for Flow-Guided Video Inpainting
- Contrastive Test-Time Adaptation
- MotionAug: Augmentation With Physical Correction for Human Motion Prediction
- Modeling Indirect Illumination for Inverse Rendering
- TransWeather: Transformer-Based Restoration of Images Degraded By Adverse Weather Conditions
- H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection
- P3Depth: Monocular Depth Estimation With A Piecewise Planarity Prior
- GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
- Simple Multi-Dataset Detection
- Proactive Image Manipulation Detection
- StyTr²: Image Style Transfer With Transformers
- Global Matching With Overlapping Attention for Optical Flow Estimation
- Language As Queries for Referring Video Object Segmentation
- MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
- Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language
- Rethinking Efficient Lane Detection Via Curve Modeling
- Self-Supervised Arbitrary-Scale Point Clouds Upsampling Via Implicit Neural Representation
- Co-Advise: Cross Inductive Bias Distillation
- AdaMixer: A Fast-Converging Query-Based Object Detector
- DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
- BEVT: BERT Pretraining of Video Transformers
- Deep Generalized Unfolding Networks for Image Restoration
- VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation
- Deep Unlearning Via Randomized Conditionally Independent Hessians
- Revisiting Skeleton-Based Action Recognition
- Stereo Depth From Events Cameras: Concentrate and Focus on The Future
- A Simple Data Mixing Prior for Improving Self-Supervised Learning
- Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability
- BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster
- Attentive Fine-Grained Structured Sparsity for Image Restoration
- Learning Fair Classifiers With Partially Annotated Group Labels
- NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night
- Constrained Few-Shot Class-Incremental Learning
- Threshold Matters in WSSS: Manipulating The Activation for The Robust and Accurate Segmentation Model Against Thresholds
- TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers
- DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis
- The Majority Can Help The Minority: Context-Rich Minority Oversampling for Long-Tailed Classification
- IntentVizor: Towards Generic Query Guided Interactive Video Summarization
- Shape-Invariant 3D Adversarial Point Clouds
- Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training
- PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents
- Meta-Attention for ViT-Backed Continual Learning
- DST: Dynamic Substitute Training for Data-Free Black-Box Attack
- Unified Contrastive Learning in Image-Text-Label Space
- Unsupervised Pre-Training for Temporal Action Localization Tasks
- Look Outside The Room: Synthesizing A Consistent Long-Term 3D Scene Video From A Single Image
- High-Fidelity Human Avatars From A Single RGB Camera
- Multiview Transformers for Video Recognition
- How Good Is Aesthetic Ability of A Fashion Model?
- Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
- Sequential Voting With Relational Box Fields for Active Object Detection
- Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning
- Consistency Learning Via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
- Consistent Explanations By Contrastive Learning
- Hierarchical Modular Network for Video Captioning
- Depth Estimation By Combining Binocular Stereo and Monocular Structured-Light
- Salient-to-Broad Transition for Video Person Re-Identification
- DeeCap: Dynamic Early Exiting for Efficient Image Captioning
- RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality
- DR.VIC: Decomposition and Reasoning for Video Individual Counting
- ARCS: Accurate Rotation and Correspondence Search
- Learning To Anticipate Future With Dynamic Context Removal
- GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
- On The Integration of Self-Attention and Convolution
- Domain Adaptation on Point Clouds Via Geometry-Aware Implicits
- GroupViT: Semantic Segmentation Emerges From Text Supervision
- DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
- BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks Via Image Quantization and Contrastive Adversarial Learning
- Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
- Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector
- Topology-Preserving Shape Reconstruction and Registration Via Neural Diffeomorphic Flow
- Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection
- MAXIM: Multi-Axis MLP for Image Processing
- Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles
- PSTR: End-to-End One-Step Person Search With Transformers
- NFormer: Robust Person Re-Identification With Neighbor Transformer
- Bridging Global Context Interactions for High-Fidelity Image Completion
- SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
- Not All Tokens Are Equal: Human-Centric Visual Analysis Via Token Clustering Transformer
- Temporally Efficient Vision Transformer for Video Instance Segmentation
- The Devil Is in The Margin: Margin-Based Label Smoothing for Network Calibration
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
- WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
- Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
- E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
- OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
- OnePose: One-Shot Object Pose Estimation Without CAD Models
- Rethinking Minimal Sufficient Representation in Contrastive Learning
- Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels
- Federated Class-Incremental Learning
- Show, Deconfound and Tell: Image Captioning With Causal Inference
- MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image
- Parameter-Free Online Test-Time Adaptation
- SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection
- No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models By Fitting Feature-Level Space-Time Surfaces
- HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging
- Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
- Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes
- Detecting Deepfakes With Self-Blended Images
- Implicit Sample Extension for Unsupervised Person Re-Identification
- Energy-Based Latent Aligner for Incremental Learning
- Towards Semi-Supervised Deep Facial Expression Recognition With An Adaptive Confidence Margin
- Group R-CNN for Weakly Semi-Supervised Object Detection With Points
- Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction
- Hybrid Relation Guided Set Matching for Few-Shot Action Recognition
- Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images
- Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
- SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
- FlexIT: Towards Flexible Semantic Image Translation
- CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
- BoxeR: Box-Attention for 2D and 3D Transformers
- Neural Architecture Search With Representation Mutual Information
- Can Neural Nets Learn The Same Model Twice? Investigating Reproducibility and Double Descent From The Decision Boundary Perspective
- Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
- Multi-View Transformer for 3D Visual Grounding
- Structured Sparse R-CNN for Direct Scene Graph Generation
- BARC: Learning To Regress 3D Dog Shape From Images By Exploiting Breed Information
- PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models
- Towards Understanding Adversarial Robustness of Optical Flow Networks
- Lifelong Graph Learning
- Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning
- Computing Wasserstein-p Distance Between Images With Linear Cost
- Unsupervised Representation Learning for Binary Networks By Joint Classifier Learning
- Large-Scale Video Panoptic Segmentation in The Wild: A Benchmark
- GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains
- Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification
- MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
- Oriented RepPoints for Aerial Object Detection
- Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning
- Low-Resource Adaptation for Personalized Co-Speech Gesture Generation
- Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection
- MS2DG-Net: Progressive Correspondence Learning Via Multiple Sparse Semantics Dynamic Graph
- Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
- Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video
- MixFormer: End-to-End Tracking With Iterative Mixed Attention
- Plenoxels: Radiance Fields Without Neural Networks
- Selective-Supervised Contrastive Learning With Noisy Labels
- SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation
- Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity
- Video Demoireing With Relation-Based Temporal Consistency
- Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation
- Modeling Image Composition for Complex Scene Generation
- Decoupling Zero-Shot Semantic Segmentation
- Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions
- Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting The Adversarial Transferability
- IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
- TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
- The Wanderings of Odysseus in 3D Scenes
- All-in-One Image Restoration for Unknown Corruption
- PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- RCP: Recurrent Closest Point for Point Cloud
- A Dual Weighting Label Assignment Scheme for Object Detection
- Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
- Instance-Aware Dynamic Neural Network Quantization
- Exploring Effective Data for Surrogate Training Towards Black-Box Attack
- JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection
- Investigating Top-k White-Box and Transferable Black-Box Attack
- Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition
- A Self-Supervised Descriptor for Image Copy Detection
- Negative-Aware Attention Framework for Image-Text Matching
- An Image Patch Is A Wave: Phase-Aware Vision MLP
- Shunted Self-Attention Via Multi-Scale Token Aggregation
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
- Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to The Task of Accelerated MRI Reconstruction
- Surpassing The Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning
- Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond
- TrackFormer: Multi-Object Tracking With Transformers
- 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow
- Feature Statistics Mixing Regularization for Generative Adversarial Networks
- OpenTAL: Towards Open Set Temporal Action Localization
- Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
- Ego4D: Around The World in 3,000 Hours of Egocentric Video
- Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
- Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data
- DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From A Single Image
- Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors
- VCLIMB: A Novel Video Class Incremental Learning Benchmark
- Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements
- ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation
- Interacting Attention Graph for Single Image Two-Hand Reconstruction
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
- Cross-Image Relational Knowledge Distillation for Semantic Segmentation
- Towards Layer-Wise Image Vectorization
- Scenic: A JAX Library for Computer Vision Research and Beyond
- Real-Time Object Detection for Streaming Perception
- VisualHow: Multimodal Problem Solving
- Spatial Commonsense Graph for Object Localisation in Partial Scenes
- OSSGAN: Open-Set Semi-Supervised Image Generation
- Bi-Level Alignment for Cross-Domain Crowd Counting
- ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
- Efficient Multi-View Stereo By Iterative Dynamic Cost Volume
- TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework
- SGTR: End-to-End Scene Graph Generation With Transformer
- Decoupled Knowledge Distillation
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
- Reusing The Task-Specific Classifier As A Discriminator: Discriminator-Free Adversarial Domain Adaptation
- Show Me What and Tell Me How: Video Synthesis Via Multimodal Conditioning
- SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks
- Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss
- CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings
- IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization
- I M Avatar: Implicit Morphable Head Avatars From Videos
- Weakly-Supervised Metric Learning With Cross-Module Communications for The Classification of Anterior Chamber Angle Images
- A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution
- Multi-Modal Dynamic Graph Transformer for Visual Grounding
- Geometric Transformer for Fast and Robust Point Cloud Registration
- UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection
- Demystifying The Neural Tangent Kernel From A Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?
- The Devil Is in The Details: Window-Based Attention for Image Compression
- DiLiGenT10²: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation
- PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images
- Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
- Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
- Multi-Person Extreme Motion Prediction
- β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
- CMT: Convolutional Neural Networks Meet Vision Transformers
- KNN Local Attention for Image Restoration
- Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered By Pre-Trained Vision-Language Model
- TransMix: Attend To Mix for Vision Transformers
- Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
- Long-Tailed Visual Recognition Via Gaussian Clouded Logit Adjustment
- Image Animation With Perturbed Masks
- Domain Generalization Via Shuffled Style Assembly for Face Anti-Spoofing
- OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction
- MonoScene: Monocular 3D Semantic Scene Completion
- AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
- Continuous Scene Representations for Embodied AI
- Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
- Non-Probability Sampling Network for Stochastic Human Trajectory Prediction
- ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning
- Human-Aware Object Placement for Visual Environment Reconstruction
- X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
- RAMA: A Rapid Multicut Algorithm on GPU
- Adversarial Parametric Pose Prior
- Mask Transfiner for High-Quality Instance Segmentation
- It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning By Contrastive Data Collection
- DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis
- Event-Based Video Reconstruction Via Potential-Assisted Spiking Neural Network
- YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
- DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
- Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
- Self-Supervised Video Transformer
- AutoRF: Learning 3D Object Radiance Fields From Single View Observations
- Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles
- TubeR: Tubelet Transformer for Video Action Detection
- MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
- Learning Non-Target Knowledge for Few-Shot Semantic Segmentation
- UKPGAN: A General Self-Supervised Keypoint Detector
- Raw High-Definition Radar for Multi-Task Learning
- Coarse-To-Fine Feature Mining for Video Semantic Segmentation
- Compressing Models With Few Samples: Mimicking Then Replacing
- PokeBNN: A Binary Pursuit of Lightweight Accuracy
- Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection
- SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images
- EMScore: Evaluating Video Captioning Via Coarse-Grained and Fine-Grained Embedding Matching
- PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision
- Group Contextualization for Video Recognition
- Single-Domain Generalized Object Detection in Urban Scene Via Cyclic-Disentangled Self-Distillation
- L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
- Self-Augmented Unpaired Image Dehazing Via Density and Depth Decomposition
- Neural 3D Video Synthesis From Multi-View Video
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
- Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search
- HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
- Structure-Aware Flow Generation for Human Body Reshaping
- Learning To Answer Questions in Dynamic Audio-Visual Scenarios
- Synthetic Aperture Imaging With Events and Frames
- MonoGround: Detecting Monocular 3D Objects From The Ground
- Deep Visual Geo-Localization Benchmark
- StyleGAN-V: A Continuous Video Generator With The Price, Image Quality and Perks of StyleGAN2
- LISA: Learning Implicit Shape and Appearance of Hands
- Iterative Deep Homography Estimation
- Learned Queries for Efficient Local Attention
- Colar: Effective and Efficient Online Action Detection By Consulting Exemplars
- SoftGroup for 3D Instance Segmentation on Point Clouds
- MVS2D: Efficient Multi-View Stereo Via Attention-Driven 2D Convolutions
- Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation Via Semantic Knowledge Transfer and Self-Refinement
- Deep Constrained Least Squares for Blind Image Super-Resolution
- EDTER: Edge Detection With Transformer
- AirObject: A Temporally Evolving Graph Embedding for Object Identification
- From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering
- Semantic-Aware Domain Generalized Segmentation
- DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
- UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
- AKB-48: A Real-World Articulated Object Knowledge Base
- Stratified Transformer for 3D Point Cloud Segmentation
- Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations
- Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis
- Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
References
- https://www.paperdigest.org/2022/06/cvpr-2022-papers-with-code-data/
Author
Dr Hari Thapliyaal
dasarpai.com
linkedin.com/in/harithapliyal