# Are less inductive biases better or worse?
- Although positional encodings exist and fixed sinusoidal encodings can be used, they are mostly learned and randomly/zero-initialized.

They show that Vision Transformers scale better than ConvNets and mixed architectures (convolutional stems + Transformer).
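As a rough illustration (my own minimal sketch, not code from the papers above), here are the two positional-encoding options side by side: a learned table, typically zero- or randomly initialized so any positional structure must come from data, versus a fixed sinusoidal table that hard-codes a (mild) positional inductive bias.

```python
# Minimal sketch: learned vs. fixed sinusoidal positional encodings
# for a sequence of ViT tokens. Shapes assume ViT-Base-like settings.
import math
import torch
import torch.nn as nn

num_tokens, dim = 197, 768  # 14x14 patches + [CLS] token, ViT-Base width

# Learned: a plain parameter table with no built-in notion of position.
learned_pos = nn.Parameter(torch.zeros(1, num_tokens, dim))        # zero init
# or random init, e.g.: nn.init.trunc_normal_(learned_pos, std=0.02)

# Fixed sinusoidal: position is hard-coded and never trained.
def sinusoidal_pos(n: int, d: int) -> torch.Tensor:
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)        # (n, 1)
    freq = torch.exp(torch.arange(0, d, 2, dtype=torch.float32)
                     * (-math.log(10000.0) / d))                   # (d/2,)
    table = torch.zeros(n, d)
    table[:, 0::2] = torch.sin(pos * freq)
    table[:, 1::2] = torch.cos(pos * freq)
    return table.unsqueeze(0)                                      # (1, n, d)

fixed_pos = sinusoidal_pos(num_tokens, dim)

tokens = torch.randn(8, num_tokens, dim)  # dummy batch of token embeddings
x = tokens + learned_pos                  # or: tokens + fixed_pos
```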
[A ConvNet for the 2020s](../../100%20Reference%20notes/101%20Literature/A%20ConvNet%20for%20the%202020s/) argues that ResNets are simply outdated designs and modernizes a standard ResNet with recent architectural advances until it matches ViT performance.
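For reference, a simplified sketch of the kind of modernized block that paper converges on (a ConvNeXt-style block; I omit details such as layer scale and stochastic depth):

```python
# ConvNeXt-style block (simplified): 7x7 depthwise conv -> LayerNorm ->
# 4x inverted-bottleneck MLP with GELU, wrapped in a residual connection.
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # applied channels-last
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # 1x1 convs written as linears
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)               # (B, H, W, C) for norm/linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)               # back to (B, C, H, W)
        return residual + x

y = ConvNeXtBlock(96)(torch.randn(1, 96, 56, 56))
```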
[The Lie derivative for measuring learned equivariance](../../100%20Reference%20notes/101%20Literature/The%20Lie%20derivative%20for%20measuring%20learned%20equivariance/) shows a surprising result: ViTs exhibit more translational equivariance after training than ConvNets, as measured by their Lie derivative.
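To make the measurement concrete, here is a crude finite-difference sketch of the idea: how much the model's output drifts under a small sub-pixel input translation. This is an assumed simplification of mine; the paper works with continuous transformations and autograd rather than this discrete approximation.

```python
# Crude finite-difference proxy for translation (in)equivariance:
# approximate || d f(T_t x) / dt |_{t=0} || with a small horizontal shift.
import torch
import torch.nn.functional as F

def translation_sensitivity(model, x: torch.Tensor, eps: float = 0.5) -> float:
    """Returns ~0 for a model whose output is locally invariant to
    translation; larger values mean translation changes the prediction."""
    b, c, h, w = x.shape
    # Sub-pixel horizontal shift of `eps` pixels via an affine sampling grid
    # (normalized coords span [-1, 1], i.e. w pixels -> tx = 2*eps/w).
    theta = torch.tensor([[1.0, 0.0, 2 * eps / w],
                          [0.0, 1.0, 0.0]])
    grid = F.affine_grid(theta.expand(b, 2, 3), x.shape, align_corners=False)
    x_shifted = F.grid_sample(x, grid, align_corners=False)
    with torch.no_grad():
        delta = model(x_shifted) - model(x)
    return (delta.norm() / eps).item()
```

Comparing this number between a trained ViT and a trained ConvNet is the spirit of the paper's measurement.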
[An Image is Worth More Than 16x16 Patches - Exploring Transformers on Individual Pixels](../../100%20Reference%20notes/101%20Literature/An%20Image%20is%20Worth%20More%20Than%2016x16%20Patches%20-%20Exploring%20Transformers%20on%20Individual%20Pixels/) tackles the toy question of dropping the convolutional stem that performs patchification in ViTs, with the intention of further reducing inductive biases. They show that the resulting model, although too computationally intensive to be used in practice, competes with ViTs.
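The architectural change itself is tiny, which is what makes the question interesting. A minimal sketch of the tokenization difference (my assumption of the standard ViT-style setup, not the paper's code):

```python
# Patch tokens (standard ViT) vs. pixel tokens (the paper's toy setting).
import torch
import torch.nn as nn

img = torch.randn(1, 3, 28, 28)  # small image keeps the pixel version tractable
dim = 192

# Standard ViT: a strided conv stem patchifies the image -> (28/14)^2 = 4 tokens.
patchify = nn.Conv2d(3, dim, kernel_size=14, stride=14)
patch_tokens = patchify(img).flatten(2).transpose(1, 2)   # (1, 4, dim)

# Pixel transformer: every pixel is a token -> 28*28 = 784 tokens.
# Self-attention cost grows quadratically in token count, hence impractical.
to_token = nn.Linear(3, dim)
pixel_tokens = to_token(img.flatten(2).transpose(1, 2))   # (1, 784, dim)
```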
[How do vision transformers work?](../100 Reference notes/101 Literature/How do vision transformers work?.md) argues that the benefit of Vision Transformers is not that they have fewer inductive biases, but that their operations are input-dependent (see [Input-dependent convolutions](../Input-dependent%20convolutions/)) and that self-attention acts as a smoothing mechanism that helps training dynamics in large data regimes. They test this by constraining ViT attention to be local, which outperforms global attention in both small and large data regimes. This is a strong indication that locality constraints are useful.
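A minimal sketch of what "constraining attention to be local" can mean in practice (an assumed masking implementation, not the paper's code): restrict each token on the patch grid to attend only to its spatial neighborhood.

```python
# Local attention via masking: token j is visible to token i only if it
# lies within `window` grid steps (Chebyshev distance) on the token grid.
import torch

def local_attention_mask(h: int, w: int, window: int = 1) -> torch.Tensor:
    """Boolean (h*w, h*w) mask, True where attention is allowed."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)       # (h*w, 2)
    dist = (coords[:, None, :] - coords[None, :, :]).abs().amax(-1)  # (h*w, h*w)
    return dist <= window

mask = local_attention_mask(14, 14, window=1)  # 3x3 neighborhoods, 14x14 grid
scores = torch.randn(1, 196, 196)              # raw attention logits
scores = scores.masked_fill(~mask, float("-inf"))
attn = scores.softmax(dim=-1)                  # attention is now strictly local
```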
[Learning with Unmasked Tokens Drives Stronger Vision Learners](../../100%20Reference%20notes/101%20Literature/Learning%20with%20Unmasked%20Tokens%20Drives%20Stronger%20Vision%20Learners/) implicitly counter-argues [How do vision transformers work?](../100 Reference notes/101 Literature/How do vision transformers work?.md) by observing that MIM-trained ViTs exhibit localized attention maps and "fixing" this. Their approach outperforms other MIM-trained ViTs, so whether locality is a good inductive bias is not definitively answered.
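One way to make "localized attention maps" concrete is mean attention distance, a statistic used, e.g., in the original ViT paper's analysis. A minimal sketch, assuming tokens laid out on a square grid:

```python
# Mean attention distance: attention-weighted average query-key distance.
# Small values = localized heads, large values = global heads.
import torch

def mean_attention_distance(attn: torch.Tensor, grid: int) -> torch.Tensor:
    """attn: (heads, grid*grid, grid*grid), each row summing to 1.
    Returns one distance (in grid units) per head."""
    ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], -1).float()  # (N, 2)
    dist = torch.cdist(coords, coords)                              # (N, N)
    return (attn * dist).sum(-1).mean(-1)                           # (heads,)

attn = torch.rand(12, 196, 196).softmax(-1)  # dummy maps for 12 heads
print(mean_attention_distance(attn, grid=14))
```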
## ViTs vs Dense prediction tasks