Skip to content

Commit

Permalink
mm: numa: recheck for transhuge pages under lock during protection ch…
Browse files Browse the repository at this point in the history
…anges

Sasha reported the following bug using trinity

  kernel BUG at mm/mprotect.c:149!
  invalid opcode: 0000 [alexander-zimmermann#1] PREEMPT SMP DEBUG_PAGEALLOC
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in:
  CPU: 20 PID: 26219 Comm: trinity-c216 Tainted: G        W    3.14.0-rc5-next-20140305-sasha-00011-ge06f5f3-dirty torvalds#105
  task: ffff8800b6c80000 ti: ffff880228436000 task.ti: ffff880228436000
  RIP: change_protection_range+0x3b3/0x500
  Call Trace:
    change_protection+0x25/0x30
    change_prot_numa+0x1b/0x30
    task_numa_work+0x279/0x360
    task_work_run+0xae/0xf0
    do_notify_resume+0x8e/0xe0
    retint_signal+0x4d/0x92

The VM_BUG_ON was added in -mm by the patch "mm,numa: reorganize
change_pmd_range".  The race existed without the patch but was just
harder to hit.

The problem is that a transhuge check is made without holding the PTL.
It's possible at the time of the check that a parallel fault clears the
pmd and inserts a new one which then triggers the VM_BUG_ON check.  This
patch removes the VM_BUG_ON but fixes the race by rechecking transhuge
under the PTL when marking page tables for NUMA hinting and bailing if a
race occurred.  It is not a problem for calls to mprotect() as they hold
mmap_sem for write.

Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
Mel Gorman authored and torvalds committed Apr 7, 2014
1 parent 88a9ab6 commit 1ad9f62
Showing 1 changed file with 34 additions and 2 deletions.
36 changes: 34 additions & 2 deletions mm/mprotect.c
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,34 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
}
#endif

/*
* For a prot_numa update we only hold mmap_sem for read so there is a
* potential race with faulting where a pmd was temporarily none. This
* function checks for a transhuge pmd under the appropriate lock. It
* returns a pte if it was successfully locked or NULL if it raced with
* a transhuge insertion.
*/
static pte_t *lock_pte_protection(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, int prot_numa, spinlock_t **ptl)
{
pte_t *pte;
spinlock_t *pmdl;

/* !prot_numa is protected by mmap_sem held for write */
if (!prot_numa)
return pte_offset_map_lock(vma->vm_mm, pmd, addr, ptl);

pmdl = pmd_lock(vma->vm_mm, pmd);
if (unlikely(pmd_trans_huge(*pmd) || pmd_none(*pmd))) {
spin_unlock(pmdl);
return NULL;
}

pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, ptl);
spin_unlock(pmdl);
return pte;
}

static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable, int prot_numa)
Expand All @@ -45,7 +73,10 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
spinlock_t *ptl;
unsigned long pages = 0;

pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
if (!pte)
return 0;

arch_enter_lazy_mmu_mode();
do {
oldpte = *pte;
Expand Down Expand Up @@ -132,12 +163,13 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
pages += HPAGE_PMD_NR;
nr_huge_updates++;
}

/* huge pmd was handled */
continue;
}
}
/* fall through, the trans huge pmd just split */
}
VM_BUG_ON(pmd_trans_huge(*pmd));
this_pages = change_pte_range(vma, pmd, addr, next, newprot,
dirty_accountable, prot_numa);
pages += this_pages;
Expand Down

0 comments on commit 1ad9f62

Please sign in to comment.