Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JavaScript数组乱序 #6

Open
mengxiong10 opened this issue Mar 12, 2018 · 0 comments
Open

JavaScript数组乱序 #6

mengxiong10 opened this issue Mar 12, 2018 · 0 comments
Labels

Comments

@mengxiong10
Copy link
Owner

mengxiong10 commented Mar 12, 2018

关于JavaScript数组乱序

Array.prototype.sort

常见的一个写法是利用Array.prototype.sort

function shuffle1 (arr) {
  return arr.concat().sort(function () {
    return Math.random() - 0.5
  })
}

问题

这个排序算法并不是完全随机的.如果是完全随机的那么数组[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]中的元素出现在每个位置的概率是一样的.
下面测试任意一个元素在多次测试后在各个位置的概率.

const test1 = Array.from({ length: 10 }).map((v, i) => i)
// [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
const test2 = Array.from({ length: 20 }).map((v, i) => i)
// [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

function stat (arr, fn) {
  const count = arr.slice().fill(0)
  const len = 100000
  const key = arr[Math.random() * arr.length | 0]
  for (let i = 0; i < len; i++) {
    const shuffleArray = fn(arr.slice())
    count[shuffleArray.indexOf(key)]++
  }
  count.forEach((v, i) => {
    count[i] = (v / len * 100).toFixed(2) + '%'
  })
  console.log(`${key}${len}次测试中出现在各个位置的概率:\n`, JSON.stringify(count))
}

stat(test1, shuffle1)
stat(test2, shuffle1)

// 元素8在100000次测试中出现在各个位置的概率:
// ['0.40%', '0.40%', '0.80%', '1.53%', '3.07%', '5.92%', '11.44%', '20.20%', '31.14%', '25.09%']
// 元素10在100000次测试中出现在各个位置的概率:
// ["11.40%","11.21%","7.76%","4.72%","3.26%","3.02%","3.20%","3.86%","5.00%","6.30%","7.20%","5.18%","3.58%","2.19%","1.58%","1.33%","1.50%","2.63%","5.04%","10.06%"]

如上看到结果并不随机, test1元素保留在自身位置概率大, test2概率也很不平均.
ECMAScript中关于Array.prototype.sort(comparefn)的标准,其中并没有规定具体的实现算法.
而v8引擎对于数组长度小于等于10使用的是插入排序, 长数组是快速排序.

/**
 * 比较函数
 * @param {*} a 
 * @param {*} b 
 */
const compare = function (a, b) {
  return Math.random() - 0.5
}

/**
 * 插入排序
 * @param {Array} arr
 */
const insertion = function (arr) {
  const len = arr.length
  for (let i = 1; i < len; i++) {
    const key = arr[i]
    let j
    for (j = i - 1; j >= 0 && compare(arr[j], key) > 0; j--) {
      arr[j + 1] = arr[j]
    }
    arr[j + 1] = key
  }
  return arr
}

/**
 * 快速排序
 * @param {*} arr 
 */
const quickSort = function (arr) {
  const exch = function (arr, i, j) {
    [arr[i], arr[j]] = [arr[j], arr[i]]
  }
  const partition = function (arr, start, end) {
    const key = arr[start]
    let i = start
    let j = end + 1
    while (true) {
      while (compare(arr[++i], key) < 0) {
        if (i === end) {
          break
        }
      }
      while (compare(key, arr[--j]) < 0) {}
      if (i >= j) {
        break
      }
      exch(arr, i, j)
    }
    exch(arr, start, j)
    return j
  }
  const sort = function (arr, start, end) {
    if (end - start <= 0) {
      return
    }
    const j = partition(arr, start, end)
    sort(arr, start, j - 1)
    sort(arr, j + 1, end)
  }
  const len = arr.length
  sort(arr, 0, len - 1)
  return arr
}

如上插入排序越后面的元素,交换到越前的位置的概率越小, 所以元素分布在原地的概率大.
快速排序因为在每次的迭代中,它至少会把一个元素摆到它最后的位置去.第一次迭代的基准,由于每个元素放在它左边和右边的概率是50%, 所以它的位置概率应该是靠近数组中间高,显然也不能完全随机.

Fisher–Yates Shuffle

正确解法应该是 Fisher–Yates Shuffle,复杂度 O(n)

/**
 * Fisher-Yates shuffle
 * @param {Array} array
 */
function shuffle2 (array) {
  const arr = [].concat(array)
  const len = arr.length
  for (let i = len - 1; i > 0; i--) {
    let j = (i + 1) * Math.random() | 0;
    [arr[i], arr[j]] = [arr[j], arr[i]]
  }
  return arr
}

每次从数组中随机选一个元素出来放到新数组, 循环完成的新数组就是完全随机的,如上Fisher–Yates Shuffle 只是每次随机的元素交换到数组最后面,这样就不用新建一个额外的数组.
测试:

stat(test1, shuffle2)
stat(test2, shuffle2)
// 元素1在100000次测试中出现在各个位置的概率:
// ["10.04%","9.99%","10.01%","9.91%","9.93%","10.09%","10.04%","9.94%","10.05%","10.01%"]
// 元素16在100000次测试中出现在各个位置的概率:
// ["5.11%","4.99%","4.96%","4.88%","4.89%","5.02%","5.07%","4.99%","5.07%","4.99%","4.96%","4.96%","5.03%","5.00%","5.10%","4.96%","5.05%","4.95%","4.99%","5.03%"]

随机性的数学归纳法证明 引用

对 n 个数进行随机:

  1. 首先考虑 n = 2 的情况,根据算法,显然有 1/2 的概率两个数交换,有 1/2 的概率两个数不交换,因此对 n = 2 的情况,元素出现在每个位置的概率都是 1/2,满足随机性要求。
  2. 假设有 i 个数, i >= 2 时,算法随机性符合要求,即每个数出现在 i 个位置上每个位置的概率都是 1/i。
  3. 对于 i + 1 个数,按照我们的算法,在第一次循环时,每个数都有 1/(i+1) 的概率被交换到最末尾,所以每个元素出现在最末一位的概率都是 1/(i+1) 。而每个数也都有 i/(i+1) 的概率不被交换到最末尾,如果不被交换,从第二次循环开始还原成 i 个数随机,根据 2. 的假设,它们出现在 i 个位置的概率是 1/i。因此每个数出现在前 i 位任意一位的概率是 (i/(i+1)) * (1/i) = 1/(i+1),也是 1/(i+1)。
  4. 综合 1. 2. 3. 得出,对于任意 n >= 2,经过这个算法,每个元素出现在 n 个位置任意一个位置的概率都是 1/n。
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant