algorithm

marstalk · Aug 25, 2024 · e14fa8a · e14fa8a
1 parent 0c5bb2a
commit e14fa8a
Showing 22 changed files with 5,879 additions and 0 deletions.
diff --git a/algorithm/0.basic.algm.md b/algorithm/0.basic.algm.md
diff --git a/algorithm/1.tree.algm.md b/algorithm/1.tree.algm.md
diff --git a/algorithm/10.consistent_hashing.md b/algorithm/10.consistent_hashing.md
@@ -0,0 +1,31 @@
+https://www.acodersjourney.com/system-design-interview-consistent-hashing/
+
+
+1. we should be able to distribute incoming requests uniformly among the set of n database servers;
+2. we should be able to dynamically add/remove database servers;
+3. when we add/remove a database server, we need to move a minimal amount of data between database servers;
+
+
+for simple hashing, there are two drawbacks with this approach: poor **horizontal scalability** and **non-uniform distribution of data across servers**.
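
To make the scalability drawback concrete, here is a minimal sketch that counts how many keys change servers when simple `hash(key) % n` placement grows from n to n + 1 servers. The class name, method names and the `key-<i>` key format are illustrative assumptions, not taken from the referenced article.

```java
final class ModuloRemapDemo {
    // Simple hashing: server index = non-negative hash modulo the server count.
    static int serverFor(String key, int servers) {
        return Math.floorMod(key.hashCode(), servers);
    }

    // Counts how many of the keys "key-0" .. "key-(count-1)" land on a different
    // server after the cluster changes from `before` to `after` servers.
    static int movedKeys(int count, int before, int after) {
        int moved = 0;
        for (int i = 0; i < count; i++) {
            String key = "key-" + i;
            if (serverFor(key, before) != serverFor(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(movedKeys(10_000, 4, 5) + " of 10000 keys moved");
    }
}
```

With a well-mixed hash, a key keeps its server only when `h % n == h % (n + 1)`, so roughly n/(n+1) of the keys are expected to move; that is the remapping cost consistent hashing tries to avoid.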
+
+# elastic scaling of database/cache servers with minimal data remapped
+
+# avoiding hot-spots 
+
+# conclusion
+1. enables elastic scaling of a cluster of database or cache servers;
+2. facilitates replication and partitioning of data across servers;
+3. partitioning of data enables uniform distribution, which relieves hot spots;
+4. points 1-3 enable high availability of the system as a whole.
+
+# reference:
+- http://tom-e-white.com/2007/11/consistent-hashing.html
+- 
+
+---
+1. if we add/remove servers from the set, all our existing **mappings are broken**.
+2. this means all existing data needs to be **remapped** and **migrated** to different servers.
+3. This might be a **herculean** (UK /ˌhə:kju'li:ən/, laborious) task because it'll either require a **scheduled system downtime** to update mappings or creating read replicas of the existing system which can service queries during the migration. In other words, a lot of pain and expenditure (UK /ɪk'spendɪtʃə/, spending).
+4. avoid **data hot-spots** in the cluster
+5. We cannot expect a **uniform distribution** of data coming in all the time. There may be many more keys whose hashValue maps to server number 3 than to any other server, in which case server number 3 will **become a hotspot for queries**.
+6. 
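
The problems listed above are what a hash ring is meant to address. The following is a minimal sketch under stated assumptions: `HashRing`, its methods, the `server#i` virtual-node naming and the bit-mixing `hash` function are all hypothetical choices for illustration (production systems typically use a stronger hash), and hash collisions between virtual nodes are ignored.

```java
import java.util.Map;
import java.util.TreeMap;

final class HashRing {
    // Ring position -> server name; TreeMap keeps the positions sorted.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Illustrative mixing function (an assumption, not a recommended hash).
    static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    // Each server is placed at several ring positions to smooth the load.
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    // Walk clockwise: first position at or after the key's hash, wrapping around.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

The property worth checking is that adding a server only moves keys onto the new server; every other key keeps its old mapping, which is the "minimal data remapped" requirement.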
diff --git a/algorithm/11.random.algm.md b/algorithm/11.random.algm.md
@@ -0,0 +1,72 @@
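+# How Java Random works
+
+1. It is pseudo-random: the sequence depends on a seed, which by default is derived from a system-time-related number.
+2. Prefer one shared Random instance; there is no need to create a new Random object for every call. For example, the Random used inside Collections is a single shared static field.
+3. If two Random instances have the same seed, the N-th value each produces from random.nextInt(x) is the same, provided x is also the same.
+4. Math.random() uses random.nextDouble();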
+[code](../../javademo/random/RandomDemo.java)
+[code](../../javademo/random/RandomDemo2.java)
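
Point 3 can be checked directly. This is a small sketch (the class and method names are made up for illustration) that draws the same number of values from two `Random` instances created with the same seed.

```java
import java.util.Random;

final class SeedDemo {
    // Returns the first `count` values of nextInt(bound) from a Random created with `seed`.
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = random.nextInt(bound);
        }
        return out;
    }
}
```

Calling `SeedDemo.draw(42L, 10, 100)` twice should return identical arrays, which is also why a fixed seed is handy in tests.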
+
+
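+# Shuffle algorithm: sampling from a finite set
+My first idea was to use a random function to pick one of the remaining candidates each time, but that leaves some details unaddressed:
+1. How should repeated picks be handled? Retry in a loop? --- To avoid repeats, move one of the [elements not yet chosen] into the slot of the [chosen element], so the same element can never be picked twice.
+2. As fewer candidates remain, the chance of hitting one shrinks and the repeats from question 1 become more likely. What then? --- Shrink the range of the random number on every pick.
+3. Traverse once from left to right: swap each element with the element at a random position from itself to the end.
+
+This leads to the shuffle algorithm, whose time complexity is O(n):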
+```java
+// Needs: import java.util.Random;
+static Random rnd = new Random();
+
+// Fisher-Yates shuffle: shuffles arr in place and returns it.
+public int[] shuffle(int[] arr){
+    if(arr == null || arr.length <= 1) return arr;
+    int n = arr.length;
+    for(int i = 0; i < n; i++){
+        // pick j uniformly from [i, n)
+        int j = i + rnd.nextInt(n - i);
+        //swap
+        int tmp = arr[i];
+        arr[i] = arr[j];
+        arr[j] = tmp;
+    }
+    return arr;
+}
+```
+[code](../../javademo/random/Shuffle.java)
+
+
+TODO
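+> Given a text file, design an algorithm that picks one line from it at random, such that every line is equally likely to be picked.
+
+> The simplest approach: read every line of the file; if there are n lines, generate a random number in 1..n and return the corresponding line.
+
+> That certainly works, so let's make it harder: what if the file is extremely large, or you are only allowed to traverse the file once?
+>
+> Problem 1 can also be extended: instead of picking 1 line, pick k lines. How should that be done?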
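
One known answer to the single-pass, k-line variant is reservoir sampling (the single-item case is discussed further down). The sketch below is an illustration under assumptions: the class name `ReservoirK` and the use of an `Iterator` in place of a file reader are hypothetical choices, not part of the original question.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

final class ReservoirK {
    // Keeps a uniform random sample of up to k items from a stream seen only once.
    static <T> List<T> sample(Iterator<T> items, int k, Random random) {
        List<T> reservoir = new ArrayList<>();
        int seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k items fill the reservoir.
                reservoir.add(item);
            } else {
                // Item number `seen` replaces a random slot with probability k / seen.
                int j = random.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }
}
```

For a text file, the iterator could come from reading lines lazily, so memory use stays proportional to k rather than to the file size.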
+
+
+## Shuffle an array
+https://leetcode.cn/problems/shuffle-an-array/description/
+
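+# Reservoir sampling: sampling from an unbounded stream
+Reservoir Sampling
+The shuffle algorithm is powerful, but it does not fit some scenarios, for example:
+1. Given a singly linked list of unknown length, design an algorithm that returns a random node of the list while **traversing it only once**. [here]()
+2. Place K mines at random on a very large board (too large to fit in memory).
+
+The problem can be abstracted as:
+1. There are N elements.
+2. Every element, i.e. the i-th element for each i, is chosen with the same probability 1/N.
+
+To solve it, we only need to:
+1. For the i-th element, choose it as the result with probability 1/i, and keep the previous result with probability 1-1/i.
+After all n elements have been traversed, the probability that the i-th element is still the chosen one is: `1/i * (1 - 1/(i+1)) * (1 - 1/(i+2)) * ... * (1 - 1/n) = 1/i * i/(i+1) * (i+1)/(i+2) * ... * (n-1)/n = 1/n`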
+```java
+
+```
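
The rule above (replace the current choice with probability 1/i at the i-th element) can be sketched as follows. The class name `ReservoirOne` and the `Iterator` standing in for the linked list are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.Random;

final class ReservoirOne {
    // Returns one element chosen uniformly from a stream traversed once,
    // or null if the stream is empty.
    static <T> T pick(Iterator<T> items, Random random) {
        T chosen = null;
        int i = 0;
        while (items.hasNext()) {
            T item = items.next();
            i++;
            // nextInt(i) is 0 with probability 1/i, so the i-th item wins with probability 1/i.
            if (random.nextInt(i) == 0) {
                chosen = item;
            }
        }
        return chosen;
    }
}
```

Only the current choice and a counter are kept, so the length of the stream never needs to be known in advance.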
+
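+# Monte Carlo verification
+How can the correctness of a randomized algorithm be verified? By brute force, with a large number of trials.
+Take a square and its inscribed circle, and throw a large number of points into the square. If the random source is uniform, the fraction of points that land inside the circle is very close to the ratio of the circle's area to the square's area (π/4).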
+
+```java
+
+```
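
The square-and-circle experiment can be sketched like this. The class name `MonteCarloPi` is hypothetical, and the square is assumed to be [-1, 1] x [-1, 1] with the unit circle inscribed in it.

```java
import java.util.Random;

final class MonteCarloPi {
    // Throws `points` random points into the square [-1, 1] x [-1, 1] and
    // estimates pi from the fraction that falls inside the unit circle.
    static double estimate(int points, Random random) {
        int inside = 0;
        for (int i = 0; i < points; i++) {
            double x = random.nextDouble() * 2 - 1;
            double y = random.nextDouble() * 2 - 1;
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        // inside / points approximates (circle area) / (square area) = pi / 4.
        return 4.0 * inside / points;
    }
}
```

If the estimate drifts far from π for a large number of points, that is a hint the random source (or the code using it) is not uniform. The same idea applies to the shuffle: run it many times and compare how often each permutation appears.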
diff --git a/algorithm/12.virus_detection.algm.md b/algorithm/12.virus_detection.algm.md
@@ -0,0 +1,40 @@
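+As of 2020-12-06, with the year end approaching, the COVID-19 pandemic has still not been fully brought under control; the US is still reporting more than 200,000 new infections per day. Today I saw an explainer video on Douyin about how to find infected people quickly.
+
+Suppose sporadic infections break out in a city. How can the infected be located quickly among millions or tens of millions of people?
+A sample currently takes 3~6 hours to test; as a compromise, assume each sample takes 4 hours.
+Assume each city has 1,000 machines that can work in parallel. How long do 10 million samples take?
+10,000,000 samples / 1,000 machines = 10,000 rounds
+10,000 rounds * 4 hours = 40,000 hours; 40,000 / 24 ≈ 1,667 days; 1,667 / 365 ≈ 4.6 years
+
+The testing process above is a simple traversal of the whole set.
+
+In practice, however, testing is not a simple traversal; a more efficient algorithm is used.
+
+Suppose there are 27 samples. The traditional method needs 27 tests.
+
+Now make the following improvement:
+Round 1: split the 27 samples into 3 groups and pool the 9 samples of each group, so round 1 needs only 3 tests:
+1) If a pooled sample tests positive, at least one of those 9 people is infected.
+2) If a pooled sample tests negative, none of those 9 people is infected.
+
+Round 2: run a second round of pooled tests on the 9 people whose group tested positive in round 1.
+Put these 9 people in a 3x3 matrix:
+1 2 3
+4 5 6
+7 8 9
+Pool each row and each column and test the pools: 6 tests instead of the 9 otherwise needed.
+For example, if column 3 and row 2 test positive, then number 6 is infected.
+1 2 3
+4 5 6
+7 8 9
+Or, if column 3, row 2 and row 3 all test positive, then numbers 6 and 9 are infected.
+1 2 3
+4 5 6
+7 8 9
+In total only 3+6=9 tests are needed (assuming only one of the three groups tests positive in round 1), compared with 27 for the traditional method: a 3x improvement.
+Work that used to take 3 days can now be done in 1 day.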
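
The row/column decoding in round 2 can be sketched in code. This is an illustration only: the class name `PoolTest` and the idea of simulating pool results from a known set of infected IDs are assumptions, and real pooled testing has sensitivity limits that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PoolTest {
    // People are numbered 1..9 in a 3x3 grid, row by row.
    // Simulates the 3 row pools and 3 column pools for the given infected IDs,
    // then returns every ID that sits in both a positive row and a positive column.
    static List<Integer> candidates(Set<Integer> infected) {
        boolean[] rowPositive = new boolean[3];
        boolean[] colPositive = new boolean[3];
        for (int id : infected) {
            rowPositive[(id - 1) / 3] = true;
            colPositive[(id - 1) % 3] = true;
        }
        List<Integer> result = new ArrayList<>();
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                if (rowPositive[r] && colPositive[c]) {
                    result.add(r * 3 + c + 1);
                }
            }
        }
        return result;
    }
}
```

With infected = {6} the candidates are [6], and with {6, 9} they are [6, 9], matching the two examples. When at least two rows and at least two columns are positive, e.g. infected = {1, 5}, the candidates become [1, 2, 4, 5] and the ambiguous ones need individual retests, which is one reason the method works best at low infection rates.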
+
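+Of course, the efficiency of this algorithm depends on the infection rate of the whole population: the lower the infection rate, the more efficient the algorithm.
+So, given the current situation in the US, the algorithm above is less practical than a simple full scan (testing everyone individually).
+
+Note: Kashgar tested 5 million people in 4 days, Wuhan 10 million in 10 days, Qingdao 10 million in 5 days.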
diff --git a/algorithm/13.majiang.algm.md b/algorithm/13.majiang.algm.md
@@ -0,0 +1,18 @@
+
+# Engineering design
+
+# Module design
+
+# Core data structures and algorithms
+
+## Shuffling the tiles
+
+## Winning hand (hu)
+https://tinyoculus.github.io/2017/05/17/%E9%80%9A%E7%94%A8%E9%BA%BB%E5%B0%86%E8%83%A1%E7%89%8C%E7%AE%97%E6%B3%95/
+
+
+# Other
+## console input & output
+```java
+
+```
diff --git a/algorithm/14.stack.algm.md b/algorithm/14.stack.algm.md
@@ -0,0 +1,118 @@
+https://leetcode.cn/problemset/all/?page=1&topicSlugs=stack
+
+20,155,232,844,224,682,496
+
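+There are two ways to implement a stack:
+- Array: a sequential stack. Because of resizing, push is O(N) in the worst case and O(1) amortized.
+- Linked list: a linked stack. There is no resizing problem, but a stack of the same size uses more memory than the array version.
+
+# Stack backed by an array TODO
+
+# Stack implemented with queues TODO
+
+# Basic calculator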
+https://leetcode.cn/problems/basic-calculator/
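+> Given a string expression s, implement a basic calculator to evaluate it and return its value.
+> Note: you are not allowed to use any built-in function that evaluates strings as mathematical expressions, such as eval().
+
+> 1 <= s.length <= 3 * 10^5
+> s consists of digits, '+', '-', '(', ')', and ' '
+> s represents a valid expression
+> '+' cannot be used as a unary operation (for example, "+1" and "+(2 + 3)" are invalid)
+> '-' can be used as a unary operation (that is, "-1" and "-(2 + 3)" are valid)
+> There are no two consecutive operators in the input
+> Every number and running calculation fits in a signed 32-bit integer
+
+Only the two operators + and - are involved, and they have the same precedence, so the expression can be evaluated from left to right.
+1. Traverse the string s from left to right.
+2. res holds the result so far (initially 0) and sign holds the pending operator (initially 1); for each number cur, res = res + sign * cur;
+3. On a left parenthesis, push res and sign onto the stack to save them, then reset res and sign.
+4. On a right parenthesis, combine the current res with the saved res and sign: res = savedRes + savedSign * res;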
+[code](../../javademo/stack/Calculator.java)
+[code](../../javademo/stack/Calculator2.java)
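+```java
+char ch = '4';
+// How to check whether a char is a digit?
+boolean isDigit = Character.isDigit(ch);
+// How to convert a char to its numeric value?
+int digit = ch - '0';
+
+// Parse a multi-digit number that starts at index i.
+String s = "234";
+int i = 0;
+int num = s.charAt(0) - '0';
+while(i + 1 < s.length() && Character.isDigit(s.charAt(i + 1))){
+    num = num * 10 + (s.charAt(++i) - '0');
+}
+System.out.println(num); // 234
+```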
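
Putting the four steps and the digit-parsing loop together gives the following sketch. It is not the contents of the linked `Calculator.java` (which is not shown here); the class name `BasicCalculator` is an assumption, and `ArrayDeque` is used as the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BasicCalculator {
    static int calculate(String s) {
        Deque<Integer> stack = new ArrayDeque<>();
        int res = 0;   // result of the current parenthesis level
        int sign = 1;  // +1 or -1, the operator in front of the next number
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isDigit(ch)) {
                int num = ch - '0';
                while (i + 1 < s.length() && Character.isDigit(s.charAt(i + 1))) {
                    num = num * 10 + (s.charAt(++i) - '0');
                }
                res += sign * num;
            } else if (ch == '+') {
                sign = 1;
            } else if (ch == '-') {
                sign = -1;
            } else if (ch == '(') {
                // Save the outer state and start a fresh level.
                stack.push(res);
                stack.push(sign);
                res = 0;
                sign = 1;
            } else if (ch == ')') {
                // Fold the finished level into the saved outer state.
                int savedSign = stack.pop();
                int savedRes = stack.pop();
                res = savedRes + savedSign * res;
            }
            // Spaces fall through and are ignored.
        }
        return res;
    }
}
```

For example, `BasicCalculator.calculate("(1+(4+5+2)-3)+(6+8)")` evaluates to 23, and the unary case `"-(2 + 3)"` evaluates to -5.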
+
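+# Basic calculator II TODO
+https://leetcode.cn/problems/basic-calculator-ii/description/
+How can this be extended to handle addition, subtraction, multiplication and division?
+- Use two stacks, one for operators and one for operands.
+- Before pushing an operator, compare it with the top of the operator stack:
+  - If the current operator has higher precedence than the top operator, push it directly.
+  - If the current operator has equal or lower precedence than the top operator, then:
+    - pop two operands (the first popped is the right operand y, the second is the left operand x) and one operator a, evaluate x a y, and push the result onto the operand stack; repeat until the operator stack is empty or its top is a left parenthesis ( or an operator of lower precedence, then push the current operator.
+  - If the current operator is a right parenthesis, discard it and keep popping and evaluating until the top of the operator stack is a left parenthesis (, then pop that parenthesis as well.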
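
The two-stack procedure above can be sketched as follows. Assumptions: the class name `TwoStackCalculator` is made up, the input is a valid expression of non-negative integers with `+ - * /` and parentheses, division truncates toward zero, and unary minus is not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TwoStackCalculator {
    static int precedence(char op) {
        return (op == '*' || op == '/') ? 2 : 1;
    }

    // Pops one operator and two operands, then pushes the result.
    static void applyTop(Deque<Integer> nums, Deque<Character> ops) {
        int y = nums.pop(); // right operand is popped first
        int x = nums.pop();
        char op = ops.pop();
        switch (op) {
            case '+': nums.push(x + y); break;
            case '-': nums.push(x - y); break;
            case '*': nums.push(x * y); break;
            default:  nums.push(x / y); break;
        }
    }

    static int calculate(String s) {
        Deque<Integer> nums = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == ' ') continue;
            if (Character.isDigit(ch)) {
                int num = ch - '0';
                while (i + 1 < s.length() && Character.isDigit(s.charAt(i + 1))) {
                    num = num * 10 + (s.charAt(++i) - '0');
                }
                nums.push(num);
            } else if (ch == '(') {
                ops.push(ch);
            } else if (ch == ')') {
                while (ops.peek() != '(') applyTop(nums, ops);
                ops.pop(); // discard the matching '('
            } else {
                // Evaluate everything on the stack that binds at least as tightly.
                while (!ops.isEmpty() && ops.peek() != '('
                        && precedence(ops.peek()) >= precedence(ch)) {
                    applyTop(nums, ops);
                }
                ops.push(ch);
            }
        }
        while (!ops.isEmpty()) applyTop(nums, ops);
        return nums.pop();
    }
}
```

For instance, `"3+2*2"` pushes `*` on top of `+` because it has higher precedence, so the multiplication is applied first and the result is 7.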
+
+
+# Baseball game
+https://leetcode.cn/problems/baseball-game/
+[code](../../javademo/stack/BaseballGame.java)
+
+# Next greater element I TODO
+https://leetcode.cn/problems/next-greater-element-i/
+
+
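+# Min stack
+https://leetcode.cn/problems/min-stack/
+- Method 1: implement it as a linked list. Besides val and next, each node also records min, which is always the smallest value in the list whose head is that node. For example, head -> tail:
+  - 2(2) , 9(4) , 7(4) , 4(4)
+  - 5(2) , 2(2) , 8(6) , 6(6) , 10(10)
+- Method 2: every push puts two numbers on the stack, first val and then min, where min = Math.min(val, stack.peek()) (or just val when the stack is empty); read the top value with stack.get(stack.size() - 2). For example: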
+  - 2 , 2 , 9 , 4 , 7 , 4 , 4 , 4
+[code](../../javademo/stack/MinStack.java)
+[code](../../javademo/stack/MinStack2.java)
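
Method 2 can be sketched as below. This is not the linked `MinStack2.java`; the class name `PairMinStack` is hypothetical, and because `ArrayDeque` has no index access the top value is read by temporarily popping the min instead of calling `stack.get(stack.size() - 2)`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PairMinStack {
    // Each logical element occupies two slots: val below, running min on top.
    private final Deque<Integer> stack = new ArrayDeque<>();

    void push(int val) {
        int min = stack.isEmpty() ? val : Math.min(val, stack.peek());
        stack.push(val);
        stack.push(min);
    }

    void pop() {
        stack.pop(); // min
        stack.pop(); // val
    }

    int top() {
        int min = stack.pop();
        int val = stack.peek();
        stack.push(min); // restore the pair
        return val;
    }

    int getMin() {
        return stack.peek();
    }
}
```

Pushing 4, 7, 9, 2 in that order gives a minimum of 2; after one `pop()` the minimum is back to 4 while `top()` returns 9.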
+
+# Implement a queue using stacks TODO
+
+> Implement a first in first out (FIFO) queue using only two stacks. The implemented queue should support all the functions of a normal queue (push, peek, pop, and empty).
+
+> You must use only standard operations of a stack, which means only push to top, peek/pop from top, size, and is empty operations are valid.
+> Depending on your language, the stack may not be supported natively. You may simulate a stack using a list or deque (double-ended queue) as long as you use only a stack's standard operations.
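
One common way to meet these constraints is an "in" stack for pushes and an "out" stack for pops, moving elements across only when "out" is empty. The sketch below uses that approach; the class name `StackQueue` is an assumption, and `ArrayDeque` is used strictly through its stack operations.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackQueue {
    private final Deque<Integer> in = new ArrayDeque<>();
    private final Deque<Integer> out = new ArrayDeque<>();

    void push(int x) {
        in.push(x);
    }

    int pop() {
        shift();
        return out.pop();
    }

    int peek() {
        shift();
        return out.peek();
    }

    boolean empty() {
        return in.isEmpty() && out.isEmpty();
    }

    // Moves everything from `in` to `out` only when `out` is empty, which
    // reverses the order once and keeps each operation O(1) amortized.
    private void shift() {
        if (out.isEmpty()) {
            while (!in.isEmpty()) {
                out.push(in.pop());
            }
        }
    }
}
```

Each element is pushed and popped at most twice overall, which is where the amortized O(1) bound comes from.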
+
+
+# Browser forward and back
+Two stacks
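
A minimal sketch of the two-stack idea, assuming a hypothetical `BrowserHistory` class: one stack holds the pages behind the current one, the other the pages ahead of it, and visiting a new page clears the forward stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BrowserHistory {
    private final Deque<String> backStack = new ArrayDeque<>();
    private final Deque<String> forwardStack = new ArrayDeque<>();
    private String current;

    BrowserHistory(String homepage) {
        current = homepage;
    }

    // Visiting a new page invalidates everything that was "ahead".
    void visit(String url) {
        backStack.push(current);
        current = url;
        forwardStack.clear();
    }

    // Goes back one page if possible and returns the current page.
    String back() {
        if (!backStack.isEmpty()) {
            forwardStack.push(current);
            current = backStack.pop();
        }
        return current;
    }

    // Goes forward one page if possible and returns the current page.
    String forward() {
        if (!forwardStack.isEmpty()) {
            backStack.push(current);
            current = forwardStack.pop();
        }
        return current;
    }
}
```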
+
+# Backspace string compare
+https://leetcode.cn/problems/backspace-string-compare/
+- Method 1: two stacks, space complexity O(n+m)
+- Method 2: two pointers, space complexity O(1):
+
+[code](../../javademo/stack/BackspaceStringCompare.java)
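
Method 2 can be sketched by scanning both strings from the end and skipping characters that a later `#` deletes. This is not the linked file; the names `BackspaceCompare`, `equal` and `nextKept` are assumptions.

```java
final class BackspaceCompare {
    static boolean equal(String s, String t) {
        int i = s.length() - 1;
        int j = t.length() - 1;
        while (i >= 0 || j >= 0) {
            i = nextKept(s, i);
            j = nextKept(t, j);
            if (i >= 0 && j >= 0) {
                if (s.charAt(i) != t.charAt(j)) return false;
            } else if (i >= 0 || j >= 0) {
                return false; // one string still has a character left
            }
            i--;
            j--;
        }
        return true;
    }

    // Index of the next surviving character at or before `from`, or -1 if none.
    private static int nextKept(String s, int from) {
        int skip = 0;
        int i = from;
        while (i >= 0) {
            if (s.charAt(i) == '#') {
                skip++;      // one more character to delete
                i--;
            } else if (skip > 0) {
                skip--;      // this character is deleted by a later '#'
                i--;
            } else {
                break;       // this character survives
            }
        }
        return i;
    }
}
```

Only a few integer counters are kept, which is what brings the extra space down to O(1).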
+
+
+# Monotonic stack
+A monotonic stack solves the family of "next (previous) greater (smaller) element" problems. The template is as follows:
+```
+
+```
+
+# Parentheses problems
+## Valid parentheses
+https://leetcode.cn/problems/valid-parentheses/description/
+
+[code](../../javademo/stack/ValidParenthese.java)
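
A common stack-based sketch for this problem pushes the expected closing bracket for every opening one. This is not the linked file; the class name `ValidParentheses` is an assumption, and the input is assumed to contain only the six bracket characters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ValidParentheses {
    static boolean isValid(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char ch : s.toCharArray()) {
            if (ch == '(') {
                stack.push(')');
            } else if (ch == '[') {
                stack.push(']');
            } else if (ch == '{') {
                stack.push('}');
            } else if (stack.isEmpty() || stack.pop() != ch) {
                // A closing bracket with nothing open, or the wrong kind.
                return false;
            }
        }
        // Anything left on the stack was never closed.
        return stack.isEmpty();
    }
}
```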
+
+## Generate parentheses
+https://leetcode.cn/problems/generate-parentheses/description/
+- Backtracking [link](5.backtracing.algm.md)
+
+## Longest valid parentheses TODO
+https://leetcode.cn/problems/longest-valid-parentheses/discussion/
+
+
+## Remove invalid parentheses TODO
+https://leetcode.cn/problems/remove-invalid-parentheses/description/
diff --git a/algorithm/14.stack.monotony.md b/algorithm/14.stack.monotony.md
@@ -0,0 +1,104 @@
+monotonic stack
+
+# template
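+```java
+// Needs: import java.util.Stack;
+// Template: for each item, the next greater item to its right, or -1 if there is none.
+public int[] nextGreaterItem(int[] items){
+    int[] res = new int[items.length];
+    Stack<Integer> stack = new Stack<>();
+    for(int i=items.length - 1; i>-1; i--){
+        int item = items[i];
+        // drop everything that is not greater than the current item
+        while(!stack.isEmpty() && stack.peek() <= item){
+            stack.pop();
+        }
+        res[i] = stack.isEmpty()? -1 : stack.peek();
+        stack.push(item);
+    }
+    return res;
+}
+```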
+
+
+# 1. next greater element I
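+```java
+// Needs: import java.util.HashMap; import java.util.Map; import java.util.Stack;
+// Assumption: nums1 is a subset of nums2 and all values are distinct.
+public int[] nextGreater(int[] nums1, int[] nums2){
+    int[] greater = nextGreater(nums2);
+    // value in nums2 -> its next greater element
+    Map<Integer, Integer> byValue = new HashMap<>();
+    for(int i = 0; i < nums2.length; i++){
+        byValue.put(nums2[i], greater[i]);
+    }
+    int[] res = new int[nums1.length];
+    for(int i = 0; i < nums1.length; i++){
+        res[i] = byValue.get(nums1[i]);
+    }
+    return res;
+}
+
+private int[] nextGreater(int[] nums){
+    int[] res = new int[nums.length];
+    Stack<Integer> stack = new Stack<>();
+    for(int i = res.length-1; i> -1; i--){
+        while(!stack.isEmpty() && stack.peek() <= nums[i]){
+            stack.pop();
+        }
+        res[i] = stack.isEmpty()? -1 : stack.peek();
+        stack.push(nums[i]);
+    }
+    return res;
+}
+
+```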
+
+# 2. next greater element II
+Given a circular integer array nums, in which the element after nums[length-1] is nums[0],
+return the **next greater number** for every element in nums. Example:
+input [2,4,1,6,3]
+output [4,6,6,-1,4]
+- the next greater element of 2 is 4;
+- the next greater element of 4 is 6;
+- for the last element 3, the next greater element is 4 (found by wrapping around).
+
+So, here's how we design the algorithm:
+1. use a monotonic stack to find the next greater element for the elements in [0, length-1).
+2. for elements such as nums[length-1], whose answer lies before them, we conceptually extend the input to nums + nums; we don't actually extend the array, we use the modulo operation on the index instead.
+```java
+// Needs: import java.util.Stack;
+public int[] circularNextGreater(int[] nums){
+    int n = nums.length;
+    int[] res = new int[n];
+    Stack<Integer> stack = new Stack<>();
+    // walk the virtual array nums + nums from right to left
+    for(int i = 2*n -1; i > -1; i--){
+        while(!stack.isEmpty() && stack.peek() <= nums[i % n]){
+            stack.pop();
+        }
+
+        res[i % n] = stack.isEmpty() ? -1 : stack.peek();
+        stack.push(nums[i % n]);
+    }
+    return res;
+}
+```
+
+
+
+
+# 3. stock price
+stockPrice = [33, 34, 14, 12, 16]
+output = [1, 0, 2, 1, 0], i.e. for each day, how many days until the stock price is higher (0 if it never is).
+method:
+1. loop from the end of stockPrice.
+2. if the stack is empty, nothing is hit, which means there is no later item bigger than item[i].
+3. if the stack is not empty, pop until the top item is bigger than item[i]; the distance to that item is the answer for item[i].
+
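+```java
+// Needs: import java.util.Stack;
+// Note: as written, this returns the next smaller VALUE for each element
+// (the "next smaller element I" problem below), not the day counts above.
+public int[] nextSmaller(int[] nums){
+    int[] res = new int[nums.length];
+    Stack<Integer> stack = new Stack<>();
+    for(int i = nums.length - 1; i > -1; i--){
+        // drop everything that is not smaller than the current element
+        while(!stack.isEmpty() && stack.peek() >= nums[i]){
+            stack.pop();
+        }
+        res[i] = stack.isEmpty()? -1: stack.peek();
+        stack.push(nums[i]);
+    }
+    return res;
+}
+```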
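
The block above computes next smaller values. For the stock-price output [1, 0, 2, 1, 0] itself, one possible sketch keeps indices on the stack instead of values, so the distance to the next higher price can be read off directly. The class and method names here are assumptions, and `ArrayDeque` is used in place of `Stack`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DaysUntilHigher {
    // For each day, the number of days to wait until a strictly higher price,
    // or 0 if no later day has a higher price.
    static int[] daysUntilHigher(int[] prices) {
        int[] res = new int[prices.length];
        Deque<Integer> stack = new ArrayDeque<>(); // indices of candidate days
        for (int i = prices.length - 1; i >= 0; i--) {
            // Days with a price not higher than today's can never be the answer
            // for today or for any earlier day.
            while (!stack.isEmpty() && prices[stack.peek()] <= prices[i]) {
                stack.pop();
            }
            res[i] = stack.isEmpty() ? 0 : stack.peek() - i;
            stack.push(i);
        }
        return res;
    }
}
```

For stockPrice = [33, 34, 14, 12, 16] this returns [1, 0, 2, 1, 0], matching the expected output above.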
+
+
+# transform !
+
+# next smaller element I
+Given an integer array named nums,
+return the next smaller element for every element in nums. Example:
+input = [4,5,1,3,2]
+output = [1, 1, -1, 2, -1]
+