why数据结构与算法
一个数据结构的组成(From Wikipedia)
- A collection of data items
- The relationships among them
- The operations that can be applied to the data structure
数据结构的Common operations(From Geeks4Geeks)
- Access: Get a data item in the given data structure.
- Search: Find a particular data item in the given data structure.
- Insert: Add a data item in the given data structure.
- Delete: Delete a data item in the given data structure.
Why不同的数据结构
根据不同的应用场景,对于以上common operations调用的数量不一样,而不同的数据结构对于不同的common operations效率不一样,因此需要不同的数据结构来应对不同的应用场景,使得整个程序的效率更高。
数据结构的几大类
- Primitive Data Structure: Primitive data structures are predefined data types. They are supported by all programming languages. All the programming languages like java, c#, or any object-oriented programming language are all inherited from C and C++.
- Integer, Float, Char, Boolean, Pointer
- Non-primitive Data Structure: Not predefined in programming languages. They can be implemented with the help of primitive data types
- Linear: The data items are arranged in a sequential manner.
- Static: Array
- Dynamic: Linked List, Stack, Queue
- Nonlinear: The data items are arranged in a random manner,也因此使得
- Hash table
- Tree(Also check Wikipedia Trees: One tree node can have more than 2 child nodes)
- Graph(与其他数据结构的设计初衷都不同,主要是为了表达network)
- Linear: The data items are arranged in a sequential manner.
算法快慢的评价指标
- Space Complexity
- Time Complexity: 通常我们主要关注Big-O(Asymptotic上界), Big-Ω(Asymptotic下界)和Big-Θ(When the lower and upper bounds are the same)。从最快到最慢的次序是[latexpage] $O(1)< O(log(n)) < O(n) < O(nlog(n)) < O(n^2) < O(2^n) < O(n!)$
不同数据结构对4种common operations的效率
从下图中可以看出: static linear的Array是方便access,不方便search, insert和delete。所有的linear dynamic的数据结构(Queue, stack和linked-list)都是方便insert和delete,但是不方便access和search。Nonlinear的Tree是为了search快,Hash table是search, delete和insert都快,而Graph是有别于其他的2个nonlinear data structures(hash table和tree)以及其他的linear data structures
用到指针的数据结构
算法的几大类
基于用途对算法分类(Princeton CS算法课程https://algs4.cs.princeton.edu/home/):
- Sorting algorithm
- Searching/Traversal algorithm
- Graph algorithm
基于pattern对算法分类(https://cs.lmu.edu/~ray/notes/algpatterns/)
- Brute Force: Not really a “pattern”. Enumerate all possible solutions, unintelligently, and try them all until you find a solution.
- Divide and Conquer: Breaking down a problem into multiple independent subproblems(mutiple), solving the subproblems (recursively), and combining those solutions into a solution for the original problem.
- Mergesort, Quicksort, Median, Karatsuba’s Integer Multiplication, Matrix Multiplication, FFT, Nearest Neighbors
- Decrease and Conquer: A variant of divide and conquer where the problem is broken down into one subproblem.
- Binary search, Factorial, Selection Sort, Largest Number, Greatest Common Divisor, Topological Sort, Insertion or lookup in a binary search tree, Computing the median
- The Greedy Method: Solving a problem by doing the “best looking” thing at each step. (May miss a solution, or may miss the optimal one. But in some cases where it is known to work, it is a great approach.)
- Minimum Spanning Trees, Naive coin changing, Huffman Compression, Dijkstra’s Shortest Path
- Dynamic Programming: Solving an optimization problem by breaking down a problem into multiple overlapping(not independent) subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem. The idea is to cache the results of overlapping subproblems. Can be done bottom-up (table construction) or top-down (recursive with memorization).
- Interval scheduling, Longest common subsequence, Coin changing, Levenshtein distance, Matrix-chain multiplication, Integer knapsack, Shortest path, Word wrap, Traveling salesperson
- Backtracking: A method for systematically generating possible solutions to a problem in which you sometimes have to back up when realizing your partially generated candidate can’t possibly be extended to a real solution.
- Map coloring, Eight queens, Knight’s Tour, Maze solving, Regular expression matching, Generic pathfinding
- Branch and Bound: Backtracking applied to optimization problems.
- Satisfiability, Traveling salesperson, Integer programming, Nearest neighbor search, Nonlinear programming
- Hill Climbing: Solving (or finding an approximate solution to) an optimization problem by generating candidate solutions that are (hopefully) improvements over the previous candidate.
- Basic Hill Climbing chooses the “best” next step, Genetic algorithms choose a genetic mutation of the previous candidate. Simulated Annealing makes the next choice based on a particular formula used in metallurgy.
- Particle Swarm Optimization: Solving an optimization problem with a bunch of decentralized particles all searching for a solution with something that looks like it has a collective organization (e.g. ant colonies, bird flocks, animal herds, etc.)
- Neural network training, Finite element updating
- Las Vegas: A randomized algorithm that always produces the correct answer but makes no guarantees on how long it will run or how much space it will need (in the worst case).
- Used as a defense against algorithm complexity attacks.
- Randomized Quicksort, Finding a value in a collection
- Monte Carlo: A randomized algorithm that has time and space guarantees but has a small probability of giving the wrong answer. The probability of error can be reduced by running the algorithm longer.
- Used when all known deterministic algorithms for a problem are too slow, or when estimation is an inherent part of the problem.
- Miller-Rabin primality test, Approximating π (by throwing darts), Approximating integrals, Game playing
- Transform and Conquer: Solving a problem by reducing, or transforming, it to a similar (usually easier) problem whose solution implies a solution to the original.
- Preprocessing: Playing tricks with the input (input enhancement) or building up a cache (prestructuring) prior to doing the official run.
- Table of counts for counting sort, Boyer-Moore pattern matching, Storing often used data in a hashtable, Store often-used data in a search tree (B-tree, BST, Red-black, …), Heapify, prior to heapsort.
1 Comment