2011-10-23

MIT计算机科学及编程导论笔记_Introduction to Computer Science and Programing Note

1.Goal of the course;what is computation;introduction to data types,operators,and variables
what is knowledge?
I can divide knowledge into at least two categories.
陈述性知识和程序性知识
are declarative and imperative knowledge.
任何可以合法使用值的地方都可以使用变量
So,any place it’s legal to use the value.

2.Operators and operands statements;branching,conditionals,and iteration
分支程序就是一个能基于一些测试来改变指令顺序的程序
a branching program is something that can change the order of instructions based on some test

3.Common code patterns:iterative programs
在写每个循环程序的时候确保他们总是能终结,第二件要做的事情就是要确保程序返回了正确的答案
you’d like to be able to assrue yourself that they will always terminate,and then the second thing you’d like to do,is to assure yourself that it does give you back a reasonable answer.
4.Decomposition and abstraction through functions;introcuction to recursion
函数的关键是它要提供分解和抽象
the key of the function is that it’s going to provide both of these.
将代码分解为模块
it’s going to let us break up into modules.
忽略细节
suppress detail.
一种创建原语的思考方式,我们将引用这些原语,这就是继承
it is to create new primitives.and i’m going to put those in quotes,it’s a generalization.
函数的目的就是寻找计算的共同模式
The idea of a function is,that i’m going to capture a common pattern of computation.

这些本地绑定并不影响全局绑定
These local bingdings do not affect any global bindings.

递归的概念就是我将把一个问题进行分解成一个更简单的同类问题以解决我可以解决的小步骤
The idea of recursion is that I’m going to take a problem and break it down into a simpler version of the same problem plus some steps

Fibonacci
Pairs(0) = 1;
Pairs(1) = 1;
Pairs(n) = Pairs(n-1) + Pairs(n-2);

5.Floating point numbers,successive refinement,finding roots
当我们表达一些数字的时候尾数位于1和2之间,严格的小于2
when we represent something,the mantissa is between one and two.Strictly less than two.
而指数在-1022到+1023之间
The exponent,is in the range,-1022 to +1023
因此我们表达的数字可以从正负10到正负308th
So this lets us represent numbers up to about 10 to the 308th,plus or minus to the 308th,plus or minus.
现代计算机语言是64位的
the words today in a modern computer are 64 bits
我们设立1位以存储符号这个数字是整数还是负数然后11字节存储指数剩下的52个字节存放尾数(小数)
we get one bit for the sign is it a positive or negative number 11 for the exponent,and that leaves 52 for the mantissa

有些10进制的数如1/10 ,在转变为二进制的时候会无限循环下去

打印十进制数字的近似二进制数
So on most computers,if you were to print the decimal value of the binary approximation

repr()函数将数字以字符串表达.
对于浮点数计算机截至第17位数
for floats,it rounds it to seventeen digits

warry about == on floats
每一次猜想都比前一次的更接近答案的方法,这就是逐次逼近法
that each guess is better than the previous guess
This is what’s called successive approximation
起始猜想,然后迭代
which would be the initial guess,you then iterate

Bisection method二分法
处理过程是对数级的
because I am getting logarithmically progressed

6.Bisection methods,Newton/Raphson,introduction to lists

我们实际上是在解方程式f:猜想数的平方减去x
we are looking a equation f of guess equals the guess squared minus x.

我们正需求方程式f=0的解.
and we trying to solving the equation that f of guess equals 0.Looking for the root of this equation.

sqrt(x);
F(guess) = guess^2 -x;

猜想切线(导数)
You take a guess and you find the tangent of that guess.

下一个猜想是切线与x的交点.
My next guess is going to be where the tangent crosses the x axis.

这里用到了导数的概念,
This is where derivatives come in.

我们都知道切点处切线的斜率跟函数在切点处的一阶导数是相同的,也就是dy/dx
What we know is that the slope of the tangent is given by the first derivative of the function f at the point of the guess.Which dy over dx.

guess(i+1) = guess(i) -F(guess(i))/2*guess(i);

可以将第i+1次猜想等于上一次猜想也就是第i次的减去第i次的猜测的结果值除以第i次猜想的本身值
we’ll know that we can choose guess i plus 1 to be equal to the old guess guess i,minus whatever the value is of the new guess–of the old rather the old guess– divided by twice the old guess.

从猜想数3开始求16平方根
F(3) = 9 -16 = -7;
guess(i+1) = 3 -(-7/6) = 4.1666

7.Lists and mutability,dictionaries,pseudocode,introduction to efficiency

order of grows
-choice of algorithm
-map problem into class of algorithms

8.Complexity; log,linear,quadratic,exponential algorithms

增长率,当问题的规模变大时,随之时间和空间的增长
Rate of growth as the size of the problem grows.
If it was,how much bigger does this get
大O符号,代表着当输入增加时也就是问题规模变大时对应的解决方法增长上限
Big Oh notation - is basically going to be an upper limit to the growth of a function as the input grow–as the input gets large.

a的b次方
ab = (a*a)(b/2) b even;
= a(a**(b-1)) b odd;
ef exp3(a,b):
if b==1:
return a;
if(b%2)2 == b:
return exp3(aa , b/2);
else: return aexp3(a,b-1);
b even t(b) = 6 + (b/2);
b odd t(b) = 6 + (b-1) = 12 + t((b-1)/2)

t(b) = 12 + (b/2);
=12+12+t(b/4)
=12k+t(b/2^k)

k=log2 b;
In the b even case, again i’m going to let t of b be the number of steps I want to go through.
if b is eqaul to 1,and then I’ve got to do the remainder(求余数),the multiplication,and the test,i’m up to four.

And then in the even case I’ve got to do a square(求平方) and the divide.
So i’ve got six steps
对数级,线性的,平方级,指数级算法
log,Linear ,quadratic,exponential algorithm

9.Binary search,bubble and selection sorts

但如果我们用的是保存列表的列表呢?如果列表存储的不同的东西比如存储的是整数字符串浮点数列表列表的列表列表的列表的列表或者其他各种东西呢?
if i know that i have things stored in constant size.But what if i have a list of lists?What if i have a homogeneous list,a list of integers and strings and floats and lists and lists of lists and lists of lists of lists
标准的处理方法是使用链表
So in this case,one of the standard ways to use linked list
开始我们指向列表的开始
Start again,we’ll point to the beginning of the list.
在第一点,我们存下一个元素的地址偏移量,然后用之后连续的几个单元表示第一个元素如果你喜欢也可以说是第0个元素.在这里我可能需要五个内存单元在下一个位置中存放需要多大的内存偏移量跳转到下一个元素然后无论你想存放什么都可以放在接下来的单元中可能只是些空格
在下一点可能我得存放一个很长的列表了

in the first spot,I am going to store something that says,here’s how far you have to jump to get to the next element.And then,I am going to use the next sequence of things to represent the first element,or the zero-th element,if you like.In this case I might need five.And then in the next spot,I am going to say how far you have to jump to get to the next element.All right,followed by whatever i need to represent it,which might only be a blank one.And in the next spot,maybe I’ve got a really long list

here is the problem.How do i get to the nth er,the k’th element in the list,in this case?
Well i have to go to the zero-th element,and say OK,gee,to get to the next element,I’ve got to jump this here.And to get to the next element,I’ve got to jump to here,and to get to the next element,I’ve got to jump to here,until i get there.
primarily Lisp.
盒状指南针每个元素实际上由两部分组成这里我们颠倒了顺序这里有指针指向内存包含实际值的地址可能实际值也是一系列指针这里还有指向实际指向下一个元素的指针
Python store list, it is called a box pointer diagram,what we really have for each element is two things.And i’ve actually just reversed the order here.we have a pointer to the location in memory that contains the actual value,which itself might be a bunch of pointers,and we have a pointer to the actual a pointer the value and we have a pointer

一次一个的比较遍历这些元素直到找到了它,这个复杂度是n
I am just going to take an unsorted list and search it,i cound do it in linear time,right?One at a time.Walk down the elements until i find it.That wound be order n.
另一种方法,先排序再搜索,需要花nlog n时间排序,然后花log n时间搜索
On the other hand,sort it first,and search.take n logn time to sort it,then i can search it in log n time.

amortize the cost.
Linear sort + search
kn nlog n + klog n

bubbule sort
我遍历一次列表每次取两个值确认最大的元素在后面一个
I’m walking along the list once,taking two things,and saying, make sure the biggest one is next.

所以无论最大的元素在列表的什么位置跑完了一遍它总是在最下面然后我回来再做一次,下一个最大元素又能跑到倒数第二的位置.
So wherever the largest element started out in the list,by the time i get through it,it’s at the end.And then i go back and i start again,and i do the same thing.The next largest element has to end up in the second last spot.

divide and conquer algorithm
分而治之

10. Divide and conquer methods,merge sort,exceptions

merge sort

所以我在树的每个层次都要做O(n)的操作,因为在每一层我都是把问题分解成两半,以n开始然后是n/2 n/4 n/8 所以我log n遍的n次操作
have order n operations at each level in the tree.And then we have Log n level,Beacause at each stage i cute problem in half.So i start off with n then it’s n over two n over four n over eight.So i have n operations log n times, n log n

hashing
Trade space for time 用空间换时间

11. Testing and Debugging

validation
progress to uncover problem and increase confidence.
我们需要一种不会我们这种无保证的自信的方法
So we need to have a method not designed to give us unwarranted confidence.

是两件事情的组合测试和推理
It’s typically a combination of two things.Testing and reasoning.
测试就是我们输入一些信息并且运行我们的程序然后查看答案
Testing we run our program on some set of inputs.And check the answers,and say yeah

debugging
这其实是一个查明为什么程序不运行或者不按期待运行的过程
And that’s basically the process of ascertaining why the program is not working.

Defensive programming.防卫性程序设计

And that’s basically writing your programs in such a way that it will facilitate both validation and dubugging.

实验必须有做什么的能力才能被称为一个有效的科学实验呢?
what must this experiment have the potential to do,to be a valid scientific experiment?
实验必须有反驳我们假设的可能
It must have the potential to refute the hypothesis.

12. More about debugging,knapsack problem,introduction to dynamic programming

first,keep in mind that the bug is probably not where you think it is.

simple things you can care:

自变量顺序错误
Reversed order of arguments.

忘记初始化
Initialization.

对象与值相等
Object versus value equality.

别名
Aliasing deep vs shallow capy.

Keep record of what you tried.

Think about reconsidering your assumptions.

Bin packing

sequence alignment

Knapsack problem 背包问题

Continuous problem 连续问题
4/bs Au dust
3/bs Ag dust
10/bs Raisins

function is cost value of gold times how many pounds of gold.Plus the cost of silver times however many pounds of silver.Plus cost of raisins times the number of pounds of raisins.
maximize the function Cg PG + Cs PS + Cr * PR

the constraint is PG + PS + PR <=8
Greedy algorithm
at every step you do what maximizes your value at that step.So there’s no planning ahead.You just do what is ever best.

Locally optinal not always lead to global optimums.

zero-one knapsack problem 0/1 背包问题
exhaustive enumeration 穷举法

E(n ,i = 1) W i X i <= C

Dynamic programming

有重叠子问题且被称为最理想子结构的情况
there are overlapping sub-problems and what’s called optimal substructrue.

redundant computation 累赘计算

13.Dynamic programming: overlapping subproblems,optimal substructure

Overlapping subproblems 重叠子问题
Memoization
我们在第一次计算一个值后就记录一个值.然后在后续问题中使用这个值.
we record a value the first time it’s computed,then look it up the subsequent times we need it.

这就是动态编程这种通用技术的核心
And this is what lies at the heart of this very general technique called dynamic programming.

实际上它也是很多保存结果的计算技巧的核心
And in fact,it lies at the heart of a lot of useful computational techniques where we save results.

Decision Tree 决策树
Back track

14.Analysis of knapsack problem,introduction to object-oriented programming

Size of solution
运行时间与装进背包的物品数目有关
how long it takes to run is related to how many items i end up being able to fit into the knapsack.
要记住的物品数量是和我能在背包中装多少东西有关
So the number of items i have to remember is related to how many items i can fit in the Knapsack.

这里有的是伪多项式算法
so we’d much rather define big O in terms of the inputs.

通常我们查看计算复杂度O的时候我们会把它定义为实现输入的代码的大小
computational complexity,big O,what we’ll define the in terms of,is the size of the coding of the input.
size of coding of input

Pseudo polynomial algorithm 伪多项式

polynomial in the numeric value of input

1.动态编程中,我们是在用空间换时间
In dynamic programming, one of the things that’s going on is we’re trading time for space.

2. 不要被指数型问题吓到
Two,don’t be intimidated by exponential problems.

3. 动态编程有广泛的应用,每当你在处理一个看上去有自然而然的递归算法的时候

Dynamic programming is broadly useful.Whenever you’re looking at a problem that seems to have a natural recursive solution,think about whether you can attact it with dynamic programming.

4. 减少问题 problem reduction

Module (modularity)
我想要说的是一种类型的模块当然这是因为我们关注的是程序的可组合性
what i want to be talking about is modules of one sort,and of course that’s because what we’re interested in is modularity.

模块是相识函数的集合
Now i think of a module as a collection of related functions.

我们将要通过圆点标记法来使用函数( . )
and we’re going to refer to the functions using dot notation.

一种特殊的模块,模块中包括了类或者它们本身就是类
a particular kind of module,and those are the modules that include classes or that are classes.

在这个问题上我们将要强调要在面向对象编程的环境中使用类
In this subject we’re going to emphasize using classes in the context of what’s called object-oriented programming.

data abstractions
Abstract data type

对象就是数据和函数的集合
An object is a collection of data and functions.
这里关键思想就是将函数和操作数据的函数绑在一起成为一个东西
The key idea here is to bind together the data and the functions that operate on that data as a single thing.

user defind types
数据与函数的结合是面向对象编程的核心
This combining of data and functions on that data is a very essence of object-oriented programming.

we also call it encapsulation 封装

消息传递传递隐喻的消息
we talk about it in terms of message pass, a message passing metaphor.

one object can pass a message to another object and the receiving object responds by executing one of its methods on the object.

l.sort();
这个对象传递一个消息sort 这个消息让我们去找sort方案,然后应用到对象I
pass the object I the message sort,and that message says find the method sort,and apply it to the object I.

notion of an instance.
类是有相同特质对象的集合
A class is a collection of objects with characteristics in common.

and extend the language.

15.Abstract data types,classes and methods

Class ,a template for creating instances of an object.
类是创建对象实例的模板

浅意义上的相等和深意义上的相等
shallow equality and deep equality.

测试浅意义上的相等
The first thing is testing shallow equality.
它本质是说,对于给定的两个东西它们是不是指向同一个引用
is essentially saying,given 2 things,do they point to exactly the same referent?
看看这两个是否指向内存中的同一位置.
is saying,do these things point to exactly the same spot in memory.

建立不同的版本的这些东西的一个模板
that’s going to let us build versions of these things.

这里有一个奇怪的额外的名为self的变量
There’s this weird extra variable in there called self.

当我创建了一个实例我应该能够获得那些特性化这个实例的东西
When i create a instance,i have to be able to get access to the things that characterize that instance.
这里特性化这个实例的东西是内在的详细说明要做什么的参数
what characterizes an instance here,it’s the internal parameters that specify what is going on.

init –when we call it, create an instance
–use “self” to refer to that instance

init方法创建了一个指向这个实例的指针
Init create a pointer to the instance.

把self作为指向这个实例的指针传了进去
And then it needs to have access to that,so it calls it passing in self as the pointer to the instance.

init这个方法可以去访问内存中的这一块了然后在这块内存里面
it says it has access to that piece in memory,and now inside of that piece of memory

self will always point to the particular instance.
same other programming languages do not provide that pointer.

Methods - can access values of the specific instance.

Data hiding
一个人仅仅可以通过定义好的方法来访问实例变量

one can only access instance values through defined methods.

python doesnot do this

str - printed representation,it converts things into a string type.代表打印,将东西转为字符串格式.

cmp - doing comparisons.

Operate overloading

Instance (

methods
fields
)
attributes

与实例相关联的是方法和域,它们统称为实例的属性
associated with an instance,we have both methods and fields.these are both called attributes of the instance.

16.Encapsulation,inheritance,shadowing

class
是一种将数据聚集起来的数据,这是模块/抽象的概念,在这里都把它们当作了原语
just trying to cluster data together.And this is a notion of modularity slash abstraction,where i’m treating them as primitives.

类将会创建实例,也就是该结构的特例
Class is used to make instances,meaning particular versions,of that structure

当我们调用person类定义的时候,它就创建了一个实例
when we called the class definition,person,it create an instance,there it is.

把它想成是指向内存中一点的指针,然后我们所做的是初始化成员函数
Thinkd of it as a pointer to a spot in memory,and then what we do is,then we call,or find that init method.

初始化(init)成员函数
And the first argument self,points to the instance.

17.Computational models:random walk simulation

探索一些随机性
Dealing with & exploiting randomness

随机指标它们包含了一些随机性
Stochastic they incorporate randomness.

弄清数据的概念
Making sense of data

当我们在再说一次在对世界进行建模我们发现里面有大量的数据
As we look at,again,modeling the world,what we discover,there’s a lot of data out there

评价答案的质量
Evalvating quality of answers

从简单的地方入手
Start Simple

也就是从一些对真实问题的简单估算开始
So start with some simple approximation to the real problem.

我会去假设就像我们之前做的一样我们有一个笛卡尔坐标系,玩家是站在一片像坐标纸一样的场地上
And I’m going to assume,as we’ve seen before,that we have Cartesian coordinates and that the player is standing on a field that has been cut to resemble a piece of graph paper.

只能选择四个方向中的一个:北南东或西
only go in one of four directions: north, south, east, or west.

然后对着一个方向迈出一步.不失去一般性
student is here and takes a step in one direction or another.without loss of generality

有四分之三的几率学生离出发点更远了
three times out of four you get further away.

模拟无规则运动
Simulate random walk

模拟也就是我们去尝试着建立一个假扮成真实世界的模型然后模拟一些东西
Simulation where we try build the model that pretends it’s the real world and simulates what goes on,and a random walk.

来尝试着给你们做一个实例分析
Trying to give you a case study

开始去想象合适的抽象数据是什么
Thinking about what might be the appropriate data abstractions.

位置
Location

Compass point

Field

Drunk

罗经点的数据抽象来代表这个学生会朝北南东西四个方向走
compass point to capture the notion that the student is going north,south,east,or west.

把位置和方向的概念分离开了
separate the notion of direction

因此可以把它当作是一个完全的笛卡尔平面而这个人所处的位置就是平面上的一个点
So think of that as the whole Cartesian plain,as opposed to a point in the plane,which is what the location is.
因为全部问题是关于这个喝醉的人现在在哪儿的
since after all the whole problem talks about where the drunk is.

pseudo random

18.Presenting simulation results,Pylab,plotting

执行一个模拟实验的内循环
1) we start with an inner loop that simulates one trail.

然后我会把这个内循环”包”在执行恰当数量的实验的循环中
2)And then i’ll quoto,enclose,unquote the inner loop in another loop that conducts an appropriate number of trials.

计算实验的统计数据并将它们表示出来
3)Calculate and present some relevant statistics about the trials.

19.Biased random walks,distributions

有偏的随机漫步

Polymorphism 多态性

尝试创建一个代表性的场景
Generate a sample of representative scenarios.

因为穷举所有可能的洲是不大可能的
Because an exhaustive enumeration of all possible states would be impossible.

你可以穷举空间,然后看看它会得到什么结果
you can exhaustively enumerate the space and then see what’s going on.

Experimental device

I’m running an experiment designed to give me some information about the real world.

他们都是描述性的而非预测性的
they are typically descriptive not presciptive.

they describe a situation,they don’t tell you what the answer is.

模拟并不是一个最优化过程
a simulation is not an optimization procedure.When we looked at optimization,that was prescriptive.We ran an optimization algorithm,and it gives us the best possible solution.

20.Monte Carlo simulations,estimating pi

将模拟模型进行分类的不同途径
classify simulation models.
是随机还是确定性的问题
it’s stochastic or deterministic

静态还是动态
static vs dynamic time

第三个分歧就是离散和连续
A third dichotomy is discrete vs continuous.

21.Validating simulation results,curve fitting,linear regression
验证模拟结果,曲线拟合,线性回归

How many samples?
How accurate?
Biased sample

Data
Models that explain data
Consequences that follow from the models

Whenever we try and fit something,We need some sort of an objective function.

Least squares fit

进行最优化的方式,实际上还有更有效率的办法
It turns out that for this particular optimization,there’s something more efficient.

对这种问题有一个闭型解决方案
there is a closed form way of attacting this

线性回归
Linear regression

因变量y和自变量的关系是这些参数的线性函数
the relationship of the dependent variable y to the independent variables is assumed to be a linar function of the parameters.

R2 - coefficient of determination

R2 = 1 - EE/Dr

就是它通过模型中的变量去尝试获取因变量的比例
It’s attempting to capture the proportion of the response variation explained by the variables in the model.

所以你可以看到一部分值的改变是因为变量值的改变
So you’ll have some amount of variation that is explained by changing the values of the variables

我有一些预测数据如果你愿意
So e e is going to be the errors in the estimation.

r2 = 0.9

基本上这些变量90%的改变都能靠这个模型来解释
approximately 90 percent of the variation in the variables can be explained by the model.

潜在变量
lurking variable

宅鱼

部落格

MIT计算机科学及编程导论笔记_Introduction to Computer Science and Programing Note