思维之海

——在云端,寻找我的星匙。

Review of ICS

“Intro to Computer Systems” is a CS course held by CMU with course number CS213. Its home page says:

The ICS course provides a programmer’s view of how computer systems execute programs, store information, and communicate. It enables students to become more effective programmers, especially in dealing with issues of performance, portability and robustness. It also serves as a foundation for courses on compilers, networks, operating systems, and computer architecture, where a deeper understanding of systems-level issues is required. Topics covered include: machine-level code and its generation by optimizing compilers, performance evaluation and optimization, computer arithmetic, memory organization and management, networking technology and protocols, and supporting concurrent computation.

I note down some of the good stuff in case I forgot sometime. But mostly just a review before final test, don’t expect too much LOL.

Video resource on bilibili is: Here (2015 Fall) or Here(2017 Fall). In case you need it.
Website of this course: https://www.cs.cmu.edu/~213/
Textbook PDF: Computer Systems: A Programmer’s Perspective(2rd edition),Chinese version; 3rd edition on amazon.
Lecture notes: https://www.cs.cmu.edu/~213/schedule.html
Test PDF: https://www.cs.cmu.edu/~213/exams.html, test(answer), Moc, NE
Review Slides(Chinese): review

Overview

Ex.1: Is $x^2 \geq 0$ ?

  • Float’s: Yes!
  • Int’s: Not quiet…
    • 40000$\times$40000=1600000000
    • 50000$\times$50000=-179496729 (overflow)

Ex.2: Is (x+y)+z = x+(y+z) ?

  • Float’s:
    • (1e20+ -1e20) +2.14 = 3.14
    • 1e20+ (-1e20 +2.14) = 0 (lose precision)

Sometimes this kind of issue(corner cases) can be essential! (Even though most of time it doesn’t matter)

Get to know assembly(汇编).

Get to know memory matters.

Ex.3: (Memory referencing bugs)

Double type has 2 bytes.( refer to a[2] & a[3])

System can be vulnerable by this.

Ex.4. (Memory system performance)

Those two codes’ performance differs from each other as almost 10 times on speed.

Showing as follows: (See in the Memory-Cache chapter, called “Memory Mountain”)

A programmer ‘s perspective.

  • thing need to know for whom really do the coding

David talks about how influential this course is. (Even a story about a bookstore in Peking University)

MORE(Course Resources)

There are more topics taught in this class about network programing including several chapter.

Seems the course reschedule at Jan 15th. I can’t access the slides now. The note ends here. (Still can, use the schedule below instead)

Schedule

from: https://www.cs.cmu.edu/~213/schedule.html

15-213/18-213/15-513: Intro to Computer Systems, Fall 2018

Notes on links

  • pptx links are to Powerpoint versions of the lectures
  • pdf links are to Adobe Acrobat versions of the lectures
  • code links are to directories containing code used for class demonstrations
  • tar links are to archive files in TAR format. Use the tar command on a linux machine to unpack these
Lecture/Recitation
Recitation 1: No recitation—Semester starts with first lecture
Overview (pptx , pdf , code)
Bits, Bytes, & Integers I (pptx , pdf , code)
Recitation 2: No recitation—Labor Day / Linux Boot Camp (pdf)
Bits, Bytes, & Integers II (pptx , pdf , code)
Floating Point (pptx , pdf)
Recitation 3: Datalab and Data Representations (pdf , handout , solution)
Machine Prog: Basics (pptx , pdf)
Machine Prog: Control (pptx , pdf)
Recitation 4: Bomb Lab (pdf , pptx , handout)
Machine Prog: Procedures (catchup-pptx , catchup-pdf , pptx , pdf)
Machine Prog: Data (pptx , pdf)
Recitation 5: Attack Lab and Stacks (pdf , pptx , activity)
Machine Prog: Advanced (pptx , pdf , code)
Code Optimization (pptx , pdf , C Bootcamp slides pdf , pptx , activity)
Recitation 6: C Review (pdf , activity)
The Memory Hierarchy (pptx , pdf)
Cache Memories (pptx , pdf)
Recitation 7: Cache Lab and blocking (pptx , pdf)
Linking (pptx , pdf , code)
ECF: Exceptions & Processes (pptx , pdf , code)
7pm - 9pm Exam Review in Rashid Auditorium
Recitation 8: Exam Review (pptx , pdf)
ECF: Signals & Nonlocal Jumps (pptx , pdf , code)
System Level I/O (pptx , pdf , code)
Recitation 9: Shell lab, processes, signals, and I/O (pdf , pptx)
Virtual Memory: Concepts (pptx , pdf)
Virtual Memory: Systems (pptx , pdf)
Recitation 10: TSHLab and Virtual memory (pptx , pdf)
Dynamic Memory Allocation: Basic (pptx , pdf)
Dynamic Memory Allocation: Advanced (pptx , pdf)
Recitation 11: Malloc lab (Part I) (pptx , pdf , code)
Network Programming (Part I) (pptx , pdf , code)
Network Programming (Part II) (pptx , pdf , code)
7pm - 8pm Malloc Bootcamp in Rashid Auditorium (pdf)
Recitation 12: Malloc lab (Part II) (pptx , pdf , code)
Concurrent programming (pptx , pdf , code)
Synchronization: Basic (pptx , pdf , code)
Recitation 13: Proxy lab (pptx , pdf)
Synchronization: Advanced (pptx , pdf , code)
No lecture—Thanksgiving
Recitation 14: Synchronization (pptx , pdf)
Thread-Level Parallelism (pptx , pdf)
Future of Computing I (pptx , pdf)
Recitation 15: Exam review
Future of Computing II (pptx , pdf)
Final exam

Exam

https://www.cs.cmu.edu/~213/exams.html

Representing&Processing of Information

Bits

  • Binary: 11110001
  • Decimal: 241
  • Hexadecimal: 0xF1

Boolean Algebra

And(&&), Or(||), Not(~/!), Exclusive-Or / Xor(^)

Opearte on bit vectors: &, |

  • can be used to represent sets
    • 01101001 {0, 3, 5, 6}
    • 76543210 ———1 means the set contains the element, 0 otherwise
Operation Name Byte Set
& Intersection 01000001 {0, 6}
| Union 01111101 {0, 2, 3, 4, 5, 6}
^ Symmetric difference 00111100 {2, 3, 4, 5}
~/! Complement 10101010 {1, 3, 5, 7}

p && *p: avoids null pointer access

Shift Operators

  • Left shift: <<
  • Right shift: >>

There is a difference between logical shift and arithmetical shift, see in the following pic: The negative num’s arithmetical right shift will fill in 1s. (Others are 0s)

Intergers

Unsigned & Signed

  • Unsigned Values
    • $U_{Min} = 0$
      • 000…0
    • $U_{Max} = 2^w - 1 $
      • 111…1
  • Two’s Complement Values【求负:取反加一
    • $T_{Min} = -2^{w-1}$
      • 100…0
    • $T_{Max} = 2^{w-1} -1$
      • 011…1
  • Other Values
    • Minus 1
      • 111…1

Here is a graph: (4 bits)

Notice:

Conversion Visualization

Casting(转换)

Principle: If there is a mix of unsigned and signed in single expression, signed values implicitly cast to unsigned.

Examples above are for W=32: TMIN=-2,147,483,648 ; TMAX=2,147,483,647.

1
2
3
for(i = n-1; i >= 0; i--)
do_sth(i);
//if i is unsigned, this loop will run forever!
1
2
3
for(i = n-1; i - sizeof(str)>= 0; i--)
do_sth(i);
//sizeof() return an unsigned value, this loop will crash very strangely!

Extension

Sign Extension

Principle: Add all 1s for the new blank.

Unsigned

Principle: Add all 0s for the new blank. (Similarly)

Truncation(截断)

Unsigned: mod operation, Just drop it.
Signed: similar to mod (sign can be converted in the process)

Arithmetic

Addition

Unsigned Addition

Drop the overflowed part. (But when computing use extra bits)

Two’s Complement Addition

Drop the overflowed part as well. (Will get the exact correct answer)

TAdd and UAdd have identical bit-level behavior.

Visualizing 2’s complement addition:

Multiplication

Unsigned multiplication

Still drop the overflowed part.

Signed multiplication

Drop the overflowed part. (sign can be converted in the process)

For negative numbers: drop the overflowed part will somehow still get right answer.

1101 -3 13(U)
1110 -4 14(U)
10110 6 6(U)

Cause the actual effective bits are not really overflowed. (1101=101, 1110=10)

Shift

Power-of-2 Multiply with Shift

u<<k gives u*2^k(both signed and unsigned)

Shift is much faster than multiplication.

Unsigned Power-of-2 Divide with Shift

Use logical shift.(0s)

Signed Power-of-2 Divide with Shift

Use arithmetical shift.

(>>)1010 -6
(>>)1101—->1110 -3—->-2
1110—->1111 -2—->-1

Trick: before you shift, you should add a bias.(For this example—->1)
But for c, it seems we don’t use bias. (右移向下舍入)

Summary

Why should I Use Unsigned?

In java there is no unsigned.

A new principle: Don’t use without understanding implications.

Counting down with Unsigned:

Finally, Why should I Use Unsigned?

  • Do Use When Performing Modular Arithmetic
    • Multiprecision arithmetic
  • Do Use When Using Bits to Represent Sets
    • Logical right shift, no sign extension
  • Do Use In System Programming
    • Bit masks, device commands,…

Representation in memory, pointers, strings

Byte-Oriented Memory Organization

BUG: SegmentFault.

Trick: $2^{10}=1024\approx 1000=10^{3}$.

Machine Words: Any given computer has a “Word Size”.(32bit, 64bit)

For 32bits: limits addresses to 4GB(2^32 bytes)
For 64bits: limits addresses to 18PB(2^64 bytes)

Word-Oriented Memory Organization

Addresses Specify Byte Locations.

Example Data Representations:

Byte Ordering(Endianness)

  • Big Endian: Sun (Oracle SPARC), PPC Mac, Internet
    • Least significant byte has highest address
  • Little Endian: x86, ARM processors running Android, iOS, and Linux
    • Least significant byte has lowest address

Ex:(Variable x has 4-byte value of 0x01234567, Address given by &x is 0x100)

Other representation can be see in the slides.


Puzzles: (32 bits for int)

For x > y ---> -x < y, note that there is a special occasion:

For(x|-x)>>31==-1, consider x=0.

Forx>>3==x/8, using the ceil function(round down)

  • for/, always map close to 0
  • for>>, always map with ceil function

Floating Point

Background

What is 1011.101(2) ?

Representation:

  • Bits to right of “binary point” represent fractional powers of 2
  • Represents rational number

Here are some examples:

Value Representation
5 3/4 101.11
2 7/8 10.111
1 7/16 1.0111

Limitation:

  • Can only exactly represent numbers of the form x/2^k
    • Other rational numbers have repeating bit representations
  • Just one setting of binary point within the w
    • Limited range of numbers (very small values? very large ( ——> floating point)

IEEE standard

Numerical Form:

  • Sign bit s determines whether number is negative or positive
  • Significand M normally a fractional value in range [1.0, 2.0).
  • Exponent E weights value by power of two

There are mainly two types of floats:

Three “kinds” of floating point numbers:

“Normalized” Values: (Very large and small numbers)

  • When: exp ≠ 000…0 and exp ≠ 111…1
  • Exponent coded as a biased value: E = exp – Bias (To represent small float numbers, just like 2’s complement)
    • exp: unsigned value of exp field (Also: exp = E + Bias)
    • Bias = $2^{k-1} - 1$, where k is number of exponent bits
      • Single precision: 127 (exp: 1…254, E: -126…127
      • ▪ Double precision: 1023 (exp: 1…2046, E: -1022…1023)
  • Significand coded with implied leading 1: M = 1.xxx…x
    • xxx…x: bits of frac field
      • Minimum when frac=000…0 (M = 1.0)
      • Maximum when frac=111…1 (M = 2.0 – ε)
    • Get extra leading bit for “free”

Ex:

Note: 0<=Exp<=255, -127<=E<=128

Denormalized Values: ( represent values in $[0,1)$, especially 0)

  • Condition: exp = 000…0
  • Exponent value: E = 1 – Bias=$2-2^{k-1}$ =-126 (instead of exp – Bias) (why?)
  • Significand coded with implied leading 0: M = 0.xxx…x
    • xxx…x: bits of frac
  • Case
    • exp = 000…0, frac = 000…0
      • Represents zero value
      • Note distinct values: +0 and –0 (why? - signed)
    • exp = 000…0, frac ≠ 000…0
      • Numbers closest to 0.0
      • Equispaced (mean distribution)

Special Values: (Too Large to represent)

  • Condition: exp = 111…1
  • Case: exp = 111…1, frac = 000…0
    • Represents value $\infty$ (infinity)
    • Operation that overflows
    • Both positive and negative
    • E.g., $1.0/0.0 = +\infty ,−1.0/−0.0 = + \infty, 1.0/−0.0 = − \infty$ (Not number)
  • Case: exp = 111…1, frac ≠ 000…0
    • Not-a-Number (NaN)
    • Represents case when no numeric value can be determined
    • E.g., sqrt(–1),

Ex: (Do this and you will get the idea)

Example and properties

Tiny Floating Point Example:

8-bit Floating Point Representation
▪ the sign bit is in the most significant bit
▪ the next four bits are the exp, with a bias of 7
▪ the last three bits are the frac

As you can see as follows, look at the red arrows. The mechanism actually give a smooth change from De. to Nor.

the distribution gets denser toward zero: (1s-3e-2f)

6-bit IEEE-like format (close-up view): (1s-3e-2f)

Operations

Floating Point Operations: Basic Idea
▪ First compute exact result
▪ Make it fit into desired precision
▪ Possibly overflow if exponent too large
▪ Possibly round to fit into frac

Rounding(取整)

Rounding Modes: (illustrate with $ rounding)

Round to nearest, but if half-way in-between then round to nearest even(偶)

Ex.

Addition

Multiplication

Lecture note: here.

Floating point in C

  • C Guarantees Two Levels
    • float single precision
    • double double precision
  • Conversions/Casting
    • Casting between int, float, and double changes bit representation
    • double/float → int
      • Truncates fractional part
      • Like rounding toward zero
      • Not defined when out of range or NaN: Generally sets to TMin
    • int → double
      • Exact conversion, as long as int has ≤ 53 bit word size
    • int → float
      • Will round according to rounding mode

Floating Point Puzzles:

Summary

Machine Programming

Basics

Intel x86 Processors : Complex instruction set computer (CISC).

The class will mainly use gcc complier.

2018 State of the Art: (Coffee Lake)

History blablabla…

C, assembly, machine code

Definitions

  • Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand for writing correct machine/assembly code
    • Examples: instruction set specification, registers
    • Machine Code: The byte-level programs that a processor executes
    • Assembly Code: A text representation of machine code
  • Microarchitecture: Implementation of the architecture
    • Examples: cache sizes and core frequency
  • Example ISAs:
    ▪ Intel: x86, IA32, Itanium, x86-64
    ▪ ARM: Used in almost all mobile phones
    ▪ RISC V: New open-source ISA

Assembly/Machine Code View:

Turning C into Object Code

Ex:

1
gcc -Og -S sum.c

Use gcc to comply
-S: stop for the 1st step
-Og: what kind of optimization on code you choose (for debugging)
-O1: a lot of optimization( pretty hard to understand)

Data Types

C Data Type assembly suffix byte
char byte b 1
short word w 2
int (two) word l 4
char * (two) word l 4
float single precision s 4
double bio-precision l 8

Disassembly(反汇编)

Ex:

1
2
gcc -Og -S sum.c -o sum
objdump -d sum > sum.d //disassemly

sum is a binary file(to disassembly the origin function)

Registers

x86-64 Integer Registers:

%rax(%eax) reserves the function’s return value.
$eip stands for instruction register and can not be changed. (Not in the picture)

%rsp(%esp): stack pointer.
%rbp(%ebp): base pointer.

Note: (the stack’s in memory top has lower address) (See more in the Procedures)

1
2
3
4
5
6
pushl %ebp  #equal to
subl $4,%esp
movl %ebp,(%esp)
popl %ebp #equal to
movl (%esp),%ebp
addl $4,%esp

IA32 Registers:

Just a legacy thing.

Move

1
movq Source,Dest

Operand types showing above,

Ex: (Understanding Swap())

1
2
3
4
5
6
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Addressing Modes

Simple Memory Addressing Modes

  • Normal (R) Mem[Reg[R]]
    • Register R specifies memory address
      Aha! Pointer dereferencing in C
1
movq (%rcx),%rax
  • Displacement D(R) Mem[Reg[R]+D]
    • Register R specifies start of memory region
      Constant displacement D specifies offset
1
movq 8(%rbp),%rdx

Complete Memory Addressing Modes

  • Most General Form D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D]
    • ▪ D: Constant “displacement” 1, 2, or 4 bytes
      ▪ Rb: Base register: Any of 16 integer registers
      ▪ Ri: Index register: Any, except for %rsp
      ▪ S: Scale: 1, 2, 4, or 8 (why these numbers?)
  • Special Cases
    • (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
      D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
      (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]

Ex. (Addressing Modes)

Arithmetic & logical operations

Address Computation Instruction

1
leaq Src, Dst

Src is address mode expression
Set Dst to address denoted by expression

Uses: (Actually similar to move op)
▪ Computing addresses without a memory reference
▪ E.g., translation of p = &x[i];
▪ Computing arithmetic expressions of the form x + k*y
▪ k = 1, 2, 4, or 8

Difference from mov: (example)

1
2
3
4
5
6
7
mov 1(%eax),%ebx
# b = *(a + 1)
lea 1(%eax),%ebx
# b = a + 1
# equal to add & mov without reference
--------------------
# Consider %eax actually reserve an address, then lea load an address instead of load address's reference value in memory like mov

Some Arithmetic Operations

Two Operand Instructions: (src=source)

Watch out for argument order! Src,Dest
(Warning: Intel docs use “op Dest,Src”)
No distinction between signed and unsigned int (why?)

One Operand Instructions: (See book for more instructions)

Ex.

Control

Control: Condition codes

Compare Instruction:

1
cmpq Src2,Src1 # Src1 - Src2

Test Instruction:

1
testq Src2,Src1 # Src1 & Src2

Those instructions doesn’t change the value of registers.

SetX Instructions:

Help to change bit level values(t low-order byte of destination) for arithmetic.

Ex.

Note: (Intel x86 machine sets the upper bits to zeros implicitly)

Conditional branches

Jumping

1
jmp .L1 #address

Ex.

It’s behavior is just like to “goto” statement in C.

Condition move: (if the instruction is simple, calculate it first and then judge)

Loops

Do-While” Loop Example: (it’s not often used)

While Loop Example: (there is an initial test)

“For” Loop Form: (can be converted into other 2 kinds of loops)

Switch Statements

Ex. (use “break” to end early)

Jump Table Structure: (Avoid to step though all the codes)

Ex:

ja: jump above. (x>6 and x<0 will jump to default case)
jmp: goto jump table

.quad: state that need a 8byte value here.

Jump table:

Code Blocks (x == 2, x == 3): (Note that the block doesn’t have some orders)

Code Blocks (x == 5, x == 6, default):

Summary:

Procedures

Function, method, procedure….

Mechanisms

Stack Structure

x86-64 Stack:

Note: the direction are controversial maybe.
(The memory doesn’t change, only the value of %rsp)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#the %ebp contains a value to save
pushl %ebp #equal to
subl $4,%esp
movl %ebp,(%esp)
popl %ebp #equal to
movl (%esp),%ebp
addl $4,%esp
----------------------
pushl %rbp #equal to
subl $8,%rsp
movl %rbp,(%rsp)
popl %rbp #equal to
movl (%rsp),%rbp
addl $8,%rsp
1
2
3
leave #equal to
movl %ebp,%esp
popl %ebp

Ex.

A example: Here. (Chinese)

Calling Conventions

Passing control

1
callq address <func_name>

Passing data

Ex.

Managing local data

Call Chain:

Stack Frames: (%rsp)

Example see in the slides.(Also the Illustration of Recursion part)

Data

silde.

Arrays

Easy.

Ex.

Structures

Alignment: (对齐)

Alignment is depending on the next data type.

Save space:

Floating Point

Similarly.

Advanced Topics

slide.

Memory Layout

Buffer Overflow

when exceeding the memory size allocated for an array.

String Library Code:

Ex; (Buffer Overflow Disassembly)

Things can go wrong without a crash.

For string, note that there is a byte order.

Avoid Overflow Vulnerabilities in Code:

1
fgets(buf, 4, stdin);

System-Level Protections can help.

  • Randomized stack offsets

Nonexecutable code segments.

Stack Canaries can help.

  • Idea
    ▪ Place special value (“canary”) on stack just beyond buffer
    ▪ Check for corruption before exiting function
  • GCC Implementation
    ▪ -fstack-protector
    ▪ Now the default (disabled earlier)

Unions


Byte Ordering Example: (only ordered inside a single data)

Code Optimization

slides.

Generally Useful Optimizations

Optimizations that you or the compiler should do regardless of processor / compiler.

Code Motion

Reduction in Strength

Share Common Subexpressions

Optimization Example: Bubblesort

Optimization Blockers

Procedure calls

Note the function strlen(), it will be running the whole time and make the program perform badly.($O(n^2)$)

Memory aliasing

Removing Aliasing: (don’t use reference unless you have to)

Exploiting Instruction-Level Parallelism

Cycles Per Element (CPE)

Benchmark Performance:

Loop Unrolling

Multiple Accumulators

Loop Unrolling with Separate Accumulators (2x2):

Summary:

The Memory Hierarchy

slides.

Random-Access Memory(RAM).

RAM comes in two varieties:
▪ SRAM (Static RAM)
▪ DRAM (Dynamic RAM)

Disk.

Solid State Disks(SSD).

Bus.(总线)

Locality of reference

Caching in the memory hierarchy

General Cache Concepts:

See in the slides.

3 Types of Cache Misses:

For conflict miss, each block represent one specific next level cache block. (Can’t use to save other values)

Cache Memories

slides.


Cache organization and operation

Direct Mapped Cache (E = 1): (One line per set. Assume: cache block size B=8 bytes)

Locate set
Check if any line in sethas matching tag
Yes + line valid: hit
Locate data starting at offset

If tag doesn’t match (= miss): old line is evicted and replaced

Ex. (Direct-Mapped Cache Simulation)

E-way Set Associative Cache (Here: E = 2):

Ex. (2-Way Set Associative Cache Simulation)

What about writes?

Performance impact of caches

The memory mountain

Read throughput (read bandwidth)
▪ Number of bytes read from memory per second (MB/s)
Memory mountain: Measured read throughput as a function of spatial and temporal locality.
▪ Compact way to characterize memory system performance.

Rearranging loops to improve spatial locality

Ex. (Matrix Multiply Performance)

Using blocking to improve temporal locality

See in the slides.

Linking

slides.

Ex. (Example C Program)

Programs are translated and linked using a compiler driver:

1
2
linux> gcc -Og -o prog main.c sum.c
linux> ./prog

Why Linkers?

  • Reason 1: Modularity
  • Reason 2: Efficiency (time/space)

What Do Linkers Do?

  • Step 1: Symbol resolution (During symbol resolution step, the linker associates each symbol reference
    with exactly one symbol definition.)
  • Step 2: Relocation (Merges separate code and data sections into single sections)

Three Kinds of Object Files (Modules)

  • Relocatable object file (.o file)
  • Executable object file (a.out file)
  • Shared object file (.so file)

Linker Symbols

just like: public, friend, and private.

Program symbols are either strong or weak:

  • Strong: procedures and initialized globals
  • Weak: uninitialized globals
    Or ones declared with specifier extern

Rule 1: Multiple strong symbols are not allowed

  • Each item can be defined only once
    Otherwise: Linker error

Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol

  • References to the weak symbol resolve to the strong symbol

Rule 3: If there are multiple weak symbols, pick an arbitrary
one

  • Can override this with gcc –fno-common

Ex. (Linker Puzzles)

Global Variables: (caution)

  • Avoid if you can
  • Otherwise
    § Use static if you can
    § Initialize if you define a global variable
    § Use extern if you reference an external global variable
     § Treated as weak symbol
     § But also causes linker error if not defined in some file
    

关于text段、data段和bss段

Shared Library (Modern Solution)

地址重定位:静态重定位和动态重定位

Case study: Library interpositioning

ECF(Exceptional Control Flow)

Exceptions & Processes

Exceptional Control Flow

Exceptions

Interrupts(Asynchronous Exceptions)

Synchronous Exceptions

Processes

Process Control

Obtaining Process IDs

pid_t getpid(void)

  • Returns PID of current process

pid_t getppid(void)

  • Returns PID of parent process

Creating and Terminating Processes

void exit(int status)

terminated.

Convention: normal return status is 0, nonzero on error

int fork(void)

Returns 0 to the child process, child’s PID to parent process。(called once, return twice

Ex. (can’t predict the exact return value of fork)

Two consecutive fork:

Nested forks in parent: (Similar)

More exercise: Sample, Modor

int wait(int *child_status)

return child’s PID if succeed, -1 otherwise.

Ex.

Ex. (fork + wait)

Note: (The processes will run parallelly unless some of them have to wait)

int execve(char *filename, char *argv[], char *envp[])

  • never returns.

Signals & Nonlocal Jumps

slides.

setjmp

  • use once, return twice

longjmp

  • use once and never return

Virtual Memory

Concepts

slides.

Address spaces

Why Virtual Memory (VM)?

  • Uses main memory efficiently
  • Simplifies memory management
  • Isolates address spaces

VM as a tool for caching

Enabling Data Structure: Page Table

A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
▪ Per-process kernel data structure in DRAM

Page hit: reference to VM word that is in physical memory (DRAM cache hit)
Page fault: reference to VM word that is not in physical memory (DRAM cache miss)
Handling Page Fault: Page miss causes page fault (an exception)

Locality to the Rescue Again!

VM as a tool for memory management

Key idea: each process has its own virtual address space

VM as a tool for memory protection

Address translation

Address Translation: Page Hit

Address Translation: Page Fault


Integrating VM and Cache

VA: virtual address, PA: physical address, PTE: page table entry, PTEA = PTE address.

Multi-Level Page Tables

A Two-Level Page Table Hierarchy

Translating with a k-level Page Table

Systems

Case study: Core i7/Linux memory system

None.

Some page were marked to be protected by core so that the user can’t alter.

There are L1~L4 page tables.

Trick: VPO=PPO=Physical Address. (The lower bits of Block address)

Memory mapping

The address translation is always running.

Dynamic Memory Allocation

Basic

void* malloc(size_t size)

  • get a block by given size

void free(void *p)

  • free the block at the address

calloc: version of malloc that initializes block to 0

realloc: change the size of it

sbrk: control allocators to grow or shrink the heap

Throughout: number of complete requests per unit time.

5000 malloc + 5000 free in 10 sec, then throughout will be 1000 operation/sec.

Peak Memory Utilization:

  • aggregate payload $P_k$.

  • Current heap size $H_k$

  • $Uk=max{i\leq k}P_i/H_k$

Can cause by fragmentation.(internal or external)

Advanced

Explicit free lists

Keeping Track of Free Blocks

Explicit free lists:

Freeing With Explicit Free Lists:

Often use BBST to manage the allocated memory.

Explicit List Summary:

Segregated free lists

Each size class of blocks has its own free list.

Garbage collection

Garbage collection :automatic reclamation of heap-allocated storage—application never has to explicitly free memory.

Common in many dynamic languages:

▪Python, Ruby, Java, Perl, ML, Lisp, Mathematica

The tech is still ongoing.

Graph:

Mark and Sweep Collecting:

Assumptions For a Simple Implementation:

Mark and Sweep (cont.)

Dereferencing Bad Pointers

The classic scanf bug:

Reading Uninitialized Memory

Overwriting Memory

Referencing Nonexistent Variables

Freeing Blocks Multiple Times

Referencing Freed Blocks

Failing to Free Blocks (Memory Leaks)

Ex.

System-Level I/O

slides.

Unix I/O

Unix I/O Overview:

File Types

Regular Files

Windows type \r\n actually relates to the usage of typewriter.

Pathname

Opening Files

Closing Files

Reading Files

Writing Files

Ex. (sample)

Metadata. sharing, and redirection

Ex. (Example of Accessing File Metadata)

File Sharing

I/O Redirection

Standard I/O

Functions:

Std. I/O Streams:

Buffering in Standard I/O:

RIO(robust I/O) package

Buffered I/O: Implementation:

Closing remarks

For more information:

Network Programing

slide1, slide2.

A Client-Server Transaction

Computer Networks

Structure

Lowest Level: Ethernet Segment

Broadcast mechanism.

Next Level: Bridged Ethernet Segment

LAN: Local area network.
WLAN: Wireless local area network.

Next Level: internets

Transferring internet Data Via Encapsulation:

Global IP Internet

Internet Domain Name

Domain Naming System (DNS)

can be multi-multi mapping.

Sockets

Socket Interface

The advanced topic can be seen in the video and related slides.

Concurrent programming

slides.

The lecturer always says “good to see you”.

Situation Examples

Data Race:

Deadlock:

Starvation:

Iterative Servers

Iterative servers process one request at a time.

Fundamental Flaw of Iterative Servers: (example)

Writing Concurrent Servers

Approaches for Writing Concurrent Servers:

Approach 1: Process-based Servers(进程)

Approach 2: Event-based Servers

I/O Multiplexed Event Processing:

Approach 3: Thread-based Servers(线程)

Very similar to approach #1 (process-based)

…but using threads instead of processes

Summary: Approaches to Concurrency

Synchronization

slides-basic, slides-advanced.

Threads review

Sharing

Ex. (Example Program to Illustrate Sharing)

Mapping Variable Instances to Memory:

Ex.

Ex. (Shared Variable Analysis)

Mutual exclusion

Semaphores

Summary:

Using semaphores to schedule shared resources

Producer-consumer problem

Note: limited buffer

Maintain two semaphores: full + empty

Circular Buffer

Readers-writers problem

Other concurrency issues

Thread safety

Races

Deadlocks

Avoid:

Thread-Level Parallelism

slides.

Longer term perspective. Think bigger!

Today

Just see the video.

Future of Computing

slides: part1, part2.

Moore’s Law


The blue ones are actually for celphones.
The general trend is produced by regression.

It’s a prediction rather than a law.

Technological singularity:

Ex. (3D structure)

High-Performance Computing

What’s So Special About Big Learning?

This part is available only on 2017 Fall version or after.

Please just see the slides and watch the video.

GraphLab, GraphLab-Guide_Chinese.