Clemson University
CPSC 464/664 Lecture Notes
Fall 2002
Mark Smotherman
LP: LOAD FACTOR NORMALIZED, 0(X4)
MULTIPLY AND ADD NORMALIZED, 0(X5)
ADD IMMEDIATE TO VALUE, X5, COLUMN LENGTH
COUNT BRANCH AND REFILL PLUS, X4, LP
#include <stdio.h>

int main(void){
  int i;
  float sum;
  float c,y,t;

  /* add the terms from largest (1/1) to smallest */
  sum = 0.0;
  for(i=1; i<=10000000; i++){
    sum = sum + 1.0/((float)i);
  }
  printf("decreasing order: %f\n",sum);

  /* add the terms from smallest to largest (1/1 last) */
  sum = 0.0;
  for(i=10000000; i>0; i--){
    sum = sum + 1.0/((float)i);
  }
  printf("increasing order: %f\n",sum);

  /* sum formula suggested by W. Kahan; c carries the rounding */
  /* error of the previous addition into the next term */
  sum = 1.0/((float)1);
  c = 0.0;
  for(i=2; i<=10000000; i++){
    y = 1.0/((float)i) - c;
    t = sum + y;
    c = (t - sum) - y;
    sum = t;
  }
  printf("kahan summation1: %f\n",sum);

  /* Kahan summation again, from smallest term to largest */
  sum = 1.0/((float)10000000);
  c = 0.0;
  for(i=9999999; i>0; i--){
    y = 1.0/((float)i) - c;
    t = sum + y;
    c = (t - sum) - y;
    sum = t;
  }
  printf("kahan summation2: %f\n",sum);
  return 0;
}
/* output:
float:
decreasing order: 15.403683
increasing order: 16.686031
kahan summation1: 16.695311
kahan summation2: 16.695311
double:
decreasing order: 16.695311
increasing order: 16.695311
kahan summation1: 16.695311
kahan summation2: 16.695311
*/
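The compensated loop in the program above can be pulled out into small
functions so that the naive and Kahan versions are easy to compare. The
following is only a sketch under some assumptions: the function names
(harmonic_naive, harmonic_kahan, harmonic_double) are made up for this
illustration, the division is done in single precision (1.0f/i) rather than
in double as in the listing above, and the code is assumed to be compiled
without options such as -ffast-math that let the compiler simplify the
compensation term away.

```c
/* Sketch: naive versus Kahan summation of 1/1 + 1/2 + ... + 1/n.
   All function names here are hypothetical, chosen for illustration. */

/* Plain single precision sum, largest terms first. */
float harmonic_naive(int n)
{
    float sum = 0.0f;
    for (int i = 1; i <= n; i++)
        sum += 1.0f / (float)i;
    return sum;
}

/* Same sum with Kahan's compensation: c holds the rounding error
   of the previous addition and is subtracted from the next term. */
float harmonic_kahan(int n)
{
    float sum = 0.0f;
    float c = 0.0f;
    for (int i = 1; i <= n; i++) {
        float y = 1.0f / (float)i - c;  /* corrected next term */
        float t = sum + y;              /* low-order bits of y are lost here */
        c = (t - sum) - y;              /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}

/* Double precision sum, used only as a reference value. */
double harmonic_double(int n)
{
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}
```

Calling the three functions with n = 10000000 and printing the results with
%f should show the same pattern as the output listed above: the naive float
sum falls well short, while the Kahan float sum agrees with the double sum
to about float precision.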
#include <stdio.h>

#define FP_TYPE float   /* change to double for the double precision run */

int main(void)
{
  FP_TYPE eps,epsp1,small,lastsmall,x,y,h,a,b,c,d,q;
  int i;

  /* find smallest eps such that eps + one is not equal to one */
  eps=1.0;
  epsp1=eps+1.0;
  while(epsp1>1.0){
    eps/=2.0;
    epsp1=eps+1.0;
  }
  /* the loop stops one halving too late, so print 2*eps */
  printf("part 1: eps=%23.16e\n",2.0*eps);

  /* find smallest non-zero number */
  small=1.0;
  while(small>0.0){
    lastsmall=small;
    small/=2.0;
  }
  printf("part 2: small=%23.16e\n",lastsmall);

  /* find the error in 0.1 added ten times */
  x=0.0;
  h=0.1;
  for(i=0;i<10;i++) x+=h;
  y=1.0-x;
  printf("part 3: x=%23.16e y=%23.16e\n",x,y);

  /* compare calculations */
  h=1.0/2.0;
  a=2.0/3.0-h;     /* 2/3 - 1/2 should equal 1/6 */
  b=3.0/5.0-h;     /* 3/5 - 1/2 should equal 1/10 */
  c=(a+a+a)-h;     /* 3*(1/6) - 1/2 should equal 0 */
  d=(b+b+b+b+b)-h; /* 5*(1/10) - 1/2 should equal 0 */
  q=c/d;           /* mathematically 0/0; IEEE 754 gives NaN if c and d are both exactly zero */
  printf("part 4: a=%23.16e b=%23.16e\n",a,b);
  printf(" c=%23.16e d=%23.16e\n",c,d);
  printf(" q=%23.16e\n",q);
  return 0;
}
/* single precision output:
part 1: eps= 1.1920928955078125e-07
part 2: small= 1.4012984643248171e-45
part 3: x= 1.0000001192092896e+00 y=-1.1920928955078125e-07
part 4: a= 1.6666667163372040e-01 b= 1.0000000149011612e-01
c= 0.0000000000000000e+00 d= 0.0000000000000000e+00
q= NaN
*/
/* double precision output:
part 1: eps= 2.2204460492503131e-16
part 2: small=4.9406564584124654e-324
part 3: x= 9.9999999999999989e-01 y= 1.1102230246251565e-16
part 4: a= 1.6666666666666663e-01 b= 9.9999999999999978e-02
c=-1.1102230246251565e-16 d=-1.1102230246251565e-16
q= 1.0000000000000000e+00
*/
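The values measured in parts 1 and 2 can be checked against constants that
the C library already provides. The sketch below repeats the two halving
loops for float and is meant to be compared with FLT_EPSILON from <float.h>
and with the smallest subnormal float, written here as the C99 hexadecimal
literal 0x1p-149f. The function names are invented for this illustration.
The volatile qualifiers are an assumption about the compiler: they force
each intermediate result to be stored as a float, so that extra register
precision on some machines does not change the answer. The second function
also assumes that subnormal numbers are not flushed to zero.

```c
#include <float.h>

/* Sketch: measure float machine epsilon by repeated halving.
   Function names are hypothetical. */
float measured_float_epsilon(void)
{
    volatile float eps = 1.0f;
    volatile float epsp1 = eps + 1.0f;  /* volatile: round to float on every store */
    while (epsp1 > 1.0f) {
        eps /= 2.0f;
        epsp1 = eps + 1.0f;
    }
    return 2.0f * eps;  /* last value for which 1 + eps was still above 1 */
}

/* Sketch: find the smallest positive float by repeated halving. */
float measured_float_min(void)
{
    volatile float small = 1.0f;
    float lastsmall = small;
    while (small > 0.0f) {
        lastsmall = small;  /* remember the last non-zero value */
        small /= 2.0f;
    }
    return lastsmall;
}
```

On an IEEE 754 machine with round-to-nearest, measured_float_epsilon()
should compare equal to FLT_EPSILON and measured_float_min() should compare
equal to 0x1p-149f, which are the two single precision values printed above.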
Assume a binary significand with two digits to the
right of the binary point. Subtract 1.11 * (2)^0
from 1.00 * (2)^1 without and then with a guard digit.
That is, subtract 1 3/4 from 2.
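Without a guard digit
    1.00 * (2)^1
  - 1.11 * (2)^0
  --------------
requires that we align the operands to equalize
the exponents; shifting 1.11 right one place gives
0.111, and the low-order 1 is lost because only two
digits are kept to the right of the binary point
    1.00 * (2)^1
  - 0.11 * (2)^1
  --------------
    0.01 * (2)^1
  = 1.00 * (2)^-1 = 1/2 => 100% error (the exact answer is 1/4)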
With a guard digit
    1.00 * (2)^1
  - 1.11 * (2)^0
  --------------
requires that we align the operands to equalize
the exponents
        g
    1.000 * (2)^1
  - 0.111 * (2)^1
  ---------------
    0.001 * (2)^1
  = 1.000 * (2)^-2
  = 1.00 * (2)^-2 = 1/4 => correct
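The two subtractions above can be replayed with integer arithmetic. The
sketch below is a toy model, not a description of any real floating-point
unit: a significand 1.xx is stored as a 3-bit integer (1.00 is 4, 1.11 is
7), the smaller operand is shifted right to align it, and any bits shifted
past the guard positions are simply dropped. The names toy_subtract,
scale_by_power_of_two and FRAC_BITS are made up for this illustration, and
the model assumes that the first operand is the larger one and has the
larger exponent. It does not normalize or round the result, which is
harmless here because both results are exactly representable.

```c
#define FRAC_BITS 2  /* digits to the right of the binary point, as in the example */

/* Multiply x by 2^e without using the math library. */
static double scale_by_power_of_two(double x, int e)
{
    while (e > 0) { x *= 2.0; e--; }
    while (e < 0) { x /= 2.0; e++; }
    return x;
}

/* Toy model: compute a - b, where each operand is sig * 2^(exp - FRAC_BITS).
   guard_bits extra low-order bits are kept while b is aligned to a.
   Assumes a_exp >= b_exp and a >= b. */
double toy_subtract(unsigned a_sig, int a_exp,
                    unsigned b_sig, int b_exp, int guard_bits)
{
    int shift = a_exp - b_exp;
    unsigned a_ext = a_sig << guard_bits;
    /* bits of b shifted below the guard positions are lost here */
    unsigned b_ext = (b_sig << guard_bits) >> shift;
    unsigned diff = a_ext - b_ext;
    return scale_by_power_of_two((double)diff, a_exp - FRAC_BITS - guard_bits);
}
```

With these conventions, toy_subtract(4, 1, 7, 0, 0) models the case without
a guard digit and returns 0.5, while toy_subtract(4, 1, 7, 0, 1) keeps one
guard digit and returns 0.25, matching the two worked results above.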
mark@cs.clemson.edu