programming

Thanks to the folks at CCIL for hosting these pages.
remlaPS
Home Page Web Logs at CCIL

Dec_24_2008

Alternate formulation is something like... Need to verify "Average" and
simplify (if possible).
Let
   r = radix
   n = number of bits

And
   i - log_2 (r)
   j = the number of bytes needed to make a character

Then
   Best = ceil(i)/jn
   Average = (ceil(i) + 2 - 2^(1 + ceil(i))/r))/jn
   Worst = floor(i)/jn

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#ThesisFormula24Dec08
Dec_22_2008

All representations of all numbers stored in a 3D matrix.
X axis = base
Y axis = numeral
Z axis = digits (index 0 = ones place)

scp44@tux64-07:~/programming/cpp$ cat dnum.cpp
#include "NumSpace.h"
#include 
#include 

using namespace std;

int main(int argc, char *argv[])
{

         int MB=atoi(argv[1]);
         int MN=atoi(argv[2]);

         NumSpace abc(MB,MN);
         abc.Print();
         abc.AllReps(47);
         for (int lcv=1;lcv<5;lcv++)
                 cout << abc.GetDigit(13,2,lcv);
         cout <
#include 
#include "NumSpace.h"

int NumSpace::DigitMatrix[MAXBASE][MAXNUM][MAXLEN];

NumSpace::NumSpace(int X, int Y)
{
         // log_2(N) = log(N)/log(2)
    int Z=(int) ceil(log(Y+1) / log(2));
    Xs = X;
    Ys = Y;
    Zs = Z;

   for (int Dig=0; Dig<=Z; Dig++)
   {
           // std::cout << "Digit: " << Dig << std::endl;
         for (int Base=2; Base<=Xs; Base++)
         {
                 int DigVal=0, DigInd=1;
                 if ( Dig >0 ) DigVal = 1;
                 int length=pow(Base,Dig);
                 int Top=Dig+1;
                 if ( Dig > 0 ) Top--;

                 for (int Num=0; Num = Base ) DigVal = 0;
                         }
                 }
        }
   }
};

void NumSpace::Print()
{
for (int Base=2; Base<=Xs; Base++)
{
         std::cout << std::endl << "Base " << Base << std::endl;

         for (int Num=2; Num<=Ys; Num++)
         {
                 std::cout << Num << ": ";
                 for (int Dig=Zs; Dig>=0; Dig--)
                         std::cout << DigitMatrix[Base][Num][Dig] << ", ";
                 std::cout << std::endl;
         }
}
};

void NumSpace::AllReps(int N)
{
         std::cout << "All reps for " << N << std::endl;
         for (int B = 2; B < Xs; B++ )
         {
                 std::cout << "Base " << B << ": ";
                 for (int Dig=Zs; Dig>=0; Dig--)
                         std::cout << DigitMatrix[B][N][Dig] << ", ";
                 std::cout << std::endl;
         }
}

int NumSpace::GetDigit(int N, int B, int D)
{
         int nd=log(N+1)/log(B);
         return DigitMatrix[B][N][nd-D+1];
}

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#NumberSpace22Dec08
Dec_13_2008

Read later. From December's Journal of the ACM...

"Linear-time disk-based implicit graph search" - Richard E. Korf
http://doi.acm.org/10.1145/1455248.1455250

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#DiskGraphSearch12Dec08
Oct_09_2008

In computer graphics last week, we learned about "scaling matrices".

With 2d, heterogeneous coordinates, they look like this...
sx 0  0
  0 sy 0
  0 0  1

With 3d, heterogeneous coordinates, they look like this...
sx 0 0  0
0 sy 0  0
0  0 sz 0
0  0 0  1

I assume there are higher dimensional scaling matrices following the same
pattern too.

Of course, that reminded me of the base conversion matrices I played with
a while back. Apparently, my base conversion matrix when the source_base
is a multiple of the target_base is a scaling matrix. See here...

>> A=[4 0 0;0 2 0;0 0 1]

A =
      4     0     0
      0     2     0
      0     0     1

>> P=[4 3 2]

P =
      4     3     2

>> P * A

ans =
     16     6     2

>> 16 * 25 + 6 * 5 + 2

ans =
    432

Sx = 4 and Sy = 2. To complete the conversion, we would "normalize" the
16 and the 6 by carrying,

carrying 25 from the 2nd place to the first, 16,6,2 becomes
17,1,2 then carrying three 125's from the first place to create a new first place gives
3,2,1,2

which is
>> 3 * 125 + 2 * 25 + 1 * 5 + 2

ans =
    432

None of that is new, except I didn't know that there was such a thing as a
"scaling" matrix before. Now I'm wondering if there are any special
geometric properties of these base conversion matrices with increasing
powers on the diagonal.

I'm also wondering about the similar matrix, when the target is a multiple
of the source. i.e.

>> R=[1 0 0;0 2 0;0 0 4]

R =
      1     0     0
      0     2     0
      0     0     4

Since the 4 would line up with a "1" in heterogeneous coordinates, it's
hard to imagine what that does, graphically (and maybe that explains why I
had to divide the 4 back out, when doing base conversions). That would be
somehow similar too...

R' =
      1/4    0      0
      0     2/4     0
      0      0     4/4

Which would be a scaling matrix. Shrinking it, instead of growing it.

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#ScalingMatrices09Oct08
Jul_11_2008

Didn't take long to hit my first mismatch between octave and matlab...

printf "Experiment with Random Matrices and Singular Values\n"

function [x] = r(MU, SIGMA, N)
    [x] = svd(normrnd(MU,SIGMA,N,N));
endfunction;

A=r(1,1,16);
B=r(1,2,16);

Works in octave, but matlab says...
"Function definitions are not permitted at the prompt or in scripts."

According to this guy
http://www.math.tamu.edu/~comech/tools/octave-basics/

matlab functions must apparently be defined in their own file. What a
pain.
Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#MatlabNoFunctions11July07
Jul_11_2008

Here's an online reference for GNU Octave. Hopefully octave is close
enough to matlab for doing my homework assignments, because remote matlab
from school is painfully slow.

http://www.gnu.org/software/octave/doc/interpreter/

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#Octave11July08
Feb_24_2008

A quick look at corporate banding or academic grading on a curve....

As mentioned before, I was wondering how accurate a commonly used
algorithm for ranking people is, so I wrote a little program (computer
model ; -) to rank randomly ordered numbers. If we take a population
(like 240,000 employees) and split it into sub-groups (like departments of
20 people) then categorize the sub-groups, how does that compare to the
categorization over the entire population? Fundamentally, banding (or
grading on a curve) is just an attempt at a bucket-sort algorithm. We do
have bucket-sort algorithms that work. (It's not brain surgery... anyone
who has ever played cards can do it.) Does this one?

From program output pasted at the bottom, it looks like (assuming random
distribution of merit) applying this algorithm in a hypothetical 240,000
person company split into departments of 20, would misplace somewhere
around 2.5% of the employees (12,000) in each of the top-10 and bottom-10
percentage bands.

Correspondingly, 1/4 of the people ranked as top-10 or bottom-10 would not
belong there if we compared their merit with the entire population. This
is assuming every personnel decision along the way is executed perfectly.
It looks as if it gets worse as the size of the category or size of the
subgroup gets smaller.

I did the program logic in my head, while distracted by a 6 year old
wielding a water-gun, so errors are likely, but I think maybe it's worth
digging into further if I ever find time.

C++ source code (as of today) is here:
http://home.ccil.org/~remlaps/src/curve.cpp.txt

Here are 3 runs with {subgroups=20, category=10%} followed by one with
{subgroup=20, category=5}, and one with {subgroup=100, category=10}.

C:\Personal\PROGRA~1\cpp>curve
Population size?: 240000
Subgroup size?: 20
What percent in high num group?: 10
What percent in low num group?: 10
Ranked high but not: 6114 2.5475% of population 25.475% of high rankings.
Ranked low but not: 6380 2.65833% of population 26.5833% of low rankings.

C:\Personal\PROGRA~1\cpp>curve
Population size?: 240000
Subgroup size?: 20
What percent in high num group?: 10
What percent in low num group?: 10
Ranked high but not: 6215 2.58958% of population 25.8958% of high rankings.
Ranked low but not: 6326 2.63583% of population 26.3583% of low rankings.

C:\Personal\PROGRA~1\cpp>curve
Population size?: 240000
Subgroup size?: 20
What percent in high num group?: 10
What percent in low num group?: 10
Ranked high but not: 6171 2.57125% of population 25.7125% of high rankings.
Ranked low but not: 6363 2.65125% of population 26.5125% of low rankings.

C:\Personal\PROGRA~1\cpp>curve
Population size?: 240000
Subgroup size?: 20
What percent in high num group?: 5
What percent in low num group?: 5
Ranked high but not: 4313 1.79708% of population 35.9417% of high rankings.
Ranked low but not: 4411 1.83792% of population 36.7583% of low rankings.

C:\Personal\PROGRA~1\cpp>curve
Population size?: 240000
Subgroup size?: 100
What percent in high num group?: 10
What percent in low num group?: 10
Ranked high but not: 2930 1.22083% of population 12.2083% of high rankings.
Ranked low but not: 3195 1.33125% of population 13.3125% of low rankings

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#24Feb08
Feb_14_2008

My code won't handle numbers bigger than 152**4 as written... If we do
composite character codes making use of 152 extended ASCII characters, we
get to 533,794,816 character possibilities and 90% bit utilization.

- rangeRSE(pow(152,4),pow(2,32),32,32);
val it = (0.875,0.906069915823,0.90625) : real * real * real
- pow(152,4);
val it = 533794816 : int
-

Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#Composite152-14Feb08
Feb_14_2008

- 93 * 93 * 93 * 93;
val it = 74805201 : int
- pow(2,32);
val it = 4294967296 : int
- rangeRSE(74805201,4294967296,32,32);
val it = (0.8125,0.818930315754,0.84375) : real * real * real

It seems that we could get > 80% bit utilization and make 74,805,201
characters available by aggregating 4 ASCII (not extended ASCII)
characters into a composite character set. I.E. four ASCII characters
make a character in the composite character set.

"0000" = 0
"0001" = 1
...
"7342" = alpha
"7344" = beta
...

I wonder how such a scheme would compare to unicode in performance,
portability and ease of implementation?

Right now, my machine doesn't want to answer the question for 5
consectuive ASCII characters. : -(


Link for this entry: http://home.ccil.org/~remlaps/weblog_2008/programming_index.html#CompositeCode14Feb08