this is a transcript of this video: Geo Carncross: How to program computers (kos)
Intro
Geo Carncross * Big: 1-6TB lzma'd logfiles per day from all over the world! * Fast: Deadline is 50msec * 100% Uptime is normal for us.
Hello, my name is Geo and I work for a company called Telemetry. We are ad server. Basically web serving for money. We get paid to physically deliver video ad. So if our systems don’t work correctly, we don’t get paid. My aims are motivated by programs that work. We do a lot of programming and most of it’s not in k but today I want you to look at k. Because today I want to talk about code bloat.
Our logfiles are money! (we do other things. like figthing bad guys committing ad fraud, and everyone in my office is enjoying a beer right now... let me know if this is something you want to know more about) --- * I'm part of the reason ads play fine, but your Youtube stutters
How to program computers – code bloat
* Give it a meaningful definition * Don't be distracted by other, informal uses of "code bloat" * Work the problem as it appears
I define code bloat strictly as redundant code inside a source tree. Features that nobody wants, and whether a compiler can identify and eliminate redundancy or other topics. Code bloat costs a lot. For one, it makes the source code big which makes binaries big and that makes them slow. But multiple programmers working on a large code base cannot know exactly what each other is doing and thinking, so there is bound to be some duplication of effort. So code bloat is obviously bad, but it might be inevitable. However it might not. I am going to tell you about kOS.
kos is a small team (4 people) kos is tiny * k is arthur's language; something like scheme+apl * kdb is a database faster than popular dbms by a factor of 1000x * z is a gui and window manager. icons, mouse and font * kos kernel is modesetting. memory, vmm, isr and filesystem
- kOS is a graphical operating system with dynamic window management.
- A programming language called k.
- A programmer’s editor for k.
- A file manager.
- This presentation tool.
- A database design for big memory queries involving trillion row datasets.
The binary image that is what boots is around 90 kilobytes and that does not use any {packers?} to achieve that. While this may not impressed everyone, it does not impressed (because) you didn’t understand what I just said.
kOS is written in k, C and assembly. These lines here:
one system kos is tiny
all devices
* k is 300 lines of C across 5 files
* kdb adds only about 100 lines of C
* z adds another 60 lines of C
* kos kernel is 100 lines of code/assembly.
that I’m referring to fill my screen the way words on a book fill a page. This provides some significant advantages by making programs as readable as a book and it contributes to why kOS does not contain much redundancy.
I do not however have much time to go into k in general/in detail. But I say the interpreter is very simple and it uses basic table based dispatch. No fancy assembly dispatch or jumping or dynamic recompilation. Does not do any subroutine threading or any program optimisation. Doing a simple loop in k versus C, I would expect every C compiler to optimise this triviality.
kos is how to program computers for(i=0;i But k won't. Yet you've probably heard of k or kdb at some point, googling, maybe you're googling it now... There are benchmark hiding away in various places on stack and whatnot: kdb is notoriously fast.msec k 1 sql 6400 sqlA 4200 kdb: select last bid by symbol from quote where date=d,sym in S sql select sym, last(bid order by time) from quote where date=d and sym in (select sym from S) group by sym sqlA select sym,bid from (select sym,bid,row_number() over(partition by sym order by time desc) as row from quote where date=d and sym in(select sym from S) ) AS q where row=1So the question comes up: "Why is k fast?". I have said that I think, we should be asking "why is X (or whatever x is) why is it so slow?". And I don't think this is a restatement just by inverting things to be clever. I think it's- I kind of honesty that we don't generally have as programmers about why is our program slow? Why is a program slow? Why is a large commercial relational database made by a company that begins with O slow? And I wish we had time to discuss it further.
* Unnecessary code is a bug and we try to remove it. * Slow programs are the programmers responsibility. * Bugs are mistakes we programmers make; they are not inevitable or ok. * The programmer is responsible that the program is used correctly. * "O" complexity ignores that computers are made of hardware * Arrays might be O(1) when small but O(log n) when bigger * B-trees are a terrible data structure for storing data * Hashtables aren't remotely close to O(1) * "Best practices" are a straw man that make having real discussions about making systems difficult. * That includes syntax hilighting * ... and object-oriented programming * ... and what is "clean code"? * ... and unit-tests * ... and debian packages * ... and puppet/chef/etc * Programming correctly is hard! * Can you write an algorithm shorter or faster than ... If so I'd like to talk to you: PUTITYYEUOPBut I want to show you some non-trivial k source code that I did not write. I wanna show you how I produce code. And I wanna show you how I minimise the amount of redundant code that I produce.
[types "edir", a view appears, top bar shows "ERROR: /home/geocar/$/edir.k", deletes three chars back, and adjusts it to .../edit, hits enter, the view disappears, types "edit" this time, a view appears, types "$/e.k"]
This is Arthur's editor:
c::a$"\n";b::0,1+c;d::(#c),|/-':b;h:{c[x]&y+b x};i::x,j-b x:b'j;l:{h/0|x&d-1} j::*|k;K:{k::2#0|x&*|c};s::i&s|i-w-2;D:{s::0|x&d-w};px:{D s+w*x};wx:{D s+4*x} J:{K$[S;|\k[0],x;x]};lx:{J j+x};ux:{J l i x,0};hx:{J l i+d*x};mx:{J l s*_x%F} kx:kr:{U,:,(+\k[0],#x;*k_a);e[k]x};kb:{$[=/k;K j-1 0;];kx""} e:{a::?[a;x;y];K(*x)+#y};cz:{$[#u;e/_`u;]};cc:{9'*k_a};cx:{kx cc`};cv:{kx@9'`} f:"";g::a$,f;cf:{cg f::0'"^"};cg:{K(g,1#g)1+g'k} t::(a x;q[x]|/7 4 4*~(k'x;2!(,/g)'x;^(j 9'a)?x:h\:/s+!:'1+w));q::n 9'a A:{r::a::$[#x;1:x;,"\n"];n::x;K@#u::()};co:{A@0'""};cs:{n::$[#n;n;0'""]1:a;r::a} A@*x;w::_W%F;x::(r~a)_"*",n;y::F*i-s;z::(2'(0;W;F*s;F*d);1'0,t)So the syntax here,
c::a$"\n"is pretty simple, [audience laughs]. [opens an interactive console to the left half of the screen] No, seriously.1: 2+2 4Two plus two is four. You can see that down here.
1: -6 -6Minus six negates six. It's very easy. Semicolon separates things. This double colon thing that I want to look at first is interesting 'cause it creates a view.
c::a$"\n"It's something that doesn't exist in a lot of programming languages. I am not actually aware of any programming languages that it actually exists. And it's not like an SQL view. It's actually kinda like what Excel does. It's a memoized function, I guess... that keeps track of its dependencies automatically. So this expression right here only gets re-evaluated whenever whatever dependent variables change.
So,
ais the contents of the buffer we are looking at.1: a "c::a$\"\\n\";b::0,1+c;d::(#c),|/-':b;h:{c[x]&y+b x};i::x,j-b x:b'j; l:{h/0|x&d-1}\nj::*|k;K:{k::2#0|x&*|c};s::i&s|i-w-2;D:{s::0|x&d-w};p x:{D s+w*x};wx:{D s+4*x}\nJ:{K$[S;|\\k[0],x;x]};lx:{J j+x};ux:{J l i x,0};hx:{J l i+d*x};mx:{J l s*_x%F}\nkx:kr:{U,:,(-\\k[0],#x;'k_a};e [k]x};kb:{$[=/k;K j-1 0;];kx\"\"}\ne:{a::?[a;x;y];K(*x)+#y};cz:{$[#u ;e/_`u;]};cc:{9'*k_a};cx:{kx cc`};cv:{kx@9'`}\nf:\"\";g::a$,f;cf:{cg f::0'\"^\"};cg:{K(g,1#g)1+g'k}\nt::(a x;q[x]|/7 4 4*~(k'x;2!(,/g)'x ;^(j 9'a)?x:h\\:/s+!:'1+w));q::n 9'a\nA:{r::a::$[#x;1:x;,\"\\n\"];n: :x;K@#u::()};co:{A@0'\"\"};cs:{n::$[#n;n;0'\"\"]1:a;r::a}\nA@*x;w::_ W%F;x::(r~a)_\"*\",n;y::F*i-s;z::(2'(0;W;F*s;F*d);1'0,t)\n"And
care these offsets.1: c 77 155 233 294 373 422 493 574 638 639You can see here:
c::a$"\n"
cis a view whereais a newline. That's all that is. [closes the interactive console, which makes the "$/e.k"'s window full screen]
bis a view that zero joined one plusc.b::0,1+cSo
cis the offset of all the newlines, one plus that is going to be character after the newline put a zero in front of it: it's the offset of the beginning of every line.Even when you know the basics of the syntax, having Arthur's reference manual handy helps.
[types "
view" to the input bar, which opens a new window to the right half, then types "k.txt" in the input bar]Verb Adverb Noun(null list) : gets self ' each|bn ': eachprior Bool 0b 011b + plus flip / over|sv /: eachright Char " " "ab" - minus negate \ scan|vs \: eachleft Int 0N 2 23 * times first Flt 0n 2 .3 % div sqrt Sym ` ``ab ! mod|dct til|key & min|and where System Verbs date 2032.03.01 | max|or reverse 0: write read(line) time 0T12:34:56 more desc 2: sock open/close list (2;3.4;`c) = equal group 3: http kson dict [a:2;b::a] ~ match not func {(+/x)%#x} , join enlist view f::32+1.8*c ^ except null # take|rsh count \l a.k load _ drop|cut floor \t:n x time $ cast|pad string $[c;t;f] \v variables ? find|rnd unique ?[x;y;z[;r]] \f functions @ at type @[x;y;z[;r]] \w workspace . eval dot .[x;y;z[;r]] \d directory function parameters are x,y,z. unary list needs ",", e.g. 2(atom) ,2(list) unary verb needs ":", e.g. #'(take each) #:'(count each) parse: E:E;e|e e:nve|te| t:n|v v:tA|V n:(E)|{E}|[E]|t[E]|N overload: L!L !D .D i? i$ F$(+/*) S$(ss)
d, up here, is a viewd::(#c),|/-':bof the length of
c, that is the number of lines, joined with the max of the difference of each element ofb. So the lengths of each line. So this is the max length of each line joined with the number of lines. This isd, is the rectangular size of whatever value you're looking at.
h, is a lambda function.h:{c[x]&y+b x}The first argument is
x, second isyif there's one. Functions with more than three arguments are confusing so k doesn't really have them [audience chuckles]. Function calling and array indexing use the same syntax. This is a bit unusual but has interesting and very powerful ramifications. Soxapplied tobwhich you can see here (b x) is the same as looking up the xth element ofb. So rememberbis the offset of each line, and we addyto it. We look up the xth element ofc. Here's our familiar bracket syntax. And take the min of both.So given line
xand columny;hcalculates the offset insideabut capped at the end of each line.As we can see k is a very dense language and there is a lot here. When I first program, I start with something that I want to do, the first thing that I did here in k -my first day programming, learning k- was to deal with the fact that my Mac does not have any home or end keys on it. I use Emacs so I'll actually add
control-aandcontrol-e. Now to do this, let's look at the documentation for the windowing interface.[opens another window and types "
z2.txt" in the input bar]EVENT (override with zf) lx i / leftright(i is -1 or 1) hx i / homeend ux i / updown px i / pageupdown wx i / wheelupdown mx v / mouseclick mm v / mousemove mk v / mouseselect kx k / key kb / back kr / return kt / tab f[1-9] func key c[a-z] ctrl key(cc cx cv cf cg co cs cz ..) OTHER 2'(v;w;s;d) / scrollbars C:0'C / prompt(ESC to escape) S: / shift key t:9'` / cv(get clip) 9't / cc(set clip) j 9'a / parens, e.g. (4)0'"{()}\"\"//asdf" is 0 3 n 9'a / syntax, e.g. ".c"0'"{()}\"\"//asdf" is 000011222222I can see here that
hxhandles two dimensional home and end. Andckwould becontrol-k. So I think it should be something like this, [types]:ca:{hx 0 -1};ce:{hx 0 1}We can try that out actually. [opens another view, types "whatever" and verifies pressing
caorcemove the cursor].OK. I'd like to add "delete" as well and I don't think Arthur's implemented this. But where do I start? I know
hxmoves the cursor, so I start by reading the definition ofhx. I seedtimesxplusi:i+d*xremember the
dis the rectangular dimensions of the file. Thisxwill be vector. This will produce the number of characters to move. But what isiandlandJ?hx:{J l i+d*x}[hits control-f in the editor and types "
i:" to search the definition ofi]i::x,j-b x:b'jOK, so this is using the bin (
') operator here. So this is finding the first index inbthatjis equal-to or greater-than. So we save that line number asx. Becausejis some byte offset. So you'll note, this (i::...) is a view andxis being reused, this is not "x, the argument" like it was withh, this is just a scratch variable. So we assign toxthat line number thatjexists on. We apply that to b (b x:..) and then we subtract it fromj. Rememberjis the byte offset. So this is going to give us the number of characters in the line that it is. And here's thexagain (x,...) which we know is the line number. Soiis going to be a view of the y and x coordinates of whateverjhappens to be.[moves to
l's definition]l:{h/0|x&d-1}OK, so this takes the rectangular dimensions of
d. Subtracts one from it. Takes the min of that inxwhich is our argument. So you see this contraption often. This is just capping it to whatever the dimensions are. So you havex-which is, you know, some 2d coordinates- we're just capping it to whateverdis. And then we apply that overh. Which we'd go over another time if I would had more time. Let's look atJthough because that was interesting.[moves to
J's definition]J:{K$[S;|\k[0],x;x]}So
JappliesK. This is what a conditional looks like in the k language, this dollar sign here ($[..;..;..]).S, we can note, down here, is the shift key. So if shift is pressed do this thing (|\k[0],x) otherwise do this thing (x). So when you are unshifted it's going to applyxtoK. So let's just go look at whatKis. I can see it right here.[moves to
K's definition]K:{k::2#0|x&*|c}OK, so this reverses
c. You can follow along here (reference manual: "| max|or reverse..."). Takes the first of that, so the last line ofc. And then the min of that andx, whatever that is. And we do the max of that - of zero on that. So this is our capping thing again. And we take two from that. This double colon here is going to now not mean "view", it's going to mean "define". So this is going to set the global variablek(k::2#...). So this uppercaseKsets the lowercasekto: take two values of the value capped in there capped by zero and the number of lines,c. So this is settingkto two points, two byte offsets. Now, remember last element ofcis gonna contain the byte offset, the last line of the file. So this is gonna cap us some place inside the file. Andkis taking two, so it's taking two dimensional. So let's go back and look at shift for a moment, we can see that if we're shifted, we keep the original value ofkand that's why you can see over here, this is actually how shift-select works. [shows it in the text "whatever"].So with that in mind, I think we have enough information to actually write "delete", [types]:
cd:{K j+!2;kx""}OK, So this is
jis our byte offset inside the file. But it could be two dimensional. We're going to add- This bang two (!2) means, that it means zero and one, zero and whatever up to whatever value you put there, so "bang five" (!5) would be zero one two three four. Sojplus that is gonna do the same rules. It's going to take whatever the current cursor position is and the character after it and assign it tokwhich does the selection. And thenkxdouble quote (kx"") -you can see whatkxdoes- it sets the key, that's how we insert text to the editor. So if I save that, [hits control-s, opens another view, types "foo again" and deletes the characters one by one using control-d], and delete works. Phew! I hope you enjoyed that. [audience chuckles]what else?
So, is code bloat inevitable? Code written like this definitely violates what a generally considered "best practices" everywhere.
hello world
main(){puts("Hello world");} k:"Hello World" real 0m0.006s real 0m0.016s user 0m0.001s real 0m0.012s sys 0m0.002s real 0m0.006sOne can write a tiny hello world that runs fast and without bugs. And it's generally accepted that this doesn't extend to larger programs. I hope that I've demonstrated that this is false.
kOS is not ready for users. I don't know when it will be and I don't really want to talk about that.
* Unnecessary code is a bug and we try to remove it. * Slow programs are the programmers responsibility. * Bugs are mistakes we programmers make; they are not inevitable or ok. * The programmer is responsible that the program is used correctly. * "O" complexity ignores that computers are made of hardware * Arrays might be O(1) when small but O(log n) when bigger * B-trees are a terrible data structure for storing data * Hashtables aren't remotely close to O(1) * "Best practices" are a straw man that make having real discussions about making systems difficult. * That includes syntax hilighting * ... and object-oriented programming * ... and what is "clean code"? * ... and unit-tests * ... and debian packages * ... and puppet/chef/etc * Programming correctly is hard!But I do want to talk about programming and specifically how to program and I hope that by giving you this view of kOS we can have a conversation about how to program. That's a little different. There turns out to be a lot of usefulness that follows from this idea that I'm pointing to here... whatever this is [shows the screen running kOS]. And I am certain that because it's so useful that this may be evidence that we as a human race don't actually understand programming very well. It's because of that the term best practices so suspicious to me, it's actually banned where I work, the term. I mean you have to justify even if everybody else is doing it.
Just some final notes, I get asked about something called kona that maybe is easier to find on the internet. It is an open source re-implementation of an old version of k. I don't know very much about it. I know that it's bigger and slower than k. I know that it does not contain kdb. And so probably does not represent k very well if you get it looking to learn k, this is not going to help. If you want to try k, you can just download the 32bit version of kdb from kx's website. And just push backslash (
\) to get from q to k. And I've covered about as much as I can in the time that has been available to me. So if anybody wants to talk about programming, get in touch. Last four lines by the way,n:0;p:0:"p.md";co:{p::0:0'""} c::(&"#"=*:'p),#p;a::*(1 0+c[n,n+1])_p;x::($n),p c n z::{1'((x**F),0;a[x])}'!#a; lx:{2'(0;W;7);n::|/0,&/(-2+#c),n+x}are the source code to the presentation tool that you are looking at there. [audience laughs]. Thank you. [applause]