The billion-dollars problem of token injection into insecure code can be solved with category theory

Why category theory?

Ever since I stumbled upon mentions of Voevodsky‘s “Univalent Foundations of Mathematics“, I realized that the man was digging very close in the area of the Holy Grail. The only question that I wanted to ask him is: “Are you also looking for the Holy Grail, because why else would you be here?”

If you just walk in the direction of where things get more beautiful and where a pleasant scent smells stronger, you will eventually run into Voevodsky digging there, and when you ask him, “Have you found it?”, he will say: “Not yet.”

We do not know what Holy Grail looks like, because it is virtual object and therefore if you know what it looks like, you have effectively found it. You will know, however, perfectly well that you are now holding it in your hand, because your senses will fail-safe confirm it.

There is some kind of tree over there, and if you shake it, there will be billions or even trillions of dollars dropping out of it. I conjecture that every possible security issue in computer source code amounts to a failure to respect structure-preserving category invariants. If you correctly describe the category, you will automatically have solved an entire class of security issues.

Notation typically used in category theory

Obviously, it is possible to formally describe category theory in obnoxious and inherently inferior Russell-Whitehead notation, but that would not help anything. You see, Russell-Whitehead notation is not executable. That means that it is not even unambiguous. Then indeed, what is the point? In other words, it is just a pagan depravity. It took Russell and Whitehead ten years to create the useless crap with which they still terrorize the entire planet. In the foot notes of this article, you can find lots of inane examples written in that pagan vernacular. The heathens love it. Their pagan rituals are full of it. They honour their false gods with it. It has always been the primary facilitator for Soviet pseudoscience. It is yet another false belief holding back the progress of mankind.

Example of token injection

I keep using examples from SQL, even though I have had many angry remarks from readers who said that I should pick another language or format, because they consider the SQL injection problem to be “solved” already (“just use library x or y”). The reason for this is, that I am not going to change my examples just to hear that other people consider the problem also “solved” in that other language (“just sanitize your templates with chewing gum”).

If the template is:

template = “select * from T where f1 = ‘{value1}’ and f2 = {value2}”

and we supply a map:

map = { “value1″:”anything'”, “value2”: “or 1=1 or f1=’whatever value2=5” }

then, the expansion looks like this:

expand(template,map) = “select * from T where f1=’anything’ or 1=1 or f1=’whatever’ and f2=5”

In a previous blog post, I already explained, also for this particular example, why it is trivially easy to detect the token injection. I will now clarify the invariants governing the solution category.

Terminology

  • language, object, grammar: SQL
  • statement, expression, template: select * from t where x={y}
  • template variable: {y}
  • morphismsrewrite(sql,lisp_style_sql,”select * from t where x={y}”) == “(select * t (=x {y}))”

category:  { “objects”:[“sql”,”lisp_style_sql”, “euler_style_sql”, “sql_with_arbitrary_keywords1″,”sql_with_arbitrary_keywords2″],”morphisms”:[rewrite__lisp_style_sql__euler_style_sql, rewrite__sql_with_arbitrary_keywords2__euler_style_sql]}

The json table above represents a category with its objects and morphisms.

Detection of token injection

All literals in an expression may be replaced by template variables. The expression then becomes a template. Expansion takes place when the expand() function replaces in the template, a map between the template variables and actual literals:

 expression=expand(template,map)

The validation function isvalid() is a predicate function that returns true if the expression is valid in the target language, and false if not:

result=isvalid(SQL, expression)

If we consider two homeomorphic languages, HOMEO1 and HOMEO2 that can be rewritten to each other using the rewrite() function:

expression2=rewrite(HOMEO1,HOMEO2,expression1)
expression1=rewrite(HOMEO2,HOMEO1,expression2)

then, the following invariant holds true for any expression in HOMEO1:

expression=rewrite(HOMEO2,HOMEO1,rewrite(HOMEO1,HOMEO2,expression));

This is the “round trip” requirement in homeomorphisms.

Token injection can always be detected, because the following invariants must hold true for any (template,map) tuple:

[1] isvalid(HOMEO2,expand(rewrite(HOMEO1,HOMEO2,template),map)) == true

[2] expand(template,map) == rewrite(HOMEO1, HOMEO2, expand(rewrite(HOMEO2, HOMEO1,template),map))

Token injection through the map will violate either or both invariants and therefore fail to preserve the structure of an arbitrary statement across objects (=languages). Therefore, it will be impossible to hide token injection from the category invariants.

The Achilles heel

Any programming language can be subdivided in semantic subsets of statements that are extensional. A simple example of such subset:

select * from t where a=5
select * from t where a=5 and 1=1
select * from t where (a=5 and 1=1) or (2<>3)

It is generally impossible to construct a predicate function extensional():

extensional(expression1,expression2)

that will return true if both statements are part of the same semantic subset and false if not. If it were possible to do that, it would solve the halting problem, for which the impossibility to solve it has been proven.

It is not the languages themselves that are homeomorphic, but the sets of semantic subsets. This is probably the stumbling block that is holding back Univalent Foundations too. But then again, we may not really be obliged to implement the function extensional(). It may be enough to declare it, and if it does not show up in any final conclusions (to implement), it should be a non-issue.

However, it is clear that there will always be conclusions that contain a mention to this function and that can therefore not be implemented. It is not possible to see what the damage is, without actually trying to reach such conclusion. It certainly turns these things into a mine field.

The impossibility to implement the extensional() function does not affect at all our ability to implement solutions for the token injection problem. Fully automated solutions are still perfectly attainable. In this case, we are lucky to be entirely exempt from the burden to attempt to solve the unsolvable.

An orthogonal language cannot exist

Based on the halting problem again, it becomes clear that it is not possible to create a programming language in which there is exactly one way to say what you want to say. There will always be other expressions that say exactly the same as you have just said, but in a different way. Furthermore, it is generally not possible to know if that second expression is really exactly the same (“extensional”) as the first one.

There is also no way to define a canonical element that could represent an entire semantic subset. Solving this problem, would also solve the impossible-to-solve halting problem.

“It looks different, but in reality it is the same” is a tricky form of ambiguity. It may lead you to the wrong conclusion that two things are different, while in reality they are not, while there is also no way for you ascertain that they are truly the same. Since the mapping between a language and its conceptually orthogonal variant is surjective, this relationship is not a homeomorphism.

A surjection represents a strategic decision, because you cannot go back. Entering the surjection commits you to dropping the information that you would need to revoke your decision. Backtracking is not possible. The true nature of surjections is that they force you to make up your mind. That is actually a signal to drop the formalisms and start thinking intuitively. You see, it is a possibility that it may simply not be possible to solve the real problem in the Univalent Foundations.

Advertisements

Published by

eriksank

I mostly work on an alternative bitcoin marketplace -and exchange applications. I am sometimes available for new commercial projects but rather unlikely right now.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s