Imagine that I have the following XML document (adapted from OWASP, https://owasp.org/www-community/attacks/XPATH_Injection):

<?xml version="1.0" encoding="utf-8"?>
<Employees>
<Employee ID="1">
<FirstName>Arnold</FirstName>
<LastName>Baker</LastName>
<UserName>ABaker</UserName>
<Password>SoSecret</Password>
<Type>Admin</Type>
</Employee>
<Employee ID="2">
<FirstName>Peter</FirstName>
<LastName>Pan</LastName>
<UserName>PPan</UserName>
<Password>NotTelling</Password>
<Type>User</Type>
</Employee>
</Employees>

And my app takes user input and uses it in an XPath query (adapted from https://github.com/zeux/pugixml/blob/master/docs/samples/xpath_variables.cpp as well as the OWASP XPath doc). The program below is a toy; imagine that instead of reading from the CLI, we took user input from a remote source:

// get_user.cpp
#include "pugixml.hpp"
#include <cstdio>
#include <iostream>
#include <string>

int main(int argc, char **argv)
{
    if (argc <= 2) {
        printf("Usage: %s username password\n", argv[0]);
        return 0;
    }
    const char *username = argv[1];
    const char *password = argv[2];

    pugi::xml_document doc;
    if (!doc.load_file("user_data.xml")) return -1;

    // Bind the user input as XPath string variables
    pugi::xpath_variable_set vars;
    vars.add("username", pugi::xpath_type_string);
    vars.add("password", pugi::xpath_type_string);
    vars.set("username", username);
    vars.set("password", password);

    // Variables are referenced unquoted ('$username' in quotes would be a plain
    // string literal), and the XPath operator is the lowercase "and"
    const char *query_string = "//Employee[UserName/text()=$username and Password/text()=$password]";

    // Select nodes via compiled query
    pugi::xpath_query query_employee(query_string, &vars);
    pugi::xpath_node_set employee_result = query_employee.evaluate_node_set(doc);
    std::cout << "Compiled query: ";
    if (!employee_result.empty()) employee_result[0].node().print(std::cout);

    // You can pass the context directly to select_nodes/select_node
    pugi::xpath_node_set employee_result_direct = doc.select_nodes(query_string, &vars);
    std::cout << "Direct select_nodes: ";
    if (!employee_result_direct.empty()) employee_result_direct[0].node().print(std::cout);
}

The program is invoked like so:

> get_user
# Usage: get_user username password
In the above example, an attacker might try to send a username and password that contain an XPath expression, like so:

get_user "blah' or 1=1 or 'a'='a" "password_irrelevant"

The XPath query is written like so:

Naive variable expansion of the XPath query would expand it to:

Ideally, a parameterized XPath query API would escape these input strings; however, I don't see any mention of this in the docs (or in any issue or discussion). I've done some digging in the source, and so far haven't found any explicit escaping, at least within

I've also dug into the actual implementation of the parser and evaluator, but it's non-trivial to follow the parsing logic through to where the variable gets substituted, so for now I'd rather hear what the intended behaviour is. I'd like to get feedback from the maintainers about whether this parameterized XPath query binding is actually intended to be an API that is safe against XPath injection; if it is, that should probably be mentioned in the docs.
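
To make the concern concrete, here is a minimal sketch of what naive textual expansion would look like. The helpers `naive_expand` and `build_naive_query` are hypothetical and are not part of pugixml; the sketch assumes a query template with the placeholders inside quotes, and it only illustrates the behaviour being asked about, not what the library is known to do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of "$<name>" in `text`
// with `value`, with no escaping of any kind.
std::string naive_expand(std::string text, const std::string& name, const std::string& value)
{
    const std::string token = "$" + name;
    std::size_t pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos) {
        text.replace(pos, token.size(), value);
        pos += value.size(); // continue after the inserted text
    }
    return text;
}

// Hypothetical helper: builds the query by pasting user input into a template.
std::string build_naive_query(const std::string& username, const std::string& password)
{
    std::string query = "//Employee[UserName/text()='$username' and Password/text()='$password']";
    query = naive_expand(query, "username", username);
    query = naive_expand(query, "password", password);
    return query;
}
```

Called with the attacker's arguments from above, `build_naive_query` yields `//Employee[UserName/text()='blah' or 1=1 or 'a'='a' and Password/text()='password_irrelevant']`. Because `and` binds tighter than `or` in XPath and `1=1` is always true, that predicate would match every Employee node.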
Replies: 1 comment 1 reply
XPath variables are not escaped, nor do they need to be. They work like variables in other programming languages: there is no textual substitution taking place. Instead, if the value of the variable is needed during evaluation, it is looked up and used just like any other value extracted from the XML, such as an attribute value. This is of course resistant to XPath injection.

I would hope all XPath implementations in existence that support XPath variables work like this; see https://www.w3.org/TR/1999/REC-xpath-19991116/#section-Basics. The link you noted uses dynamically generated query strings in its attack example, not XPath variables.