Hive metastore client times out while listing the large number of tables. #24164

Open
vburenin opened this issue Nov 18, 2024 · 1 comment

vburenin commented Nov 18, 2024

I am finalizing a migration from Trino 419 to Trino 464 and am running into timeouts when listing a large number of tables, close to 200k in a single schema. My timeouts are set to 300s. Trino 419 returns the result within a couple of seconds.

The problem appears to be a change in how ThriftHiveMetastore retrieves table metadata:

    @Override
    public List<TableMeta> getTables(String databaseName)
    {
        try {
            return retry()
                    .stopOn(NoSuchObjectException.class)
                    .stopOnIllegalExceptions()
                    .run("getTables", () -> {
                        try (ThriftMetastoreClient client = createMetastoreClient()) {
                            return client.getTableMeta(databaseName);
                        }
                    });
        }
        // ... catch clauses omitted from this excerpt

In Trino 419 the method has a different name and return type, and it invokes a different client method (getAllTables instead of getTableMeta):

    @Override
    public List<String> getAllTables(String databaseName)
    {
        try {
            return retry()
                    .stopOn(NoSuchObjectException.class)
                    .stopOnIllegalExceptions()
                    .run("getAllTables", () -> {
                        try (ThriftMetastoreClient client = createMetastoreClient()) {
                            return client.getAllTables(databaseName);
                        }
                    });
        }
        catch (NoSuchObjectException e) {
            return ImmutableList.of();
        }
        catch (TException e) {
            throw new TrinoException(HIVE_METASTORE_ERROR, e);
        }
        catch (Exception e) {
            throw propagate(e);
        }
    }

vburenin commented Nov 18, 2024

After digging deeper, I found the metastoreSupportsTableMeta parameter, which is hardcoded to true:

    protected ThriftMetastoreClient create(TransportSupplier transportSupplier, String hostname)
            throws TTransportException
    {
        return new ThriftHiveMetastoreClient(
                transportSupplier,
                hostname,
                catalogName,
                metastoreSupportsDateStatistics,
                true, // metastoreSupportsTableMeta, hardcoded
                chosenGetTableAlternative,
                chosenAlterTransactionalTableAlternative,
                chosenAlterPartitionsAlternative);
    }

Later on it is used in ThriftHiveMetastoreClient:

    @Override
    public List<TableMeta> getTableMeta(String databaseName)
            throws TException
    {
        // TODO: remove this once Unity adds support for getTableMeta
        if (!metastoreSupportsTableMeta) {
            String catalogDatabaseName = prependCatalogToDbName(catalogName, databaseName);
            Map<String, TableMeta> tables = new HashMap<>();
            client.getTables(catalogDatabaseName, ".*").forEach(name -> tables.put(name, new TableMeta(databaseName, name, RelationType.TABLE.toString())));
            client.getTablesByType(catalogDatabaseName, ".*", VIRTUAL_VIEW.name()).forEach(name -> {
                TableMeta tableMeta = new TableMeta(databaseName, name, VIRTUAL_VIEW.name());
                // This makes all views look like a Trino view, so that they are not filtered out during SHOW VIEWS
                tableMeta.setComments(PRESTO_VIEW_COMMENT);
                tables.put(name, tableMeta);
            });
            return ImmutableList.copyOf(tables.values());
        }

        if (databaseName.indexOf('*') >= 0 || databaseName.indexOf('|') >= 0) {
            // in this case we replace any pipes with a glob and then filter the output
            return client.getTableMeta(prependCatalogToDbName(catalogName, databaseName.replace('|', '*')), "*", ImmutableList.of()).stream()
                    .filter(tableMeta -> tableMeta.getDbName().equals(databaseName))
                    .collect(toImmutableList());
        }
        return client.getTableMeta(prependCatalogToDbName(catalogName, databaseName), "*", ImmutableList.of());
    }

Once I changed that value to false, everything went back to normal.
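
For reference, the fallback branch quoted above only moves table names over the wire and builds the metadata entries locally. Below is a minimal standalone sketch of that merge step, assuming plain string lists in place of the Thrift client calls. The class FallbackTableListing and its merge method are hypothetical names used for illustration, not Trino code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Trino.
class FallbackTableListing
{
    // Returns a map of relation name -> relation type.
    // Every listed name starts out as a TABLE; names also reported as views
    // are then overwritten, mirroring the put-then-overwrite order in the
    // quoted fallback branch. LinkedHashMap is used here only to keep the
    // output order deterministic (the quoted code uses HashMap).
    static Map<String, String> merge(List<String> allNames, List<String> viewNames)
    {
        Map<String, String> typeByName = new LinkedHashMap<>();
        for (String name : allNames) {
            typeByName.put(name, "TABLE");
        }
        for (String name : viewNames) {
            typeByName.put(name, "VIRTUAL_VIEW");
        }
        return typeByName;
    }

    public static void main(String[] args)
    {
        Map<String, String> result = merge(List.of("orders", "orders_v"), List.of("orders_v"));
        result.forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}
```

Because a view name appears in both lists, the second loop decides its final type, so each relation ends up with exactly one entry.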

I think the TODO is irrelevant here, and the flag should be exposed as a configuration option; otherwise, large schemas become unusable.
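
To make the request concrete, here is a rough sketch of what such a toggle could look like, written as plain Java rather than against Trino's actual configuration wiring. The property name example.metastore-supports-table-meta, the class TableListingToggle, and both methods are made up for this example. The default stays true so that the current behavior would not change unless the option is set.

```java
import java.util.List;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical illustration class; not part of Trino.
class TableListingToggle
{
    // Hypothetical property name, used only in this sketch.
    static final String PROPERTY = "example.metastore-supports-table-meta";

    // Reads the flag, defaulting to true (the value hardcoded today).
    static boolean supportsTableMeta(Properties properties)
    {
        return Boolean.parseBoolean(properties.getProperty(PROPERTY, "true"));
    }

    // Chooses the listing strategy from the flag instead of a constant.
    static <T> List<T> listTables(
            boolean supportsTableMeta,
            Supplier<List<T>> tableMetaCall,
            Supplier<List<T>> nameOnlyFallback)
    {
        return supportsTableMeta ? tableMetaCall.get() : nameOnlyFallback.get();
    }

    public static void main(String[] args)
    {
        Properties properties = new Properties();
        properties.setProperty(PROPERTY, "false");
        List<String> tables = listTables(
                supportsTableMeta(properties),
                () -> List.of("listed via table meta call"),
                () -> List.of("listed via name-only fallback"));
        System.out.println(tables);
    }
}
```

Passing the two strategies as suppliers means only the selected metastore call is executed, which matters when one of them is the slow path.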
