Time out #90

Open
smoosies-dev opened this issue Oct 4, 2023 · 4 comments · May be fixed by #96

Comments

@smoosies-dev

I have hundreds of thousands of records to index, and I get a timeout when indexing them with this module. Would it be possible to send them in batches to avoid this?

@npotier
Member

npotier commented Oct 4, 2023

Hello @smoosies-dev, the `typesense:import` command normally works as a batch import.

Did you try the `max-per-page` option (default value: 100) to increase the number of documents indexed per iteration?
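
A quick way to see why the page size matters: each iteration sends one page, so the number of import requests is the row count divided by the page size, rounded up. The sketch below is not the bundle's code; `pageCount()` and `pageWindow()` are hypothetical helpers that show this arithmetic and the matching offset/limit window for a 1-based page number.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: number of requests needed to send $total documents
 * when each request carries at most $perPage of them (ceiling division).
 */
function pageCount(int $total, int $perPage): int
{
    if ($perPage < 1) {
        throw new InvalidArgumentException('perPage must be at least 1');
    }
    return intdiv($total + $perPage - 1, $perPage);
}

/**
 * Hypothetical helper: [offset, limit] window for a 1-based page number,
 * using the usual "(page - 1) * perPage" offset computation.
 */
function pageWindow(int $page, int $perPage): array
{
    return [($page - 1) * $perPage, $perPage];
}

// With 100 documents per page, 5,800,000 rows need 58,000 requests;
// with 5,000 per page, 1,160.
echo pageCount(5_800_000, 100), PHP_EOL;   // 58000
echo pageCount(5_800_000, 5_000), PHP_EOL; // 1160
```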

@smoosies-dev
Author

smoosies-dev commented Oct 4, 2023

I want to index 5,800,000 rows and every attempt times out. I did not pass any specific option, since I did not see one in the bundle documentation. So in importCommand.php I added a simple batch split:

```php
protected function execute(InputInterface $input, OutputInterface $output): int
{
    $io = new SymfonyStyle($input, $output);
    if (!in_array($input->getOption('action'), self::ACTIONS, true)) {
        $io->error('Action option only takes the values : "create", "upsert" or "update"');
        return 1;
    }
    $action = $input->getOption('action');
    // Disable SQL logging so executed queries are not kept in memory.
    $this->em->getConnection()->getConfiguration()->setSQLLogger(null);
    $execStart = microtime(true);
    $populated = 0;
    $io->newLine();
    $collectionDefinitions = $this->collectionManager->getCollectionDefinitions();
    foreach ($collectionDefinitions as $collectionDefinition) {
        $collectionName = $collectionDefinition['typesense_name'];
        $class          = $collectionDefinition['entity'];
        // Without an ORDER BY, the database does not guarantee a stable
        // row order between LIMIT/OFFSET pages.
        $q = $this->em->createQuery('select e from '.$class.' e');
        $page = 1;
        $batchSize = 50000;
        while (true) {
            // Fetch one batch of $batchSize rows starting at the current offset.
            $q->setFirstResult(($page - 1) * $batchSize)->setMaxResults($batchSize);
            $entities = $q->toIterable();
            $nbEntities = 0;
            $data = [];
            foreach ($entities as $entity) {
                $nbEntities++;
                $data[] = $this->transformer->convert($entity);
            }
            if ($nbEntities === 0) {
                break; // no rows left for this collection
            }
            $populated += $nbEntities;
            // Send the whole batch to Typesense in a single import call.
            $result = $this->documentManager->import($collectionName, $data, $action);
            if ($this->printErrors($io, $result)) {
                $this->isError = true;
                $io->error('Error happened during the import of the collection : '.$collectionName.' (you can see them with the option -v)');
                return 2;
            }
            $io->text('---------------------------------Import <info>['.$collectionName.'] '.$class.', page='.$page.'</info>');
            // Detach the hydrated entities so memory does not grow batch after batch.
            $this->em->clear();
            $page++;
        }
        $io->text('Import <info>['.$collectionName.'] '.$class.'</info>');
        $io->newLine();
    }
    $io->newLine();
    if (!$this->isError) {
        $io->success(sprintf(
            '%s element%s populated in %s seconds',
            $populated,
            $populated > 1 ? 's' : '',
            // The second argument of round() is the precision (2 decimal places).
            round(microtime(true) - $execStart, 2)
        ));
    }
    return 0;
}
```
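
The loop above re-runs the query with a growing OFFSET for every batch. An alternative, named here as a swapped-in technique and not necessarily what the bundle does, is to stream a single query result (for example from Doctrine's `toIterable()`) and flush every N rows. The sketch below covers only the chunking part; `chunked()` and the `$import` callback are hypothetical stand-ins for the `convert()`/`import()` calls in the command.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: group any iterable (array, generator, query iterator)
 * into arrays of at most $size items, yielding each group as soon as it is
 * full so that only one batch is held in memory at a time.
 */
function chunked(iterable $rows, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('size must be at least 1');
    }
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // trailing partial batch
    }
}

// Usage with a stand-in for the real import call.
$imported = 0;
$import = function (array $documents) use (&$imported): void {
    // In the command this would be the Typesense import for one batch,
    // followed by clearing the entity manager.
    $imported += count($documents);
};

foreach (chunked(range(1, 7), 3) as $batch) {
    $import($batch);
}
echo $imported, PHP_EOL; // 7
```

With 5,800,000 rows and a batch size of 50,000, either approach ends up making 116 import calls; the difference is whether each batch needs its own OFFSET query.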

@james2001
Contributor

I'm working on an optimisation.


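I stopped hydrating entities and used arrays instead:

```php
use Doctrine\ORM\AbstractQuery;

// Hydrate each row as a plain associative array instead of an entity.
$entities = $q->toIterable(hydrationMode: AbstractQuery::HYDRATE_ARRAY);
```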
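
With `HYDRATE_ARRAY`, each row arrives as an associative array, while in the code earlier in this thread `convert()` receives entities, so the conversion step has to read array keys instead. A minimal sketch, assuming hypothetical fields `id`, `title` and `price`; `rowToDocument()` is an illustrative name, not part of the bundle. The cast of `id` to a string follows Typesense's requirement that document ids be strings.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical converter for one array-hydrated row.
 * The field names are assumptions for illustration only.
 */
function rowToDocument(array $row): array
{
    return [
        'id'    => (string) $row['id'],   // Typesense document ids are strings
        'title' => (string) $row['title'],
        // DECIMAL columns are often hydrated as strings; index them as numbers.
        'price' => (float) $row['price'],
    ];
}

$rows = [
    ['id' => 1, 'title' => 'First', 'price' => '9.90'],
    ['id' => 2, 'title' => 'Second', 'price' => '15.00'],
];

$documents = array_map('rowToDocument', $rows);
print_r($documents[0]);
```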

@smoosies-dev
Author

OK, thanks.


3 participants