
Can't load speedspeech onnx file #1263

Open
xd009642 opened this issue Nov 17, 2023 · 6 comments

@xd009642

Uploaded a zip of the model

speedyspeech.zip

Error: Failed analyse for node #1250 "/Pow" Pow

Caused by:
0: Infering facts
1: Applying rule outputs[0].shape == ?
2: Unifying shapes and ?
3: Impossible to unify closed shapes of different rank (found and ?).
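For context, that analyser error comes from shape inference trying to unify two shape facts, which fails outright when the two shapes have different known ("closed") ranks. A toy stdlib-only version of that check (a simplified illustration, not tract's actual analyser code):

```rust
// Toy shape unification: two fully-known ("closed") shapes unify only if
// they have the same rank and agree dimension by dimension.
// Simplified illustration; tract's real analyser also handles unknown dims.
fn unify(a: &[usize], b: &[usize]) -> Result<Vec<usize>, String> {
    if a.len() != b.len() {
        return Err(format!(
            "Impossible to unify closed shapes of different rank (found {:?} and {:?}).",
            a, b
        ));
    }
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            if x == y {
                Ok(x)
            } else {
                Err(format!("dimension mismatch: {x} vs {y}"))
            }
        })
        .collect()
}

fn main() {
    assert_eq!(unify(&[1, 80], &[1, 80]), Ok(vec![1, 80]));
    assert!(unify(&[1, 80], &[80]).is_err());
    println!("ok");
}
```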

The code in question (as an aside, I'm wondering how to tell what the generics should be):

use super::*;
use tract_onnx::prelude::*;
use ndarray::Array2;
use std::path::Path;


pub struct SpeedyTract {
    // model: RunnableModel<F, O, M>
}

impl SpeedyTract {
    #[must_use]
    pub fn load(path: impl AsRef<Path>) -> anyhow::Result<Self> {
        let model = tract_onnx::onnx()
            .model_for_path(path)?
            .into_optimized()?
            .into_runnable()?;

        todo!()
        // Ok(Self { model })
    }
}
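On the aside about the generics: one stdlib trick is to ask the compiler. Either annotate the binding with a deliberately wrong type (`let model: () = ...;`) and read the concrete type out of the E0308 error, or print it at runtime with `std::any::type_name`. A minimal sketch of the latter (`show_type_of` is a made-up helper, and the iterator below just stands in for the tract model):

```rust
use std::any::type_name;

// Print the concrete type of any expression; handy when a builder chain
// returns an unwieldy generic type you need to name in a struct field.
fn show_type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let v = vec![(1_i64, "a")];
    // Stand-in for a builder chain with an awkward-to-name result type.
    let it = v.iter().map(|(n, _)| n * 2);
    // For the tract model you would call show_type_of(&model) instead.
    println!("{}", show_type_of(&it));
}
```

With the model in hand, `show_type_of(&model)` prints the full `SimplePlan<...>` type, which can then be copied into the struct field.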
@xd009642
Author

Okay, on further inspection this disappears if I remove the into_optimized() call, so I'm removing that to try and continue playing with it.

@xd009642
Author

Latest code that fails with:

Error: Evaluating #1195 "/SequenceEmpty" Unimplemented(SequenceEmpty)

Caused by:
stateless evaluation not implemented

Still having a play around. I will say the level of ONNX support is impressive; all the other Rust solutions I've tried have failed much, much earlier and with less (or no) actionable logs!

use super::*;
use anyhow::Context;
use tract_onnx::prelude::*;
use tract_onnx::tract_hir::infer::InferenceOp;
use ndarray::Array2;
use std::path::Path;


pub struct SpeedyTract {
    model: SimplePlan<InferenceFact, Box<dyn InferenceOp>, Graph<InferenceFact, Box<dyn InferenceOp>>>,
    phoneme_ids: Vec<Unit>,
}


impl SpeedyTract {
    #[must_use]
    pub fn load(path: impl AsRef<Path>) -> anyhow::Result<Self> {
        let model = tract_onnx::onnx()
            .model_for_path(path)
            .context("loading ONNX file")?
        // https://github.com/sonos/tract/issues/1263
        //    .into_optimized()
        //    .context("optimising graph")?
            .into_runnable()
            .context("converting to runnable model")?;

        Ok(Self {
            model,
            phoneme_ids: generate_id_list(),
        })
    }
    
    pub fn infer(&self, units: &[Unit]) -> anyhow::Result<Array2<f32>> {
        let phonemes = units
            .iter()
            .map(|x| best_match_for_unit(x, &self.phoneme_ids))
            .collect::<Vec<_>>(); // This is a Vec<i64>

        let tensor = Tensor::from_shape(&[1, units.len()], &phonemes)?;
        let plen = Tensor::from(units.len() as i64);

        let result = self.model.run(tvec!(tensor.into(), plen.into()))?;

        tracing::info!("Result: {:?}", result);

        todo!()
    }
}

@kali
Collaborator

kali commented Nov 17, 2023

Thanks for the kind words, but Sequences (and Maps) are not supported, and are definitely not on the roadmap. Is there any chance your model could be refactored without sequences?

@xd009642
Author

I don't think so tbh, it's a TTS model and as such works on variable input lengths. I wouldn't mind looking into implementing sequences if a PR would be accepted, but naturally I'm new to the code and internals, so that might not be feasible without at least being pointed in the general direction.

@kali
Collaborator

kali commented Nov 18, 2023

Sequences in tract would be a massively epic overhaul. tract "variables" are Tensors of known fixed rank and "symbolic dimensions". Changing this is huge and would probably have long-term impact on code complexity, maintainability and performance. So don't start hacking on tensor sequences, you would most likely drown in it, or I would probably have to reject the PR. Let's look at other options first.

You may be aware of it, but tract's main application is actually voice processing, and we manage to do everything we need without tensor sequences, including dealing with variable lengths and/or "infinite" inputs. Recurrent networks are the traditional way, but tract's state management, network pulsification and symbolic dimension management gave us the flexibility we needed.

The only kind of generalization I think tensor sequence could bring to the table would be to represent a time-based sequence of tensors having a varying dimension on a non-time axis. This is super exotic. I have never been shown such a design yet.

OK, so what can we do? I had a quick look at the network; most of it looks fine, but then there is a Loop that takes an empty sequence as input, pushes tensors into it, and then the sequence is made into a tensor again. Well, bad news, the Loop is not supported either. tract only has support for Scan (supporting Loop is actually an ongoing, relatively low-priority background task).

Scan does a bit of what the Loop plus Sequence seems to do here: it builds an output tensor by concatenating chunks of data together. The main restriction of Scan compared to Loop is that Scan performs a fixed number of iterations, determined by the time dimension of its input (which can be symbolic; it only has to be determined when the Scan starts). The Loop, on the other hand, can stop the iteration based on a runtime condition computed as part of the loop body itself. That is not something that can be done with Scan.
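The Scan/Loop distinction above can be sketched in plain Rust (no tract involved; `scan` and `loop_` here are toy stand-ins for the ONNX operator semantics, not tract API):

```rust
// `scan` runs its body a fixed number of times, determined by the input's
// length (the "time dimension") before iteration starts. `loop_` may stop
// on a condition computed inside the body itself, which Scan cannot express.

fn scan(input: &[i64], body: impl Fn(i64) -> i64) -> Vec<i64> {
    // Iteration count == input.len(), known up front (possibly symbolic).
    input.iter().map(|&x| body(x)).collect()
}

fn loop_(mut state: i64, body: impl Fn(i64) -> (i64, bool)) -> Vec<i64> {
    // Iteration count decided at runtime by the body's own condition.
    let mut out = Vec::new();
    loop {
        let (next, keep_going) = body(state);
        out.push(next);
        state = next;
        if !keep_going {
            break;
        }
    }
    out
}

fn main() {
    // Scan: exactly three iterations, one per input element.
    assert_eq!(scan(&[1, 2, 3], |x| x * 10), vec![10, 20, 30]);
    // Loop: double until the value exceeds 20; trip count is data-dependent.
    assert_eq!(loop_(1, |s| (s * 2, s * 2 < 20)), vec![2, 4, 8, 16, 32]);
    println!("ok");
}
```

The data-dependent trip count in `loop_` is exactly what makes a Loop's output length unknowable before execution, and hence hard to fit into tract's fixed-rank tensor model.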

How familiar are you with this model design ? Am I making sense here ?

@xd009642
Author

Yeah, that all makes sense, thanks. I'm more aware of the model design from the details in the paper, and not sure how well that maps to the actual implementation. From the phonemes going in, it generates a duration in frames for each phoneme, and then for each phoneme + duration it generates the necessary spectrogram frames.
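To make that two-stage design concrete, a hypothetical stdlib-only sketch of the duration expansion step (`expand_by_duration` is made up for illustration; the real model does this inside the graph, which is presumably why it needs a Loop: the output length depends on the predicted durations):

```rust
// Stage 1 of the model predicts a frame count (duration) per phoneme;
// stage 2 then produces that many spectrogram frames per phoneme.
// Here we just expand phoneme ids by their durations: the total output
// length is only known once the durations have been predicted at runtime.
fn expand_by_duration(phonemes: &[i64], durations: &[usize]) -> Vec<i64> {
    phonemes
        .iter()
        .zip(durations)
        .flat_map(|(&p, &d)| std::iter::repeat(p).take(d))
        .collect()
}

fn main() {
    // Phoneme 7 lasts 2 frames, phoneme 3 lasts 1, phoneme 9 lasts 3.
    assert_eq!(
        expand_by_duration(&[7, 3, 9], &[2, 1, 3]),
        vec![7, 7, 3, 9, 9, 9]
    );
    println!("ok");
}
```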

I was going to look at using torch tracing to generate a model with a longer input context than I need, but I'm a bit concerned that the loop is for the phoneme durations and is therefore dynamic, so it might not work as I hope if I pick my dummy input for tracing wrong.
