YOLOv7 Encapsulation
The implementation of ObjectDetector exemplifies an encapsulated design. As noted in SNPEPipeline, the inference SDK code is largely the same for most models; the only difference lies in the handling of input and output data, known as pre- and post-processing. ObjectDetector.cpp carries out that task.
bool
Detector::init(const std::string &model_path) {
    // Create the SNPE pipeline that wraps the inference SDK.
    m_snpe_task = std::unique_ptr<snpe::SNPEPipeline>(new snpe::SNPEPipeline());
    if (!m_snpe_task->init(model_path)) {
        std::cerr << "[error] failed to initialize snpe instance" << std::endl;
        return false;
    }
    // Report success only if the pipeline itself reports it is ready.
    m_isInit = m_snpe_task->isInit();
    return m_isInit;
}
Basic initialization consists of:
- Initializing an instance of SNPEPipeline
- Allocating memory for post-processing (not shown in the excerpt above).
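The two initialization steps can be sketched in isolation as follows. This is a minimal sketch, not the code from ObjectDetector.cpp: `StubPipeline`, `DetectorSketch`, `max_candidates`, and `candidateCapacity()` are hypothetical names introduced here so that the example compiles without the SNPE SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for snpe::SNPEPipeline so the sketch compiles alone.
class StubPipeline {
public:
    bool init(const std::string &model_path) {
        m_ready = !model_path.empty();  // pretend an empty path fails to load
        return m_ready;
    }
    bool isInit() const { return m_ready; }

private:
    bool m_ready = false;
};

class DetectorSketch {
public:
    // max_candidates is an assumed upper bound on raw values kept per frame.
    bool init(const std::string &model_path, std::size_t max_candidates) {
        assert(max_candidates > 0);
        m_pipeline.reset(new StubPipeline());
        if (!m_pipeline->init(model_path)) {
            m_pipeline.reset();  // leave no half-initialized state behind
            return false;
        }
        // Reserve post-processing storage once, not on every frame.
        m_candidates.reserve(max_candidates);
        return m_pipeline->isInit();
    }
    bool isInit() const { return m_pipeline && m_pipeline->isInit(); }
    std::size_t candidateCapacity() const { return m_candidates.capacity(); }

private:
    std::unique_ptr<StubPipeline> m_pipeline;
    std::vector<float> m_candidates;
};
```

Reserving the buffer during initialization keeps per-frame post-processing free of repeated allocations, which is one plausible reason to allocate at this stage.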
Once the image has been prepared, it can be passed to the SNPE instance for forward inference. Preparation usually goes beyond scaling: to reduce the effect of variations in the input, the image is typically also normalized before inference.
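bool
Detector::preprocess(cv::Mat &frame) {
    // Resize in place to the 416x416 network input. The interpolation flag is
    // the sixth argument of cv::resize; fx and fy are unused when dsize is set.
    cv::resize(frame, frame, cv::Size(416, 416), 0, 0, cv::INTER_LINEAR);
    std::vector<float> input_vec;
    input_vec.reserve(416 * 416 * 3);
    // Assumes an 8-bit, 3-channel frame (CV_8UC3).
    for (int y = 0; y < 416; y++) {
        for (int x = 0; x < 416; x++) {
            // OpenCV stores pixels as BGR; emit normalized RGB in [0, 1].
            cv::Vec3b value = frame.at<cv::Vec3b>(y, x);
            float r = static_cast<float>(value[2]) / 255.0f;
            float g = static_cast<float>(value[1]) / 255.0f;
            float b = static_cast<float>(value[0]) / 255.0f;
            input_vec.push_back(r);
            input_vec.push_back(g);
            input_vec.push_back(b);
        }
    }
    // Hand the normalized pixels to the SNPE input tensor.
    m_snpe_task->loadInputTensor(input_vec);
    return true;
}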
The core of pre-processing is the loadInputTensor() interface, which is used to obtain the starting address of the input layer. The incoming image is then copied to this address.
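The channel reordering and normalization can also be tried without OpenCV or SNPE. The following is a minimal sketch under that assumption; `bgrToNormalizedRgb` is a hypothetical helper that does not exist in ObjectDetector.cpp, and it takes a raw interleaved 8-bit BGR buffer in place of a cv::Mat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts an interleaved 8-bit BGR buffer into a
// normalized, interleaved RGB float vector with values in [0, 1].
std::vector<float> bgrToNormalizedRgb(const std::vector<std::uint8_t> &bgr,
                                      std::size_t width, std::size_t height) {
    // Precondition: exactly three channels per pixel.
    assert(bgr.size() == width * height * 3);

    std::vector<float> out;
    out.reserve(bgr.size());
    for (std::size_t i = 0; i < width * height; ++i) {
        const std::uint8_t *px = &bgr[i * 3];
        out.push_back(px[2] / 255.0f);  // R
        out.push_back(px[1] / 255.0f);  // G
        out.push_back(px[0] / 255.0f);  // B
    }
    return out;
}
```

For a 416x416 frame the resulting vector holds 416 * 416 * 3 floats, the same number of values that the loop in preprocess() pushes into input_vec.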
The output of forward propagation varies between networks. For most applications, though, the primary information consists of coordinates (such as bounding boxes or landmarks) and confidence scores, which together constitute the raw result of inference. This data is not visually intuitive for ordinary users, so the outputs need a certain level of interpretation, a step commonly referred to as post-processing.
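As one illustration of such interpretation, the sketch below turns raw output rows into labeled boxes. The row layout `[cx, cy, w, h, objectness, class scores...]` is an assumption based on common YOLO exports and is not confirmed by this page; `Detection` and `decodeDetections` are hypothetical names, and the actual logic in ObjectDetector.cpp may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Detection {
    float x1, y1, x2, y2;  // box corners in input-image pixels
    float score;           // objectness * best class score
    int class_id;          // index of the best class
};

// Assumed layout (not confirmed by the source): `output` holds rows of
// [cx, cy, w, h, objectness, class_0, ..., class_{num_classes-1}].
std::vector<Detection> decodeDetections(const std::vector<float> &output,
                                        std::size_t num_classes,
                                        float score_threshold) {
    const std::size_t stride = 5 + num_classes;
    assert(num_classes > 0 && output.size() % stride == 0);

    std::vector<Detection> detections;
    for (std::size_t offset = 0; offset < output.size(); offset += stride) {
        const float *row = &output[offset];

        // Find the class with the highest score for this candidate.
        std::size_t best = 0;
        for (std::size_t c = 1; c < num_classes; ++c) {
            if (row[5 + c] > row[5 + best]) best = c;
        }

        const float score = row[4] * row[5 + best];
        if (score < score_threshold) continue;  // drop weak candidates

        // Convert center/size to corner coordinates.
        detections.push_back({row[0] - row[2] / 2.0f, row[1] - row[3] / 2.0f,
                              row[0] + row[2] / 2.0f, row[1] + row[3] / 2.0f,
                              score, static_cast<int>(best)});
    }
    return detections;
}
```

A complete detector would normally follow this step with non-maximum suppression to merge overlapping boxes; that step is left out here to keep the sketch short.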
In Python, libraries such as NumPy make operations on high-dimensional matrix data very straightforward: an operation can be applied to an entire array along a specific dimension. C++ has no such library built in, so matrix operations often require nested for loops. Post-processing therefore typically involves translating Python code into C++ code, and for the sake of convenience some data copying may be involved.
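To make the contrast concrete, a reduction along one dimension that NumPy expresses as `a.max(axis=0)` becomes an explicit pair of loops in C++. The sketch below assumes a row-major matrix stored in a flat vector; `columnMax` is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-wise maximum of a row-major rows x cols matrix, comparable to
// NumPy's a.max(axis=0). Hypothetical helper for illustration only.
std::vector<float> columnMax(const std::vector<float> &data,
                             std::size_t rows, std::size_t cols) {
    assert(rows > 0 && data.size() == rows * cols);

    // Start from the first row, then fold in the remaining rows.
    std::vector<float> result(data.begin(), data.begin() + cols);
    for (std::size_t r = 1; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const float v = data[r * cols + c];
            if (v > result[c]) result[c] = v;
        }
    }
    return result;
}
```

The copy of the first row into `result` is an example of the convenience copying mentioned above: it costs one extra pass over `cols` values but keeps the loop body simple.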
See ObjectDetector.cpp for more information.