[RSDK-9195] ota async task #350
micro-rdk/Cargo.toml
Outdated
bincode = "2.0.0-rc.3"
I can make this an optional dependency for the `ota` feature moving forward. That way it doesn't get into the regular build.
will rm ws
moved to just being compiled when the target isn't esp32 for now. the ticket for having a native representation for the app description might use a different crate eventually, so deferring until then.
micro-rdk/src/common/conn/viam.rs
Outdated
@@ -622,6 +600,42 @@ where
    }
}

#[cfg(feature = "ota")]
moved the whole ota section to after certs have been retrieved and stored
Why do we need certs? I thought the certs were for us serving
oh, I am using the same certs for the hyper client's https connection. should I not be?
Let's try the empirical answer: does it still work if you don't use them?
you were right. we just need the certificate authority, which the H2 connector is already using. removed the certificates completely.
micro-rdk/src/common/ota.rs
Outdated
struct EspAppDesc {
    //TODO verify this is doing what I think it's doing, validate results
    /// ESP_APP_DESC_MAGIC_WORD (0xABCD5432)
    magic_word: u32,
    secure_version: u32,
    reserv1: [u32; 2],
    /// application version
    version: [u8; 32],
    project_name: [u8; 32],
    /// compile time
    time: [u8; 16],
    /// compile date
    date: [u8; 16],
    idf_ver: [u8; 32],
    app_elf_sha256: [u8; 32],
    /// minimal eFuse block revision supported by image, in format: major * 100 + minor
    min_efuse_blk_rev_full: u16,
    /// maximal eFuse block revision supported by image, in format: major * 100 + minor
    max_efuse_blk_rev_full: u16,
    reserv2: [u32; 19],
}
I added this for testing in `native`, however we can maybe use it in place of `esp_app_desc_t` entirely.
Or if we want to just use the C FFI for `esp_app_desc_t`, we can do an unsafe transmute on the first 256 bytes of an image.
This is fine for now, let's make a ticket to revisit. I'd prefer we ultimately use the C FFI one when possible, and declare our own for native. Ideally with some sanity checks on esp32 that the native one seems to line up with the FFI one.
Please link the ticket in here once you have filed it.
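One way the requested sanity check could look on the native side, as a minimal sketch: it mirrors the struct quoted above with `#[repr(C)]` and asserts at compile time that the layout is exactly 256 bytes. The helper `has_app_desc_magic` is hypothetical, and it assumes the caller passes the descriptor bytes themselves (little-endian, as on esp32). On esp32 a similar assertion could compare against the size of the FFI `esp_app_desc_t`, which is not shown here.

```rust
use core::mem::size_of;

/// Magic word taken from the doc comment in the quoted struct.
const ESP_APP_DESC_MAGIC_WORD: u32 = 0xABCD_5432;

/// Native mirror of the quoted layout; `repr(C)` pins the field order.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct EspAppDesc {
    magic_word: u32,
    secure_version: u32,
    reserv1: [u32; 2],
    version: [u8; 32],
    project_name: [u8; 32],
    time: [u8; 16],
    date: [u8; 16],
    idf_ver: [u8; 32],
    app_elf_sha256: [u8; 32],
    min_efuse_blk_rev_full: u16,
    max_efuse_blk_rev_full: u16,
    reserv2: [u32; 19],
}

// Compile-time check: the build fails if the mirror drifts from 256 bytes.
const _: () = assert!(size_of::<EspAppDesc>() == 256);

/// Hypothetical helper: true when `desc_bytes` starts with the
/// little-endian magic word; false when it is shorter than 4 bytes.
fn has_app_desc_magic(desc_bytes: &[u8]) -> bool {
    match desc_bytes.get(..4) {
        Some(b) => u32::from_le_bytes([b[0], b[1], b[2], b[3]]) == ESP_APP_DESC_MAGIC_WORD,
        None => false,
    }
}
```

The size assertion catches accidental field changes early, while the magic check is a cheap runtime guard before trusting any other field.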
micro-rdk/src/common/ota.rs
Outdated
    EspAppDesc,
    bincode::config::Configuration,
>(
    &data[..256], bincode::config::standard()
will change this to size_of::<EspAppDesc>
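A small sketch of what replacing the literal `256` with `size_of` might look like. `DemoHeader` and `header_prefix` are hypothetical names used only for illustration; the point is that `slice::get` returns `None` on a short buffer, where `&data[..N]` would panic.

```rust
use core::mem::size_of;

/// Stand-in header type for illustration only (8 bytes with `repr(C)`).
#[repr(C)]
struct DemoHeader {
    magic_word: u32,
    secure_version: u32,
}

/// Hypothetical helper: borrow exactly `size_of::<T>()` leading bytes,
/// or return `None` when the buffer is too short.
fn header_prefix<T>(data: &[u8]) -> Option<&[u8]> {
    data.get(..size_of::<T>())
}
```

With the real type, the call would presumably read `header_prefix::<EspAppDesc>(&data)`, so the length follows the struct definition automatically.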
    .await
    .map_err("failed to write data to partition".to_string())?;
// TODO change back to 'n' after impl async writer
nwritten += data.len();
will probably take the length before writing, but seems to still work
It mostly LGTM actually, just a few questions
micro-rdk/src/common/ota.rs
Outdated
let mut loader = EspFirmwareInfoLoader::new();
loader.load(&data).map_err(OtaError::EspError)?;
let new_fw = loader.get_info().map_err(OtaError::EspError)?;
log::info!("current firmware: {:?}", running_fw_info);
`log::debug` instead (for both statements)?
will do
Overall though, I do feel like this method should probably have some well thought out user facing logs that provide context about what the OTA system is doing.
micro-rdk/src/common/ota.rs
Outdated
let (mut sender, conn) = {
    http2::Builder::new(self.exec.clone())
        .keep_alive_interval(Some(std::time::Duration::from_secs(120)))
        .keep_alive_timeout(std::time::Duration::from_secs(10))
Is this the minimum amount of time we can use for this timeout? If a smaller timeout works, I think it would be preferred
is there a lower limit you have in mind?
no, I figured we could arrive at this by lowering the duration until it no longer works on a relatively unstable network (probably do something more like binary search)
Is there a reason that we know we need to adjust these from the defaults?
confirmed working without setting `keep_alive_interval` and using default timeout (20 seconds).
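For reference, the binary-search idea suggested earlier in this thread could be sketched roughly as follows. `min_working_timeout` and the `works` probe are hypothetical; the sketch assumes that if a given timeout works on the test network, every longer timeout works too, and it searches in whole seconds.

```rust
use std::time::Duration;

/// Smallest timeout in `[lo, hi]` (whole seconds) for which `works`
/// returns true, or `None` if even `hi` fails. Assumes monotonicity:
/// once a timeout works, all longer ones do as well.
fn min_working_timeout(
    lo: Duration,
    hi: Duration,
    mut works: impl FnMut(Duration) -> bool,
) -> Option<Duration> {
    let (mut lo, mut hi) = (lo.as_secs(), hi.as_secs());
    if lo > hi || !works(Duration::from_secs(hi)) {
        return None;
    }
    // Invariant: `hi` is known to work; shrink the range until lo == hi.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if works(Duration::from_secs(mid)) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    Some(Duration::from_secs(lo))
}
```

In practice the probe would be a real download attempt on an unstable network, so each call is slow and noisy; repeating each probe a few times before trusting its answer is probably worthwhile.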

#[cfg(feature = "esp32")]
type OtaConnector = crate::esp32::tcp::Esp32H2Connector;
#[cfg(not(feature = "esp32"))]
maybe I forgot from the previous PR, but why are we allowing cases where we're not on an esp32? Isn't OTA meaningless otherwise?
faster iteration when testing, specifically around hyper, config checks, and inspecting the binary to be downloaded (recommended by nico)
That's fair for development purposes maybe, but isn't this confusing for merged-in code? It took me a while to see that OTA compiles for native but makes http calls that essentially do nothing.
Not sure what you mean by nothing. The calls are for downloading the OTA binary and inspecting it. This would also be useful for verifying future work we talked about for editing the binary signatures directly.
Should I put it back and take it out between every PR for this feature? Or clean it up at the end if it doesn't prove useful beyond the planned work?
I guess to that point, what is the purpose of having a native build?
I mean the native build still tests the connection infrastructure abstraction and acts as a server. In native "OTA", there's no updating happening so it doesn't feel like a fair comparison. But we can leave it in and revisit at the end of the epic

// TODO(RSDK-9200): set according to active partition scheme
const OTA_MAX_IMAGE_SIZE: usize = 1024 * 1024 * 4; // 4MB
pub const OTA_MODEL_TYPE: &str = "ota_service";
Could this just be a slice of OTA_MODEL_TRIPLET so they can't get out of phase?
I tried doing that initially, but was having issues with using a const str to define another const str, will look into it again.
accomplished using once_cell::sync::Lazy, though not a fan of the deref, but it works.
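An alternative that avoids both the `Lazy` and the deref, sketched under assumptions: keep the type name in a small macro and build both constants from it with `concat!`, so the two stay in sync at compile time. The macro name `ota_model_type` and the `"rdk:builtin:"` prefix are hypothetical here; the actual triplet value is not shown in this conversation.

```rust
/// Hypothetical macro holding the model type as a string literal,
/// so it can be reused inside `concat!`.
macro_rules! ota_model_type {
    () => {
        "ota_service"
    };
}

pub const OTA_MODEL_TYPE: &str = ota_model_type!();

/// The "rdk:builtin:" prefix is an assumption for illustration only.
pub const OTA_MODEL_TRIPLET: &str = concat!("rdk:builtin:", ota_model_type!());
```

This inverts the original suggestion (the triplet is derived from the type rather than sliced from it), but it gives the same guarantee that the two cannot get out of phase.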
micro-rdk/src/common/ota.rs
Outdated
    reserv2: [u32; 19],
}

// TODO(RSDK-9214)
When writing `TODO(TICKET)`, please also include a little explanation of what is TODO. It can be brief, but that way the reader can get a sense without needing to jump over to the ticket.
micro-rdk/src/common/ota.rs
Outdated
    cert: TlsCertificate,
    exec: Executor,
) -> Result<Self, OtaError> {
    // TODO(RSDK-9205)
Ditto more details on comment
let mut total_downloaded: usize = 0;
let mut got_info = false;

while let Some(next) = response.frame().await {
Do we have any ability to exercise control over the max size of the frames we will get?
it appears the closest thing would be the `range` header, to request a specific range of bytes to be served (could be useful for resumable download). this would turn into multiple requests though, one for each segment of our max size.
However, there's no guarantee that the server supports ranges and that would have to be verified and handled.
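To make the multiple-request idea concrete, here is a rough sketch of how the `Range` header values could be computed for a known total size. `range_headers` is a hypothetical helper, not part of this PR; byte ranges in HTTP are inclusive on both ends.

```rust
/// Hypothetical helper: `Range` header values covering `total` bytes in
/// chunks of at most `chunk` bytes. Returns an empty list when either
/// `total` or `chunk` is zero.
fn range_headers(total: u64, chunk: u64) -> Vec<String> {
    let mut out = Vec::new();
    if chunk == 0 {
        return out;
    }
    let mut start = 0;
    while start < total {
        // Inclusive end of this chunk, clamped to the last byte.
        let end = (start + chunk).min(total) - 1;
        out.push(format!("bytes={}-{}", start, end));
        start = end + 1;
    }
    out
}
```

A server that honors the header answers `206 Partial Content`; a plain `200` with the full body means ranges were ignored, which is the case that would need to be detected and handled.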
micro-rdk/src/common/ota.rs
Outdated
if total_downloaded < core::mem::size_of::<EspAppDesc>() {
    log::error!("initial frame too small to retrieve esp_app_desc_t");
} else {
    log::info!("data length {}", data.len());
This looks like a debug log, at best, or better removed
micro-rdk/src/common/ota.rs
Outdated
let mut loader = EspFirmwareInfoLoader::new();
loader.load(&data).map_err(OtaError::EspError)?;
let new_fw = loader.get_info().map_err(OtaError::EspError)?;
log::info!("current firmware: {:?}", running_fw_info);
log::info!("new firmware: {:?}", new_fw);
if let Some(ref running_fw) = running_fw_info {
    if running_fw.version == new_fw.version
        && running_fw.released == new_fw.released
What is `released` here and what does it mean?
this is the compiled timestamp we talked about before
d569cde to f3ccf5c
LGTM
@@ -219,7 +219,7 @@ impl Future for AsyncioSleep {
 }

 #[derive(Default, Clone, Debug)]
-struct H2Timer;
+pub struct H2Timer;
`pub(crate)` instead?
The following PR:
The errors will get handled better as part of RSDK-9214