Skip to content

Latest commit

 

History

History
172 lines (136 loc) · 11.9 KB

0032-2020-06-20.md

File metadata and controls

172 lines (136 loc) · 11.9 KB

20 Jun 2020

test05i

Optimisation/Fitting

  • Initially just a copy of test05h. Doesn't "Fit", as test05h didn't before I changed it to "Density" optimisation in the Fit stage.
  • I tried first just changing the existing XST (Synthesis) speed-based optimisation from "Normal" to "High". This may have made it worse, requiring 51 equations instead of 50 equations.
  • I changed it to "Goal: Area; Effort: Normal". This didn't work either. The text report shows what failed to map:
    *************************  Summary of UnMapped Logic  ************************
    
    ** 4 Buried Nodes **
    
    Signal                                Total Total User
    Name                                  Pts   Inps  Assignment
    U1/tone<1>                            2     2     
    U1/tone<2>                            2     3     
    U1/tone<3>                            2     4     
    U1/tone<4>                            2     5     
    
    ...I think Speed/High mode failed to map 6 signals? I do wonder whether rearranging the logic could push for a better result, since I don't know that we need speed and timing precision for this design.
  • Changing to "Area/High" still doesn't help so I changed back to "Speed/Normal". I then tried the "Exhaustive Fit Mode" (see here for more info).
    • It can be seen that this iteratively runs cpldfit with different "Collapsing Input Limit" and "Collapsing Product Term Limit" combinations, hoping to find one that succeeds
    • It is relatively fast for each try, with such a small design, but we could still be looking at something on the order of over 2,000 iterations, with each taking a second or more in my VM.
  • As it happens, Exhaustive Fit Mode did manage to find a solution after only a few minutes, which it determined with these parameters:
    INFO:Cpld:994 - Exhaustive fitting is trying pterm limit: 22 and input limit: 54
    
    ...and these overall stats:
    Design Name: test05i                             Date:  6-20-2020,  2:04AM
    Device Used: XC9572XL-7-VQ64
    Fitting Status: Successful
    
    *************************  Mapped Resource Summary  **************************
    
    Macrocells     Product Terms    Function Block   Registers      Pins           
    Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
    51 /72  ( 71%) 288 /360  ( 80%) 111/216 ( 51%)   47 /72  ( 65%) 6  /52  ( 12%)
    
    ** Function Block Resources **
    
    Function    Mcells      FB Inps     Pterms      IO          
    Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot    
    FB1           8/18       24/54       90/90*      2/13
    FB2          12/18       27/54       78/90       2/13
    FB3          18/18*      28/54       36/90       1/14
    FB4          13/18       32/54       84/90       1/12
                 -----       -----       -----      -----    
                 51/72      111/216     288/360      6/52 
    
  • Compare this with a non-exhaustive "Density" optimisation:
    Design Name: test05i                             Date:  6-20-2020,  2:09AM
    Device Used: XC9572XL-7-VQ64
    Fitting Status: Successful
    
    *************************  Mapped Resource Summary  **************************
    
    Macrocells     Product Terms    Function Block   Registers      Pins           
    Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
    62 /72  ( 86%) 186 /360  ( 52%) 104/216 ( 48%)   47 /72  ( 65%) 6  /52  ( 12%)
    
    ** Function Block Resources **
    
    Function    Mcells      FB Inps     Pterms      IO          
    Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot    
    FB1          18/18*      29/54       35/90       2/13
    FB2          18/18*      32/54       69/90       2/13
    FB3          17/18       32/54       67/90       1/14
    FB4           9/18       11/54       15/90       1/12
                 -----       -----       -----      -----    
                 62/72      104/216     186/360      6/52 
    
    By default, this is using -inputs 54 and -pterms 90.
  • For now, I will just leave this on "Density" optimisation, but I know I could use Exhaustive Fitting if I need it later.

Adapting MusicBox4 tutorial code

So MusicBox4 might be a little wrong...? The ROM code is only included in this link at the bottom of the page, and the rest of the code in that archive is different from what the page recommends we change.

Hence, I adapted what was in the actual archive to fit with the code I have in test05i (copied from test05h).

This produced a design that wouldn't initially fit, even with 'Density' optimisation. I let it run with Exhaustive Fitting, and it did take a long time. I'm not sure how long but according to timestamps on various files it took 3 hours and 7 minutes.

The solution was:

INFO:Cpld:994 - Exhaustive fitting is trying pterm limit: 14 and input limit: 15

...with these results:

Design Name: test05i                             Date:  6-20-2020,  6:32AM
Device Used: XC9572XL-7-VQ64
Fitting Status: Successful

*************************  Mapped Resource Summary  **************************

Macrocells     Product Terms    Function Block   Registers      Pins           
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
70 /72  ( 97%) 267 /360  ( 74%) 138/216 ( 64%)   55 /72  ( 76%) 6  /52  ( 12%)

** Function Block Resources **

Function    Mcells      FB Inps     Pterms      IO          
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot    
FB1          18/18*      41/54       67/90       2/13
FB2          17/18       36/54       84/90       2/13
FB3          18/18*      29/54       56/90       1/14
FB4          17/18       32/54       60/90       1/12
             -----       -----       -----      -----    
             70/72      138/216     267/360      6/52 

Anyway, these were the specific changes I made from the base test05h code to produce test05i:

  • Moved divide_by_12 module into main music.v code.
  • Added the inline melody ROM, but I also changed the tail end of it from the original:
        240: note<= 8'd25;
        241: note<= 8'd0;
        242: note<= 8'd00;
        default: note <= 8'd0;
    endcase
    ...to now alternate the last few notes rapidly between notes 2 octaves apart:
        240: note<= 25;
        241: note<= 60;
        242: note<= 36;
        243: note<= 60;
        244: note<= 36;
        245: note<= 60;
        246: note<= 36;
        247: note<= 60;
        248: note<= 36;
        249: note<= 60;
        250: note<= 36;
    default: note <= 0;
    endcase
    NOTE: This change breaks the fitting step, again. For more info, see below.
  • Added the instance of music_ROM to read notes into fullnote.
  • Extended main tone counter from 28 bits to 30 bits, because now instead of counting the 64 notes (upper 6 bits) to play in a scale, we are counting 256 note addresses (upper 8 bits) to retrieve from the inline ROM.
  • Simplified counter_octave to use a bit shift instead of hardcoded constants.
  • Added fullnote!=0 check check; if the note is 0, the speaker is silent.
  • Where tone[21:0] counts the duration of each note (222 cycles at 50MHz, or ~168ms), we check for when the top 4 bits are 0000, and silence the speaker during that time, which basically means the first ~10ms of each note is silent, to sound like a tiny gap.

Anyway, the melody it plays is "Rudolph the Red-Nosed Reindeer". It's the whole song, except for the last few notes, which were left out of the ROM for some reason.

Note that when I made the change to the end of the ROM, as described in one of the steps above, it broke the fitting step again, so I'm running it overnight with the most strict settings I can... though it's possible my XST change for "Area/High" will trip me up; I'm not sure. It's also possible the design simply won't fit as-is, in which case I might consider taking the 50MHz-to-25MHz clock halver and breaking it down to maybe 12.5MHz or even 6.25MHz. This will add 2 more bits to reg clk25, but hence probably save us 4 bits in total, between tone and counter_note (and don't forget to match div).

There is one other line I think I could change, though I'm not sure if it would make much difference. It could be this, perhaps?

  always @(posedge clk) if (counter_note==0 && octave!=0) counter_octave <= counter_octave==0 ? 5'b11111 >> octave : counter_octave-1;

...i.e. don't constantly assign 0 back into counter_octave.

I ended up changing the code to use a 6.25MHz clock (by dividing 50MHz by 8), so I was able to save on some repeated counter bits in music, and this made the fitting work on the first try. I also put the last line of the song in, but had to trim the last note to allow a gap of silence before it loops back to the start.

Notes

  • I should learn more about Parameterized Modules which allow us to do things like define a module with a variable number of bits (i.e. variable vector size). See also: this and this.
  • AR# 40377: Info on possible work-arounds if XST crashes.