There are two types of procedural assignments in Verilog:
Blocking assignments
Non-Blocking assignments
The following example shows interactions between blocking and non-blocking for simulation only (not for synthesis).
For Synthesis (Points to Remember):
What is the difference between x = #5 a; where a blocking assignment is used, and x <= #5 a; where a non-blocking assignment is used?
If you've a blocking assignment statement it'll be executed in the order that's specified in a sequential block. For example,
The a is assigned to x at simulation time 5, while b is assigned to y at simulation time 10. Now consider nonblocking assignment statements with intra-assignment delays that follow in a sequential block:
In the above case both a and b are concurrently assigned to x and y at simulation time 5. In other words, non-blocking assignment statements don't block execution of the statements that follow them in a sequential block: each one samples its right-hand side immediately, schedules the update, and lets the next statement run. As a result, the order of non-blocking assignments to different variables in a sequential block doesn't change the times at which those variables are updated.
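The two cases described above can be compared side by side in one small testbench. This is a minimal sketch: the module name and the x_blk/y_blk/x_nba/y_nba signal names are made up for illustration, and the values given to a and b are arbitrary.

```verilog
`timescale 1ns/1ns
module tb_intra_delay;
  reg a, b;
  reg x_blk, y_blk;   // targets of the blocking assignments
  reg x_nba, y_nba;   // targets of the non-blocking assignments

  initial begin
    // Set the sources first so both branches sample known values.
    a = 1'b1;
    b = 1'b0;
    fork
      begin
        // Blocking: the second statement cannot start until the
        // first one has completed.
        x_blk = #5 a;   // a sampled at t=0, x_blk updated at t=5
        y_blk = #5 b;   // b sampled at t=5, y_blk updated at t=10
      end
      begin
        // Non-blocking: both right-hand sides are sampled at t=0
        // and both updates are scheduled for t=5.
        x_nba <= #5 a;
        y_nba <= #5 b;
      end
    join
  end

  // Print a line every time one of the targets changes.
  initial $monitor("t=%0t x_blk=%b y_blk=%b x_nba=%b y_nba=%b",
                   $time, x_blk, y_blk, x_nba, y_nba);
endmodule
```

Running this in any Verilog simulator and reading the monitor lines shows the blocking pair completing one after the other while the non-blocking pair lands together.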
Sep 21, 2022
I’ve now spent more time than I want to admit debugging simulation issues caused by the way Verilog’s simulation semantics get used. Let me therefore share some problems I’ve come across, together with my proposed solutions for them.
Today’s problem stems from logic like the following:
In general, this comes to me in “working” simulation code that’s been handed down to me to maintain. The simulations that use this logic often take hours to run, and so debugging this sort of thing can be very time consuming. (Costly too–my hourly rate isn’t cheap.)
Let’s walk through this logic for a moment–before tearing it apart.
In this example the first condition, the one I’ve called trigger_condition above, is simply some form of data change condition. Sometimes it’s a reference to a clock edge, sometimes it’s a reference to a particular piece of data changing. This isn’t the problem.
The second condition, some_other_condition_determining_relevance , is used to weed out all the times the always block might get triggered when you don’t want it to be. For example, it might be triggered during reset or when the slave device being modeled is currently responsive to some other trigger_condition . This is natural. This is not (yet) a problem either.
So what’s the problem with the logic above? Well, let’s start with the #1 assignment delay. In this case, it’s not representing a true hardware delay. No, the #1 is there in order to schedule Verilog simulation statement execution. Part of the reason why it’s there is because the rest of the block uses blocking logic (i.e. via the = ). Hence, if this block was triggered off of a clock edge, the #1 allows us to reason about what follows the clock edge but before the next edge.
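The listing being discussed isn't reproduced in this text, so here is one guess at its general shape, assuming the trigger is a clock edge. The module name, the selected input (standing in for some_other_condition_determining_relevance), and the din/dout signals are all hypothetical; only the #1 followed by a blocking assignment is taken from the description above.

```verilog
`timescale 1ns/1ps
// Illustration of the pattern being criticized, NOT a recommendation.
module fragile_model (
  input  wire       clk,       // plays the role of trigger_condition
  input  wire       selected,  // hypothetical relevance condition
  input  wire [7:0] din,
  output reg  [7:0] dout
);
  always @(posedge clk)
  begin
    if (selected)
    begin
      // The #1 models no hardware. It only pushes this blocking
      // assignment 1ns past the clock edge, so that whatever follows
      // in the block runs "after the edge". While the block is
      // suspended in the delay, it cannot respond to its trigger.
      #1 dout = din + 8'd1;
    end
  end
endmodule
```

The unexplained 1ns here is exactly the kind of number that has to be found and re-derived everywhere once the clock period changes.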
Now, let me ask, what happens five years from now when clock speeds get faster? Some poor soul (like me) will be hired to maintain this logic, and that poor soul will look at the #1 and ask, why is this here? Maybe it was a 1ns delay, and they are now trying to run a clock at 500MHz instead of 100MHz. That 1ns delay will need to be understood, and replaced– everywhere it was used. It doesn’t help that the 1ns doesn’t come with any explanations, but that may be specific to the examples I’m debugging.
Here’s a second problem, illustrated in Fig. 2: what happens when you use this one nanosecond delay in multiple always blocks, similar to this one, all depending on each other? Which one will execute first?
The third problem often follows this one, and it involves a wait statement of some type. To illustrate this, let me modify the example above a bit more.
In this case, the user wants to make certain his logic is constant across the clock edge, and so he sets all his values on the negative edge of the clock. This leads to two problems: what happens when the #1 delay conflicts with the clock edge? And what happens when the output value depends upon other inputs that are set on the negative clock edge?
Fig. 3 shows another problem, this time when using a case statement. In this case, it’s an attempt to implement a command structure within a modeled device. The device can handle one of many commands, so depending on which one is received you go and process that command. The actual example this is drawn from was worse, since it depended not only on commands but also on command sequences, and the command sequences were found within case statements within case statements.
What’s wrong with this? Well, what happens when the original trigger takes place a second time, but the logic in the always block hasn’t finished executing? Perhaps this is erroneous. Perhaps it finishes just barely on the wrong side of the next clock edge. In my case, I find the bug four hours later, on a good day. It doesn’t help that simulations tend to run rather slowly.
A better approach would’ve been to use a state machine rather than embedded tasks. Why is this better? Well, if for no other reason, a case statement would contain state variables which could be seen in the trace file. That means that you could then find and debug what would (or should) happen when/if the new command trigger shows up before a prior command completes.
These problems are only compounded when this logic is copied. For example, imagine a device that can do tasks A, B, and C, but requires one of two IO protocols to accomplish task A, B, or C. Now, if that IO protocol logic is copied and embedded into each of the protocol tasks, then all three will need to be updated when the IO protocol is upgraded. (I2C becomes I3C, SPI becomes Quad SPI, etc.)
While some of these problems are specific to hardware, many are not. Magic numbers are a bad idea in both RTL and software. Design reuse and software reuse are both very real things. Even a carpenter will build a custom jig of some type when he has to make fifty copies of the same item.
The good news is that better approaches exist.
Before diving into some better approaches, let me take just a couple of moments to introduce the terms I will be using. In general, a test bench has three basic (types of) components, as illustrated in Fig. 6.
The Device Under Test (DUT): This is the hardware component that’s being designed, and for which the test has been generated.
Since the DUT is intended to be synthesizable, Verilog delay statements are inappropriate here.
The Hardware Device Model, or just model : Our hardware component is being designed to interact with an external piece of hardware. This component is often off-chip, and so our “model” is a simulation component designed to interact with our IP in the same way the actual hardware would.
Although I’ve called these “models” “emulators” in the past, these aren’t truly “emulators”. An “emulator” would imply a description of the actual hardware existed, such as an RTL description, yielding an additional level of realism in simulation. Barring sufficient information from the external device’s manufacturer to actually and truly “emulate” the device, the test designer often settles for a “model” instead.
Hardware models may naturally require Verilog delays in order to model the interfaces they are designed for. For example, a signal may take some time to transition from a known value to an unknown one following a clock transition. As another example, a hardware device may become busy following a command of some kind. The good news is that Verilog can model both of these behaviors nicely.
How to handle these delays “properly” will become part of the discussion below.
The Test Script, or driver: This is the component of the design that interacts with the device under test, sequencing the commands given to it to make sure all of the capabilities of the DUT are properly tested.
This component of the Verilog test bench often reads more like software than hardware. Indeed, we’ve already discussed the idea of replacing the test script with a piece of software compiled for a soft-core CPU existing in the test environment, and then emulating that CPU as part of the simulation model. The benefit of this approach is that it can test and verify the software that will be used to drive the hardware under test. The downside is that simulations are slow, and adding a CPU to the simulation environment can only slow them down further.
For the purposes of our discussion today I’ll simply note that the test script commonly interacts with the design in a synchronous manner. Any delays, therefore, need to be synchronized with the clock.
There is another problem with the driver that we won’t be discussing today. This is the simple reality that there’s no way to test all possible driver delays. Will a test driver accurately test if your DUT can handle back to back requests, requests separated by a single clock cycle, by two clock cycles, by N clock cycles? You can’t simulate all of these possible delays, but you can catch them using formal methods.
Not shown in Fig. 6, but also relevant, is the Simulation Environment: While the DUT and model are both necessary components of any simulation environment, the environment might also contain such additional components as an AXI interconnect, CPU, DMA, and/or RAM, none of which are the test script, DUT, or model.
Ideally these extra components will have been tested and verified in other projects prior to the current one, although this isn’t always the case.
Now that we’ve taken a moment to define our terms, we can return to the simulation modeling problem we began with.
The good news is that Verilog was originally written as a language for driving simulations.
Even better, subsets of Verilog exist which can do a good job of modeling synthesizable logic. This applies to both asynchronous and synchronous logic. The assignment delay problems that I’ve outlined above, however, arise from trying to use Verilog to model a mix of logic and software when the goal was to create a hardware device model.
Here are some tips, therefore, for using delays in Verilog:
Write synthesizable simulation logic where possible.
This is really only an issue for test bench or modeling logic. It’s not really an issue for logic that was meant to be synthesizable in the first place.
The good news about writing test bench logic in a synthesizable fashion is that you might gain the ability to synthesize your model in hardware, and then run tests on it just that much faster. You could then also get a second benefit by formally verifying your device model–it’d save you that much time later when running integrated simulations.
As an example, compare the following two approaches for verifying a test chip:
ASIC Test chip #1: Has an SPI port capable of driving internal registers. This is actually a really good idea, since you can reduce the number of wires necessary to connect to such a test chip. The problem, however, was that the SPI driver came from encrypted vendor IP. Why was this a problem? It became a problem when the test team tried to connect to the device once it had been realized in hardware. They tried to connect their CPU to this same SPI port to drive it, and then didn’t drive it properly according to the protocol.
The result of testing ASIC test chip #1? I got a panicked call from a client, complaining that the SPI interface to the test chip wasn’t working and asking if I could find the bugs in it.
ASIC Test chip #2: Also has a SPI port for reading and writing internal registers. In this chip, however, the SPI port was formally verified as a composition of both the writer and the reader–much as Fig. 7 shows below.
I say “much as Fig. 7 shows” because the verification of this port wasn’t done using the CPU as part of the test script. However, because both the SPI master and SPI slave were verified together, and even better because they were formally verified in an environment containing both components, the test team can begin its work with a verified RTL interface.
You can even go one step farther by using a soft-core CPU to verify the software driver at the same time. This is the full extent of what’s shown in Fig. 7. As I mentioned above, the formal verification for ASIC test chip #2 stopped at the AXI-lite control port for the SPI master. When testing this chip as part of an integrated test, a test script was used to drive a Bus Functional Model (BFM), rather than actual CPU software. However, if you just read the test script’s calls to the BFM, you would have the information necessary to build a verified software driver.
Use always @(*) for combinatorial blocks, and always @(posedge clk) (or negedge) or always @(posedge clk or negedge reset_n) for synchronous logic.
While I like using the positive edge of a clock for everything, the actual edge you need to use will likely be determined by the device and protocol you are modeling. The same is true of the reset.
I would discourage the use of always @(trigger), where trigger is some combinatorial signal, lest you forget some required trigger component. I would also discourage the use of any always @(posedge trigger) blocks where trigger wasn’t a true clock, lest you create a race condition within your logic. I use the word discourage, however, because some modeling contexts require triggering on non-clocked logic. If there’s no way around it, then you do what you have to do to get the job done.
Synchronous (clocked) logic should use non-blocking assignments ( <= ), and combinatorial logic should use blocking assignments ( = ).
It seems like my debugging problems began when the prior designer used a delay instead of proper blocking assignments.
Just … don’t do this. When you start doing things like this, you’ll never know if (whatever) expression had finished evaluating, or be able to keep track of when the #1 delay needs to be updated.
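For contrast with the delay-based style, here is a minimal sketch of the two block forms recommended above, with blocking assignments in the combinatorial block and non-blocking assignments in the clocked one. The module, its ports, and the adder are made up purely for illustration.

```verilog
module style_example (
  input  wire       clk,
  input  wire       reset_n,
  input  wire [7:0] a,
  input  wire [7:0] b,
  output reg  [7:0] sum_comb,
  output reg  [7:0] sum_reg
);
  // Combinatorial logic: always @(*) with a blocking assignment.
  // The sensitivity list is inferred, so no trigger can be forgotten.
  always @(*)
    sum_comb = a + b;

  // Synchronous logic with an asynchronous active-low reset:
  // non-blocking assignments, and no delays anywhere.
  always @(posedge clk or negedge reset_n)
  if (!reset_n)
    sum_reg <= 8'd0;
  else
    sum_reg <= sum_comb;
endmodule
```

Because nothing here consumes time, there is no question about whether an expression has "finished evaluating" by the time the next statement runs.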
Device models aren’t test drivers. Avoid consuming time within them–such as with a wait statement of any type. Let the time be driven elsewhere by external events.
This applies to both delays and wait conditions within always blocks, as well as any tasks that might be called from within them. Non-blocking assignment delays work well for this purpose.
Ideally, device models should use finite state machines, as in Fig. 4, to model the passing of time if necessary, rather than consuming time with wait statements or ill defined assignment delays, as in Fig. 3.
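As a rough sketch of the state machine idea, assuming a device that simply becomes busy for a fixed number of clock cycles after each command: the module name, the cmd_valid and ready signals, and the BUSY_CLOCKS parameter are all hypothetical and are not taken from the figures referenced above.

```verilog
module busy_fsm #(
  parameter BUSY_CLOCKS = 4   // hypothetical busy time, in clock cycles
) (
  input  wire clk,
  input  wire reset_n,
  input  wire cmd_valid,
  output wire ready
);
  localparam S_IDLE = 1'b0,
             S_BUSY = 1'b1;

  reg        state;   // visible in the trace file
  reg [31:0] count;   // clock cycles of busy time remaining

  always @(posedge clk or negedge reset_n)
  if (!reset_n)
  begin
    state <= S_IDLE;
    count <= 32'd0;
  end else case (state)
    S_IDLE:
      if (cmd_valid)
      begin
        state <= S_BUSY;
        count <= BUSY_CLOCKS - 1;
      end
    S_BUSY:
      // A cmd_valid arriving here is ignored, and the trace shows
      // exactly which state the model was in when it arrived.
      if (count == 32'd0)
        state <= S_IDLE;
      else
        count <= count - 1;
  endcase

  assign ready = (state == S_IDLE);
endmodule
```

Time passes only because the clock advances, so a second command showing up early lands in a well-defined state instead of colliding with a block that is still suspended in a wait.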
When driving synchronous logic from a test script, synchronize any test driven signals using non-blocking assignments.
I have now found the following simulation construct several times over:
Sometimes the author uses the negative edge of the clock instead of the positive edge here to try to “schedule” things away from the clock edge. Indeed, I’ve been somewhat guilty of this myself. Sadly, this causes no end of confusion when trying to analyze a resulting trace file.
A better approach would be to synchronize this logic with non-blocking assignments.
This will avoid any delta-time cycle issues that would otherwise be very difficult to find and debug. Note that this also works because this block is the only block controlling ARVALID from within the test bench. Should you wish to control ARVALID from multiple test bench clocks, you may run into other concurrency problems.
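A minimal sketch of driving ARVALID from a test script with non-blocking assignments might look like the following. The clock period, the always-ready ARREADY stand-in, and the cycle counts are assumptions made for illustration only; a real test bench would connect these signals to the DUT.

```verilog
`timescale 1ns/1ps
module tb_arvalid;
  reg clk = 1'b0;
  reg ARVALID;
  reg ARREADY;

  // Free-running clock with an assumed 10ns period.
  always #5 clk = !clk;

  // This is the only block that ever drives ARVALID.
  initial begin
    ARVALID = 1'b0;
    ARREADY = 1'b1;          // stand-in for a slave that is always ready
    repeat (2) @(posedge clk);

    // Raise ARVALID with a non-blocking assignment: anything sampling
    // on this same edge still sees the old value, just as it would if
    // ARVALID were driven by a register.
    ARVALID <= 1'b1;

    // Hold the request until an edge where the slave was ready.
    @(posedge clk);
    while (!ARREADY)
      @(posedge clk);
    ARVALID <= 1'b0;

    repeat (2) @(posedge clk);
    $finish;
  end
endmodule
```

In the trace, ARVALID then changes just after the rising edge, the same way a registered signal inside the design would.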
While you can still do this sort of thing with Verilator, I’ll reserve my solution for how to do it for another post.
Pick a clock edge and use it. Don’t transition on both edges–unless the hardware protocol requires it.
As I alluded to above, I’ve seen a lot of AXI modeling that attempts to set the various AXI signals on the negative edge of the clock so that any and all logic inputs will be stable later when the positive edge comes around. This approach is all well and good until someone wants to do post–layout timing analysis, or some other part of your design also wants to use the negative edge, and then pain ensues.
Sadly, this means that the project may be turned in and then rest in a “working” state for years before the problem reveals itself.
In a similar fashion, what happens when you have two always blocks, both using a #1 delay as illustrated in Fig. 2 above? Or, alternatively, what happens when you want the tools to put real post place-and-route delays into your design for a timing simulation? You may find you’ve already lost your timing slack due to a poor simulation test bench or model. Need I say that it would be embarrassing to have to own up to a timing failure in simulation, due to your own simulation constructs?
There is a time for using multiple always blocks–particularly when modeling DDR devices.
In today’s high speed devices, I’ve often found the need for multiple always blocks, triggered off of different conditions, to capture the various triggers and describe the behavior I want. One, for example, might trigger off the positive edge, and another off the negative edge. This is all fine, well, and good for simulation (i.e. test bench ) logic. While this would never work in hardware, it can easily be used to accurately model behavior in simulation.
Use assignment delays to model physical hardware delays only.
For example, if some event will cause the ready line to go low for 50 microseconds, then you might write:
Notice how I’ve carefully chosen not to consume any time within this always block, yet I’ve still managed to create something that will capture the passage of time. In this case, I’ve used the Verilog <= together with a delay statement to schedule the transition of ready from zero back to one by #tWAIT ns.
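The listing referred to here isn't included in this text. A minimal sketch of the technique, assuming the event is sampled on a clock edge and that time is measured in nanoseconds, could be written as below. The module name and the start input are hypothetical; ready and tWAIT come from the description above.

```verilog
`timescale 1ns/1ps
module ready_model #(
  parameter tWAIT = 50_000   // 50 microseconds, expressed in ns
) (
  input  wire clk,
  input  wire start,         // hypothetical event that makes the device busy
  output reg  ready
);
  initial ready = 1'b1;

  always @(posedge clk)
  if (start && ready)
  begin
    // Neither statement consumes time inside the block. The second
    // one only schedules ready to return high tWAIT ns from now.
    ready <= 1'b0;
    ready <= #tWAIT 1'b1;
  end
endmodule
```

The always block finishes immediately on every edge, so it is always available to respond to the next clock, while the scheduled update carries the passage of time.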
I’ve now used this approach on high speed IO lines as well, with a lot of success. For example, if the data will be valid tDVH after the clock goes high and remain valid for tDV nanoseconds, then you might write:
I’ve even gone so far in some cases as to model the 'x values in this fashion as well. That way the output is properly 'x while the voltage is swinging from one value to the next.
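One way this data-valid window might be sketched, including the 'x period outside of it, is shown below. The module, the next_data input, and the parameter values are hypothetical, and the sketch assumes tDV is shorter than the clock period so that the 'x from one cycle never lands after the data of the next.

```verilog
`timescale 1ns/1ps
module dout_model #(
  parameter tDVH = 2,   // ns from rising edge to valid data (hypothetical)
  parameter tDV  = 6    // ns the data stays valid (hypothetical)
) (
  input  wire       clk,
  input  wire [7:0] next_data,
  output reg  [7:0] dout
);
  initial dout = 8'hxx;

  always @(posedge clk)
  begin
    // Data becomes valid tDVH after the rising edge ...
    dout <= #tDVH next_data;
    // ... and is unknown again once the valid window has closed.
    dout <= #(tDVH + tDV) 8'hxx;
  end
endmodule
```

Any logic that samples dout outside of the valid window then picks up 'x in simulation, which makes a timing mistake visible instead of silently working.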
No magic numbers! Capture hardware delays in named parameters, specparams, and registers, rather than using numeric assignment delays.
For example, were I modeling a flash memory, I might do something like the following to model an erase:
Notice the use of tERASE rather than some arbitrary erase time buried among the logic. Placing all such device dependent times in one location (at the top of the file) will then make it easier to upgrade this logic for a new and faster device at a later time.
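The flash listing itself isn't reproduced in this text, so the following is only a sketch of how such an erase might be modeled. The memory size, sector size, port names, and the tERASE value are placeholders rather than figures from any real part.

```verilog
`timescale 1ns/1ps
module flash_erase_model #(
  // Device dependent times live together at the top of the file.
  parameter tERASE = 1000          // ns; placeholder value
) (
  input  wire       clk,
  input  wire       erase_cmd,     // hypothetical decoded erase command
  input  wire [5:0] erase_addr,    // any address within the sector
  output reg        busy
);
  localparam MEM_BYTES    = 64;    // placeholder memory size
  localparam SECTOR_BYTES = 16;    // placeholder sector size

  reg [7:0] mem [0:MEM_BYTES-1];
  reg [5:0] base;
  integer   k;

  initial busy = 1'b0;

  always @(posedge clk)
  if (erase_cmd && !busy)
  begin
    // Erase the whole sector immediately. The user cannot observe
    // the memory while busy is set, so the exact moment is hidden.
    base = erase_addr & ~(SECTOR_BYTES - 1);
    for (k = 0; k < SECTOR_BYTES; k = k + 1)
      mem[base + k] = 8'hff;

    // The block consumes no time; busy is scheduled to clear later.
    busy <= 1'b1;
    busy <= #tERASE 1'b0;
  end
endmodule
```

Moving to a faster device then means changing one named value rather than hunting for numbers buried in the logic.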
We can also argue about when the actual erase should take place. As long as the user can’t interact with the device while it’s busy , this probably doesn’t make a difference. Alternatively, we could register the erase address and set a time for later when the erase should take place.
Even this isn’t perfect, however, since we now have a transition taking place on something other than a clock. Given that the interface clock isn’t continuous, this may still be the best option to create a reliable edge.
Remember our example from Fig. 5 above? Fig. 9 shows a better approach to handling three separate device tasks, each with two separate protocols that might be used to implement them.
For protocols that separate themselves nicely between the link layer control (LLC) protocol and a media access control (MAC) layer, this works nicely to rearrange the logic so that each layer only needs to be written once, rather than duplicated within structures implementing both MAC and LLC layers together.
Remember: fully verified, well tested, well written logic is pure re-usable gold in this business. Do the job right the first time, and you’ll reap dividends for years to come.
A client recently called me to ask if I could modify an IP I had written so that it would be responsive on an APB slave input with a different clock frequency from the one the rest of the device model used.
The update required inserting an APB cross clock domain bridge into the IP. This wasn’t hard, since I’d built (and formally verified) such a bridge two months prior–I just needed to connect the wires and do a bit of signal renaming for the case when the bridge wasn’t required.
That was the easy part.
But, how shall this new capability be tested? It would need an updated test script and more.
Thankfully, this was also easy.
Because I had built the top level simulation construct using parameters, which could easily be overridden by the test driver , the test suite was easy to update: I just had to set an asynchronous clock parameter, create a new parameter for the clock speed, adjust the clock speed itself, and away I went. Thankfully, I had already (over time) gotten rid of any inappropriate delays, so the update went smoothly.
Smoothly? Indeed, the whole update took less than half an hour. (This doesn’t include the time it took to originally build and verify a generic APB cross-clock domain bridge.)
… and that’s what you would hope for from well written logic.
Well, okay, it’s not all roses–I still have to go back and update the user guide, update the repository, increment the IP version, update the change log, and then bill for the task. Those tasks will take longer than the actual update, but such is the business we are in.
Let’s face it, this article is a rant. I know it. Perhaps you’ll learn something from it. Perhaps I’ll learn something from any debate that will ensue. (Feel free to comment on Reddit …)
Yes, I charge by the hour. Yes, messes like these will keep me gainfully employed and my family well fed for years to come. However, I’d rather charge for doing the useful work of adding new capabilities to a design rather than fixing up someone else’s mess.
I have this SystemVerilog code that uses continuous assignments with delays for some simple operations, along with a simple testbench with clocks.
I am running this and getting this result:
I am getting the expected delay of 1ns for bb and then the expected delay of 2ns for n1, but for some reason n2 leaves its x state and becomes 1 only 3ns after the value of bb becomes known, instead of 2ns.
What's interesting is that if I get rid of bb and have a 2ns delay, then everything works as expected.
And if I try to keep the same duration for the whole module, then n2 never gets out of the x state.
All of this behavior seems rather weird to me, and I cannot tell whether it is to be expected or is a tool bug. Either way, I suspect it has something to do with the clock period being close to the total delay through the module, because if I make the clocks slower, everything starts to work properly.
My guess is that it has something to do with event scheduling, but I am unsure, as I am not experienced in this area. I don't want to have gaps in my understanding, so any help is really welcome.
This is known as the inertial delay model: the LHS cannot change faster than the delay on the RHS.
Basically, a continuous assignment can only have one pending scheduled update to the LHS at a time. In certain cases, a new scheduled update cancels the pending update.
This is defined in section 10.3.3, Continuous assignment delays, of the IEEE 1800-2017 SystemVerilog LRM.
There are other kinds of delay models to choose from, using a variety of different constructs.
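To see the inertial behavior in isolation, a small testbench can drive a pulse that is shorter than the assignment delay. This is a sketch with made-up signal names, not the code from the question; the non-blocking assignment with an intra-assignment delay is included only as one example of a different delay model.

```verilog
`timescale 1ns/1ps
module tb_inertial;
  reg  in;
  wire out_inertial;
  reg  out_transport;

  // Continuous assignment delay: inertial. Only one update can be
  // pending at a time, so a change on "in" before the pending
  // update matures replaces that update.
  assign #2 out_inertial = in;

  // Non-blocking intra-assignment delay: every change on "in" is
  // scheduled separately, so short pulses are passed along.
  always @(in)
    out_transport <= #2 in;

  initial begin
    in = 1'b0;
    #10 in = 1'b1;   // start of a 1ns pulse,
    #1  in = 1'b0;   // which is shorter than the 2ns delay
    #10 $finish;
  end

  initial $monitor("t=%0t in=%b out_inertial=%b out_transport=%b",
                   $time, in, out_inertial, out_transport);
endmodule
```

The 1ns pulse never appears on out_inertial, while out_transport reproduces it 2ns later.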
COMMENTS
An intra-assignment delay is one where there is a delay on the RHS of the assignment operator. This indicates that the statement is evaluated and the values of all signals on the RHS are captured first. The result is then assigned to the target signal only after the delay expires.
Syntax: #delay. It delays execution for a specific amount of time, 'delay'. There are two types of delay assignments in Verilog: Delayed assignment: #Δt variable = expression; // " expression" gets evaluated after the time delay Δt and assigned to the "variable" immediately. Intra-assignment delay: variable = #Δt expression ...
Verilog Delay Control. There are two types of timing controls in Verilog - delay and event expressions. The delay control is just a way of adding a delay between the time the simulator encounters the statement and when it actually executes it. The event expression allows the statement to be delayed until the occurrence of some simulation event ...
The intra-assignment delay statement is left over from very early Verilog, before non-blocking assignments were added to the language. It should no longer be used. ... begin temp = B; #delay A = temp; end You should instead use A <= #delay B; which delays the assignment to A without blocking the flow of procedural statements.
Continuous assign statements are used to drive values on to wires. For example: assign a = b & c; This is referred to as a continuous assign because the wire on the left-hand side of the assignment operator is continuously driven with the value of the expression on the right hand side. The target of the assign statement must be a wire.
Intra-assignment delays in a Verilog always block should NEVER be used (NEVER!). There is no known hardware that behaves like this intra-assignment delay using blocking assignments. If I could take this out of the Verilog language I would, but I can't because of backward compatibility. In Verilog training, I always say, "if your mother and I ...
Adding delays to the left hand side (LHS) of any sequence of blocking assignments to model combinational logic is also flawed. The adder_t7a example shown in Figure 4 places the delay on the first blocking assignment and no delay on the second assignment. This will have the same flawed behavior as the adder_t1 example.
A delay is specified by a # followed by the delay amount. The exact duration of the delay depends upon the timescale. For example, with `timescale 10ns/100ps, the statement #50; means a delay of 500 ns. (The time unit in a `timescale directive must be 1, 10, or 100 of a unit such as ns, so a time unit of 2ns is not legal.) Delays can also be specified within an assignment statement, as in p = #10 (a | b); // Example of intra-assignment delay.
SNUG Boston 2002 Verilog Nonblocking Assignments Rev 1.4 With Delays, Myths & Mysteries 44 11.6 The 20,000 flip-flop benchmark with #1 delays in the I/O flip-flops All of the preceding mixed RTL and gate-level simulation problems can be traced to signals becoming skewed while crossing module boundaries.
Intra-assignment delay control delays computed value assignment by a specified value. The RHS operand expression is evaluated at the current simulation time and assigned to LHS operand after a specified delay value. ... In Verilog, the keyword 'event' is used to declare 'named events' that trigger an event using -> symbol and it is ...
In Verilog, Inter assignment delays often correspond to the inertial delay or the VHDL's regular delay statements. It indicates that the statement itself is executed after the delay expires, and is the most commonly used form of delay control. Example. Here, q becomes 1 at time 10 units because the statement gets evaluated at 10 time units and ...
An intra- assignment delay in a non-blocking statement will not delay the start of any subsequent statement blocking or non-blocking. However normal delays are cumulative and will delay the output. Non-blocking schedules the value to be assigned to the variables but the assignment does not take place immediately.
The evaluation of the assignment is delayed by the delay when the delay is specified before the register name. ... Verilog language has the capability of designing a module in several coding ...
The a is assigned to x at simulation time 5, while b is assigned to y at simulation time 10. Now consider nonblocking assignment statements with intra-assignment delays that follow in a sequential block: initial begin. x<=#5 a; y<=#5 b; end. In the above case both a and b are concurrently assigned to x and y at simulation time 5.
It is not an intra-assignment delay. Time must advance by 5 for each loop of the always. The non-blocking assignment (NBA) is superfluous. Use a blocking assignment. always #5 clk = !clk; always clk = #5 !clk; // this is an intra-assignment (blocking) delay.
3. In the first case the whole assignment is delayed by 5 'time units'. In the second case value of b is read but the assignment is delayed by 5 'time units'. The difference is if b changes in the 'next' 5 time units. - In the first case the new, (changed) value of b gets assigned. - In the second case the old value of b is assigned.
I am posting this question after going through a famous presentation on "understanding verilog blocking and non-blocking assignments by stuart sutherland" which literally discusses every possible variant. Input 0-10 LOW 10-20 HIGH 20-30 LOW 30-33- HIGH 33-36-LOW 36-45 HIGH 45-end of simulation LOW For the 2 cases below my analysis is output follows input with the mentioned delay. so the ...
How does event scheduling in Verilog work with respect to delays and non-blocking statements? ... (Non Blocking Assignment) region and hence it will have the original values of the nonblocking assigned signals in the same timeslot.