
Green Bus Example Implementation Specification

Robert's comments on this page are always shown in blue.

Architectural Aspects

The Example Implementation Specification (E.I.S.) is based on the OSCI TLM modelling architecture, as illustrated in Figure 1.

Figure 1: Green Bus High Level Architecture (http://www.eis.cs.tu-bs.de/klingauf/high-level-arch.jpg)

All parts marked "XML Generated" are intended to be generated from an XML description. This generation is not part of the example implementation.

A transaction is always initiated by a master calling one of its API methods (in the example implementation this will be the tac API). An API call is split into several transfer phases (transfer atoms). So far we believe that the phases INIT, DATA-HANDSHAKE and FINALIZE may be sufficient. Therefore every API call (such as read or write) is translated into the following calls:

```cpp
// Every API transfer method is translated into the following sequence of calls in the initiator port

// INIT phase
target_port->put(init_req);
wait(target_port->ok_to_get(ID));
init_resp = target_port->get(ID);
if (init_resp.is_timeout) { /* ... */ }     // timeout check

// Data handshake phase
target_port->put(data_hs_req);
wait(target_port->ok_to_get(ID));
data_hs_resp = target_port->get(ID);
if (data_hs_resp.is_timeout) { /* ... */ }  // timeout check

// Finalize phase
target_port->put(finalize_req);
wait(target_port->ok_to_get(ID));
finalize_resp = target_port->get(ID);
if (finalize_resp.is_error) { /* ... */ }   // error check
```

Every master port contains the three request member objects init_req, data_hs_req and finalize_req, which are constructed and initialized during the port's construction. This avoids repeatedly constructing and destroying requests. The responses in the master port are simply response references. All requests used are derived from the gs_request_base class and all responses from gs_response_base. These base classes contain all members that are common to all conceivable requests and responses: currently the master ID (a tlm_tag), target address, rNw flag and Req_type for requests, and resp_type for responses.
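To make the "construct once, reuse for every transfer" idea concrete, here is a minimal plain C++ sketch. It assumes a simple MasterId struct with a toInt() method in place of the tlm_tag-derived ID; the names master_port_sketch, prepare_init, ReqType and RespType are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tlm_tag-derived master ID.
struct MasterId {
  int value;
  int toInt() const { return value; }
};

enum class ReqType { INIT, DATA_HS, FINALIZE };
enum class RespType { OK, TIMEOUT, ERROR };

// Members common to every request, as listed in the text.
struct gs_request_base {
  MasterId masterID{0};
  std::uint32_t targetAddress = 0;
  bool rNw = true;  // read-not-write: true = read, false = write
  ReqType reqType;
  explicit gs_request_base(ReqType t) : reqType(t) {}
  virtual ~gs_request_base() = default;
};

struct gs_response_base {
  RespType respType = RespType::OK;
  virtual ~gs_response_base() = default;
};

// The port owns its three requests for its whole lifetime; a transfer only
// overwrites their fields instead of creating new objects.
class master_port_sketch {
 public:
  explicit master_port_sketch(MasterId id)
      : init_req(ReqType::INIT), data_hs_req(ReqType::DATA_HS), finalize_req(ReqType::FINALIZE) {
    init_req.masterID = data_hs_req.masterID = finalize_req.masterID = id;
  }

  gs_request_base& prepare_init(std::uint32_t addr, bool rNw) {
    init_req.targetAddress = addr;
    init_req.rNw = rNw;
    return init_req;  // always the same object
  }

  gs_request_base init_req, data_hs_req, finalize_req;
};
```

Since prepare_init only overwrites fields, every transfer through the same port reuses the same request object.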

Router and Bus Protocol Details

The most significant modules of the GreenBus are the router module and the bus protocol module. Figure 2 shows a more detailed view of those two modules.

Figure 2: Router and Bus protocol details (http://www.eis.cs.tu-bs.de/klingauf/router-and-bfm.JPG)

The router contains one thread and one ok_to_get event for each connected master port.

The interplay of router and bus protocol becomes clear with the help of an example:

When a master calls its send API method, the API-specific init_request object (derived from gs_request_base) is configured with the appropriate information (target address, access_mode, ...) and then put to the router. The router's put(request) implementation enqueues the given request in the router's request queue and triggers the router-internal arbitrate_event. On this event the arbiter calls the get_request() method of the bus_protocol class. This call returns the next valid request from the request queue (which request is valid depends on the bus protocol) and updates the state of the bus. The arbiter then checks the master ID of the returned request, forwards the request to the corresponding request handler thread and starts this thread by triggering its event. Next, the arbiter thread calls get_request() again, since there may be a request that can be handled concurrently with the first one. If this call returns a NULL pointer (or something similar), the arbiter thread waits for the next arbitrate_event.

The triggered request handler thread checks the request type (init, data_hs, finalize), decodes the address and performs the corresponding callbacks to the bus_protocol class. These include the bus grant and addressing delays, bus state updates, possible timeout control, etc., and lead to a put(init_req) call to the addressed slave base. If the slave responds within the timeout period by calling put(init_resp), the request handler thread triggers the corresponding ok_to_get_event (the master obtained the reference to this event through its ok_to_get call with the given ID) and the arbitrate_event, since a new address may be transferred now. The master then proceeds to the data-handshake phase, which works the same way as described above. By implementing the get_request method, the bus state struct and all other callbacks appropriately, it may be possible to model every bus.
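The put/arbitrate/dispatch cycle described above can be sketched without SystemC. This is a hypothetical, single-threaded model: a boolean flag stands in for arbitrate_event, a counter per master stands in for the handler thread's start event, and ToyProtocol is an invented policy that lets only one INIT occupy the address stage at a time. It is not the PLB arbitration.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <vector>

enum class Phase { INIT, DATA_HS, FINALIZE };

struct Request {
  int masterID;
  Phase phase;
  unsigned address;
};

// Hypothetical toy protocol: one INIT at a time in the address stage;
// DATA_HS and FINALIZE requests are always eligible.
struct ToyProtocol {
  bool addressStageBusy = false;

  // Removes and returns the next eligible request, or nothing.
  std::optional<Request> get_request(std::deque<Request>& rq) {
    for (auto it = rq.begin(); it != rq.end(); ++it) {
      if (it->phase == Phase::INIT && addressStageBusy) continue;
      if (it->phase == Phase::INIT) addressStageBusy = true;  // update bus state
      Request r = *it;
      rq.erase(it);
      return r;
    }
    return std::nullopt;
  }
};

class Router {
 public:
  explicit Router(int masters) : active(masters), started(masters, 0) {}

  void put(const Request& r) {
    rq.push_back(r);
    arbitrate_pending = true;  // stands in for arbitrate_event.notify()
  }

  // Body of the arbiter thread after waking up on arbitrate_event:
  // keep asking the protocol until it has nothing more to grant.
  void arbitrate() {
    if (!arbitrate_pending) return;
    arbitrate_pending = false;
    while (auto r = protocol.get_request(rq)) {
      active[r->masterID] = *r;  // hand the request to the master's handler
      ++started[r->masterID];    // stands in for the handler's start event
    }
  }

  // Called by a handler when its address phase is over.
  void address_phase_done() {
    protocol.addressStageBusy = false;
    arbitrate_pending = true;  // a new address may now be transferred
  }

  ToyProtocol protocol;
  std::deque<Request> rq;
  std::vector<std::optional<Request>> active;
  std::vector<int> started;
  bool arbitrate_pending = false;
};
```

With two masters issuing INIT at once, the second one stays queued until the first handler reports that its address phase is done.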

The example implementation should model the behaviour of IBM's CoreConnect PLB (Processor Local Bus).

Implementation Details (Proposal)

This is a very early draft of the spec; it is still based on our request queue concept.

@FZI: please change this to your queueing concept, thanks

@everyone: Feel free to add or change, have fun!

Request and Response Objects

Since the infrastructure makes use of request and response objects (see above), the first step is to implement these objects.

First, gs_request_base and gs_response_base must be implemented. They are template classes (at least the request, where the address type is a template parameter), so that template specialisation can be used. The members of these base classes were already listed above.

Before turning to the PLB-specific request and response objects, a few words on the data to be transferred: to keep the implementation effort as small as possible, we should transfer arrays (or vectors) of 64-bit data. Since the tac API (the one we want to use) has a template data parameter, we should set it to an sc_uint<64> array or vector. The address template parameter should be set to unsigned long (32 bit).
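A sketch of these template parameters, assuming std::uint64_t in place of sc_uint<64> so that it compiles without SystemC. Note that unsigned long is 64 bits wide on LP64 platforms such as 64-bit Linux, so the sketch uses std::uint32_t to guarantee a 32-bit address. The names plb_addr_t, data_hs_req_sketch and is_word_aligned are hypothetical; the specialisation only shows how one address type could get its own behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: std::uint64_t stands in for SystemC's sc_uint<64>.
using word64 = std::uint64_t;

template <typename AddrT>
struct gs_request_base {
  int masterID = 0;       // the design uses a tlm_tag-derived ID here
  AddrT targetAddress{};
  bool rNw = true;        // read-not-write
  virtual ~gs_request_base() = default;
};

// Fixed-width 32-bit address type for the PLB example.
using plb_addr_t = std::uint32_t;

template <typename AddrT>
struct data_hs_req_sketch : gs_request_base<AddrT> {
  std::vector<word64> data;  // payload as 64-bit words
  bool lastXfer = false;
};

// Primary template: no alignment rule for generic address types.
template <typename AddrT>
bool is_word_aligned(AddrT) { return true; }

// Full specialisation for the PLB address type (hypothetical rule:
// 64-bit words must start on an 8-byte boundary).
template <>
bool is_word_aligned<plb_addr_t>(plb_addr_t a) { return (a & 0x7u) == 0; }
```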

After this the plb-specific requests and responses are to be implemented.

The plb init_req must contain:

    • a BE value,
    • a Read/Write flag, (this one resides in the base)
    • a BusLock flag*,
    • a compressed flag**,
    • a guarded flag**,
    • an ordered flag**,
    • a priority value,
    • a burst flag,
    • a size value*,
    • a type value*,
    • a lockErr flag**,
    • a target address (this one resides in the base)

The members marked with one or two stars will certainly not be used in the first versions of the implementation. The members marked with two stars only carry information for the slave and are not used by the bus.
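A possible C++ layout of the PLB init request, with the base-class members inlined so that the snippet stands alone. The 4-bit size and type values are modelled as bit-fields; the widths chosen for be and priority, the struct name and the helper make_single_write are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the PLB-specific init request.
struct plb_init_req {
  // members that reside in gs_request_base in the real design
  int masterID = 0;
  bool rNw = true;                 // read-not-write
  std::uint32_t targetAddress = 0;
  // PLB qualifiers
  std::uint8_t be = 0;             // byte enables
  bool busLock = false;            // *  not used in the first versions
  bool compressed = false;         // ** information for the slave only
  bool guarded = false;            // **
  bool ordered = false;            // **
  unsigned priority = 0;
  bool burst = false;
  unsigned size : 4;               // *  4-bit size value
  unsigned type : 4;               // *  4-bit type value
  bool lockErr = false;            // **

  plb_init_req() : size(0), type(0) {}
};

// Fills the request for a single-beat write (hypothetical helper).
plb_init_req make_single_write(int masterID, std::uint32_t addr, unsigned priority) {
  plb_init_req r;
  r.masterID = masterID;
  r.rNw = false;                   // write
  r.priority = priority;
  r.targetAddress = addr;
  r.be = 0xF;
  return r;
}
```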

The plb data_hs_req must contain:

    • A data field (vector or array) of 64 bit words
    • A lastXfer flag

A finalize_req is not required.

The plb init_resp must contain:

    • An integer encoding the acknowledgement type (this one may reside in gs_response_base); the actual meaning of the values depends on the bus
    • Proposal: 1 equals primary ack, 2 equals secondary ack, 3 equals timeout

The plb data_hs_resp must contain:

    • A data field (vector or array)
    • A word location array or vector (only valid with line reads; for each element of the data field there is an element in this vector, specifying the actual position of the associated word in the requested line; this vector reflects the PLB's rdWdAddr qualifier)

The plb finalize_resp is not required.
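The remaining PLB objects could look like the following sketch. The enumerator names are hypothetical labels for the proposed values 1, 2 and 3, and the reorder helper is an invented illustration of how the word location vector maps each returned word back to its position in the requested line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Proposed acknowledgement encoding for init_resp.
enum ackType_t { PRIMARY_ACK = 1, SECONDARY_ACK = 2, ADDR_TIMEOUT = 3 };

struct plb_data_hs_req {
  std::vector<std::uint64_t> data;  // 64-bit words
  bool lastXfer = false;
};

struct plb_init_resp {
  int ackType = 0;  // may move into gs_response_base
};

struct plb_data_hs_resp {
  std::vector<std::uint64_t> data;
  // Line reads only: wordLocation[i] is the position of data[i] inside the
  // requested line (mirrors the PLB rdWdAddr qualifier).
  std::vector<unsigned> wordLocation;

  bool consistent_for_line_read() const { return wordLocation.size() == data.size(); }

  // Places each returned word at its line position (hypothetical helper).
  std::vector<std::uint64_t> reorder(std::size_t lineWords) const {
    std::vector<std::uint64_t> line(lineWords, 0);
    for (std::size_t i = 0; i < data.size(); ++i) line.at(wordLocation.at(i)) = data[i];
    return line;
  }
};

bool is_known_ack(const plb_init_resp& r) {
  return r.ackType == PRIMARY_ACK || r.ackType == SECONDARY_ACK || r.ackType == ADDR_TIMEOUT;
}
```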

The slave-terminates-burst, the variable-burst-length and the master-and-slave-differ-in-word-sizes features of the plb should be ignored in the first implementation.

This also applies to the slave-signals-busy and slave-signals-error features.

Once these objects are finished, the initiator port and the slave base should be implemented.

Initiator Port and Slave Base

The initiator port translates the tac API into the bus atoms.

A tac write is translated as follows (the following code is pseudo code; explicit casts, template details and the like are omitted):

```cpp
// The port ID is a unique value. There could be a static member of
// the initiator_port class that is incremented whenever an instance
// of this class is created; the value of this static member is
// copied to the non-static port ID.
// IMPORTANT: the port ID must be a class (or struct) derived from tlm_tag!
// Otherwise it cannot be used as an argument for tlm calls.

init_req.masterID = PORTID;
init_req.rNw = false;                   // write (rNw = read-not-write); part of the base class
init_req.busLock = false;
init_req.compressed = false;
init_req.guarded = false;
init_req.ordered = false;
init_req.priority = PORTPRIORITY;       // PORTPRIORITY is fixed during construction of the port
init_req.burst = false;
init_req.size = 0x0;                    // size is a 4-bit value
init_req.type = 0x0;                    // type is also a 4-bit value
init_req.lockErr = false;
init_req.targetAddress = givenAddress;  // address from the tac method call
init_req.be = 0xf;

target_port->put(init_req);
wait(target_port->ok_to_get(PORTID));
init_resp = target_port->get(PORTID);
if (init_resp.ackType == 3) {
  std::cout << "address time_out! in port " << PORTID << std::endl;
  return false;  // here the error tac_status must be returned
}
if (init_resp.ackType != 1 && init_resp.ackType != 2) {
  std::cout << "unknown response! in port " << PORTID << std::endl;
  sc_stop();     // fatal error
  return false;
}

data_hs_req.data = givenData;  // only possible because the input data is already a 64-bit value
data_hs_req.lastXfer = true;   // it's a single write
data_hs_req.masterID = PORTID;
target_port->put(data_hs_req);
wait(target_port->ok_to_get(PORTID));
data_hs_resp = target_port->get(PORTID);
// no if-else is required: PLB data transfers cannot time out and slave-signals-error is ignored

return true;  // here the OK tac_status must be returned
```
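The port ID scheme described in the comments at the top of this code can be sketched with a static counter. Here port_id is a hypothetical stand-in for the tlm_tag-derived ID class, initiator_port_sketch is an invented name, and the counter is an inline static member (C++17). The sketch assumes ports are constructed from a single thread, as during ordinary SystemC elaboration.

```cpp
#include <cassert>

// Hypothetical stand-in for the tlm_tag-derived port ID.
struct port_id {
  unsigned value;
  unsigned toInt() const { return value; }
};

class initiator_port_sketch {
 public:
  initiator_port_sketch() : m_id{s_next_id++} {}  // take the next value, then increment
  port_id id() const { return m_id; }

 private:
  inline static unsigned s_next_id = 0;  // shared by all instances
  port_id m_id;                          // per-instance copy taken at construction
};
```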

A tac write_block(...) looks the same, but the data_hs part is placed in a for loop that repeats the data_hs transfer until all data is transferred (lastXfer is set to true for the last data_hs transfer and to false otherwise). Furthermore, in the init phase the burst flag is set to true, the size value is set to 0xB and the BE value no longer matters.
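A minimal sketch of how a write_block could be split into data_hs transfers, assuming plain 64-bit words; init_fields, data_hs_beat, write_block_init and write_block_beats are hypothetical names. In the real port, each beat would become one put / wait(ok_to_get) / get round trip as in the single write above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct init_fields {  // only the init fields that write_block changes
  bool rNw = true;
  bool burst = false;
  unsigned size = 0x0;
};

struct data_hs_beat {
  std::uint64_t data;
  bool lastXfer;
};

init_fields write_block_init() {
  init_fields f;
  f.rNw = false;  // write
  f.burst = true;
  f.size = 0xB;   // value given in the text for burst writes
  return f;       // BE is don't-care for bursts
}

// One data_hs transfer per word; lastXfer is set only on the final one.
std::vector<data_hs_beat> write_block_beats(const std::vector<std::uint64_t>& words) {
  std::vector<data_hs_beat> beats;
  beats.reserve(words.size());
  for (std::size_t i = 0; i < words.size(); ++i)
    beats.push_back({words[i], i + 1 == words.size()});
  return beats;
}
```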

The slave_base does the opposite of the initiator port: it reconstructs the tac API calls from the incoming puts and gets. (Therefore the slave cannot add delays between phases. This is only possible when the slave is modelled at a lower level of abstraction and no longer uses the tac API.) The slave_base must have members that can store all API-related information (like data, address, be, rNw, ...).

```cpp
// req is passed by reference so that derived (PLB-specific) requests are not sliced
void slave_base::put(const gs_request_base& req) {
  switch (req.type) {
    case INIT:
      // store be, rNw, burst, address, ... and reset the burst counter: m_i = 0
      break;
    case DATA_HS:
      if (m_rNw) {
        if (!m_burst) {
          tac_read(...);  // call the slave's read implementation and get the data (called tac_data below)
          m_data_hs_resp.data = tac_data;
        } else {          // burst
          if (m_i == 0) tac_read_block(...);  // read the whole block when the first word is requested
          m_data_hs_resp.data = tac_data[m_i++];
        }
        m_canGet = true;
        m_ok_to_get_event.notify();
      } else {
        // write ...
      }
      break;
  }
}
```

    • ok_to_get() returns a reference to the m_ok_to_get_event
    • get() simply returns m_data_hs_resp and sets m_canGet to false
    • nb_can_get() returns m_canGet
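The get side of the slave_base listed above behaves like a one-entry mailbox. In this sketch the notification counter is a hypothetical stand-in for m_ok_to_get_event.notify(), and slave_response_slot and publish are invented names.

```cpp
#include <cassert>
#include <cstdint>

struct data_hs_resp {
  std::uint64_t data = 0;
};

class slave_response_slot {
 public:
  // Called where put() has produced a response.
  void publish(const data_hs_resp& r) {
    m_data_hs_resp = r;
    m_canGet = true;
    ++m_notifications;  // stands in for m_ok_to_get_event.notify()
  }
  bool nb_can_get() const { return m_canGet; }
  data_hs_resp get() {
    m_canGet = false;   // the response is consumed
    return m_data_hs_resp;
  }
  int notifications() const { return m_notifications; }

 private:
  data_hs_resp m_data_hs_resp;
  bool m_canGet = false;
  int m_notifications = 0;
};
```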

We may need two sets of API-related information: while one master is exchanging data with the slave, another master may start its addressing phase, which would overwrite information still needed in the first master's data phase. So I think that once addressing is finished, the gathered information should be copied to a separate data-stage set.
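The two-set idea can be sketched as follows, assuming the address-phase information fits in a small struct; transfer_info and slave_stage_info are hypothetical names. This simple copy assumes the previous data phase has completed before the next address phase finishes; with deeper pipelining a queue of data-stage sets might be needed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct transfer_info {  // API-related information gathered in the INIT phase
  int masterID;
  std::uint32_t address;
  bool rNw;
  bool burst;
};

class slave_stage_info {
 public:
  void address_phase(const transfer_info& info) { m_addr_stage = info; }

  // Promote the address-stage set to the data stage, freeing the address
  // stage for the next master.
  void address_phase_finished() {
    m_data_stage = m_addr_stage;
    m_addr_stage.reset();
  }

  const std::optional<transfer_info>& data_stage() const { return m_data_stage; }
  const std::optional<transfer_info>& addr_stage() const { return m_addr_stage; }

 private:
  std::optional<transfer_info> m_addr_stage, m_data_stage;
};
```

A second master can now address the slave while the first master's data-phase information stays intact.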

Router and Bus Protocol

Now for the complicated part: the bus model.

    • The router contains one thread for each connected master port. This can be achieved by creating the threads dynamically. To make the first implementation a little easier, we could fix the number of these threads and implement them manually (which means a lot of copy'n'paste; ugly but easy).
    • The router contains an arbiter thread.
    • The router contains two arrays of events: one ok_to_get_event and one thread_start_event for each thread.
    • The idea is that the PORTID of a port defines which thread handles the port and furthermore the call event_array[PORTID.toInt()] should return the event that is associated with this master port.
    • The router is a template class where the template parameter T defines the type of the address.
    • The router has an sc_port<bus_protocol_if<T> > let's name it bfm_port.
    • The router contains a request queue (rq).
    • The router contains a response vector (rspv; one vector entry for each thread).
    • The router contains an active_request vector (arv, which stores the active request of each thread).
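A sketch of the router's per-master bookkeeping from the list above, with every per-master container indexed by PORTID.toInt(). event_stub is a hypothetical stand-in for sc_event that only counts notifications; router_state, start_handler and port_id are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct port_id {
  unsigned value;
  unsigned toInt() const { return value; }
};
struct request { port_id masterID; int phase; };
struct response { int respType = 0; };
struct event_stub {  // stand-in for sc_event
  int notified = 0;
  void notify() { ++notified; }
};

class router_state {
 public:
  explicit router_state(std::size_t masters)
      : ok_to_get_event(masters), thread_start_event(masters), rspv(masters), arv(masters) {}

  event_stub& ok_to_get(port_id id) { return ok_to_get_event.at(id.toInt()); }

  // What the arbiter does for a granted request.
  void start_handler(const request& r) {
    arv.at(r.masterID.toInt()) = r;
    thread_start_event.at(r.masterID.toInt()).notify();
  }

  std::deque<request> rq;                     // request queue
  std::vector<event_stub> ok_to_get_event;    // one per master port
  std::vector<event_stub> thread_start_event; // one per handler thread
  std::vector<response> rspv;                 // response vector
  std::vector<request> arv;                   // active request vector
};
```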

Writing this, I get the feeling there is some potential for optimisation...

The router implements the tlm_blocking_put_if and tlm_get_if interfaces. This means the methods put(), nb_get(), nb_can_get(), ok_to_get() and get() must be implemented, although not all of them are used.

```cpp
// Remember: this is PSEUDO code! Template parameter details are mostly ignored.
bool put(const gs_request_base& req) {
  if (!rq.insert(req)) return false;
  do_arbitrate.notify();  // wake the arbiter (the arbitrate_event described above)
  return true;
}

sc_event& ok_to_get(tlm_tag<T> ID) {
  return ok_to_get_event_array[ID.toInt()];  // toInt must be implemented in the ID class or struct
                                             // (which is derived from tlm_tag, see above)
}

gs_response_base get(tlm_tag<T> ID) {
  return rspv.getElement(ID.toInt());
}
```

The arbiter is a thread sensitive to the do_arbitrate event. As long as the bus_protocol can retrieve valid requests from the request queue, the arbiter thread passes them to the handler threads and notifies the corresponding start events:

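```cpp
void arbiter() {  // SC_THREAD
  gs_request_base req;
  for (;;) {
    // hand every request the bus protocol currently considers valid to its handler thread
    while (bfm_port->getRequest(req)) {
      arv.getElement(req.masterID.toInt()) = req;
      thread_start_event_array[req.masterID.toInt()].notify();
    }
    wait(do_arbitrate);  // sleep until put() or a finished phase triggers arbitration again
  }
}
```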

A request handler thread is sensitive to its thread start event. It is generic for all buses, so it has to cover all phases, even those that are not part of the PLB:

```cpp
void request_handler_1() {  // SC_THREAD, woken by thread_start_event_array[1]
  for (;;) {
    wait(thread_start_event_array[1]);
    switch (arv.getElement(1).requestType()) {  // requestType is a member of gs_request_base
      case INIT:
        bfm_port->requestBus(1);
        rspv.getElement(1) = bfm_port->InitXferToTarget(1, arv.getElement(1).targetAddress,
                                                        arv.getElement(1).rNw);
        break;
      case DATA_HS:
        if (arv.getElement(1).rNw)
          rspv.getElement(1) = bfm_port->ReadFromTarget(1, arv.getElement(1).targetAddress);
        else
          rspv.getElement(1) = bfm_port->WriteToTarget(1, arv.getElement(1).targetAddress);
        break;
      case FINALIZE:
        rspv.getElement(1) = bfm_port->FinalizeXferFromTarget(1);
        bfm_port->releaseBus(1);
        break;
    }
    ok_to_get_event_array[1].notify();  // the response for master 1 is ready
    do_arbitrate.notify();              // trigger the arbiter, since a phase is finished
  }
}
```
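The handler's phase dispatch is independent of the bus, which a small mock makes visible. This sketch declares a hypothetical bus_protocol_if using the proposed callback names (with int IDs and unsigned addresses instead of tlm_tag and T) plus a recording mock; in the router, the returned response would go into rspv, followed by the two notify() calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Phase { INIT, DATA_HS, FINALIZE };
struct request { int masterID; Phase phase; unsigned address; bool rNw; };
struct response { int respType = 0; };

// Hypothetical callback interface following the proposed names.
struct bus_protocol_if {
  virtual ~bus_protocol_if() = default;
  virtual void requestBus(int id) = 0;
  virtual response InitXferToTarget(int id, unsigned addr, bool rNw) = 0;
  virtual response ReadFromTarget(int id, unsigned addr) = 0;
  virtual response WriteToTarget(int id, unsigned addr) = 0;
  virtual response FinalizeXferFromTarget(int id) = 0;
  virtual void releaseBus(int id) = 0;
};

// One pass of a request handler: it only chooses which callbacks to invoke.
response handle(bus_protocol_if& bfm, const request& r) {
  switch (r.phase) {
    case Phase::INIT:
      bfm.requestBus(r.masterID);
      return bfm.InitXferToTarget(r.masterID, r.address, r.rNw);
    case Phase::DATA_HS:
      return r.rNw ? bfm.ReadFromTarget(r.masterID, r.address)
                   : bfm.WriteToTarget(r.masterID, r.address);
    case Phase::FINALIZE: {
      response resp = bfm.FinalizeXferFromTarget(r.masterID);
      bfm.releaseBus(r.masterID);
      return resp;
    }
  }
  return response{};  // not reached for valid phases
}

// Records the callback sequence so the dispatch can be checked.
struct recording_bfm : bus_protocol_if {
  std::vector<std::string> log;
  void requestBus(int) override { log.push_back("requestBus"); }
  response InitXferToTarget(int, unsigned, bool) override { log.push_back("init"); return {}; }
  response ReadFromTarget(int, unsigned) override { log.push_back("read"); return {}; }
  response WriteToTarget(int, unsigned) override { log.push_back("write"); return {}; }
  response FinalizeXferFromTarget(int) override { log.push_back("finalize"); return {}; }
  void releaseBus(int) override { log.push_back("releaseBus"); }
};
```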

The router implements the whole put/get interface once more, but with a "do_" prefix. These methods invoke the transfers to the targets. Examples:

```cpp
void do_put(tlm_tag<T> ID, int port_rank) {
  // port_rank is necessary since the router is connected to many slaves
  target_port[port_rank]->put(arv.getElement(ID.toInt()));
}

sc_event& do_ok_to_get(tlm_tag<T> ID, int port_rank) {
  return target_port[port_rank]->ok_to_get();
}

bool do_get(tlm_tag<T> ID, int port_rank) {
  if (!target_port[port_rank]->nb_can_get()) return false;
  rspv.getElement(ID.toInt()) = target_port[port_rank]->get();
  return true;
}
```
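The decode() step that yields port_rank could be a simple address-map lookup. This sketch assumes each slave owns one contiguous address range; address_range and the example map contents are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical address map entry: one per slave connected to the router.
struct address_range {
  std::uint32_t base;
  std::uint32_t size;  // bytes
  int port_rank;       // index into target_port[]
};

// Returns the port rank of the slave whose range contains addr, if any.
std::optional<int> decode(const std::vector<address_range>& map, std::uint32_t addr) {
  for (const address_range& r : map)
    if (addr >= r.base && addr - r.base < r.size) return r.port_rank;  // no overflow at range end
  return std::nullopt;  // unmapped: no slave answers, so the address phase would time out
}
```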

That's it for the router.

The router is the generic part.

The PLB stuff is implemented in the bus_protocol class. The names, functions and number of callbacks to the bfm_port are only proposals. After reviewing more contribs and analysing more busses, we may come up with a final generic set of callback functions.

The bus_protocol class contains no thread; it just implements the bus_protocol_if and contains a bus_state struct and maybe some helper methods. The bus_state member may be called state.

Some enums that could be used to model the PLB's state:

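```cpp
enum addrStageState {IDLE = 0, BUSY};
// PRIM, SEC and STALL may only be declared once per scope, so both virtual
// address stages share one enum
enum virtualAddrStageState {PRIM = 0, SEC, STALL};
typedef virtualAddrStageState virtualReadAddrStageState;
typedef virtualAddrStageState virtualWriteAddrStageState;
typedef addrStageState wrDataStageState;
typedef addrStageState rdDataStageState;
```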

A struct that may be able to reflect all of the PLB's states:

```cpp
struct plb_bus_state {
  addrStageState ass;
  virtualReadAddrStageState vrass;
  virtualWriteAddrStageState vwass;
  wrDataStageState wdss;
  rdDataStageState rdss;
  tlm_tag active_prim_rd_master;
  tlm_tag active_prim_wr_master;
  tlm_tag active_sec_rd_master;
  tlm_tag active_sec_wr_master;

  void reset() {
    ass = IDLE;
    vrass = PRIM;
    vwass = PRIM;
    wdss = IDLE;
    rdss = IDLE;
    active_prim_rd_master = 255;  // a value indicating that no master is active
    active_prim_wr_master = 255;
    active_sec_rd_master = 255;
    active_sec_wr_master = 255;
  }
};
```

An example of a Callback function:

```cpp
gs_response_base InitXferToTarget(tlm_tag ID, T targetAddress, bool rNw) {
  if (rNw) {  // read
    if (state.vrass == PRIM) {
      int port_rank = decode(targetAddress);
      state.vrass = SEC;
      state.ass = BUSY;
      wait(ADDRESS_DELAY);  // PLB-specific delay
      router.do_put(ID, port_rank);
      // SystemC's wait(time, event) returns on the event or after the timeout, whichever comes first
      wait(PLB_ADDRESS_TIME_OUT, router.do_ok_to_get(ID, port_rank));
      if (!router.do_get(ID, port_rank))
        router.rspv.getElement(ID.toInt()).responseType = timeout;
      state.ass = IDLE;
    } else {  // state.vrass == SEC
      ...
    }
  } else {    // write
    ...
  }
  return router.rspv.getElement(ID.toInt());
}
```

In this way, every access to the callbacks updates the bus state, checks possible timeouts and so on. The most critical function will be the get_request method. It must check the whole request queue for new init_requests (only if the address stage is IDLE and the corresponding virtual stage is not stalled) and for data_hs_requests that match the active master ID and can be processed. Every call has to update the bus state. For example, a call to WriteToTarget has to update the state of the virtual address stage and of the data stage.
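The selection rule described for get_request can be sketched as follows. This is a simplified model under stated assumptions: masters are plain ints (255 meaning none, as in reset()), only primary stages are considered, and a granted request simply marks the address or data stage BUSY. Releasing stages and secondary transfers are left out, and the names req, bus_state and NO_MASTER are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <optional>

enum addrStageState { IDLE = 0, BUSY };
enum virtualAddrStageState { PRIM = 0, SEC, STALL };

struct req { int masterID; bool isInit; bool rNw; };

constexpr int NO_MASTER = 255;

struct bus_state {
  addrStageState ass = IDLE;
  virtualAddrStageState vrass = PRIM, vwass = PRIM;
  addrStageState rdss = IDLE, wdss = IDLE;
  int active_prim_rd_master = NO_MASTER, active_prim_wr_master = NO_MASTER;
};

// Scans the whole queue and removes the first request the bus state allows.
std::optional<req> get_request(std::deque<req>& rq, bus_state& s) {
  for (auto it = rq.begin(); it != rq.end(); ++it) {
    const req r = *it;
    bool ok;
    if (r.isInit) {
      // new address: address stage idle and matching virtual stage not stalled
      virtualAddrStageState v = r.rNw ? s.vrass : s.vwass;
      ok = s.ass == IDLE && v != STALL;
      if (ok) s.ass = BUSY;
    } else {
      // data handshake: must belong to the active master of an idle data stage
      int active = r.rNw ? s.active_prim_rd_master : s.active_prim_wr_master;
      addrStageState& ds = r.rNw ? s.rdss : s.wdss;
      ok = r.masterID == active && ds == IDLE;
      if (ok) ds = BUSY;
    }
    if (ok) {
      rq.erase(it);
      return r;
    }
  }
  return std::nullopt;
}
```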

Since I am quite familiar with the PLB, I think it would be a good idea for me to set up the callback methods (especially the bus state handling) once the generic part is finished.

I hope we now have something to discuss.