Compute nodes of a parallel computer organized for collective operations via a network, each compute node having a receive buffer and establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.