The Dynamically-scheduled Reduced Instruction Set Computer (DRISC) architecture has been used to explore how Instruction-Level Parallelism (ILP) can be traded off, and how multithreading can be provisioned with simpler circuits, to overcome the limits imposed by hardware complexity and power dissipation. Its simplicity and elegance remain attractive and promising as a basis for multi-core or many-core chips that deliver high system throughput through multithreading and in-order issue. However, although an individual DRISC core can perform remarkably well given sufficient concurrency, as part of a many-core system it suffers from the sharing of on-chip interconnects and off-chip pins in the context of massive parallelism. The resulting bandwidth problem, which is shared by other many-core research designs, is detrimental to system scalability. This thesis extends this prior work by investigating system performance and by broadening the application of this processor design.

Domain-specific computing may also exploit multiple cores to accelerate jobs; real-time processing is a good example, although it typically involves only modest numbers of cores. The question explored here is whether we can enhance the performance of specific single-threaded or multi-threaded real-time tasks while still meeting their timing requirements and maintaining high efficiency, even under a general workload. The results from the time-multiplexed execution of normal tasks alongside periodic real-time benchmarks highlight the benefits of the proposed strategy with hardware prioritization, and confirm the wisdom of reserving thread slots rather than entire cores.