Objects are abstractions of processing. Threads are abstractions of schedule.
Concurrent code is difficult to get right. Code that is simple to follow can become hard to understand when multiple threads and shared data get into the mix. If you are faced with writing concurrent code, you need to write clean code with rigor or else face infrequent failures.
If you take a clean approach, your chances of getting it right increase drastically.
Concurrency is a decoupling strategy: it helps us decouple what gets done from when it gets done. In a single-threaded application, you can usually determine the state of the program just by looking at the stack trace, and stepping through it with breakpoints is straightforward.
Writing multithreaded code can improve both the structure and the performance of your program.
Imagine you have a new website that shows news from all the other websites combined. You have a web scraper that scrapes news from different websites one after the other.
All this has to be done every hour to keep your content up to date. As the number of news sources increases, it will eventually take your scraper more than an hour to scrape data from all of them. Building a multithreaded web scraper improves performance here, because the scraper spends most of its time waiting on the network and those waits can overlap.
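As a minimal sketch of the idea, the class below scrapes several sources on a fixed-size thread pool instead of one after the other. The names ConcurrentScraper and scrapeAll, and the fetch function that stands in for the real download and parsing step, are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ConcurrentScraper {
    // Scrapes every source on a fixed-size pool and returns results in source order.
    // 'fetch' is a hypothetical stand-in for the real HTTP download and parsing step.
    static List<String> scrapeAll(List<String> sources, Function<String, String> fetch, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String source : sources) {
                Callable<String> task = () -> fetch.apply(source);
                pending.add(pool.submit(task)); // all downloads are now in flight
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that source is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scraping", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a source failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the total time tends toward the slowest batch of sources rather than the sum of all of them, provided the work is dominated by waiting on the network.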
Keep your concurrency-related code separate from other code.
Your thread-aware code should be small and focused.
Make sure when you are testing your thread-aware code, you are only testing it and nothing else.
Take data encapsulation to heart; severely limit the access of any data that may be shared.
Use the synchronized keyword to protect a critical section in the code that uses the shared object.
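A small sketch of such a critical section follows. SharedCounter and runWorkers are hypothetical names; the point is only that the read-modify-write on the shared field happens inside synchronized methods and the field itself is never exposed.

```java
class SharedCounter {
    private int value = 0; // shared state, private so access stays limited

    // The read-modify-write below is the critical section.
    public synchronized int incrementAndGet() {
        value = value + 1;
        return value;
    }

    public synchronized int get() {
        return value;
    }

    // Starts 'threads' workers that each increment the same counter 'perThread' times.
    static int runWorkers(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }
}
```

Without the synchronized modifier on incrementAndGet, two threads could read the same value and one update would be lost.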
A good way to avoid shared data is to avoid sharing the data in the first place. In some situations it is possible to copy objects and treat them as read-only. In other cases it might be possible to copy objects, collect results from multiple threads in these copies and then merge the results in a single thread.
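The copy-then-merge approach could look roughly like the sketch below, assuming a word-counting job invented for illustration. Each task works on its own copy of the input and fills its own private map; only the calling thread merges the partial results. CopyAndMerge and countWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CopyAndMerge {
    // Counts words per chunk in parallel, then merges in the calling thread only.
    static Map<String, Integer> countWords(List<List<String>> chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, chunks.size()));
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> chunk : chunks) {
                List<String> copy = new ArrayList<>(chunk); // private copy, only read by the task
                Callable<Map<String, Integer>> task = () -> {
                    Map<String, Integer> local = new HashMap<>(); // never shared while being filled
                    for (String word : copy) {
                        local.merge(word, 1, Integer::sum);
                    }
                    return local;
                };
                partials.add(pool.submit(task));
            }
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                for (Map.Entry<String, Integer> e : partial.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum); // single-threaded merge
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

No lock appears anywhere in the counting code, because no map is ever visible to more than one thread at a time.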
Consider writing your threaded code such that each thread exists in its own world, sharing no data with any other thread. Each thread processes one client request, with all of its required data coming from an unshared source and stored as local variables. This makes each of those threads behave as if it were the only thread in the world and there were no synchronization requirements.
Attempt to partition data into independent subsets that can be operated on by independent threads, possibly on different processors.
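One way to picture partitioning, sketched under the assumption of a simple array-summing job: each worker gets its own index range and writes only to its own slot of a results array, so the workers never touch the same data. PartitionedSum is a hypothetical name.

```java
class PartitionedSum {
    // Splits 'data' into 'parts' disjoint index ranges and sums each on its own thread.
    static long sum(int[] data, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be at least 1");
        }
        long[] partial = new long[parts];
        Thread[] workers = new Thread[parts];
        int chunk = (data.length + parts - 1) / parts; // ceiling division
        for (int p = 0; p < parts; p++) {
            final int id = p;
            final int from = Math.min(data.length, p * chunk);
            final int to = Math.min(data.length, from + chunk);
            workers[p] = new Thread(() -> {
                long local = 0;
                for (int i = from; i < to; i++) {
                    local += data[i];
                }
                partial[id] = local; // each worker writes only its own slot
            });
            workers[p].start();
        }
        long total = 0;
        for (int p = 0; p < parts; p++) {
            try {
                workers[p].join(); // join makes the worker's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while summing", e);
            }
            total += partial[p];
        }
        return total;
    }
}
```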
Learn your library and know the fundamental algorithms. Understand how some of the features offered by the library support solving problems similar to the fundamental algorithms.
Understand some basic definitions:
- Bound Resources: Resources of a fixed size or number used in a concurrent environment. Examples include database connections and fixed-size read/write buffers.
- Mutual Exclusion: Only one thread can access shared data or a shared resource at a time.
- Starvation: One thread or group of threads is prohibited from proceeding for an excessively long time or forever. For example, always letting fast-running threads through first could starve out longer-running threads if there is no end to the fast-running threads.
- Deadlock: Two or more threads waiting for each other to finish. Each thread has a resource that the other thread requires, and neither can finish until it gets the other resource.
- Livelock: Threads in lockstep, each trying to do work but finding another "in the way". Due to resonance, threads continue trying to make progress but are unable to for an excessively long time or forever.
Some execution models used in concurrent programming:
- Dining Philosophers
There are several things to consider when writing threaded code in Java:
- Use the provided thread-safe collections.
- Use the executor framework for executing unrelated tasks.
- Use nonblocking solutions when possible.
- Several library classes are not thread safe.
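The first three points can be combined in a short sketch. It uses ConcurrentHashMap as the thread-safe collection, an ExecutorService to run unrelated tasks, and AtomicLong as a nonblocking counter, all from java.util.concurrent. The HitCounter class and the page names are invented for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
    private final AtomicLong total = new AtomicLong();

    void record(String page) {
        hits.merge(page, 1, Integer::sum); // atomic per-key update
        total.incrementAndGet();           // nonblocking increment
    }

    int hitsFor(String page) {
        return hits.getOrDefault(page, 0);
    }

    long total() {
        return total.get();
    }

    // Records 'tasks' hits, alternating between two hypothetical pages, on a pool of 'threads'.
    static HitCounter simulate(int tasks, int threads) {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            String page = (i % 2 == 0) ? "/home" : "/about";
            pool.execute(() -> counter.record(page));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter;
    }
}
```

A plain HashMap in place of ConcurrentHashMap would be an instance of the last point: it is not thread safe, and concurrent updates could corrupt it.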
Dependencies between synchronized methods cause subtle bugs in concurrent code.
Avoid using more than one method on a shared object.
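To illustrate, consider an iterator shared between threads. If clients call a "has next" check and then a separate "next" method, another thread can slip in between the two calls even when both methods are synchronized. The sketch below, with hypothetical names, folds both steps into one synchronized method so that clients need only a single call.

```java
import java.util.concurrent.atomic.AtomicLong;

class IntegerIterator {
    private int nextValue = 0;
    private final int limit;

    IntegerIterator(int limit) {
        this.limit = limit;
    }

    // Combines the "is there more?" check and the advance under one lock.
    // Returns null once the values 0..limit-1 are exhausted.
    public synchronized Integer getNextOrNull() {
        if (nextValue < limit) {
            return nextValue++;
        }
        return null;
    }

    // Drains one shared iterator from several threads and returns the sum of all values seen.
    static long drainSum(int limit, int threads) {
        IntegerIterator iterator = new IntegerIterator(limit);
        AtomicLong sum = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                Integer value;
                while ((value = iterator.getNextOrNull()) != null) {
                    sum.addAndGet(value);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining", e);
            }
        }
        return sum.get();
    }
}
```

Because each value is handed out exactly once, the sum is the same no matter how the threads interleave.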
Learn how to find regions of code that must be locked and lock them. Do not lock regions of code that do not need to be locked. Avoid calling one locked section from another. This requires a deep understanding of whether something is or is not shared.
Keep the amount of shared objects and the scope of the sharing as narrow as possible.
Change designs of the objects with shared data to accommodate clients rather than forcing clients to manage shared state.
Writing a system that is meant to stay live and run forever is different from writing something that works for a while and then shuts down gracefully.
Graceful shutdown can be hard to get correct. Common problems involve deadlock, with threads waiting for a signal to continue that never comes.
If you must write concurrent code that involves shutting down gracefully, expect to spend much of your time getting the shutdown to happen correctly.
Think about shutdown early and get it working early. It is going to take longer than you expect. Review existing algorithms, because this is probably harder than you think.
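For code built on the executor framework, one commonly used shutdown shape is sketched below: stop accepting work, wait a bounded time for running tasks, and only then interrupt whatever is left. GracefulShutdown and its timeout parameter are assumptions for illustration, and real systems usually need more than this, for example tasks that actually respond to interruption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class GracefulShutdown {
    // Returns true if every submitted task finished within the timeout.
    // Otherwise interrupts the remaining workers and returns false.
    static boolean shutdown(ExecutorService pool, long timeoutMillis) {
        pool.shutdown(); // reject new tasks, let already submitted ones finish
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // interrupt workers that are still running
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

The bounded wait matters: a thread waiting for a signal that never comes would otherwise keep the shutdown hanging forever.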
Proving that code is correct is impractical. Testing does not guarantee correctness. However, good testing can minimize risk.
Write tests that have the potential to expose problems and then run them frequently, with different programmatic configurations, system configurations, and load. If tests ever fail, track down the failure. Don’t ignore a failure just because the tests pass on a subsequent run.
Threading bugs typically appear only under load or at seemingly random times.
Here are a few more fine-grained recommendations:
- Treat spurious failures as candidate threading issues: Do not ignore system failures as one-offs.
- Get your nonthreaded code working first: Do not try to chase down nonthreading bugs and threading bugs at the same time. Make sure your code works outside of threads.
- Make your threaded code pluggable: Make your thread-based code especially pluggable so that you can run it in various configurations.
- Make your threaded code tunable: Allow the number of threads to be easily tuned. Consider allowing it to change while the system is running. Consider allowing self-tuning based on throughput and system utilization.
- Run with more threads than processors: The more frequently your tasks swap, the more likely you’ll encounter code that is missing a critical section or causes deadlock.
- Run on different platforms: Run your threaded code on all target platforms early, repeatedly and continuously.
- Instrument your code to try and force failures:
- You will greatly improve your chances of finding erroneous code if you take the time to instrument your code. You can do so either by hand or with automated tooling such as an aspect-oriented framework or CGLIB. Invest in this early. You want to be running your thread-based code as long as possible before you put it into production.
- You can instrument your code and force it to run in different orderings by adding calls to methods like