Identify non-cancelable background tasks in Eclipse
Two weeks ago I was in Paris for Devoxx France 2016, where I presented what’s new in the upcoming Eclipse release (aka Neon, to be released in June). During the talk, I was asked whether Eclipse will eventually cancel a background task (a job, in Eclipse terminology) when asked to. Who has never fumed at a progress bar stating that cancelation has been requested while the task just keeps running?
I explained that Eclipse (the platform) can’t do much about the situation. The contract of the job API is clear: when a user requests a cancelation, the job is informed of this request through its progress monitor. It is up to job implementations to check for cancelation requests. They should do so on a regular basis, but there is no way to force them to. If a job does not check and therefore does not exit quickly after a cancelation request, it stays in the Cancel Requested state until it stops. You may wonder: why doesn’t the platform have a kill switch for jobs? Why has the API contract never been changed to allow forced cancelation of jobs? Well, the answer is rather simple: jobs are based on Java threads, and it is inherently unsafe to stop a thread from the outside.
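To make this cooperative contract concrete, here is a minimal sketch of a job body that honors cancelation. It deliberately avoids the Eclipse API so it runs standalone: `ProgressMonitor`, `SimpleMonitor`, `Result` and `CountingJob` are hypothetical stand-ins. In real plug-in code, the loop would live in `Job.run(IProgressMonitor)`, poll `monitor.isCanceled()` and return `Status.CANCEL_STATUS`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Eclipse's IProgressMonitor: only the cancelation flag is modeled.
interface ProgressMonitor {
    boolean isCanceled();
    void setCanceled(boolean canceled);
}

// Thread-safe flag, so another thread can request cancelation while the job thread polls it.
class SimpleMonitor implements ProgressMonitor {
    private final AtomicBoolean canceled = new AtomicBoolean(false);

    @Override
    public boolean isCanceled() { return canceled.get(); }

    @Override
    public void setCanceled(boolean value) { canceled.set(value); }
}

// Hypothetical stand-in for IStatus (OK vs. CANCEL_STATUS).
enum Result { OK, CANCEL }

// A job body that checks for cancelation before every unit of work.
class CountingJob {
    int processed = 0;

    Result run(ProgressMonitor monitor, int items) {
        for (int i = 0; i < items; i++) {
            // Cooperative cancelation: the request is only honored here, never forced.
            if (monitor.isCanceled()) {
                return Result.CANCEL;
            }
            processed++; // stands in for one unit of real work
        }
        return Result.OK;
    }
}
```

Requesting cancelation only flips a flag. If the loop body never reaches the `isCanceled()` check (for instance, a single long blocking call), the job stays in the Cancel Requested state, which is exactly the situation described above.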
So, are we doomed to live with non-cancelable jobs? Is there a way to improve the user experience? The only way is to help plug-in developers use the job API better and get them to check for cancelation requests more often. This is a collective effort. As a user, you can give this feedback to the projects you use. Unfortunately, this is not easy: you usually notice that a project fails to check for cancelation only when you actually try to cancel a task and it does not respond. This leads to two issues:
- you are already suffering a bad user experience (problem 1),
- you don’t have much information to give the project about the task that is not cancelable (problem 2).
After the conference, I decided to improve the situation. First, I searched for bugs related to the cancelability of jobs. I found one reported by Alex Blewitt. His idea is to let users know the last time a job checked for a cancelation request. I pushed the reasoning a bit further and implemented a job monitoring system. The idea is similar to the UI responsiveness monitoring implemented last year: when a job does not check for cancelation requests often enough (or, even worse, does not check at all), an error or a warning is logged, depending on thresholds.
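The monitoring idea can be sketched as a tracker that measures the longest gap between two cancelation checks and classifies it against a warning and an error threshold. This is not the code of the actual patch: `CancelationGapTracker`, its methods and its threshold parameters are hypothetical, and the clock is injected so the logic can be tested. In a real setup, a wrapper around the job’s progress monitor would presumably call `onCancelationCheck()` from its `isCanceled()`.

```java
import java.util.function.LongSupplier;

// Hypothetical tracker: measures the longest gap between two cancelation checks of one job.
class CancelationGapTracker {
    enum Severity { NONE, WARNING, ERROR }

    private final LongSupplier clock;      // current time in milliseconds (injectable)
    private final long warningThresholdMs; // hypothetical preference values
    private final long errorThresholdMs;
    private long lastCheckMs;
    private long longestGapMs = 0;

    CancelationGapTracker(LongSupplier clock, long warningThresholdMs, long errorThresholdMs) {
        this.clock = clock;
        this.warningThresholdMs = warningThresholdMs;
        this.errorThresholdMs = errorThresholdMs;
        this.lastCheckMs = clock.getAsLong(); // the job start counts as the first "check"
    }

    // Called each time the job polls its monitor for cancelation.
    void onCancelationCheck() {
        long now = clock.getAsLong();
        longestGapMs = Math.max(longestGapMs, now - lastCheckMs);
        lastCheckMs = now;
    }

    // Called when the job finishes; the time after the last check counts as a gap too.
    Severity onJobDone() {
        onCancelationCheck();
        if (longestGapMs >= errorThresholdMs) return Severity.ERROR;
        if (longestGapMs >= warningThresholdMs) return Severity.WARNING;
        return Severity.NONE;
    }

    long longestGapMs() { return longestGapMs; }
}
```

Treating the end of the job as a final check means that a job that never polls its monitor is reported with a gap equal to its whole running time, which covers the "does not check at all" case.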
The feature can be turned off, and the thresholds are configurable in a preference page. As the system constantly monitors all jobs, it lets users and developers identify tasks that don’t respond to cancelation requests fast enough, without actually suffering the bad user experience of canceling them. Problem 1 solved!
As the amount of code executed by a job can be huge, identifying where cancelation checks should be added can be cumbersome. To help with that, the system also logs stack samples before and after the longest periods without a cancelation check. This helps projects identify where cancelation checks should be added (problem 2 solved).
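Bracketing the gap can be sketched the same way: take a stack sample at every cancelation check and keep the two samples on either side of the longest gap. Again, `GapStackRecorder` and its methods are hypothetical, and this is only one plausible approach. The sampler is injected so a test can feed synthetic frames; in a monitor called from the job’s own thread it could be `() -> Thread.currentThread().getStackTrace()`.

```java
import java.util.function.Supplier;

// Hypothetical recorder: takes a stack sample at each cancelation check and keeps the
// two samples that bracket the longest gap between consecutive checks.
class GapStackRecorder {
    private final Supplier<StackTraceElement[]> sampler;
    private long lastCheckMs;
    private StackTraceElement[] lastSample = new StackTraceElement[0]; // empty = job start
    private long longestGapMs = -1;
    private StackTraceElement[] stackBeforeGap = new StackTraceElement[0];
    private StackTraceElement[] stackAfterGap = new StackTraceElement[0];

    GapStackRecorder(Supplier<StackTraceElement[]> sampler, long startMs) {
        this.sampler = sampler;
        this.lastCheckMs = startMs;
    }

    // Called from the monitor's isCanceled(), with the current time in milliseconds.
    void onCancelationCheck(long nowMs) {
        StackTraceElement[] sample = sampler.get();
        long gap = nowMs - lastCheckMs;
        if (gap > longestGapMs) {
            longestGapMs = gap;
            stackBeforeGap = lastSample; // where the job last checked before the gap
            stackAfterGap = sample;      // where it checked again after the gap
        }
        lastCheckMs = nowMs;
        lastSample = sample;
    }

    long longestGapMs() { return longestGapMs; }
    StackTraceElement[] stackBeforeGap() { return stackBeforeGap; }
    StackTraceElement[] stackAfterGap() { return stackAfterGap; }
}
```

Sampling the full stack on every check has a cost, so a production implementation would probably sample less eagerly; the sketch favors clarity over overhead.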
For instance, in the screenshot above, you can see that the method ReferenceAnalyzer.analyze in PDE was the last one to check for cancelation before a long gap. In the screenshot below, the same method is the first one to check for cancelation after the gap. The PDE API Tooling code would clearly benefit from checking for cancelation requests more often. I’ve filed bug 493198 for this issue.
The job cancelability monitoring system becomes really interesting when combined with the Automated Error Reporting (AERI). If error reporting is activated, every time a job does not check for cancelation requests often enough, a new report is sent and the responsible project is informed about the issue. Hopefully, this will help improve job implementations and get rid of jobs staying in the Cancel Requested state for long periods.
This system is not merged yet and is still waiting for approval. It consists of two patchsets:
- a core one that adds the monitoring to the job infrastructure. It was merged but then reverted, as we are currently in the feature freeze period for the next Eclipse release. Late addition of a new feature requires special approval, which it will hopefully get. If you’re interested in this feature, feel free to say so in a comment on bug 470175.
- a UI one that adds the preference page to the Eclipse workbench.
Originally published at mikael-barbero.tumblr.com and mikael-barbero.medium.com.