Tuesday, April 11, 2017

Java Finalizer and Java File Input/Output Streams

I often find myself noticing topics online more after I've worked directly with them or spent time learning about them. The recent Stephen Connolly (CloudBees) post FileInputStream / FileOutputStream Considered Harmful caught my attention because of my recent issues with Java's finalizer. In that post, the author talks about potential consequences of java.io.FileInputStream and java.io.FileOutputStream implementing overridden finalize() methods FileInputStream.finalize() and FileOutputStream.finalize(). With talk of deprecating the finalizer in JDK 9, my perspective is that a subject I had not thought about in years is all of a sudden all around me.

Connolly's post references the Hadoop JIRA HDFS-8562 ("HDFS Performance is impacted by FileInputStream Finalizer"). That JIRA was opened in June 2015 and its description includes interesting background on why the finalizer of FileInputStream causes issues for those using HDFS. This JIRA is also interesting because it looks at why it's not trivial to change FileInputStream and FileOutputStream to not use the protected finalize() methods.

JDK-8080225 ("FileInputStream cleanup should be improved.") is referenced in HDFS-8562 and was written in May 2015. It states, "FileInputStream relies on finalization to perform final closes if the FIS is not already closed. This results in extra work for GC that occurs in a burst. The cleanup of FileInputStreams should happen sooner and not contribute to overhead in GC." Alan Bateman has commented on this with a work-around, "The issue can be easily worked around by using Files.newInputStream instead." Roger Riggs writes of the difficulty of adequately addressing this issue, "Since it is unknown/unknowable how many FIS/FOS subclasses might rely on overriding close or finalize the compatibility issue is severe. Only a long term (multiple release) restriction to deprecate or invalidate overriding would have possibility of eventually eliminating the compatibility problem."

Connolly ends his post with reference to Jenkins changing this via JENKINS-42934 ("Avoid using new FileInputStream / new FileOutputStream"). An example of changing new FileInputStream to Files.newInputStream is available from there.

The fact that I've been able to use Java for so many years without worrying about the finalizer even while I used classes such as FileInputStream is evidence that, by themselves, limited use of these classes with finalize() implementations doesn't necessarily lead to garbage collection or other problems. I like how Colin P. McCabe articulates the issue in the HDFS JIRA on this: "While it's true that we use FileInputStream / FileOutputStream in many places, most of those places have short-lived objects or only use very small numbers of objects. Like I mentioned earlier, the big problem with finalizers we encounter is in the short-circuit read stream cache. If we can fix that, as this patch attempts to do, we will have eliminated most of the problem." In other words, not all uses of FileInputStream and FileOutputStream are causes for concern. Using tools to identify unusually high garbage collection related to finalizers is the best way to identify those that need to be addressed.

For many years of Java development, I did not use or case about the Java finalizer. In recent months, it has become an issue that I'm seeing more people dealing with. Deprecating the Java finalizer is a good first step toward removing it from core APIs.

1 comment:

@DustinMarx said...

Roger Riggs's message "RFR 8192939: Remove Finalize methods from FileInputStream and FileOutputStream" on the OpenJDK core-libs-dev mailing list references the code changes to remove the finalizers from FileInputStream and FileOutputStream. These removals are associated with JDK-8192939 and JDK-8212050 and are targeted for JDK 12.