Why is a .class file not human readable?
When a java file is compiled, it generates a .class file. Now this .class file has the bytecode which the JVM interprets. when we open the .class file in a text editor, it is not human readable. Now to view the bytecode a disassembler like javap can be used.
My question is, why do we need to disassemble bytecode in order to view the bytecode itself?
What does the disassembler actually do, to convert the .class file into human readable format?
The Java virtual machine simulates a machine. This is why it is called a machine, despite it being a virtual one that does not exist in hardware. Thus, when thinking about the difference of the javap outout and the actual Java byte code, think about the difference between assembly and machine code:
Assembly code uses so-called mnemonics to make code human readable. Such mnemonic names are however nothing a machine can relate to because a machine only knows how to read and manipulate binary data. Thus, we have to assemble the mnemonic (and its potential arguments) using an assembler where each such mnemonic is translated into its binary equivalent. For example, for loading a value from a specific register we would write something like load 0xFF in assembly instead of using the actual binary opcode for this instruction which might be something like 1001 1011 1111 1111. Similarly, with Java byte code, the mnemonic being what javap produces, we need to represent binary data to the (virtual) machine which it is then is able to process. Only if we want to read the byte code, we rather disassemble it into the assembly code that javap represents.
Keep in mind: The only reason that assembly language and the javap output exists is the fact that humans such as you and me do not enjoy reading binary code. We are trained to distinguish what we see by shapes as for example letters and names. In contrast, a machine interprets data sequentially by reading a stream of bits. As mentioned, these bits are hard for us to read which is why we rather present them in hexadecimal format: Instead of 1111 1111, we rather write 0xFF. But this is still rather difficult to read as such a numeric value does not reveal its contextual meaning. 0xFF could still mean about everything. This is why we rather use the mentioned mnemonics where this meaning is implicit.
You might argue that a virtual machine is still only virtual and this machine could therefore indeed interpret mnemonics rather than binary Java byte code. However, such mnemonics would consume more space (strings are of course just represented as bytes by a machine) and it also take more time to interpret than the simulated machine language that is run on the JVM. You can therefore also think about the byte code being a weird encoding compared to standard encodings such as ASCII where the charset only contains words instead of letters where the words are only those that are used and understood by the Java virtual machine. Obviously, this Java byte code charset is more efficient than using ASCII for describing the contents of a class file.