I'll disagree with the first answer. While you are developing your algorithm, Python is an excellent choice, mostly because the distance from an idea to working code is much shorter.
This is one reason that, as an example, the BitTorrent reference implementations were done in Python. You can recode in a compiled language later if you need increased speed or simplified installation for users of a production version. (Deployment of applications is one area where Python lags behind other languages.)
Don't get me wrong. I like C++ a lot, particularly for its dual nature allowing high-level operations with classes and templates and also low-level control in "C with benefits" mode. It's just that C++ takes so much more coding to get anything done. If you need to rethink parts of your algorithm, you end up throwing away more code.
Java has similar issues. The language and library are more applications-friendly than C++, but things like a limited generics facility (some very normal things can't be done because of type erasure) and the lack of operator overloading make many standard classes harder to use. Getting a character from a string looks like name[pos] in most languages, but name.charAt(pos) in Java. That's not a big problem for production code, which is written and debugged once and then run many times, but it is extra work during algorithm design.
All this assumes a programmer who is equally fluent (and comfortable) in all three languages. If you are significantly stronger in one of them, you'll probably get your best results there. If you have issues with one of them, you're probably better off avoiding that--at least during the "prototyping" phase.
----- Edit:
Oh, yes..."how do I start?" I'd suggest starting by reading about existing compression algorithms if you haven't already. Try implementing one or two of them. The classic algorithm ("Huffman coding") only works well when the symbols are statistically independent of one another but non-uniformly distributed. That's rarely even approximately true for text or for binary data files. However, it (or the related "arithmetic coding") can be used in conjunction with the LZ-based methods mentioned below.
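To give a feel for how short a first Python prototype can be, here is a minimal sketch of Huffman coding. The function names (huffman_code, encode, decode) are my own, and the output is a string of "0"/"1" characters rather than packed bits, so treat it as a learning exercise rather than a usable compressor.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Return a dict mapping each symbol in data to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, code):
    """Concatenate the bit strings for each symbol."""
    return "".join(code[s] for s in data)


def decode(bits, code):
    """Walk the bit string, emitting a symbol whenever a full code is seen."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For example, code = huffman_code("abracadabra") followed by encode("abracadabra", code) gives a bit string that decode turns back into the original text. Frequent symbols such as "a" get the shortest codes, which is the whole trick.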
Most modern compression algorithms are based on two algorithms published by Abraham Lempel and Jacob Ziv in 1977 and 1978 (LZ77 and LZ78 for short). Wikipedia has a good summary at:
https://en.wikipedia.org/wiki/LZ77_and_LZ78
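As a rough illustration of the LZ77 idea, here is a small sketch that encodes a string as (offset, length, next character) triples, where each triple means "copy length characters starting offset positions back, then append this literal". The function names and the window and max_len defaults are assumptions of mine, and the brute-force match search is far too slow for real use.

```python
def lz77_compress(data, window=255, max_len=15):
    """Encode data as a list of (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Try every starting point in the sliding window (brute force).
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one short of the end so a literal always follows the match.
            # A match may run past position i, which handles long repeats.
            while (length < max_len
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Rebuild the original string from (offset, length, next_char) triples."""
    out = []
    for off, length, ch in triples:
        start = len(out) - off
        # Copy one character at a time so overlapping matches work.
        for k in range(length):
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)
```

A quick check is that lz77_decompress(lz77_compress(s)) returns s for any string s. Repetitive input collapses into a few triples, while input with no repeats produces one triple per character, which is why real formats pair this step with an entropy coder.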
Usually some practical features are added, so you get a variety of algorithms, described in the Wikipedia article on lossless compression:
https://en.wikipedia.org/wiki/Lossless_compression
Those should keep you busy for a while.