I want to make a compression algorithm?

Question:

Myeh

2016-06-23 10:35:40 UTC

Now that school is over I would like to make my own compression algorithm. I know C++, Java, and python. Which of these is the best, and how do I start?

Eight answers:

anonymous

2017-02-28 09:16:52 UTC

Go with C++. Compression algorithms need a lot of performance.

Software

2016-07-02 02:20:24 UTC

Go with C++. Compression algorithms need a lot of performance.

anonymous

2016-06-26 11:13:45 UTC

Download and read the 'source code' at

http://www.rarlab.com/rar_add.htm

(the first Google result for 'winrar source code').

That page offers the UnRAR source, which is the decompression side of WinRAR, the best compression software.

Naruto245

2016-06-24 01:57:43 UTC

Go with C++. Compression algorithms need a lot of performance.

2016-06-23 18:45:54 UTC

I certainly agree that Python is the better choice for developing your algorithms, since you can alter the code and retest immediately, without recompiling. Recoding the finished algorithm is probably better done in C rather than C++.

anonymous

2016-06-23 12:07:36 UTC

For any serious work always use C or a derivative.

husoski

2016-06-23 11:40:57 UTC

I'll disagree with the first answer. While you are developing your algorithm, Python is an excellent choice, mostly because the distance from an idea to working code is much shorter.

This is one reason that, as an example, the BitTorrent reference implementations were done in Python. You can recode in a compiled language later if you need increased speed or simplified installation for users of a production version. (Deployment of applications is one area where Python lags behind other languages.)

Don't get me wrong. I like C++ a lot, particularly for its dual nature allowing high-level operations with classes and templates and also low-level control in "C with benefits" mode. It's just that C++ takes so much more coding to get anything done. If you need to rethink parts of your algorithm, you end up throwing away more code.

Java has similar issues. The language and library are more applications-friendly than C++, but things like a limited generics facility (some very normal things can't be done because of type erasure) and the lack of operator overloading make many standard classes harder to use. Getting a character from a string looks like name[pos] in most languages, but name.charAt(pos) in Java. Not a big problem for production code, which is written and debugged once, then run many times, but extra work during algorithm design.

All this assumes a programmer who is equally fluent (and comfortable) in all three languages. If you are significantly stronger in one of them, you'll probably get your best results there. If you have issues with one of them, you're probably better off avoiding that--at least during the "prototyping" phase.

----- Edit:

Oh, yes..."how do I start?" I'd suggest starting with reading about existing compression algorithms if you haven't already. Try implementing one or two of them. The classic algorithm ("Huffman coding") works well only when the symbols are independent of one another but not uniformly distributed. That's rarely even approximately true for text or for binary data files. However, it (or the related "arithmetic coding") can be used in conjunction with the LZ-based methods mentioned below.
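
To give an idea of how small a first implementation can be, here is a rough Python sketch of Huffman coding. The function names (huffman_code, encode, decode) are made up for this example, and the output is a string of '0' and '1' characters rather than packed bits, so treat it as a learning aid and not a finished compressor.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Return a dict mapping each symbol in data to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, code):
    """Concatenate the bit strings for every symbol in data."""
    return "".join(code[s] for s in data)


def decode(bits, code):
    """Invert encode(); works because no code is a prefix of another."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)
```

Typical use would be code = huffman_code(text), then bits = encode(text, code) and decode(bits, code) to check that you get the text back. Frequent symbols end up with shorter codes, which is where the saving comes from.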

Most modern compression algorithms are based on two algorithms published by Abraham Lempel and Jacob Ziv in 1977 and 1978 (LZ77 and LZ78 for short). Wikipedia has a good summary at:

https://en.wikipedia.org/wiki/LZ77_and_LZ78
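
As a taste of the LZ77 idea (replace a repeated run with a pointer back into recently seen data), here is a minimal Python sketch. The names lz77_compress and lz77_decompress, the (offset, length, next byte) triple format, and the window and max_len defaults are all assumptions chosen for this example; real formats pack the output differently and search far more cleverly than this brute-force loop.

```python
def lz77_compress(data, window=4096, max_len=18):
    """Return a list of (offset, length, next_byte) triples for bytes data."""
    out = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        # Always leave at least one byte to emit as the literal.
        limit = min(max_len, len(data) - i - 1)
        for j in range(start, i):
            k = 0
            # The match may run past position i (overlapping copy).
            while k < limit and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Rebuild the original bytes from the triples."""
    out = bytearray()
    for off, length, nxt in triples:
        for _ in range(length):
            # Copy one byte at a time so overlapping matches work.
            out.append(out[-off])
        out.append(nxt)
    return bytes(out)
```

A quick check is lz77_decompress(lz77_compress(data)) == data for some bytes value. The search here is quadratic in the worst case, which is fine for experimenting and much too slow for real files.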

Usually some practical features are added, so you get a variety of algorithms, described in the Wikipedia article on lossless compression:

https://en.wikipedia.org/wiki/Lossless_compression

Those should keep you busy for a while.
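
Once you have something of your own running, it helps to have a baseline to compare against. A small sketch, assuming you are prototyping in Python: the standard library already ships zlib (DEFLATE, which combines an LZ77-style stage with Huffman coding), bz2 and lzma, so you can measure their output sizes on the same input. The helper name compressed_sizes is made up for this example.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Return the compressed size in bytes for a few stdlib codecs."""
    return {
        "zlib": len(zlib.compress(data, 9)),  # 9 = highest compression level
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }
```

Call it as compressed_sizes(open(path, "rb").read()) on a few test files and compare the numbers with what your own code produces.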

Daniel B

2016-06-23 10:48:49 UTC

Definitely C++.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.
