Simple is Beautiful | Technology, Programming, Video Games
This blog is about technology, programming, video games, books and other related topics. It is published by Mark Papadakis.

iPhone CSRs, Digital Certificates, Encryption and Cryptography

I have been reading about cryptography, digital certificates and signatures, and symmetric- and asymmetric-key encryption lately (well, yesterday). Developing iPhone applications (and even more so deploying them to devices or the App Store) involves dealing with certificate signing requests (CSRs), digital certificates, provisioning profiles, and more.

Apple went to great lengths to simplify the processes involved; for the most part it's trivial, and they guide you through every step. (Alas, it wasn't always so: back in the day, code signing and other related processes were the cause of much pain among iPhone developers.)

Nevertheless, whenever I come across something I am not familiar enough with, I can't help 'wasting' time learning it (how it works, why it works, etc.). This practice has deprived me of many hours I could have spent elsewhere (working on other stuff, spending time with loved ones, whatever), but I just can't help it; I "need" to learn how things work.

So here is a simplified overview of those concepts, mostly the basics, in case someone needs to get started with them.

The fundamental problems cryptography solves, beyond keeping data secret, are those of data integrity validation and identity authentication. That is to say, when Alice sends data (messages, anything) to Bob, Bob can verify that the data indeed came from Alice and that they were not tampered with or modified in any way while in 'transit'. That's about it.

Encryption is the process of transforming the original data (the plaintext) into new data (the ciphertext), which is unintelligible to anyone but the intended recipient. Decryption is the process of transforming that data back into the original data.

Those processes are carried out by a cryptographic algorithm (a cipher). Typically, this involves a key (a number) which is used with the algorithm to perform the encryption and decryption. With symmetric-key encryption, the same key is used for both, so it has to be kept secret by both parties. If someone else gains access to the key, he can not only decrypt the data but also encrypt new data and send it to Bob, and Bob would assume it came from Alice. That clearly is not desirable. Enter public-key encryption.
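A minimal sketch of the symmetric idea in Python, assuming a toy cipher built only for this illustration: a SHA-256 keystream (with a random per-message nonce) XORed with the data. It is not AES or any standard cipher, offers no integrity protection, and should not be used for real data; `encrypt`, `decrypt` and `shared_key` are hypothetical names. It shows the property discussed above: one key does both jobs, so whoever holds it can also forge messages.

```python
import hashlib
import secrets

NONCE_SIZE = 16

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated and truncated."""
    blocks = (
        hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        for i in range(length // 32 + 1)
    )
    return b"".join(blocks)[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_SIZE)   # fresh per message, sent in the clear
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

shared_key = secrets.token_bytes(32)   # known to Alice and Bob only
message = encrypt(shared_key, b"Meet me at noon")
print(decrypt(shared_key, message))    # b'Meet me at noon'

# Anyone who obtains shared_key can decrypt, and can also forge "Alice's" messages:
forged = encrypt(shared_key, b"Meet me at midnight")
print(decrypt(shared_key, forged))     # b'Meet me at midnight'
```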

Public-key encryption (asymmetric encryption) involves a pair of keys: the public key and the private key. The public key is published and freely available. The private key is kept secret. Alice never reveals her private key. Ever.

The fundamental idea is that data encrypted using the public key can only be decrypted using the matching private key. Bob, who has access to Alice's public key much like everyone else, can encrypt his message with it and send it over to Alice, and only Alice can decrypt it, for she is the only one who has the private key.
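To see how a key pair works, here is textbook RSA with deliberately tiny numbers (the classic p = 61, q = 53 example). RSA is one public-key algorithm among several; this sketch omits the padding real RSA encryption requires and is trivially breakable, so treat it purely as an illustration. `apply_key` is a hypothetical helper name.

```python
# Textbook RSA with tiny primes: illustration only, trivially breakable.
p, q = 61, 53
n = p * q                  # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent, 2753 (modular inverse needs Python 3.8+)

public_key = (e, n)        # Alice publishes this
private_key = (d, n)       # Alice never reveals this

def apply_key(value: int, key: tuple) -> int:
    """RSA encryption and decryption are the same operation: value ** exponent mod n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

m = 65                                # Bob's message, encoded as an integer below n
c = apply_key(m, public_key)          # Bob encrypts with Alice's public key -> 2790
print(apply_key(c, private_key))      # only Alice's private key recovers it -> 65
```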

PKE is computationally expensive, though, and not always suitable for large amounts of data. Often enough, a hybrid approach is employed: PKE is used to send a symmetric key, which both parties then know and can use to encrypt additional data with symmetric encryption. Using a symmetric key to encrypt and decrypt data is far less computationally expensive. SSL and other protocols rely on this hybrid approach.
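A toy sketch of the hybrid approach, under loudly stated assumptions: textbook RSA built on two well-known Mersenne primes (2^521 - 1 and 2^607 - 1) stands in for the public-key step, and a SHA-256 keystream XOR stands in for a real symmetric cipher such as AES. Real protocols like SSL/TLS use secret random primes, padding, vetted ciphers and key derivation; every name here is hypothetical. The expensive modular exponentiation runs once, on a 32-byte key, while the bulk data goes through the cheap symmetric step.

```python
import hashlib
import secrets

# Alice's toy RSA key pair. These primes are public knowledge: no security at all.
p, q = 2**521 - 1, 2**607 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # requires Python 3.8+

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with SHA-256(key || counter) blocks; encrypts and decrypts."""
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))

document = b"lots of data " * 1000

# Bob: pick a random session key, wrap it with Alice's public key (expensive, but only
# 32 bytes), then encrypt the bulk data with the session key (cheap).
session_key = secrets.token_bytes(32)
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)
ciphertext = xor_stream(session_key, document)

# Alice: unwrap the session key with her private key, then decrypt the bulk data.
unwrapped_key = pow(wrapped_key, d, n).to_bytes(32, "big")
print(xor_stream(unwrapped_key, ciphertext) == document)   # True
```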

It is also possible to encrypt a piece of data using the private key so that it can only be decrypted using the public key. Since Alice shares her public key with anyone interested, it wouldn't seem to make much sense for her to send data to Bob encrypted with her private key: anyone could read it.

Well, it does make sense, thanks to digital signatures. A digital signature lets Bob verify that data sent from Alice (encrypted or not) really came from her and were not modified in any way by the time he received them. In other words, it validates the authenticity and integrity of the data. It deals with tampering and impersonation.

Alice will use a hashing algorithm to compute a hash (a digest) of the message she wishes to send to Bob. She then encrypts that hash with her secret private key; the result is the signature. Then she sends Bob both the data and the signature, along with information about the hashing algorithm she used.

Bob will use Alice's public key to decrypt the signature, which gives him back the hash Alice computed. Then he will use the same hashing algorithm to compute the hash of the message he received. If the two hashes match, the message is authentic. That is so because only Alice knows the private key, so only Alice could have produced that signature.
Unless Alice's private key has been compromised, it is 'impossible' for Alice to deny that she signed the data she sent to Bob, or for anyone else to 'sign' anything, send it over to Bob and claim to be Alice.
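The signing and verification steps can be sketched as textbook RSA signing of a SHA-256 digest. This assumes a toy key pair built from two public Mersenne primes and uses no padding; real schemes such as RSA with PKCS #1 padding or ECDSA add essential safeguards, so this is an illustration only, and `sign`, `verify` and `digest` are hypothetical names.

```python
import hashlib

# Alice's toy key pair: textbook RSA on two public Mersenne primes (no security).
p, q = 2**521 - 1, 2**607 - 1
n = p * q
e = 65537                           # public exponent, known to everyone
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, known only to Alice

def digest(message: bytes) -> int:
    """SHA-256 hash of the message as an integer (256 bits, so smaller than n)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    """Alice: 'encrypt' the message's hash with her private key."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Bob: recover the hash with Alice's public key and compare it with his own."""
    return pow(signature, e, n) == digest(message)

msg = b"Pay Bob 10 euros"
sig = sign(msg)
print(verify(msg, sig))                     # True
print(verify(b"Pay Bob 1000 euros", sig))   # False: the message was tampered with
```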

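A hashing function converts data of any size into a single fixed-size value (often a big integer). Hash functions are fundamental in the design and implementation of some of the most important data structures, such as hash tables. Signatures need a cryptographic hash function (SHA-256, for example), for which it is computationally infeasible to find two different messages with the same hash; otherwise a forger could swap in another message with the same hash and reuse Alice's signature.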
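A quick way to see these properties with Python's standard `hashlib`: SHA-256 always produces a 256-bit value, whatever the input size, and a one-character change yields an unrelated-looking digest. (Python's built-in `hash()`, which dictionaries use, is fast but not a cryptographic hash, so it is unsuitable for signatures.) `fingerprint` is a hypothetical helper name.

```python
import hashlib

def fingerprint(data: bytes) -> int:
    """SHA-256 digest of the data, interpreted as a big integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

for data in [b"Hello, Bob", b"Hello, Bob!", b"x" * 1_000_000]:
    # Every digest prints as exactly 64 hex digits (256 bits), whatever the input size.
    print(f"{len(data):>9} bytes -> {fingerprint(data):064x}")

print(fingerprint(b"Hello, Bob") == fingerprint(b"Hello, Bob!"))   # False
```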

There is one last issue that needs to be addressed: confirming identities. Digital certificates solve this problem.

A certificate is an electronic document that is used to identify an entity (individual, company, anything) and associate that entity with a unique public key, much like your passport identifies you and associates bits of information with you (your name, etc.).
Public-key cryptography uses certificates to address impersonation problems.

Certificate Authorities (CAs) serve a purpose similar to that of the authorities issuing driving licenses: the applicant provides whatever information and credentials are required, the authorities verify her identity, and then they issue her the license.
A CA will receive Alice's application for a certificate (which includes her public key and information about her), validate that the information she provided is correct and indeed represents her, and then issue her a digital certificate.
In essence, the digital certificate binds a public key to an entity, which helps prevent the use of fake public keys. The digital certificate contains the public key of the entity, its name and other information (key/value pairs, e.g. name=Steve, organization=Apple, Inc.). It also includes the digital signature of the CA. It is that signature that allows the certificate to function as a verified, trusted certificate for users who do not know the entity identified by the certificate but do know and trust the CA (in other words, they have the CA's public key and know that it belongs to that CA).
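A heavily simplified sketch of that binding, assuming a hypothetical `Certificate` record and toy textbook-RSA keys for the CA (real certificates use the X.509 format, and real CA keys come from secret random primes). The CA signs a canonical encoding of the subject information plus the subject's public key; anyone holding the CA's public key can check that signature, and swapping in a different public key breaks it.

```python
import hashlib
import json
from dataclasses import dataclass

# Toy CA key pair: textbook RSA on two public Mersenne primes (no security, illustration only).
CA_P, CA_Q = 2**521 - 1, 2**607 - 1
CA_N = CA_P * CA_Q
CA_E = 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))   # the CA's private exponent
CA_PUBLIC_KEY = (CA_E, CA_N)                     # distributed to users who trust the CA

@dataclass
class Certificate:
    """Hypothetical, heavily simplified certificate; real ones use the X.509 format."""
    subject: dict        # key/value pairs, e.g. {"name": "Steve", "organization": "Apple, Inc."}
    public_key: tuple    # the subject's public key, as (exponent, modulus)
    ca_signature: int = 0

    def to_be_signed(self) -> bytes:
        """Canonical byte encoding of the fields the CA vouches for."""
        fields = {"subject": self.subject, "public_key": list(self.public_key)}
        return json.dumps(fields, sort_keys=True).encode()

def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def ca_issue(subject: dict, public_key: tuple) -> Certificate:
    """Run by the CA after it has checked the applicant's identity."""
    cert = Certificate(subject, public_key)
    cert.ca_signature = pow(digest(cert.to_be_signed()), CA_D, CA_N)
    return cert

def is_trusted(cert: Certificate, ca_public_key: tuple) -> bool:
    """Run by a user who trusts the CA: check the CA's signature over the certificate."""
    e, n = ca_public_key
    return pow(cert.ca_signature, e, n) == digest(cert.to_be_signed())

alice_key = (65537, 3233)   # placeholder public key, for illustration
genuine = ca_issue({"name": "Alice", "organization": "Example Ltd"}, alice_key)
print(is_trusted(genuine, CA_PUBLIC_KEY))   # True

# An impostor copies Alice's details and the CA's signature but substitutes another key.
forged = Certificate(genuine.subject, (65537, 9999), genuine.ca_signature)
print(is_trusted(forged, CA_PUBLIC_KEY))    # False
```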

Apple is a Certificate Authority. Before deploying your iPhone application to a device, you need to obtain a certificate and a provisioning profile. So you prepare a Certificate Signing Request (CSR). This contains information about you; it is the information you want Apple to verify. When you create the CSR, a public/private key pair is also created. The public key is included in the CSR, and the CSR is signed with the private key, which shows that you hold the key matching that public key. The private key itself is never sent to Apple; it is used for signing your binary.
Apple will take your CSR and create a digital certificate based on it. Then you need to create a provisioning profile. The provisioning profile holds application IDs, device IDs and certificates.
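A hedged sketch of what a CSR carries. Real CSRs use the PKCS #10 format and are signed with the requester's private key, which lets the CA check that the requester actually holds the key matching the enclosed public key. Everything below (toy textbook-RSA keys on two public Mersenne primes, the JSON encoding, the `make_csr` and `ca_accepts` names, and the field names) is a hypothetical stand-in, not Apple's process.

```python
import hashlib
import json

def make_keypair(p: int, q: int, e: int = 65537):
    """Toy textbook-RSA key pair from two given primes (illustration only)."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def make_csr(info: dict, public_key: tuple, private_key: tuple) -> dict:
    """Bundle identity info with the public key, then sign it with the private key."""
    body = json.dumps({"info": info, "public_key": list(public_key)}, sort_keys=True).encode()
    d, n = private_key
    return {"body": body, "signature": pow(digest(body), d, n)}

def ca_accepts(csr: dict) -> bool:
    """CA side: was this CSR signed by whoever holds the enclosed public key?"""
    e, n = json.loads(csr["body"])["public_key"]
    return pow(csr["signature"], e, n) == digest(csr["body"])

# Generated on the developer's machine; only the public half goes into the CSR.
# (Two public Mersenne primes: no security, illustration only.)
public_key, private_key = make_keypair(2**521 - 1, 2**607 - 1)
csr = make_csr({"email": "dev@example.com", "common_name": "Jane Developer"},
               public_key, private_key)
print(ca_accepts(csr))   # True
```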

You will need to submit CSRs and install digital certificates if you want to deploy and distribute iPhone applications, or use Apple's iPhone push notifications and In-App Purchases. Hopefully this helps explain why those are needed and what they are about.

Saturday, 27 March 2010 11:01 am


Update on CloudDS

Here is a progress update on my current main project (we call it 'CloudDS', which stands for cloud data store; it is a silly name, but it will have to do until we can find a replacement).
I have been working on the data store component of the service. It has taken at least four times as much time and effort as I thought it would. A prime reason for underestimating the time requirements is that the initial list of features I wanted to implement doubled in size. In addition, testing most of the possible logic paths that could result in failure also took a long time; some of that testing was automated, but not all of it, and validating the results is harder than setting up the test environment.

In such a service, it matters little if most of the underlying components fail (I/O and task scheduler, garbage collector, cache subsystem, etc.) as long as the data management component is not affected. Suffering a service outage is bad; suffering data corruption and/or data loss is something that has to be prevented by any means necessary.

As it stands, that component now handles reads and writes, self-healing and caching well, and performs faster than I hoped it would. The data model is based on BigTable, Dynamo, Cassandra and some earlier prototypes/projects we toyed with in the past. It borrows Cassandra's ColumnFamily/SuperColumn/Column key/value representation model. Data are pushed into MemTables and an append-only commit log; MemTables are flushed to disk as SSTables.
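For readers unfamiliar with this write path, here is a minimal in-memory sketch of the MemTable/commit-log/SSTable idea. It is not CloudDS's code, which the post does not show: the class name, the flush threshold and the use of Python lists in place of on-disk files are all assumptions for illustration.

```python
import bisect

class ToyLSMStore:
    """Hypothetical sketch of a MemTable / commit log / SSTable write path."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []      # append-only; a real store appends (and fsyncs) to a file
        self.memtable = {}        # recent writes, held in memory
        self.sstables = []        # immutable sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))   # log first, so a crash can be replayed
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the MemTable out as a new sorted, immutable SSTable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []      # those entries now live in the SSTable

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):            # newest SSTable wins
            i = bisect.bisect_left(table, (key,))         # binary search on the sorted keys
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

store = ToyLSMStore(memtable_limit=2)
store.put("alice", "v1")
store.put("bob", "v1")        # second write fills the MemTable and triggers a flush
store.put("alice", "v2")
print(store.get("alice"), store.get("bob"), len(store.sstables))   # v2 v1 1
```

Reads check the MemTable first and then SSTables from newest to oldest, so the most recent write for a key is the one returned.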

The GCollector merges SSTables whenever required to reclaim space, resolve conflicts, extract a single value out of multiple versions, etc. All operations supported by Cassandra are implemented (query by path, predicate, column names, key ranges, etc.), and CloudDS clients/users will also be able to use a scripting language to describe explicitly, down to the byte, what they need (e.g. give me the first couple of bytes of those values, or give me a concatenation of those values, etc.).
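The merge step can be sketched as a k-way merge of sorted SSTables with last-write-wins conflict resolution by timestamp, which is how Cassandra reconciles versions. This is only a sketch under assumptions: the GCollector's actual logic isn't described in the post, entries here are hypothetical `(key, timestamp, value)` tuples, and tombstones are dropped immediately, whereas real stores keep them for a grace period so that deletions reach every replica.

```python
import heapq

def compact(sstables):
    """Merge sorted SSTables into one, keeping only the newest version of each key.

    Each SSTable is a list of (key, timestamp, value) tuples sorted by (key, timestamp);
    a value of None is a tombstone (deletion marker) and is dropped after merging.
    """
    merged = []
    # Yields every entry ordered by key, then by timestamp ascending.
    for key, ts, value in heapq.merge(*sstables, key=lambda entry: (entry[0], entry[1])):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, ts, value)   # a later timestamp supersedes the earlier version
        else:
            merged.append((key, ts, value))
    return [entry for entry in merged if entry[2] is not None]

old = [("a", 1, "x"), ("b", 1, "y")]
new = [("a", 2, "z"), ("b", 3, None)]   # "a" overwritten, "b" deleted
print(compact([old, new]))              # [('a', 2, 'z')]
```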

Now that that component is out of the way, I can move on to the rest; those parts are relatively straightforward to implement (the task scheduler and the network I/O subsystems are mostly done).

Friday, 26 March 2010 9:19 pm


MySQL, NoSQL, and Key/Value Datastores

Monolithic RDBMSs are losing ground to key/value data stores, particularly persistent, distributed ones. MySQL's mounting problems were perhaps the key reason (pun intended) people looked elsewhere. Google's brilliant engineers realized that a key/value data model can satisfy the needs of almost every class of application that needs a datastore backend.

Key/value datastores are simple to build, easy to understand, easy to optimize and easy to scale. The now-famous CAP theorem states that a distributed data store cannot guarantee consistency, availability and partition tolerance all at the same time; one of those traits has to be sacrificed. Again, most applications do not really require all three to function. The CAP theorem echoes the 'pick any two' trade-off of the Project Triangle model.

Most web-based applications are built on simple data models. Most web applications eventually suffer from service capacity and availability issues (i.e. scalability woes). It is trivial to scale out (horizontally) application logic processing (application servers) and HTTP request processing (web servers, load balancers).
It is not easy to scale out an RDBMS. Some expensive systems (Oracle, etc.) provide ways to address those issues (e.g. Oracle RAC), but they are expensive to deploy, and most of them rely on a shared-everything setup, which just doesn't work in the long run. (Shared-nothing is really the way to go.)
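In a shared-nothing setup each node owns a slice of the data and nothing else. One common way to assign those slices, and the one described in Amazon's Dynamo paper (Cassandra uses a variant too), is consistent hashing. A minimal sketch, assuming SHA-256 ring positions and 64 virtual nodes per server (both arbitrary choices); the node names and the `HashRing` class are hypothetical.

```python
import bisect
import hashlib

def ring_position(label: str) -> int:
    """Stable position on the ring (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hashing: a key belongs to the first virtual node clockwise from it."""

    def __init__(self, nodes, vnodes: int = 64):
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

keys = [f"user:{i}" for i in range(1000)]
three = HashRing(["node-a", "node-b", "node-c"])
two = HashRing(["node-a", "node-b"])   # node-c removed

moved = sum(three.owner(k) != two.owner(k) for k in keys)
print(moved, "of", len(keys), "keys moved")   # only keys that node-c owned
```

The design choice: removing (or adding) a node only moves the keys adjacent to its virtual nodes, instead of reshuffling almost everything as a plain `hash(key) % node_count` scheme would.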

Google released a bunch of papers (actually, a bazillion papers), many of them defining and shaping the development of future related technologies. Namely, the papers describing GFS, BigTable and MapReduce (and of course, the paper that changed everything, "The Anatomy of a Large-Scale Hypertextual Web Search Engine") steered everyone in the right direction.

In the datastores domain, Hadoop/HBase, Radix, Cassandra and others, based on the BigTable and Amazon Dynamo papers and all relying on the simple key/value datastore model, are gaining market share, and rightly so. Coupled with Memcached and similar services (in-memory key/value stores), they are solving the problems of service capacity and availability. This is a paradigm shift. It's all downhill from here for heavy-footprint, complex and inflexible datastore systems. They won't go away, but they will not be such a valuable (pun intended) component of tomorrow's technology landscape.

We are going to gradually migrate from RDBMSs (though we do not rely on them that much nowadays) to a key/value datastore (we are currently building one, also based on BigTable and Dynamo). If nothing else, these systems are both simple and beautiful (for the most part).

Saturday, 13 March 2010 8:14 pm
