Interesting problem: how to partially shroud open source

Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
I am doing some consulting work writing code for a company. The program is in python (using the wxPython and NumPy libraries) and will thus be open source. In fact, the company wants to make sure it's open source so their customers have access to it and can adapt it to their needs.

However, they want to ensure that their copyright, company logos, and other branding information cannot be removed without causing the program to quit running. This is a very reasonable request, as this type of theft has been known to occur in this industry, especially in the Far East.

I just received this request an hour or two ago, so I haven't thought a lot about it yet. I have a nodding familiarity with some cryptographic tools, but a nice solution hasn't popped into my head yet. I'm guessing it will take a secure hashing method (e.g., something like SHA1) to hash some of the program's key resources, then perhaps some compiled code that refuses to run unless the proper hash is generated from the required resources.
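
A minimal sketch of the hashing idea, assuming Python's standard hashlib module. The helper names (file_digest, require_resources) and the manifest of expected digests are hypothetical, and SHA-256 is swapped in for the SHA1 mentioned above. Because the check is itself plain Python, anyone can edit or delete it, so on its own it only stops a casual file swap.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def require_resources(expected):
    """Raise SystemExit if any listed file is missing or has a different digest.

    `expected` maps file paths to the hex digests recorded at release time.
    """
    for path, digest in expected.items():
        if not os.path.isfile(path) or file_digest(path) != digest:
            raise SystemExit("Required resource missing or modified: " + path)


if __name__ == "__main__":
    # Demo with a temporary stand-in for a logo file (hypothetical name).
    with tempfile.TemporaryDirectory() as tmp:
        logo = os.path.join(tmp, "logo.bmp")
        with open(logo, "wb") as f:
            f.write(b"original logo bytes")
        manifest = {logo: file_digest(logo)}
        require_resources(manifest)      # unchanged file: returns silently
        with open(logo, "wb") as f:
            f.write(b"replaced logo bytes")
        try:
            require_resources(manifest)  # changed file: raises SystemExit
        except SystemExit as exc:
            print(exc)
```

In a real package the manifest would be a table of literal digests generated at build time rather than computed at startup as in this demo.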

I doubt it would be possible to easily secure things against a determined and resourceful hacker. However, the desire is to make someone work sufficiently hard trying to figure out what's happening to make it not worth the effort.

If any of you have some suggestions on how to attack this problem or relevant pointers to web resources, I'd be grateful for the help.
 

Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
Alberto wrote: "I did solve a similar problem, several years ago, collecting, at startup, all the system variables and constants from the logo and copyright files."
Hi, Alberto:

Could you explain what you mean by "collecting"?

The fundamental problem here is that the source code is included in the package. We (both myself and the corporation paying for the programming) would like to keep the source open so customers can modify it if they want.

One tool could be some code obfuscation, but that's not sufficient to deter a person determined to figure out how things work. I can also include a DLL written in a compiled language that participates in the system. It would have to do processing that is central to the program's operation and somehow be vital enough so that if it is bypassed, the program doesn't work.

It's an interesting problem. My intuition tells me there might be a solution, but it's not obvious.
 

Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
GetDeviceInfo wrote: "If you created a runtime disassembler that ran the client's open source code, then your 'shell' would not be open source."
Sorry, I don't understand your suggestion.

The problem's constraints are that the open source program is python code that uses wxPython (GUI stuff) and NumPy (array processing). I can't change that, as that's what the customer wants.

One possibility that might work would be to modify the python interpreter and include that in the package instead of the usual interpreter. This interpreter could have the needed resources compiled into it. Is this what you meant by "runtime disassembler"? Alas, this generates some other problems and could be a fair bit of work.
 

GetDeviceInfo

Joined Jun 7, 2009
2,195
Yes, sorry, I was holding another thought so it didn't come out clearly, but yes, that is where I was going.

Another possibility is to verify the computer's serial number to ensure your licensed version only runs on one machine.
Yet another possibility is to process the input file where you 'wrap' it with a template of your choosing.
 
Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
Quote: "There has never been a copy protection that can't be cracked. It mostly annoys the honest customer."
This isn't copy protection, as people will be free to copy everything. Its intention is to simply make sure the program will always run, modified or not, with the proper resource files. It will be completely transparent to the user.

The purpose is to prevent dishonest manufacturers from stealing the code and passing it off as their own.
 

Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
GetDeviceInfo wrote: "Another possibility is to verify the computer's serial number to ensure your licensed version only runs on one machine. Yet another possibility is to process the input file where you 'wrap' it with a template of your choosing."
There is no need to limit the machines it runs on.

There are no "input" files except for data files that will be in CSV format. So I don't understand your second suggestion.
 

rjenkins

Joined Nov 6, 2005
1,013
It's impossible to completely protect it if the source is available.

As Alberto suggested, do things like checking the logo graphic file exists and possibly do a CRC or MD5 check of that file to ensure it's not been tampered with.

For the branding & copyright, use simple coding like base64 to disguise the text as numeric data and hide it within a block of numbers or a constant table or whatever.
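
A minimal sketch of this kind of disguise, assuming Python's standard base64 module; hide_text, reveal_text, and the company name are hypothetical. Note that base64 is an encoding rather than encryption, so it only hides the text from a casual search of the source.

```python
import base64


def hide_text(text):
    """Encode `text` as base64 and return it as a list of small integers."""
    return list(base64.b64encode(text.encode("utf-8")))


def reveal_text(numbers):
    """Reverse hide_text(): turn the list of integers back into the text."""
    return base64.b64decode(bytes(numbers)).decode("utf-8")


# In the shipped source only the list of numbers would appear, placed among
# other constant tables. It is generated here so the sketch is self-contained.
NOTICE = hide_text("Copyright Example Corp")  # hypothetical company name

print(NOTICE[:8])           # looks like ordinary numeric data
print(reveal_text(NOTICE))  # the readable text, recovered at run time
```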
If you have the option of making it multilingual you can use a number for each fixed text message and call a function to display that text in the selected language.
That allows for further complication and confusion.

You could encode all the standard texts (or, e.g., those numbered below 64) and CRC them, which makes it difficult to figure out which one is the copyright and protects 'your' texts.

Have customer-added texts only above a certain number, e.g., 1024+, in a plain-text table and call a different function to output them.
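
The numbered-text scheme could be sketched roughly as follows, assuming zlib.crc32 from the standard library as the CRC. The table contents and the function names (vendor_text, customer_text) are placeholders; only the 1024 threshold comes from the example above.

```python
import zlib

CUSTOMER_BASE = 1024  # customer-added texts are numbered from here upward

# Vendor texts keyed by number; the copyright is just one entry among many.
VENDOR_TEXTS = {
    0: "File",
    1: "Copyright Example Corp",  # hypothetical company name
    2: "Quit",
}

# CRC32 of each vendor text. A real release would ship these as literal
# numbers; they are computed here only to keep the sketch self-contained.
VENDOR_CRCS = {n: zlib.crc32(t.encode("utf-8")) for n, t in VENDOR_TEXTS.items()}

# Plain-text table that customers are free to edit.
CUSTOMER_TEXTS = {1024: "My custom label"}


def vendor_text(n):
    """Return vendor text `n`, exiting if it no longer matches its CRC."""
    text = VENDOR_TEXTS[n]
    if zlib.crc32(text.encode("utf-8")) != VENDOR_CRCS[n]:
        raise SystemExit("Protected text %d has been altered" % n)
    return text


def customer_text(n):
    """Return customer text `n` (numbered CUSTOMER_BASE and up), unchecked."""
    if n < CUSTOMER_BASE:
        raise KeyError("customer texts start at %d" % CUSTOMER_BASE)
    return CUSTOMER_TEXTS[n]


print(vendor_text(1))
print(customer_text(1024))
```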

You could also possibly add a 'call home' routine ('Check for updates') which also logs changes to file CRCs ('so you don't accidentally replace customers' modded code'), which would allow you to monitor who is changing the program.

You could have the company name & copyright at the very start of the documentation files and only CRC the first few characters of those, to specifically detect changes to the copyright, which shows someone is specifically being naughty rather than just editing the main content.
 

Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
rjenkins, thanks for the suggestions. As I said in my OP, hashing can be used, but hashing isn't enough. I've also considered obfuscation; it's not sufficient, but it can be useful. The goal is to make it too much work to steal.

One additional technique available is to put in a compiled module (remember, this is python, so it's a bunch of textual scripts being executed by an interpreter) and put both resources and hashing techniques in that too. Unfortunately, I can't see any fundamental benefit from doing that, as someone just has to look at the calls being made and substitute them or change them.
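
The weakness described above can be seen in a few lines. In this sketch a plain module object stands in for the compiled module; the names protect and resources_ok are hypothetical, and a real extension would be imported rather than built with types.ModuleType.

```python
import types

# Stand-in for a compiled extension module (hypothetical names throughout).
protect = types.ModuleType("protect")
protect.resources_ok = lambda: False  # pretend the logo has been swapped out


def start_app():
    """Application entry point as shipped: refuse to run if the check fails."""
    if not protect.resources_ok():
        raise SystemExit("Branding resources failed verification")
    return "application running"


# Anyone with the Python source can add one line that rebinds the call,
# so the compiled check never gets a say.
protect.resources_ok = lambda: True

print(start_app())
```

This is why the compiled part would have to do work the program genuinely depends on, rather than just answer yes or no.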
 

GetDeviceInfo

Joined Jun 7, 2009
2,195
This is where I would think that protection code should be a component of the interpreter. As mentioned, when the interpreter loads the code file, it could first wrap it with a template.
 

Thread Starter

someonesdad

Joined Jul 7, 2009
1,583
GetDeviceInfo wrote: "This is where I would think that protection code should be a component of the interpreter. As mentioned, when the interpreter loads the code file, it could first wrap it with a template."
GetDeviceInfo, I don't understand what you mean by wrap it with a template.

Let me give a very simple, but realistic example of what I'm trying to do.

1. Here's the application's whole set of source code -- a single file named app.py:

-------------------------------------
import wx
bmp = wx.Image("pictures/zoom.bmp", wx.BITMAP_TYPE_BMP).ConvertToBitmap()
print "Hello world"
-------------------------------------

2. The user runs this application by typing python app.py at the command line (or some equivalent for their system).

The goal is to let users have access to this app.py file and let them change it if desired. HOWEVER, I need to figure out a way so that selected code like the bmp = wx.Image... line cannot be changed -- or, the program refuses to run if it is removed or doesn't contain the correct information.

I can, for example, change the code so that the bmp variable is instantiated by a call into a compiled DLL.

Changing the interpreter is one possibility, but it suffers from two weaknesses. First, it prevents our customers from using the standard, mature python distribution, which opens the door to maintenance issues. Second, there's nothing to stop the user from running the app.py file with another copy of the normal python interpreter.

I'm beginning to think there's no easy solution. It might be something I'll have to find a crypto/security expert for. Alas, the last time I looked into something like that was about 10 years ago and the guy wanted $400/hour. I doubt my customer would spring for that bill...:)
 

hardsoft

Joined Sep 7, 2009
13
At the end of the day, this task seems impossible because anyone with enough motivation will be able to see what's going on and reproduce it one way or another.

The best you may be able to do is write a custom graphic module and ship it in binary format. This could be a ton of work. Another thing that could be used with hashing is to create an indexed image file that contains all your images, with the images broken up into multiple pieces and with multiple copies of the same image. Again, this would be a binary file. A second binary file would call out the sections of the images, decode them, and combine them as necessary at run time. Be sure to use different versions of the copy throughout your code and to combine the parts from multiple copies.

This way, if the logo was stored as 6 separate images, in 3 different index point sets, someone reverse engineering your code would have to figure out the indexing for every different time the image is called. The more you break up the image, the more work he'll have, but so will you.
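
A rough sketch of the indexing scheme, written in plain Python for illustration (the suggestion above is to ship this logic as a binary). The function names scatter, gather, and build_store are hypothetical, and the "image" is just a run of bytes.

```python
import random


def scatter(data, pieces, rng):
    """Split `data` into chunks, store them in shuffled order.

    Returns (blob, index), where index[k] is the (offset, length) of
    chunk k inside blob. Assumes `data` is non-empty.
    """
    size = -(-len(data) // pieces)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    order = list(range(len(chunks)))
    rng.shuffle(order)
    blob = b""
    index = [None] * len(chunks)
    for pos in order:
        index[pos] = (len(blob), len(chunks[pos]))
        blob += chunks[pos]
    return blob, index


def gather(store, index):
    """Rebuild the original bytes from `store` using one index set."""
    return b"".join(store[off:off + n] for off, n in index)


def build_store(data, pieces, copies, seed=0):
    """Pack several differently shuffled copies of `data` into one store."""
    rng = random.Random(seed)
    store = b""
    index_sets = []
    for _ in range(copies):
        blob, index = scatter(data, pieces, rng)
        base = len(store)
        index_sets.append([(base + off, n) for off, n in index])
        store += blob
    return store, index_sets


logo = bytes(range(60))                      # stand-in for the logo image
store, index_sets = build_store(logo, pieces=6, copies=3)

# Each call site can use a different index set, or mix pieces across copies.
mixed = [index_sets[k % 3][k] for k in range(6)]
print(all(gather(store, idx) == logo for idx in index_sets))
print(gather(store, mixed) == logo)
```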

Then again, they could simply rip out all such code and write their own image system if they wanted, so it seems like a lost cause. At least you can tell your customer that nobody can simply exchange a couple of image files; some work would have to be done.
 