Category Archives: code

Git Tip: Delete Remote Tags

Filter and delete local and remote tags. In the following example, I want to delete all tags starting by alpha-:

git tag | grep ^alpha- | xargs git tag -d
git tag | grep ^alpha- | xargs -I % git push origin :refs/tags/%

Keeping an eye on some interesting Bitcoin and Ethereum products

Bitcoin

  • OpenBazaar – decentralized eBay.
  • Blockai – proving ownership.
  • BitPesa – cheap remittance, RIP Western Union.
  • Blockstream – sidechains, allow new features on top of the Bitcoin blockchain without changing the protocol.
  • Trezor – hardware wallet, secure your bitcoins.

Ethereum

  • Akasha – decentralized social network.
  • Slock.it – IoT and decentralization, e.g. rent your unused WiFi bandwidth, smart locks, etc. Previously: TheDAO.
  • uPort – Identity and reputation services.
  • Swarm – serverless hosting.

I sincerely hope to add a Decred section for the next update of this post.

Note: this is no investor advice, the topics and the solutions they propose looks very interesting to me, but I’ve no idea if there is a market for it and how they manage their companies. I was super excited about bonafide.io (identity and reputation services) which ended operations ~a year ago and TheDAO which resulted in an epic Ethereum fail (hard fork).

Shell tip: Generate Numbered Images

Generate images 00.jpg, 01.jpg, etc. via imagemagick using 8 threads:

seq -f "%02g" 0 99 | xargs -P 8 -I % convert -background white -fill blue -gravity center -size 600x400 -pointsize 300 label:% %.jpg

Sample images:
99 42 05 00

Android Dev Tip: Hide that Annoying Java Icon

When you run ./gradlew in your Term, it sometimes spawns a few java utilities, like the linter. On OS X, these java utilities greet you with an annoying Java icon that appears and never disappears in your Dock, plus you lose focus on your Term.

shot

To hide this icon forever (for all java apps started from your shell), add this line to your ~/.bashrc or ~/.zshrc:

export JAVA_TOOL_OPTIONS="-Dapple.awt.UIElement=true"

List MAC addresses on your local network

Just in case you’re stuck in an hotel with a $40/day wifi access. The usual solution is to spoof a MAC address (ie. use one from someone already connected). But how to list them?

On OS X, the easiest is to use the arp command:

$ arp -an | grep -v incomplete

If you want to make sure the device is currently connected, use nmap (replace 10.0.*.* by your subnet):

$ nmap -sP 10.0.*.*

And then, you just have to spoof one of the MAC address:

$ sudo ifconfig en0 ether CA:FE:CA:FE:CA:FE

Copy/Paste to your Android device or emulator

I’ve been using the stock Android emulator these days. And the lack of copy/paste is annoying. There is a way to input text to any adb connect devices using adb shell input text, so I added a couple of functions to my .zshrc (works with bash too).

ai () { echo "$@" | sed "s/%/%%/g" | sed "s/ /\%\s/g" | xargs adb shell input text }
alias aip='pbpaste -Prefer txt | sed "s/%/%%/g" | sed "s/ /\%\s/g" | xargs adb shell input text'

Usage, to input text (including spaces) to your device:

$ ai test is cool

Usage, to paste the text in your clipboard to your device:

$ aip

Note: works on OS X, replace pbpaste -Prefer txt by the linux / windows equivalent.

Favorite Chrome Extensions

  • StayFocusd – to limit my time spend on Hacker News and Reddit to a maximum of 10 minutes / day.
  • PocketSend to Kindle – to save articles to read them later (I’ll drop Pocket soon, I noticed I’m reading much more stuff on my Kindle).
  • Disconnect – to stay out of sight from (some) analytics and other trackers.
  • Pushbullet – to write SMS from my computer and share stuff between my Android devices and my computer.
  • uBlock Origin – because some sites are cluttered with ads, good players whitelisted.
  • HTTPS everywhere – use SSL when available.
  • Stylish (to apply a Dark theme on my favorite sites: GitHub for example – reduce eye strain).
  • Tampermonkey – with a bunch of home-made scripts hacks.

 

Pseudo Random Number Generator and Bitcoin

I’ve stumbled upon two implementations of bitcoin libraries that generate new bitcoin addresses based on a secret. To generate that secret, the first one is using time as seed to srand(), the second one is using its own random generator which is not cryptographically secure (it’s open source, so quite easy to reverse engineer it and get the full set of generated secrets). I contacted the authors of these libraries in the hope they’ll fix that. An attacker could easily find some (or all) valid private keys these libraries generated.

Reminder: on your computer time is discrete. Using srand(time(0)) is very common, and if you’re using it for your new procedural generated game, it’s good enough, but when it comes to generate bitcoin addresses: be careful. Example:

srand(time(0));
int secret = rand();
char *address = bitcoin_address(secret);

In that case, time(0) is very discrete since it only has a one second resolution. Say that code has been published on January 1st, 2011. You can list all private keys generated by that code until today:

int january_1_2011 = 1293840000;
for (int i = january_1_2011; i < time(0); i++) {
  srand(i);
  int secret = rand();
  char *address = bitcoin_address(secret);
}

That’s only 146 927 544 keys (number of seconds from January 1st, 2011 to today), millions of bitcoin private keys can be easily tested and money transferred to the attacker.

So if you ever have to use a random function when you work on a Bitcoin project:

  • Make sure you understand PRNGs.
  • Never use default language implementation of random – they’re never cryptographically secure.
  • Never trust the OS.
  • Make sure to seed with high resolution data (and search for “Entropy Gathering”) if you ever have to seed yourself.

Replaced ack by ag

I’m using ack 100 times a day, it’s written in Perl and a bit slow when grepping a big directory. I recently switched to ag (aka the_silver_searcher). It’s much faster and ignores file patterns from your .gitignore. I see no reason to use ack anymore so:

$ brew install the_silver_searcher
$ brew remove ack
$ echo "alias ack=ag" >> ~/.zshrc # my muscle memory doesn't want to switch

Use Genetic Programming To Generate File Converters

Introduction

Here is an old article I wrote in January 2009. Since then, I received some emails and a lot of 404 errors, so I decided to repost it on this “new” blog. You can find the code here on github.

Genetic algorithm, another metaheuristic inspired by evolution of species. You can read the description of the algorithm on the genetic algorithm article on Wikipedia.

I applied the algorithm to a concrete problem: help lazy programmers and non-programmers to write file converters. I had to write file converters to unify all (more or less) formatted input files into one kind of CSV file. Each of these converters is made using the right combination of filtering / regexp matching / line splitting.

Description

So I wrote a program based on genetic algorithm where an individual is composed by 3 “genes”:

  • a list of filters (remove consecutive spaces, replace tabs by spaces, …)
  • a list of basics regexp to extract formatted basic blocks(int, float, date, line separator, …)
  • a list of cleaning functions (remove empty cell, merge cells, …)

The fitness of an individual is calculated using a sample file to parse that goes through the genes (filters / regexp / cleaning functions). The result of this process is a CSV file that can be compared to the expected sample result (that I wrote manually). The comparison uses the SequenceMatcher of the difflib Python stantard module, it returns the fitness: a float. A fitness close to 1.0 means the results is close to the expected CSV format. When the fitness is exactly 1.0, that means the converter works perfectly for the sample.

Here is a sample file I wanted to convert:

08/12        Hello world 06/12
Paris 121231231         - 22,29
08/12   Something something ... 04/12
1111 1111 1111 1111         - 14,35
08/12   something else
        - 12,96
26/11   Vir AAAAA
AAAAAA 2008     264,51

into this csv file:

"08/12","Hello world 06/12","Paris 121231231","-22,29"
"08/12","Something something ... 04/12","1111 1111 1111 1111","-14,35"
"08/12","something else","","-12,96"
"26/11","Vir AAAAA","AAAAAA 2008","264,51"

The program prints for each generation the 5 bests individual (I’m using elitism, so the best individual is always kept from a generation to the next one). A sample run:

$ python ga.py tests/sample1.txt tests/sample1.expected result1.pickled
Generation: 0 (mutation rate=10)
0.53772
0.42507
0.30406
0.28598
0.26389
Generation: 1 (mutation rate=10)
0.53772
0.42507
0.32684
0.30406
0.26799

...

Generation: 271 (mutation rate=15)
1.0
0.96025
0.95107
0.91779
0.87886
"08/12","Hello world 06/12","Paris 121231231","-22,29"
"08/12","Something something ... 04/12","1111 1111 1111 1111","-14,35"
"08/12","something else","","-12,96"
"26/11","Vir AAAAA","AAAAAA 2008","264,51"

filters: str_remove_consecutive_spaces, str_remove_somespaces, str_strip, str_remove_consecutive_spaces
regex: ([0-9]{2}/[0-9]{2})(.+?)(
)(.+?)(-?[0-9]+[.,]+[0-9]+)
cleaners: clean_strip, _in, _in, _in

Conclusion

The good:

  • It’s fun to watch your program evolve 😉
  • You are lazy and don’t want to write _many_ of this kind of file converters. This was used in a real life problem, with more than 200 different input file formats.
  • With a good interface and if the basic functions and genes size are well-defined, a non-programmer can create his own parser.

The bad:

  • The resulted parsers are not optimal

The ugly:

  • Of course it’s quite easy to write such genes by hand
  • It may never converge to 1.0 and generated parsers of fitness != 1.0 are useless (this may not be the case for other kind of GA applications)
  • You need many well-chosen primitives to cover a wide set of solutions

What next? Write a simple Web interface to help people generate their own converters. Add new construction blocks. Use the same idea to automatically plot any kind of formatted data.

Git tip: pre-commit hook – avoid commit FIXMEs

In my code, I sometimes add some FIXME tag (just the FIXME string in a comment or in a debug print). And before staging them in git, I review my diff to check for them. And rarely I miss one FIXME line or debug print, stage it and then commit it. To avoid that, I just wrote a quick pre-commit hook that greps the regexp "[Ff][Ii][Xx][Mm][Ee]" in staged diff using the command git diff --cached -G "[Ff][Ii][Xx][Mm][Ee]".

Here is the full script:

#!/bin/sh
exec 1>&2
FORBIDDEN_RE="[Ff][Ii][Xx][Mm][Ee]"
if [ $(git diff --cached -G "$FORBIDDEN_RE"|wc -l) != 0 ]; then
    git --no-pager diff --cached -G "$FORBIDDEN_RE"
    echo
    echo "ERROR: You're trying to commit a line containing the $FORBIDDEN_RE re"
    exit 1
fi

Put that in the file .git/hooks/pre-commit and next time you’re trying to commit a file containing “FIXME”, it will abort and print you the involved diff.

Note that you can ignore the pre-commit hook with the following command:

$ git commit --no-verify

System Wide gitignore

I now added my specific temporary folder to my system wide gitignore. Small impact for me, a huge impact for multi-project people I work with 😉

$ sudo git config --system core.excludesfile /etc/gitignore
$ sudo sh -c "(echo '*~'; echo ".DS_Store"; echo "tmp-max/") > /etc/gitignore"

Redis ZSET Underlying Datastructure

Sorted Sets: ZSET

Redis ZSETs are a very good to store ordered data like leaderboards:

  • You need to get the current score for an userid (every SQL or NOSQL
    can do that efficiently)
  • You need to get the current rank for an userid
  • You need to get a list of pairs userid/score sorted by score

Datastructure

In the Redis documentation, we can read complexity is O(lg(n)) to ZADD / ZRANK / ZREM and O(1) for ZSCORE. How are they implemented ? O(lg(n)) you said… hmmm smells like a tree or binary search. ZSCORE (a kind of search) in constant time… hmmm smells like a hashtable. So how is it implemented ? a mix of hashtable and tree ? No, but that’s close, it’s a mix of hashtable and Skip List. Skip list are an alternative to balanced binary tree.

You should read the original Skip List paper.

From the redis documentation:

ZSETs are ordered sets using two data structures to hold the same elements in order to get O(log(N)) INSERT and REMOVE operations into a sorted data structure.

The elements are added to an hash table mapping Redis objects to scores. At the same time the elements are added to a skip list mapping scores to Redis objects (so objects are sorted by scores in this “view”).

So, how to perform a ZRANGEBYSCORE in linear time ?

  1. search the min element. O(lg(n)) complexity (with n=|set|)
  2. start from min and follow links to the next element until you reach max. O(m) complexity (with m elements returned)

Also, there is a modification from the original skip list implementation, level 1 list is doubly linked, so it gives you: ZREVRANGE, yeah !

Algorithm Complexity

I started to code some random problems from Project Euler. To have an extra hint of which kind of algorithm I’m looking for, I made the following table.

Read it like: “Assuming a computer can do 10 000 000 operations/sec, with a O(n^2) algorithm, you can solve a n ~ 3000 under a second”

Complexity      |   n for 1 second  |   n for 1 minute
------------------------------------------------------
O(1)            |   inf             |   inf
O(lg(n))        |   huge            |   2^(10^7)
O(sqrt(n))      |   10^14           |   36*10^16
O(n)            |   10 000 000      |   600 000 000
O(n*lg(n))      |   526 172         |   24 446 750
O(n*sqrt(n))    |   46 415          |   711 378
O(n^2)          |   3162            |   24 494
O(2^n)          |   23              |   29
O(n!)           |   10              |   12

Project Euler problems should be solved under 1 minute (even in Python, most problems I solved has been < 2 secs), so if the input N is ~10 000 000, you should look for a O(n*lg(n)) solution or better.

Also if you’re developping Python code, you should have a look at these Python Time Complexity Tables. I think other language libraries use the same underlying datastructures and algorithms.

Game Design Lesson – Super Mario Bros (NES)

I read this very interesting snippet in the Wikia Super Mario Bros’ page

“Interestingly, in the very first portion of World 1-1, the developers designed it so that the a newcomer almost always gets a Mushroom. In the first level, there are blocks that the player goes under. A menacing Goomba approaches the player, and instinctively the player jumps over it. By the time the player reaches the Goomba and jumps, they will hit a ? block above that would reveal a mushroom. The mushroom goes to the right, hits a pipe and comes towards the player. Since the mushroom resembles the Goomba, the player thinks to jump over it again. Doing this, however, will almost always lead the player to jump right into the Mushroom since after they jump they hit another block from above which causes them to come back to the ground and hit the mushroom. This was to teach players that Mushrooms were a positive thing in the game.”

POST 10 http requests, maximum 5 in parallel, data POSTed is id=1, 2, …

$ seq 1 10 | xargs -t -P 5 -I % curl -s --data id=% http://te.st/ > /dev/null
curl -s --data id=1 http://te.st/
curl -s --data id=2 http://te.st/
curl -s --data id=3 http://te.st/
curl -s --data id=4 http://te.st/
curl -s --data id=5 http://te.st/
curl -s --data id=6 http://te.st/
curl -s --data id=7 http://te.st/
curl -s --data id=8 http://te.st/
curl -s --data id=9 http://te.st/
curl -s --data id=10 http://te.st/