Here’s the humble trick:
I’ve mapped a key combination to open vim in insert mode and another to yank everything and exit, then I just paste it wherever I want.
That’s the most suckless thing I was able to come up with, as it does not depend on anything, will never break, and you can use it everywhere. You also get all the perks of your local, full-fledged Vim install, such as autocompletion.
It still has some overhead though as it requires some more key presses than just typing the message, so I really only use it for emails or longer messages.
I first added a custom shortcut in my system settings which runs the following command when I press meta+v: (alacritty is the terminal I use)
alacritty -t "Scratchpad" -e vim -c "startinsert"
And in my vimrc
" yank everything
nmap <leader>ya gg0vG$"+y
" yank everything and force quit
nmap <leader>yq gg0vG$"+y:q!<CR>
I use `0vG$` and not `VG` because the latter adds an unwanted linebreak at the end.
Actually, you really want to use this line instead so that you don’t lose the precious time you spent typing your email on a bad manipulation. You may change the path to whatever you like.
nmap <leader>yq :w! /tmp/last_scratchpad.txt<CR>gg0vG$"+y:q!<CR>
For some people exiting vim cleared their clipboard. Adding the following line
autocmd VimLeave * call system("xsel -ib", getreg('+'))
to your vimrc
as described here can fix it.
When trying to make a recommender system, one popular (and currently state of the art as of September 2019) approach is to approximate the rating that a user would give to an unrated item using the ratings of the other users. To do so, one can use a low rank approximation of the (mostly empty) rating matrix $R$ where $R_{ui}$ is the rating given by the user $u$ to the item $i$.
This method is used by big names in the recommendation game like Netflix, Deezer, Spotify…
The main modeling idea of matrix factorization is to view the users and items as low dimensional vectors such that the dot product between a user’s vector and an item’s vector gives an approximation of the rating the user would give to that item.
This way, the rating matrix is predicted as the product of two smaller matrices, hence the term matrix factorization.
Mathematically, this translates to:

$$\hat{R}_{ui} = \langle U_u, I_i \rangle$$

where:
- $U_u$ is the low dimensional vector of user $u$ (a column of $U$),
- $I_i$ is the low dimensional vector of item $i$ (a column of $I$).
As $U$ and $I$ are low dimensional, too small to learn the ratings individually, constraining them on the known ratings will intuitively force the embeddings to encode general information, thus leading to hopefully coherent ratings on the unknown (user, item)
pairs.
We can derive an objective function from this equation and use it to tune $U$ and $I$ with a gradient descent approach.
Finding the best $U,I$ is equivalent to finding the $U$ and $I$ that minimize

$$\sum_{(u,i)\ \text{known}} \left(R_{ui} - \langle U_u, I_i \rangle\right)^2 + \lambda(\lVert U\rVert^2 + \lVert I\rVert^2)$$

where $\lambda(\lVert U\rVert^2 + \lVert I\rVert^2)$ is a regularization term that prevents $U$ and $I$ from diverging.
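To make the objective concrete, here is a minimal numpy sketch of this minimization with plain SGD; the toy ratings, learning rate and regularization strength are illustrative assumptions, not the article's setup:

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=2, lr=0.05, lam=0.01, epochs=2000):
    """Minimize sum (R_ui - <U_u, I_i>)^2 + lam * (|U|^2 + |I|^2) with SGD.
    ratings: list of (user, item, rating) triples (the known entries of R)."""
    rng = np.random.default_rng(0)
    U = rng.uniform(-0.01, 0.01, (n_users, k))
    I = rng.uniform(-0.01, 0.01, (n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ I[i]
            # gradient step on the squared error plus the L2 penalty
            U[u] += lr * (err * I[i] - lam * U[u])
            I[i] += lr * (err * U[u] - lam * I[i])
    return U, I

# Toy example: two users who rate two items the same way
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 5.0), (1, 1, 1.0)]
U, I = sgd_mf(ratings, n_users=2, n_items=2)
```

After training, the dot products approximate the known ratings, which is exactly what the PyTorch implementation below does at scale.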
Although not always ideal, an easy way of exploring data and interactively doing stuff on it is through a Jupyter notebook.
You can either use the very nice deepo docker image to setup a local notebook server with all the machine learning tools pre-installed or just use google colab.
I’ll use Pytorch to illustrate the maths with some code.
from torch import nn

class MinimalMatrixFactorization(nn.Module):
    def __init__(self, n_users, n_items, n_features=20):
        super().__init__()
        self.user_features = nn.Embedding(n_users, n_features)
        self.item_features = nn.Embedding(n_items, n_features)
        self.user_features.weight.data.uniform_(-0.01, 0.01)
        self.item_features.weight.data.uniform_(-0.01, 0.01)

    def forward(self, user, item):
        return (self.user_features(user) * self.item_features(item)).sum(1)
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

def train_model(model, dataloader, criterion, optimizer, n_epochs=15):
    loss_hist = []
    for _ in tqdm(range(n_epochs)):
        curr_loss = 0
        for user_ids, item_ids, ratings in dataloader:
            optimizer.zero_grad()
            # Predict and calculate loss
            prediction = model(user_ids, item_ids)
            loss = criterion(prediction, ratings)
            curr_loss += loss.item()
            # Backpropagate
            loss.backward()
            # Update the parameters
            optimizer.step()
        curr_loss /= len(dataloader)
        loss_hist.append(curr_loss)
        print(f"RMSE: {np.sqrt(curr_loss):.10} \t MSE: {curr_loss:.10}")
    plt.plot(loss_hist)
And there you have the simplest form of matrix factorization.
The process of making recommendations is then simply a similarity measure between users and items. The favored metric for that is the cosine similarity, defined as follows:
$$ \text{coss}(u, i) = \frac{\langle u, i \rangle}{\lVert u \rVert \times \lVert i \rVert} $$
it measures the cosine of the angle $\theta$ between the two vectors $u$ and $i$ because by definition $\langle u, i \rangle = \lVert u\rVert \times \lVert i\rVert \cos(\theta)$.
Now we can compute the similarity between a user and the items, ranking them accordingly.
import torch

coss = nn.CosineSimilarity(dim=-1, eps=1e-6)
u = model.user_features(torch.as_tensor(42))  # retrieve the embeddings of user 42
items = model.item_features.weight
similarities = coss(u, items)
values, indices = similarities.topk(14)
for rank, (v, i) in enumerate(zip(values, indices), 1):
    print(f"{rank}. id: {i.item()}, similarity: {v.item()}")
Some users tend to like everything and some items tend to be liked by a majority of users. Introducing user and item biases will help our model learn the things that matter.
Let the user/item bias $B_{ui} = \mu + \frac{b_u + \tilde{b}_i}{2}$ where $\mu$ is a constant, $b_u$ a scalar that depends on the user and $\tilde{b}_i$ one that depends on the item.
Intuitively, $\mu$ could be the global average rating and $b_u$ and $\tilde{b}_i$ the specific deviation of the item and the user, i.e. the difference between the global average rating and their own average rating.
Now we can just add the bias matrix to our model equation:

$$\hat{R}_{ui} = \langle U_u, I_i \rangle + B_{ui}$$
Those intuitive biases proved to be the best in practice, outperforming learned biases by a wide margin.
We can initialize the bias before the training process like so:
class BiasMatrixFactorization(MinimalMatrixFactorization):
    def __init__(self, n_users, n_items, n_features=20):
        super().__init__(n_users, n_items, n_features)

    def init_bias(self, train_df):
        self.user_bias = torch.zeros(len(self.user_features.weight))
        self.item_bias = torch.zeros(len(self.item_features.weight))
        # The average known rating
        self.mu = train_df['rating'].mean()
        # The user deviation from the average known rating
        ubias = train_df.groupby(train_df['user_id'])['rating'].mean() - self.mu
        self.user_bias[ubias.index] = torch.as_tensor(ubias.values, dtype=torch.float32) / 2
        # The item deviation from the average known rating
        ibias = train_df.groupby(train_df['item_id'])['rating'].mean() - self.mu
        self.item_bias[ibias.index] = torch.as_tensor(ibias.values, dtype=torch.float32) / 2

    def forward(self, user, item):
        return (self.user_features(user) * self.item_features(item)).sum(1) + \
            self.user_bias[user].squeeze() + self.item_bias[item].squeeze() + self.mu
“Good” models, i.e. with good performance according to the usual metrics (e.g. RMSE), may create unintentional overlap of disparate groups of users and items. This behaviour is called “folding” and can lead to spurious recommendations.
Furthermore, a single bad suggestion can outweigh many good ones (e.g. a horror movie showing up among Disney recommendations). Models are especially subject to folding during the cold start phase.
To hinder this tendency, one can split the rating matrix into multiple sub-matrices, separating users according to some metadata (e.g. the country they are from, their age…), but this results in a net loss of useful information. Other leads are using specialized training approaches such as Bayesian Personalized Ranking, or weighting the unknown ratings.
The latter proved successful on the Netflix Prize. $W$ is set to the matrix such that $W_{ui} = 1$ if user $u$ rated item $i$ and $\alpha$ otherwise, where $\alpha$ is roughly the order of sparsity of $R$.
Doing so shares evenly the total weight between observed and unobserved ratings, which will make the model less likely to mix unrelated groups.
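As a toy illustration of that weighting scheme (the shapes and the value of $\alpha$ are assumptions), the weighted objective over every cell of the matrix can be written as:

```python
import numpy as np

def weighted_mf_loss(R, mask, U, I, alpha=0.01):
    """Weighted squared error over ALL (user, item) pairs:
    W_ui = 1 where a rating is known, alpha elsewhere
    (R is assumed to hold 0 in its unknown cells)."""
    W = np.where(mask, 1.0, alpha)
    pred = U @ I.T
    return np.sum(W * (R - pred) ** 2)
```

Giving the unknown cells a small but non-zero weight pulls unrelated user and item groups gently apart instead of leaving their relative positions unconstrained.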
Knowing where the countries, the age groups or the genders sit relative to each other could be useful to make group-specific recommendations, or to make more accurate recommendations during the cold start phase, when the user hasn’t rated anything yet.
To embed these groups in the same space as the users and the items, we may:
The latter is a recent invention (to our knowledge) of a research group I was in. We used an approach inspired by Natural Language Processing and added a shared “group vector” to the user or the item.
where $G$’s columns are a reference to the unique group vector corresponding to the user’s one. This approach warrants further research.
In addition to being among the simplest and -in my humble opinion- the most elegant methods, matrix factorization remains one of the most efficient collaborative filtering algorithms; but it is not the only one.
In its simplest form, NMF replaces the dot product between the users and items by a function learned with a multilayer perceptron.
We can either let the model learn everything at the same time, or first fit the embeddings using classical matrix factorization and then train the neural network after that.
Auto-encoders recently gathered quite a lot of attention. My findings are that basic auto-encoders are less intuitive and less practical to train than MF algorithms, and while the encoded embeddings did make vectorial sense in my experimentations, the results were worse than MF with a strong tendency to overfit.
They take as input the user represented as the vector of all of their ratings, which does not scale well. Items are not naturally embeddable. A trick can be to input a user with the maximum rating on a specific item and no other rating. This works better than averaging the embeddings of the users that rated it (weighted by the rating), as averaging would pack all the items in the same region of the embedding space.
Auto-encoders trained for recommendation must learn to generalize and not limit themselves to reconstructing the (sparse) input vector. To do so, one can randomly mask some entries of the input vector and still penalize the model for getting these masked ratings wrong. This proved to be a key factor in the training of auto-encoders.
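A sketch of that masking idea in numpy; the mask rate, the zero-means-unobserved convention and the `reconstruct` callable (standing for any auto-encoder forward pass) are assumptions:

```python
import numpy as np

def masked_reconstruction_loss(x, reconstruct, mask_rate=0.25, rng=None):
    """Hide a fraction of the observed ratings from the input, but still
    score the model on them, so it cannot just copy its input."""
    if rng is None:
        rng = np.random.default_rng(0)
    observed = x != 0                      # 0 encodes "no rating"
    drop = observed & (rng.random(x.shape) < mask_rate)
    x_in = np.where(drop, 0.0, x)          # masked vector fed to the model
    x_hat = reconstruct(x_in)
    # penalize every observed entry, including the hidden ones
    return np.mean((x_hat[observed] - x[observed]) ** 2)
```

With a non-zero mask rate, a model that merely reproduces its input keeps paying for the hidden entries, which is the generalization pressure described above.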
Another way to improve the training of auto-encoders for recommendation is to use dense re-feeding, a process in which the prediction (a dense vector) is re-fed to the neural network afer a first gradient update. Indeed, in an idealized scenario $\hat{y} = f(x)$ should be a fixed point of the auto-encoder.
Using contractive or denoising auto-encoders could lead to a better latent space quality.
Even more recently, variational auto-encoders were on the more promising side. Indeed, approximating distributions rather than data points makes them more robust than traditional auto-encoders. They also offer the possibility to disentangle the latent space, which could be interesting for recommendation purposes.
The Fish shell is a shell with very sane defaults that can do a lot out of the box and makes you feel at home right after installing it.
Beware though: some not-so-commonly-used features of bash do not exist in the fish shell, and it isn’t compatible with POSIX sh. Where it really shines is as an interactive shell.
The tutorial gives a quick and great overview of its capabilities.
When in doubt:
- `man cat`
- `help cd`
- `jq --help`, `jq -h`
- `apropos "list dir"`, `man -k "list dir"`: search the manual

- `cd -`: go to the last visited directory
- `dirs`: print the dirs stack
- `pushd .`: add the current directory to the dirs stack
- `popd`: pop and go to the last one

- `> my-file`: overwrites the file’s content
- `>> my-file`: appends to the file’s existing content
- `< my-file`: inputs the file’s content, same as `cat my-file |` but without starting an additional process
- `mkdir my-new-directory`: make a new directory
- `diff <(ls) <(ls -a)`: `<()` treats the output of the command as a file (doesn’t work in fish)

All the following commands support `-v` for verbose and `-i` for interactive:
- `mv my-file new-file`: moves a file
- `cp my-file new-file`: copies a file
- `rm my-file`, `rm -r my-directory`: remove a file or directory
`file.conf{,.old}` will expand to `file.conf file.conf.old`:
- `cp file.conf{,.old}`: create a backup file
- `mv file.conf{.old,}`: revert to the backup
- `convert file{.jpg,.png}`

Pipes!
- `seq 100 | grep 3 | wc -l`: counts in how many numbers “3” appears between 1 and 100

Tee!
- `seq 100 | tee 100.txt`: splits the output between stdout and the argument

Interacting with the clipboard:
- `cat file.txt | xclip -selection clipboard`: copy text
- `xclip -selection clipboard -o > file.txt`: paste text

Make them in a compiled language such as Rust, Go, C…
Use the first line of the file to describe how to run it:
#!/usr/bin/env python3

def factorial(n):
    res = 1
    for i in range(2, n + 1):
        res *= i
    return res

if __name__ == "__main__":
    import sys
    n = int(sys.argv[1])
    print(factorial(n))
./factorial.py 5
120
factorial() { echo 1; seq $1 | paste -s -d\* | bc; }
factorial 5
120
Can be put in config file to be reusable.
alias la='ls -1a --group-directories-first'
Can be put in config file to be reusable.
- `wget -mkEpnp http://example.org`: make an offline mirror of a site, short version of `--mirror --convert-links --adjust-extension --page-requisites --no-parent`
- `while read in; do echo "$in"; done < file.txt`: repeat a command for every line of a file (doesn’t work in fish, but can be used with `sh -c '...'`)
- `tar -zcC /my/source/path my-folder > ~/my-backup-$(date +%s).tar.gz`: make a quick backup

Most shells (fish, bash, zsh…) support two modes: the vi mode and the emacs mode, with the mappings of each editor. Shells are in emacs mode by default.
Normal and insert mode, just like vim with nearly all of its bindings. The Vi mode is especially nice in the Fish shell.
- `fish_vi_key_bindings`: fish
- `set -o vi`: bash & co

- `<C-/>`: undo (doesn’t work in Fish)
- `<C-a>`: go to the beginning of the line
- `<C-e>`: go to the end of the line
- `<C-k>`: delete from cursor to the end of the command line
- `<C-u>`: delete from cursor to the beginning of the command line
- `<C-r>`: triggers history search
- `<C-w>`: delete from cursor to beginning of the word
- `<C-y>`: paste word or text that was cut using one of the deletion shortcuts (such as the one above) after the cursor
- `<C-x-x>`: move between start of command line and current cursor position (and back again)
- `<Alt>b`: move backward one word (or go to start of word the cursor is currently on)
- `<Alt>f`: move forward one word (or go to end of word the cursor is currently on)
- `<Alt>d`: delete to end of word starting at cursor (whole word if cursor is at the beginning of word)
- `<Alt>.` or `<Esc>.`: adds the last command’s last word
- `<C-r>`: search the history backwards
- `<C-g>`: escape from history searching mode
- `<C-p>`: previous command in history (i.e. walk back through the command history)
- `<C-n>`: next command in history (i.e. walk forward through the command history)
- `<C-l>`: clear the screen
- `<C-s>`: stops the output to the screen (for long running verbose commands)
- `<C-q>`: allows output to the screen (if previously stopped using the command above)
- `<C-c>`: terminate the command
- `<C-z>`: suspend/stop the command; use `bg` to run it in background, `jobs` to list background jobs and `fg` to bring them back
- `&` after a command to run it in background
- `nohup` before a command to detach its life cycle from the current terminal
- `&>/dev/null` redirects the output of the process to `/dev/null`; use `&>/dev/null &` after a command to ignore its output and run it in background
- `nohup command &>/dev/null &` hence creates a totally independent process; `command </dev/null &>/dev/null &` achieves the same goal by instantly sending EOF to the program (using this you won’t see the `Done` after the backgrounded process ends)
- `!!`: run last command, `sudo !!` to add a forgotten sudo
- `!blah`: run the most recent command that starts with “blah” (e.g. `!ls`)
- `!blah:p`: print out the command that `!blah` would run (also adds it as the latest command in the command history)
- `!$`: the last word of the previous command (same as `<Alt>.`)
- `!$:p`: print out the word that `!$` would substitute
- `!*`: the previous command except for the first word (e.g. `find a b` gives `a b`)
- `!*:p`: print out what `!*` would substitute
- `^foo^bar`: same as `!!:s/foo/bar/`, replace `foo` with `bar` in the last command and run it
- `<C-o>`: this will execute the picked command and write the following one in the prompt

This project was built with Mathis Chenuet, based on previous work from Félix Alié, Damien Delmas and Alexis Schad; under the supervision of the late and unforgettable Prof. Ghitalla.
The natural way to represent interaction between entities is as a network graph: Alice called Bob 42 times, she went to Charlie’s place twice a month… All these events can be represented as edges of a graph in which the vertices are the actors. However, we’ll be missing a crucial component: time.
Sure, it’s possible to treat the graph as a video, making edges appear gradually, but the big picture won’t be visible.
To fix our thoughts, we’ll set the data to be the collection of messages sent from a phone.
While it’s rather easy to visualize a message exchange between two people, it’s harder to visualize the interactions between someone and all of their contacts in a single graph. One solution would be to list all the contacts (sorted by ascending total message count) on the $y$ axis, put time on the $x$ axis, and plot dots or a heat map to represent messages.
Although this approach works and will be good enough 90% of the time, it’s not ideal as it’s definitely not compact enough when the number of contacts is rather high.
One could argue that a threshold on message count would solve this issue but that’s a straight information loss. We wondered how we could do things differently.
The main idea is to represent time as a series of nodes in a graph, a “skeleton” of timestamps. Entities are then linked to some of the skeleton’s vertebrae by an edge if something related to them happened during the time range materialized by the vertebra, creating a bipartite graph. The edges can also be weighted by “how much of it” happened during that range.
Following these rules we can create a human-unreadable graph:
Once the structure of the graph is built, the nodes must be positioned correctly to make it interpretable. To achieve this we used a force-directed graph drawing algorithm: ForceAtlas2.
In order to give a proper shape to the “skeleton” timeline, we essentially increased the attraction between two vertebra nodes and voilà: the graph instantly becomes more coherent with an easy to follow timeline. Entities are placed according to when they interacted with the skeleton which makes global behaviors easy to grasp.
With a few cosmetic adjustments, the result is quite easy on the eyes!
As for the technology, we used sigma-js to handle most of the graph-related stuff.
This representation is unique as it allows for an overview of the relations between the subject and the entities they interacted with. The concept can easily be extended to multiple subjects with intertwining skeletons.
This visualization technique applies to sets of (label, sequence-number)
pairs, in most use-cases the sequence-number
is a timestamp but it doesn’t inherently have to be time-related.
We built a demo tool that takes a CSV as input for you to try:
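Independently of the demo, the skeleton construction described above can be sketched in plain Python. The `(label, sequence_number)` input format follows the description; the bucket size and function name are illustrative assumptions:

```python
from collections import defaultdict

def build_skeleton_graph(events, bucket_size=3600):
    """events: iterable of (label, sequence_number) pairs.
    Returns (vertebrae, spine_edges, entity_edges) where vertebrae are the
    time buckets, spine_edges chain consecutive vertebrae (the "skeleton"),
    and entity_edges link each label to a vertebra, weighted by how many
    events fell into that bucket."""
    weights = defaultdict(int)
    for label, t in events:
        weights[(label, t // bucket_size)] += 1
    vertebrae = sorted({bucket for _, bucket in weights})
    spine_edges = list(zip(vertebrae, vertebrae[1:]))
    entity_edges = [(label, bucket, w) for (label, bucket), w in weights.items()]
    return vertebrae, spine_edges, entity_edges
```

The spine edges are the ones whose attraction gets boosted in the force-directed layout so the timeline stays readable.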
- `is_in_stack`, where `is_in_stack[i]` tells whether $i \in Stack$.
- `float('inf')` or `math.inf`.
- sorted ➜ bisection search
- iterative DFS ➜ stack
- BFS ➜ queue
Optimal substructure ➜ Dynamic programming to compute all the optimal solution of size $0 \ldots n$ and deduce the solution.
Examples: subarray, knapsack, longest substring
Solve an example with your brain and hands and see what “algorithm” you used.
Bottlenecks, unnecessary work, duplicated work: walk through your best algorithm and look for these flaws.
Can I save time by using more space?
Stack
- `l = []`, `l.append()`, `l.pop()`, `l[i]`, `len(l)`
- `del l[i]`, `l[inf:sup:step]`
- `l.sort()`, `sorted(l)`

Queue
- `dq = deque()`, `dq.popleft()`, `dq.appendleft(x)` + normal list operations except slices

Dictionary / Hashtable
- `dic = {}`, `key in dic`, `dic[key]`, `del dic[key]`: $O(1)$ on average, worst case $O(n)$

Set
- `s = set([])`, `x in s`, `s.add(x)`, `s.remove(x)`
- `s1|s2`, `s1&s2`, `s1-s2`, `s1^s2`

Heap / Priority Queue
- `h = []`, `heappush(h, x)`, `heappop(h)`
- `heapify(h)` turns a list into a heap in place; the min is `h[0]`
- `left = 2*i; right = left + 1; father = i//2` in a 1-based array
- `left = 2*i + 1; right = 2*i + 2; father = (i - 1)//2` otherwise

Union find
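A compact, standard union-find with path compression and union by rank (a generic sketch, not tied to a specific problem):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # path compression: hang nodes closer to their root as we walk up
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same component
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Both operations run in near-constant amortized time, which makes it the go-to structure for connectivity and Kruskal-style problems.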
def f():
    # Base case
    ...
    # Recursive calls
    ...
Bisection search ends when `left` and `right` cross.
def bisect(xs, key, f):
    left, right = 0, len(xs)
    while left < right:
        mid = left + (right - left) // 2  # avoid integer overflow (unnecessary in python)
        if f(xs[mid], key): left = mid + 1  # Move right
        else: right = mid  # Move left
    return left
When will they cross?
- `f(x, key) = x < key` ➜ moves left when `x == key`, so the search lands on the first occurrence
- `f(x, key) = x <= key` ➜ moves right when `x == key`, so the search lands just past the last occurrence

bisect_left = lambda xs, key: bisect(xs, key, lambda x, y: x < y)  # First occ
bisect_right = lambda xs, key: bisect(xs, key, lambda x, y: x <= y) - 1  # Last occ
def merge(t, i, j, k, aux):
    # merge t[i:k] in aux[i:k] assuming t[i:j] and t[j:k] are sorted
    a, b = i, j  # iterators
    for s in range(i, k):
        if a == j or (b < k and t[b] < t[a]):
            aux[s] = t[b]
            b += 1
        else:
            aux[s] = t[a]
            a += 1

def mergeSort(t):
    aux = [None] * len(t)
    def mergeRec(i, k):
        # merge sort t from i to k
        if k > i + 1:
            j = i + (k - i) // 2
            mergeRec(i, j)
            mergeRec(j, k)
            merge(t, i, j, k, aux)
            t[i:k] = aux[i:k]
    mergeRec(0, len(t))
def countingSort(array, maxval):
    """in-place counting sort
    O(n + maxval)"""
    m = maxval + 1
    # count the occurrence of every possible value
    count = [0] * m  # /!\
    for a in array:
        count[a] += 1
    i = 0
    for a in range(m):
        for _ in range(count[a]):
            array[i] = a
            i += 1
    return array
`res[x][y]` may throw index errors on corner cases.

#-- 1 Naive recursive solution
def f():
    # Base case
    ...
    # Recursive calls
    ...
@lru_cache(maxsize=None)
can automate the memoization process by using a dictionary to store the result of function calls (less efficient than an array).
#-- 2 Adding the memoization
# a slot for every possible pair of parameters
res = [[-1] * len(Y) for _ in range(len(X))]
# @lru_cache(maxsize=None) built-in auto-memoization
def f(x, y):
    # Base case
    ...
    # Already seen case
    if res[x][y] != -1:
        return res[x][y]
    # Recursive calls
    ...  # Just update the res[x][y] slot
    return res[x][y]
#-- 3 Bottom up
def f(X, Y):
    res = [[-1] * len(Y) for _ in range(len(X))]  # pre-allocate memory, unnecessary in python
    for x in range(1, len(X)):  # skipping dummy value
        for y in range(1, len(Y)):
            ...  # update the res[x][y] slot
    return res[-1][-1]
chr(65) # -> 'A'
ord('A') # -> 65
chr(97) # -> 'a'
ord('a') # -> 97
def basebTo10(x, b=2):  # Horner scheme
    u = 0
    for a in x:
        u = b * u + a
    return u

def base10Tob(q, b=2):
    s = ''
    while q > 0:
        q, r = divmod(q, b)
        s = str(r) + s
    return s
# when our base is a power of 2 (base = 2**pow)
def base10ToPowOf2(q, pow=1):
    s = ''
    mask = (1 << pow) - 1  # keeps the pow lowest bits
    while q > 0 or not s:
        q, s = q >> pow, str(q & mask) + s
    return s
#-- When there is no 0 in our base
# for instance counting excel columns
def base10TobNoZero(q, b=2):
s = ''
while q > 0 or not s:
q, r = divmod(q - 1, b) # -1 to remove 0
s = chr(65 + r) + s
return s
x << y
Returns x with the bits shifted to the left by y places (and new bits on the right-hand-side are zeros). This is the same as multiplying x by 2**y.
x >> y
Returns x with the bits shifted to the right by y places. This is the same as floor-dividing x by 2**y.
x & y
Does a “bitwise and”. Each bit of the output is 1 if the corresponding bit of x AND of y is 1, otherwise it’s 0. Commutative and Associative.
x | y
Does a “bitwise or”. Each bit of the output is 0 if the corresponding bit of x AND of y is 0, otherwise it’s 1. Commutative and Associative.
~ x
Returns the complement of x - the number you get by switching each 1 for a 0 and each 0 for a 1. This is the same as -x - 1.
x ^ y
Does a “bitwise exclusive or”. Each bit of the output is the same as the corresponding bit in x if that bit in y is 0, and it’s the complement of the bit in x if that bit in y is 1. Commutative and Associative.
Efficient when there are fewer than 64 possible values.
| Set | Binary |
|---|---|
| $\emptyset$ | 0 |
| $\{i\}$ | 1 << i |
| $\{0, 1, \ldots, n-1\}$ | (1 << n) - 1 |
| $A \cup B$ | A \| B |
| $A \cap B$ | A & B |
| $(A \setminus B) \cup (B \setminus A)$ | A ^ B |
| $A \subseteq B$ | A & B == A |
| $i \in A$ | (1 << i) & A |
| $\{\min(A)\}$ | -A & A |
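These identities can be checked directly in Python; the example sets below are arbitrary:

```python
A = (1 << 0) | (1 << 2) | (1 << 5)   # the set {0, 2, 5}
B = (1 << 2) | (1 << 3)              # the set {2, 3}

union        = A | B                 # {0, 2, 3, 5}
intersection = A & B                 # {2}
sym_diff     = A ^ B                 # {0, 3, 5}
b_subset_a   = (A & B) == B          # is B a subset of A? No: 3 is not in A
has_2        = bool((1 << 2) & A)    # membership test for element 2
min_of_a     = -A & A                # isolates the lowest set bit, i.e. {min(A)}
```

Because these are plain integer operations, a whole subset enumeration fits in a single `for mask in range(1 << n)` loop.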
You should ideally know about:
(Most are overkill for an interview but nice to know as a software engineer)
- Maximum subarray (Kadane): the best sum ending at index i + 1 is either (the best sum ending at i) + A[i + 1], or A[i + 1] alone
- Majority vote (Boyer–Moore): keep a candidate and a counter, ++ if the current element is the candidate, -- otherwise
- GCD (Euclid): while b != 0, gcd(a, b) = gcd(b, a % b)
- Fast exponentiation: at each step, res = res * a if n is odd, then a = a * a and n is halved
- Longest common subsequence (DP): `lcs[p][q]` = LCS ending at index `p` in str1 and `q` in str2, with `lcs[p][q] = str1[p] == str2[q] ? 1 + lcs[p-1][q-1] : max(lcs[p][q-1], lcs[p-1][q])`
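The LCS recurrence above can be written bottom-up, shifted by one to avoid the corner cases (a standard sketch):

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence of s1 and s2, bottom-up DP.
    dp[p][q] holds the answer for the prefixes s1[:p] and s2[:q]."""
    dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for p in range(1, len(s1) + 1):
        for q in range(1, len(s2) + 1):
            if s1[p - 1] == s2[q - 1]:
                dp[p][q] = 1 + dp[p - 1][q - 1]
            else:
                dp[p][q] = max(dp[p][q - 1], dp[p - 1][q])
    return dp[-1][-1]
```

Row 0 and column 0 act as the dummy values mentioned earlier, so no index ever goes out of bounds.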
Questions about the stack, the tests, the integration process, documentation of production incidents…
Standard dev environment? Mandatory?
What are the weekly meetings?
What’s the remote policy?
What’s the product schedule and deployment frequency?
What about conference/event attendance?
Is there any dedicated time for self-training? Contributing to FOSS?
What does it take to be successful here?
To evaluate the global software engineering quality, use the Joel Test (from Joel Spolsky, StackOverflow co-founder):
NeuroEvolution of Augmenting Topologies is a way to both create and train neural networks using genetic algorithms. It was introduced in 2002 by Kenneth O. Stanley and Risto Miikkulainen in the paper Evolving Neural Networks through Augmenting Topologies.
I implemented it with a few variations and heuristics of my own as my first real CS project.
The main idea is very simple: a neural network is a graph, neurons are nodes and inter-neurons connections are weighted edges.
Start with a population of neural networks with random weights and the minimal structure: all the input nodes connected to all the output ones. Then, evolve in one of two ways: add a new connection between two neurons, or split an existing connection in two by inserting a new neuron.
After that, the weights are perturbed, similar individuals are crossed over to create new ones and finally, everyone is evaluated. Weak individuals are dismissed and the cycle starts over.
Although the process is very straightforward, there’s one tricky part: crossing individuals over.
Crossing two paths in the travelling salesman problem is rather simple, crossing over two neural networks is inherently hard. For a crossover between neural networks to make sense, they must share a lot of common traits.
Each structural mutation (connection or neuron), also called gene, has a unique, increasing id: the “innovation number”. It must be the same even if a given mutation occurs in two neural networks with several generations (and thus mutations) in between.
The distance between two individuals is then given by the following formula:

$$d = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 \bar{W}$$

where:
- $E$ is the number of excess genes,
- $D$ the number of disjoint genes,
- $\bar{W}$ the average weight difference of the common genes,
- $N$ the number of genes of the larger genome,
- $c_1$, $c_2$, $c_3$ tunable coefficients.
A threshold $d_s$ is used to infer species, i.e. groups of similar individuals, from the distance matrix. Cross-over is only allowed within species.
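As an illustration, this distance can be computed for genomes stored as `{innovation_number: weight}` dicts; the coefficient values and the genome encoding are assumptions for the sketch:

```python
def compatibility(g1, g2, c1=1.0, c2=1.0, c3=0.4):
    """Distance between two genomes given as {innovation_number: weight}."""
    max1, max2 = max(g1), max(g2)
    common = g1.keys() & g2.keys()
    # excess genes lie beyond the other genome's highest innovation number
    excess = sum(i > max2 for i in g1) + sum(i > max1 for i in g2)
    # disjoint genes are the remaining non-shared genes
    disjoint = len(g1.keys() ^ g2.keys()) - excess
    n = max(len(g1), len(g2))
    w_bar = (sum(abs(g1[i] - g2[i]) for i in common) / len(common)
             if common else 0.0)
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar
```

Two individuals then belong to the same species whenever this value falls below the threshold $d_s$.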
| Gene innovation number | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Gene type | common | disjoint | disjoint | common | excess |
| Worse performing parent | x | x | x | | |
| Best performing parent | x | x | x | x | |
| Offspring | x | x | x | x | |
Keep original genes from the best parent, average the weights for common ones.
If the fitness of a species does not increase enough over a generation, its stagnation is incremented. Whenever there’s a significant fitness leap the stagnation is reset.
The potential is the number used to allocate offspring to a species. It measures how dynamic the evolution of a species is.
Stagnant species get fewer offspring; this strategy helps overcome local fitness maxima.
At some stage of the training, the structure of the graph can be frozen and gradient descent used for a quicker convergence.
A total eclipse is what someone located in the umbra sees: the moon blocks all the incoming light of the sun. A partial eclipse is what someone in the penumbra sees: the moon only blocks some of the sunlight.
Without much loss of precision, we can consider the sun, the moon and the earth as disks, each sitting on one of the 3 planes normal to the sun-earth axis and containing a body’s center.
To detect whether a total or partial eclipse is occurring, all we have to do is to find out if either the penumbra or the umbra intersects the earth’s disk. In concrete terms, we need to compute the bit of umbra and penumbra that is the closest to the sun-center-to-earth-center axis and see whether they fall on the earth’s disk or not.
The (normalized) vector moon-center to projection of the earth’s center on the moon plane, MC_EP
gives the direction on the sun-facing planes in which to seek the points that define the lines leading to the interesting shadow bits, as shown in the following schemas.
In a given position, two lines (one for total and one for partial) are enough to determine if an eclipse is occurring. Each of these lines is defined by two points: one on the sun’s perimeter and another on the moon’s perimeter, as demonstrated in the first schema.
If they intersect the earth’s disk then there’s an eclipse.
Note that in some circumstances this is not exactly correct. In such case the rules are slightly different and there is no total eclipse.
For different reasons, we’ll need to compute 3 intersections between a line and a plane:
- the sun-earth line with the moon’s sun-facing plane, which defines the direction `MC_EP`;
- the earth’s plane with the line passing by the point of the moon’s perimeter that is the closest to the earth’s center (i.e. in the direction of `MC_EP`) and the point of the sun’s perimeter opposite to `MC_EP` (for the partial eclipse);
- the earth’s plane with the line passing by that same point of the moon’s perimeter and the point of the sun’s perimeter in the direction of `MC_EP` (for the total eclipse).

Let $A$, $B$ be two points; the intersection $I$ of the line directed by $\vec{AB}$ with the plane of normal $\vec{n}$ passing by $O$ is given by:

$$I = A + \frac{\langle O - A, \vec{n} \rangle}{\langle \vec{AB}, \vec{n} \rangle}\,\vec{AB}$$
In python:
epsilon = 1e-9  # numerical tolerance for parallelism

def intersectionPoint(O, n, A, B):
    """returns, if it exists, the intersection point between:
    - the plane of normal vector n containing O
    - the line (AB)"""
    AB = B - A
    if abs(np.dot(AB, n)) < epsilon:
        raise ValueError("No intersection")
    d = -(np.dot(A - O, n) / np.dot(AB, n))
    X = A + d * AB
    return X
That’s enough to create an isEclipse
function that can tell for a given position of the bodies whether an eclipse is occurring or not.
def isEclipse(S, E, M, r_s, r_e, r_m):
    """returns a tuple of booleans, (partial, total)
    S, E, M: np array [x,y,z] of the position of the Sun, the Earth and the Moon
    r_s, r_e, r_m: their radius"""
    partial, total = False, False
    earth_sun = S - E
    moon_sun = S - M
    # The moon must be between the earth and the sun
    if norm(moon_sun) > norm(earth_sun):
        return partial, total
    # The point of the moon's sun-facing plane which is on the sun-earth line
    earth_on_moon_plane = intersectionPoint(M, moon_sun, S, E)
    # Moon to Earth's Projection direction
    MC_EP = earth_on_moon_plane - M
    MC_EP = MC_EP / norm(MC_EP)
    # The lines pass by the same point of the moon
    moon_point = M + r_m * MC_EP
    # The lines pass by opposite points of the sun
    sun_point_partial = S - r_s * MC_EP
    sun_point_total = S + r_s * MC_EP
    # moon_point is the point of the moon that is the closest to the sun-earth axis
    # hence the bit of shadow that is the closest to the earth's center is on the line that
    # passes by sun_point_partial (for partial eclipse) and moon_point
    # or by sun_point_total (for total eclipse) and moon_point
    shadow_for_partial = intersectionPoint(E, earth_sun, sun_point_partial, moon_point)
    shadow_for_total = intersectionPoint(E, earth_sun, sun_point_total, moon_point)
    if distance(shadow_for_partial, E) < r_e:
        partial = True
    if distance(shadow_for_total, E) < r_e:
        total = True
    return partial, total
Now that we can tell if there’s an eclipse in a given position, we need to compute all future positions given the current (observable) one.
The `odeint` function from scipy can solve differential equations of the form $y'(t) = f(y(t), t)$ for a set of instants $t$, given initial conditions $y(0) = y_0$.
Let’s use $p_i: t \mapsto \begin{pmatrix} x_i(t) \\ y_i(t) \\ z_i(t) \end{pmatrix}$ as the position function for the body $i$.
Using the fundamental principle of dynamics (Newton’s second law), for every body $i$ we have:

$$p_i''(t) = \sum_{j \neq i} G\, m_j \frac{p_j(t) - p_i(t)}{\lVert p_j(t) - p_i(t) \rVert^3}$$

That gives us a way to compute the derivative of the vector $\begin{pmatrix}p_i(t) \\ p_i'(t)\end{pmatrix}$:

$$\frac{d}{dt}\begin{pmatrix} p_i(t) \\ p_i'(t) \end{pmatrix} = \begin{pmatrix} p_i'(t) \\ \sum_{j \neq i} G\, m_j \frac{p_j(t) - p_i(t)}{\lVert p_j(t) - p_i(t) \rVert^3} \end{pmatrix}$$
Knowing the derivative of the vector at a given time and its value for $t = 0$, that is the current measurable positions and speed of the celestial bodies, we can use odeint
to compute their future positions.
In practice, our $f$ function will take a flat vector of all bodies’ positions and speeds and will output its derivative.
def f(y, t):
    """y: flat array of bodies [x1,y1,z1,dx1,dy1,dz1, x2, y2, ...]
    returns a flat array of the derivatives,
    to be used in odeint"""
    s = np.zeros((nb_bodies, 6))
    y = y.reshape((nb_bodies, 6))
    for i in range(nb_bodies):
        for j in range(nb_bodies):
            if i != j:
                a = y[j][0:3] - y[i][0:3]  # p_j - p_i
                d3 = sum(a**2)**1.5  # distance cubed
                s[i][3:6] += a / d3 * masses[j]
        s[i][3:6] *= G
        s[i][0:3] = y[i][3:6]
    return s.reshape(nb_bodies * 6)
# an instant every second for 3 days
instants = np.linspace(0, 3, 3600 * 24 * 3 + 1)
positions = sint.odeint(f, init_bodies, instants)
Now just check if each position is a partial or a total eclipse ! My result is:
Partial Eclipse
Beginning: 2016-03-08T23:20:06
End: 2016-03-09T04:35:30
Total Eclipse
Beginning: 2016-03-09T00:16:41
End: 2016-03-09T03:39:00
which is very close (< 1 min error!) to the actual timestamps from the French Wikipedia page of the eclipse.
The full code with the necessary init values is available here.
Since this method gives the position of the eclipse on the earth’s disk, it can be extended to infer where the eclipse will be visible from.