{ "cells": [ { "cell_type": "markdown", "id": "520c157a", "metadata": {}, "source": [ "# LSH Banding Technique" ] }, { "cell_type": "markdown", "id": "ce2b3905", "metadata": {}, "source": [ "In this section, we discuss the more traditional approach to LSH, which follows the workflow ***shingling $\\rightarrow$ minhashing $\\rightarrow$ banding*** (*the actual LSH step*).\n", "\n", "Recall: We can express documents as *k*-shingles (or whichever token we choose) and then apply minhashing to obtain signatures. We arrange these signatures as the columns of a matrix, where each row corresponds to one item of a signature. Also, suppose that all signatures now have the same length (i.e., number of rows) because of the prior 'preprocessing'.\n", "\n", "We obtain a structure similar to what is shown below:\n", "\n", "```{figure} ./images/band_structure.PNG\n", ":name: band-structure\n", "\n", "Segmented signature matrix. Four segments/bands with three rows per segment. Taken from {cite:ps}`rajaraman2011mining`.\n", "```\n", "\n", "Notice that, unlike the signature matrices discussed in Minhashing, this one is segmented into bands of three (3) rows each. Why do this?\n", "\n", "Let's break it down. If we simply apply a hash function to full signatures, then most likely only completely identical signatures will land in the same bucket, and we lose pairs that agree only in some segments of their signatures (i.e., the candidate pairs we want). *Note: We end up discarding similar but not identical documents.* This is a compelling reason for the banded LSH approach.\n", "\n", "A natural remedy is to \"hash\" the items several times using different hash functions, banking on the idea that similar items are more likely to be hashed to the same bucket than dissimilar items. The book's term for items hashed into the same bucket is *candidate pairs*. Narrowing down the search? Voila! We check only the *candidate pairs* instead of all $\\binom{n}{2}$ pairs.\n", "\n",
"Alternatively, for minhashed signatures like the ones shown in {numref}`band-structure`, hashing can be applied per band/segment. The hash function can either vary per band/segment or be the same for all of them. In effect, multiple hashing and/or segmented hashing addresses the tendency to retrieve only identical, but not similar, items.\n", "\n", "The banded signatures are then hashed, with a separate hash table for each band. Candidate pairs are the pairs hashed to the same bucket in at least one band. See {numref}`bandhash-mechanics` for the underlying mechanics.\n", "\n", "```{figure} ./images/bandhash_mechanics.PNG\n", ":width: 700px\n", ":name: bandhash-mechanics\n", "\n", "Underlying mechanics of the \"Band Hashing\" method for two banded signature sets $a$ and $b$. In this figure, A and B are considered a candidate pair because their 3rd bands are hashed to the same bucket. Here only one hash function is used, but each band gets its own set of hash buckets.\n", "```" ] }, { "cell_type": "markdown", "id": "26b50f7b", "metadata": {}, "source": [ "This follows a more tolerant approach: any pair is classified as a *candidate pair* as long as it is hashed to the same bucket in at least one of the bands. The number of bands, and the resulting number of rows in each band, therefore acts as a tuning parameter. A larger number of bands increases the probability that any pair, even one with low similarity, is tagged as a *candidate pair*. The number of bands thus determines the similarity threshold, as shown below: document pairs with pairwise similarity above the threshold (dashed line) are likely to be tagged as *candidate pairs*."
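, "\n", "\n", "To make the role of the two tuning parameters concrete, here is a short sketch of the standard banding analysis from {cite:ps}`rajaraman2011mining`. The symbols are assumptions introduced only for this illustration: $s$ is the Jaccard similarity of a pair, $b$ is the number of bands, and $r$ is the number of rows per band. Treating the minhash rows as independent, a single band agrees in all of its rows with probability $s^r$, so the pair is tagged as a candidate (it agrees in at least one band) with probability\n", "\n", "$$\n", "P(\\text{candidate} \\mid s) = 1 - \\left(1 - s^{r}\\right)^{b}.\n", "$$\n", "\n", "This S-shaped curve rises most steeply around\n", "\n", "$$\n", "s^{*} \\approx \\left(\\frac{1}{b}\\right)^{1/r},\n", "$$\n", "\n", "which is a commonly used approximation of the similarity threshold rather than an exact cutoff. For example, with $b = 20$ and $r = 5$ the approximate threshold is $(1/20)^{1/5} \\approx 0.55$; with a single band ($b = 1$) the threshold is $1$, so only identical signatures are expected to collide."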
] }, { "cell_type": "code", "execution_count": 1, "id": "370d67b0", "metadata": { "ExecuteTime": { "end_time": "2022-04-02T10:11:03.399022Z", "start_time": "2022-04-02T10:11:03.090383Z" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 
0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.220446049250313e-16, 8.881784197001252e-16, 3.6637359812630166e-15, 1.4765966227514582e-14, 5.873079800267078e-14, 2.289279876777073e-13, 8.761880110341735e-13, 3.2949198924825396e-12, 1.2177370223298567e-11, 4.4252823627743965e-11, 1.5819390242199916e-10, 5.565180538624759e-10, 1.9274469797991856e-09, 6.57454612973396e-09, 2.2094784890569485e-08, 7.318282102541929e-08, 2.3898635836960835e-07, 7.697074048129693e-07, 2.445714746279748e-06, 7.669159237222445e-06, 2.3740026791507773e-05, 7.256571590152916e-05, 0.0002190885960516864, 0.0006535266086732383, 0.0019265283115986742, 0.005613915872504149, 0.01617488006589729, 0.0460896960639241, 0.12991340535646834, 0.36231649758991713, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 
0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.220446049250313e-16, 4.440892098500626e-16, 1.1102230246251565e-15, 2.886579864025407e-15, 7.993605777301127e-15, 2.0872192862952943e-14, 5.417888360170764e-14, 1.376676550535194e-13, 3.446132268436486e-13, 8.484324354185446e-13, 2.055910997000865e-12, 4.905409412003792e-12, 
1.1530998378361801e-11, 2.6719737533653642e-11, 6.106071204214913e-11, 1.3767498252548194e-10, 3.064062337188034e-10, 6.733935631331178e-10, 1.461968768268207e-09, 3.1366571562330137e-09, 6.652890904845776e-09, 1.3954552180450719e-08, 2.895508210976061e-08, 5.945279268892989e-08, 1.208335927982418e-07, 2.431616384912516e-07, 4.846332634089734e-07, 9.568770720225217e-07, 1.8721180568004314e-06, 3.630358263118083e-06, 6.97920197256785e-06, 1.330451219039297e-05, 2.5154871267751844e-05, 4.7180712240524336e-05, 8.780346932624195e-05, 0.0001621605757101685, 0.0002972641975098611, 0.000540973288431501, 0.0009774856341965288, 0.0017538896951904137, 0.0031253105073756338, 0.005530980220265436, 0.009721007643928692, 0.01696453684309973, 0.02938419473885656, 0.0504748063848125, 0.08585794210540987, 0.14423822288439325, 0.2381861237526698, 0.3832805221111335, 0.5909566393990848, 0.8415381407944971, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 
0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4.440892098500626e-16, 1.7763568394002505e-15, 4.440892098500626e-15, 1.199040866595169e-14, 3.108624468950438e-14, 7.771561172376096e-14, 1.865174681370263e-13, 4.3565151486291143e-13, 9.889866703360894e-13, 2.1875834477214084e-12, 4.721112389916016e-12, 9.957368263258104e-12, 2.0553336810280598e-11, 4.156630595275601e-11, 8.245626403891038e-11, 1.606088595451638e-10, 3.07464720350481e-10, 5.790017354456722e-10, 1.0734368949272266e-09, 1.960705375836369e-09, 3.530949754804169e-09, 6.273314312466027e-09, 1.1002540212246004e-08, 1.9060132472326075e-08, 3.2630713242554066e-08, 5.523457935474596e-08, 9.24873021501682e-08, 1.5326049018771215e-07, 2.5144006943200736e-07, 4.085673435128001e-07, 6.577740853108693e-07, 1.0496003118865005e-06, 1.6605314037132501e-06, 2.605435805658196e-06, 4.055560249960344e-06, 6.264392273580555e-06, 9.604590503364818e-06, 1.4620391591502191e-05, 
2.2101533204921253e-05, 3.31869208031188e-05, 4.9509187538987476e-05, 7.339517705151621e-05, 0.00010814250606339115, 0.00015839910244641597, 0.00023068142291060578, 0.0003340785012735381, 0.00048120375855220665, 0.0006894754513645518, 0.0009828307091700461, 0.0013940083751409205, 0.0019675733867758893, 0.0027639010844991985, 0.003864393913946995, 0.005378264495370932, 0.0074512844296174, 0.010276959244847617, 0.014110629985331635, 0.01928699029257286, 0.026241390009076215, 0.035534979951413415, 0.04788308621461457, 0.06418494751335302, 0.08555074617047209, 0.11331820474795229, 0.1490452606384568, 0.19445681274228854, 0.25131199566093587, 0.3211450320888084, 0.4048222410952891, 0.5018630106447575, 0.6095215370295464, 0.7217716699867638, 0.8286594765352622, 0.9170544911120257, 0.9744926507439662, 0.9974751568622975, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 
0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.551115123125783e-16, 2.220446049250313e-15, 7.771561172376096e-15, 2.275957200481571e-14, 6.38378239159465e-14, 1.6986412276764895e-13, 4.313216450668733e-13, 1.049160758270773e-12, 2.457478665007784e-12, 5.559996907322784e-12, 1.2182477249211843e-11, 2.591427072928809e-11, 5.363209876207975e-11, 1.081995604224062e-10, 2.1315282872080843e-10, 4.106764928124562e-10, 7.749356711883593e-10, 1.4339862630663447e-09, 2.605220483786752e-09, 4.6518638940895585e-09, 8.171769927400874e-09, 1.4135155712580172e-08, 2.4095616391051067e-08, 4.0509627208251686e-08, 6.721495737771477e-08, 1.1013953205019078e-07, 1.7834128396287952e-07, 2.8551910569163397e-07, 4.521885524910729e-07, 7.087878506339962e-07, 1.1000756577894677e-06, 1.6913090676862197e-06, 2.576857556335632e-06, 3.892127304450099e-06, 5.829954782154978e-06, 8.662996310415672e-06, 1.2774113363756001e-05, 1.869735854764798e-05, 2.7172937406172437e-05, 3.9220496651393155e-05, 5.623631864282164e-05, 8.012154344316791e-05, 0.00011345046346267207, 
0.00015969032413265527, 0.0002234870139763423, 0.0003110346503316652, 0.0004305514885013517, 0.0005928899401050902, 0.0008123149247248884, 0.0011074924361911265, 0.0015027391937711965, 0.00202959462061425, 0.0027287880795007213, 0.003652687019798595, 0.004868324812376312, 0.0064611193524264365, 0.008539402854709222, 0.011239886090223306, 0.014734170904726618, 0.01923639432041191, 0.025012022354935826, 0.03238769194378521, 0.041761796343971036, 0.05361518326903303, 0.06852083273247067, 0.08715063696261771, 0.11027634343714143, 0.13876027725020035, 0.17352960768189996, 0.2155257597142144, 0.2656184376869235, 0.3244724402809831, 0.39235662420970896, 0.4688908721778041, 0.5527431978792987, 0.6413210902324147, 0.7305544134032147, 0.8149410066463328, 0.8880999359635918, 0.9440841635453141, 0.9794946028375476, 0.9957720923445, 0.9997904631059815, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": true, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 
0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0, 0, 6.661338147750939e-15, 1.1546319456101628e-13, 1.0802470029602773e-12, 6.685763054292693e-12, 3.123390435177953e-11, 1.1872614003038962e-10, 3.8554381909250424e-10, 1.1057277315984493e-09, 2.8679725261326894e-09, 6.846372357927066e-09, 1.5243390882879737e-08, 3.1983677484248574e-08, 6.376180960998568e-08, 1.215760014883216e-07, 2.2291393830808914e-07, 3.9479625768557014e-07, 6.779285607327878e-07, 1.132264235037539e-06, 1.8443385377819865e-06, 2.9367994371920503e-06, 4.580633226503039e-06, 7.0106631583355394e-06, 1.0544987758542845e-05, 1.560912229248146e-05, 2.2765712469174915e-05, 3.2750804062398053e-05, 4.65177758592894e-05, 6.529017626177858e-05, 9.062484581079389e-05, 0.00012448685844512752, 0.00016933797279305995, 0.00022824044988323244, 0.00030497826360287306, 0.0004041979022556541, 0.0005315711297310122, 0.0006939822378738691, 0.0008997424700299961, 0.0011588344196999945, 0.0014831892944251468, 0.0018869999661662584, 0.0023870726828494337, 0.00300322016245802, 0.0037586984938813295, 
0.004680689781035863, 0.005800831730121803, 0.007155794322208586, 0.008787902246742418, 0.010745799790239596, 0.013085152252833021, 0.015869374557570426, 0.01917037335707572, 0.023069283445510402, 0.027657172454913392, 0.03303567945675212, 0.03931754302107726, 0.04662696236881403, 0.05509972143138486, 0.06488298998696673, 0.07613469886695734, 0.08902236812233477, 0.1037212490447369, 0.1204116246734751, 0.13927510126833165, 0.16048971851342264, 0.18422371338134957, 0.2106277973113716, 0.2398258555727879, 0.27190405938374274, 0.30689850405573427, 0.34478165816886897, 0.3854481354039656, 0.4287005832737344, 0.47423681441284504, 0.5216396662706526, 0.5703714259132469, 0.6197749369703565, 0.6690836278273794, 0.7174425505231613, 0.7639419679147613, 0.8076639456444403, 0.8477407094687268, 0.8834212289794156, 0.9141397659764425, 0.9395774023834275, 0.959705548739545, 0.9748000949135128, 0.9854172446043155, 0.992327923656115, 0.9964168438146014, 0.998563187249651, 0.9995289085573575, 0.9998829085244709, 0.9999805618277763, 0.999998300065759, 0.9999999540218464, 0.9999999999296791, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 
0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 2.1030710506408923e-09, 6.729828261331505e-08, 5.11046232332113e-07, 2.1535429370889148e-06, 6.572077689281919e-06, 1.6353356373510763e-05, 3.534572800889535e-05, 6.891118871710233e-05, 0.00012417693956201514, 0.00021028613506868243, 0.0003386472702371357, 0.0005231814083258568, 0.0007805661422487509, 0.001130474802680137, 0.001595808969650947, 0.0022029218065093836, 0.0029818291116385787, 0.0039664042724080195, 0.005194552507539085, 0.006708358902238198, 0.008554203782149217, 0.010782837950554236, 0.013449409246465316, 0.01661343079625688, 0.020338680262977693, 0.024693018391212518, 0.029748114258389857, 0.0355790639455692, 0.042263888915609726, 0.049882900331855606, 0.0585179159771676, 0.06825131746505264, 0.0791649372046822, 0.09133876722827827, 0.10484948564993879, 
0.11976880132689971, 0.1361616233439199, 0.1540840693104959, 0.1735813351680484, 0.1946854591990701, 0.2174130240685278, 0.2417628527526524, 0.26771376672975933, 0.29522248727832767, 0.32422177244154904, 0.354618892302059, 0.38629455263755363, 0.4191023806444003, 0.452869084993901, 0.48739539479617644, 0.5224578669640316, 0.5578116280972822, 0.5931940848308783, 0.6283295956350483, 0.6629350480578825, 0.6967262299495014, 0.7294248238495458, 0.7607657939963989, 0.790504879796009, 0.8184258632783461, 0.8443472466827296, 0.8681279653813714, 0.8896717757032742, 0.9089300002677201, 0.9259023864230707, 0.9406359347198117, 0.9532216791176121, 0.963789540455422, 0.9725015180456722, 0.9795436171807425, 0.9851170179733757, 0.9894290592621413, 0.9926846292410084, 0.9950785160224036, 0.9967891772559185, 0.9979742465279243, 0.9987679211554763, 0.999280192343635, 0.9995977083038465, 0.9997859263935414, 0.9998921288697229, 0.99994885695087, 0.9999773574568234, 0.9999907229914456, 0.9999965205030554, 0.9999988210707954, 0.9999996450402843, 0.9999999069777626, 0.9999999793397878, 0.9999999962451973, 0.9999999994675027, 0.9999999999448664, 0.9999999999962195, 0.9999999999998523, 0.9999999999999974, 1, 1, 1, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 
0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 2.6025505717175434e-07, 4.1640731011094445e-06, 2.1080448894039705e-05, 6.66231721203614e-05, 0.00016264673132493357, 0.0003372359937166003, 0.0006246850809209503, 0.0010654595666566014, 0.001706134848847074, 0.0025993023123953574, 0.0038034337217621728, 0.005382693195492783, 0.007406685156393533, 0.00995012586971633, 0.013092425635188976, 0.016917168454379072, 0.021511476129220553, 0.026965244344202888, 0.03337023943309858, 0.04081904632244959, 0.04940386066720159, 0.05921512153012931, 0.07033998517162787, 0.0828606456532559, 0.09685251402649508, 0.11238227484499352, 0.12950584651624109, 0.14826628044613033, 0.1686916428006915, 0.19079293170595701, 
0.21456209143562344, 0.23997019311764922, 0.26696585816746177, 0.2954740054049857, 0.3253950049726626, 0.35660432107188067, 0.3889527205419885, 0.4222671148738587, 0.45635208898353385, 0.4909921507899987, 0.5259547114449021, 0.5609937773923579, 0.595854303123921, 0.6302771187833216, 0.6640043113273781, 0.6967849037987532, 0.7283806467421908, 0.758571711394728, 0.7871620584941459, 0.8139842516560445, 0.8389034920963289, 0.8618206731588045, 0.8826742888674284, 0.9014410796868064, 0.9181353587767389, 0.9328070300114644, 0.9455383805876103, 0.9564398010595435, 0.9656446486125436, 0.9733035199127928, 0.9795782332248375, 0.984635832157867, 0.988642913608182, 0.9917605504863645, 0.9941400281257958, 0.9959195463965109, 0.9972219636824253, 0.9981535812125139, 0.9988038941367613, 0.9992461758568993, 0.9995387195597063, 0.9997265385665003, 0.9998433253934373, 0.9999134862173882, 0.9999540985964094, 0.9999766802894576, 0.9999886998990904, 0.9999948003388005, 0.9999977395584081, 0.9999990770208532, 0.9999996484270092, 0.9999998760619524, 0.9999999599434793, 0.9999999882632885, 0.9999999969243087, 0.9999999992909533, 0.999999999859126, 0.9999999999764944, 0.999999999996814, 0.9999999999996643, 0.999999999999974, 0.9999999999999987, 1, 1, 1, 1, 1, 1, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 
0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0.005088788547027767, 0.020203367027157304, 0.044895743981044234, 0.07844334451021973, 0.1198834550978749, 0.1680582460806478, 0.2216674900311073, 0.2793256906733199, 0.3396201906279347, 0.4011669387154754, 0.4626609469078342, 0.5229190114135649, 0.5809129550096499, 0.6357924033051541, 0.686896869272013, 0.7337576262248074, 0.7760904481216537, 0.8137807507309149, 0.8468629575925162, 0.8754960372829691, 0.8999371249302158, 0.9205149750868888, 0.9376047271746013, 0.9516051351013023, 0.9629190557351788, 0.9719376396791948, 0.9790283491978901, 0.984526661586147, 0.9887311130162146, 0.9919012015313655, 0.9942575953733671, 
0.9959840762521138, 0.9972306753102488, 0.9981175197836518, 0.9987389881792429, 0.9991678599250183, 0.9994592326855949, 0.999654060105525, 0.9997822303367798, 0.9998651592575744, 0.9999179115600292, 0.9999508889797247, 0.9999711398150651, 0.9999833498983898, 0.9999905747444077, 0.9999947679229374, 0.9999971536355198, 0.9999984834522727, 0.9999992092002438, 0.9999995967280992, 0.9999997990419713, 0.9999999022265811, 0.9999999535964164, 0.9999999785376255, 0.999999990336145, 0.9999999957685596, 0.9999999982004042, 0.9999999992575692, 0.9999999997032885, 0.999999999885299, 0.9999999999571787, 0.999999999984588, 0.9999999999946623, 0.9999999999982248, 0.9999999999994342, 0.9999999999998277, 0.9999999999999499, 0.9999999999999862, 0.9999999999999963, 0.9999999999999991, 0.9999999999999998, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ] }, { "line": { "color": "deeppink", "width": 6 }, "type": "scatter", "visible": false, "x": [ 0, 0.010101010101010102, 0.020202020202020204, 0.030303030303030304, 0.04040404040404041, 0.05050505050505051, 0.06060606060606061, 0.07070707070707072, 0.08080808080808081, 0.09090909090909091, 0.10101010101010102, 0.11111111111111112, 0.12121212121212122, 0.13131313131313133, 0.14141414141414144, 0.15151515151515152, 0.16161616161616163, 0.17171717171717174, 0.18181818181818182, 0.19191919191919193, 0.20202020202020204, 0.21212121212121213, 0.22222222222222224, 0.23232323232323235, 0.24242424242424243, 0.25252525252525254, 0.26262626262626265, 0.27272727272727276, 0.2828282828282829, 0.29292929292929293, 0.30303030303030304, 0.31313131313131315, 0.32323232323232326, 0.33333333333333337, 0.3434343434343435, 0.3535353535353536, 0.36363636363636365, 0.37373737373737376, 0.38383838383838387, 0.393939393939394, 0.4040404040404041, 0.4141414141414142, 0.42424242424242425, 0.43434343434343436, 0.4444444444444445, 0.4545454545454546, 0.4646464646464647, 0.4747474747474748, 0.48484848484848486, 0.494949494949495, 
0.5050505050505051, 0.5151515151515152, 0.5252525252525253, 0.5353535353535354, 0.5454545454545455, 0.5555555555555556, 0.5656565656565657, 0.5757575757575758, 0.5858585858585859, 0.595959595959596, 0.6060606060606061, 0.6161616161616162, 0.6262626262626263, 0.6363636363636365, 0.6464646464646465, 0.6565656565656566, 0.6666666666666667, 0.6767676767676768, 0.686868686868687, 0.696969696969697, 0.7070707070707072, 0.7171717171717172, 0.7272727272727273, 0.7373737373737375, 0.7474747474747475, 0.7575757575757577, 0.7676767676767677, 0.7777777777777778, 0.787878787878788, 0.797979797979798, 0.8080808080808082, 0.8181818181818182, 0.8282828282828284, 0.8383838383838385, 0.8484848484848485, 0.8585858585858587, 0.8686868686868687, 0.8787878787878789, 0.888888888888889, 0.8989898989898991, 0.9090909090909092, 0.9191919191919192, 0.9292929292929294, 0.9393939393939394, 0.9494949494949496, 0.9595959595959597, 0.9696969696969697, 0.9797979797979799, 0.98989898989899, 1 ], "y": [ 0, 0.6376835024100829, 0.8700865946435332, 0.9539103039360759, 0.9838251199341029, 0.994386084127496, 0.9980734716884013, 0.9993464733913269, 0.9997809114039483, 0.9999274342840985, 0.9999762599732085, 0.9999923308407628, 0.9999975542852537, 0.9999992302925952, 0.9999997610136416, 0.999999926817179, 0.9999999779052151, 0.9999999934254539, 0.999999998072553, 0.999999999443482, 0.9999999998418061, 0.9999999999557472, 0.9999999999878226, 0.9999999999967051, 0.9999999999991238, 0.9999999999997711, 0.9999999999999413, 0.9999999999999852, 0.9999999999999963, 0.9999999999999991, 0.9999999999999998, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ] } ], "layout": { "annotations": [ { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 1.00", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": 
"y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.99", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.95", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.92", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.79", "visible": true, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.55", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.45", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.14", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" }, { "font": { "size": 12 }, "showarrow": false, "text": "Similarity Threshold 0.01", "visible": false, "x": 0, "xanchor": "left", "xref": "x", "y": 1, "yanchor": "top", "yref": "y domain" } ], "shapes": [ { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 1, "x1": 1, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.9862327044933592, "x1": 0.9862327044933592, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.9460576467255959, "x1": 
0.9460576467255959, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.9226808345905884, "x1": 0.9226808345905884, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": true, "x0": 0.7943282347242815, "x1": 0.7943282347242815, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.5492802716530588, "x1": 0.5492802716530588, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.4472135954999579, "x1": 0.4472135954999579, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.1414213562373095, "x1": 0.1414213562373095, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" }, { "line": { "dash": "dash", "width": 2.5 }, "type": "line", "visible": false, "x0": 0.01, "x1": 0.01, "xref": "x", "y0": 0, "y1": 1, "yref": "y domain" } ], "sliders": [ { "active": 4, "currentvalue": { "prefix": "Number of bands: " }, "pad": { "t": 50 }, "steps": [ { "args": [ { "visible": [ true, false, false, false, false, false, false, false, false ] }, { "annotations[0].visible": true, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": true, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 1 bands with signature size 100" } ], "label": "1", "method": "update" }, { "args": [ { "visible": [ false, true, false, false, false, false, false, false, false ] }, { "annotations[0].visible": false, "annotations[1].visible": true, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": true, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 2 bands with signature size 100" } ], "label": "2", "method": "update" }, { "args": [ { "visible": [ false, false, true, false, false, false, false, false, false ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": true, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": true, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 4 bands with signature size 100" } ], "label": "4", "method": "update" }, { "args": [ { "visible": [ false, false, false, true, false, false, false, false, false ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": true, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": true, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 5 bands with signature size 100" } ], "label": "5", "method": "update" }, { "args": [ { "visible": [ false, false, false, false, true, false, false, false, false ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": true, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": true, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 10 bands with signature size 100" } ], "label": "10", "method": "update" }, { "args": [ { "visible": [ false, false, false, false, false, true, false, false, false ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": true, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": true, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 20 bands with signature size 100" } ], "label": "20", "method": "update" }, { "args": [ { "visible": [ false, false, false, false, false, false, true, false, false ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": true, "annotations[7].visible": false, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": true, "shapes[7].visible": false, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 25 bands with signature size 100" } ], "label": "25", "method": "update" }, { "args": [ { "visible": [ false, false, false, false, false, false, false, true, false ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": true, "annotations[8].visible": false, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": true, "shapes[8].visible": false, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 50 bands with signature size 100" } ], "label": "50", "method": "update" }, { "args": [ { "visible": [ false, false, false, false, false, false, false, false, true ] }, { "annotations[0].visible": false, "annotations[1].visible": false, "annotations[2].visible": false, "annotations[3].visible": false, "annotations[4].visible": false, "annotations[5].visible": false, "annotations[6].visible": false, "annotations[7].visible": false, "annotations[8].visible": true, "shapes[0].visible": false, "shapes[1].visible": false, "shapes[2].visible": false, "shapes[3].visible": false, "shapes[4].visible": false, "shapes[5].visible": false, "shapes[6].visible": false, "shapes[7].visible": false, "shapes[8].visible": true, "title.text": "Probability of becoming a candidate given a similarity
The S-curve at 100 bands with signature size 100" } ], "label": "100", "method": "update" } ] } ], "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 
0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], 
"scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, 
"#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", 
"title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "Probability of becoming a candidate given a similarity
The S-curve at 10 bands with signature size 100" }, "xaxis": { "title": { "text": "Jaccard Similarity of Documents" } }, "yaxis": { "title": { "text": "Probability" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# from plotly.offline import init_notebook_mode, iplot\n", "# init_notebook_mode()\n", "\n", "import plotly.graph_objs as go\n", "import numpy as np\n", "\n", "def _get_factors(x):\n", "    \"\"\"Return all positive divisors of x in ascending order\"\"\"\n", "    factors = []\n", "    for i in range(1, x + 1):\n", "        if x % i == 0:\n", "            factors.append(i)\n", "    return factors\n", "\n", "def _get_valinit(factors):\n", "    index = len(factors) // 2\n", "    return factors[index]\n", "\n", "def _prob_of_s(s, b, r):\n", "    \"\"\"Return the probability that a pair with similarity s becomes a candidate pair, given b bands of r rows each\"\"\"\n", "    return 1 - (1 - s**r)**b\n", "\n", "def _get_approx_thresh(b, r):\n", "    \"\"\"Return the approximate similarity threshold (1/b)**(1/r) for b bands of r rows each\"\"\"\n", "    thresh = (1/b) ** (1/r)\n", "    return thresh\n", "\n", "\n", "m = 100 # signature size\n", "b_list = _get_factors(m) # list of possible b-values\n", "s_list = np.linspace(0, 1, num=100)\n", "p_lists = [np.array([_prob_of_s(s, b, r=m/b) for s in s_list]) for b in b_list]\n", "thresh_list = [_get_approx_thresh(b, r=m/b) for b in b_list]\n", "\n", "data = [\n", "    go.Scatter(\n", "        visible=False,\n", "        line=dict(color=\"deeppink\", width=6),\n", "        x=s_list,\n", "        y=p_list)\n", "    for p_list in p_lists\n", "]\n", "\n", "vlines = [\n", "    go.layout.Shape({\n", "        'line': {'dash': 'dash', 'width': 2.5},\n", "        'type': 'line',\n", "        'visible': False,\n", "        'x0': thresh,\n", "        'x1': thresh,\n", "        'xref': 'x',\n", "        'y0': 0,\n", "        'y1': 1,\n", "        'yref': 'y domain'})\n", "    for thresh in thresh_list\n", "]\n", "\n", "annots = [\n", "    go.layout.Annotation({\n", "        'font': {'size': 12},\n", "        'showarrow': False,\n", "        'text': f'Similarity Threshold {thresh:.2f}',\n", "        'visible': False,\n", "        'x': 0,\n", "        'xanchor': 'left',\n", "        'xref': 'x',\n", "        'y': 1,\n", "        'yanchor': 'top',\n", "        'yref': 'y domain'})\n", "    for thresh in thresh_list\n", "]\n", "\n", "\n", "\n", "mid_index = len(b_list) // 2 # start index\n", "data[mid_index][\"visible\"] = True\n", 
"vlines[mid_index][\"visible\"] = True\n", "annots[mid_index][\"visible\"] = True\n", "\n", "layout = go.Layout(\n", "    title={'text': f\"\"\"Probability of becoming a candidate given a similarity
The S-curve at {b_list[mid_index]} bands with signature size {m}\"\"\"},\n", "    annotations=annots,\n", "    shapes=vlines,\n", ")\n", "\n", "fig = go.Figure(data=data, layout=layout)\n", "\n", "steps = []\n", "for i in range(len(data)):\n", "    title = f\"\"\"Probability of becoming a candidate given a similarity
The S-curve at {b_list[i]} bands with signature size {m}\"\"\"\n", "    shape_args = {f\"shapes[{idx}].visible\": idx == i for idx in range(len(vlines))}\n", "    annotation_args = {f\"annotations[{idx}].visible\": idx == i for idx in range(len(annots))}\n", "    layout_args = {\"title.text\": title}\n", "    layout_args.update(shape_args)\n", "    layout_args.update(annotation_args)\n", "\n", "    step = dict(\n", "        method='update',\n", "        args=[\n", "            # trace updates: show only the S-curve for this number of bands\n", "            {'visible': [t == i for t in range(len(data))]},\n", "\n", "            # layout updates: title, threshold line, and threshold annotation\n", "            layout_args,\n", "        ],\n", "        label=f\"{b_list[i]}\"\n", "    )\n", "    steps.append(step)\n", "\n", "sliders = [\n", "    go.layout.Slider(\n", "        active=mid_index,\n", "        currentvalue={\"prefix\": \"Number of bands: \"},\n", "        pad={'t': 50},\n", "        steps=steps\n", "    )\n", "]\n", "\n", "fig.layout.xaxis.title = 'Jaccard Similarity of Documents'\n", "fig.layout.yaxis.title = 'Probability'\n", "\n", "fig.update_layout(\n", "    sliders=sliders\n", ")\n", "fig\n", "# iplot(fig)" ] } ], "metadata": { "celltoolbar": "Tags", "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }