### What’s new

30.09.16: The official opening.

## Overview

### Introduction

The VideoCompletion project introduces the first benchmark for video-completion methods. We present results for different methods on a range of diverse test sequences, which are available for viewing in a player equipped with a movable zoom region. Additionally, we provide the results of an objective analysis using quality metrics that were carefully selected in our study of video-completion perceptual quality. We believe that our work can help rank existing methods and assist developers of new general-purpose video-completion methods.

### Data set

Our current data set consists of 7 video sequences with ground-truth completion results. We consider object removal, so the test sequences are constructed by compositing various foreground objects over a set of background videos. Some of these background videos include left-view sequences from the stereoscopic-video data set RMIT3dv [1]. As foreground objects we use those employed in the video-matting benchmark [2] as well as several 3D models. To seamlessly insert a 3D model into a background video, we use Blender's [3] motion-tracking tools. Each video-completion method takes the composited sequence and the corresponding object mask as input.

*A 3D model inserted into the background video using motion tracking to construct a test sequence with ground truth.*

### Evaluation methodology

Video-completion results are seldom explicitly expected to adhere to ground truth; they are usually judged only by their plausibility, as assessed by a human observer. This makes objective quality assessment of video completion an inherently difficult problem. However, by relaxing the requirement of complete adherence to ground truth, we can increase correlation with perceptual completion quality. This benchmark employs four quality metrics: MS-DSSIM, MS-DSSIMdt, MS-CDSSIM, and MS-CDSSIMdt. A thorough description and comparative analysis of these and other metrics can be found in our paper (to be published soon).

The MS-DSSIM metric measures adherence of a completion result $V$ to the ground-truth video $V_{\mathrm{ref}}$ in a multi-scale fashion, with scale weights determined using perceptual-quality data. It is based on structural similarity index (SSIM) [4] values computed for all 9×9 luminance patches $P(x)$ within the spatio-temporal hole $\Omega$.
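A plausible form of the metric, reconstructed from the description above (the per-level weights $c_i$ and the averaging over the hole are assumptions, not the benchmark's published formula), is

$$\mathrm{MS\text{-}DSSIM}(V, V_{\mathrm{ref}}) = \sum_{i} c_i \cdot \frac{1}{|\Omega^i|} \sum_{x \in \Omega^i} d\!\left(P^i(x),\, P^i_{\mathrm{ref}}(x)\right), \qquad d(P, Q) = \frac{1 - \mathrm{SSIM}(P, Q)}{2},$$

where $\Omega^i$ and $P^i(x)$ denote the hole and the patches at pyramid level $i$.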

Here the superscript $i$ denotes the level of the Gaussian pyramid; that is, $V^{0}_{\mathrm{ref}}$ is the original ground-truth video, and $V^{1}_{\mathrm{ref}}$ is the video blurred and subsampled by a factor of two in both spatial dimensions.

The MS-DSSIMdt metric captures temporal coherency along ground-truth optical-flow vectors $s_x = (v_x, v_y, -1)$.
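A plausible form, reusing the patch dissimilarity $d(\cdot,\cdot)$ and level weights $c_i$ introduced above (the exact comparison is an assumption), measures how closely the frame-to-frame change along the ground-truth flow in the result matches the same change in the ground truth:

$$\mathrm{MS\text{-}DSSIM_{dt}}(V, V_{\mathrm{ref}}) = \sum_{i} c_i \cdot \frac{1}{|\Omega^i|} \sum_{x \in \Omega^i} \left|\, d\!\left(P^i(x),\, P^i(x + s_x)\right) - d\!\left(P^i_{\mathrm{ref}}(x),\, P^i_{\mathrm{ref}}(x + s_x)\right) \right|.$$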

MS-CDSSIM relies on the assumption that the completion result should be locally similar to the ground truth; that is, each patch $P(x)$ within the spatio-temporal hole $\Omega$ should have a similar ground-truth patch $P_{\mathrm{ref}}(y)$.
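A plausible form of this metric (the search set for the ground-truth patch position $y$, in practice a spatio-temporal neighborhood of $x$, is an assumption):

$$\mathrm{MS\text{-}CDSSIM}(V, V_{\mathrm{ref}}) = \sum_{i} c_i \cdot \frac{1}{|\Omega^i|} \sum_{x \in \Omega^i} \min_{y}\, d\!\left(P^i(x),\, P^i_{\mathrm{ref}}(y)\right).$$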

MS-CDSSIMdt is a temporal-stability metric that uses the same assumptions as MS-CDSSIM. Essentially, it captures the changes in patch appearance from frame to frame, as opposed to evaluating consistency with ground-truth optical flow as MS-DSSIMdt does. To do so, for a given patch we find the most similar patch from the previous frame within a certain window, compute the distances from these two patches to their most similar ground-truth patches, and then compare the respective distances.
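A plausible sketch of this computation, with $y^{*}(x) = \arg\min_{y \in \Omega^{w\times w}_{prev}(x)} d\!\left(P(x), P(y)\right)$ denoting the most similar previous-frame patch (the exact form of the final comparison is an assumption):

$$\mathrm{MS\text{-}CDSSIM_{dt}}(V, V_{\mathrm{ref}}) = \sum_{i} c_i \cdot \frac{1}{|\Omega^i|} \sum_{x \in \Omega^i} \left|\, \min_{z}\, d\!\left(P^i(x),\, P^i_{\mathrm{ref}}(z)\right) - \min_{z}\, d\!\left(P^i(y^{*}(x)),\, P^i_{\mathrm{ref}}(z)\right) \right|.$$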

Here $\Omega^{w\times w}_{prev}(x)$ denotes a square window of $w\times w$ pixels (we use $w$ equal to 1/10th of the frame width) spatially centered at $x$ and located in the previous frame.

Exact computation of MS-CDSSIM and MS-CDSSIMdt quickly becomes impractical for larger spatio-temporal holes, so we resort to approximate solutions based on the PatchMatch [5] algorithm.
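For intuition, below is a minimal brute-force sketch in Python of the nearest-patch search that PatchMatch accelerates. It is an illustration rather than the benchmark's implementation: it operates on a single pair of luminance frames instead of the full spatio-temporal hole, and the function names and the `radius` parameter are our own.

```python
import numpy as np

def dssim(p, q):
    """Structural dissimilarity (1 - SSIM) / 2 between two equal-size
    luminance patches, using a single-window SSIM with the standard
    constants for 8-bit input."""
    C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    mp, mq = p.mean(), q.mean()
    cov = ((p - mp) * (q - mq)).mean()
    ssim = ((2 * mp * mq + C1) * (2 * cov + C2)) / \
           ((mp ** 2 + mq ** 2 + C1) * (p.var() + q.var() + C2))
    return (1.0 - ssim) / 2.0

def nearest_patch_distance(result, reference, x, y, size=9, radius=20):
    """Brute-force version of the min_y d(P(x), P_ref(y)) term of
    MS-CDSSIM for a single frame pair: exhaustively scan a square
    neighborhood of (x, y) in the reference frame for the most
    similar patch."""
    h, w = reference.shape
    half = size // 2
    p = result[x - half:x + half + 1, y - half:y + half + 1]
    best = np.inf
    for i in range(max(half, x - radius), min(h - half, x + radius + 1)):
        for j in range(max(half, y - radius), min(w - half, y + radius + 1)):
            q = reference[i - half:i + half + 1, j - half:j + half + 1]
            best = min(best, dssim(p, q))
    return best
```

PatchMatch avoids this exhaustive scan by initializing the nearest-neighbor field randomly, then alternating propagation of good matches between adjacent pixels with random local refinement, which is what makes the metrics tractable for large holes.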

### Participate

We invite developers of video-completion methods to use our benchmark. We can evaluate the submitted data and report quality scores to the developer. In cases where the developer specifically grants permission, we will publish the results on our site. The test sequences with the respective completion masks are available for download: Deck, Library, Fountain, Wires, Tower, Skyscrapers, Sign.

For evaluation requests, or if you have any questions or suggestions, please feel free to contact us by email: abokov@graphics.cs.msu.ru.

## Evaluation

Objective metric values (lower is better). The superscript next to each value gives the method's rank on that sequence; the rank column gives the method's average rank across all seven sequences.

**MS-DSSIM**

| Method | Rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction⁺ [6] | 1.9 | 0.221¹ | 0.192² | 0.217⁴ | 0.070¹ | 0.090² | 0.108² | 0.083¹ |
| PFClean Remove Rig⁺ [7] | 2.7 | 0.307⁴ | 0.187¹ | 0.077¹ | 0.094² | 0.143⁴ | 0.163⁴ | 0.106³ |
| Planar Structure Guidanceⁱ [8] | 5.4 | 0.318⁵ | 0.603⁵ | 0.682⁶ | 0.240⁶ | 0.177⁵ | 0.302⁵ | 0.438⁶ |
| Nuke F_RigRemoval⁺ [9] | 2.0 | 0.291² | 0.211³ | 0.078² | 0.120³ | 0.068¹ | 0.091¹ | 0.104² |
| Telea Inpaintingⁱ [10] | 4.7 | 0.333⁶ | 0.623⁶ | 0.614⁵ | 0.206⁵ | 0.141³ | 0.133³ | 0.367⁵ |
| Complex Scenesᵐ [11] | 4.3 | 0.307³ | 0.252⁴ | 0.116³ | 0.162⁴ | 0.195⁶ | 0.355⁶ | 0.237⁴ |

**MS-DSSIMdt**

| Method | Rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction⁺ [6] | 1.7 | 0.013¹ | 0.005¹ | 0.036⁴ | 0.007¹ | 0.007² | 0.009² | 0.004¹ |
| PFClean Remove Rig⁺ [7] | 2.9 | 0.018³ | 0.007² | 0.003¹ | 0.011³ | 0.013⁵ | 0.016⁴ | 0.009² |
| Planar Structure Guidanceⁱ [8] | 6.0 | 0.156⁶ | 0.301⁶ | 0.458⁶ | 0.092⁶ | 0.067⁶ | 0.145⁶ | 0.186⁶ |
| Nuke F_RigRemoval⁺ [9] | 2.9 | 0.020⁵ | 0.012⁴ | 0.004² | 0.012⁴ | 0.003¹ | 0.006¹ | 0.009³ |
| Telea Inpaintingⁱ [10] | 4.4 | 0.013² | 0.092⁵ | 0.197⁵ | 0.016⁵ | 0.010⁴ | 0.019⁵ | 0.046⁵ |
| Complex Scenesᵐ [11] | 3.1 | 0.018⁴ | 0.009³ | 0.008³ | 0.011² | 0.009³ | 0.016³ | 0.021⁴ |

**MS-CDSSIM**

| Method | Rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction⁺ [6] | 2.0 | 0.118¹ | 0.056³ | 0.119⁴ | 0.040¹ | 0.055² | 0.081² | 0.043¹ |
| PFClean Remove Rig⁺ [7] | 2.4 | 0.141⁴ | 0.045¹ | 0.020¹ | 0.056² | 0.088³ | 0.122⁴ | 0.046² |
| Planar Structure Guidanceⁱ [8] | 5.9 | 0.200⁶ | 0.293⁶ | 0.409⁶ | 0.158⁶ | 0.118⁶ | 0.227⁵ | 0.288⁶ |
| Nuke F_RigRemoval⁺ [9] | 2.4 | 0.140³ | 0.075⁴ | 0.027² | 0.072³ | 0.041¹ | 0.069¹ | 0.050³ |
| Telea Inpaintingⁱ [10] | 4.6 | 0.183⁵ | 0.286⁵ | 0.313⁵ | 0.130⁵ | 0.092⁴ | 0.102³ | 0.212⁵ |
| Complex Scenesᵐ [11] | 3.7 | 0.119² | 0.053² | 0.028³ | 0.090⁴ | 0.117⁵ | 0.278⁶ | 0.099⁴ |

**MS-CDSSIMdt**

| Method | Rank | Deck | Library | Fountain | Wires | Tower | Skyscrapers | Sign |
|---|---|---|---|---|---|---|---|---|
| Background Reconstruction⁺ [6] | 1.9 | 0.015¹ | 0.007² | 0.024⁴ | 0.009¹ | 0.009² | 0.010² | 0.007¹ |
| PFClean Remove Rig⁺ [7] | 2.0 | 0.017² | 0.007¹ | 0.004¹ | 0.011² | 0.015³ | 0.013³ | 0.009² |
| Planar Structure Guidanceⁱ [8] | 6.0 | 0.071⁶ | 0.105⁶ | 0.128⁶ | 0.060⁶ | 0.039⁶ | 0.081⁶ | 0.091⁶ |
| Nuke F_RigRemoval⁺ [9] | 2.6 | 0.020⁴ | 0.015⁴ | 0.006² | 0.014³ | 0.008¹ | 0.009¹ | 0.011³ |
| Telea Inpaintingⁱ [10] | 4.9 | 0.023⁵ | 0.065⁵ | 0.096⁵ | 0.027⁵ | 0.019⁵ | 0.016⁴ | 0.049⁵ |
| Complex Scenesᵐ [11] | 3.7 | 0.018³ | 0.011³ | 0.008³ | 0.018⁴ | 0.017⁴ | 0.023⁵ | 0.021⁴ |

⁺ Regions that weren't reconstructed by the algorithm were filled afterwards using Telea image inpainting [10].

ᵐ Owing to prohibitively high memory consumption, the test sequences were downscaled to 1280×720 resolution.

ⁱ Image inpainting algorithms.
