# Balancing Vectors in Any Norm

## Transcript

2018 IEEE 59th Annual Symposium on Foundations of Computer Science

Balancing Vectors in Any Norm

Daniel Dadush∗, Aleksandar Nikolov†, Kunal Talwar‡, Nicole Tomczak-Jaegermann§

∗ Centrum Wiskunde & Informatica, Amsterdam, Netherlands. Email: [email protected]
† University of Toronto, Toronto, Ontario, Canada. Email: [email protected]
‡ Google Brain. Email: [email protected]
§ University of Alberta, Edmonton, Alberta, Canada. Email: [email protected]

Abstract—In the vector balancing problem, we are given symmetric convex bodies C and K in ℝⁿ, and our goal is to determine the minimum number β ≥ 0, known as the vector balancing constant from C to K, such that for any sequence of vectors in C there always exists a signed combination of them lying inside βK. Many fundamental results in discrepancy theory, such as the Beck–Fiala theorem (Discrete Appl. Math. '81), Spencer's "six standard deviations suffice" theorem (Trans. Amer. Math. Soc. '85) and Banaszczyk's vector balancing theorem (Random Structures & Algorithms '98) correspond to bounds on vector balancing constants.

The above theorems have inspired much research in recent years within theoretical computer science. In this work, we show that all vector balancing constants admit "good" approximate characterizations, with approximation factors depending only polylogarithmically on the dimension n. First, we show that a volumetric lower bound due to Banaszczyk is tight within a O(log n) factor. Our proof is algorithmic, and we show that Rothvoss's (FOCS '14) partial coloring algorithm can be analyzed to obtain these guarantees. Second, we present a novel convex program which encodes the "best possible way" to apply Banaszczyk's vector balancing theorem for bounding vector balancing constants from above, and show that it is tight within an O(log^{2.5} n) factor. This also directly yields a corresponding polynomial time approximation algorithm both for vector balancing constants, and for the hereditary discrepancy of any sequence of vectors with respect to an arbitrary norm.

Keywords—Discrepancy; Convex Geometry; Gaussian measure; M-ellipsoid; K-convexity.

I. INTRODUCTION

The discrepancy of a set system is defined as the minimum, over the set of ±1 colorings of the elements, of the imbalance between the number of +1 and −1 elements in the most imbalanced set. Classical combinatorial discrepancy theory studies bounds on the discrepancy of set systems, in terms of their structure. The tools developed for deriving bounds on the discrepancy of set systems have found many applications in mathematics and computer science [1], [2], from the study of pseudorandomness, to communication complexity, and most recently, to approximation algorithms and privacy. Here we study a geometric generalization of combinatorial discrepancy, known as vector balancing, which captures some of the most powerful techniques in the area, and is of intrinsic interest.

Vector Balancing: In many instances, the best known techniques for finding good bounds in combinatorial discrepancy were derived by working with more general vector balancing problems, where convex geometric techniques can be applied. Given symmetric convex bodies C, K ⊆ ℝⁿ, the vector balancing constant of C into K is defined as

$$\mathrm{vb}(C, K) := \sup\left\{ \min_{x \in \{-1,1\}^N} \Big\| \sum_{i=1}^{N} x_i u_i \Big\|_K : N \in \mathbb{N},\; u_1, \dots, u_N \in C \right\},$$

where ‖x‖_K := min{s ≥ 0 : x ∈ sK} is the norm induced by K.

As an example, one may consider Spencer's "six standard deviations" theorem [3], independently obtained by Gluskin [4], which states that every set system on n points and n sets can be colored with discrepancy at most O(√n). In the vector balancing context, the more general statement (also proved in [3], [4]) is that vb(B∞ⁿ, B∞ⁿ) = O(√n), where we use the notation B_pⁿ = {x ∈ ℝⁿ : ‖x‖_p ≤ 1}, p ∈ [1, ∞], to denote the unit ball of the ℓ_p norm. To encode Spencer's theorem, we simply represent the set system using its incidence matrix U ∈ {0,1}^{n×n}, where U_{ji} = 1 if element i is in set j, and 0 otherwise. Here the columns of U have ℓ∞ norm 1, and thus the sign vector x ∈ {−1,1}ⁿ satisfying ‖Ux‖∞ = O(√n) indeed yields the desired coloring.

In fact, vector balancing was studied earlier, and independently from combinatorial discrepancy. In 1963 Dvoretzky posed the general problem of determining vb(K, K) for a given symmetric convex body K. The more general version with two different bodies was introduced by Bárány and Grinberg [5], who proved that for any symmetric convex body K in ℝⁿ, vb(K, K) ≤ 2n. In addition to Spencer's

2575-8454/18/$31.00 ©2018 IEEE. DOI 10.1109/FOCS.2018.00010
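The incidence-matrix encoding above can be checked directly on a tiny instance. The following sketch is our own illustration (the helper name `linf_discrepancy` is invented, and brute force over all 2^N colorings is only feasible for a handful of elements): it computes the exact ℓ∞ discrepancy min over sign vectors x of ‖Ux‖∞ for a small set system.

```python
import itertools

def linf_discrepancy(rows):
    """Brute-force disc over +-1 colorings: min over sign vectors x of
    max_j |<row_j, x>|, i.e. the l_inf norm of U x. Exponential in N."""
    N = len(rows[0])
    best = None
    for x in itertools.product((-1, 1), repeat=N):
        worst = max(abs(sum(r[i] * x[i] for i in range(N))) for r in rows)
        if best is None or worst < best:
            best = worst
    return best

# Toy set system on 4 elements and 3 sets, given by its incidence matrix U
# (one row per set): any coloring leaves a 3-element set imbalanced by >= 1.
U = [[1, 1, 0, 1],
     [0, 1, 1, 1],
     [1, 0, 1, 0]]
print(linf_discrepancy(U))
```

Spencer's theorem guarantees discrepancy O(√n) for n sets on n elements; the brute-force search simply certifies the exact optimum on tiny examples.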

and later by Eldan and Singh [18], who used the solution of a random linear maximization problem. An important difference between the constructive and existential partial coloring methods is that the constructive methods only guarantee that the "uncolored" coordinates of a partial coloring x are in (−1, 1), instead of equal to 0. This relaxation seems to make the constructive methods more robust, i.e. the conditions needed for such "fractional" partial colorings are somewhat milder, without having noticeable drawbacks in most applications.

The main alternative to the partial coloring method comes from Banaszczyk's vector balancing theorem [8]. Banaszczyk's method proves the existence of a full coloring when K has Gaussian measure 1/2, in contrast to Giannopoulos's result which gives a partial coloring but requires measure only 2^{−cn}. Banaszczyk's method was only very recently made constructive in the sequence of works [19]–[21]. In particular, [20] showed an equivalence of Banaszczyk's theorem to the existence of certain subgaussian signing distributions, and [21] gave a random walk-based algorithm to build such distributions.

A. Approximating Vector Balancing and Hereditary Discrepancy

Given the powerful tools mentioned above, a natural question is whether they can be extended to get nearly optimal bounds for any vector balancing or hereditary discrepancy problem. More precisely, we will be interested in the following computational and mathematical questions:

1) Given vectors (u_i)_{i=1}^N and a symmetric convex body K ⊆ ℝⁿ, can we (a) efficiently compute a coloring whose K-discrepancy is approximately bounded by hd((u_i)_{i=1}^N, K)? (b) efficiently approximate hd((u_i)_{i=1}^N, K)?

2) Given two symmetric convex bodies C, K ⊆ ℝⁿ, does vb(C, K) admit a "good" characterization? Namely, are there simple certificates which certify nearly tight upper and lower bounds on vb(C, K)?

To begin, a few remarks are in order. Firstly, question 2 can be inefficiently encoded as question 1b, by letting (u_i)_{i=1}^N denote a sufficiently fine net of C. Thus "good" characterizations for hereditary discrepancy transfer over to vector balancing, and, for this reason, we restrict for now the discussion to the former. For question 1a, one may be tempted to ask whether we can directly compute a coloring whose K-discrepancy is approximately disc((u_i)_{i=1}^N, K) instead of hd((u_i)_{i=1}^N, K). Unfortunately, even for K = B∞ⁿ and (u_i)_{i=1}^n ∈ [−1,1]ⁿ, it was shown in [22] that it is NP-hard to distinguish whether disc((u_i)_{i=1}^n, B∞ⁿ) is 0 or Ω(√n) (note that O(√n) is guaranteed by Spencer's theorem), thus one cannot hope for any non-trivial approximation guarantee in this context.

We now discuss prior work on these questions and then continue with our main results.

B. Prior work

For both questions 1 and 2 above, prior work has mostly dealt with the case of ℓ₂ or ℓ∞ discrepancy. Bounds on vector balancing constants from ℓ_p to ℓ_q for some p and q have also been studied, as described earlier, but without a unified approach. The question of obtaining near-optimal results for general vector balancing and hereditary discrepancy problems has not been studied before.

In terms of coloring algorithms, Bansal [14] gave a partial coloring based random walk algorithm which, on U ∈ ℝ^{m×n}, produces a full coloring with ℓ∞ discrepancy O(√(log m) · log rk(U) · hd(U, B∞^m)), where rk(U) is the rank of U. Recently, Larsen [23] gave an algorithm for the ℓ₂ norm achieving discrepancy O(√(log rk(U)) · hd(U, B₂^m)).

In terms of certifying lower bounds on hd(U, B∞^m), the main tool has been the so-called determinant lower bound of [10], where it was shown that

$$\mathrm{hd}(U, B_\infty^m) \;\ge\; \tfrac{1}{2}\,\mathrm{detLB}(U) := \tfrac{1}{2}\max_{k}\max_{B} |\det(B)|^{1/k},$$

with the maximum over k × k submatrices B of U. Matoušek [24] built upon the results of [14] to show that hd(U, B∞^m) ≤ O(√(log m) · log rk(U)) · detLB(U).

For certifying tight upper bounds, [25], [26] showed that the γ₂ norm of U, defined by

$$\gamma_2(U) := \min\{\, \|A\|_{2\to\infty}\,\|B\|_{1\to 2} : U = AB,\; A \in \mathbb{R}^{m\times k},\; B \in \mathbb{R}^{k\times n},\; k \in \mathbb{N} \,\},$$

where ‖A‖_{2→∞} is the maximum ℓ₂ norm of any row of A, and ‖B‖_{1→2} is the maximum ℓ₂ norm of any column of B, satisfies

$$\Omega\big(\gamma_2(U)/\log \mathrm{rk}(U)\big) \;\le\; \mathrm{detLB}(U), \qquad \mathrm{hd}(U, B_\infty^m) \;\le\; O(\sqrt{\log m})\,\gamma_2(U),$$

which implies a O(√(log m) · log rk(U)) approximation to ℓ∞ hereditary discrepancy. For ℓ₂, it was shown in [25] that a relaxation of γ₂ yields an O(log rk(U))-approximation to hd(U, B₂^m). We note that part of the strategy of [25], [26] is to replace the ℓ∞ norm via an averaged version of ℓ₂, where one optimizes over the averaging coefficients, which makes the ℓ₂ norm by itself an easier special case.

Moving to general norms: While at first glance it may seem that the above techniques for ℓ∞ do not apply to more general norms, this is in some sense deceptive. Notwithstanding complexity considerations, every norm can be isometrically embedded into ℓ∞, where in particular any polyhedral norm with m facets can be embedded into ℓ∞^{m/2}. Vice versa, starting from a matrix U ∈ ℝ^{m×N} with rk(U) = n and rank factorization U = AB, it is direct to verify that hd(U, B∞^m) = hd(B, K), where K = {x ∈ ℝⁿ : ‖Ax‖∞ ≤ 1} is an n-dimensional symmetric polytope with at most 2m facets.
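The determinant lower bound can be evaluated by exhaustive search on small matrices. Below is a minimal sketch (our own illustration; the function name is invented, and the double enumeration over submatrices is exponential, so this is only for toy inputs). It computes detLB(U) = max over k and k × k submatrices B of |det(B)|^{1/k}; the bound cited above then gives hd(U, B∞^m) ≥ detLB(U)/2.

```python
import itertools

def detlb(U):
    """Brute-force determinant lower bound:
    max over k and k x k submatrices B of U of |det(B)|^(1/k)."""
    m, n = len(U), len(U[0])

    def det(M):  # Laplace expansion along the first row; fine for tiny matrices
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
                   for j in range(len(M)))

    best = 0.0
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                B = [[U[r][c] for c in cols] for r in rows]
                best = max(best, abs(det(B)) ** (1.0 / k))
    return best

# 0/1 incidence matrix of a 3-cycle structure: the full 3x3 determinant is 2,
# so detLB = 2^(1/3), beating every 1x1 and 2x2 submatrix (all have |det| <= 1).
U = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(detlb(U))
```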

Thus, for any U ∈ ℝ^{m×N}, one can equivalently restate the guarantees of [14] as yielding colorings of discrepancy O(√(log m) · log n) · hd(U, K), and of [26] as a O(√(log m) · log n) approximation to hd(U, K), for any n-dimensional symmetric polytope K with m facets. A natural question is therefore whether there exist corresponding coloring and approximation algorithms whose guarantees depend only polylogarithmically on the dimension of the norm and not on the complexity of its representation. The upper and lower bound tools mentioned above (the determinant lower bound and the γ₂ norm, respectively) are insufficient for this task, as the polylogarithmic dependence on m in the bounds is known to be inherent.

We note that polynomial bounds in n can be achieved for general K by simply approximating K by a sandwiching ellipsoid E ⊆ K ⊆ √n E and applying the corresponding results for ℓ₂, which yield O(√(n log n)) coloring and O(√n log n) approximation guarantees respectively. Interestingly, these guarantees are identical to what can be achieved by replacing K by a symmetric polytope with 3ⁿ facets, giving a sandwiching factor of 2, and applying the ℓ∞ results.

Comparing to prior work, our coloring and hereditary discrepancy approximation algorithms give uniformly better (or at least no worse) guarantees in almost every setting which has been studied. Furthermore our methods provide a unified approach for studying discrepancy in arbitrary norms, which we expect to have further applications.

Interestingly, our results imply a tighter relationship between vector balancing and hereditary discrepancy than one might initially expect. We observe that neither the volumetric lower bound we use nor our factorization based upper bound "see" the difference between vector balancing constants and hereditary discrepancy. More precisely, both bounds remain invariant when replacing hd(U, K) by vb(conv{±u_i : i ∈ [N]}, K). This has the relatively non-obvious implication that

$$\mathrm{hd}(U, K) \;\le\; \mathrm{vb}(\mathrm{conv}\{\pm u_i : i \in [N]\}, K) \;\le\; O(\log n)\,\mathrm{hd}(U, K). \tag{1}$$

We believe it is an interesting question to understand whether a polylogarithmic separation indeed exists between the above quantities (we are currently unaware of any examples), as it would give a tangible geometric obstruction for tighter approximations.

II. RESULTS

Our main results are that polylogarithmic approximations to hereditary discrepancy and vector balancing constants with respect to arbitrary norms are indeed possible. In particular, given U ∈ ℝ^{n×N} and a symmetric convex body K ⊆ ℝⁿ (specified by an appropriate oracle), we give randomized polynomial time algorithms for computing colorings of discrepancy O(log n) · hd(U, K) and approximating hd(U, K) up to an O(log^{2.5} n) factor. Furthermore, if K is a polyhedral norm with at most m facets, our approximation algorithm for hd(U, K) always achieves a tighter approximation factor than the γ₂ approximation, and hence gives an O(min{√(log m) · log n, log^{2.5} n}) approximation. To achieve these results, we first show that Rothvoss' partial coloring algorithm [17] is nearly optimal for general hereditary discrepancy, by showing near-tightness with respect to a volumetric lower bound of Banaszczyk [27]. Second, we show that the "best possible way" to apply Banaszczyk's vector balancing theorem [8] for the purpose of bounding hd(U, K) from above can be encoded as a convex program, and prove that this bound is tight to within an O(log^{2.5} n) factor. As a consequence, we show that Banaszczyk's theorem is essentially "universal" for vector balancing. To analyze these approaches we rely on a novel combination of tools from convex geometry and discrepancy. In particular, we give a new way to prove lower bounds on Gaussian measure using only volumetric information, which could be of independent interest. Furthermore, we make a natural geometric conjecture which would imply that Rothvoss' algorithm is (in a hereditary sense) optimal for finding partial colorings in any norm, and prove the conjecture for the special case of ℓ₂.

III. OUTLINE OF THE PROOFS

Starting with hereditary discrepancy, to push beyond the limitations of prior approaches the first two tasks at hand are: (1) find a stronger lower bound and (2) develop techniques to avoid the "union bound". Fortunately, a solution to the first problem was already given by Banaszczyk [27], which we present in slightly adapted form below.

Lemma 1 (Volume Lower Bound). Let U = (u₁, ..., u_N) ∈ ℝ^{n×N} and K ⊆ ℝⁿ be a symmetric convex body. For S ⊆ [N], let U_S denote the submatrix of U consisting of the columns indexed by S. For k ∈ [n], define

$$\mathrm{volLB}_k(U, K) := \mathrm{volLB}_k((u_i)_{i=1}^N, K) := \max_{S \subseteq [N],\, |S| = k} \mathrm{vol}_k\big(\{x \in \mathbb{R}^S : U_S x \in K\}\big)^{-1/k}. \tag{2}$$

Then, we have that

$$\mathrm{volLB}(U, K) := \mathrm{volLB}((u_i)_{i=1}^N, K) := \max_{k \in [n]} \mathrm{volLB}_k(U, K) \;\le\; \mathrm{hd}(U, K). \tag{3}$$
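For polyhedral K, the terms of the volume lower bound in Lemma 1 can be estimated numerically. Below is a minimal Monte Carlo sketch for K = B∞ⁿ (our own illustration: the function name, the sampling box, and the sample count are all assumptions, and the result carries sampling error). It estimates vol_k({x ∈ ℝ^S : U_S x ∈ K}) for each k-subset S by rejection sampling in a bounding box, then takes the max of vol^{−1/k}.

```python
import itertools
import random

def vol_lb(U, k, radius=2.0, samples=100_000, seed=0):
    """Monte Carlo estimate of volLB_k(U, B_inf):
    max over k-subsets S of vol({x in R^S : ||U_S x||_inf <= 1})^(-1/k).
    Assumes each section fits inside [-radius, radius]^k."""
    rng = random.Random(seed)
    n = len(U[0])
    best = 0.0
    for S in itertools.combinations(range(n), k):
        hits = 0
        for _ in range(samples):
            x = [rng.uniform(-radius, radius) for _ in S]
            if all(abs(sum(row[j] * x[t] for t, j in enumerate(S))) <= 1
                   for row in U):
                hits += 1
        if hits:  # volume estimate: hit fraction times box volume
            vol = (hits / samples) * (2 * radius) ** k
            best = max(best, vol ** (-1.0 / k))
    return best

U = [[1.0, 0.5],
     [0.25, 1.0]]
print(vol_lb(U, 1), vol_lb(U, 2))
```

For this 2 × 2 example the k = 1 sections are intervals of volume 2, so volLB₁ = 2^{−1} = 0.5, which slightly dominates the k = 2 term (4/0.875)^{−1/2} ≈ 0.468.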

A formal proof of the above is given in the full version, and follows from the argument in [27]. At a high level, the proof is a simple covering argument, where it is argued that for any subset S, |S| = k, every point in [0,1]^k is at distance at most hd(U, K) from {0,1}^k under the norm induced by C_S := {x ∈ ℝ^k : U_S x ∈ K}. Equivalently, an hd(U, K) scaling of C_S placed around the points of {0,1}^k must cover [0,1]^k, and hence by a standard lattice argument must have volume at least that of [0,1]^k, namely 1. This yields the desired lower bound after rearranging.

We note that the volume lower bound extends in the obvious way to vector balancing. In particular, for two symmetric convex bodies C, K ⊆ ℝⁿ, we define

$$\mathrm{volLB}(C, K) := \sup \mathrm{volLB}((u_i)_{i=1}^k, K),$$

with the supremum over k ∈ [n] and sequences u₁, ..., u_k ∈ C. Then we have

$$\mathrm{vb}(C, K) \;\ge\; \mathrm{volLB}(C, K). \tag{4}$$

This lower bound can be substantially stronger than the determinant lower bound for ℓ∞ discrepancy. As a simple example, let U ∈ ℝ^{2ⁿ×n} be the matrix having a row for each vector in {−1,1}ⁿ. Since U has rank n, the determinant lower bound is restricted to k × k matrices for k ∈ [n]. Hadamard's inequality implies that for any k × k matrix B with ±1 entries we have |det(B)|^{1/k} ≤ √k ≤ √n. A moment's thought, however, reveals that any coloring x ∈ {−1,1}ⁿ must have discrepancy ‖Ux‖∞ = ‖x‖₁ = n. Using the previous logic, the volume lower bound yields by standard estimates

$$\mathrm{volLB}(U, B_\infty^m) \;\ge\; \mathrm{vol}\big(\{x \in \mathbb{R}^n : \|x\|_1 \le 1\}\big)^{-1/n} = \mathrm{vol}(B_1^n)^{-1/n} = (2^n/n!)^{-1/n} \;\ge\; n/(2e),$$

which is essentially tight.

A. From Volume to Coloring

The above example gives hope that the volume lower bound can circumvent a dependency on the facet complexity of the norm. Our first main result shows that indeed this is the case:

Theorem 2 (Tightness of the Volume Lower Bound). For any U ∈ ℝ^{n×N} and symmetric convex body K ⊆ ℝⁿ, we have that

$$\mathrm{volLB}(U, K) \;\le\; \mathrm{hd}(U, K) \;\le\; O(\log n)\,\mathrm{volLB}(U, K).$$

Furthermore, there exists a randomized polynomial time algorithm that computes a coloring of K-discrepancy O(log n) · volLB(U, K), given a membership oracle for K.

We note that the above immediately implies the corresponding approximate tightness of the volume lower bound for vector balancing. The above bound can also be shown to be tight. In particular, the counterexample to the 3-permutations conjecture from [28], which has ℓ∞ discrepancy Ω(log n), can be shown to have volume lower bound O(1). The computations for this are somewhat technical, so we defer a detailed discussion to the full version. As mentioned previously, an interesting property of the volume lower bound is its invariance under taking convex hulls, namely volLB(conv{±U}, K) = volLB(U, K). In combination with Theorem 2, this establishes the claimed inequality (1). This invariance is proved in the full version by showing that the volume lower bound is convex in each u_i, and hence maximized at extreme points.

Our proof of Theorem 2 is algorithmic, and relies on iterated applications of Rothvoss's partial coloring algorithm. We now explain our high level strategy as well as the differences with respect to prior approaches. For simplicity of the presentation, we shall assume that U = (e₁, ..., e_n) ∈ ℝ^{n×n} and that the volume lower bound volLB((e_i)_{i=1}^n, K) = 1. This can be (approximately) achieved by applying a standard reduction to the case where U is non-singular, "folding" U into K, and appropriately guessing the volume lower bound.

For any subset S ⊆ [n], let K_S := {x ∈ K : x_i = 0 ∀ i ∈ [n] \ S} denote the section of K induced by S. Since the vectors of U now correspond to the coordinate basis, it is direct to verify that

$$\mathrm{volLB}_k((e_i)_{i=1}^n, K) = \max_{S \subseteq [n],\, |S| = k} \mathrm{vol}_k(K_S)^{-1/k}.$$

In particular, the assumption volLB((e_i)_{i=1}^n, K) = 1 implies that

$$\mathrm{vol}_{|S|}(K_S) \ge 1, \quad \forall S \subseteq [n]. \tag{5}$$

Under this condition, our goal can now be stated as finding a coloring x ∈ {−1,1}ⁿ with x ∈ O(log n) K.

When K is a symmetric polytope with m facets, the algorithm by Bansal [14] uses a "sticky" random walk inside [0,1]ⁿ with increments computed via an SDP. The SDP is used to guarantee that the variance in the normal direction of any facet is bounded by s · hd((e_i)_{i=1}^n, K)², while the variance in the direction of any (active) coordinate direction is at least s, for a small parameter s. As this only gives probabilistic error guarantees for each constraint in isolation, a union bound is used to get a global guarantee, incurring the O(√(log m)) dependence in the final bound.

To avoid the "union bound", we instead use Rothvoss's partial coloring algorithm, which simply samples a random Gaussian vector X ∈ ℝⁿ and computes the closest point x to X in Euclidean distance in K ∩ [−1,1]ⁿ as the candidate partial coloring. As long as K has "large enough" Gaussian measure, Rothvoss shows that x has at least a constant fraction of its components at ±1. While this method can better leverage the geometry of K than Bansal's method (in particular, it does not need an explicit description of K), it is a priori unclear why Gaussian measure should be large enough in the present context.

Our main technical result is that if all the coordinate sections of K have volume at least 1 (i.e. condition (5)), then there indeed exists a section of K of dimension close to n, whose Gaussian measure is "large" after an appropriate scaling. Specifically, we show that for any δ ∈ (0,1), there exists a subspace H of dimension (1 − δ)n such that the Gaussian measure of 2^{O(1/δ)}(K ∩ H) is at least 2^{−δn}
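To make the partial coloring step concrete, here is a minimal sketch of a single Rothvoss-style step, specialized to the case where K is a scaled cube (our own simplification; the function and parameter names are invented). In that special case K ∩ [−1,1]ⁿ is again a box, and the Euclidean projection onto a box is exactly coordinatewise clamping; for a general body K the projection requires solving a convex program.

```python
import random

def rothvoss_step_box(n, c, seed=0):
    """One Rothvoss-style partial coloring step for K = c * B_inf^n.
    [-1,1]^n intersect K is the box of radius r = min(1, c), and the
    Euclidean projection onto that box clamps each coordinate to [-r, r]."""
    rng = random.Random(seed)
    r = min(1.0, c)
    X = [rng.gauss(0.0, 1.0) for _ in range(n)]       # random Gaussian target
    x = [max(-r, min(r, Xi)) for Xi in X]             # projection onto the box
    frozen = [i for i, xi in enumerate(x) if abs(xi) == r]  # coords on the boundary
    return x, frozen

x, frozen = rothvoss_step_box(64, 1.0, seed=1)
print(len(frozen), "of", len(x), "coordinates reached the boundary")
```

For a standard Gaussian each coordinate lands on the boundary with constant probability (about 0.32 when r = 1), matching the "constant fraction of components at ±1" behavior used in the analysis.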

(the exact statement is given in the full version). We sketch the ideas in the next subsection.

The existence of a large section of K with Gaussian measure which is not too small in fact suffices to run Rothvoss's partial coloring algorithm. Conveniently, one does not need to know the section explicitly, as its existence is only used in the analysis of the algorithm. Since condition (5) is hereditary, we can now find partial colorings of O(1) K-discrepancy on any subset of coordinates. Thus, applying O(log n) partial coloring phases in the standard way yields the desired full coloring.

A useful restatement of the above is that Rothvoss's algorithm can always find partial colorings with discrepancy O(1) times the volume lower bound.

B. Finding a section with large Gaussian measure

We now sketch how to find a section of K of large Gaussian measure under the assumption that vol_{|S|}(K_S) ≥ 1, ∀ S ⊆ [n]. The main tool we require is the M-ellipsoid from convex geometry [29]. The M-ellipsoid E of K is an ellipsoid which approximates K well from the perspective of covering, that is, 2^{O(n)} translates of E suffice to cover K and vice versa.

The main idea is to use the volumetric assumption to show that the longest (1 − δ)n axes of E, for δ ∈ (0,1) of our choice, have length at least 2^{−O(1/δ)}√n, and then use the subspace H generated by these axes for the section we use. On this subspace, a 2^{O(1/δ)} scaling of E ∩ H contains the √n ball, and thus by the covering estimate, 2^{O(n)} translates of 2^{O(1/δ)}(K ∩ H) cover the √n ball on H. Since the √n ball on H has Gaussian measure at least 1/2, the prior covering estimate indeed implies that 2^{O(1/δ)}(K ∩ H) has Gaussian measure 2^{−O(n)}, noting that shifting 2^{O(1/δ)}(K ∩ H) away from the origin only reduces Gaussian measure. Using an M-ellipsoid with appropriate regularity properties (as in [30]), one can scale K ∩ H by another 2^{O(1/δ)} factor so that the preceding argument yields Gaussian measure at least 2^{−δn}.

We now explain why the (1 − δ)n longest axes of E are indeed long enough. By the covering estimates, for any S ⊆ [n], |S| = δn, the sections E_S satisfy

$$\mathrm{vol}(E_S)^{1/\delta n} \;\ge\; 2^{-O(1/\delta)}\,\mathrm{vol}(K_S)^{1/\delta n} \;\ge\; 2^{-O(1/\delta)},$$

where the last inequality is by assumption. Using a form of the restricted invertibility principle for determinants (see e.g. the full version of [31]), one can show that if all coordinate sections of E of dimension δn have large volume, then so does every section of E of the same dimension. Precisely, one gets that

$$\min_{\dim(W) = \delta n} \mathrm{vol}(E \cap W)^{1/\delta n} \;\ge\; 2^{-O(1/\delta)} \min_{S \subseteq [n],\, |S| = \delta n} \mathrm{vol}(E_S)^{1/\delta n} \;\ge\; 2^{-O(1/\delta)}.$$

In particular, the above implies that the geometric average of the lengths of the δn shortest axes of E (corresponding to the minimum volume section of E) must be at least 2^{−O(1/δ)}√n, since the ball of volume 1 in dimension δn has radius Ω(√(δn)). But then the longest (1 − δ)n axes all have length at least 2^{−O(1/δ)}√n. This completes the proof sketch.

C. The Discrepancy of Partial Colorings

Our analysis of Rothvoss's algorithm opens up the tantalizing possibility that it may indeed be optimal for finding partial colorings in a hereditary sense. More precisely, we conjecture that if, when run on an instance with norm ball K, the algorithm almost always produces partial colorings with K-discrepancy at least D, then there exists a subset S of the columns of U such that every partial coloring of U_S has discrepancy Ω(D). The starting point for this conjecture is our upper bound of O(volLB(U, K)) on the discrepancy of the partial colorings the algorithm computes. It remains to show that the volume lower bound is also a lower bound on the hereditary discrepancy of partial colorings. We now provide a purely geometric conjecture, which would imply the above "hereditary optimality" for Rothvoss's algorithm.

As in the last subsection, we may assume that U = (e₁, ..., e_n) is the standard basis of ℝⁿ and that volLB((e_i)_{i=1}^n, K) = 1. To prove the conjecture, it suffices to show that there exists some subset S ⊆ [n] of coordinates, such that all partial colorings on S have K-discrepancy Ω(1). For concreteness, let us ask for partial colorings which color at least |S|/2 coordinates (the precise constant will not matter). For x ∈ [−1,1]ⁿ, define bounds(x) := {i ∈ [n] : x_i ∈ {−1,1}}. With this notation, our goal is to find S ⊆ [n] such that ‖∑_{i∈S} x_i e_i‖_K ≥ Ω(1) for all x ∈ [−1,1]^S with |bounds(x)| ≥ |S|/2.

We explain the candidate geometric obstruction to low discrepancy partial colorings, which is a natural generalization of the so-called spectral lower bound for ℓ₂ discrepancy. Assume that for some subset S ⊆ [n], we have that

$$K_S \subseteq c\sqrt{|S|}\, B_2^S, \tag{6}$$

where B₂^S := B₂ⁿ ∩ ℝ^S, for some constant c > 0. Since any partial coloring x ∈ [−1,1]^S with |bounds(x)| ≥ |S|/2 clearly has ‖x‖₂ ≥ √(|S|/2), we must have that

$$\Big\|\sum_{i \in S} x_i e_i\Big\|_K \;\ge\; \frac{1}{c\sqrt{|S|}}\Big\|\sum_{i \in S} x_i e_i\Big\|_{B_2^S} \;\ge\; \frac{1}{c\sqrt{2}}. \tag{7}$$

In particular, every partial coloring on S has discrepancy at least 1/(c√2) = Ω(1), as desired.

Given the above, we may now reduce the conjecture to the following natural geometric question:

Conjecture 3 (Restricted Invertibility for Convex Bodies). There exists an absolute constant c ≥ 1, such that for any n ∈ ℕ and symmetric convex body K ⊆ ℝⁿ of volume
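The two inequalities in (7) can be unpacked as follows (our own sketch of the reasoning, using only the definitions above):

```latex
% Condition (6), K_S \subseteq c\sqrt{|S|}\,B_2^S, bounds the K-norm from below:
\|v\|_K \;\ge\; \frac{\|v\|_{B_2^S}}{c\sqrt{|S|}}
\qquad \text{for all } v \in \mathbb{R}^S .
% A partial coloring x on S with |\mathrm{bounds}(x)| \ge |S|/2 has at least
% |S|/2 coordinates equal to \pm 1, so \|x\|_2^2 \ge |S|/2. Combining:
\Big\| \sum_{i \in S} x_i e_i \Big\|_K
  \;\ge\; \frac{\|x\|_2}{c\sqrt{|S|}}
  \;\ge\; \frac{\sqrt{|S|/2}}{c\sqrt{|S|}}
  \;=\; \frac{1}{c\sqrt{2}} .
```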

at most 1, there exists S ⊆ [n], S ≠ ∅, such that K_S ⊆ c√|S| B₂^S.

To see that this indeed implies the required statement, note that if volLB((e_i)_{i=1}^n, K) = 1, then by definition there exists A ⊆ [n], |A| ≥ 1, such that vol(K_A) ≤ 1. Now applying the above conjecture to K_A yields the desired result.

Two natural relaxations of the conjecture are to ask: (1) does it hold for ellipsoids, and (2) does it hold for general sections instead of coordinate sections? Our main evidence for this conjecture is that indeed both these statements are true. We note that (1) indeed implies the optimality of Rothvoss's partial coloring algorithm for ℓ₂ discrepancy. Our results here are slightly stronger than (1)+(2), as we in some sense manage to get "halfway there" with coordinate sections, by working with the M-ellipsoid, and only for the last step do we need to resort to general sections. We note that the above conjecture is closely related to the Bourgain–Tzafriri restricted invertibility principle [32], and indeed our proof for ellipsoids reduces to it.

D. A Factorization Approach for Vector Balancing

While Theorem 2 gives an efficient and approximately optimal method of balancing a given set of vectors, it does not give an efficiently computable tight upper bound on the vector balancing constant or on hereditary discrepancy. Even though we proved that, after an appropriate scaling, the volume lower bound also gives an upper bound on the vector balancing constant, we are not aware of an efficient algorithm for computing the volume lower bound, which is itself a maximum over an exponential number of terms. To address this shortcoming, we study a different approach to vector balancing which relies on applying Banaszczyk's theorem in an optimal way, in order to get an efficiently computable, and nearly tight, upper bound on both vector balancing constants and hereditary discrepancy.

Recall that Banaszczyk's vector balancing theorem states that if a body K has Gaussian measure at least 1/2, then vb(B₂ⁿ, K) ≤ 5. In order to apply the theorem to bodies K of small Gaussian measure, we can use rescaling. In particular, if r is the smallest number such that the Gaussian measure of rK is 1/2, then the theorem tells us that vb(B₂ⁿ, K) ≤ 5r. A natural way to use this upper bound for bodies C different from B₂ⁿ is to find a mapping of C into B₂ⁿ, and then use the theorem as above. As an illustration of this idea, let us see how we can get nearly tight bounds on vb(B_pⁿ, B_qⁿ) (the ℓ_p and ℓ_q balls) by applying Banaszczyk's theorem. Let us take an arbitrary sequence of points u₁, ..., u_N ∈ B_pⁿ, and rescale them to define new points v_i := u_i / max{1, n^{1/2−1/p}}. The rescaled points v₁, ..., v_N lie in B₂ⁿ and we can apply Banaszczyk's theorem to them and the convex body K := L√q · n^{1/q} B_qⁿ, which has Gaussian measure at least 1/2 as long as we choose L to be a large enough constant. We get that there exist signs ε₁, ..., ε_N ∈ {−1,1} such that

$$\Big\|\sum_{i=1}^N \varepsilon_i v_i\Big\|_K \le 5 \iff \Big\|\sum_{i=1}^N \varepsilon_i u_i\Big\|_q \le 5L\sqrt{q}\,\max\{n^{1/2 - 1/p + 1/q},\, n^{1/q}\}.$$

In other words, we have that

$$\mathrm{vb}(B_p^n, B_q^n) \;\le\; 5L\sqrt{q}\,\max\{n^{1/2 - 1/p + 1/q},\, n^{1/q}\}.$$

The volume lower bound (Lemma 1) can be used to show that this bound is tight up to the O(√q) factor. Indeed one can show that B_pⁿ contains n vectors u₁, ..., u_n such that the matrix U := (u₁, ..., u_n) has determinant |det(U)|^{1/n} ≥ e^{−1} max{n^{1/2−1/p}, 1} (see [33] or [31]). By standard estimates, vol(B_qⁿ)^{−1/n} ≥ c n^{1/q} for an absolute constant c > 0. Plugging these estimates into Lemma 1 shows vb(B_pⁿ, B_qⁿ) ≥ c′ max{n^{1/2−1/p+1/q}, n^{1/q}} for a constant c′ > 0.

It is easy to see that, unlike the example above, in general simply rescaling C and K and applying Banaszczyk's theorem to the rescaled bodies may not give a tight bound on vb(C, K). However, we will show that we can get such tight bounds if we expand the class of transformations we allow on C and K from simple rescaling to arbitrary linear transformations. It turns out that the most convenient language for this approach is that of linear operators between normed spaces. We can generalize the notion of a vector balancing constant between a pair of convex bodies to arbitrary linear operators U : X → Y between two n-dimensional normed spaces X, with norm ‖·‖_X, and Y, with norm ‖·‖_Y, as follows:

$$\mathrm{vb}(U) = \sup \min_{\varepsilon_1, \ldots, \varepsilon_N \in \{-1,1\}} \Big\| \sum_{i=1}^N \varepsilon_i U(x_i) \Big\|_Y, \tag{8}$$

where B_X = {x : ‖x‖_X ≤ 1} is the unit ball of X, and the supremum is over positive integers N and sequences x₁, ..., x_N ∈ B_X. This definition is indeed a generalization of the geometric one. If C and K are two centrally symmetric convex bodies in ℝⁿ, and we define the corresponding normed spaces X_C = (ℝⁿ, ‖·‖_C) and X_K = (ℝⁿ, ‖·‖_K), then the vector balancing constant vb(I) of the formal identity operator I : X_C → X_K recovers vb(C, K). However, the more abstract setting makes it plain that a simple rescaling is not the right approach to applying Banaszczyk's theorem to arbitrary norms: if X is an arbitrary normed space, then B₂ⁿ may not be defined on the same vector space, and rescaling B_X so that it is a subset of B₂ⁿ does not even make sense. Instead, when dealing with general norms, it becomes very natural to embed X into ℓ₂ⁿ via a linear map T : X → ℓ₂ⁿ so that T(B_X) ⊆ B₂ⁿ.

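The rescaled form of Banaszczyk's theorem above needs the smallest r with Gaussian measure of rK at least 1/2. For K = B_∞^n that threshold is the median of ‖Z‖_∞ for a standard Gaussian Z, which grows like √(2 ln n); the following minimal Monte Carlo sketch (variable names are ours) estimates it.

```python
import math
import random
import statistics

# Estimate the smallest r with gamma_n(r * B_inf^n) >= 1/2, i.e. the
# median of ||Z||_inf for a standard Gaussian Z. Banaszczyk's theorem
# then gives vb(B_2^n, B_inf^n) <= 5r.
random.seed(1)
n = 100
samples = [max(abs(random.gauss(0.0, 1.0)) for _ in range(n))
           for _ in range(2000)]
r = statistics.median(samples)

# Sanity check: the median of ||Z||_inf is of order sqrt(2 ln n).
assert 0.5 * math.sqrt(2.0 * math.log(n)) < r < 2.0 * math.sqrt(2.0 * math.log(n))
```

With n = 100 the estimate lands near √(2 ln 100) ≈ 3, consistent with the O(√(log n)) discrepancy bound this rescaling yields for the cube.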
To formalize the above, we use the ℓ-norm, which has been extensively studied in the theory of operator ideals and in asymptotic convex geometry (see e.g. [34]–[36]). For a linear operator S : ℓ_2^n → Y into an n-dimensional normed space Y with norm ‖·‖_Y, the ℓ-norm of S is defined as

    ℓ(S) := ( ∫ ‖S(x)‖_Y^2 dγ_n(x) )^{1/2},

where γ_n is the standard Gaussian measure on R^n. I.e., if Z is a standard Gaussian random variable in R^n, then ℓ(S) = (E ‖S(Z)‖_Y^2)^{1/2}. It is direct to verify that ℓ(·) is a norm on the space of linear operators from ℓ_2^n to Y, for any normed space Y as above. The reason the ℓ-norm is useful to us is the fact that the smallest r for which the set K = {x ∈ R^n : ‖Sx‖_Y ≤ r} has Gaussian measure at least 1/2 is approximately ℓ(S), due to the concentration of measure phenomenon.

We now define our main tool: a factorization constant λ which, for any two n-dimensional normed spaces X and Y and an operator U : X → Y, is defined by

    λ(U) := inf { ℓ(S) ‖T‖ : T : X → ℓ_2^n, S : ℓ_2^n → Y, U = ST }.

In other words, λ(U) is the minimum of ℓ(S)‖T‖ over all ways to factor U through ℓ_2^n as U = ST. Here ‖T‖ is the operator norm, equal to max{‖Tx‖_2 / ‖x‖_X : x ≠ 0}. This definition captures an optimal application of Banaszczyk's theorem. Using the theorem, it is not hard to show that vb(U) ≤ Cλ(U) for an absolute constant C. Our main result is showing that vb(U) and λ(U) are in fact equal up to a factor which is polynomial in log n. To prove this, we formulate λ(U) as a convex minimization problem. Such a formulation is important both for our structural results, which rely on Lagrange duality, and also for giving an algorithm to compute λ(U) efficiently, and, therefore, to approximate vb(U) efficiently, which turns out to be sufficient to approximate hereditary discrepancy in arbitrary norms.

The most immediate way to formulate λ(U) as an optimization problem is to minimize ℓ(UT^{−1}) over operators T : X → ℓ_2^n subject to the constraint ‖T‖ ≤ 1. Unfortunately, this optimization problem is not convex in T: the value of the objective function is finite for any nonzero T, but infinite for (1/2)(T + (−T)) = 0, for example. The key observation that allows us to circumvent this issue is that the objective function is completely determined by the operator A := T*T, and is in fact convex in A. Here T* is the dual operator of T. We use f(A) to denote this objective function, i.e. to denote ℓ(UT^{−1}) where T is an operator such that T*T = A. In the full version we prove that this function is well-defined and convex. Then, our convex formulation of λ(U) is

    inf  f(A)
    s.t. ‖A‖ ≤ 1,
         A : X → X*, A ⪰ 0.

Above, X* is the dual space of X, and ‖A‖ is the operator norm. The first constraint is equivalent to the constraint ‖T‖ ≤ 1, where U = ST is the factorization in the definition of λ(U). The last constraint says that A should be positive semidefinite, which is important so that A can be written as T*T and f(A) is well-defined.

We utilize this convex formulation and Lagrange duality to derive a dual formulation of λ(U) as a supremum over "dual certificates". Such a formulation is useful in approximately characterizing vb(U) in terms of λ(U) because it reduces our task to relating the dual certificates to the terms in the volume lower bound (2). If we can show that every dual certificate bounds from below one of the terms of the volume lower bound (up to factors polynomial in log n), then we can conclude that λ(U) also bounds the volume lower bound from below, and therefore vb(U) as well.

Before we can give the dual formulation, we need to introduce the dual norm ℓ* of the ℓ-norm, defined via trace duality: for any linear operator R : Y → ℓ_2^n, let

    ℓ*(R) := sup { tr(RS) : S : ℓ_2^n → Y, ℓ(S) ≤ 1 }.

The norms ℓ and ℓ* form a dual pair, and in particular we have

    ℓ(S) = sup { tr(RS) : R : Y → ℓ_2^n, ℓ*(R) ≤ 1 }.

For a finite dimensional space Y, both suprema above are achieved.

The derivation of our dual formulation uses standard tools, but is quite technical due to the complicated nature of the function f(A). We give the formulation for norms X such that B_X = conv{±x_1, …, ±x_m}. This is without loss of generality, since every symmetric convex body can be approximated by a symmetric polytope (but this has implications for the complexity of our algorithms). The dual formulation is as follows:

    sup  ( Σ_{i=1}^m tr( ( RU (p_i x_i ⊗ x_i) U*R* )^{1/3} ) )^{3/2}
    s.t. ℓ*(R) ≤ 1, R : Y → ℓ_2^n,
         p_1, …, p_m ≥ 0, Σ_{i=1}^m p_i = 1.

Above, x_i ⊗ x_i is the rank-1 operator from the dual space X* to X, given by (x_i ⊗ x_i)(x*) = ⟨x*, x_i⟩ x_i. We relate the volume lower bound to this dual via deep inequalities between the ℓ and ℓ* norms (K-convexity), and the
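The concentration fact invoked above, that the smallest r for which {x : ‖Sx‖_Y ≤ r} has Gaussian measure 1/2 is approximately ℓ(S), can be checked numerically in a simple special case: Y = ℓ_∞^n with S the identity. The sketch below is ours (the paper proves this in general); it compares a Monte Carlo estimate of ℓ(S) with the median of ‖Z‖_∞.

```python
import math
import random
import statistics

# For S = identity and Y = l_inf^n, compare the l-norm
# l(S) = (E ||Z||_inf^2)^(1/2) with the smallest r whose sublevel set
# {x : ||x||_inf <= r} has Gaussian measure 1/2 (the median of ||Z||_inf).
random.seed(2)
n, trials = 200, 2000
norms = [max(abs(random.gauss(0.0, 1.0)) for _ in range(n))
         for _ in range(trials)]

ell = math.sqrt(statistics.fmean(x * x for x in norms))  # estimate of l(S)
r_half = statistics.median(norms)                        # measure-1/2 threshold

# Concentration of measure: the two agree up to a small constant factor.
assert 0.5 < r_half / ell < 1.5
```

For the max of n i.i.d. Gaussian coordinates the two quantities are both of order √(2 ln n), so the ratio is close to 1; the looser 0.5–1.5 window just keeps the Monte Carlo check robust.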