{"id":568,"date":"2020-09-04T14:46:33","date_gmt":"2020-09-04T06:46:33","guid":{"rendered":"https:\/\/blog.espnlol.com\/?p=568"},"modified":"2022-04-20T18:21:20","modified_gmt":"2022-04-20T10:21:20","slug":"%e5%9c%a864%e4%bd%8d%e5%b9%b3%e5%8f%b0%e4%b8%8a%e7%9a%84lucene%ef%bc%8c%e5%ba%94%e8%af%a5%e4%bd%bf%e7%94%a8mmapdirectory%e8%bd%ac","status":"publish","type":"post","link":"https:\/\/blog.espnlol.com\/?p=568","title":{"rendered":"Lucene on 64bit Platforms Should Use MMapDirectory [Repost]"},"content":{"rendered":"\n<p>Starting with version 3.1, Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; starting with version 3.3, 64bit Linux systems use this setting as well. This change confused some Lucene and Solr users, because parts of their systems suddenly behaved differently than before. On the mailing lists, some users asked why their installations were suddenly consuming far more resources than before. Many experts also began telling people not to use MMapDirectory. From the point of view of the Lucene committers, however, MMapDirectory is clearly the best choice for these platforms.<\/p>\n\n\n\n<p>In this blog post, I will try to explain some basic facts about virtual memory and how they can be used to improve Lucene\u2019s performance. Once you understand them, you will see why the people telling you not to use MMapDirectory are wrong. In the second part, I will list some configuration details that help you avoid errors such as \u201cmmap failed\u201d and suboptimal Lucene performance caused by certain characteristics of the Java heap.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Virtual Memory<a href=\"http:\/\/en.wikipedia.org\/wiki\/Virtual_memory\">[1]<\/a><\/h3>\n\n\n\n<p>Let\u2019s start with the operating system kernel. Since the 1970s, software I\/O has followed this pattern: whenever you need to access data on disk, you issue a syscall to the kernel, pass in a pointer to some buffer, and read from or write to the disk. If you don\u2019t want to issue lots of syscalls (because each syscall made by a user process is expensive), you should use large buffers, so that each call reads more data and the disk is accessed less often. This is also one reason why some people suggest loading the whole Lucene index into the Java heap (using RAMDirectory).<\/p>\n\n\n\n<p>But all modern operating systems, such as Linux, Windows (NT+), Mac OS X, and Solaris, provide a better way to do I\/O: they buffer data for you using sophisticated file system caches and memory management. The most important of these features is called virtual memory, and it is a very good solution for handling very large data such as a Lucene index. <strong>Virtual memory<\/strong> is an integral part of computer architecture; implementing it requires hardware support, usually a memory management unit (MMU) built into the CPU. The way it works is simple: every process has its own virtual address space, into which all libraries, heap, and stack space are mapped. In most cases this virtual address space starts at offset zero, where the program code is loaded, so the program code\u2019s address pointers never need to be relocated. Every process sees a large, unfragmented, linear address space. It is called virtual memory because this address space has nothing to do with physical memory; it only looks like memory to the process. A process can access this virtual address space as if it were real memory, without caring that many other processes are using memory at the same time. The underlying OS works together with the MMU to map these virtual addresses to real memory. This is done with the help of <strong>page tables<\/strong>, which are backed by TLBs (translation lookaside buffers, which cache frequently accessed pages) in the MMU hardware. In this way, the OS can distribute the memory requests of all processes across the physical memory that is actually available, completely transparently to the running programs.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/images0.cnblogs.com\/i\/161247\/201403\/211648074594480.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<p>Schematic drawing of virtual memory <em>(image from Wikipedia&nbsp;<a href=\"http:\/\/en.wikipedia.org\/wiki\/Virtual_memory\">[1]<\/a>,&nbsp;<a href=\"http:\/\/en.wikipedia.org\/wiki\/File:Virtual_memory.svg\">http:\/\/en.wikipedia.org\/wiki\/File:Virtual_memory.svg<\/a>, licensed by CC BY-SA 3.0)<\/em><\/p>\n\n\n\n<p>With this virtualization in place, the OS has to do one more thing: when physical memory runs short, it must be able to decide to swap out pages that are no longer in use, freeing physical memory. When a process tries to access a virtual address whose page was swapped out, that page is loaded back into memory. The user process does not have to do anything during this; memory management is completely transparent to it. This is great news for applications, because they don\u2019t have to care whether enough memory is available. Of course, it also causes some problems for applications that use a lot of memory, such as Lucene.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lucene &amp; Virtual Memory<\/h3>\n\n\n\n<p>Let\u2019s look at an example: suppose we load the whole index into memory (which is really virtual memory). If we allocate a RAMDirectory and load all index files into it, we are actually working against the OS. The OS itself tries hard to optimize disk access, so it caches all disk I\/O in physical memory. We then copy all of this content, which should simply stay in that cache, into our own virtual address space, consuming a large amount of physical memory. <strong>Physical memory is limited, so the OS may evict our huge RAMDirectory from physical memory, which means putting it back on disk (in the OS swap file).<\/strong> In effect, we are fighting the OS kernel, and the result is that the OS pushes the data we worked so hard to read from disk right back onto disk. So RAMDirectory is not a good way to reduce index loading time. RAMDirectory also has problems related to GC and concurrency. Because the data lives in swap space, it is hard work for Java\u2019s GC to clean it up. This leads to heavy disk I\/O, slow index access, and latencies of several minutes caused by the struggling GC.<\/p>\n\n\n\n<p>If we don\u2019t use RAMDirectory to cache the index and <strong>use NIOFSDirectory or SimpleFSDirectory instead, we face a different problem: our code has to execute many syscalls to copy data from the disk or the file system cache into buffers on the Java heap.<\/strong> This I\/O happens on every search request.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Memory Mapping Files<\/h3>\n\n\n\n<p>The solution to the problems above is MMapDirectory, which uses virtual memory and mmap to access files on disk.<\/p>\n\n\n\n<p>The approaches described in the first half of this post all rely on syscalls to copy data between the file system cache and the Java heap. So how can we access the file system cache directly? That is exactly what mmap does!<\/p>\n\n\n\n<p>Simply put, MMapDirectory treats the Lucene index like a swap file. The mmap() syscall tells the OS to map the entire index file into the virtual address space, so the index appears to Lucene to be in memory. Lucene can then access the index files on disk as if they were one huge byte[] array (in Java this array is wrapped by the ByteBuffer interface). When Lucene accesses the index in virtual address space, no syscalls are needed; the MMU and TLB in the CPU handle all of the mapping. If the data is still only on disk, the MMU raises an interrupt and the OS loads the data into the file system cache. If the data is already in the cache, the MMU\/TLB maps it directly to memory; this is only a memory access and is therefore fast. Programmers do not need to care about paging in\/out; the OS handles all of it. There is also no concurrency interference in this scheme. The only drawback is that accessing data through Java\u2019s ByteBuffer wrapper is a little slower than a plain byte[], but this interface is the only way to use mmap from Java. Another big advantage is that the OS takes care of all memory issues, so there are no GC problems.<\/p>\n\n\n\n<p><em>What does this all mean to our Lucene\/Solr application?<\/em><\/p>\n\n\n\n<ul 
class=\"wp-block-list\"><li><strong>To avoid competing with the OS for memory, give the JVM as little heap space as possible (-Xmx option). Leave all index access to the OS cache. This is also good for the JVM\u2019s garbage collector.<\/strong><\/li><li><strong>Leave as much memory as possible to the OS, so that the file system cache is larger and swapping is less likely.<\/strong><\/li><\/ul>\n\n\n\n<p><strong>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Don\u2019t be afraid \u2013 Some clarification to common misunderstandings<\/h2>\n\n\n\n<p>Since version 3.1,&nbsp;<strong>Apache Lucene<\/strong>&nbsp;and&nbsp;<strong>Solr&nbsp;<\/strong>use&nbsp;MMapDirectory&nbsp;by default on 64bit Windows and Solaris systems; since version 3.3 also on 64bit Linux systems. This change led to some confusion among Lucene and Solr users, because their systems suddenly started to behave differently than in previous versions. On the Lucene and Solr mailing lists, many posts arrived from users asking why their Java installation was suddenly consuming three times their physical memory, and from system administrators complaining about heavy resource usage. Consultants also started telling people that they should&nbsp;<strong>not&nbsp;<\/strong>use&nbsp;MMapDirectory&nbsp;and should change their solrconfig.xml to use the slow&nbsp;SimpleFSDirectory&nbsp;or&nbsp;NIOFSDirectory&nbsp;instead (the latter is much slower on Windows because of a JVM bug&nbsp;<a href=\"http:\/\/bugs.sun.com\/bugdatabase\/view_bug.do?bug_id=6265734\">#6265734<\/a>). 
From the point of view of the Lucene committers, who carefully decided that&nbsp;MMapDirectory&nbsp;is the best choice for those platforms, this is rather annoying, because they know that Lucene\/Solr can now perform much better than before. Misinformation about the background of this change leads to suboptimal installations of this great search engine everywhere.<\/p>\n\n\n\n<p>In this blog post, I will try to explain the basic operating system facts about how the kernel handles virtual memory, and how this can be used to greatly improve the performance of Lucene <em>(\u201cVIRTUAL MEMORY for DUMMIES\u201d)<\/em>. It will also explain why the blog and mailing list posts written by various people are wrong and contradict the purpose of&nbsp;MMapDirectory. In the second part I will show you some configuration details and settings you should take care of to prevent errors like \u201cmmap failed\u201d and suboptimal performance caused by poorly chosen Java heap sizes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Virtual Memory<sup><a href=\"http:\/\/en.wikipedia.org\/wiki\/Virtual_memory\">[1]<\/a><\/sup><\/h3>\n\n\n\n<p>Let\u2019s start with your operating system\u2019s kernel. The naive way to do I\/O in software is the one you have used since the&nbsp;1970s, and the&nbsp;pattern is simple: whenever you have to work with data on disk, you execute a&nbsp;<em>syscall&nbsp;<\/em>to your operating system kernel, passing a pointer to some buffer (e.g. a&nbsp;byte[]&nbsp;array in Java), and transfer some bytes from\/to disk. After that you parse the buffer contents and run your program logic. If you don\u2019t want to make too many syscalls (because they can cost a lot of processing power), you generally use large buffers in your software, so the data in the buffer needs to be synchronized with the disk less often. 
This is one reason why some people suggest loading the whole Lucene index into Java heap memory (e.g., by using&nbsp;RAMDirectory).<\/p>\n\n\n\n<p>But all modern operating systems like Linux, Windows (NT+), MacOS X, or Solaris provide a much better approach than this 1970s style of code, using their sophisticated file system caches and memory management features. A feature called&nbsp;<em>\u201cvirtual memory\u201d<\/em>&nbsp;is a good alternative for handling very large and space-intensive data structures like a Lucene index. Virtual memory is an integral part of a computer architecture; implementations require hardware support, typically in the form of a&nbsp;<em>memory management unit (MMU)<\/em>&nbsp;built into the CPU. The way it works is simple: every process gets its own virtual address space, into which all libraries, heap, and stack space are mapped. In most cases this address space also starts at offset zero, which simplifies loading the program code because&nbsp;no relocation of address pointers needs to be done. Every process sees a large, unfragmented, linear address space it can work on. It is called \u201cvirtual memory\u201d because this address space has nothing to do with physical memory; it just looks like memory to the process. Software can then access this large address space as if it were real memory, without knowing that other processes are also consuming memory and have their own virtual address spaces. The underlying operating system works together with the MMU in the CPU to map those virtual addresses to real memory once they are accessed for the first time. This is done using so-called page tables, which are backed by&nbsp;<em>TLBs<\/em>&nbsp;located in the MMU hardware&nbsp;<em>(translation lookaside buffers, which cache frequently accessed pages)<\/em>. 
In this way, the operating system is able to distribute the memory requirements of all running processes across the physical memory that is actually available, completely transparently to the running programs.&nbsp;This virtualization allows the operating system to do one more thing: if there is not enough physical memory, it can decide to \u201cswap out\u201d pages that are no longer used by the processes, freeing physical memory for other processes or for caching more important file system operations. Once a process tries to access a virtual address that was paged out, the page is reloaded into main memory and made available to the process. The process does not have to do anything; this is completely transparent. This is good for applications because they don\u2019t need to know anything about the amount of available memory, but it also leads to problems for very memory-intensive applications like Lucene.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lucene &amp; Virtual Memory<\/h3>\n\n\n\n<p>Let\u2019s take the example of loading the whole index, or large parts of it, into \u201cmemory\u201d&nbsp;<em>(we already know that it is only virtual memory)<\/em>. If we allocate a&nbsp;RAMDirectory&nbsp;and load all index files into it, we are working against the operating system: the operating system already tries to optimize disk accesses, so it caches all disk I\/O in physical memory. We copy all these cache contents into our own virtual address space, consuming horrible amounts of physical memory (and we must wait for the copy operation to finish!).&nbsp;<strong>As physical memory is limited, the operating system may, of course, decide to swap out our large&nbsp;RAMDirectory, and where does it land? On disk again (in the OS swap file)!<\/strong>&nbsp;In fact, we are fighting against our O\/S kernel, which pages out everything we loaded from disk&nbsp;<a href=\"https:\/\/www.varnish-cache.org\/trac\/wiki\/ArchitectNotes\">[2]<\/a>. 
So&nbsp;RAMDirectory&nbsp;is not a good way to optimize index loading times! In addition,&nbsp;RAMDirectory&nbsp;has more problems related to garbage collection and concurrency. Because the data resides in swap space, Java\u2019s garbage collector has a hard job freeing the memory in its own heap management. This leads to high disk I\/O, slow index access times, and minute-long latencies in your search code caused by the garbage collector going crazy.<\/p>\n\n\n\n<p>On the other hand, if we don\u2019t use&nbsp;RAMDirectory&nbsp;to buffer our index and use&nbsp;NIOFSDirectory&nbsp;or&nbsp;SimpleFSDirectory&nbsp;instead, we have to pay another price: our code has to make a lot of syscalls to the O\/S kernel to copy blocks of data between the disk or filesystem cache and our buffers residing in the Java heap. This needs to be done on every search request, over and over again.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Memory Mapping Files<\/h3>\n\n\n\n<p>The solution to the above issues is&nbsp;MMapDirectory, which uses virtual memory and a kernel feature called \u201cmmap\u201d&nbsp;<a href=\"http:\/\/en.wikipedia.org\/wiki\/Memory-mapped_file\">[3]<\/a>&nbsp;to access the disk files.<\/p>\n\n\n\n<p>In our previous approaches, we were relying on syscalls to&nbsp;<strong>copy&nbsp;<\/strong>the data between the file system cache and our local Java heap. How about&nbsp;<strong>directly accessing<\/strong>&nbsp;the file system cache? This is what mmap does!<br>Basically, mmap treats the Lucene index much like a swap file. The&nbsp;mmap()&nbsp;syscall tells the O\/S kernel to map our whole index files into the previously described virtual address space and make them look like RAM available to our Lucene process. We can then access our index files on disk as if they were a large&nbsp;byte[]&nbsp;array (in Java this is encapsulated by the&nbsp;ByteBuffer&nbsp;interface to make it safe for use by Java code). 
If we access this virtual address space from the Lucene code, we don\u2019t need to make any syscalls; the processor\u2019s MMU and TLB handle all the mapping for us. If the data is only on disk, the MMU will cause an interrupt (a page fault) and the O\/S kernel will load the data into the file system cache. If it is already in the cache, the MMU\/TLB maps it directly to the physical memory of the file system cache. It is now just a native memory access, nothing more! We don\u2019t have to take care of paging buffers in\/out; all of this is managed by the O\/S kernel. Furthermore, we have no concurrency issues. The only overhead compared to a standard&nbsp;byte[]&nbsp;array is some wrapping caused by Java\u2019s&nbsp;ByteBuffer&nbsp;interface (it is still slower than a real&nbsp;byte[]&nbsp;array, but that is the only way to use mmap from Java, and it is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O\/S cache, avoiding all the Java GC issues described before.<\/p>\n\n\n\n<p><em>What does this all mean to our Lucene\/Solr application?<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>We should not work against the operating system anymore, so allocate as little heap space as possible (-Xmx&nbsp;Java option).<\/strong>&nbsp;Remember, our index accesses are passed directly to the O\/S cache! This is also very friendly to the Java garbage collector.<\/li><li><strong>Leave as much physical memory as possible free for the O\/S kernel to use as file system cache.&nbsp;<\/strong>Remember, our Lucene code works directly on it, so this reduces the amount of&nbsp;<em>paging\/swapping<\/em>&nbsp;between disk and memory. Allocating too much heap to our Lucene application hurts performance! 
Lucene does not require it with&nbsp;MMapDirectory.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Why does this only work as expected on 64bit operating systems and Java virtual machines?<\/h3>\n\n\n\n<p>One limitation of 32bit platforms is the size of pointers: they can refer to any address between 0 and 2<sup>32<\/sup>-1, which is 4 gigabytes. Most operating systems limit the user address space to 3 gigabytes or less, because the remaining address space is reserved for the operating system kernel (which also uses it, e.g., for mapping device hardware). This means the overall linear address space provided to any process is limited to about 3 gigabytes, so you cannot map any file larger than that into this \u201csmall\u201d address space as a big&nbsp;byte[]&nbsp;array. And once you have mapped that one large file, there is no virtual space (addresses, like \u201chouse numbers\u201d) left. As physical memory sizes in current systems have already gone beyond that size, there is no address space available for mapping files without wasting resources<em>&nbsp;(in our case \u201caddress space\u201d, not physical memory!)<\/em>.<\/p>\n\n\n\n<p>On 64bit platforms this is different: 2<sup>64<\/sup>-1 is a very large number, in excess of 18 quintillion bytes, so there is no real limit in address space. Unfortunately, most hardware (the MMU, the CPU\u2019s bus system) and operating systems limit this address space to 47 bits for user mode applications (Windows: 43 bits)&nbsp;<a href=\"http:\/\/en.wikipedia.org\/wiki\/X86-64#Virtual_address_space_details\">[4]<\/a>. 
But there is still plenty of address space available to map terabytes of data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common misunderstandings<\/h3>\n\n\n\n<p>If you have read carefully what I have told you about virtual memory, you can easily verify that the following is true:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>MMapDirectory&nbsp;does not consume additional memory, and the size of mapped index files is not limited by the physical memory available on your server.<\/strong>&nbsp;By mmap()ing files, we only reserve address space, not memory! Remember, address space on 64bit platforms comes for free!<\/li><li><strong>MMapDirectory&nbsp;will not load the whole index into physical memory.<\/strong>&nbsp;Why should it do this? We just ask the operating system to map the file into address space for easy access; we are by no means requesting more. Java and the O\/S provide an option to try loading the whole file into RAM (if enough is available), but Lucene does not use that option (we may add this possibility in a later version).<\/li><li><strong>MMapDirectory&nbsp;does not overload the server when \u201ctop\u201d reports horrible amounts of memory.<\/strong>&nbsp;\u201ctop\u201d (on Linux) has three columns related to memory: \u201cVIRT\u201d, \u201cRES\u201d, and \u201cSHR\u201d. The first one (VIRT, virtual) reports allocated virtual address space (and that comes for free on 64bit platforms!). This number can be several times your index size or physical memory when merges are running in&nbsp;IndexWriter. If you have only one&nbsp;IndexReader&nbsp;open, it should be approximately equal to the allocated heap space (-Xmx) plus the index size. It does not show the physical memory used by the process. The second column (RES, resident) shows how much (physical) memory the process has allocated for operating and should be close to the size of your Java heap space. 
The last column (SHR, shared) shows how much of the allocated virtual address space is shared with other processes. If you have several Java applications using&nbsp;MMapDirectory&nbsp;to access the same index, you will see this number going up. Generally, you will see the space needed by shared system libraries, JAR files, and the process executable itself (which are also mmapped).<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to configure my operating system and Java VM to make optimal use of MMapDirectory?<\/h3>\n\n\n\n<p>First of all, the default settings in Linux distributions and Solaris\/Windows are perfectly fine. But there are some paranoid system administrators around who want to control everything (without understanding it). They limit the maximum amount of virtual address space that can be allocated by applications. So please check that \u201culimit -v\u201d and \u201culimit -m\u201d both report \u201cunlimited\u201d; otherwise&nbsp;MMapDirectory&nbsp;may report&nbsp;<em>\u201cmmap failed\u201d<\/em>&nbsp;while opening your index. If this error still happens on systems with lots of very large indexes, each with many segments, you may need to tune your kernel parameters in&nbsp;\/etc\/sysctl.conf: the default value of&nbsp;vm.max_map_count&nbsp;is 65530, and you may need to raise it. I think there are similar settings available for Windows and Solaris systems, but it is up to the reader to find out how to use them.<\/p>\n\n\n\n<p>When configuring your Java VM, you should rethink your memory requirements: give only the amount of heap space that is really needed and leave as much as possible to the O\/S. As a rule of thumb, don\u2019t use more than \u00bc of your physical memory as heap space for the Java process running Lucene\/Solr; keep the remaining memory free for the operating system cache. If you have more applications running on your server, adjust accordingly. 
As usual, the more physical memory, the better, but you don\u2019t need as much physical memory as your index size. The kernel does a good job of paging in frequently used pages from your index.<\/p>\n\n\n\n<p>A good way to check that you have configured your system optimally is to look at both &#8220;top&#8221;<em>&nbsp;(and interpret it correctly, see above)<\/em>&nbsp;and the similar command &#8220;<a href=\"http:\/\/guichaz.free.fr\/iotop\/\">iotop<\/a>&#8221; (which can be installed, e.g., on Ubuntu Linux with &#8220;apt-get install iotop&#8221;). If your system does lots of swapping in\/out for the Lucene process, reduce the heap size; you have probably allocated too much. If you see lots of disk I\/O, buy more RUM (<a href=\"http:\/\/mail-archives.apache.org\/mod_mbox\/lucene-java-user\/201207.mbox\/%3CCAAHmpkhJ7KU3X0wm2VHwDkO0UZd%3D6%2Behh0qWzpzw-WdFvB%2BQ_A%40mail.gmail.com%3E\">Simon Willnauer<\/a>) so mmapped files don&#8217;t need to be paged in\/out all the time, and finally:&nbsp;<a href=\"http:\/\/www.youtube.com\/watch?v=H7PJ1oeEyGg\">buy SSDs<\/a>.<\/p>\n\n\n\n<p><em>Happy mmapping!<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Starting with version 3.1, Lucene and Solr began on 64bit Windows and Solar &hellip; <a href=\"https:\/\/blog.espnlol.com\/?p=568\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-568","post","type-post","status-publish","format-standard","hentry","category-elk"],"_links":{"self":[{"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=\/wp\/v2\/posts\/568","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=568"}],"version-history":[{"count":1,"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=\/wp\/v2\/posts\/568\/revisions"}],"predecessor-version":[{"id":569,"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=\/wp\/v2\/posts\/568\/revisions\/569"}],"wp:attachment":[{"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=568"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=568"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.espnlol.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=568"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}