dplavery92's comments

dplavery92 | 2 years ago | on: Kalman Filter Explained Simply

You can also construct multiple-hypothesis trackers from multiple Kalman Filters, though it takes a little more machinery. For example, Interacting Multiple Model (IMM) trackers may use Kalman Filters or Particle Filters, and much of the foundational work by Bar-Shalom and others focuses on Kalman Filters.

dplavery92 | 2 years ago | on: Kalman Filter Explained Simply

The Kalman filter has a family of generalizations in the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF).

Also common in robotics applications is the Particle Filter, which uses a Monte Carlo approximation of the uncertainty in the state rather than assuming a parametric (Gaussian) distribution, as the traditional Kalman filter does. This can be useful when the dynamics are highly nonlinear and/or your measurement uncertainties are, well, very non-Gaussian. Sebastian Thrun (a Stanford robotics professor in the DARPA "Grand Challenge" days of self-driving cars) made an early Udacity course covering Particle Filters.
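To give a flavor of the Monte Carlo approach: a bootstrap particle filter represents the state distribution with weighted samples, reweights them by the measurement likelihood, and resamples when the weights degenerate. Here is a minimal sketch in Python; the function name, the scalar random-walk dynamics, and the toy cubic measurement model are my own illustration, not from any particular course or library.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, z, process_std=0.3, meas_std=1.0):
    """One predict/update cycle of a bootstrap particle filter.

    Toy model (illustrative only): the state is a scalar random walk,
    observed through the nonlinear measurement z = x**3 + Gaussian noise.
    """
    # Predict: push every particle through the dynamics plus process noise.
    particles = particles + rng.normal(0.0, process_std, size=particles.shape)
    # Update: reweight by the (assumed Gaussian) measurement likelihood.
    weights = weights * np.exp(-0.5 * ((z - particles**3) / meas_std) ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses below half the particles.
    if 1.0 / np.sum(weights**2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Usage: track a constant true state x = 1.5 from noisy z = x**3 readings.
particles = rng.uniform(-3.0, 3.0, size=2000)
weights = np.full(2000, 1.0 / 2000)
for _ in range(50):
    z = 1.5**3 + rng.normal(0.0, 1.0)
    particles, weights = pf_step(particles, weights, z)
estimate = float(np.sum(weights * particles))
```

Note that nothing here assumes Gaussian state uncertainty: if the measurement model were ambiguous (say, z = x**2), the particle cloud would happily stay bimodal, which is exactly where the Kalman family struggles.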

dplavery92 | 2 years ago | on: Simulating fluids, fire, and smoke in real-time

I encountered the same problem on my Intel MBP. Per another comment here, switching from Chrome to Safari lets me view the whole page, and view it smoothly, without my CPU utilization spiking or my fans spinning up.

dplavery92 | 2 years ago | on: OpenAI's board has fired Sam Altman

I don't think anyone in this thread knows what happened, but since we're all speculating about why the CEO of the leading AI company was suddenly sacked, the possibility of an unacceptable interpersonal scandal is no more outlandish than others' suggestions of fraud, legal trouble for OpenAI, or foundering financials. The suggestion here is simply that Altman having done something "big and dangerous" is not a foregone conclusion.

In the words of Brandt, "well, Dude, we just don't know."

dplavery92 | 2 years ago | on: A non-mathematical introduction to Kalman filters for programmers

I think a great place to start is https://www.bzarg.com/p/how-a-kalman-filter-works-in-picture...

Unlike the OP article, it does make use of the math formalism for Kalman filters, but it is a relatively gentle introduction that does a very good job visualizing and explaining the intuition behind each term. I have gotten positive feedback (no pun intended!) from interns and junior hires who used this resource to familiarize themselves with the topic.
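For the scalar case, the whole predict/update cycle that the linked article visualizes fits in a few lines. This is my own minimal sketch (function names and the constant-position model are illustrative, not from the article): predict inflates the variance by the process noise, and update blends prediction and measurement according to the Kalman gain.

```python
def kf_predict(x, p, q=0.01):
    """Predict: a constant-position model just inflates the variance by q."""
    return x, p + q

def kf_update(x, p, z, r=0.25):
    """Update: fuse a measurement z (with noise variance r) into the estimate."""
    k = p / (p + r)              # Kalman gain: how much to trust the measurement
    return x + k * (z - x), (1.0 - k) * p

# Usage: repeated noisy measurements of a constant value pull the estimate
# toward the truth while shrinking the variance.
x, p = 0.0, 100.0                # deliberately vague prior
for z in [4.9, 5.2, 5.0, 5.1]:
    x, p = kf_predict(x, p)
    x, p = kf_update(x, p, z)
```

The gain k is the same "overlap of two Gaussians" picture the article draws: with a vague prior (large p), k is near 1 and the first measurement dominates; as p shrinks, new measurements move the estimate less and less.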

If you are making a deeper study and are ready to dive into a textbook that more thoroughly explores theory and application, there is a book by Gibbs[1] that I have used in the past and that is well-regarded in some segments of industry that rely on these techniques for state estimation and GNC.

[1] https://onlinelibrary.wiley.com/doi/book/10.1002/97804708900...

dplavery92 | 2 years ago | on: Like diffusion but faster: The Paella model for fast image generation

From Sections 3 and 4 of the VQGAN paper[1] upon which this work is built: "To generate images in the megapixel regime, we ... have to work patch-wise and crop images to restrict the length of [the quantized encoding vector] s to a maximally feasible size during training. To sample images, we then use the transformer in a sliding-window manner as illustrated in Fig.3." ... "The sliding window approach introduced in Sec.3.2 enables image synthesis beyond a resolution of 256×256 pixels."

From the Paella paper[2]: "Our proposal builds on the two-stage paradigm introduced by Esser et al. and consists of a Vector-quantized Generative Adversarial Network (VQGAN) for projecting the high dimensional images into a lower-dimensional latent space... [w]e use a pretrained VQGAN with an f=4 compression and a base resolution of 256×256×3, mapping the image to a latent resolution of 64×64 indices." After training, in describing their token predictor architecture: "Our architecture consists of a U-Net-style encoder-decoder structure based on residual blocks, employing convolutional[sic] and attention in both, the encoder and decoder pathways."

U-Net, of course, is a convolutional neural network architecture.[3] The "down" and "up" encoder/decoder blocks in the Paella code are batch-normed CNN layers.[4]

[1] https://arxiv.org/pdf/2012.09841.pdf [2] https://arxiv.org/pdf/2211.07292.pdf [3] https://arxiv.org/abs/1505.04597 [4] https://github.com/dome272/Paella/blob/main/src/modules.py#L...

dplavery92 | 2 years ago | on: Like diffusion but faster: The Paella model for fast image generation

Presumably a transformer model (or similar) that uses positional encodings for the tokens could do that, but the U-Net decoder here uses a fixed-shape output and learns relationships between tokens (and sizes of image features) from the positions of those tokens in a fixed-size vector. You could still apply this process convolutionally and slide the entire network around to generate an image that is an arbitrary multiple of the token size, but image content in one area of the image will only be "aware" of image content within a fixed-size neighborhood (e.g. 256×256).
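The sliding idea can be sketched in a few lines of numpy. Everything here is a hypothetical stand-in: `sliding_window_apply` plays the role of the fixed-input-size network being slid over a larger latent grid, with overlapping outputs averaged together. (A real implementation would slide a trained model over token grids, not raw arrays.)

```python
import numpy as np

def sliding_window_apply(latents, model, window=64, stride=32):
    """Apply a fixed-input-size `model` over a larger latent grid by sliding
    a window and averaging overlapping outputs.

    Assumes `model` maps a (window, window) array to a same-shaped array, and
    that window/stride tile the grid exactly (so every cell is covered).
    """
    h, w = latents.shape
    out = np.zeros((h, w))
    counts = np.zeros((h, w))
    for i in range(0, h - window + 1, stride):
        for j in range(0, w - window + 1, stride):
            out[i:i + window, j:j + window] += model(latents[i:i + window, j:j + window])
            counts[i:i + window, j:j + window] += 1
    return out / counts

# Usage: with an "identity" model, sliding + averaging reproduces the input.
x = np.arange(128 * 128, dtype=float).reshape(128, 128)
y = sliding_window_apply(x, lambda win: win)
```

This also makes the limitation above concrete: each output cell is computed only from windows that contain it, so two regions farther apart than the window size never interact.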

dplavery92 | 2 years ago | on: AI Canon

Eh, it's a little tricky. A lot of research marketed under the "AI" umbrella would be categorized under cs.LG (https://arxiv.org/list/cs.LG/recent), cs.CV (https://arxiv.org/list/cs.CV/recent), cs.CL (https://arxiv.org/list/cs.CL/recent), and to a lesser degree cs.NE (https://arxiv.org/list/cs.NE/recent). Oh, and of course, cs.AI (https://arxiv.org/list/cs.AI/recent). Not every one of those areas has grown monotonically, but the growth in CV and CL especially has been explosive over the last ten years.

dplavery92 | 3 years ago | on: Llama.cpp 30B runs with only 6GB of RAM now

Sure, but when one 12 GB GPU costs ~$800 new (e.g. the 3080 LHR), "a couple of dozens" of them is a big barrier to entry for the hobbyist, student, or freelancer. Cloud computing offers an alternative route, but, as stated, distribution introduces a new engineering task, and the month-to-month bills for the compute nodes you are using can still add up surprisingly quickly.

dplavery92 | 3 years ago | on: C++ Neural Network in a Weekend (2020)

NNs are potentially very powerful arbitrary function approximators, but you have very limited control (or, arguably, insight) into the precise nature of the solutions their optimization arrives at. Because of that, they've been especially well suited to problems in vision and NLP where we have basic intuition about the phenomenology but can't practically manage a formal description of that intuition (and enumerating that description is probably not of great intellectual interest): what, in pixel space, makes a cat a cat or a dog a dog? What, in patterns of natural words, indicates sarcasm or positive/negative sentiment?

They also get tons of use in results-oriented modeling of lots of other statistical questions in structured data (home prices, resource allocation, voter turnout, etc.), but in this luddite's opinion, those applications tend to be pretty fraught when they trade a deeper understanding of the data phenomenology for the convenience of the model-training paradigm.

dplavery92 | 3 years ago | on: US Department of Energy: Fusion Ignition Achieved

Be that as it may, a number of positions at LLNL, including many of those affiliated with NIF, require that the candidate be a US person and be eligible for a DOE security clearance. A security clearance does not strictly hinge on being a US person, but a number of national-security-related positions may require not only the clearance but also that the candidate is a US person (or may outright forbid foreign nationals).

dplavery92 | 3 years ago | on: US Department of Energy: Fusion Ignition Achieved

This is not quite correct. LLNL is a Federally Funded Research & Development Center (FFRDC), which is owned, as a facility, by the government but managed and staffed by a non-profit contracting organization called Lawrence Livermore National Security, LLC (LLNS) under a contract funded by DOE/NNSA. The board of LLNS is made up of representatives from universities (California + TAMU), other scientific non-profits (Battelle Memorial Institute), and private nuclear ventures (e.g. Bechtel). LLNS pays, with very few exceptions, staff salaries at LLNL, and they are not beholden to the government civilian pay schedule.

https://www.llnl.gov/about/management-sponsors

dplavery92 | 3 years ago | on: Demo of =GPT3() as a spreadsheet feature

For what it's worth, I'm also very bad at plotting graphs with any kind of accuracy, which is why I use plotting software instead of doing it by hand.

I get the feeling that my visual system and the language I use are respectively pretty bad at processing and conveying precise information from a plot (beyond simple descriptors like "A is larger than B" or "f(x) has a maximum"). I guess I would find it only mildly surprising if no Vision-Language model were able to perform those tasks very well, because the representations in question seem pretty poorly suited.

I get that popular diffusion models for image generation are doing a bad job composing concepts in a scene and keeping relationships constant over the image--even if Stable Diffusion could write in human script, it's a bad bet that the contents of a legend would match a pie chart that it drew. But other Vision-Language models, designed for image captioning or visual question answering, rather than generating diverse, stylistic images, are pretty good at that compositional information (up to, again, the "simple descriptions" level of granularity I mentioned before.)
